Scientific Computation ECS 231
This 44-page set of class notes was uploaded by Ashleigh Dare on Tuesday, September 8, 2015. The notes belong to ECS 231 at the University of California, Davis, taught by Zhaojun Bai in Fall.
ECS231 Handout 5, April 9, 2009: Vector and Matrix Norms

1. A vector norm on C^n is a mapping that maps each x ∈ C^n to a real number ||x|| satisfying
   (a) ||x|| > 0 for x ≠ 0, and ||0|| = 0 (positive definiteness),
   (b) ||αx|| = |α| ||x|| for α ∈ C (absolute homogeneity),
   (c) ||x + y|| ≤ ||x|| + ||y|| (triangle inequality).

2. Vector p-norms: let x = (x_1, x_2, ..., x_n)^T; then the vector p-norm, 1 ≤ p ≤ ∞, is defined as
   ||x||_p := ( Σ_{i=1}^n |x_i|^p )^{1/p}.

3. Commonly used vector p-norms:
   ||x||_1 = Σ_{i=1}^n |x_i|                ("Manhattan" or "taxi cab" norm),
   ||x||_2 = ( Σ_{i=1}^n |x_i|^2 )^{1/2}    (Euclidean length),
   ||x||_∞ = max_{1≤i≤n} |x_i|.

4. The geometry of the closed unit ball { x ∈ C^2 : ||x||_p ≤ 1 } for p = 1, 2, ∞.

5. Norm equivalence: let ||·||_α and ||·||_β be any two vector norms. There are constants c_1, c_2 > 0 such that
   c_1 ||x||_α ≤ ||x||_β ≤ c_2 ||x||_α.
For example, it can easily be proved that
   ||x||_1 ≤ sqrt(n) ||x||_2,   ||x||_2 ≤ ||x||_1,
   ||x||_1 ≤ n ||x||_∞,         ||x||_∞ ≤ ||x||_1,
   ||x||_2 ≤ sqrt(n) ||x||_∞,   ||x||_∞ ≤ ||x||_2.

6. Cauchy-Schwarz inequality: |x^H y| ≤ ||x||_2 ||y||_2. In general, we have the Hölder inequality
   |x^H y| ≤ ||x||_p ||y||_q   for 1 ≤ p, q ≤ ∞ and 1/p + 1/q = 1.

7. A matrix norm on C^{m×n} is a mapping that maps each A ∈ C^{m×n} to a real number ||A|| satisfying
   (a) ||A|| > 0 for A ≠ 0, and ||0|| = 0 (positive definiteness),
   (b) ||αA|| = |α| ||A|| for α ∈ C (absolute homogeneity),
   (c) ||A + B|| ≤ ||A|| + ||B|| (triangle inequality).

8. The Frobenius norm: for A = (a_ij) ∈ C^{m×n},
   ||A||_F = ( Σ_{i=1}^m Σ_{j=1}^n |a_ij|^2 )^{1/2} = ( tr(A^H A) )^{1/2}.

9. The induced (operator) norm: any generic vector norm induces a matrix norm on C^{m×n}, denoted by the same notation:
   ||A|| := max_{x≠0} ||Ax|| / ||x|| = max_{||x||=1} ||Ax||.
It can be verified that the so-defined ||A|| is indeed a norm on C^{m×n}.

10. Useful property: ||Ax|| ≤ ||A|| ||x||. Therefore ||A|| is the maximal factor by which A can stretch a vector.

11. ||A||_1, ||A||_2 and ||A||_∞ are frequently used induced norms, induced by the vector 1-, 2- and ∞-norms respectively:
    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^m |a_ij|   (max absolute column sum),
    ||A||_2 = sqrt(the largest eigenvalue of A^H A) = the largest singular value of A,
    ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^n |a_ij|   (max absolute row sum).

12. ||A||_2^2 ≤ ||A||_1 ||A||_∞.
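The p-norms and the equivalence inequalities in items 2-5 can be spot-checked in a few lines of pure Python (a small illustrative sketch, not part of the handout; the helper name `norm_p` and the test vector are ours):

```python
import math

def norm_p(x, p):
    """Vector p-norm ||x||_p; p = float('inf') gives the max norm."""
    if p == float('inf'):
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 12.0]
n = len(x)
one = norm_p(x, 1)               # ||x||_1 = 19
two = norm_p(x, 2)               # ||x||_2 = 13
inf = norm_p(x, float('inf'))    # ||x||_inf = 12

# Spot-check the equivalence inequalities from item 5.
assert two <= one <= math.sqrt(n) * two
assert inf <= two <= math.sqrt(n) * inf
assert inf <= one <= n * inf
```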
13. Frequently used matrix factorizations.

LU decomposition (Gaussian elimination): If A is nonsingular, then there exist a permutation P, a unit lower triangular matrix L, and a nonsingular upper triangular matrix U such that PA = LU. Special cases:
(a) Cholesky decomposition: a matrix A is symmetric positive definite if and only if there exists a unique nonsingular upper triangular matrix R, with positive diagonal entries, such that A = R^T R.
(b) LDL^T factorization: if A^T = A is nonsingular, then there exist a permutation P, a unit lower triangular matrix L, and a block diagonal matrix D with 1-by-1 and 2-by-2 blocks such that PAP^T = LDL^T.
Applications:
- Solve Ax = b.
- Compute det(A).

QR decomposition (Gram-Schmidt orthogonalization): Let A be m-by-n with m ≥ n, and suppose that A has full column rank. Then there exist a unique m-by-n orthogonal matrix Q (Q^T Q = I) and a unique n-by-n upper triangular matrix R with positive diagonal (r_ii > 0) such that A = QR.
Applications:
- Find an orthonormal basis of the subspace spanned by the columns of A.
- Solve the linear least squares problem min_x ||Ax − b||_2.

Singular Value Decomposition (SVD): Let A be an m-by-n matrix with m ≥ n. Then we can write A = UΣV^T, where U is an m-by-m orthogonal matrix (U^T U = I_m), V is an n-by-n orthogonal matrix (V^T V = I_n), and Σ = diag(σ_1, σ_2, ..., σ_n) with σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. If m < n, the SVD can be defined by considering A^T. The columns u_1, u_2, ..., u_m of U are called left singular vectors of A; the columns v_1, v_2, ..., v_n of V are called right singular vectors; the σ_1, σ_2, ..., σ_n are called singular values.
Applications:
- Suppose that A is m-by-n with m ≥ n and has full rank, with A = UΣV^T being A's SVD. Then the pseudo-inverse A† = (A^T A)^{-1} A^T can also be written as A† = V Σ^{-1} U^T. If m < n, then A† = A^T (A A^T)^{-1}.
- Suppose that σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0. Then the rank of A is r, the range space of A is span{u_1, u_2, ..., u_r}, and the null space of A is span{v_{r+1}, ..., v_n}.
- ||A||_2 = σ_1 = σ_max.
- Let A be m × n with m ≥ n. Then
  (a) the eigenvalues of A^T A are σ_i^2, i = 1, 2, ..., n; the corresponding eigenvectors are the right singular vectors v_i, i = 1, 2, ..., n;
  (b) the eigenvalues of A A^T are σ_i^2, i = 1, 2, ..., n, together with m − n zeros; the left singular vectors u_i, i = 1, 2, ..., n, are the corresponding eigenvectors for the eigenvalues σ_i^2, and one can take any m − n other orthonormal vectors orthogonal to u_1, u_2, ..., u_n as the eigenvectors for the eigenvalue 0.
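The Cholesky special case (a) above can be sketched in pure Python for a small SPD matrix (an illustrative toy, not a library-quality routine; the 3-by-3 test matrix is ours):

```python
def cholesky_upper(A):
    """Return upper-triangular R with positive diagonal such that A = R^T R.
    A must be symmetric positive definite (given as a list of lists)."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = s ** 0.5   # fails if A is not positive definite
            else:
                R[i][j] = s / R[i][i]
    return R

A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 2.0],
     [0.0, 2.0, 5.0]]
R = cholesky_upper(A)
# Verify A = R^T R entrywise.
for i in range(3):
    for j in range(3):
        rtr = sum(R[k][i] * R[k][j] for k in range(3))
        assert abs(rtr - A[i][j]) < 1e-12
```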
- Principal components: the SVD of A can be rewritten as A = E_1 + E_2 + ... + E_p, where p = min(m, n) and E_k is a rank-one matrix of the form E_k = σ_k u_k v_k^T. The E_k are referred to as component matrices and are orthogonal to each other in the sense that E_j E_k^T = 0 for j ≠ k. Since ||E_k||_2 = σ_k, the contribution each E_k makes to reproduce A is determined by the size of the singular value σ_k.
- Optimal rank-k approximation:
    min_{B m×n, rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1},
  where A_k = U Σ_k V^T = E_1 + E_2 + ... + E_k and Σ_k = diag(σ_1, σ_2, ..., σ_k, 0, ..., 0).
- Data compression: note that the optimal rank-k approximation A_k can be written in the compact form A_k = U_k Σ_k V_k^T, where U_k and V_k are the first k columns of U and V respectively, and Σ_k = diag(σ_1, σ_2, ..., σ_k). Therefore A_k is represented by (m + n + 1)k elements, whereas A is represented by mn elements; (m + n + 1)k / (mn) is the compression ratio. Matlab M-script:
    >> load clown.mat
    >> [m,n] = size(X);
    >> figure(1)
    >> colormap(map)
    >> image(X)
    >> [U,S,V] = svd(X);
    >> k = 20;
    >> X20 = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';
    >> figure(2)
    >> colormap(map)
    >> image(X20)
  [Figure: the original clown image and its rank-20 approximation, with the compression ratio (m+n+1)k/(mn).]

14. Schur decomposition: Let A be of order n. Then there is an n × n unitary matrix U (U^H U = I) such that A = U T U^H, where T is upper triangular. By appropriate choice of U, the eigenvalues of A, which are the diagonal elements of T, may be made to appear in any order.
Applications:
- Compute the eigenvalue decomposition A = XΛX^{-1}.
- Compute matrix functions f(A).

ECS231 Handout 9, May 19, 2009

1. In the convergence analysis of the CG and GMRES algorithms, we saw that the convergence rate of CG depends on the condition number of A, or, more generally, on the distribution of A's eigenvalues. Other Krylov subspace methods have similar properties. Preconditioning means replacing the system Ax = b with the modified system
   M^{-1} A x = M^{-1} b    (1)
or
   A M^{-1} x̃ = b,  x = M^{-1} x̃.    (2)
These are referred to as left and right preconditioning, respectively. If M is SPD, then one can precondition symmetrically and solve the modified linear system
   L^{-1} A L^{-T} y = L^{-1} b,  x = L^{-T} y,    (3)
where M = L L^T. The matrix L could be the Cholesky factor of M, or any other matrix satisfying M = L L^T.
The desired preconditioner M should be chosen so that
(a) M^{-1}A (or L^{-1}AL^{-T}) is well conditioned or approximates the identity matrix;
(b) linear systems with coefficient matrix M are easy to solve.
A careful problem-dependent choice of M can often make the condition number of the modified system much smaller than the condition number of the original one, and thus accelerate convergence dramatically. Indeed, a good preconditioner is often necessary for an iterative method to converge at all, and much current research in iterative methods is directed at finding better preconditioners. More specifically, a good preconditioner M depends on the iterative method being used:
- For CG and related methods, one would like the condition number of the symmetrically preconditioned matrix L^{-1}AL^{-T} to be close to one, in order for the error bound based on the Chebyshev polynomial to be small, or alternatively for the matrix to have few extreme eigenvalues.
- For GMRES, a preconditioned matrix that is close to normal and whose eigenvalues are tightly clustered around some point away from the origin would be good, but other properties might also suffice to define a good preconditioner.

2. Preconditioned Conjugate Gradient. If the CG algorithm is applied directly to the symmetrically preconditioned system (3), the iterative kernels satisfy
   y_{j+1} = y_j + α_j p̃_j,
   r̃_{j+1} = r̃_j − α_j L^{-1} A L^{-T} p̃_j,
   p̃_{j+1} = r̃_{j+1} + β_j p̃_j,
with
   α_j = (r̃_j^T r̃_j) / (p̃_j^T L^{-1} A L^{-T} p̃_j)  and  β_j = (r̃_{j+1}^T r̃_{j+1}) / (r̃_j^T r̃_j).
Defining
   x_j = L^{-T} y_j,  r_j = L r̃_j,  p_j = L^{-T} p̃_j,  z_j = M^{-1} r_j,
the iterative kernels become
   x_{j+1} = x_j + α_j p_j,
   r_{j+1} = r_j − α_j A p_j,
   p_{j+1} = z_{j+1} + β_j p_j,
with
   α_j = (r_j^T M^{-1} r_j) / (p_j^T A p_j)  and  β_j = (r_{j+1}^T M^{-1} r_{j+1}) / (r_j^T M^{-1} r_j).
We obtain the following preconditioned CG algorithm for solving Ax = b using the preconditioner M = L L^T:

PRECONDITIONED CONJUGATE GRADIENT (PCG)
1. compute r_0 = b − A x_0, solve M z_0 = r_0, and set p_0 = z_0
2. for j = 0, 1, 2, ..., until convergence do
3.   α_j = (z_j^T r_j) / (p_j^T A p_j)
4.   x_{j+1} = x_j + α_j p_j
5.   r_{j+1} = r_j − α_j A p_j
6.   solve M z_{j+1} = r_{j+1}
7.   β_j = (z_{j+1}^T r_{j+1}) / (z_j^T r_j)
8.   p_{j+1} = z_{j+1} + β_j p_j
9. end for
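PCG steps 1-9 above can be sketched in pure Python. As a simplifying assumption for this toy, we use the Jacobi preconditioner M = diag(A) (so "solve M z = r" is a componentwise division) rather than a Cholesky-based M; the 3-by-3 test system is ours:

```python
def pcg(A, b, M_diag, tol=1e-10, maxit=100):
    """Preconditioned CG for SPD A (dense list-of-lists) with a diagonal
    preconditioner M = diag(M_diag); follows steps 1-9 of the PCG algorithm."""
    n = len(b)
    matvec = lambda v: [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]                                       # r0 = b - A x0, with x0 = 0
    z = [ri / di for ri, di in zip(r, M_diag)]     # solve M z0 = r0
    p = z[:]
    rz = dot(r, z)
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [ri / di for ri, di in zip(r, M_diag)]  # solve M z_{j+1} = r_{j+1}
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b, [A[i][i] for i in range(3)])
# Check that the residual b - A x is tiny.
res = [b[i] - sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]
assert max(abs(ri) for ri in res) < 1e-8
```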
3. Preconditioned GMRES. Applying GMRES to the modified system (1) is straightforward:

PRECONDITIONED GMRES
1. compute r_0 = M^{-1}(b − A x_0), β = ||r_0||_2 and v_1 = r_0 / β
2. for j = 1, 2, ..., m do
3.   compute w = M^{-1} A v_j
4.   for i = 1, 2, ..., j do
5.     h_ij = v_i^T w
6.     w = w − h_ij v_i
7.   end do
8.   compute h_{j+1,j} = ||w||_2 and v_{j+1} = w / h_{j+1,j}
9. end do
10. let y_m be the solution of min_y ||β e_1 − H̃_m y||_2 and set x_m = x_0 + V_m y_m; if satisfied, stop, else set x_0 := x_m and GOTO 1

Note that in the above algorithm, V_m = [v_1, v_2, ..., v_m], and H̃_m is the (m+1) × m upper Hessenberg matrix with the entries h_ij computed at steps 5 and 8.

4. Preconditioning Techniques. The reliability and robustness of iterative techniques, when dealing with various applications, often depends much more on the quality of the preconditioner than on the particular Krylov subspace method used. Finding a good preconditioner for a given sparse linear system is often viewed as a combination of art and science. Preconditioners can be divided roughly into three categories:
I.   Preconditioners designed for general classes of matrices, e.g., Jacobi, Gauss-Seidel, SOR, incomplete LU factorization, incomplete Cholesky decomposition, approximate inverses.
II.  Preconditioners designed for broad classes of underlying problems, e.g., elliptic partial differential equations such as the Poisson equation. Examples are multigrid and domain decomposition preconditioners.
III. Preconditioners designed for a specific matrix or underlying problem, e.g., for the transport equation.

5. ILU Factorization Preconditioners. Except for diagonal matrices, the solution of a linear system with coefficient matrix M requires a suitable decomposition of M. In many instances this will be an LU decomposition. The idea of an incomplete LU preconditioner is to perform an abbreviated sparse form of Gaussian elimination on A and to declare the product of the resulting factors to be M. Since M is by construction already factored, systems involving M will be easy to solve. Let us first introduce a sparsity set Z to control the pattern of zeros. Specifically, let Z be a set of ordered pairs of integers from {1, 2, ..., n} containing no pairs of the form (i, i).
An incomplete LU factorization of A is a decomposition of the form
   A = LU + E,    (4)
where L is unit lower triangular, U is upper triangular, and L, U and E have the following properties:
(a) if (i, j) ∈ Z with i > j, then ℓ_ij = 0;
(b) if (i, j) ∈ Z with i < j, then u_ij = 0;
(c) if (i, j) ∉ Z, then e_ij = 0.
In other words, the elements of L and U are zero on the sparsity set Z, and off the sparsity set the decomposition reproduces A. It is instructive to consider two extreme cases:
1. If the sparsity set Z is empty, we get the LU decomposition of A, i.e., we are using A as a preconditioner.
2. If Z is everything except the diagonal pairs (i, i), then we are effectively using the diagonal of A as a preconditioner.

Let us consider an ILU algorithm that generates L and U rowwise. Suppose we have computed the first k − 1 rows of L and U, and we wish to compute the kth row, that is, the entries l_k^T = (ℓ_{k1}, ..., ℓ_{k,k-1}) and u_k^T = (υ_{kk}, υ_{k,k+1}, ..., υ_{kn}). Writing out the first k rows of (4) and multiplying out the partitioned factors, the kth row satisfies
   l_k^T U_{k-1} + e_k^T = a_k^T    (5)
for the part left of the diagonal, and
   u_k^T = ā_k^T − l_k^T Ū_{k-1} − ē_k^T
for the part from the diagonal rightward, where U_{k-1} and Ū_{k-1} denote the already-computed columns of U and a_k^T, ā_k^T the corresponding parts of the kth row of A. We then can solve these two systems in order. Suppose that we have computed ℓ_{k1}, ℓ_{k2}, ..., ℓ_{k,j-1}. If (k, j) ∈ Z, then set ℓ_{kj} = 0. If (k, j) ∉ Z, then e_kj = 0 and equation (5) gives
   a_kj = Σ_{i=1}^{j-1} ℓ_{ki} υ_{ij} + ℓ_{kj} υ_{jj},
from which we get
   ℓ_{kj} = ( a_kj − Σ_{i=1}^{j-1} ℓ_{ki} υ_{ij} ) / υ_{jj}.
The key observation here is that it does not matter how the values of the preceding ℓ's and υ's were determined: if ℓ_{kj} is defined in this way, then when we compute LU its (k, j) element will be a_kj. Thus we set ℓ's and υ's to zero on the sparsity set without interfering with the values of LU off the sparsity set. A similar procedure applies to the determination of υ_{kk}, υ_{k,k+1}, ..., υ_{kn}.

INCOMPLETE LU FACTORIZATION (A, Z)
1.  for k = 1 to n
2.    for j = 1 to k − 1
3.      if (k, j) ∈ Z
4.        L(k, j) = 0
5.      else
6.        L(k, j) = ( A(k, j) − L(k, 1:j−1) U(1:j−1, j) ) / U(j, j)
7.      end if
8.    end for j
9.    for j = k to n
10.     if (k, j) ∈ Z
11.       U(k, j) = 0
12.     else
13.       U(k, j) = A(k, j) − L(k, 1:k−1) U(1:k−1, j)
14.     end if
15.   end for j
16. end for k

The algorithm can be carried to completion provided the quantities U(j, j) are all nonzero, in which case the decomposition is unique.
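The INCOMPLETE LU FACTORIZATION(A, Z) pseudocode above translates almost line-for-line into pure Python (0-based indices here; the small test matrix and sparsity set are ours):

```python
def ilu(A, Z):
    """Incomplete LU of A (list of lists): entries of L and U are forced to
    zero on the sparsity set Z (0-based (i, j) pairs, no diagonal pairs).
    Returns unit-lower-triangular L and upper-triangular U."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k):                       # L(k, j), j < k
            if (k, j) not in Z:
                s = sum(L[k][i] * U[i][j] for i in range(j))
                L[k][j] = (A[k][j] - s) / U[j][j]
        for j in range(k, n):                    # U(k, j), j >= k
            if (k, j) not in Z:
                U[k][j] = A[k][j] - sum(L[k][i] * U[i][j] for i in range(k))
    return L, U

A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
Z = {(2, 0), (0, 2)}                  # drop the two corner positions
L, U = ilu(A, Z)
# Off the sparsity set, L*U must reproduce A exactly (property (c)).
for i in range(3):
    for j in range(3):
        lu = sum(L[i][k] * U[k][j] for k in range(3))
        if (i, j) not in Z:
            assert abs(lu - A[i][j]) < 1e-12
```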
Whether or not the U(j, j) are nonzero will depend on the matrix in question. For the following two classes of matrices, the algorithm always works.
(a) If A is a nonsingular diagonally dominant matrix, then A has an incomplete LU factorization for any sparsity set Z.
Note: a matrix A of order n is diagonally dominant if
   |a_ii| ≥ Σ_{j≠i} |a_ij|,  for i = 1, 2, ..., n.
It is strictly diagonally dominant if strict inequality holds for all i. It can be shown that a strictly diagonally dominant matrix is nonsingular. Be aware that diagonal dominance alone does not imply either nonsingularity or singularity. For example, let
   A = [ 2 −1 0 ; −1 2 −1 ; 0 −1 2 ],   B = [ 1 1 0 ; 1 1 0 ; 1 2 4 ].
Both are diagonally dominant; A is nonsingular, while, on the other hand, B is singular.
(b) The incomplete LU factorization also exists for any M-matrix.
Note: a matrix is said to be an M-matrix if it satisfies the following properties:
1. a_ii > 0 for i = 1, ..., n;
2. a_ij ≤ 0 for i ≠ j, i, j = 1, ..., n;
3. A is nonsingular; and
4. A^{-1} is a nonnegative matrix (all entries are nonnegative).

6. The following figure shows the LU factorization of a sparse 20-by-20 matrix, followed by an ILU factorization and the E = A − LU factor of the same matrix.
[Figure: sparsity patterns of the matrix A (nz = 72), its dense LU factors L and U, the sparse ILU factors L (nz = 42) and U (nz = 50), and A − LU.]

7. Block preconditioning is a popular technique for block tridiagonal matrices arising from the discretization of elliptic problems such as Poisson's equation. It can also be generalized to other sparse matrices. For example, the matrix arising in the solution of the 2D Poisson equation has the
block tridiagonal form
   A = [  T  −I          ]
       [ −I   T  −I      ]
       [     −I   T  −I  ]
       [         −I   T  ],
where T is a symmetric tridiagonal matrix with diagonal entries all 4 and off-diagonal entries all −1. In this case a natural preconditioner is M = diag(T, T, ..., T).

8. The following figure shows the convergence history of GMRES with and without preconditioning for solving a linear system of equations arising from a discretization of a model convection-diffusion equation. The preconditioner used here is ILU(0), i.e., the ILU factorization with the same sparsity pattern as A.
[Figure: log of residual norm vs. iteration for GMRES (full, no restarting) and ILU(0)-preconditioned GMRES.]

9. Iterative methods in Matlab:
   pcg      - Preconditioned Conjugate Gradients Method
   gmres    - Generalized Minimum Residual Method
   bicg     - BiConjugate Gradients Method
   bicgstab - BiConjugate Gradients Stabilized Method
   cgs      - Conjugate Gradients Squared Method
   minres   - Minimum Residual Method
   qmr      - Quasi-Minimal Residual Method
   symmlq   - Symmetric LQ Method
Preconditioners:
   luinc    - Incomplete LU factorization
   cholinc  - Incomplete Cholesky factorization

10. Further Reading
- Yousef Saad, Iterative Methods for Sparse Linear Systems, 2nd Edition, SIAM, 2003.
- A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, 1997.
- H. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge Univ. Press, 2003.
- R. Barrett et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, 1994 (linked to our class website).

ECS231 Handout 3: Rounding Error Analysis, April 2, 2009

1. Let x̂ and ŷ be floating point numbers with x̂ = x(1 + τ_1) and ŷ = y(1 + τ_2), for |τ_1|, |τ_2| ≤ τ << 1, where the τ_i could be the relative errors incurred in collecting/getting the data from the original source or in previous operations. Question: how do the four basic arithmetic operations behave?

2. Addition and subtraction:
   fl(x̂ + ŷ) = (x̂ + ŷ)(1 + ε),  |ε| ≤ u (the unit roundoff)
             = x(1 + τ_1)(1 + ε) + y(1 + τ_2)(1 + ε)
             = x(1 + τ_1 + ε + O(τε)) + y(1 + τ_2 + ε + O(τε))
             = (x + y)(1 + δ),
where δ can be bounded as follows:
   |δ| ≤ ( (|x| + |y|) / |x + y| ) (τ + ε) + O(τε).
Three possible cases:
- If x and y have the same sign, i.e., xy > 0, then |x + y| = |x| + |y|; this implies |δ| ≤ τ + ε + O(τε) << 1. Thus x̂ + ŷ approximates x + y well.
- If x ≈ −y, then x + y ≈ 0 and (|x| + |y|)/|x + y| >> 1; this implies that |δ| could be nearly 1 or much bigger than 1. Thus x̂ + ŷ may turn out to have nothing to do with the true x + y. This is so-called catastrophic cancellation, which happens when a floating point number is subtracted from another nearly equal floating point number. Cancellation causes relative errors or uncertainties already present in x̂ and ŷ to be magnified.
- In general, if (|x| + |y|)/|x + y| is not too big, x̂ + ŷ provides a good approximation to x + y.

3. Multiplication and division are very well behaved:
   fl(x̂ × ŷ) = x̂ ŷ (1 + ε) = x y (1 + τ_1)(1 + τ_2)(1 + ε) = x y (1 + δ_×),
   fl(x̂ / ŷ) = (x̂ / ŷ)(1 + ε) = (x / y)(1 + τ_1)(1 + τ_2)^{-1}(1 + ε) = (x / y)(1 + δ_÷).
Thus |δ_×| ≤ 2τ + ε + O(τε) and |δ_÷| ≤ 2τ + ε + O(τε).

4. Examples of catastrophic cancellation.
EXAMPLE 1. Computing sqrt(n + 1) − sqrt(n) straightforwardly causes a substantial loss of significant digits for large n.
[Table: computed values of fl(sqrt(n + 1) − sqrt(n)) for n = 1.00e+10, ..., 1.00e+16; the digits degrade as n grows, and for n = 1.00e+16 the straightforward formula returns 0.]
Catastrophic cancellation can sometimes be avoided if a formula is properly reformulated. In the present case, one can compute sqrt(n + 1) − sqrt(n) almost to full precision by using the equality
   sqrt(n + 1) − sqrt(n) = 1 / ( sqrt(n + 1) + sqrt(n) ).
Consequently, the computed results are:
   n          1/(sqrt(n+1) + sqrt(n))
   1.00e+10   4.999999999875000e-06
   1.00e+11   1.581138830080237e-06
   1.00e+12   4.999999999998749e-07
   1.00e+13   1.581138830084150e-07
   1.00e+14   4.999999999999987e-08
   1.00e+15   1.581138830084189e-08
   1.00e+16   5.000000000000000e-09
In fact, one can show that
   fl( 1 / ( sqrt(n + 1) + sqrt(n) ) ) = ( sqrt(n + 1) − sqrt(n) )(1 + δ),  |δ| ≤ 5ε + O(ε²)  (try it!).
EXAMPLE 2. Consider the function
   f(x) = (1 − cos x) / x².
Note that 0 ≤ f(x) < 1/2 for all x ≠ 0. Compare the computed values for x = 1.2 × 10^{-5} using the above expression and an equivalent cancellation-free reformulation (for instance f(x) = (1/2)(sin(x/2)/(x/2))², a standard rewrite; assume that the value of cos x is rounded to 10 significant figures).
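Example 1's reformulation is easy to reproduce in double precision (a quick sketch; Python floats are IEEE binary64, so at n = 1e16 the spacing between adjacent floats is 2 and n + 1 rounds back to n):

```python
import math

n = 1.0e16
direct = math.sqrt(n + 1) - math.sqrt(n)          # catastrophic cancellation
stable = 1.0 / (math.sqrt(n + 1) + math.sqrt(n))  # reformulated, no cancellation

# The direct formula loses every significant digit, while the
# reformulation agrees with 1/(2*sqrt(n)) = 5e-9 to full precision.
assert direct == 0.0
assert abs(stable - 5.0e-9) < 1e-20
```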
5. Forward and backward error analysis. We illustrate the basic idea through a simple example. Consider the computation of the inner product of two vectors x, y ∈ R³,
   x^T y = x_1 y_1 + x_2 y_2 + x_3 y_3,
assuming the x_i's and y_j's are already floating point numbers. It is likely that x^T y is computed in the following order:
   fl(x^T y) = fl( fl( fl(x_1 y_1) + fl(x_2 y_2) ) + fl(x_3 y_3) ).
Adopting the floating point arithmetic model, we have
   fl(x^T y) = ( ( x_1 y_1 (1 + ε_1) + x_2 y_2 (1 + ε_2) )(1 + δ_1) + x_3 y_3 (1 + ε_3) )(1 + δ_2)
             = x_1 y_1 (1 + ε_1)(1 + δ_1)(1 + δ_2) + x_2 y_2 (1 + ε_2)(1 + δ_1)(1 + δ_2) + x_3 y_3 (1 + ε_3)(1 + δ_2)
             = x_1 y_1 (1 + ε̄_1) + x_2 y_2 (1 + ε̄_2) + x_3 y_3 (1 + ε̄_3),
where |ε_i|, |δ_i| ≤ u and |ε̄_i| ≤ 3u + O(u²). Now there are two ways to interpret the errors in the computed x^T y.
(a) We can write
   fl(x^T y) = x^T y + E,  where E = x_1 y_1 ε̄_1 + x_2 y_2 ε̄_2 + x_3 y_3 ε̄_3.
It implies that
   |E| ≤ 3u ( |x_1 y_1| + |x_2 y_2| + |x_3 y_3| ) + O(u²) = 3u |x|^T |y| + O(u²).
This bound on E tells the worst-case difference between the exact x^T y and its computed value. Such an error analysis is so-called Forward Error Analysis.
(b) We can also write
   fl(x^T y) = x̂^T ŷ = (x + Δx)^T (y + Δy),
where, for example,¹
   x̂_1 = x_1 (1 + ε_1),  ŷ_1 = y_1 (1 + δ_1)(1 + δ_2) ≡ y_1 (1 + η_1),
   x̂_2 = x_2 (1 + ε_2),  ŷ_2 = y_2 (1 + δ_1)(1 + δ_2) ≡ y_2 (1 + η_2),
   x̂_3 = x_3 (1 + ε_3),  ŷ_3 = y_3 (1 + δ_2)        ≡ y_3 (1 + η_3).
It can be seen that |η_1|, |η_2| ≤ 2u + O(u²) and |η_3| ≤ u. This says the computed value fl(x^T y) is the exact inner product of slightly perturbed vectors x̂ and ŷ. Such an error analysis is so-called Backward Error Analysis.

6. Further Reading. A classical book on error analysis, in which the notion of backward error analysis was invented, is J.H. Wilkinson, Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs, NJ, 1964; reprinted by Dover, New York, 1994. A contemporary treatment of error analysis and its applications to numerical analysis is N.J. Higham, Accuracy and Stability of Numerical Algorithms, second edition, SIAM, Philadelphia, 2002.

¹ There are many ways to distribute the factors (1 + ε_i) and (1 + δ_i) to the x_i and y_j. In this case it is even possible to make either x̂ ≡ x or ŷ ≡ y.

ECS231 Handout 10, May 21, 2009

Basic concepts

1. Let A ∈ C^{n×n}.
(a) A scalar λ is an eigenvalue of the n × n matrix A, and a nonzero vector x ∈ C^n is a corresponding right eigenvector, if Ax = λx. The set E_λ(A) := { x : Ax = λx } is the eigenspace of A corresponding to the eigenvalue λ.
(b) A nonzero vector y such that y^H A = λ y^H is a left eigenvector.
(c) The set Λ(A) of all eigenvalues of A is called the spectrum of A.
(d) p_A(λ) := det(λI − A), a polynomial of degree n, is called the characteristic polynomial of A.
The following is a list of properties that follow straightforwardly from the definition:
(a) λ is A's eigenvalue ⇔ λI − A is singular ⇔ det(λI − A) = 0 ⇔ p_A(λ) = 0.
(b) There is at least one eigenvector x associated with A's eigenvalue λ; in other words, dim E_λ(A) ≥ 1.
(c) E_λ(A) is a subspace, i.e., it has the following two properties: (1) x ∈ E_λ(A) ⇒ αx ∈ E_λ(A) for all α ∈ C; (2) x_1, x_2 ∈ E_λ(A) ⇒ x_1 + x_2 ∈ E_λ(A).
(d) Suppose A is real. λ is A's eigenvalue ⇔ the conjugate λ̄ is also A's eigenvalue.
(e) A is singular ⇔ 0 is A's eigenvalue.
(f) If A is upper (or lower) triangular, then its eigenvalues consist of its diagonal entries.

2. A ∈ C^{n×n} is simple if it has n linearly independent eigenvectors; otherwise it is defective. Examples:
(a) I (and any diagonal matrix) is simple: e_1, e_2, ..., e_n are n linearly independent eigenvectors.
(b) A 2-by-2 matrix with the two different eigenvalues −1 and 5 is simple: by the fact that each eigenvalue corresponds to at least one eigenvector, it must have 2 linearly independent eigenvectors.
(c) If A ∈ C^{n×n} has n different eigenvalues, then A is simple.
(d) The matrix [ 2 1 ; 0 2 ] is defective: it has the repeated eigenvalue 2, but only one eigenvector e_1 = (1, 0)^T.

3. Let λ_1, λ_2, ..., λ_n be the eigenvalues of A, and let x_1, x_2, ..., x_n be a set of corresponding eigenvectors; then
   A X = X Λ,  where X = [x_1, x_2, ..., x_n] and Λ = diag(λ_1, λ_2, ..., λ_n).
If A is simple, namely the eigenvectors are linearly independent, then X^{-1} exists and
   A = X Λ X^{-1}.
This is known as the eigenvalue decomposition of the matrix A.

4. An invariant subspace of A is a subspace V of R^n with the property that v ∈ V implies Av ∈ V. We also write this as A V ⊆ V. Examples:
(1) The simplest one-dimensional invariant subspace is the set span{x} of all scalar multiples of an eigenvector x.
(2) Let x_1, x_2, ..., x_m be any set of independent eigenvectors with eigenvalues λ_1, λ_2, ..., λ_m. Then X = span{x_1, x_2, ..., x_m} is an invariant subspace.
(3) Let A be n-by-n, let V = [v_1, v_2, ..., v_m] be any n-by-m matrix with linearly independent columns, and let V = span(V) be the m-dimensional space spanned by the columns of V.
Then V is an invariant subspace if and only if there is an m-by-m matrix B such that
   A V = V B.
In this case the m eigenvalues of B are also eigenvalues of A.

5. Similarity transformations. n × n matrices A and B are similar if there is an n × n nonsingular matrix P such that B = P^{-1} A P. We also say A is similar to B (and likewise B is similar to A); P is a similarity transformation. A is unitarily similar to B if P is unitary. Suppose that A and B are similar, B = P^{-1} A P. Then:
(a) A and B have the same eigenvalues; in fact p_A ≡ p_B.
(b) Ax = λx ⇔ B(P^{-1}x) = λ(P^{-1}x).
(c) Bw = λw ⇔ A(Pw) = λ(Pw).

6. Schur decomposition. Let A be of order n. Then there is an n × n unitary matrix U (U^H U = I) such that A = U T U^H, where T is upper triangular. By appropriate choice of U, the eigenvalues of A, which are the diagonal elements of T, may be made to appear in any order.

7. Real Schur decomposition. If A is real, there is an orthogonal matrix Q such that A = Q T Q^T, where T is block triangular with 1 × 1 and 2 × 2 blocks on its diagonal. The 1 × 1 blocks contain the real eigenvalues of A, and the eigenvalues of the 2 × 2 blocks are pairs of complex conjugate eigenvalues.

The Power Method and Inverse Iteration

1. The power method is based on the following simple analysis. Assume that A = XΛX^{-1}, with X = [x_1, x_2, ..., x_n] and Λ = diag(λ_1, λ_2, ..., λ_n), and that the eigenvalues are ordered such that |λ_1| > |λ_2| ≥ ... ≥ |λ_n|. Then we can show that
(a) u_j = A^j u_0 / ||A^j u_0||_2 → ± x_1 / ||x_1||_2 as j → ∞,
(b) λ^{(j)} = u_j^H A u_j → λ_1 as j → ∞,
(c) the convergence rate depends on |λ_2 / λ_1|.

2. Pseudocode: given an initial vector u_0,
   for j = 1, 2, ..., until convergence
     w = A u_{j-1}
     u_j = w / ||w||_2            (approximate eigenvector)
     λ^{(j)} = u_j^H A u_j        (approximate eigenvalue)

3. Example. Let
   A = [ −261  209  −49 ; −530  422  −98 ; −800  631  −144 ].
Then Λ(A) = {λ_1, λ_2, λ_3} = {10, 4, 3}. Let u_0 = e_1; the power method produces approximate eigenvalues λ^{(j)} that converge to λ_1 = 10 (e.g., λ^{(j)} ≈ 13.0606, 10.07191, 10.0002, ...). The drawback of the power method is that if |λ_2| is close to |λ_1|, then the method can converge very slowly, or not converge at all.

4. Inverse iteration:
(a) overcomes the drawback of the power method (slow convergence);
(b) finds the eigenvalue closest to a particular given number σ, referred to as a shift.
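The power-method pseudocode in item 2 above can be sketched in pure Python (the 2-by-2 symmetric test matrix is ours, chosen so the answer is known exactly, rather than the 3-by-3 example from the notes):

```python
def power_method(A, u0, iters=200):
    """Power iteration: u_j = A u_{j-1} / ||A u_{j-1}||_2, and the
    approximate eigenvalue is the Rayleigh quotient u_j^T A u_j."""
    n = len(A)
    matvec = lambda v: [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    u = u0[:]
    for _ in range(iters):
        w = matvec(u)
        nrm = sum(wi * wi for wi in w) ** 0.5
        u = [wi / nrm for wi in w]
    Au = matvec(u)
    lam = sum(ui * aui for ui, aui in zip(u, Au))
    return lam, u

# Symmetric 2x2 with eigenvalues 3 and 1 (eigenvectors [1,1] and [1,-1]);
# convergence rate |lambda_2/lambda_1| = 1/3 per step.
A = [[2.0, 1.0], [1.0, 2.0]]
lam, u = power_method(A, [1.0, 0.0])
assert abs(lam - 3.0) < 1e-10
```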
5. Spectral transformation: if λ is an eigenvalue of A, then
   (1) λ − σ is an eigenvalue of A − σI, and
   (2) 1/(λ − σ) is an eigenvalue of (A − σI)^{-1}.
The mapping λ → 1/(λ − σ) is called the shift-and-invert spectral transformation.

6. Pseudocode of inverse iteration: given an initial vector u_0 and a shift σ,
   for j = 1, 2, ..., until convergence
     w = (A − σI)^{-1} u_{j-1}
     u_j = w / ||w||_2                    (approximate eigenvector)
     ρ_j = u_j^H (A − σI)^{-1} u_j
The approximate eigenvalue θ_j of A is given by θ_j = σ + 1/ρ_j. Assume λ_k is the eigenvalue closest to the shift σ. It can be shown that:
(a) u_j converges to ± x_k / ||x_k||_2 as j → ∞;
(b) θ_j converges to λ_k as j → ∞;
(c) the convergence rate depends on max_{i≠k} |λ_k − σ| / |λ_i − σ|.

7. The advantage of inverse iteration over the power method is the ability to converge to any desired eigenvalue (the one nearest to the shift σ). By choosing σ very close to a desired eigenvalue, the method converges very quickly and is thus not as limited by the proximity of nearby eigenvalues as the original power method. The method is particularly effective when we have a good approximation to an eigenvalue and want only its corresponding eigenvector. Inverse iteration is, however, expensive in general: it requires solving (A − σI)w = u_{j-1} at every step. One sparse LU factorization of A − σI is required, which can be very expensive in memory. In addition, like the power method, inverse iteration can only compute one eigenpair.

Framework of Subspace Projection Methods

1. The Rayleigh-Ritz procedure is a framework of orthogonal projection methods for solving large-scale eigenvalue problems. Let A be an n × n real matrix and K be an m-dimensional subspace of R^n. An orthogonal projection technique seeks an approximate eigenpair (λ̃, x̃), with λ̃ ∈ C and x̃ ∈ K, by imposing the following so-called Galerkin condition:
   A x̃ − λ̃ x̃ ⊥ K,    (1)
or, equivalently,
   v^T (A x̃ − λ̃ x̃) = 0  for all v ∈ K.    (2)
To translate this into a matrix problem, assume that an orthonormal basis {v_1, v_2, ..., v_m} of K is available. Denote V = [v_1, v_2, ..., v_m] and let x̃ = V y. Then equation (2) becomes
   V^T ( A V y − λ̃ V y ) = 0.
Therefore y and λ̃ must satisfy
   B_m y = λ̃ y,    (3)
with B_m = V^T A V.
The eigenvalues λ̃_i of B_m are called Ritz values, and the vectors V y_i are called Ritz vectors. This procedure is known as the Rayleigh-Ritz procedure.

RAYLEIGH-RITZ PROCEDURE
(a) Compute an orthonormal basis {v_1, v_2, ..., v_m} of the subspace K. Let V = [v_1, v_2, ..., v_m].
(b) Compute B_m = V^T A V.
(c) Compute the eigenvalues of B_m and select the k desired ones, λ̃_i, i = 1, ..., k, where k ≤ m.
(d) Compute the eigenvectors y_i of B_m associated with the λ̃_i, and return x̃_i = V y_i as approximate eigenvectors of A.

The numerical solution of the m × m eigenvalue problem in steps (c) and (d) can be treated by standard algorithms for small dense eigenvalue problems. An important note is that in step (d) one can replace eigenvectors by Schur vectors, to get approximate Schur vectors ũ_i instead of approximate eigenvectors x̃_i. Schur vectors can be obtained in a numerically stable way, and in general eigenvectors are more sensitive to rounding errors than are Schur vectors.

2. Optimality. Consider the case where A is real and symmetric. Let Q = [Q_k, Q_u] be any n × n orthogonal matrix, where Q_k is n × k and Q_u is n × (n − k). Let
   T = Q^T A Q = [ Q_k^T A Q_k   Q_k^T A Q_u ; Q_u^T A Q_k   Q_u^T A Q_u ] =: [ T_k  T_{ku} ; T_{uk}  T_u ].
When k = 1, T_k is just the Rayleigh quotient; for k > 1, T_k = Q_k^T A Q_k is called a generalization of the Rayleigh quotient.
- The Rayleigh-Ritz procedure approximates the eigenvalues of A by the eigenvalues of T_k = Q_k^T A Q_k. These approximations are called the Ritz values.
- Let T_k = V Λ V^T be the eigendecomposition of T_k. The corresponding eigenvector approximations are the columns of Q_k V and are called the Ritz vectors.
The Ritz values and Ritz vectors are considered optimal approximations to the eigenvalues and eigenvectors of A, as justified by the following theorem.

Theorem 1. The minimum of ||A Q_k − Q_k R||_2 over all k-by-k symmetric matrices R is attained by R = T_k, in which case ||A Q_k − Q_k T_k||_2 = ||T_{uk}||_2.

PROOF. Let R = T_k + Z; to prove the theorem we just want to show that ||A Q_k − Q_k R||_2 is minimized when Z = 0. This is shown by the following sequence of derivations:
   ||A Q_k − Q_k R||_2^2
   = λ_max( (A Q_k − Q_k R)^T (A Q_k − Q_k R) )
   = λ_max( (A Q_k − Q_k T_k − Q_k Z)^T (A Q_k − Q_k T_k − Q_k Z) )
   = λ_max( (A Q_k − Q_k T_k)^T (A Q_k − Q_k T_k) − (Q_k Z)^T (A Q_k − Q_k T_k) − (A Q_k − Q_k T_k)^T (Q_k Z) + Z^T Z ).
Since Q_k^T (A Q_k − Q_k T_k) = Q_k^T A Q_k − T_k = 0, the two cross terms vanish, and
   ||A Q_k − Q_k R||_2^2 = λ_max( (A Q_k − Q_k T_k)^T (A Q_k − Q_k T_k) + Z^T Z )
                         ≥ λ_max( (A Q_k − Q_k T_k)^T (A Q_k − Q_k T_k) ) = ||A Q_k − Q_k T_k||_2^2.
Furthermore, it is easy to compute the minimum value: since A Q_k = Q_k T_k + Q_u T_{uk}, we have
   ||A Q_k − Q_k T_k||_2 = ||Q_u T_{uk}||_2 = ||T_{uk}||_2.  ∎

Corollary 1. Let T_k = Y Λ Y^T be the eigendecomposition of T_k. The minimum of ||A P_k − P_k D||_2 over all n-by-k orthogonal matrices P_k with span(P_k) = span(Q_k), and over all diagonal matrices D, is also ||T_{uk}||_2, and is attained by P_k = Q_k Y and D = Λ.

PROOF. If we replace Q_k with Q_k U in the above proof, where U is another orthogonal matrix, then the columns of Q_k and Q_k U span the same space, and
   ||A Q_k − Q_k R||_2 = ||A Q_k U − Q_k R U||_2 = ||A (Q_k U) − (Q_k U)(U^T R U)||_2.
These quantities are still minimized when R = T_k, and by choosing U = Y the matrix U^T T_k U is diagonal.  ∎

The Lanczos Algorithm

1. The Lanczos algorithm combines the Lanczos process for building a Krylov subspace with the Rayleigh-Ritz procedure for finding a few eigenpairs of a symmetric matrix A. First let us recall that the Lanczos process generates an orthonormal basis of a Krylov subspace
   K_k(A, v) := span{v, Av, ..., A^{k-1}v} = span{q_1, q_2, ..., q_k}
and yields the fundamental relation
   A Q_k = Q_k T_k + f_k e_k^T,  f_k = β_k q_{k+1},    (4)
where T_k = Q_k^T A Q_k = tridiag(β_j, α_j, β_{j+1}). Let μ be an eigenvalue of T_k and y a corresponding eigenvector, i.e., T_k y = μ y, ||y||_2 = 1. Applying y to the right of (4) gives
   A (Q_k y) = μ (Q_k y) + f_k (e_k^T y).
The μ are Ritz values and the Q_k y are Ritz vectors.

2. Convergence. If f_k e_k^T y = 0 for some k, then the associated Ritz value μ is an eigenvalue of A, with corresponding eigenvector Q_k y. In general it is unlikely that f_k e_k^T y = 0 exactly, but we hope that the residual norm ||f_k (e_k^T y)||_2 is small; when this happens, we expect μ to be a good approximation to an eigenvalue of A. Indeed, we have:

Lemma 1. Let H be real symmetric, Hz − μz = r, and z ≠ 0. Then
   min_{λ ∈ Λ(H)} |λ − μ| ≤ ||r||_2 / ||z||_2.

PROOF. Let H = U Λ U^T be the eigendecomposition of H. Then Hz − μz = r yields (H − μI)z = r, i.e., U(Λ − μI)U^T z = r, and so (Λ − μI)(U^T z) = U^T r. Notice that Λ − μI is diagonal. Thus
   ||z||_2 = ||U^T z||_2 = ||(Λ − μI)^{-1} U^T r||_2 ≤ ||r||_2 / min_i |λ_i − μ|,
that is, min_i |λ_i − μ| · ||z||_2 ≤ ||r||_2, as expected.  ∎

The following corollary is a consequence of Lemma 1.

Corollary 2. There is an eigenvalue λ of A such that
   |λ − μ| ≤ ||f_k (e_k^T y)||_2 = |β_k| |e_k^T y|.

3. In summary, we have the following Lanczos algorithm in its simplest form.

LANCZOS ALGORITHM (for finding eigenvalues and eigenvectors of A = A^T)
1.  q_1 = v / ||v||_2, β_0 = 0, q_0 = 0
2.  for j = 1 to k do
3.    w = A q_j
4.    α_j = q_j^T w
5.    w = w − α_j q_j − β_{j-1} q_{j-1}
6.    β_j = ||w||_2
7.    if β_j = 0, quit
8.    q_{j+1} = w / β_j
9.    compute eigenvalues and eigenvectors of T_j
10.   test for convergence
11. end do

4. Caveat. All the discussion in this lecture is under the assumption of exact arithmetic. In the presence of finite precision arithmetic, the numerical behavior of the Lanczos algorithm can be significantly different. For example, in finite precision arithmetic the orthogonality of the computed Lanczos vectors q_j is lost when j is as small as 10 or 20. The simplest remedy, and also the most expensive one, is to implement full reorthogonalization; namely, after step 5, do
   w = w − Σ_{i=1}^{j-1} (q_i^T w) q_i.
This is called the Lanczos algorithm with full reorthogonalization; sometimes this step may need to be executed twice. A more elaborate scheme, necessary when convergence is slow and several eigenvalues are sought, is to use selective orthogonalization.

5. Example. We illustrate the Lanczos algorithm by running an example: a 1000-by-1000 diagonal matrix A, most of whose eigenvalues were chosen randomly from a normal (Gaussian) distribution. To make the plot easy to understand, we have also sorted the diagonal entries of A from largest to smallest, so λ_i = a_ii, with corresponding eigenvector e_i. There are a few extreme eigenvalues, and the rest cluster near the center of the spectrum. The starting Lanczos vector v has all equal entries. There is no loss of generality in experimenting with a diagonal matrix, since running the Lanczos algorithm on A with starting vector q_1 = v/||v||_2 is equivalent to running the Lanczos algorithm on Q^T A Q with starting vector Q^T q_1.
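The Lanczos loop (steps 1-8 above) can be sketched in pure Python; as a toy check we verify, on a small symmetric matrix of our own choosing, that the computed Lanczos vectors are orthonormal to roundoff (for a 4-by-4 matrix the loss of orthogonality discussed in the caveat is not yet visible):

```python
def lanczos(A, v, k):
    """k steps of the Lanczos process: returns the Lanczos vectors q_1..q_k
    and the tridiagonal entries alpha (diagonal) and beta (off-diagonal)."""
    n = len(A)
    matvec = lambda x: [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    dot = lambda u, w: sum(ui * wi for ui, wi in zip(u, w))
    nrm = lambda u: dot(u, u) ** 0.5
    q_prev, b = [0.0] * n, 0.0
    q = [vi / nrm(v) for vi in v]
    Q, alpha, beta = [q], [], []
    for _ in range(k):
        w = matvec(q)
        a = dot(q, w)
        w = [wi - a * qi - b * pi for wi, qi, pi in zip(w, q, q_prev)]
        alpha.append(a)
        nb = nrm(w)
        if len(alpha) == k or nb < 1e-14:   # done, or breakdown (beta_j = 0)
            break
        beta.append(nb)
        q_prev, q = q, [wi / nb for wi in w]
        Q.append(q)
        b = nb
    return Q, alpha, beta

A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
Q, alpha, beta = lanczos(A, [1.0, 1.0, 1.0, 1.0], 4)
# The Lanczos vectors must be orthonormal (to roundoff).
for i in range(len(Q)):
    for j in range(len(Q)):
        qij = sum(Q[i][t] * Q[j][t] for t in range(4))
        assert abs(qij - (1.0 if i == j else 0.0)) < 1e-10
```

Since a full run (k = n) makes T_k orthogonally similar to A, the sum of the alphas equals trace(A), which gives another cheap sanity check.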
algorithm for computing the eigenvalues of A. In this figure, the eigenvalues of each T_k are plotted in column k, for k = 1, 2, 3, ..., 30, with the eigenvalues of A plotted in an extra column at the far right. Column k has k markers, one marking each eigenvalue of T_k.

[Figure: eigenvalues of T_k plotted against the iteration number k.]

We observe that:

o Extreme eigenvalues, i.e. the largest and smallest ones, converge first, and the interior eigenvalues converge last.

o Convergence is monotonic, with the ith largest (smallest) eigenvalue of T_k increasing (decreasing) to the ith largest (smallest) eigenvalue of A, provided that the Lanczos algorithm does not stop prematurely with some β_k = 0.

An excellent reference for studying these observations in theory is the book by B. N. Parlett, The Symmetric Eigenvalue Problem, reprinted by SIAM, 1998.

The Arnoldi Algorithm

1. As we know, the simplest eigenvalue problem is to compute just the largest eigenvalue in absolute value, along with its eigenvector. The power method is the simplest algorithm suitable for this task. Starting with a given x_0, k iterations of the power method produce a sequence of vectors x_1, x_2, ..., x_k. It is easy to see that these vectors span a Krylov subspace:

  span{x_0, x_1, x_2, ..., x_k} = K_{k+1}(A, x_0) = span{x_0, A x_0, A^2 x_0, ..., A^k x_0}.

Now, rather than taking x_k as our approximate eigenvector, it is natural to ask for the best approximate eigenvector in K_{k+1}(A, x_0), using the Rayleigh-Ritz procedure. We will see that the best eigenvector and eigenvalue approximations from K_{k+1}(A, x_0) are much better than x_k alone.

2. The Arnoldi algorithm for finding a few eigenpairs of a general matrix A combines the Arnoldi process for building a Krylov subspace with the Rayleigh-Ritz procedure. First let us recall that the Arnoldi process generates an orthonormal basis of a Krylov subspace

  K_k(A, v) := span{v, A v, ..., A^{k-1} v} = span{q_1, q_2, ..., q_k}

and yields the fundamental relation

  A Q_k = Q_k H_k + h_{k+1,k} q_{k+1} e_k^T.   (5)

Any decomposition of the form (5), where H_k is Hessenberg, Q_k^T Q_k = I, and Q_k^T q_{k+1} = 0, is called an Arnoldi decomposition of length k. If H_k is unreduced and h_{k+1,k} != 0, the decomposition is
uniquely determined by the starting vector v. This is commonly called the implicit Q theorem. Since Q_k^T q_{k+1} = 0, we have H_k = Q_k^T A Q_k. Thus H_k is called the generalized Rayleigh quotient corresponding to Q_k. Let μ be an eigenvalue of H_k and y a corresponding eigenvector, i.e.

  H_k y = μ y,   ||y||_2 = 1.

Then the corresponding Ritz pair is (μ, Q_k y). Applying y to the right of (5), the residual vector of (μ, Q_k y) is given by

  A Q_k y − μ Q_k y = h_{k+1,k} q_{k+1} (e_k^T y).

Using the backward error interpretation, we know that (μ, Q_k y) is an exact eigenpair of A + E, where ||E||_2 = |h_{k+1,k}| |e_k^T y|. This gives us a criterion for accepting the Ritz pair (μ, Q_k y) as an approximate eigenpair of A.(1)

(1) Note that because of the nonsymmetry of A, we generally do not have the nice forward error estimate discussed for the Lanczos algorithm for the symmetric eigenproblem. But a similar error bound, involving the condition number of the corresponding eigenvalue, exists.

ARNOLDI'S METHOD

  1. Choose a starting vector v.
  2. Generate the Arnoldi decomposition of length k by the Arnoldi process.
  3. Compute the Ritz pairs and decide which are acceptable.
  4. If necessary, increase k and repeat.

3. Example. We illustrate the above simplest Arnoldi algorithm by running it on a 100-by-100 random sparse matrix A with approximately 1000 normally distributed nonzero entries: A = sprandn(100,100,0.1). All entries of the starting vector v are 1. The following figure illustrates the typical convergence behavior of the Arnoldi algorithm for computing eigenvalues. In the figure, the "+" are the eigenvalues of the matrix A computed by eig(full(A)), and the "o" are the eigenvalues of the upper Hessenberg matrix H_30 computed by eig(H30).

[Figure: eigenvalues of A ("+") and of H_30 ("o") plotted in the complex plane; horizontal axis real part, vertical axis imaginary part.]

We observe that exterior eigenvalues converge first. This is the typical convergence phenomenon of the Arnoldi algorithm (in fact, of all Krylov subspace based methods). There is a general theory for the
convergence analysis of the Arnoldi algorithm.

The Arnoldi algorithm has two nice aspects:

(a) The matrix H_k is already in Hessenberg form, so we can immediately apply the QR algorithm to find its eigenvalues.

(b) After we increase k, say to k + p, we only have to orthogonalize p more vectors to compute the (k+p)th Arnoldi decomposition. The work we have done previously is not thrown away.

Unfortunately, the algorithm has its drawbacks:

o If A is large, we cannot increase k indefinitely, since Q_k requires n x k memory locations to store.

o We have little control over which eigenpairs the algorithm finds. In a given application, we will be interested in a certain set of eigenpairs, for example eigenvalues lying near the imaginary axis. There is nothing in the algorithm to force desired eigenvectors into the subspace or to discard undesired ones.

ECS231 Spectral Partitioning May 28 2009

Definition of graph partitioning. Given a graph G = (N, E, W_N, W_E):
  N = nodes (or vertices), E = edges,
  W_N = node weights, W_E = edge weights.
Ex: N = {tasks}, W_N = task costs; edge (j,k) in E means task j sends W_E(j,k) words to task k.

Choose a partition N = N_1 U N_2 U ... U N_p such that:
  o the sum of the node weights in each N_j is about the same;
  o the sum of the weights of all edges connecting different pairs N_j and N_k is minimized.
Ex: balance the work load while minimizing communication.
Special case N = N_1 U N_2: graph bisection.

Applications:
  o Telephone network design (the original application; algorithm due to Kernighan).
  o Load balancing while minimizing communication.
  o VLSI layout: N = units on chip, E = wires, W_E(j,k) = wire length.
  o Data mining and clustering.
  o Physical mapping of DNA.
  o Sparse matrix-vector multiplication: N = {1, 2, ..., n}; (j,k) in E if A(j,k) is nonzero; W_N(j) = number of nonzeros in row j; W_E(j,k) = 1.
  o Sparse Gaussian elimination: used to reorder rows and columns to increase parallelism and decrease fill-in.

[Figure: partitioning for sparse matrix-vector multiplication of a sparse symmetric matrix.]

Cost of graph partitioning. There are many possible partitionings to search: (n choose n/2) ~ sqrt(2/(n pi)) 2^n possible bisections.
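The count (n choose n/2) ~ sqrt(2/(n pi)) 2^n can be checked numerically. Below is a small illustrative sketch in Python (the function names are my own, not from the notes), comparing the exact binomial coefficient with the Stirling-type estimate:

```python
import math

def exact_bisections(n):
    # Exact number of ways to choose n/2 of the n nodes: C(n, n/2).
    return math.comb(n, n // 2)

def stirling_estimate(n):
    # Stirling-type approximation: C(n, n/2) ~ sqrt(2/(n*pi)) * 2**n.
    return math.sqrt(2.0 / (n * math.pi)) * 2.0 ** n

for n in (10, 20, 40):
    print(n, exact_bisections(n), round(stirling_estimate(n)))
```

The ratio of the estimate to the exact count approaches 1 as n grows; already for n = 40 the count exceeds 10^11, which is why exhaustive search over bisections is hopeless and heuristics are needed.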
Choosing the optimal partitioning is NP-complete; the only known exact algorithms have exponential cost. We need good heuristics.

First heuristic: repeated graph bisection. To partition N into 2^k parts, bisect the graph recursively k times. Henceforth we discuss mostly graph bisection.

[Figure: a sample spectral partitioning of a mesh, with the cut edges highlighted.]

Coordinate-free partitioning. Several techniques for partitioning:
  o Breadth-first search: simple, but not great partitions.
  o Kernighan-Lin: a good corrector, given a reasonable starting partition.
  o Spectral method: good partitions, but may be slow.
Multilevel methods are used to speed up problems that are too large or slow: coarsen, partition, expand, improve. They can be used with Kernighan-Lin, spectral, and other methods.
Speed/quality tradeoff: for load balancing of grids, multilevel Kernighan-Lin is probably best; for other partitioning problems (vision, clustering, etc.), spectral may be better. Good software is available.

Coordinate-free spectral bisection. Based on the theory of Fiedler (1970s), popularized by Pothen, Simon, and Liou (1990). Outline: basic definitions; motivation; implementation via the Lanczos algorithm.

Basic definitions.

Definition. The incidence matrix In(G) of a graph G = (N, E) is an |N|-by-|E| matrix, with one row for each node and one column for each edge. If edge e = (i,j), then column e of In(G) is zero except for the ith and jth entries, which are +1 and -1, respectively. (The definition is slightly ambiguous, because multiplying column e of In(G) by -1 still satisfies it, but this won't matter.)

Definition. The Laplacian matrix L(G) of a graph G = (N, E) is an |N|-by-|N| symmetric matrix, with one row and column for each node. It is defined by
  L(G)(i,i) = degree of node i (number of incident edges),
  L(G)(i,j) = -1 if i != j and there is an edge (i,j),
  L(G)(i,j) = 0 otherwise.

[Figure: an example graph with nodes numbered in black and edges numbered in blue, together with its incidence matrix In(G) and Laplacian matrix L(G).]

Properties of incidence and Laplacian
matrices:

1. In(G) In(G)^T = L(G). This is independent of the signs chosen for each column of In(G).
2. L(G) is symmetric. (This means the eigenvalues of L(G) are real, and its eigenvectors are real and orthogonal.)
3. Let e = (1, ..., 1)^T, i.e. the column vector of all ones. Then L(G) e = 0.
4. Suppose L(G) v = λ v, so that v is an eigenvector and λ an eigenvalue of L(G). Then

  λ = ||In(G)^T v||_2^2 / ||v||_2^2 = Σ_{edges e=(i,j)} (v_i − v_j)^2 / Σ_i v_i^2.

5. The eigenvalues of L(G) are nonnegative: 0 = λ_1 <= λ_2 <= ... <= λ_n.
6. The number of connected components of G is equal to the number of λ_i equal to 0. In particular, λ_2 != 0 if and only if G is connected.

Definition. λ_2(L(G)) is called the algebraic connectivity of G.

Spectral bisection algorithm:
  o Compute the eigenvector v_2 corresponding to λ_2(L(G)).
  o For each node j of G: if v_2(j) < 0, put node j in partition N−; else put node j in partition N+.

Why does this make sense? Fiedler's theorems:

1. Let G be connected, with N− and N+ defined as above. Then N− is connected. If no v_2(j) = 0, then N+ is also connected.
2. Let G_1 = (N, E_1) be a subgraph of G = (N, E), so that G_1 is "less connected" than G. Then λ_2(L(G_1)) <= λ_2(L(G)), i.e. the algebraic connectivity of G_1 is less than or equal to the algebraic connectivity of G.

Motivation for spectral bisection. A vibrating string has modes of vibration, or harmonics. The modes are computable as follows:
  o Model the string as masses connected by springs (a 1D mesh).
  o Write down F = ma for the coupled system; this gives a matrix A.
  o The eigenvalues and eigenvectors of A are the frequencies and shapes of the modes.
Label the nodes by whether the mode is − or + there to get N− and N+.

[Figure: modes of a vibrating string — lowest frequency λ_1, second frequency λ_2, third frequency λ_3.]

Computing v_2 and λ_2 of L(G) using Lanczos. Given any n-by-n symmetric matrix A (such as L(G)), Lanczos computes a k-by-k tridiagonal approximation T by doing k matrix-vector products, k << n:

  choose an arbitrary starting vector r
  b(0) = ||r||
  j = 0
  repeat
    j = j + 1
    q(j) = r / b(j−1)          ... scale a vector
    r = A q(j)                 ... matrix-vector multiplication, the most expensive step
    r = r − b(j−1) q(j−1)      ... saxpy (scalar*vector + vector)
    a(j) = q(j)^T r            ... dot product
    r = r − a(j) q(j)          ... saxpy
    b(j) = ||r||               ... compute vector norm
  until convergence, details
omitted.

The result is the k-by-k symmetric tridiagonal matrix

  T = [ a(1) b(1)
        b(1) a(2) b(2)
             b(2) a(3) b(3)
                  ...  ...    ...
                    b(k−2) a(k−1) b(k−1)
                           b(k−1) a(k)   ].

Approximate A's eigenvalues and eigenvectors using T's.

Spectral bisection summary:
  o The Laplacian matrix represents the graph connectivity.
  o The second eigenvector gives a graph bisection: roughly equal weights in the two parts, and a weak connection in the graph will be the separator.
  o Implementation is via the Lanczos algorithm.
Note the circularity: to optimize sparse-matrix-vector multiply, we graph partition; to graph partition, we find an eigenvector of a matrix associated with the graph; to find an eigenvector, we do sparse-matrix-vector multiplies. Have we made progress? The first matrix-vector multiplies are slow, but we use them to learn how to make the rest faster.

First references:
  M. Fiedler, Algebraic connectivity of graphs, Czech. Math. J., 23:298–305, 1973.
  M. Fiedler, Czech. Math. J., 25:619–637, 1975.
  A. Pothen, H. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11:430–452, 1990.
  J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
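To tie the pieces of spectral bisection together, here is a toy end-to-end sketch in plain Python. It is illustrative only: the example graph (two triangles joined by one "weak" edge), the helper names, and the use of projected power iteration on c*I − L(G) in place of Lanczos are all my own choices, not the notes' implementation.

```python
import math

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single
# weak edge (2,3); this edge list is an illustrative choice.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

def laplacian(n, edges):
    # L(G)(i,i) = degree of node i; L(G)(i,j) = -1 for each edge (i,j).
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def matvec(M, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in M]

def fiedler_vector(L, iters=2000):
    # Approximate v2 by power iteration on c*I - L, projecting out the
    # all-ones eigenvector e each step; the notes would use Lanczos here.
    m = len(L)
    c = 2.0 * max(L[i][i] for i in range(m))   # shift with c >= lambda_max(L)
    x = [math.sin(i + 1.0) for i in range(m)]  # arbitrary start, not parallel to e
    for _ in range(iters):
        mean = sum(x) / m
        x = [xi - mean for xi in x]            # project out e = (1,...,1)^T
        Lx = matvec(L, x)
        x = [c * xi - yi for xi, yi in zip(x, Lx)]   # x <- (c*I - L) x
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x

L = laplacian(n, edges)
v2 = fiedler_vector(L)
n_minus = sorted(j for j in range(n) if v2[j] < 0)
n_plus = sorted(j for j in range(n) if v2[j] >= 0)
print(n_minus, n_plus)   # the two triangles land in different parts
```

On this graph, partitioning by the sign of v_2 separates the two triangles and cuts only the single weak edge (2,3), which is exactly the behavior Fiedler's theory leads us to expect.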