# Honors Linear Algebra MATH H110

This 79 page Class Notes was uploaded by Kavon Feest on Thursday October 22, 2015. The Class Notes belongs to MATH H110 at University of California - Berkeley taught by Staff in Fall. Since its upload, it has received 21 views. For similar materials see /class/226607/math-h110-university-of-california-berkeley in Mathematics (M) at University of California - Berkeley.


Date Created: 10/22/15

Math H110, December 14, 2000. Notes by Prof. W. Kahan.

## Geometry of Elementary Operations and Subspaces

(A continuation of the notes titled "Geometry of Elementary Operations.")

### Matrices Represent Linear Operators

Let **L** be a linear operator that maps a space of vectors **x** = B·x to a space of vectors **y** = E·y, with their respective bases B and E. Here x is a column vector that represents **x** in the basis B, as y represents **y** in E. Now, what represents **L**? Matrix L = E⁻¹·**L**·B represents **L** with bases B and E, because **y** = **L**·**x** just when y = E⁻¹·**y** = E⁻¹·**L**·B·x = L·x.

When bases change, say from B to B′ = B·C and from E to E′ = E·G, the matrices C and G must be square and invertible; as we have seen, such matrices figure in changes of coordinates. Representatives change thus:

x = B⁻¹·**x** and x′ = B′⁻¹·**x** = C⁻¹·B⁻¹·**x** = C⁻¹·x represent **x**;
y = E⁻¹·**y** and y′ = E′⁻¹·**y** = G⁻¹·E⁻¹·**y** = G⁻¹·y represent **y**;
L = E⁻¹·**L**·B and L′ = E′⁻¹·**L**·B′ = G⁻¹·E⁻¹·**L**·B·C = G⁻¹·L·C represent **L**.

Evidently y = L·x just when y′ = L′·x′. What works for vectors works also for linear functionals; these are just linear maps to a 1-dimensional space whose basis need not change:

u^T = **u**^T·B and u′^T = **u**^T·B′ = u^T·C represent **u**^T;
v^T = **v**^T·E and v′^T = **v**^T·E′ = v^T·G represent **v**^T;

so u^T·x = u′^T·x′ = **u**^T·**x** and v^T·y = v′^T·y′ = **v**^T·**y**. Now **u**^T = **v**^T·**L** just when u^T = v^T·L, and just when u′^T = v′^T·L′. (Confirm the last seven equations!)

In all cases the misnamed "coordinate-free" algebra (it should be called "coordinate-independent") is the same; only the arithmetic changes when bases change. To simplify arithmetic is the principal motivation to change bases. Do not try to memorize where to put C or G. (Or is it C⁻¹ or G⁻¹? Left or right of L?) You can work out these questions' answers more reliably by remembering that a basis is an invertible linear map from a space of column vectors to another vector space, and that a change of basis post-multiplies the basis by an invertible matrix.

Here is an example to show how changes of bases can simplify arithmetic. A given arbitrary (possibly rectangular) matrix L represents some linear map **L** from one vector space to another if apt bases are used in those spaces.
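The bookkeeping above is easy to check numerically. Here is a small sketch (the random matrices and all variable names are mine, not the notes'): with x′ = C⁻¹·x, y′ = G⁻¹·y and L′ = G⁻¹·L·C, the equation y = L·x holds exactly when y′ = L′·x′.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 2x3 matrix L represents a linear map in some pair of bases.
L = rng.standard_normal((2, 3))
x = rng.standard_normal(3)
y = L @ x

# Invertible change-of-basis matrices: C for the domain, G for the target.
# (Random square Gaussian matrices are invertible with probability 1.)
C = rng.standard_normal((3, 3))
G = rng.standard_normal((2, 2))

# New representatives after the changes of basis B -> BC and E -> EG.
x_new = np.linalg.solve(C, x)        # x' = C^-1 x
y_new = np.linalg.solve(G, y)        # y' = G^-1 y
L_new = np.linalg.solve(G, L @ C)    # L' = G^-1 L C

# y = Lx in the old coordinates exactly when y' = L'x' in the new ones.
assert np.allclose(L_new @ x_new, y_new)
```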
Let G⁻¹ be any product of elementary row operations that puts L into reduced row-echelon form G⁻¹·L, and let C be any product of elementary column operations that puts G⁻¹·L into its reduced column-echelon form Ľ = G⁻¹·L·C. As we have seen in the notes "The Reduced Row-Echelon Form is Unique," this last reduced echelon form Ľ is determined uniquely by the starting matrix L, regardless of how the elementary operations get to Ľ. But Ľ must now consist of an identity matrix in its upper left corner and zeros in all later rows and columns, if there are any. Since elementary row operations leave row-rank unchanged, and similarly for column-rank, we see that these ranks are the same, namely the dimension p of that identity matrix.

Now reinterpret G and C as matrices of two changes of basis, one basis in each of the vector spaces connected by whatever linear operator **L** is represented by L, since Ľ = G⁻¹·L·C maps the first p changed basis vectors in one vector space to the first p changed basis vectors in the other.

Thus we conclude that every matrix L = G·Ľ·C⁻¹ can be factored into a product whose two outer factors are invertible and whose third (inner) factor Ľ is an identity matrix with perhaps some rows and/or columns of zeros appended to make Ľ have the same dimensions as L; the rank p of L is the dimension of that identity matrix. For any such fixed Ľ, the family of matrices L = G·Ľ·C⁻¹ generated as G and C sweep through all invertible matrices of appropriate dimensions is a family of *Equivalent* matrices; they all represent the same abstract linear operator **L**, but in different coordinate systems. They all have the same rank p, which we might as well define to be Rank(**L**). What else have they in common?

### Dimensions that are Invariants of Equivalent Matrices

All that Equivalent matrices have in common are their dimensions and their rank p.
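The equivalence claim can be sketched numerically. The SVD and `matrix_rank` below merely stand in for the elementary-operation bookkeeping in the text; the example matrices are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 4x5 matrix of rank p = 2, built as a product of thin factors.
L = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))
p = np.linalg.matrix_rank(L)

# Any G^-1 L C with invertible G and C is Equivalent to L and shares its rank.
G = rng.standard_normal((4, 4))
C = rng.standard_normal((5, 5))
assert np.linalg.matrix_rank(np.linalg.solve(G, L) @ C) == p

# The canonical Equivalent matrix: an identity block of size p, zeros elsewhere.
# Here G2 and Vt.T, built from the SVD, play the roles of G and C.
U, s, Vt = np.linalg.svd(L)
G2 = U @ np.diag(np.concatenate([s[:p], np.ones(4 - p)]))   # invertible
canonical = np.zeros_like(L)
canonical[:p, :p] = np.eye(p)
assert np.allclose(np.linalg.solve(G2, L) @ Vt.T, canonical, atol=1e-8)
```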
These numbers are the dimensions of three important vector spaces associated with that abstract linear operator **L**. Let us name them. One space is Domain(**L**), the space of vectors upon which **L** operates. Another is the "Range space" (an ambiguous phrase best not used) or Target-space(**L**), into or onto which **L** throws its results: if **y** = **L**·**x**, then **x** must come from Domain(**L**) and **y** from Target-space(**L**). As **x** runs through all of Domain(**L**), **y** = **L**·**x** sweeps out a third vector space, Range(**L**). (Why is it a vector space?) Range(**L**) need not fill Target-space(**L**) but may be a proper subspace; this is why "Target-space" is a better phrase than "Range-space" when they must be distinguished from "Range."

Let L be any of the Equivalent matrices that represent **L**; then

Dimension(Domain(**L**)) = Count-of-Columns(L),
Dimension(Target-space(**L**)) = Count-of-Rows(L), and
Dimension(Range(**L**)) = Rank(**L**) = Rank(L) = p.

The last equation comes from the Equivalent matrix Ľ, which tells us that Range(**L**) has a basis with p vectors. This basis cannot be chosen uniquely even though Range(**L**) is fully determined by **L**. This basis of Range(**L**) is the image of the first p basis vectors in some basis of Domain(**L**); those first p basis vectors span a subspace of Domain(**L**) that need not be determined uniquely by **L** if its rank p is less than the dimension of its domain. (To think otherwise is a mistake made by many students.) Indeed, adding to each of the first p basis vectors any linear combination of subsequent basis vectors (from Nullspace(**L**)) yields a new basis whose first p vectors, now spanning another subspace of Domain(**L**), are mapped by **L** upon the same basis of Range(**L**).

The subspace of Domain(**L**) determined uniquely by **L** is its Kernel or Nullspace, consisting of all vectors **z** that satisfy **L**·**z** = **o**. (Why is it a vector space?) Looking at Ľ tells us that

Nullity(L) := Dimension(Nullspace(L)) = Count-of-Columns(L) − Rank(L)

for any matrix L that represents **L**. In other words, and this is IMPORTANT:

Rank(**L**) + Nullity(**L**) = Dimension(Domain(**L**)).

Every linear operator **L** operates in two directions. We have just looked at one: **L** maps vectors in its domain to vectors in its range.
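The IMPORTANT identity can be checked numerically. This sketch extracts a nullspace basis from the SVD, a tool used here only for convenience and not the notes' method; the matrices are my own examples.

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))   # maps R^5 -> R^3

rank = np.linalg.matrix_rank(L)

# Nullspace basis: right singular vectors belonging to (numerically) zero
# singular values.
_, s, Vt = np.linalg.svd(L)
r = np.count_nonzero(s > 1e-10)
N = Vt[r:].T                      # columns span Nullspace(L)
nullity = N.shape[1]

assert np.allclose(L @ N, 0, atol=1e-10)      # L annihilates its nullspace
assert rank + nullity == L.shape[1]           # Rank + Nullity = dim Domain = 5
```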
Now look at **L** another way: it maps linear functionals **v**^T that act (linearly) upon Target-space(**L**) to linear functionals **u**^T = **v**^T·**L** acting upon Domain(**L**). The space of linear functionals **v**^T dual to Target-space(**L**) is called Codomain(**L**); as **v**^T runs through all of it, **v**^T·**L** sweeps out a subspace of the space dual to Domain(**L**). This subspace can be called Corange(**L**). The subspace of Codomain(**L**) swept out by the solutions **w**^T of **w**^T·**L** = **o**^T is Cokernel(**L**). You may have seen some of these spaces before in texts where **L** = L is a matrix that maps one space of column vectors to another; then Range(L) is the column-space of L, and its row-space is Corange(L).

Exercise 0: The foregoing plethora of subspaces, and of names for them, sounds more complicated than it is; describe all eight of them when **L** = L = […]. The eight subspaces are: Target-space, Range, Nullspace, Domain, Codomain, Cokernel, Corange, and the dual of the Domain.

Exercise 1: Explain why the rank of a matrix product cannot exceed the rank of any factor.

Exercise 2: Every linear operator **L** can be written as a sum **L** = **c**₁·**r**₁^T + **c**₂·**r**₂^T + … + **c**_k·**r**_k^T of *dyads* **c**_j·**r**_j^T (linear operators of rank 1) in infinitely many ways; here each **c**_j is drawn from Target-space(**L**) and each **r**_j^T from the dual of Domain(**L**). Show that Rank(**L**) is the minimum possible number k of terms in the sum, and exhibit a way to achieve this minimum. This minimum, called *Tensor-Rank*, is an alternative way to define Rank(**L**); it can be generalized from linear operators to multilinear operators, but nobody knows how to compute Tensor-Rank for multilinear operators.

Exercise 3: The linear operator whose matrix is […] maps a plane to a line in the plane; why? The linear operator whose matrix is […] maps a 3-space to a line in the plane; why? Similarly describe the effects of the operators whose matrices are A = […], B = […], C = […], D = […] and E = […].
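For the matrix case **L** = L, all four of the subspaces just named (Range, Cokernel, Corange, Nullspace) can be read off one SVD. A hedged sketch with a matrix of my own choosing; the SVD is a convenience here, not the notes' method:

```python
import numpy as np

rng = np.random.default_rng(4)
L = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # rank 2, R^6 -> R^4
p = np.linalg.matrix_rank(L)

# Orthonormal bases for the four subspaces, read off one full SVD.
U, s, Vt = np.linalg.svd(L)
range_L     = U[:, :p]       # column space   = Range(L)
cokernel_L  = U[:, p:]       # left nullspace = Cokernel(L)
corange_L   = Vt[:p].T       # row space      = Corange(L)
nullspace_L = Vt[p:].T       # Nullspace(L)

assert np.allclose(cokernel_L.T @ L, 0, atol=1e-10)   # w^T L = o^T
assert np.allclose(L @ nullspace_L, 0, atol=1e-10)    # L z = o
assert range_L.shape[1] + cokernel_L.shape[1] == L.shape[0]
assert corange_L.shape[1] + nullspace_L.shape[1] == L.shape[1]
```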
The next four exercises were supplied by Prof. B. N. Parlett and A. Hernandez.

Exercise 4: An operator **L** is represented by a 3-by-3 matrix L. The set of all solutions **p** of **L**·**p** = **o** sweeps out a plane P through the origin **o**. Vectors **b** = **L**·**u** and **c** = **L**·**v** are nonzero. Describe the set X of all solutions **x** of **L**·**x** = **b**, and the set Y of all solutions **y** of **L**·**y** = **b** + **c**. What is Dimension(Range(**L**))?

In the next three exercises, V is some vector space of high dimension, and **b**, **c**, **d**, **x**, **y**, **z** are nonzero vectors in it.

Exercise 5: W is a subset of V containing **b** + **c** and 3**c** + 5**d** but not **d** nor **e** − 2**d**; determine whether W can possibly be a subspace of V. Give your reasoning. Do likewise for U, a subset of V containing 5**c** and 3**d** − 2**b** but not **b** nor **d**.

Exercise 6: Q is spanned by {**c**, **x**, **y**, **z**}; here **x** and **y** are linearly independent and span a subspace U that also contains **y** + 3**z** but not **y** + 2**c**. What is Dimension(Q), and why?

Exercise 7: E and F are subspaces of V; F is spanned by {**b**, **e**, **f**}, and {**c**, **d**, **f**} is a basis for E, which also contains **b**. However, the spans of {**c**, **d**} and of {**b**, **f**} intersect only in the zero vector. Explain whether {**b**, **e**, **f**} is a basis for F.

### Intersections, Sums and Annihilators of Subspaces

Let E and F be proper subspaces of a vector space **B**; "proper" means neither {**o**} nor all of **B**. Bases E for E and F for F consist of spanning rows of linearly independent vectors drawn from **B**, but not necessarily from a given basis B of **B**. Still, some bases must be related: E = B·E and F = B·F for rectangular matrices E and F with as many columns as the dimensions of subspaces E and F respectively, and as many rows as the dimension of **B**. (Why must the columns of E be linearly independent, and likewise those of F, but maybe not those of [E F]?)

Let the sum of subspaces E and F be denoted by E + F; it consists of all sums **e** + **f** of vectors **e** drawn from E and **f** drawn from F. Note that E + F is a vector space. (Why?) Don't confuse the sum with the union E ∪ F of the two subspaces, which consists of all vectors that belong to at least one of E and F; it need not be a vector space at all. (Can you see why, by providing a suitable example?) Let the intersection of subspaces E and F be denoted by E ∩ F.
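One suitable example of a union failing to be a subspace, in R² with the two coordinate axes (my choice of example):

```python
import numpy as np

# In R^2, let E = span{e1} and F = span{e2}.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def in_E(v):
    return np.isclose(v[1], 0)      # membership in span{e1}

def in_F(v):
    return np.isclose(v[0], 0)      # membership in span{e2}

# e1 and e2 both lie in the union E ∪ F ...
assert (in_E(e1) or in_F(e1)) and (in_E(e2) or in_F(e2))
# ... but their sum does not, so the union is not closed under addition.
assert not (in_E(e1 + e2) or in_F(e1 + e2))
# The sum E + F, by contrast, is all of R^2 and certainly contains e1 + e2.
```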
It consists of all vectors that belong both to E and to F, and is a subspace of **B** too. (Why?) It may be just {**o**}.

Given E and F, can we compute a basis for E ∩ F? It's a bit tricky. First assemble the matrix [E F] and reduce it to its echelon form G⁻¹·[E F]·C by pre- and post-multiplication by invertible square matrices G⁻¹ and C. Next partition C = [H J; K L] conformably to obtain G⁻¹·(E·H + F·K) = [I; O] and G⁻¹·(E·J + F·L) = O. Then E·H + F·K is a basis for E + F, and E·J = −F·L is a basis for E ∩ F unless it is O, in which case J and L are empty matrices. But the assertions in the last sentence are unobvious; can you prove them?

Here are proofs of those assertions. First, E·J + F·L = O because G is invertible, so E·J = −F·L; each column of E·J = −F·L represents a vector lying in both E and F. If z satisfies J·z = o, then it also satisfies o = (E·J)·z = −(F·L)·z = −F·(L·z), and then L·z = o too because the columns of F are linearly independent; and since the last columns of matrix C, namely [J; L], are independent (else C wouldn't be invertible), z = o too. Therefore the columns of J, and similarly those of L, must be linearly independent if not empty; so E·J = −F·L is a basis for a nonzero subspace of E ∩ F, if not all of it. To show that none of it is left out, we must solve the equation E·J·x = E·u for x whenever E·u = −F·v lies in E ∩ F. The solution can be expressed in terms of a conformable partition of C⁻¹ = [M N; P Q]; here C⁻¹·C = I = C·C⁻¹ implies that M·H + N·K, P·J + Q·L, H·M + J·P and K·N + L·Q are identity matrices, and M·J + N·L, P·H + Q·K, H·N + J·Q and K·M + L·P are zero matrices, of perhaps diverse sizes. Now a little algebra (do it!) suffices to confirm that x = P·u + Q·v is the desired solution; so E·J = −F·L does span all of E ∩ F.

E + F is next. If (E·H + F·K)·z = o, then o = [I O]·G⁻¹·(E·H + F·K)·z = z too, so that E·H + F·K must be a basis for a subspace of E + F, if not all of it. To show none of it is left out, we need only solve the equation (E·H + F·K)·y = E·s + F·t for y. The solution is y = M·s + N·t. (Do the algebra to confirm this formula for y!)

The formulas for x and y above are fragile numerically, easily broken by rounding errors. Robust formulas are more subtle.
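The echelon-form computation above can be imitated numerically. The sketch below obtains the same bases a slightly different way, from the nullspace of [E −F]: if [E −F]·(z; w) = o then E·z = F·w lies in E ∩ F, and every vector of E ∩ F arises so. The helper `nullspace` and the random subspaces are my own, SVD-based conveniences.

```python
import numpy as np

def nullspace(A, tol=1e-10):
    """Orthonormal columns spanning the nullspace of A (via the SVD)."""
    _, s, Vt = np.linalg.svd(A)
    r = np.count_nonzero(s > tol)
    return Vt[r:].T

rng = np.random.default_rng(3)
E = rng.standard_normal((5, 3))      # columns: a basis of a 3-dim E in R^5
F = rng.standard_normal((5, 3))      # columns: a basis of a 3-dim F in R^5

# E + F is the column space of [E F].
dim_sum = np.linalg.matrix_rank(np.hstack([E, F]))

# If [E -F](z; w) = o then Ez = Fw lies in E ∩ F.
ZW = nullspace(np.hstack([E, -F]))
intersection = E @ ZW[:3]            # one column per nullspace vector
dim_int = np.linalg.matrix_rank(intersection)

assert dim_sum == 5                  # generic 3-planes in R^5 span everything
assert dim_int == 1                  # ... and meet in a line
assert dim_int + dim_sum == 3 + 3    # dim(E∩F) + dim(E+F) = dim E + dim F
```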
A byproduct of the proof, obtained by counting the columns of the echelon form, is the formula

Dimension(E ∩ F) + Dimension(E + F) = Dimension(E) + Dimension(F).

This IMPORTANT formula deserves a proof simpler than the computation above; can you find a simpler proof? Here it is. Let D be any basis for E ∩ F. If not already a basis for E, D can be augmented to form a basis [D Ė] for E; likewise [D Ḟ] forms a basis for F. Certainly E + F is spanned by the elements of [D Ė Ḟ]; it is a basis too if its elements are linearly independent. Suppose D·d + Ė·e + Ḟ·f = **o**. This says that Ḟ·f = −D·d − Ė·e lies in F and in E, so it lies in E ∩ F; therefore Ḟ·f = D·ď for some ď. Then D·ď − Ḟ·f = **o** is the zero vector expressed in the basis [D Ḟ] of F, so ď = o and f = o (perhaps of different dimensions). Now D·d + Ė·e = **o** implies d = o and e = o similarly. Therefore the elements of [D Ė Ḟ] really are linearly independent; they do form a basis of E + F. Counting elements confirms the IMPORTANT formula above.

Exercise 8: If the dimension of a vector space is less than the sum of the dimensions of two of its subspaces, can their intersection be just {**o**}? Justify your answer.

Exercise 9: Two proper subspaces of a vector space are *Complementary* just when their sum is the whole space and their intersection is {**o**}. Can either determine the other uniquely? Why?

The *Annihilator* of a subspace E is the set of all linear functionals **w**^T that satisfy **w**^T·**e** = 0 for every **e** in E, and is denoted by E^⊥. This annihilator is a subspace of the dual space, determined uniquely by E. The notation E^⊥ is a relic from Euclidean spaces, which are their own duals; that is why E^⊥ is often called the "orthogonal complement" of E even if it is a subspace of a non-Euclidean space. This terminology can mislead: only in Euclidean spaces are E and E^⊥ complementary. "Annihilator" is unmistakable.

A good way to think about subspaces is in terms of their bases. Given a basis B for a vector space, think of E = B·E, for some rectangular matrix E with linearly independent columns, as the basis for a proper subspace E = Range(E) whose annihilator is E^⊥ = Cokernel(E).
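A numerical sketch of the annihilator as Cokernel(E), i.e. the nullspace of E^T, again via the SVD (my tool of convenience, with a subspace of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 2
E = rng.standard_normal((n, k))   # basis (columns) of a 2-dim subspace E of R^6

# Annihilator E⊥: functionals w^T with w^T e = 0 for every e in E,
# i.e. the nullspace of E^T (the cokernel of E).
_, s, Vt = np.linalg.svd(E.T)
W = Vt[np.count_nonzero(s > 1e-10):]     # rows: a basis of E⊥

assert np.allclose(W @ E, 0, atol=1e-10)     # each w^T annihilates all of E
assert k + W.shape[0] == n                   # dim E + dim E⊥ = dim B
```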
This latter subspace has a basis too, consisting of linear combinations of the rows of B⁻¹.

Exercise 10: Confirm the important relations Dimension(E) + Dimension(E^⊥) = Dimension(**B**) and E^⊥⊥ = E by augmenting E to get a basis Ē = B·[E Ê], and partitioning its inverse conformably to get a basis for E^⊥.

Exercise 11: Cite E^⊥⊥ = E for a quick proof of Fredholm's alternative #1 in the notes "The Reduced Row-Echelon Form is Unique." It works for some infinite-dimensional spaces.

Exercise 12: Prove that (E + F)^⊥ = E^⊥ ∩ F^⊥. If E ∩ F = {**o**}, must E^⊥ ∩ F^⊥ ≠ {**o**^T} too? Prove that the answer is surely "Yes" if Dimension(E) + Dimension(F) − Dimension(E ∩ F) is less than the dimension of the whole space, but otherwise surely "No."

### Warning about Duals of Duals of Infinite-Dimensional Spaces

The proof techniques used above are based upon finite-dimensional matrix multiplication, but most of the definitions and inferences make sense for infinite-dimensional spaces too. There is one important exception that would go unnoticed because our notation takes it for granted: "A vector space is the dual of its dual." This assertion, obviously true for all finite-dimensional spaces, is false for many infinite-dimensional spaces. It is false for the space of continuous functions on a closed domain, for the space of all absolutely convergent series, and for the space of all infinite sequences with at most finitely many nonzero terms; each of these three spaces is properly contained in the dual of its dual. For infinite-dimensional spaces, Linear Algebra has to be rebuilt from the ground up in a graduate course that takes convergence into account; that lies beyond the syllabus of this course, except for occasional warnings like this one.

## Topics for Math H110: Linear Algebra and Matrix Theory, Fall Semester 2000

Lectures: Tues. and Thurs. 8–9:30 am in 61 Evans. Discussion section: Wed. 8–9 am in 81 Evans. Matlab will be used for computational examples. For notes see http://www.cs.berkeley.edu/~wkahan/MathH110.

- Vector spaces: abstract linear
spaces, subspaces, dimension, basis; dual spaces, inner (scalar) product, outer product (dyad); cross-product in Euclidean 3-space.
- Abstract linear maps (transformations): domain, codomain (target space), kernel (nullspace), range; sums and products of linear maps, inverses; representation by matrices dependent upon bases; change of basis, canonical bases anticipating later developments.
- Elementary row and column reductions to canonical forms: row echelon form, column echelon form, diagonal form; rank, equality of row rank and column rank, nullity; triangular factorizations and variants of Gaussian elimination; Fredholm's alternatives.
- Determinants: determinant as a ratio of volumes, obtainable from triangular factors; determinantal expansions, Cramer's rule, Jacobi's formula for the derivative.
- Convexity: convex body as convex hull of points and as intersection of half-spaces; support planes, separating planes; linear programming, the Simplex algorithm.
- Normed linear spaces: vector norms, triangle inequality, convergence, completeness, compactness; dual norms, operator (matrix) norms, projections; nearness to singularity, norm of inverse, ill-conditioned linear systems.
- Euclidean and unitary spaces: orthogonal maps, transpose of a matrix; Gram-Schmidt orthogonalization, positive definite matrices, Cholesky factorization; least squares, linearly constrained least squares.
- Eigenvalues and eigenvectors: triangularization by similarity, block triangularization; characteristic polynomial, Cayley-Hamilton theorem; Jordan's normal form, irreducible invariant subspaces, continuity and derivatives of eigenvalues; real symmetric matrices, variational derivation of eigenvalues; singular value decomposition.
- Applications, as time permits: positive matrices, Perron-Frobenius theory, stochastic matrices; linear differential equations, matrix exponentials.

Text: The best text would be *Linear Algebra* by P. D. Lax (1997, Wiley), but for the fact that it is a graduate-level text more likely to be appreciated after this course than before it. And it has no drill problems. Better to buy any text that
covers most of the topics and is cheap enough to throw away afterwards, then buy Lax's text for a reference. (Prof. W. Kahan)

Math H110, December 7, 2000. Notes by Prof. W. Kahan.

## Jordan's Normal Form

Our objective is to demonstrate that for any given complex n-by-n matrix B there exists at least one invertible matrix C that transforms B by similarity into a diagonal sum

C⁻¹·B·C = Diag( β₁·I₁ + J₁, β₂·I₂ + J₂, β₃·I₃ + J₃, …, β_L·I_L + J_L )

of *Jordan Blocks*, each of the form β·I + J, where β is an eigenvalue of B and J is obtained from the identity matrix I either by deleting its first row and appending a last row of zeros, or equivalently by deleting its last column and prepending a first column of zeros. For example, here is a 4-by-4 Jordan Block:

β·I + J = [β 1 0 0; 0 β 1 0; 0 0 β 1; 0 0 0 β].

Such a block has one repeated eigenvalue and only one eigenvector, regardless of its dimension. Every eigenvalue β of B appears in at least one Jordan Block; these blocks can appear in any order, and their various dimensions add up to the dimension n of B. We'll see that B determines its Jordan blocks completely, except for the order in which they appear. Since every matrix Z⁻¹·B·Z similar to B has the same blocks, they tell us all that can be known about the geometrical effect of a linear operator whose matrix in an unknown coordinate system is B. For instance, they show how B decomposes the vector space into an irreducible sum of nested invariant subspaces, as will be explained later.

An important application of Jordan's Normal Form is the extension of the definitions of scalar functions f(β) of a scalar argument β to define matrices f(B). However, we shall find that f(B) is easier to find from a "Pennants" form of B or from a triangular Schur form. Jordan's canonical form under similarity is hard to discover because it can be a discontinuous function of its data B. For example, no matter how tiny the nonzero number µ may be, Jordan's Normal Form of B = [β 1 0 0; 0 β 1 0; 0 0 β 1; µ 0 0 β] must be
diagonal, with four 1-by-1 Jordan blocks. (Do you see why?) And do you see why Jordan's Normal Form of B = [β 1 0 0; 0 β µ 0; 0 0 β 1; 0 0 0 β] is the same for all µ ≠ 0? Irreducible invariant subspaces are not determined uniquely if µ = 0.

Discovering the Jordan blocks takes several steps, each intended to simplify the problem. The first step identifies the eigenvalues β of B as the zeros (generally complex numbers) of its *characteristic polynomial*

det(λ·I − B) = λⁿ − Trace(B)·λⁿ⁻¹ + … + (−1)ⁿ·det(B) = Π_k (λ − β_k).

### The Cayley-Hamilton Theorem

Every square matrix satisfies its own characteristic equation; i.e., f(B) = O when f(λ) = det(λ·I − B) = Σ_{0≤j≤n} f_j·λ^j is the characteristic polynomial of B. This theorem is stated with an incorrect proof, or none, in many texts on linear algebra, which is reason enough to present a correct proof here. Let the *classical adjoint* (or *adjugate*) of λ·I − B be A(λ) = Adj(λ·I − B). It is known to satisfy A(λ)·(λ·I − B) = (λ·I − B)·A(λ) = f(λ)·I. At first sight we could replace the scalar λ by the matrix B in the last equation to get f(B) = (B·I − B)·A(B) = O, which is what the theorem claims. But this is not a proof! How do we know that a matrix identity valid for all scalar values of a variable λ remains valid after λ is replaced by a matrix? It is not so in general: there exist matrix-valued polynomials P(λ) and Q(λ) with P(λ)·Q(λ) = O for all scalars λ and yet P(R)·Q(R) = P ≠ O for a suitable matrix R, and similar examples fail with the factors in the other order. Such counterexamples reveal a flaw in the simple-minded substitution of B for λ above; a correct proof must be more complicated.

Each element of the adjugate A(λ) is a polynomial in λ of degree at most n−1, so A(λ) must have the form A(λ) = Σ_{0≤j<n} A_j·λ^j, in which every coefficient A_j is an n-by-n matrix. In fact every A_j is a polynomial in B, computable from the identity A(λ)·(λ·I − B) = (λ·I − B)·A(λ) = f(λ)·I, i.e. (λ·I − B)·Σ_{0≤j<n} A_j·λ^j = Σ_{0≤j≤n} f_j·λ^j·I, by matching the coefficients of successive powers of λ. Begin with the coefficient of λⁿ: A_{n−1} = f_n·I = I.
Then for λⁿ⁻¹, find that A_{n−2} − B·A_{n−1} = f_{n−1}·I, so A_{n−2} = f_{n−1}·I + B. And in general A_{j−1} = f_j·I + B·A_j for j = n, n−1, …, 2, 1, 0 in turn, starting from A_n = O and ending at A_{−1} = O to meet the end-conditions in the sums. This confirms by reverse induction that every A_j is a polynomial in B with coefficients drawn from the numbers f_j, and therefore B·A_j = A_j·B just as λ·A_j = A_j·λ, justifying the simple-minded substitution. Alternatively, observe that O = A_{−1} = f₀·I + B·( f₁·I + B·( f₂·I + … + B·( f_{n−1}·I + B )…)) = f(B), which is what the Cayley-Hamilton theorem claims. End of proof.

### Triangular Forms Similar to B

Two other forms are almost as useful as Jordan's and far easier to exhibit. First is Schur's decomposition B = Q·U·Q*, in which Q* = Q⁻¹ and U is upper-triangular with the eigenvalues of B on its diagonal. This unitary similarity has many uses and is relatively easy to compute with fair accuracy: the computed Q·U·Q* is almost exactly B. Its existence will be demonstrated below. The second form, to which every square matrix B can be reduced by complex similarity, is a diagonal sum of triangular matrices, each of which has only one eigenvalue, distinct from the eigenvalue of every other triangle in that sum. Though still a continuous function of B, this similarity is more difficult to compute than Schur's, as we shall see later.

Schur's triangularization will be shown to exist through a process of *deflation*: as each eigenvalue of B is chosen, its eigenvector is used to reduce by 1 the dimension of the matrix from which the next eigenvalue of B will be chosen. Here is how deflation works. Choose any eigenvalue β₁ of B and find an eigenvector v₁ as a nonzero solution of the singular homogeneous linear system (β₁·I − B)·v₁ = o. Then embed v₁ in a new basis V = [v₁ v₂ …] of the vector space as its first basis vector; the linear operator represented by B has, in the new basis, the matrix V⁻¹·B·V = [β₁ b^T; o B̌], because B·v₁ = β₁·v₁, so V⁻¹·B·v₁ is β₁ times the first column of the identity. Here B̌ is a square matrix whose dimension is 1 less than B's. The eigenvalues of B are β₁ and the eigenvalues of B̌, because
det(λ·I − B) = det(λ·I − V⁻¹·B·V) = (λ − β₁)·det(λ·I − B̌). What was just done to B can now be done to B̌: choose any eigenvalue β₂ of B̌ (and hence of B), solve B̌·v̌₂ = β₂·v̌₂ for a nonzero eigenvector v̌₂ of B̌ (not of B), and form a new basis V̌ = [v̌₂ v̌₃ …] for the space upon which B̌ (not B) acts; the first column of V̌⁻¹·B̌·V̌ is [β₂; o]. Set W = [1 o^T; o V̌] to find that (V·W)⁻¹·B·(V·W) is upper-triangular in its first two columns, with β₁ and β₂ leading its diagonal. Repeating the process ultimately delivers an upper-triangular U = Q⁻¹·B·Q with its eigenvalues on its diagonal, in the order in which they were chosen as eigenvalues of B.

Exercise: Use this U to deduce the Cayley-Hamilton Theorem from the factorization det(λ·I − B) = Π_j (λ − β_j). Because the theorem's proof given earlier required no knowledge of eigenvalues, that proof works also for a scalar field, like the rational field, whose matrices B may lack eigenvalues because the field is not algebraically closed.

Schur's triangularization is a special case of deflation performed by *unitary* similarities. The given matrix B is regarded as a linear operator that maps a unitary space to itself; the space is endowed with a length ‖v‖ = √(v*·v), the root-sum-of-squares of the magnitudes of the elements of vector v. Only orthonormal bases are used for this space, and every change from one such basis to another is represented by a *unitary* matrix whose inverse equals its complex-conjugate transpose. When eigenvector v₁ is found, it is divided by its length to normalize it so that ‖v₁‖ = 1, and then it is embedded in an orthonormal basis V = [v₁ v₂ v₃ …] so that V* = V⁻¹. There are many ways to construct such a V. One computes the subsequent columns v₂, v₃, … by applying Gram-Schmidt orthogonalization to the columns of [v₁ I] and discarding a resulting column of zeros. Another computes the elementary orthogonal reflector V = V* = V⁻¹ = I − 2·u·u*/(u*·u) that swaps v₁ with the first column of the identity I. Likewise for V̌, so W above is unitary too, and so is the product Q of unitary matrices. Thus we obtain Schur's triangularization U = Q*·B·Q with Q* = Q⁻¹.
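Schur's triangularization is available in scipy. The sketch below (with a random matrix of my own choosing) checks B = Q·U·Q*, then uses U's diagonal to verify that the factored characteristic polynomial annihilates B, in the spirit of the exercise above:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))

# Complex Schur form: B = Q U Q* with unitary Q and upper-triangular U.
U, Q = schur(B, output='complex')
assert np.allclose(Q @ U @ Q.conj().T, B)         # Q U Q* reproduces B
assert np.allclose(Q.conj().T @ Q, np.eye(5))     # Q* = Q^-1
assert np.allclose(U, np.triu(U))                 # U is upper-triangular

# The diagonal of U carries the eigenvalues; the factored characteristic
# polynomial then annihilates B (Cayley-Hamilton).
f_of_B = np.eye(5, dtype=complex)
for beta in np.diag(U):
    f_of_B = f_of_B @ (B - beta * np.eye(5))
assert np.allclose(f_of_B, 0, atol=1e-6)
```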
Below we'll need a special Schur triangularization in which repeated eigenvalues of B appear in adjacent locations on the diagonal of U. If β₁ is a repeated eigenvalue of B, it is also an eigenvalue of B̌ and can be chosen for β₂ above. Thus can the needed ordering of B's eigenvalues be imposed upon all the diagonal elements of upper triangle U, at least in principle.

Exercise: Use Schur's triangularization to deduce that every Hermitian matrix A = A* is unitarily similar to a real diagonal matrix, so the eigenvectors of A can be chosen to form a complex orthonormal basis.

A *real* Schur decomposition B = Q·U·Q^T of a real square matrix B exists in which Q^T = Q⁻¹ is real orthogonal and U is real block-upper-triangular with 1-by-1 and 2-by-2 blocks on its diagonal; each 1-by-1 block is a real eigenvalue of B, and each 2-by-2 block is a real matrix with two complex-conjugate eigenvalues of B. The existence is proved by choosing, for any complex eigenvalue β₁, a pair of orthogonal vectors that span an invariant subspace of B belonging to this complex eigenvalue and its conjugate, and taking the pair as the first two vectors of a new basis for the space. This change of basis deflates the eigenproblem of B to a real matrix B̌ of dimension 2 less than B's. The deflation and subsequent real block-triangularization are otherwise very much like the foregoing complex triangularization.

Exercise: Use Schur's real triangularization to deduce that every real symmetric matrix A = A^T is orthogonally similar to a real diagonal matrix, so the eigenvectors of A can be chosen to form a real orthonormal basis.

Next we shall show how Schur's U = Q*·B·Q can be reduced by further similarities (changes of basis) to a diagonal sum of *Pennants*, each an upper-triangular matrix with one eigenvalue of B repeated as often as the triangle's dimension. First we need a Lemma: Suppose square matrices F and P, of perhaps different dimensions, have no eigenvalue in common.† Define a linear operator ℒ(Z) := Z·F − P·Z mapping the vector space of
matrices Z of appropriate dimensions to itself; then ℒ(Z) = O only when Z = O, so this linear operator ℒ is invertible.

Proof: Suppose ℒ(Z) = O. Then P·Z = Z·F, hence P²·Z = P·Z·F = Z·F², and similarly P^k·Z = Z·F^k for k = 0, 1, 2, 3, …. Now consider the characteristic polynomial of F, namely Φ(λ) = det(λ·I − F) = Π_{1≤j≤n} (λ − φ_j), where φ₁, φ₂, …, φ_n are all the eigenvalues, perhaps not all distinct, of F. The Cayley-Hamilton theorem says Φ(F) = O; but all the factors of Φ(P) = Π_{1≤j≤n} (P − φ_j·I) are nonsingular, so Φ(P) is too. Expand Φ in powers of its argument to see, term by term, that Φ(P)·Z = Z·Φ(F) = O; so Z = O, as claimed. End of proof.

Therefore the equation ℒ(X) = X·F − P·X = Y can be solved for X = ℒ⁻¹(Y) in terms of F, P and Y of the right dimensions so long as F and P have no eigenvalue in common,† i.e. so long as GCD( det(λ·I − F), det(λ·I − P) ) = 1. Entirely rational closed-form formulas for X do exist. Let Y_k := P^{k−1}·Y + P^{k−2}·Y·F + … + P·Y·F^{k−2} + Y·F^{k−1} for every integer k ≥ 0; evidently Y₀ = O, Y₁ = Y, and Y_k = X·F^k − P^k·X. For each k substitute Y_k for λ^k in Φ(λ) to get Σ_k φ̂_k·Y_k = X·Φ(F) − Φ(P)·X = −Φ(P)·X, the φ̂_k being the coefficients of Φ, whereupon X = −Φ(P)⁻¹·Σ_k φ̂_k·Y_k. This is a rational formula requiring no knowledge of eigenvalues, but rarely useful for numerical computation. Another way begins by reducing F and P by similarity, each to one of the upper-triangular forms above, say U = Q⁻¹·F·Q and R = D⁻¹·P·D. Then solve Z·U − R·Z = D⁻¹·Y·Q for Z element by element, starting in its lower left corner and working up to the right by diagonals. Finally set X = D·Z·Q⁻¹.

† Sometimes the phrase "disjoint spectra" is said instead of "no eigenvalue in common." Physicists and chemists first said "spectrum" for the set of eigenvalues of a linear operator that Quantum Mechanics associates with an atom or molecule radiating light in colors characterized by its spectrum; now mathematicians say it too.

### Block-Diagonalizing a Block-Triangular Matrix

The Lemma above helps us construct a triangular similarity that block-diagonalizes a given block-triangular matrix: [P Y; O F] is similar to [I X; O I]⁻¹·[P Y; O F]·[I X; O I] = [P O; O F] provided the matrix X satisfies X·F − P·X = Y.
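The equation X·F − P·X = Y is a Sylvester equation. scipy's solver handles A·X + X·B = Q, so take A = −P and B = F. A sketch, with small matrices of my own choosing, that also confirms the block-diagonalizing similarity just stated:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(7)
P = np.diag([1.0, 2.0, 3.0])                          # spectrum {1, 2, 3}
F = np.diag([-1.0, -2.0])                             # spectrum {-1, -2}
F += np.triu(rng.standard_normal((2, 2)), 1)          # ... still upper-triangular
Y = rng.standard_normal((3, 2))

# Disjoint spectra, so X F - P X = Y has exactly one solution X.
X = solve_sylvester(-P, F, Y)        # scipy solves A X + X B = Q
assert np.allclose(X @ F - P @ X, Y)

# The block-triangular [P Y; O F] is then similar to Diag(P, F).
T = np.block([[np.eye(3), X], [np.zeros((2, 3)), np.eye(2)]])
M = np.block([[P, Y], [np.zeros((2, 3)), F]])
D = np.block([[P, np.zeros((3, 2))], [np.zeros((2, 3)), F]])
assert np.allclose(np.linalg.solve(T, M) @ T, D)
```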
Such an X exists when the square matrices P and F have disjoint spectra (no common eigenvalue). Repeat this process upon P and F so long as they too are block-triangular and have on their diagonals square blocks whose spectra are disjoint; such configurations come from the special Schur triangularizations mentioned above. Therefore, after any square matrix B has been triangularized by a similarity Q⁻¹·B·Q = U in such a way that equal eigenvalues of B are consecutive on the diagonal of U, further similarities can reduce the triangle U ultimately to a diagonal sum of Pennants like this:

(Q·K)⁻¹·B·(Q·K) = K⁻¹·U·K = Diag( β₁·I₁ + N₁, β₂·I₂ + N₂, …, β_k·I_k + N_k );

here β₁, β₂, β₃, …, β_k are the distinct eigenvalues of B. Each pennant β·I + N is a triangle with only one eigenvalue on its diagonal, like this 4-by-4 example:

β·I + N = [β × × ×; 0 β × ×; 0 0 β ×; 0 0 0 β].

Each such pennant is copied from corresponding elements of U. Each strictly upper-triangular N has zeros on its diagonal and is therefore *nilpotent*, meaning Nᵐ = O for some positive integer m ≤ dim(N). The upper-triangular matrix K is a pennant too, with 1's on its diagonal and nonzero super-diagonal blocks only where K⁻¹·U·K puts zero blocks above the diagonal.

The foregoing diagonal sum of pennants can be shown to be a continuous function of B in the following sense: as the elements of B all vary continuously, but not too far, the elements of Q·K and of each pennant in the diagonal sum also vary continuously; however, the eigenvalues on each pennant's diagonal can cease to be all equal, though different pennants' spectra remain disjoint. Eigenvalues can vary abruptly: an eigenvalue of multiplicity m can split into m eigenvalues that spread apart as fast as the m-th root of the perturbations in B, as in the first example above with its tiny perturbation µ. Worse, the elements of Q·K or (Q·K)⁻¹ can be so gargantuan that roundoff committed during the pennants' numerical computation gets amplified enough to obliterate some of the data in B.
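The m-th-root splitting is easy to witness: perturb a 4-by-4 Jordan block by µ in its corner, as in the first example above, and the quadruple eigenvalue splits by µ^(1/4) (here µ = 10⁻⁴, so the split is 10⁻¹; the numbers are my own choices):

```python
import numpy as np

beta, mu = 3.0, 1e-4
B = beta * np.eye(4) + np.diag(np.ones(3), 1)   # a 4x4 Jordan block
B[3, 0] = mu                                    # the tiny corner perturbation

lam = np.linalg.eigvals(B)
# The characteristic polynomial is (lambda - beta)^4 = mu, so the quadruple
# eigenvalue splits into 4 distinct ones at distance mu**(1/4) = 0.1 from beta.
assert np.allclose(np.abs(lam - beta), mu ** 0.25, rtol=1e-2)
assert len(set(np.round(lam, 4))) == 4          # four distinct eigenvalues
```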
obliteration by locating tight clusters of B's eigenvalues, choosing one cluster per approximated pennant in such a way that the elements of QK and (QK)⁻¹ never become intolerably big. This is a tough task. Several years ago Prof. Ming Gu proved a conjecture of Prof. J.W. Demmel (they are both here at UCB) to the effect that attempts to avoid obliterating the data must occasionally consume a lot of time trying to choose suitable clusters. Specifically, for all dimensions sufficiently large there are rare matrices B for which choosing clusters consumes time that grows like an exponential function of B's dimension, though the time required to compute all approximate pennants would grow like the cube of the dimension (comparable to several matrix multiplications or inversions) if a good choice of clusters were known in advance. Therefore our discussion of pennants and of Jordan's Normal Form is entirely theoretical, not a recipe for infallible numerical computation with arithmetic operations rounded to finite precision.

Functions of Matrices and Sylvester's Interpolation Formula

The diagonal sum of pennants helps us understand the extension of a scalar function f(λ) of a scalar argument λ to matrix arguments. We have already seen how a polynomial f can be extended to a matrix argument; it happened in the Cayley-Hamilton theorem. Analogously exp(λ) = Σ_{n≥0} λⁿ/n! can be extended to define exp(B) := Σ_{n≥0} Bⁿ/n! for all square matrices B, since the infinite series converges absolutely and ultimately very quickly.

Exercise: The economist J.M. Keynes said "Ultimately we are all dead." Roughly how many terms of the series for exp(−1000) must be added up until every subsequent term is tinier than the sum of the series? How much bigger than its sum is the biggest term in the series? One way to answer these questions is to use a computer to generate those terms and the value of exp(−1000). A better way uses Stirling's asymptotic approximation n! ≈ √(2πn)·exp( n·log(n) −
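The Keynes exercise can be explored without ever forming the huge terms, by comparing logarithms; math.lgamma supplies log(n!) much as Stirling's approximation would. The search ranges below are my own illustrative guesses, not values from the notes.

```python
import math

x = 1000.0
log10 = math.log(10.0)

def log10_term(n):
    # log10 of |(-x)^n / n!|, the magnitude of the n-th series term for exp(-1000)
    return (n * math.log(x) - math.lgamma(n + 1)) / log10

log10_sum = -x / log10                        # exp(-1000) = 10**(-434.29...)
peak_n = max(range(1, 3000), key=log10_term)  # biggest term occurs near n = 1000
# first index after which the terms have fallen below the sum itself
first_n = next(n for n in range(1000, 6000) if log10_term(n) < log10_sum)
print(peak_n, log10_term(peak_n), log10_sum, first_n)
```

The biggest term is roughly 10^432 while the sum is roughly 10^(-434): more than 860 decimal orders of cancellation, which no fixed-precision arithmetic can survive.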
n + 1/(12n) ) for big n. Answering these questions will help you appreciate why computers don't compute exp just from its series.

Exercise: Why bother to compute exp for matrices? Confirm from the series that, for any scalar variable τ and constant square matrix B, the derivative d exp(τB)/dτ = B·exp(τB) = exp(τB)·B. Then show that the solution y(τ) of dy(τ)/dτ = B·y(τ) + c(τ) is the vector function y(τ) = exp(τB)·y(0) + ∫₀^τ exp((τ−σ)B)·c(σ) dσ.

Exercise: We say that matrix Y is a square root of X when Y·Y = X. One of the 2-by-2 nilpotent matrices [0 1; 0 0] and [0 0; 0 0] has square roots and the other doesn't; say which and why. Explain why every n-by-n Hermitian positive definite X = Xᴴ has at least 2ⁿ square roots Y, and if more than 2ⁿ then infinitely many. Not every such Y is a polynomial in X.

If an extension of f to square matrix arguments exists, it is expected to have certain properties, among them f(B)·B = B·f(B), f(C⁻¹BC) = C⁻¹·f(B)·C for all invertible C, and f( Diag(B₁, B₂) ) = Diag( f(B₁), f(B₂) ) whenever f(Bⱼ) is defined for both square submatrices Bⱼ; these equations are certainly satisfied when f is a polynomial or an absolutely convergent power series, as you should verify. P.A.M. Dirac summed up these expectations in one line: Z·f(B) = f(B)·Z for all matrices Z that satisfy Z·B = B·Z.

These expectations are met by Sylvester's Interpolation Formula, which expresses f(B) as a polynomial Φ(B) with coefficients that depend upon B and f, as follows. Suppose polynomials Φ(λ) and Ψ(λ) have these properties: Ψ(B) = O, and f(λ) = Φ(λ) + Ψ(λ)·μ(λ) for a function μ(λ) sufficiently differentiable at all zeros of Ψ. Then Sylvester's Interpolation Formula defines f(B) := Φ(B).

How does this work? First Ψ has to be chosen, and then it has to be used to determine Φ. One candidate is Ψ(λ) := det(λI − B), because the Cayley-Hamilton theorem says this Ψ(B) = O; but a better one may be the monic polynomial Ψ(λ) of minimum degree that satisfies Ψ(B) = O. (Here a monic polynomial of degree d has 1 as the coefficient of λᵈ.) This Ψ(λ) of minimum degree divides evenly into every polynomial p(λ) satisfying p(B) = O, as would p(λ) := det(λI − B), because p(λ) divided by Ψ(λ) yields a quotient polynomial q(λ) and a remainder
polynomial r(λ) = p(λ) − Ψ(λ)·q(λ) of degree less than Ψ(λ)'s; but then r(B) = O, and so r(λ) ≡ 0, since no nonzero polynomial r of degree less than Ψ's can satisfy r(B) = O.

This Ψ, called the Minimum Polynomial of B, is determined uniquely by B. Exercise: Explain why. Then use the pennants form of B to show that its every eigenvalue β is a zero of Ψ with the same multiplicity m as the maximum index m at which Nᵐ⁻¹ ≠ O for the pennant βI + N. Use Jordan's Normal Form to show that the distinct zeros (if any) of the polynomial det(λI−B)/Ψ(λ) are those distinct eigenvalues of B each with two or more eigenvectors, if they exist; Sylvester would call such a B Derogatory.

For Sylvester's formula to work, Ψ does not have to be the minimum polynomial of B; any polynomial multiple of that minimum polynomial is acceptable. After Ψ has been chosen it determines uniquely a polynomial Φ of minimum degree less than Ψ's that Interpolates (matches) f, and perhaps its first few derivatives f′, f″, …, if they all have finite values at every zero β of Ψ, in the following sense: if β is a zero of Ψ of multiplicity m, which means Ψ(β) = Ψ′(β) = Ψ″(β) = … = Ψ⁽ᵐ⁻¹⁾(β) = 0 ≠ Ψ⁽ᵐ⁾(β), then Φ(β) = f(β), Φ′(β) = f′(β), Φ″(β) = f″(β), …, and Φ⁽ᵐ⁻¹⁾(β) = f⁽ᵐ⁻¹⁾(β).

Why must Φ satisfy these equations? How do they determine Φ? Why uniquely? Φ must satisfy the last m equations at every m-tuple zero β of Ψ because f(λ) = Φ(λ) + Ψ(λ)·μ(λ) was assumed; it implies that the Taylor series expansion of f(λ) − Φ(λ) in powers of λ − β begins at (λ−β)ᵐ. If the aggregate of those equations for all zeros of Ψ determines Φ(λ), it is determined uniquely, because the difference between two such polynomials is a polynomial of degree less than Ψ's and yet having all the zeros of Ψ with at least the same multiplicities; this difference would have degree no less than Ψ's if it did not vanish. Φ is now determined by the aggregate of those equations because of their linearity in the desired coefficients of Φ: the number of linear equations is the same as
the number of coefficients, namely the degree of Ψ; and the equations' solution is unique if it exists, so it exists, and Sylvester's f(B) := Φ(B).

Exercise: Prove that if all zeros βⱼ of Ψ are distinct, Φ(λ) = Σⱼ f(βⱼ)·Ψ(λ)/( (λ−βⱼ)·Ψ′(βⱼ) ); this is Lagrange's Interpolating Polynomial. Repeated zeros of Ψ lead to a different Φ, named after Hermite.

In short, when f and enough of its derivatives take finite values on the spectrum of B, so that Sylvester's Interpolation Formula f(B) = Φ(B) does exist, it provides a polynomial that defines f(B) but does not provide an always good recipe for computing it. Analogous situations arise for functions like exp, sin, cos, …, which are defined well by infinite series or by differential equations, both of which are impractical ways to compute those functions numerically at big arguments; recall the Exercise above about exp(−1000). For a matrix B of big dimension n, the n−1 matrix multiplications required for an explicit computation of the polynomial Φ(B) would generally consume time roughly proportional to n⁴, far greater than the time roughly proportional to n³ that will be needed to compute f(B) from its pennants form when it can be computed, which is almost always. Worse, the coefficients of a polynomial Φ can be huge compared with its value, so the explicit computation of Φ(B) can turn into a mess of rounding errors left behind after cancellation.

For instance, consider the Tchebyshev polynomials Tₖ(λ) := cos(k·arccos(λ)). They are polynomials because they satisfy the recurrence T₀(λ) = 1, T₁(λ) = λ, and Tₖ₊₁(λ) = 2λ·Tₖ(λ) − Tₖ₋₁(λ) for k = 1, 2, 3, … in turn. Although −1 ≤ Tₖ(λ) ≤ 1 when −1 ≤ λ ≤ 1, the coefficients of Tₖ(λ) grow exponentially with k; confirm for k ≥ 2 that Tₖ(λ) = 2ᵏ⁻¹λᵏ − k·2ᵏ⁻³λᵏ⁻² + … . This formula is a numerically unstable way to compute values of Tₖ(λ) when λ is almost ±1; use the recurrence instead.

There's a faster way than Sylvester's Interpolation Formula to compute f(B) when a pennants form of B like (QK)⁻¹B(QK) above is available: f(B) = QK·f( (QK)⁻¹B(QK) )·(QK)⁻¹. It is derived from Sylvester's polynomial thus: f(B)
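The Tchebyshev instability claimed above is easy to witness. The explicit coefficient formula used here is the standard expansion of cos(k·arccos(λ)), cited on my own authority rather than the notes'.

```python
import math

def cheb_recurrence(k, x):
    # Stable three-term recurrence T0=1, T1=x, T_{k+1} = 2x*Tk - T_{k-1}.
    t0, t1 = 1.0, x
    for _ in range(k - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1 if k >= 1 else t0

def cheb_from_coefficients(k, x):
    # Sum the explicit power-series form of T_k; its coefficients grow like
    # 2^(k-1), so massive cancellation ruins accuracy near x = +-1.
    # T_k(x) = (k/2) * sum_j (-1)^j (k-j-1)!/(j!(k-2j)!) * (2x)^(k-2j)
    s = 0.0
    for j in range(k // 2 + 1):
        c = (-1) ** j * math.factorial(k - j - 1) / (
            math.factorial(j) * math.factorial(k - 2 * j))
        s += c * (2.0 * x) ** (k - 2 * j)
    return (k / 2.0) * s
```

At k = 60 and x = 0.999 the recurrence agrees with cos(60·arccos(0.999)) to about a dozen digits, while the coefficient sum, whose individual terms exceed 10^17, is wrong even in its leading digits.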
= Φ(B) = QK·Φ( (QK)⁻¹B(QK) )·(QK)⁻¹, in which the inner factor Φ( (QK)⁻¹B(QK) ) could be obtained from the pennants form by replacing its every pennant βI+N by Φ(βI+N). However Φ need not be computed at all: since Ψ( (QK)⁻¹B(QK) ) = (QK)⁻¹·Ψ(B)·(QK) = O, every pennant's Ψ(βI+N) = O; consequently Φ(βI+N) = f(βI+N), which can be computed faster than Φ(βI+N) can, as follows.

Exercise: Show that f(βI + N) = f(β)·I + f′(β)·N + f″(β)·N²/2! + … + f⁽ᵐ⁻¹⁾(β)·Nᵐ⁻¹/(m−1)! when Nᵐ = O. Use this formula and the pennants form of B to confirm that exp(B + τI) = e^τ·exp(B) for every scalar τ. Warning: exp(B+C) = exp(B)·exp(C) only if BC = CB; otherwise the Campbell-Baker-Hausdorff series must be used.

Fortunately, Sylvester's definition of f(B) := Φ(B) as a polynomial does not compel us to compute the polynomial, which could have high degree and huge coefficients causing loss of accuracy to roundoff. Instead the pennants form permits a more direct computation:

f(B) = QK·Diag( f(β₁I₁+N₁), f(β₂I₂+N₂), f(β₃I₃+N₃), …, f(βₖIₖ+Nₖ) )·(QK)⁻¹

in which each pennant's f(βI+N) = f(β)·I + f′(β)·N + f″(β)·N²/2! + … + f⁽ᵐ⁻¹⁾(β)·Nᵐ⁻¹/(m−1)! when Nᵐ = O. Most often m = 1 or 2.

Actually we needn't compute B's pennants form to compute f(B). The pennants form's purpose is to help us understand f(B), after which we can compute it in better but unobvious ways developed here at UCB by Prof. B.N. Parlett and his students. An outline follows.

Consider first an approximate pennant P ≈ βI + N; here P is upper-triangular with all diagonal elements nearly equal, so P − βI ≈ N is nearly nilpotent. We can't be sure the Taylor series f(P) = f(β)·I + f′(β)·(P−βI) + f″(β)·(P−βI)²/2! + … + f⁽ᵐ⁻¹⁾(β)·(P−βI)ᵐ⁻¹/(m−1)! + … will converge rapidly when P − βI ≈ N is nearly nilpotent, since its elements may be huge. Still, if the dimension m of P is small, a polynomial close to the first m terms of this series can be used in Sylvester's Interpolation Formula to compute f(P) as follows. Let Ψ(λ) := det(λI−P) = (λ−β₁)·(λ−β₂)·…·(λ−βₘ) and express

f(λ) = f(β₁) + Δf(β₁,β₂)·(λ−β₁) + Δ²f(β₁,β₂,β₃)·(λ−β₁)(λ−β₂) + … + Δᵐ⁻¹f(β₁,β₂,…,βₘ)·(λ−β₁)(λ−β₂)…(λ−βₘ₋₁) + Δᵐf(β₁,β₂,…,βₘ,λ)·Ψ(λ)

in terms of its Divided Differences Δᵏf(β₁,β₂,…), using Newton
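A minimal sketch of the pennant formula in the exercise above, assuming f = exp and a pennant whose nilpotent part is given explicitly; the routine name is mine, not the notes'.

```python
import math

def pennant_exp(beta, N, m):
    # exp(beta*I + N) = e^beta * (I + N + N^2/2! + ... + N^(m-1)/(m-1)!)
    # exactly, because N^m = O makes the Taylor series terminate.
    n = len(N)
    F = [[math.exp(beta) if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]   # running power N^k
    for k in range(1, m):
        P = [[sum(P[i][l] * N[l][j] for l in range(n)) for j in range(n)]
             for i in range(n)]                                  # now P = N^k
        for i in range(n):
            for j in range(n):
                F[i][j] += math.exp(beta) * P[i][j] / math.factorial(k)
    return F
```

For the 4-by-4 Jordan nilpotent and beta = 0 this reproduces the familiar entries 1, 1, 1/2!, 1/3! along the successive superdiagonals of exp(J).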
's Divided-Difference Formula; these are explained, usually in a different notation, in texts like Conte & de Boor, Elementary Numerical Analysis, 3rd ed. (1980), ch. 2. Like derivatives, divided differences can be computed from the function f and scalar values of its argument; for instance Δf(β₁,β₂) = ( f(β₁) − f(β₂) )/( β₁ − β₂ ) if β₁ ≠ β₂; otherwise Δf(β,β) = f′(β). Then

f(P) = f(β₁)·I + Δf(β₁,β₂)·(P−β₁I) + Δ²f(β₁,β₂,β₃)·(P−β₁I)(P−β₂I) + … + Δᵐ⁻¹f(β₁,β₂,…,βₘ)·(P−β₁I)(P−β₂I)…(P−βₘ₋₁I)

because Ψ(P) = O. Thus f(P) can be computed quickly from a polynomial for small approximate pennants P.

Next consider that f( [P Y; O F] ) = [f(P) X; O f(F)] has to satisfy [P Y; O F]·f( [P Y; O F] ) = f( [P Y; O F] )·[P Y; O F], so X must satisfy an equation XF − PX = Y·f(F) − f(P)·Y whose solution exists when the spectra of P and F are disjoint, and X can be computed quickly when P and F are upper-triangular (see after the Lemma). Thus Schur's decomposition U = Q⁻¹BQ provides an upper triangle from which f(B) = Q·f(U)·Q⁻¹ can be computed directly, block by block, starting at near-pennant diagonal blocks and working to the upper right, without computing U's pennants form.

So f(B) can be computed without reducing B to a diagonal sum of pennants by similarities. Can we compute f(B) without knowing Schur's triangle U = Q⁻¹BQ nor B's eigenvalues? One might surmise so at first. Certainly f(B) can be computed whenever f(λ) = p(λ)/q(λ) is a Rational function, a ratio of polynomials whose denominator polynomial q has no zero coincident with an eigenvalue of B; otherwise q(B)⁻¹ would not exist (do you see why?). Moreover f(B) = p(B)·q(B)⁻¹ = Φ(B), wherein all the coefficients of Sylvester's polynomial Φ can be determined by finitely many rational arithmetic operations upon the elements of B without first computing its eigenvalues. An inelegant way to do so is suggested by the Exercise: Use the Cayley-Hamilton theorem to express q(B)⁻¹, if it exists, as a polynomial in B. Exercise: Show that, in principle, B's minimum polynomial can be determined by finitely many comparisons and rational arithmetic operations upon B's elements.
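Newton's divided differences for distinct points can be tabulated in a few lines (the confluent case Δf(β,β) = f′(β) would need derivative values and is omitted); these helpers are a sketch of mine, not code from Conte & de Boor.

```python
def divided_differences(f, xs):
    # In-place Newton table; returns [f[x0], f[x0,x1], f[x0,x1,x2], ...].
    # The points xs must be distinct (repeated points would need derivatives).
    d = [f(x) for x in xs]
    for k in range(1, len(xs)):
        for i in range(len(xs) - 1, k - 1, -1):
            d[i] = (d[i] - d[i - 1]) / (xs[i] - xs[i - k])
    return d

def newton_eval(coeffs, xs, t):
    # Evaluate the Newton-form interpolant at t, Horner-style from the right.
    p = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        p = coeffs[k] + (t - xs[k]) * p
    return p
```

Replacing the scalar t by a small triangular matrix P in newton_eval, with xs the diagonal elements of P, is exactly the f(P) formula displayed above.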
Hint: I, B, B², B³, … can't all be linearly independent.

The situation is different for non-rational functions. Any analytic function f can be extended to f(B) provided no eigenvalue of B falls upon a singularity of f. A function f is analytic if its domain in the complex λ-plane consists of a region at every interior point β of which the Taylor series f(β) + f′(β)·(λ−β) + f″(β)·(λ−β)²/2! + … + f⁽ᵐ⁾(β)·(λ−β)ᵐ/m! + … converges to f(λ) for all sufficiently small |λ−β| > 0. The singularities of f are the boundary points of its domain. For example, polynomials and exp and sin are all analytic with no finite singularity; rational functions are analytic with their singularities at their poles, where they take infinite values; cotan is analytic with singularities at all integer multiples of π; analytic functions ln and √ are singular at 0 and along the negative real axis, across which they jump discontinuously.

Every textbook about analytic functions of a complex variable explains Cauchy's Integral Formula

f(β) = ∮_C f(λ)/(λ−β) dλ/(2πı) ,

in which ı = √(−1) and the path of integration runs counterclockwise around a closed curve C lying in the domain of f and surrounding β but no singularity of f; the integral is zero if β lies outside C. This formula can be used to extend f to a matrix argument:

f(B) := ∮_C f(λ)·(λI−B)⁻¹ dλ/(2πı)

provided C surrounds the spectrum of B but no singularity of f. Although no eigenvalue of B appears explicitly in this integral formula, it delivers the same result as did Sylvester's Interpolation Formula. To see why, let Ψ be either B's minimum polynomial or its characteristic polynomial, and consider G(λ) := ( Ψ(λ)·I − Ψ(B) )·(λI − B)⁻¹. Despite first appearances, G(λ) is a polynomial in B of degree one less than Ψ's, with coefficients that are rational functions of λ. (Do you see why?) Then we find that Sylvester's polynomial

Φ(B) = ∮_C f(λ)·G(λ)/Ψ(λ) dλ/(2πı)

has coefficients determined by f, by Ψ, and by integration around any closed curve C inside which lies the spectrum of B and no singularity of f. Then f(B) = Φ(B) since Ψ(B) = O, and yet no eigenvalue of
B appears explicitly in the integral that defined Φ. But it's a swindle: attempts to simplify the integral that defined Φ by expressing it in closed form in terms of elementary transcendental functions like ln and arctan etc. bring the eigenvalues of B back. They persist unless f is rational, or sometimes algebraic.

Example: exp(Bτ). Solutions η(τ) of a 2nd-order linear homogeneous differential equation η″(τ) − 2α·η′(τ) − γ·η(τ) = 0, with constant coefficients α and γ (here η′ = dη/dτ and η″ = d²η/dτ²), can be obtained from an equivalent first-order homogeneous system of linear differential equations y′ = B·y with y(τ) = [η′(τ); η(τ)] and a constant coefficient matrix B = [2α γ; 1 0]. Solutions are y(τ) = exp(Bτ)·y(0) for any constant y(0). To compute exp(Bτ) we get the eigenvalues βⱼ = α ± √(α²+γ) of B as the zeros of its characteristic polynomial det(λI − B) = λ² − 2αλ − γ = (λ−β₁)·(λ−β₂).

β₁ ≠ β₂ if α² + γ ≠ 0, and then the Lagrange interpolating polynomial that matches exp(λτ) at λ = β₁ and at λ = β₂ is Φ(λ) = ( (λ−β₂)·exp(β₁τ) − (λ−β₁)·exp(β₂τ) )/( β₁ − β₂ ), and then exp(Bτ) = Φ(B). Note how this degenerates towards 0/0 as β₁ and β₂ approach equality.

β₁ = β₂ = α if α² + γ = 0, and then the Hermite interpolating polynomial that matches exp(λτ) and its derivative ∂exp(λτ)/∂λ at λ = α is Φ(λ) = ( 1 + (λ−α)·τ )·exp(ατ), and then exp(Bτ) = Φ(B) again. Many texts provide this and the former formula for Φ.

Although exp(Bτ) is a continuous function of B, the formula for Φ(B) jumps from one form to another as α² + γ passes through zero. This jump, reflecting the discontinuity of Jordan's Normal Form of B rather than any aspect of the differential equation, is an artifact of algebra, removable by using the pennants form of B instead of Jordan's and Newton's interpolating polynomial twice instead of Lagrange's. Set δ := (β₁−β₂)/2 = √(α²+γ) and find

Φ(λ) = exp(ατ)·cosh(δτ) + (λ−α)·exp(ατ)·sinh(δτ)/δ = exp(ατ)·cos(|δ|τ) + (λ−α)·exp(ατ)·sin(|δ|τ)/|δ| ,

wherein it is understood that sinh(δτ)/δ → τ and sin(|δ|τ)/|δ| → τ as δ → 0. Despite
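The cosh/sinh form of Φ can be checked against the defining series, here at τ = 1 and with α² + γ > 0 so that δ is real; function names and the sample coefficients are mine.

```python
import math

def expB_newton(alpha, gamma):
    # exp(B) for B = [[2a, g],[1, 0]] via
    # Phi(B) = e^a * ( cosh(d)*I + (sinh(d)/d) * (B - a*I) ),  d = sqrt(a^2 + g),
    # assuming a^2 + g > 0 so d is real and nonzero (tau = 1 throughout).
    d = math.sqrt(alpha * alpha + gamma)
    c, s = math.cosh(d), math.sinh(d) / d
    B = [[2.0 * alpha, gamma], [1.0, 0.0]]
    ea = math.exp(alpha)
    return [[ea * ((c if i == j else 0.0) + s * (B[i][j] - (alpha if i == j else 0.0)))
             for j in range(2)] for i in range(2)]

def expB_series(alpha, gamma, terms=60):
    # Reference value from the absolutely convergent series sum of B^n / n!
    B = [[2.0 * alpha, gamma], [1.0, 0.0]]
    S = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        P = [[sum(P[i][k] * B[k][j] for k in range(2)) / n for j in range(2)]
             for i in range(2)]          # accumulates B^n / n!
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S
```

Unlike the Lagrange form, the cosh/sinh form suffers no 0/0 degeneration as α² + γ → 0, which is the point of the Newton rearrangement.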
appearances, the two forms of Φ(λ) are really the same because of the identities cos(ıυ) = cosh(υ) and sin(ıυ) = ı·sinh(υ); use whichever is more convenient in the light of the sign of α² + γ.

Example: Nonexistent and Nonunique √B. Nothing like Sylvester's Polynomial Interpolation Formula can cope with extensions of some functions f to some square matrices B. The difficulties will be illustrated by using √ as an example. We may say that Y is a square root of B when Y² = B; but not every such square root Y, if any exist, is a polynomial in B, as Sylvester's formula requires of √B. Existence is problematical when 0 is an eigenvalue of B because 0 is also a singularity of √; its derivative is infinite there. Therefore it is not surprising that [0 1; 0 0] has no square roots; but it is surprising that B = [0 1 0; 0 0 0; 0 0 0] has infinitely many square roots Y = [0 ε τ; 0 0 0; 0 1/τ 0] as τ and ε run through all scalar values except τ = 0, yet √B does not exist: no square root of this B is a polynomial in B. A derogatory B, with more than one Jordan block for some eigenvalue, may have infinitely many square roots; for instance the 2-by-2 square roots Y of I include ±I, but also all 2-by-2 matrices with eigenvalues +1 and −1, namely Y = [ ±√(1−βγ) β ; γ ∓√(1−βγ) ] as β and γ run through all scalar values.

Jordan's Normal Form of Nilpotent Matrices

These notes began by displaying Jordan's Normal Form of B:

C⁻¹BC = Diag( β₁I₁+J₁, β₂I₂+J₂, β₃I₃+J₃, …, β_L I_L+J_L ) ,

and now its existence will be proved. This form, the canonical form under Similarity, is important because all matrices similar to B have the same Jordan Normal Form, except that its Jordan blocks βI + J may appear ordered differently along the diagonal. The proof is complicated because the number of blocks and their dimensions depend discontinuously upon B whenever any block is 2-by-2 or bigger, or whenever any two blocks share an eigenvalue β; in short, the Jordan Normal Form is discontinuous whenever it is interesting. These interesting cases are rare because they require that some eigenvalue be repeated, which occurs
just when the characteristic polynomial det(λI−B) and its derivative Trace(Adj(λI−B)) have at least one zero in common, and therefore have a nontrivial Greatest Common Divisor, which occurs just when these polynomials' coefficients satisfy a polynomial equation, illustrated by the following example: a quartic λ⁴ + αλ³ + βλ² + γλ + η and its derivative 4λ³ + 3αλ² + 2βλ + γ vanish together for some λ just when the determinant of their 7-by-7 Sylvester matrix, a polynomial (the Resultant) in the coefficients α, β, γ and η, vanishes.

Thus the elements of interesting matrices B, the ones with repeated eigenvalues, satisfy one complicated polynomial equation, which places B on a convoluted hypersurface in the space of all matrices of B's dimensions. This hypersurface intersects itself in many ways. For example, in the 4-dimensional space of 2-by-2 matrices [α β; γ δ], the interesting ones with a double eigenvalue lie on a 3-dimensional circular-conical cylinder whose equation is (α − δ)² + 4βγ = 0; its self-intersections form two 2-dimensional planes whereon α = δ and either β = 0 or γ = 0, and these two intersect along a line whereon α = δ and both β = 0 and γ = 0. Only on that line do matrices have two eigenvectors for one eigenvalue. The self-intersections fall into many families of shapes, some of which correspond to different special cases in the proof of the Jordan Normal Form's existence, complicating it.

Our proof will be computational, to minimize abstractness and to indicate how Jordan's Normal Form can be obtained, at least in principle, from arithmetic performed exactly. Rounded arithmetic operations are problematical for reasons discussed after the pennants form above. The first step reduces B by similarity to a pennants form, a diagonal sum of blocks each like βI + N, where β is an eigenvalue of B and N is strictly upper-triangular and therefore nilpotent: Nᵐ = O for some m. Recall the pennants form (QK)⁻¹B(QK) displayed above.

The next step seeks, for each pennant βI + N, an invertible matrix A such that A⁻¹(βI +
N)A is a similarity exhibiting Jordan's Normal Form of the chosen βI + N. If all searches succeed, the diagonal sum A of the A's will provide a similarity A⁻¹(QK)⁻¹B(QK)A that exhibits Jordan's Normal Form of B's pennants form, and hence of B, with C = QKA above. Two more steps will simplify each search. First observe that A⁻¹(βI + N)A = βI + A⁻¹NA is Jordan's Normal Form of βI + N just when A⁻¹NA is Jordan's Normal Form of N, so no generality is lost by seeking Jordan's Normal Form of only nilpotent matrices. Second, the search will exploit a proof by induction starting with the hypothesis that, for each nilpotent matrix Ñ of dimensions smaller than N's, a similarity can be found that transforms Ñ into its Jordan Normal Form. This hypothesis is valid for all 1-by-1 nilpotent matrices, since there is only one of them, namely 0, and the 1-by-1 identity 1 effects the desired similarity.

Exercise: Although nilpotent N need not be in its pennants form, this form is strictly upper-triangular; why?

Since N is nilpotent, Nᵐ⁻¹ ≠ O = Nᵐ for some positive integer m important to our search. If m = 1 then N = O is already in Jordan's Normal Form and our search is over. Let us suppose m ≥ 2 and continue searching. Since Nᵐ⁻¹ ≠ O we can find a column vector v such that Nᵐ⁻¹v ≠ o, and then construct the matrix V := [ Nᵐ⁻¹v, Nᵐ⁻²v, …, Nv, v ]. Its columns must be linearly independent for the following reason: if Σ_{k≤j<m} υⱼ·Nʲv = o for some k ≥ 0, then Nᵐ⁻ᵏ⁻¹·Σ_{k≤j<m} υⱼ·Nʲv = υₖ·Nᵐ⁻¹v = o, and so υₖ = 0; setting k = 0, 1, 2, …, m−1 in turn implies every υⱼ = 0. Therefore V can be embedded in a square matrix [V, V́] all of whose columns are linearly independent; V́ is empty if m = dimension(N). Anyway [V, V́]⁻¹ exists.

V was designed to satisfy NV = VJ, where J is the m-by-m nilpotent Jordan block; if m = 4,

J = [0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0] for example.

Therefore [V, V́]⁻¹·N·[V, V́] = [J ∗; O Ñ], in which Ñ is empty if m = dimension(N). If Ñ is empty we have found Jordan's Normal Form of N, so our search is over. Let us suppose 2 ≤ m < dimension(N) and continue searching. Now Ñ is not empty, but still [J ∗; O Ñ]ᵐ = [V, V́]⁻¹·Nᵐ·[V, V́] = O. This tells us
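The construction of V from a vector v with N^(m-1)·v ≠ o can be sketched directly; the routine name and the hypothetical 3-by-3 nilpotent used as a check are mine.

```python
def jordan_chain(N, v, m):
    # Build V with columns N^(m-1)v, ..., Nv, v; by construction N*V = V*J,
    # where J is the m-by-m nilpotent Jordan block (ones on the superdiagonal).
    n = len(N)
    cols = [v]
    for _ in range(m - 1):
        v = [sum(N[i][j] * v[j] for j in range(n)) for i in range(n)]  # next N^k v
        cols.append(v)
    cols.reverse()                      # highest power of N goes first
    return [[cols[j][i] for j in range(m)] for i in range(n)]  # n-by-m matrix V
```

A quick check: for N = [0 1 1; 0 0 1; 0 0 0] and v the last coordinate vector, V is invertible and N·V = V·J, so here V alone already effects the reduction with V́ empty.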
Jᵐ = O, which we knew already, and Ñᵐ = O. Our induction hypothesis says that this nilpotent Ñ can be transformed by a similarity to its Jordan Normal Form, Ã⁻¹ÑÃ say. Substituting V́Ã for V́ and renaming it V́ replaces Ñ by its Jordan Normal Form in a similarity

[V, V́]⁻¹·N·[V, V́] = [J R; O Ñ]

that defines R and exhibits Ñ already as a diagonal sum of Jordan blocks J₁, J₂, J₃, …, each like J but of perhaps diverse dimensions. Later we shall see how very special R is. To complete our search we must find one more similarity, one that gets rid of R.

The desired similarity takes the form [I Z; O I]⁻¹·[J R; O Ñ]·[I Z; O I]. Since [I Z; O I]⁻¹ = [I −Z; O I], this similarity gets rid of R if and only if ZÑ − JZ = R. We have to compute a solution Z of this linear equation in order to compute the desired similarity. The equation cannot be solved with the aid of the Lemma above because the spectra of Ñ and J are not disjoint; they are the same. Consequently the linear operator that maps Z to ZÑ − JZ is singular; ZÑ − JZ = R can be solved for Z, non-uniquely, if and only if this equation is consistent. To demonstrate its consistency we shall apply one of Fredholm's Alternatives: The equation ℱ(z) = r has at least one solution z if and only if wᵀr = 0 whenever wᵀℱ = oᵀ. (See the class notes titled "The Reduced Row-Echelon Form is Unique".) Instead of wᵀr we must use Trace(WR), because this runs through all linear combinations of the elements of R as W runs through all matrices of the same shape as Rᵀ. (Do you agree?) Instead of wᵀℱ(z) we must use Trace( W·(ZÑ − JZ) ) = Trace( (ÑW − WJ)·Z ); the last equation follows from Trace(WZÑ) = Trace(ÑWZ). As wᵀℱ(z) = 0 for all z just when wᵀℱ = oᵀ, so does Trace( W·(ZÑ − JZ) ) = 0 for all Z just when ÑW − WJ = O. Therefore instead of wᵀℱ = oᵀ we must use ÑW = WJ. In short: To find a similarity that gets rid of R and exhibits Jordan's Normal Form of N, we must solve ZÑ − JZ = R for Z, which can be done if and only if Trace(WR) = 0 whenever ÑW = WJ.

This last line is all that remains for us to prove; we must deduce that
Trace(WR) = 0 for all solutions W of ÑW = WJ from a property of R. What property of R do we need? Set

S := Jᵐ⁻¹R + Jᵐ⁻²RÑ + Jᵐ⁻³RÑ² + … + J²RÑᵐ⁻³ + JRÑᵐ⁻² + RÑᵐ⁻¹ .

S = O is the property we need. To prove S = O, confirm first by induction for m = 1, 2, 3, … that [J R; O Ñ]ᵐ = [Jᵐ S; O Ñᵐ], and then recall that [J R; O Ñ]ᵐ = [V, V́]⁻¹·Nᵐ·[V, V́] = O.

From S = O we shall deduce that Trace(WR) = 0 whenever ÑW = WJ, via a sequence of steps that simplify the problem. Ñ is a diagonal sum of nilpotent Jordan blocks J₁, J₂, J₃, … that induce partitions of R, S and W into blocks of corresponding sizes, as follows: split R = [R₁ R₂ R₃ …] so that

[J R; O Ñ] = [ J R₁ R₂ R₃ … ; O J₁ O O … ; O O J₂ O … ; O O O J₃ … ] ,

and then S = [S₁ S₂ S₃ …] and W = [W₁; W₂; W₃; …] compatibly. Every Sₖ = Jᵐ⁻¹Rₖ + Jᵐ⁻²RₖJₖ + Jᵐ⁻³RₖJₖ² + … + JRₖJₖᵐ⁻² + RₖJₖᵐ⁻¹ = O, and every Wₖ satisfies JₖWₖ = WₖJ just when ÑW = WJ, so that Trace(WR) = Σₖ Trace(WₖRₖ) = 0 for all such W just when every Trace(WₖRₖ) = 0 for every k = 1, 2, 3, … . (You agree?)

Whatever is done to cope with each Jordan block Jₖ and its associated Rₖ, Sₖ and Wₖ can be done independently of the others; nothing is lost by pretending that Ñ is just one Jordan block, thereby dropping the subscripts k to simplify the process. In short: There is a similarity that exhibits Jordan's Normal Form of N if and only if Trace(WR) = 0 whenever ÑW = WJ, where J and Ñ are Jordan blocks, Jᵐ = O ≠ Jᵐ⁻¹, Ñᵐ = O, and S = Σ_{1≤j≤m} Jᵐ⁻ʲ·R·Ñʲ⁻¹ = O.

Next let n be the dimension of Ñ; from Ñᵐ = O follows n ≤ m. Recall that m is J's dimension. Let the n-by-m matrix W₀ := [O, I] have m−n columns of zeros followed by the n-by-n identity matrix, and observe that ÑW₀ = [O, Ñ] = W₀J. Moreover: as π runs through all polynomials of degree less than n, W := π(Ñ)·W₀ runs through all solutions of ÑW = WJ. To see why this is so, define the function urc(X) to be the element in the Upper Right Corner of its matrix argument X. Thus urc(W) = ω₁ₘ. More generally, the element of W in row i and column j is ωᵢⱼ = urc( Ñⁱ⁻¹·W·Jᵐ⁻ʲ ), because premultiplication by Ñⁱ⁻¹ shifts row i up to row 1 and
postmultiplication by Jᵐ⁻ʲ shifts column j right to column m. If ÑW = WJ, then ωᵢⱼ = urc( Ñⁱ⁻¹·W·Jᵐ⁻ʲ ) = urc( Ñᵐ⁻ʲ⁺ⁱ⁻¹·W ) =: ϖ_{j−i−m+n}, say. Here ϖₖ = 0 if k < 0, since then Ñᵐ⁻ʲ⁺ⁱ⁻¹ = Ñⁿ⁻¹⁻ᵏ = O; and k = j−i−m+n ≤ m−1−m+n = n−1 < n. Set π(λ) := Σ_{0≤k<n} ϖₖ·λᵏ and observe that the element in row i and column j of π(Ñ)·W₀ = Σ_{0≤k<n} ϖₖ·Ñᵏ·W₀ is

urc( Ñⁱ⁻¹·( Σ_{0≤k<n} ϖₖÑᵏW₀ )·Jᵐ⁻ʲ ) = urc( Σ_{0≤k<n} ϖₖ·Ñᵏ⁺ⁱ⁻¹⁺ᵐ⁻ʲ·W₀ )   since W₀J = ÑW₀
  = urc( ϖ_{j−i−m+n}·Ñⁿ⁻¹·W₀ )   since W₀ = [O, I]
  = ϖ_{j−i−m+n} = ωᵢⱼ ;

therefore every solution W of ÑW = WJ has the form W = π(Ñ)·W₀, as asserted above. Next we shall demonstrate how every such W satisfies Trace(WR) = 0, completing the proof that there is a similarity that exhibits the Jordan Normal Form. Trace(WR) = Trace(RW), and W = Σ_{0≤k<n} ϖₖÑᵏW₀, so we need only demonstrate Trace(RÑᵏW₀) = 0 for all k ≥ 0:

Trace(RÑᵏW₀) = Σ_{1≤j≤m} urc( Jʲ⁻¹·RÑᵏW₀·Jᵐ⁻ʲ )   since RÑᵏW₀ is m-by-m
  = Σ_{1≤j≤m} urc( Jʲ⁻¹·R·Ñᵐ⁻ʲ⁺ᵏ·W₀ )   since W₀J = ÑW₀
  = urc( ( Σ_{1≤j≤m} Jʲ⁻¹·R·Ñᵐ⁻ʲ )·ÑᵏW₀ ) = urc( S·ÑᵏW₀ ) = 0 .   End of proof.

The foregoing proof was devoted mostly to proving that the linear system ZÑ − JZ = R has at least one solution Z, not to computing it. Elements of Z can be computed in order, starting at the lower left and working up to the right by diagonals. Computation is tricky because the system ZÑ − JZ = R is both over- and under-determined. It is overdetermined because no solution Z exists unless R is constrained by S = O, which means each diagonal of R that ends in its last row must sum to zero. The system is underdetermined because it does not determine its solution Z uniquely; another solution is Z + Z₀·π(Ñ), where Z₀ is obtained from the n-by-n unit matrix by appending m−n rows of zeros and π is any polynomial of degree less than n. Jordan's Normal Form of B can vary discontinuously, and the similarity C⁻¹BC that exhibits it violently discontinuously, as B varies.

Nested Irreducible Invariant Subspaces

We know every linear operator B that maps a vector space to itself is represented by a matrix

C⁻¹BC = Diag( β₁I₁+J₁, β₂I₂+J₂, β₃I₃+J₃, …, β_L I_L+J_L )

in Jordan's Normal
Form if an appropriate basis C is chosen. Among the basis vectors must be eigenvectors cⱼ ≠ o that satisfy Bcⱼ = βⱼcⱼ; but if the Jordan Normal Form is not purely diagonal, these eigenvectors are too few to span the space; then extra vectors have to be found to fill out the basis. These extra vectors needed for an appropriate basis are called principal vectors or generalized eigenvectors. For every Jⱼ of dimension greater than 1 there is a principal vector dⱼ that satisfies Bdⱼ = βⱼdⱼ + cⱼ. For every Jⱼ of dimension greater than 2 there is a principal vector eⱼ that satisfies Beⱼ = βⱼeⱼ + dⱼ. And so on. For example, if

B = βI + J = [β 1 0 0; 0 β 1 0; 0 0 β 1; 0 0 0 β], then c = [1;0;0;0], d = [0;1;0;0], e = [0;0;1;0], f = [0;0;0;1],

and Bf = βf + e, Be = βe + d, Bd = βd + c, and finally Bc = βc. Unlike eigenvectors, whose directions are determined uniquely except when different Jordan blocks have the same eigenvalue, principal vectors are intrinsically nonunique; for example, d + c is a principal vector as well as d. In fact, for any polynomial π with π(0) = 1, the columns of P := π(J) after the first provide another set of principal vectors for βI + J; do you see why? Their intrinsic nonuniqueness makes principal vectors sometimes more difficult to handle theoretically than the subspaces they span.

Partition the basis C into blocks of basis vectors corresponding to the Jordan blocks of C⁻¹BC, thus: C = [C₁, C₂, C₃, …, C_L], so that BCⱼ = Cⱼ·(βⱼIⱼ+Jⱼ). This shows that the range of Cⱼ is a subspace mapped to itself by B; it is called an Invariant Subspace of B, although strictly speaking it is a subspace of the vector space. B decomposes this vector space into an Irreducible sum of Invariant Subspaces. Each such subspace, spanned by those columns Cⱼ of C that correspond to a Jordan block βⱼIⱼ+Jⱼ, is mapped to itself by B, has zero intersection with all those other invariant subspaces, and is irreducible: it cannot itself be decomposed into a sum of two invariant subspaces. B's effect upon each invariant subspace is revealed completely by its corresponding Jordan block, which reveals how this invariant subspace may
contain a nested sequence of invariant subsubspaces, as follows. For simplicity suppose β appears in just one Jordan block βI + J, and its dimension is m, so Jᵐ = O ≠ Jᵐ⁻¹. Then the invariant subspace corresponding to this block is determined uniquely as the m-dimensional nullspace of (B − βI)ᵐ. Within this invariant subspace, if m > 1, is another invariant subsubspace, the (m−1)-dimensional nullspace of (B − βI)ᵐ⁻¹. And so on; the innermost nonzero invariant subspace in the nest is the nullspace of B − βI.

Thus, when no eigenvalue of B appears in more than one Jordan block, all of B's invariant subspaces, including the nested ones, are determined uniquely; and this proves that Jordan's Normal Form is unique except for the ordering of Jordan blocks along the diagonal. (Agreed?) But the Derogatory case, when some eigenvalue β of B appears in more than one Jordan block, is not so simple. In this case B determines uniquely the invariant subspace associated with β; it is the nullspace of (B − βI)ᵏ for all sufficiently large k. Its further decomposition into a sum of irreducible invariant subspaces is an accident of a computational process, not determined uniquely by B. However, B does determine the dimensions of its Jordan blocks uniquely, as follows. (Check this by working with the Jordan Normal Form instead of B.) For m = 1, 2, 3, … let nₘ(β) be the number of Jordan blocks of dimension m with β on their diagonals; finitely many nₘ(β) ≠ 0. And n₁(β) + n₂(β) + n₃(β) + … = Nullity(B − βI). Also n₁(β) + 2n₂(β) + 2n₃(β) + … = Nullity( (B − βI)² ), from which follows that n₁(β) = 2·Nullity(B − βI) − Nullity( (B − βI)² ). In a similar fashion, for k = 1, 2, 3, …,

n₁(β) + 2n₂(β) + 3n₃(β) + … + (k−1)·nₖ₋₁(β) + k·nₖ(β) + k·nₖ₊₁(β) + … = Nullity( (B − βI)ᵏ ) ,

which implies

nₘ(β) = −Nullity( (B − βI)ᵐ⁻¹ ) + 2·Nullity( (B − βI)ᵐ ) − Nullity( (B − βI)ᵐ⁺¹ ) for all m > 0.

Thus B determines the numbers of its Jordan blocks of all dimensions.

Exercise: For any nontrivial projector P, satisfying O ≠ P = P² ≠ I, show that its Jordan blocks are all 1-by-1. Exercise: First show that every complex square matrix B is
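The nullity formula for counting Jordan blocks is easy to test with exact rational arithmetic. All routine names here are mine, and the elimination is the plainest possible, with no attention to efficiency.

```python
from fractions import Fraction

def nullity(A):
    # Exact nullity of a square matrix via Gaussian elimination over the rationals.
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return cols - r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def block_counts(B, beta, mmax):
    # n_m = -Nullity((B-bI)^(m-1)) + 2*Nullity((B-bI)^m) - Nullity((B-bI)^(m+1))
    n = len(B)
    A = [[B[i][j] - (beta if i == j else 0) for j in range(n)] for i in range(n)]
    nus = [0]                                   # Nullity((B-bI)^0) = Nullity(I) = 0
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(mmax + 1):
        P = matmul(P, A)
        nus.append(nullity(P))
    return [-nus[m - 1] + 2 * nus[m] - nus[m + 1] for m in range(1, mmax + 1)]
```

For a 3-by-3 matrix built from one 2-block and one 1-block with eigenvalue 5, the formula recovers exactly one block of each dimension and none larger.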
similar to its transpose, by showing that they have the same Jordan Normal Form. Then show that B must be a product of two complex symmetric matrices, i.e. B = QR⁻¹ where complex Q = Qᵀ and R = Rᵀ. Hint: Reverse the order of the invariant subspaces' basis vectors.

The Real Jordan Normal Form

For any given real n-by-n matrix B there exists at least one real invertible matrix C that transforms B by similarity into a diagonal sum

C⁻¹BC = Diag( E₁+K₁, E₂+K₂, E₃+K₃, …, E_L+K_L )

of Real Jordan Blocks, each of the form E + K where either E + K = βI + J with a real eigenvalue β of B, like this 6-by-6 example:

[β 1 0 0 0 0; 0 β 1 0 0 0; 0 0 β 1 0 0; 0 0 0 β 1 0; 0 0 0 0 β 1; 0 0 0 0 0 β] ,

or else E + K = βI + μS + J² for a pair of complex conjugate eigenvalues β ± ıμ of B, μ ≠ 0, where S = −Sᵀ is a diagonal sum of 2-by-2 skew-symmetric matrices that satisfies S² = −I. Here is a 6-by-6 example:

[β μ 1 0 0 0; −μ β 0 1 0 0; 0 0 β μ 1 0; 0 0 −μ β 0 1; 0 0 0 0 β μ; 0 0 0 0 −μ β] .

Such a block has two equally repeated complex conjugate eigenvalues β ± ıμ and only two complex conjugate eigenvectors regardless of its dimension, which is always even. Every eigenvalue of B appears in at least one Jordan Block; these blocks can appear in any order; and their various dimensions add up to the dimension of B, which determines its Jordan blocks completely except for the order in which they appear and the signs of their imaginary parts μ. The proof that such a real Jordan Normal Form exists is more complicated than the proof for the complex case but no more illuminating, so it is not presented here.

Exercise: First show that every real square matrix B is similar to its transpose by showing that they have the same Real Jordan Normal Form. Then show that B must be a product of two real symmetric matrices of which at least one is invertible, i.e. B = HY⁻¹ where real H = Hᵀ and Y = Yᵀ. Hint: See the complex case.

A Rational Canonical Form

Any monic polynomial Ψ(λ) = λⁿ − υ₁λⁿ⁻¹ − υ₂λⁿ⁻² − … − υₙ₋₁λ − υₙ is the Characteristic
Polynomial of its Companion Matrix

         [ 0    1    0    ...  0   0  ]
         [ 0    0    1    ...  0   0  ]
    Y =  [ ...  ...  ...  ...  ... ...]
         [ 0    0    0    ...  0   1  ]
         [ µn   µn−1 µn−2 ...  µ2  µ1 ]

In fact ψ is the Minimum polynomial of Y because Nullity(λI − Y) ≤ 1. (Do you see why?) Conversely, for any given square matrix B there exists at least one invertible matrix C that transforms B by Similarity into a diagonal sum

              [ Y1   O    O   ...  O  ]
    C⁻¹BC  =  [ O    Y2   O   ...  O  ]
              [ O    O    Y3  ...  O  ]
              [ O    O    O   ...  YL ]

of companion matrices; Y1 is the companion of the minimum polynomial of B, and every Yj+1 is the companion of a divisor of the minimum polynomial of Yj. Although C may be computed from the elements of B in finitely many rational arithmetic operations, the diagonal sum and C are discontinuous functions of the elements of B; worse, C is violently discontinuous. In the 1940s an algorithm called Danilewski's Method used to be recommended as a fast way to compute C and the diagonal sum C⁻¹BC, but the method is unreliable for any but a small-dimensioned matrix B whenever it is nearly derogatory: even when Y1 is not bad, subsequent companion matrices Yj+1 need not be companions of divisors of minimum polynomials of previous Yj's. Hardly any current texts describe the method, so a succinct description is presented here.

The method consists of a finite sequence of elementary rational Similarities; each is an elementary column operation followed by its inverse operation applied to rows. For j = 1, 2, 3, ... in turn: In row j find the after-diagonal element of largest magnitude and swap that element's column with column j+1; then swap the correspondingly numbered rows. Divide column j+1 by its element in row j and then multiply row j+1 by that number, unless it is zero, in which case stop before the division. Otherwise the element in column j+1 of row j is now 1. Subtract multiples of column j+1 from all other columns to annihilate their elements in row j; then add the same multiples of the corresponding rows to row j+1. The process stops either because the Similarities have
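As a sanity check on the companion-matrix construction just described, here is a small sketch (the polynomial λ³ − 2λ² − 5λ + 6 and the helper names are my own choices; the layout follows the notes: 1's on the superdiagonal, coefficients across the last row). By the Cayley-Hamilton theorem ψ(Y) must vanish:

```python
def companion(mu):
    """Companion matrix of psi(t) = t^n - mu1*t^(n-1) - ... - mun :
    1's on the superdiagonal, (mun, ..., mu2, mu1) across the last row."""
    n = len(mu)
    Y = [[1 if c == r + 1 else 0 for c in range(n)] for r in range(n - 1)]
    Y.append(list(reversed(mu)))
    return Y

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# psi(t) = t^3 - 2t^2 - 5t + 6, i.e. mu1 = 2, mu2 = 5, mu3 = -6 :
Y = companion([2, 5, -6])
# Cayley-Hamilton: psi(Y) = Y^3 - 2Y^2 - 5Y + 6I must be the zero matrix.
Y2 = matmul(Y, Y)
Y3 = matmul(Y2, Y)
psiY = [[Y3[i][j] - 2*Y2[i][j] - 5*Y[i][j] + 6*(i == j)
         for j in range(3)] for i in range(3)]
```

Since exactly one superdiagonal 1 sits in each of the first n−1 rows of λI − Y, its nullity is at most 1, which is why ψ is also the minimum polynomial.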
reduced B to the form of a companion matrix Y, or because they have reduced B to the form

    [ Y  O ]
    [ R  B̂ ]

in which Y is a companion matrix but B̂ probably is not, and R is probably nonzero. Skipping the last row of [Y O] and resuming the process from the first row of [R B̂] may or may not reduce B̂ to companion form, but will probably not annihilate R. A Lemma proved earlier in these notes offers a way, in principle, to get rid of R when Y and B̂ turn out to have disjoint spectra; otherwise deem Danilewski's method to have failed because of an unlucky initial ordering of the rows and columns of B. The method may succeed if restarted after a shuffle of B's rows and corresponding columns has put a different diagonal element into the upper left-hand corner of B. Even if the method would succeed in exact rational arithmetic, it may give poor results in rounded arithmetic if the multiples of column j+1 subtracted from previous columns have to be huge multiples that amplify the effect of roundoff. Nobody knows a good way to compute the rational canonical form in rounded arithmetic without computing Jordan's Canonical Form first.

The Souriau-Frame-Faddeev Method
For a square matrix B of integers, can its characteristic polynomial ψ(λ) = det(λI − B) = λⁿ − f1·λⁿ⁻¹ − ... − fn be computed from the elements of B by exclusively integer arithmetic? If by "exclusively integer arithmetic" is meant only adds, subtracts and multiplies (no divides), then all known methods require work that grows exponentially with the dimension of B. But if "exclusively integer arithmetic" lets divisions by small integers produce integer quotients, there is a faster method traceable to U.J.J. Leverrier in the mid-nineteenth century and subsequently improved independently by J.M. Souriau, J.S. Frame and D.K. Faddeev a century later. This method requires a number of arithmetic operations of order dimension(B)⁴. Approximate methods that compute eigenvalues first go faster for strictly numerical matrices of large dimensions, taking a number of arithmetic operations of order dimension(B)³. Therefore this method serves
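The Danilewski reduction described above translates almost line for line into code. This is a bare-bones sketch under my own naming, with the pivot search but none of the restart or block-splitting logic; it simply stops when a zero pivot turns up. Each column operation is paired at once with its inverse row operation, so the result stays similar to B:

```python
def danilewski(B):
    """Reduce square B to companion form (superdiagonal 1's, coefficients
    across the last row) by paired elementary column/row similarities."""
    n = len(B)
    B = [[float(x) for x in row] for row in B]
    for j in range(n - 1):
        # in row j, find the after-diagonal element of largest magnitude
        k = max(range(j + 1, n), key=lambda c: abs(B[j][c]))
        if B[j][k] == 0.0:
            raise ArithmeticError("zero pivot: shuffle rows/columns and restart")
        # swap that column with column j+1, then the correspondingly numbered rows
        for row in B:
            row[k], row[j + 1] = row[j + 1], row[k]
        B[k], B[j + 1] = B[j + 1], B[k]
        # divide column j+1 by the pivot, multiply row j+1 by it
        p = B[j][j + 1]
        for i in range(n):
            B[i][j + 1] /= p
        B[j + 1] = [p * x for x in B[j + 1]]
        # annihilate the rest of row j: each column op, then its inverse row op
        for c in range(n):
            if c == j + 1 or B[j][c] == 0.0:
                continue
            m = B[j][c]
            for i in range(n):
                B[i][c] -= m * B[i][j + 1]
            B[j + 1] = [a + m * b for a, b in zip(B[j + 1], B[c])]
    return B
```

The characteristic polynomial is then read off the last row: reducing the 3-by-3 upper-triangular matrix with 1's on and above its diagonal yields last row (1, −3, 3), i.e. ψ(λ) = λ³ − 3λ² + 3λ − 1 = (λ−1)³, as it must.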
mainly for computations, perhaps symbolic, that cannot tolerate roundoff. Here is how it goes: Set B1 = B, and for j = 1, 2, 3, ..., n = dimension(B) in turn compute

    fj := Trace(Bj)/j   (the division goes evenly) ;    Bj+1 := B·(Bj − fj·I) .

Don't bother to compute Bn+1. As a check on the computation, expect Bn = fn·I. This fn vanishes just when B is singular, and then its Adjugate is Adj(B) = (−1)ⁿ⁻¹·(Bn−1 − fn−1·I).

Exercise: Validate the foregoing claims by observing the form Bj takes if B is the companion matrix of ψ. I doubt the existence of an analogous method to compute B's minimum polynomial.

Math H110: Notes on Vector and Matrix Norms (November 14, 1999)

These notes survey the most important properties of norms for vectors and for linear maps from one vector space to another, and of the maps norms induce between a vector space and its dual space.

Dual Spaces and Transposes of Vectors
Along with any space of real vectors x comes its dual space of linear functionals wᵀ. The representation of x by a column vector x, determined by a coordinate system or Basis, is accompanied by a corresponding way to represent functionals wᵀ by row vectors wᵀ, so that always the value wᵀx is the row-times-column product wᵀx. A change of coordinate system will change the representations of x and wᵀ from x and wᵀ to x′ = C⁻¹x and w′ᵀ = wᵀC for some suitable nonsingular matrix C, keeping w′ᵀx′ = wᵀx. But between vectors x and functionals wᵀ no relationship analogous to the relationship between a column x and the row xᵀ that is its transpose necessarily exists. Relationships can be invented; so can any arbitrary maps between one vector space and another. For example, given a coordinate system, we can define a functional xᵀ for every vector x by choosing arbitrarily a nonsingular matrix T and letting xᵀ be the functional represented by the row (Tx)ᵀ in the given coordinate system. This defines a linear map xᵀ = (Tx)ᵀ from the space of vectors x to its dual space; but whatever change of coordinates replaces column vector x by x′ = C⁻¹x must replace (Tx)ᵀ by (T′x′)ᵀ = (Tx)ᵀC = (TCx′)ᵀC to get the same functional xᵀ. The last
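The recurrence just stated can be sketched in a few lines of exact integer arithmetic (the example matrices below are my own; the built-in checks are the ones the notes promise, namely that each Trace(Bj)/j divides evenly and that Bn = fn·I):

```python
def char_poly(B):
    """Souriau-Frame-Faddeev: return [f1, ..., fn] with
    det(tI - B) = t^n - f1*t^(n-1) - ... - fn ,
    using only integer adds, multiplies and exact small divisions."""
    n = len(B)
    def matmul(X, Y):
        return [[sum(X[i][k]*Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    Bj = [row[:] for row in B]
    f = []
    for j in range(1, n + 1):
        t = sum(Bj[i][i] for i in range(n))
        assert t % j == 0                      # the division goes evenly
        f.append(t // j)
        if j < n:
            Bj = matmul(B, [[Bj[r][c] - f[-1]*(r == c) for c in range(n)]
                            for r in range(n)])
    # the promised check: Bn should equal fn*I
    assert Bj == [[f[-1]*(r == c) for c in range(n)] for r in range(n)]
    return f
```

For B = [[2,1],[1,3]] this returns [5, −5], i.e. ψ(λ) = λ² − 5λ + 5, and fn = −det(B) as expected for n = 2.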
equations can hold for all x′ only if T′ = CᵀTC. In other words, the linear map xᵀ = (Tx)ᵀ defined by the matrix T in one coordinate system must be defined by T′ = CᵀTC in the other. This relationship between T and T′ is called Congruence, Sylvester's word for it. Evidently matrices congruent to the same matrix are congruent to each other; can all matrices congruent to a given matrix T be recognized? Only if T = Tᵀ is real and symmetric does this question have a simple answer; it is Sylvester's Law of Inertia, treated elsewhere in this course.

The usual notation for complex vector spaces differs slightly from the notation for real spaces. Linear functionals are written wᴴ or w* instead of wᵀ, and row vectors are written wᴴ or w* to denote the complex conjugate transpose of column w instead of merely its transpose wᵀ. (Matlab uses w.' for wᵀ and w' for w*.) We'll use the w* notation because it is older and more widespread than wᴴ. Matrix T is congruent to C*TC whenever C is any invertible matrix and C* is its complex conjugate transpose. Most theorems are the same for complex as for real spaces; for instance, Sylvester's Law of Inertia holds for congruences among complex Hermitian matrices T = T* as well as real symmetric. Because many proofs are simpler for real spaces, we shall stay mostly with them.

Not all maps from a vector space to its dual have been found useful; some useful maps are not linear. Among the most useful maps, linear and nonlinear, are the ones derived from the metrics or norms associated with different notions of length in vector spaces. Applications of these norms and of their derived maps are the subject matter of the following notes.

Norms
A norm is a scalar function ‖x‖ defined for every vector x in some vector space, real or complex, and possessing the following three characteristic properties of length:
    Positivity:           0 < ‖x‖ < ∞ except that ‖o‖ = 0 .
    Homogeneity:          ‖ωx‖ = |ω|·‖x‖ for every scalar ω .
    Triangle inequality:  ‖x + y‖ ≤ ‖x‖ + ‖y‖ . Equality need
not imply parallelism.

Exercise 1: Prove | ‖w−x‖ − ‖y−z‖ | ≤ ‖w−y‖ + ‖x−z‖ for any w, x, y, z in a normed space.

Three examples of norms defined on the space of column vectors x with elements ξ1, ξ2, ..., ξn are

    ‖x‖p := ( Σk |ξk|ᵖ )^(1/p)  for p = 1 or 2 ,  and  ‖x‖∞ := maxk |ξk| .

Can you verify that these three ‖·‖p are norms? (The triangle inequality is the hard part; see below.) In this course we shall discuss mostly these three norms, but there are lots of others. Every nonsingular linear operator L converts one norm ‖x‖ into another norm ‖Lx‖. (Why nonsingular?) Also the maximum of two norms is a third, and the sum of two norms is another. Can you see why?

The Norm's Unit-ball Ω
Every norm has its own Unit-ball Ω, defined as the set of all vectors x with ‖x‖ ≤ 1. Some writers use the words "Unit-sphere" to mean what we call its boundary ∂Ω, consisting of all the norm's unit vectors u with ‖u‖ = 1. Our unit ball Ω turns out to be a bounded, closed, centrally symmetric convex body with an interior. Bounded means for every x ≠ o that ωx lies outside Ω for all ω > 1/‖x‖. Closed means that Ω includes its boundary ∂Ω. Centrally Symmetric means that if x lies in Ω so does ωx whenever |ω| = 1; for real vector spaces ω = ±1. Convex means that if x and y both lie in Ω then so must the line segment traced by ωx + (1−ω)y for 0 ≤ ω ≤ 1; it's because ‖ωx + (1−ω)y‖ ≤ ω‖x‖ + (1−ω)‖y‖ ≤ 1. Interior to Ω is where o lies; this means, for every x and all nonzero ω chosen with |ω| tiny enough (smaller than 1/‖x‖), that ωx lies in Ω too.

Conversely, given a bounded, closed, centrally symmetric convex body Ω with an interior, a norm can be so defined that Ω is its unit-ball. In fact, define ‖o‖ := 0, and for nonzero vectors x define ‖x‖ to be that positive value of ξ that puts x/ξ on the boundary ∂Ω. Such a ξ must exist because x/ξ lies interior to Ω for all ξ big enough and lies outside Ω for all ξ tiny enough. Central symmetry of Ω implies homogeneity of ‖x‖. Convexity of Ω implies the triangle inequality, thus: For any nonzero x and y we know that x/‖x‖ and y/‖y‖ both lie on ∂Ω. Therefore ωx/‖x‖ + (1−ω)y/‖y‖
lies in Ω whenever 0 < ω < 1, and surely lies there if ω = ‖x‖/(‖x‖+‖y‖), whereupon ‖ ωx/‖x‖ + (1−ω)y/‖y‖ ‖ ≤ 1 and the rest follows easily.

Unit-balls can be very diverse. For instance, the unit-balls Ωp belonging to the norms ‖·‖p defined above for p = 1, 2 and ∞ have very different shapes when the dimension n is large: Ω∞ has 2n facets and 2ⁿ vertices, whereas Ω1 has 2ⁿ facets and 2n vertices, and ∂Ω2 is a very smooth sphere in between. To appreciate these shapes draw pictures of Ωp for n = 2 or 3; then Ω∞ is a square or cube, Ω1 is a diamond or octahedron, and Ω2 is a circular disk or solid sphere respectively. Shapes like these will predominate in the following notes.

Continuity and Topological Equivalence of Norms
Despite the diverse shapes of unit-balls, all vector norms ‖·‖ have many common properties. One is Continuity; this is proved by substituting w = y = o in Exercise 1 above to deduce that | ‖x‖ − ‖z‖ | ≤ ‖x−z‖. This shows that ‖x‖ can be kept as close to ‖z‖ as one likes by keeping ‖x−z‖ small enough, i.e. by keeping x in a sufficiently tiny ball shaped like Ω centered at z. But if Ω can be arbitrarily needle-like or arbitrarily flattened, why can't ‖x‖ change arbitrarily violently when x changes arbitrarily little as measured by some other metric? That can happen in infinite-dimensional spaces, but not in a space of finite dimension n, and here is why.

First choose any basis B = [b1, b2, ..., bn], and then substitute bj/‖bj‖ for every bj to force every ‖bj‖ = 1. In this basis the components ξj of any vector x = Σj bj·ξj = Bx form a column vector x. Define ‖x‖∞ := ‖B⁻¹x‖∞ = ‖x‖∞ = maxj |ξj| and ‖x‖1 := ‖B⁻¹x‖1 = ‖x‖1 = Σj |ξj|, two new norms to compare with ‖x‖ = ‖Σj bj·ξj‖ ≤ Σj ‖bj‖·|ξj| = ‖x‖1 ≤ n·‖x‖∞. Then

    | ‖x‖ − ‖z‖ | ≤ ‖x−z‖ ≤ ‖x−z‖1 ≤ n·‖x−z‖∞ ,

which confirms that ‖x‖ is a continuous function of the components ξj of x, and of x, in every basis. (If n were infinite, ‖x‖ might change arbitrarily violently even though every change in every component ξj of x is arbitrarily tiny.)

Because every x satisfies ‖x‖ ≤ ‖x‖1 ≤ n·‖x‖∞, the unit-ball Ω of ‖x‖ contains the unit-ball Ω1 of ‖x‖1, and Ω1 contains (1/n)·Ω∞, a fractional copy of the unit-ball Ω∞ of ‖x‖∞. Can you see why? This phenomenon is typical: given any two norms for a finite-dimensional vector space, some small positive multiple of either's unit-ball always fits inside the other. Here is why.

For any two norms ‖·‖ and |·|, let's consider the quotient ‖x‖/|x|. As x runs through all nonzero vectors this quotient sweeps through a range of positive values, which is the same range as ‖u‖/|u| sweeps out while u = x/‖x‖∞ runs through all unit vectors on ∂Ω∞. Every such u = Bu for a unit column u on the boundary of the column space's ∞-norm unit-ball, and vice-versa, so the range in question is swept out by the quotient ‖Bu‖/|Bu| while u runs over all of that boundary. Two paragraphs ago we saw why ‖Bu‖ must be a continuous function of u, and the same goes for |Bu|; and since both norms are positive, their quotient is continuous too. That boundary is a closed bounded set in a finite-dimensional space, so every continuous function thereon achieves its maximum and minimum values somewhere on it; in particular the quotient's maximum M = ‖Bû‖/|Bû| > 0 and minimum µ = ‖Bǔ‖/|Bǔ| > 0 are achieved respectively at some unit columns û and ǔ, not determined uniquely. Therefore 0 < µ ≤ ‖x‖/|x| ≤ M for all x ≠ o, and each ≤ sign turns into = for some x. These inequalities tell us something geometrical about the norms' unit balls Ω for ‖·‖ and Ω′ for |·|: you should confirm now that M·Ω barely contains Ω′, which barely contains µ·Ω. Here "barely" means boundaries touch.

The foregoing paragraph is important for two reasons. First, its style of reasoning will recur. Second, it shows that all finite-dimensional vector norms are Topologically Equivalent: if an infinite sequence of vectors converges when distance from its limit is measured in one norm, then convergence occurs no matter what norm is used to measure distance. Do you see why? Different norms defined for an infinite-dimensional
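The three ‖·‖p norms above are easy to compute; here is a small self-check (sample vectors are my own), including the comparison ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1 ≤ n·‖x‖∞ used in the continuity argument:

```python
inf = float('inf')

def norm(x, p):
    """The p-norm of a list of components, for p = 1, 2 or infinity."""
    if p == inf:
        return max(abs(t) for t in x)
    return sum(abs(t)**p for t in x) ** (1.0/p)

x = [3.0, -4.0, 0.0]
y = [1.0, 2.0, -2.0]
s = [a + b for a, b in zip(x, y)]   # x + y, for the triangle inequality
```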
vector space do not have to be Topologically Equivalent.

Lagrange's Identities and Cauchy's and Hölder's Inequalities
These are stated and proved here for columns w = (ωj) and x = (ξj) of complex numbers, of which ω̄j and ξ̄j are the complex conjugates, and |ωj|² = ω̄j·ωj and |ξj|² = ξ̄j·ξj the squared magnitudes. First, a generalization of ‖x‖2²·‖w‖2² − |x*w|² = ‖x×w‖2² from Euclidean 3-space is

    Lagrange's Identity:  (w*w)·(x*x) − |w*x|² = Σj<k | ωj·ξk − ωk·ξj |² .

It is proved by expanding the double-sum's squared magnitudes and collecting terms. In matrix terms it says (w*w)·(x*x) − |w*x|² = Trace( (wxᵀ − xwᵀ)*(wxᵀ − xwᵀ) )/2. Another version, more natural, is (w*w)·(x*x) − Re((w*x)²) = Trace( (wx* − xw*)*(wx* − xw*) )/2. Since Trace(M*M) is the sum of squared magnitudes of any matrix M's elements, it must be nonnegative, whence follows

    Cauchy's Inequality:  |w*x| ≤ √( (w*w)·(x*x) ) = ‖w‖2·‖x‖2 .

This becomes equality only if w or x is a scalar multiple of the other. Cauchy's Inequality implies, and can be proved equivalent to, the triangle inequality for ‖x‖2 := √(x*x), because

    ( ‖w‖2 + ‖x‖2 )² − ‖w+x‖2² = 2·( ‖w‖2·‖x‖2 − Re(w*x) ) ≥ 2·( ‖w‖2·‖x‖2 − |w*x| ) ≥ 0 .

Note the implicit definition of ‖w*‖2 := ‖w‖2 = √(w*w) here. We shall see that other norms ‖w*‖ of rows are not computed from the same formulas as ‖w‖ for columns. The angle arccos( Re(w*x)/(‖w‖2·‖x‖2) ) between vectors w and x in a Euclidean or Unitary space is real because of Cauchy's Inequality, which was proved by H.A. Schwarz for integrals as well as sums, and was discovered also by Bunyakovsky; all three names get attached to it. Analogous inequalities apply to ‖·‖p for every p ≥ 1: its triangle inequality is also called Minkowski's Inequality, and its analog of Cauchy's Inequality is called Hölder's Inequality |w*x| ≤ ‖w‖q·‖x‖p for 1/q + 1/p = 1. Note that the formula to compute the dual norm of a row w* is not the same as the formula to compute ‖w‖p from the column w unless q = p = 2; see below. Class notes on Jensen's Inequality, or texts about Normed Spaces or Functional Analysis or Inequalities, supply proofs of Minkowski's and Hölder's
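Lagrange's Identity and Cauchy's Inequality can be checked exactly in code (a sketch; the sample vectors are arbitrary Gaussian-integer choices of mine, so that both sides of the identity evaluate without rounding):

```python
def dot(w, x):                 # w*x = sum of conj(w_j) * x_j
    return sum(a.conjugate()*b for a, b in zip(w, x))

def sq(z):                     # |z|^2, computed exactly for Gaussian integers
    return (z*z.conjugate()).real

w = [1+2j, 3+0j, 0-1j]
x = [2+0j, 1-1j, 4+0j]
n = len(w)

# Lagrange: (w*w)(x*x) - |w*x|^2 equals the sum over pairs j < k
# of the squared magnitudes of the 2-by-2 minors of [w x].
lhs = dot(w, w).real * dot(x, x).real - sq(dot(w, x))
rhs = sum(sq(w[j]*x[k] - w[k]*x[j]) for j in range(n) for k in range(j+1, n))
```

Since the right-hand side is a sum of squared magnitudes, it is nonnegative, which is exactly Cauchy's Inequality.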
inequalities, either of which can be deduced from the other. Neither will figure much in these notes for p and q different from 1, 2 and ∞.

The Dual Norm
Given a norm ‖x‖ for a real space of vectors x, its Dual Norm is another norm induced over the dual space of linear functionals wᵀ thus:

    ‖wᵀ‖ := max |wᵀx| = max wᵀx , maximized over all x in Ω .

For complex spaces ‖w*‖ := max |w*x| = max Re(w*x) over all x in Ω. Please do verify that these definitions have all three of the characteristic properties norms must have, and that max |w*x| really equals max Re(w*x); IT'S IMPORTANT. Provided the vector space has finite dimension, the asserted maxima exist because they are maxima of continuous functions of x over a closed bounded region Ω; but no simple formula for ‖wᵀ‖ need exist. Fortunately a simple formula does exist for the norms dual to ‖·‖p defined above. Let row wᵀ = [ω1, ω2, ..., ωn]; then ‖wᵀ‖p turns out to be just ‖w‖q with 1/q := 1 − 1/p, for every p ≥ 1, though we care about only p = 1, 2 and ∞. In these cases observe that

    ‖wᵀ‖p := max wᵀx over all x with ‖x‖p ≤ 1 = max wᵀu over all u with ‖u‖p = 1
          =  maxk |ωk|        when p = 1 .  (You can verify this easily, so do so.)
          =  Σk |ωk|          when p = ∞ .  (You can verify this easily, so do so.)
          =  √( Σk |ωk|² )    when p = 2 .  (You can verify this easily, so do so.)

The case p = 2 follows from Cauchy's Inequality, which is the special case for ‖·‖2 of what is called, for norms in general,

    Hölder's Inequality:  wᵀx ≤ ‖wᵀ‖·‖x‖ for all wᵀ and x .

This follows immediately from the definition of the dual norm ‖wᵀ‖. Moreover we may verify easily (and you should do so for all three norms ‖·‖p) that

    ‖x‖p = max wᵀx over all wᵀ with ‖wᵀ‖p ≤ 1 ,

which suggests that the relationship between dual norms ‖·‖ and ‖·ᵀ‖ may be symmetrical. The last relation is true not just for ‖·‖p and its dual but for all pairs of dual norms, though the general proof must be postponed until the Hahn-Banach theorem has been presented.

Support-Planes
Now some geometrical properties of dual norms can be described. They will be described
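The three claims "you can verify this easily" amount to exhibiting, for each p, a unit vector x at which wᵀx attains the asserted dual norm. A small sketch with a sample row w of my own choosing (for p = 1 a signed coordinate vector, for p = ∞ the sign vector, for p = 2 the vector w/‖w‖2):

```python
inf = float('inf')

def norm(x, p):
    return max(abs(t) for t in x) if p == inf else sum(abs(t)**p for t in x)**(1.0/p)

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

w = [3.0, -1.0, 2.0]

# p = 1 : dual norm max_k |w_k|, attained at a signed coordinate vector
k = max(range(len(w)), key=lambda i: abs(w[i]))
x1 = [0.0]*len(w)
x1[k] = 1.0 if w[k] > 0 else -1.0
# p = infinity : dual norm sum_k |w_k|, attained at the sign vector
xi = [1.0 if t >= 0 else -1.0 for t in w]
# p = 2 : dual norm equals the 2-norm of w, attained at w/||w||_2
x2 = [t / norm(w, 2) for t in w]
```

Each maximizer has unit p-norm, and wᵀx equals the formula claimed for the dual norm; rescaling w perturbs none of this, which is the homogeneity the definition requires.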
for real 2 and 3dimensional spaces though analogous descriptions apply to all finite dimensions Let 11T be an arbitrarily chosen unitfunctional HuTH l Let V be aunitVector HvH l on the boundary BS2 that maximizes llTV HuTH HvH l max uTx over all x with 1 For each constant 7 the equation uTx 7 describes a line or plane in the space of vectors x Corresponding to different values 7 are different members of a family of parallel lines or planes Two of those lines or planes touch the unitball S2 and sandwich it between them their equations are uTx i1 To con rm that S2 lies between them observe for every ix in S2 that uTix S HuTH HixH S l so 1 S uTx S l And to verify that those two lines or planes touch BS2 note that uTi39V i1 each of them touches BS2 but not the interior of S2 The line or plane whose equation is uTx i1 is said to support S2 at i V respectively it is tangent to BS2 there only if V is not avertex corner of S2 Thus the association of S2 s supportlines or supportplanes with their points of contact can be Viewed as an association of unitfunctionals 11T with unitvectors V on BS2 This association is onetoone only if S2 is rotund which means smooth no vertices nor edges and strictly convex no facets nor edges otherwise V cannot determine 11T uniquely at edges or vertices and 11T cannot determine V uniquely at edges or facets of BS2 as these diagrams show 1 T uTX1 Hle HVzH u x 1 V HVH 1 V1 V2 39 S2 uTxl T T 7 u x l u H 1 HuTH HuTH 1 First choose uT then find v First choose v then find uT Prof W Kahan Page 521 Math H110 NORMlite November 14 1999 550 pm One of these diagrams takes for granted something unobVious that requires proof In the first diagram an arbitrary unitfunctional 11T is chosen rst so HuTH l and then at least one unitvector V on BS2 is found to maximize llTV HvH l All other vectors x must satisfy uTx S 1le in other words the interior of S2 lies entirely on one side of the supportline or supportplane whose equation is uTx l and which touches BS2 at 
the points V thus found For the second diagram an arbitrary unitvector V with HvH l is chosen first on BS2 and then at least one unitfunctional 11T so HuTH l is found to maximize llTV max WTV over all wT with HWTH l this maximum llTV S HuTHHvll l But the diagram assumes llTV l Why isn t llTV lt l The dotted line shows what would happen were the maximized uTv lt 1 The supportline or supportplane whose equation is u x 1 would touch BS2 elsewhere than at v What seems so obvious in the diagram namely that every v on the boundary BS2 is a point of contact with at least one of S2 5 support planes needs a proof and it is difficult enough to deserve being named after the people who first got it right in the late 19205 The Hahn Banadl Theorem HvH max WTV over all wT with HWTH l in other words every point on BS2 is touched by at least one supportline or supportplane of S2 Proof Since this max wT V S max HWTH HvH HvH the proofmerely requires the construction of a maximizing unitfunctional 11T with HuTH l and llTV HvH No generality is lost by assuming HvH l The construction proceeds through a sequence of subspaces of ever greater dimensions The first subspace is ldimensional consisting of scalar multiples uV of V On this subspace uTuV u follows from an initial assignment llTV HvH 1 consistent with the requirement that HuTH l Subsequent subspaces will be spanned by more leading elements of an arbitrary basis V b2 b3 while the components llTV l uTb2 uTb3 of 11T for that basis are determined in turn until the definition of 11T extends over the whole space Suppose 11T has been de ned upon a subspace S that includes V and luTxl S 1le for every x in S as well as llTV HvH l If S is not yet the whole vector space there must be some nonzero vector b not in S Our first task is to choose uTb B without changing uTS in such a way that luTsbl S HsbH for every s in S We already know uTs7t S HsitH for all s and t in S and this implies uTs7t S HsbitbH S HsbH HtbH which implies in turn that 
illtbH 7 uTt S HsbH 7 uTs Therefore the least upper bound of the last inequality s lefthand side cannot exceed the greatest lower bound of its righthand side as s and t run independently through S Any number 15 between those bounds must satisfy illtbH iuTt SBS HsbH iuTs for every s and t in S This choice for uTb B ensures that illtbH S uTt B uTtb and uTsb uTs B S HsbH for every s and t in S which boils down to luTsbl S HsbH for every s in S as desired For every x s uh in the bigger subspace s uh we nd luTxl lMHuTsu bl s M 39Hsu bH 1le again Prof W Kahan Page 621 Math H110 NORMlite November 14 1999 550 pm Thus can the components llTV uTb2 uTb3 of 11T be chosen successively until all of them have been so de ned that luTxl S 1le for every x in the whole vector space and llTV HVH which is What the theorem asserts End of proof The assertion just proved is a special case attributed to Minkowski that conveys the essence of the Hahn Banach theorem which is usually stated in a more general way valid for infinitedim ensional spaces and for other convex bodies besides unitballs of norms The theorem was first proved only for real vector spaces Bohnenblust and Sobczyk proved its validity for complex spaces too in 1938 The following simplification of their approach began to appear in texts like W Rudin sReal and Complex Analysis 2d ed 1974 McGrawHill in the 19705 The norm on the dual Z of a complex normed vector space Z was de ned above to be HWH max lwzl max Rewz over all z in Z with HzH l Did you verify this Now given any nonzero complex vector t in Z we shall prove that HtH max lwtl max Rewt over all w in Z with HWH l Proof The complex version shall be deduced from the real by associating the complex spaces Z and Z with real counterparts Z and Z T that have respectively the same vectors and functionals with the same norms Begin by choosing any basis for Z that has t among the basis vectors The set of all real linear combinations of these basis vectors multiplying them by only 
real scalar coefficients constitutes a real vector space X with t among its vectors Each 1 in Z is a linear combination of basis vectors with complex coefficients whose real and imaginary parts taken separately decompose 1 into 1 x 1y where 1 71 and x and y come from X and are determined uniquely by z This decomposition associates each 1 in Z with Z x y in areal space Z of pairs of vectors from X 39 real Z has twice the dimension of complex Z and inherits its norm thus 1le HzH And Z inherits t39 0 Although Z and Z seem to have the same vectors under different names the spaces are different because multiplying a vector in Z by a complex scalar multiplies the vector s associate in Z by a linear operator not a real scalar B rux 1y in Z associates with Bxiuy39 uxBy in Z What about dual spaces Space XT dual to X consists of realvalued linear functionals obtained by decomposing complex linear functionals from Z thus Applying each c in Z to any x in X defines aTx Recx and bTx 71mcx 39 that aT and bT really are linear functionals in XT is easy to verify Conversely every two members aT and bT of XT determine a linear functional c aTi 1bT in 2 whose value at any 1 x 1y in Z is cz aTxbTy1aTyibTx These formulas also associate ET aT bT in ZT with each c in 2 thus ETZ Recz aTxbTy Note that 1mcz 7Re1cz 7Rec1z icTs where s in Z is the associate of 11 in Z Conversely each 3T aT bT in Z T determines c in Z from the preceding formula for cz The same result can be obtained without decomposing azT from a definition Recz GTE and an identity cz Recz 7 1Rec1z 39 this identity requires ET to be applied twice first to the real associate z of z and second to the real associate of 11 Finally HQTH mameH T matzH1Recz HcH Strictly speaking spaces X and XT are extraneous introduced here only in the hope that they help make the relationship between Z and Z easier to understand by making it more concrete This relationship amounts to two onetoone invertible and normpreserving maps one map between 
all of complex space Z and all of real space Z the other map between their dual spaces such that Tz Recz Back to the proof of the complex HahnBanach theorem Its real version provides at least one real ET in Z T to satisfy STE HM and lrcTzl S Hall for every 2 in Z so HLETH 1 The associated c in 2 has HcH HLETH 1 and Rect T HftH HtH 39 moreover 1mct 0 because otherwise lctl l HtH 1mct1l would exceed HtH contradicting HcH 1 Therefore ct HtH mawaH1Rewt as claimed End ofproof Prof W Kahan Page 721 Math H110 NORMlite November 14 1999 550 pm Duality 0r Polarity with respect to the Norm Analogous to the involutory selfinverting map between columns and rows effected by complex conjugate transposition is a map between any normed space Z of vectors z and its dual space Z of functionals w inspired by the symmetry we have just established in the formulas HwH matzH1 Rewz and HzH maxHWH1 Rewz The constraints l are inessential in these formulas which can be rewritten in the equivalent forms HWquotH maXz o ReWZHZH and HZH man o ReWZHWH to show how each nonzero w determines at least one maximizing direction z and each nonzero z determines at least one maximizing direction w When this maximization occurs nonzero lengths can be assigned to satisfy the DualityEquations wz HwHHzH and HwH HzH 0 and then w and z are said to be Dual with respect to the norm This kind of duality is also called Polarity sometimes These duality equations determine either w or z as a generally nonlinear function of the other and not always uniquely for instance given a nonzero w choose any unitvector u that maximizes Rewu to determine z 1WHu Examples First for p 2 then for p l and then for p oo we shall see how given either of w and z to determine the other so that they will be dual with respect to the norm HHp In all cases the column vector z has components C1 C2 C3 and the row w has components 61 62 53 where is the complex conjugate of DJ For p 2 erH2 sz mj2 and Hsz sz H2 duals have m gj For p l HwHl maxj DJ 
and HzHl Ej duals have either DJ HzHl J CJ whenever C t 0 and otherwise any DJ S HzHl will do or C 0 unless DJ HwHl and then any Cjoj 2 0 With 2 Cjoj 1 Will do For poo swap w and z inthe case p l The cases p l and p oo like the two diagrams earlier illustrate how neither z nor w need determine its dual uniquely if the unitball has vertices or edges or at facets Exercise 2 Verify that the Duality Equations are satisfied by the alleged dual pairs w and 2 defined above Exercise 3 Tabulate for p and q taking values 1 2 and co independently upq meitzHpHzHq as 2 runs over all nonzero complex ndimensional column vectors Exercize 4 Two given norms H H and H on a finitedimensional vector space induce norms H H and respectively on the dual space explain why maxz HzH z maxw 0 w 1lwll Exercise 5 Given one norm H H and an invertible linear operator R define a new norm z HRzH for all vectors 1 in some space How is related to H H on the dual space Given also nonzero z and w dual with respect to H H find a nonzero pair dual with respect to Exercise 6 Explain why the set of vectors 1 dual to a given nonzero functional w must constitute a convex set Exercise 7 Show that HzuvH 2 HzH for all scalars u ifand only if wv0 fora w dual to 1 Then v is called orthogonal to Z In that case must 1 be orthogonal to v 7 Justify your answer This exercise is hard Prof W Kahan Page 821 Math H110 NORMlite November 14 1999 550 pm Duality 0r Polarity and the Derivative of 3 Norm The nonuniqueness in the determination of either of a pair of dual vectors by the other and of either of a support lineplane and its point of contact by the other is a complication that af icts notation terminology and proofs to the detriment of pedagogy Let s leave these complications to courses more advanced than this one see texts about convexity and convex bodies For the sake of simplicity in our discussion of differentiability we shall assume a real vector space of finite dimension with a rotund unitball Q If it is not 
rotund Q can be made that way by slightly rounding its vertices and edges and slightly puffing out its facets just as perfect cubes must be rounded off a little to make a usable pair of dice Then the linesplanes that support 2 can be regarded all as tangents each one touching BS2 at just one point A tangent s orientation and its point of contact determine each other uniquely and continuously each as a function of the other since an analytical proof of this claim is too tedious an argument about compact sets to reproduce here its confirmation is left to courses and texts about real analysis and convex bodies or to the reader s geometrical intuition about smooth rotund bodies The continuous onetoone association between tangents to rotund BS2 and their points of tangency between dual pairs of unitfunctionals uT and unitvectors V extends to all pairs of dual functionals wT and vectors x satisfying the DualityEquations wa HWTHHxH and HWTH HxH because these now determine either wT or x uniquely from the other Can you see how The importance of this duality to applications of norms can be gauged from the fact that HxH is now differentiable if nonzero and its derivative involves the functional wT dual to x thus de11 wdeHxH To see why this formula is valid fix nonzero vectors x and h arbitrarily and for any real scalar 7v let WT be dual to X7vh so WT Xvh HWTAHHx vh and HWTAH Myrrth The rotundity of 2 implies WTA 7gt wT as 7 7gt 0 For all 7 0 Holder s inequality says 11th 7 Hxm s wTi 111wawa WTMX39khWWTAH 2W1 1111me and similarly with 7 in place of 7x Hence if 7 gt 0 WTikhHWTJLH s 11xth 7 Hxm yak SWTAhHWTAH Letting 7 7gt 0 implies wThHxH wThHWTH de7thHd7v at 7x 0 Since h is arbitrary this confirms that deH wdeHxH Derivatives and Gradients are often mixed up Let fx be any differentiable real scalar function de ned over a space of real vectors x The derivative f 39x belongs to the dual space because scalar dfx f39xdx The gradient Grad fx is a vector in the same space as x not 
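A finite-difference spot-check of the differentiation formula d‖x‖ = wᵀ(dx)/‖wᵀ‖ is easy in the Euclidean case, where the functional dual to x is represented by xᵀ itself (example values are mine):

```python
def norm2(v):
    return sum(t*t for t in v) ** 0.5

x = [3.0, 4.0]
h = [1.0, -2.0]                                      # an arbitrary direction
slope = sum(a*b for a, b in zip(x, h)) / norm2(x)    # w^T h / ||w^T|| with w^T = x^T
eps = 1e-6
fd = (norm2([a + eps*b for a, b in zip(x, h)]) - norm2(x)) / eps
grad = [a / norm2(x) for a in x]                     # Grad ||x|| = x/||x||, a unit vector
```

The one-sided difference quotient fd agrees with the predicted slope to within O(eps), and the gradient has unit length, consistent with the claim that the norm grows at unit rate in its own direction.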
in its dual space de ned to have the direction in which fx increases most rapidly and a norm equal to that rate of change More precisely Grad fx is parallel to the unitvector u that maximizes dfXvlld7v at 7x 0 and this maximum equals HGrad fxH Exercise 8 Show why Grad fx is the vector dual to f 39x with respect to the norm Therefore Grad HxH xHxH And when x runs over a Euclidean space Grad fx f xT Prof W Kahan Page 921 Math H110 NORMlite November 14 1999 550 pm Auerbach s Theorem for a Real Rotund Unit ball Q It can be circumscribed by at least one parallelepiped whose faces support 2 at their midpoints Proof Of all the parallelepipeds circumscribing barely containing Q the ones with minimum content area or volume hashave the desired supporting properties as we shall see after nding one Others have the desired properties too but one is all we need Choose an arbitrary basis in which vectors c can be represented by column vectors c and functionals rT by row vectors rT ofcourse HcH Hell and HrTH HrTH For any square matrix C c1 c2 cn whose columns all have length ch l 1 there is a square matrix R T whose rows r J are dual to the corresponding columns cJ of C this means every rTj cJ l and HrTJH HcJH l Also unless detC 0 there is an inverse matrix Q CT1 whose rows qTi satisfy qTi J 1 if i j else 0 Now choose C to maximize detC The maximum is achieved because it is the maximum of a continuous function det over a closed bounded set of columns all of length l Varying columns cJ in nitesimally by dc causes log detC to vary according to Jacobi s formula for a determinant s derivative by d log detC TraceC 1 dC 2J qTJ ch Keeping chdch l constrains ch to satisfy dHcJH rTJ ch HcJH 0 Whenever every ch satis es this constraint 2 qTJ ch 0 because of the maximality of log detC Therefore every pair rTJ qu must be a pair of parallel rows and since rTj cJ qTJ cJ l we conclude that R Q C 1 row by row Now we revert from rows and columns to functionals and vectors and interpret 
geometrically the configuration that has been found. A basis C = [c₁, c₂, …, cₙ] of unit vectors cⱼ and a dual basis R of unit functionals rᵀᵢ have been found to satisfy rᵀᵢ·cⱼ = 1 if i = j, else 0, and ‖rᵀⱼ‖ = ‖cⱼ‖ = 1. The 2n equations rᵀᵢ·x = ±1 are the equations of 2n hyperplanes that support Ω, touching its boundary ∂Ω at 2n points ±cᵢ respectively, each exactly midway between all the other planes because rᵀⱼ·cᵢ = 0 for all j ≠ i. Thus each contact point ±cᵢ is the midpoint of a face of a parallelepiped ∂P that circumscribes Ω; that face lies in the plane rᵀᵢ·x = ±1, between the n−1 pairs of planes rᵀⱼ·x = ±1 for j ≠ i. End of proof.

Incidentally, the content (volume) inside the circumscribing parallelepiped ∂P can be shown to be 2ⁿ/|det(R)|, which is minimized when |det(R)| is maximized. An argument very similar to the foregoing leads to the same conclusion, R = C⁻¹, where C is the matrix whose columns are dual to the rows of R with respect to the norm.

An Auerbach Basis is a basis like C = [c₁, c₂, …, cₙ] above of unit vectors each orthogonal, in the sense of Exercise 7, to all the others. Auerbach's theorem, which has been proved also for non-rotund unit balls, complex spaces and infinite dimensions, ensures that at least one such C exists. It associates every column n-vector x with a vector x := C·x, for which ‖x‖′∞ := ‖x‖∞ is a new norm whose unit ball's boundary is the theorem's circumscribing parallelepiped. A third norm is ‖x‖′₁ := ‖x‖₁; its unit ball is the convex hull of all vectors ±cⱼ.

Exercise 9: Show that ‖x‖′₁/n ≤ ‖x‖′∞ ≤ ‖x‖ ≤ ‖x‖′₁ ≤ n·‖x‖′∞ for every x. (See p. 3.)

Matrix Norms

The trouble with matrix norms is their abundance. They are so numerous that they overwhelm typography's capacity to differentiate among them without a plethora of subscripts and/or nonce symbols. However, for two reasons only a few of those many matrix norms get much use: Only a few matrix norms are easy to compute (and another is not difficult); and these can approximate the rest well enough for most practical
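An Exercise-9-style chain of inequalities can be spot-checked by brute force for the familiar column norms, with ‖·‖₂ standing in for the abstract norm sandwiched in the middle. A small illustrative script (mine, not the notes'; Python with numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7
for _ in range(100):
    x = rng.standard_normal(n)
    inf = np.linalg.norm(x, np.inf)   # ‖x‖∞
    two = np.linalg.norm(x, 2)        # ‖x‖₂, playing the role of the abstract ‖x‖
    one = np.linalg.norm(x, 1)        # ‖x‖₁
    # The chain ‖x‖₁/n ≤ ‖x‖∞ ≤ ‖x‖ ≤ ‖x‖₁ ≤ n·‖x‖∞ :
    assert one / n <= inf + 1e-12
    assert inf <= two <= one <= n * inf + 1e-12
```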
purposes. Justification for the last reason explains why a substantial fraction of the theory of matrix norms concerns the closeness with which one can approximate another. It's a vast theory, overflowing with terms and concepts; only a few pages' worth can be touched in these notes.

Auerbach's Theorem and Ex. 9 have shown why, with the choice of an apt Auerbach basis, any vector norm ‖x‖ for abstract n-dimensional vectors x can be approximated within a factor √n by ‖x‖∞ or ‖x‖₁ applied to the column x that represents x in the apt coordinate system. Finding an apt basis is the costly part of this approximation. Fortunately, simply rescaling whatever basis is already in use (this is tantamount to a change to more convenient units for variables) often gets adequate approximations from simple norms like ‖x‖∞ and ‖x‖₁.

Since m-by-n matrices L, representing linear operators L in chosen bases for their domain and target-space, also constitute a vector space, of dimension m·n, either the largest magnitude or the sum of magnitudes of m·n linear combinations of the elements of L can approximate any norm ‖L‖ within a factor √(m·n), according to Auerbach's Theorem. Actually, far less work suffices to approximate any ‖L‖ far better, the more so as dimensions m and n get bigger. This will be just one payoff from a relatively brief foray into the theory of matrix norms.

In this theory the norm function ‖·‖ is heavily Overloaded; this means that the function's definition and computation depend upon the Linguistic Type of its argument. We have seen this already: ‖rᵀ‖ may be computed from the elements of the row rᵀ representing functional rᵀ very differently than ‖x‖ is computed from the elements of the column x representing vector x. Now another ‖L‖ will be computed from the matrix L representing linear operator L. And if vector y = L·x in the range of L is represented by a column y, its elements may figure in the computation of ‖y‖ differently than column x figures in the computation of ‖x‖ for x in the domain of L, even if its
range and domain have the same dimension; the two spaces can have different norms. An assertion so simple as ‖L·x‖ ≤ ‖L‖·‖x‖ is likely to involve three different norms, and their computation from representing arrays L·x, L and x will generally change when bases for the domain and target space of L change. What a mess! READERS BEWARE: Always keep the context of ‖…‖'s argument in mind, to identify its linguistic type and hence determine which norm is at issue. The mental burden imposed because the meaning of ‖·‖ depends upon the linguistic type of its argument has turned out to be less annoying than the notational clutter that used to be created when each linear space's norm had to have its own distinctive name. Some unavoidable clutter will afflict these notes whenever different norms for the same linear space are compared.

The oldest matrix norm is the Frobenius Norm ‖B‖F := √(trace(BᵀB)) = √( Σᵢ Σⱼ bᵢⱼ² ). (For complex matrices B, replace bᵢⱼ² by |bᵢⱼ|².) Since ‖B‖F is the Euclidean norm applied to a column containing all the elements of B, it satisfies the three laws that every norm must satisfy:
 Positivity: ∞ > ‖B‖ > 0 except ‖O‖ = 0.
 Homogeneity: ‖μ·B‖ = |μ|·‖B‖ for every scalar μ.
 Triangle Inequality: ‖B+C‖ ≤ ‖B‖ + ‖C‖.
Another property, desirable for every matrix norm, is possessed by the Frobenius norm:
 Multiplicative Dominance: ‖B·C‖F ≤ ‖B‖F·‖C‖F for all multipliable B and C.
Proof: Let bᵢᵀ denote row i of B and let cⱼ denote column j of C. Then ‖B·C‖F² = Σᵢ Σⱼ (bᵢᵀ·cⱼ)² ≤ Σᵢ Σⱼ (bᵢᵀ·bᵢ)·(cⱼᵀ·cⱼ), by Cauchy's inequality, = (Σᵢ bᵢᵀ·bᵢ)·(Σⱼ cⱼᵀ·cⱼ) = ‖B‖F²·‖C‖F², as claimed.

Exercise 10: For a rank-1 matrix b·cᵀ, show that ‖b·cᵀ‖F = ‖b‖₂·‖cᵀ‖₂ = ‖b‖F·‖cᵀ‖F.

By analogy with the Frobenius norm's descent from the Euclidean vector norm, two other matrix norms are the biggest-magnitude norm ‖B‖M := maxᵢ maxⱼ |bᵢⱼ| and the total-magnitudes norm ‖B‖Σ := Σᵢ Σⱼ |bᵢⱼ|. However, for dimensions 2 or greater …

Exercise 11: Show that ‖·‖Σ possesses Multiplicative Dominance but ‖·‖M does not. Evaluate max over B·C ≠ O of ‖B·C‖Σ/(‖B‖Σ·‖C‖Σ) and max over B·C ≠ O of
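The Frobenius norm's Multiplicative Dominance, and the rank-1 equality of Exercise 10, lend themselves to a quick numerical spot-check. A sketch (mine, not the notes'; Python with numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(6)

def fro(B):
    # Frobenius norm: Euclidean norm of all the elements of B.
    return float(np.sqrt((B * B).sum()))

# Multiplicative dominance ‖B·C‖F ≤ ‖B‖F·‖C‖F on random pairs:
for _ in range(100):
    B = rng.standard_normal((3, 4))
    C = rng.standard_normal((4, 2))
    assert fro(B @ C) <= fro(B) * fro(C) + 1e-12

# Exercise 10: for rank-1 b·cᵀ the inequality becomes equality.
b = rng.standard_normal(3)
c = rng.standard_normal(4)
```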
‖B·C‖M/(‖B‖M·‖C‖M), and describe the maximizing matrices B and C. Then show that ‖B‖L := ‖B‖M·(number of B's columns) is a matrix norm that does possess Multiplicative Dominance. (The number of rows works too.)

The four norms ‖·‖F, ‖·‖Σ, ‖·‖M and ‖·‖L act upon matrices of arbitrary dimensions (finite, in the case of ‖·‖L). This will be true of practically all the matrix norms discussed in these notes, though a matrix norm can be defined more narrowly to act only upon matrices of a specified shape and/or dimensions. In particular, a norm ‖·‖ that acts only upon square matrices of a specific dimension can always be scaled, as was done in Ex. 11, to possess Multiplicative Dominance, thusly: Find β := max over B·C ≠ O of ‖B·C‖/(‖B‖·‖C‖) for square matrices B and C of that specific dimension, and then replace ‖·‖ by β·‖·‖. Consequently there is little incentive to study matrix norms lacking multiplicative dominance, and we shall avoid most of them.

A property resembling multiplicative dominance is possessed by any matrix norm ‖·‖ that is Subordinate to two vector norms, in the sense that ‖L·x‖ ≤ ‖L‖·‖x‖ for all matrices L and column vectors x of dimensions acceptable to their respective norms; for instance, ‖·‖F is subordinate to ‖·‖₂. Many writers prefer the phrase "Compatible with" instead of "Subordinate to". What happens next depends upon which came first, the vector norms or the matrix norm.

Operator Norms

Given two normed vector spaces, with one norm ‖x‖ for vectors x in one space and another (possibly different) norm ‖y‖ for vectors y in the other space, a norm ‖L‖ := max over x ≠ o of ‖L·x‖/‖x‖ is induced upon linear maps L from the first space to the second, and another (perhaps a fourth) norm ‖K‖ := max over y ≠ o of ‖K·y‖/‖y‖ is induced upon linear maps K from the second to the first. These induced norms are called Operator Norms, and Lub Norms (after "Least Upper Bound", which replaces "max" when dimensions are infinite), and Sup Norms likewise.

Exercise 12: Confirm that each operator norm obeys all norms' three laws, namely Positivity, Homogeneity and the
Triangle Inequality. Confirm that, as a linear map from a space to itself, the identity I has operator norm ‖I‖ = 1, and that therefore ‖·‖F and (from Ex. 11) ‖·‖Σ cannot be operator norms for dimensions 2-by-2 or greater.

What about Multiplicative Dominance? Operator norms have a similar but subtly different multiplicative property whose description requires a third space of vectors z and its norm ‖z‖. Three operator norms are induced for maps from the first to the second, from the second to the third, and from the first to the third; and three more are induced in the opposite directions. All six are given the same heavily overloaded name ‖·‖. If z = B·y and y = C·x then we find that

 Operator Norms are Multiplicative: ‖B·C‖ ≤ ‖B‖·‖C‖.

Proof: If B·C ≠ O then ‖B·C‖ = max over x ≠ o of ‖B·C·x‖/‖x‖ = max over x ≠ o with y = C·x ≠ o of (‖B·y‖/‖y‖)·(‖C·x‖/‖x‖) ≤ max over x ≠ o and y ≠ o of (‖B·y‖/‖y‖)·(‖C·x‖/‖x‖) = ‖B‖·‖C‖, as claimed. Note that this proof involves five generally different norm functions, namely the three vector norms ‖x‖, ‖y‖ and ‖z‖ (the last two applied to C·x and B·y), and the operator norms ‖B‖ and ‖C‖. Somehow the grammar of abstract linear algebra has ensured that the right ones were invoked. Still, when you use the inequality ‖B·C‖ ≤ ‖B‖·‖C‖, you should check that the norm applied to the target space of C is the same as is applied to the domain of B, just as you would check that their spaces' dimensions match before multiplying.

Exercise 13: Show that ‖B‖M = maxᵢ maxⱼ |bᵢⱼ| is the operator norm for matrices B construed as linear maps y = B·x from a space of columns x normed by ‖x‖₁ to a space of columns y normed by ‖y‖∞; so this operator norm is as multiplicative as are all operator norms, though it lacks multiplicative dominance if the matched dimension is 2 or bigger.

Exercise 14: Explain why, if a linear operator B maps a normed space to itself and has a compatible ‖B‖ < 1, then I−B is invertible. However, if a square matrix B maps one normed space to another with a different norm, then I−B can be non-invertible despite that ‖B‖ < 1 for a compatible matrix norm; give an example. (Hint: 2-by-2, and Ex. 13.)

Exercise 15: Confirm these equivalent expressions for an
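Exercise 14's claim that a compatible ‖B‖ < 1 makes I − B invertible is commonly proved via the Neumann series (I − B)⁻¹ = I + B + B² + …; the following sketch (mine, not the notes'; Python with numpy assumed) illustrates the convergence for the ∞-norm:

```python
import numpy as np

# A matrix whose operator ∞-norm (biggest row-sum of magnitudes) is below 1.
B = np.array([[0.2, 0.3],
              [0.1, 0.4]])
assert np.linalg.norm(B, np.inf) < 1

# Partial sums of the Neumann series I + B + B² + ... converge to (I − B)⁻¹.
S = np.eye(2)
term = np.eye(2)
for _ in range(200):
    term = term @ B
    S += term

err = np.linalg.norm(S - np.linalg.inv(np.eye(2) - B), np.inf)
```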
operator norm:

 ‖B‖ = max over x ≠ o of ‖B·x‖/‖x‖ = max over ‖x‖ = 1 of ‖B·x‖ = max over ‖x‖ = ‖wᵀ‖ = 1 of wᵀ·B·x = max over wᵀ ≠ oᵀ of ‖wᵀ·B‖/‖wᵀ‖.

Five different norms appear here; identify each one's domain. Confirm also that every operator norm satisfies ‖x·wᵀ‖ = ‖x‖·‖wᵀ‖, and identify the three norms' domains.

When a Matrix Norm is Given First

Operator norms are induced by vector norms in the operators' domain and target spaces. This process is partly reversible. Suppose ‖B‖ is a given matrix norm specified for square matrices B, possessing multiplicative dominance ‖B·C‖ ≤ ‖B‖·‖C‖ for all square matrices B and C of the same dimension. Choose a nonzero row rᵀ of the same dimension and let ‖x‖ := ‖x·rᵀ‖ for column vectors x of the same dimension. This vector norm induces a new operator norm

 |||B||| := max over x ≠ o of ‖B·x‖/‖x‖ = max over x ≠ o of ‖B·x·rᵀ‖/‖x·rᵀ‖ ≤ ‖B‖.

Thus does a given multiplicatively dominant matrix norm induce a vector norm that induces an operator norm |||·||| no greater than the given matrix norm, no matter which arbitrarily fixed row rᵀ was chosen. Consequently there is a sense in which operator norms are minimal among matrix norms possessing multiplicative dominance.

Exercise 16: Confirm that if ‖·‖ above is already an operator norm, then its induced operator norm |||·||| = ‖·‖. Why is this equation violated when the given norm is the operator norm ‖·‖M?

Exercise 17: Describe the operator norm |||·||| induced by ‖·‖F. Then describe the operator norm induced by ‖·‖Σ. In each case the induced |||·||| costs a lot more than its given ‖·‖ to compute when the dimension is big.

Formulas for Familiar Operator Norms

Each subscript p and q below will be chosen from the set {1, 2, ∞} to specify familiar vector norms like ‖x‖p and ‖wᵀ‖q defined in these notes' early sections titled "Norms" and "The Dual Norm". These norms' induced operator norms ‖B‖pq := max over x ≠ o of ‖B·x‖p/‖x‖q will be identified by pairs of subscripts, except when the two subscripts are equal, in which case we may sometimes abbreviate ‖B‖pp to ‖B‖p, at the risk of conflicts with a few other writers who sometimes write ‖B‖₁
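The closed forms for the familiar operator norms can be compared against the defining maximization by random sampling; sampled ratios ‖B·x‖p/‖x‖p never exceed the closed-form value. A sketch (mine, not the notes'; Python with numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 3))

# Closed forms:
row_sum = np.abs(B).sum(axis=1).max()              # ‖B‖∞ : biggest row-sum
col_sum = np.abs(B).sum(axis=0).max()              # ‖B‖₁ : biggest column-sum
spectral = np.linalg.svd(B, compute_uv=False)[0]   # ‖B‖₂ : biggest singular value

# Sample the defining ratio max ‖B·x‖_p / ‖x‖_p over random directions x.
for p, bound in ((np.inf, row_sum), (1, col_sum), (2, spectral)):
    best = 0.0
    for _ in range(2000):
        x = rng.standard_normal(3)
        best = max(best, np.linalg.norm(B @ x, p) / np.linalg.norm(x, p))
    # The sampled maximum approaches, but never exceeds, the closed form.
    assert best <= bound * (1 + 1e-12)
```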
for our ‖B‖Σ, ‖B‖∞ for our ‖B‖M, and, more often, ‖B‖₂ for our ‖B‖F introduced two pages ago.

Exercise 18: Confirm the following five formulas, in which matrix B has elements bᵢⱼ:
 ‖B‖∞ = maxᵢ Σⱼ |bᵢⱼ| = ‖Bᵀ‖₁ , the biggest row-sum of magnitudes;
 ‖B‖₁ = maxⱼ Σᵢ |bᵢⱼ| = ‖Bᵀ‖∞ , the biggest column-sum of magnitudes;
 ‖B‖₂ = the biggest singular value of B = ‖Bᵀ‖₂ = the biggest eigenvalue of [O Bᵀ; B O] (in block form);
 ‖B‖∞₁ = maxᵢ maxⱼ |bᵢⱼ| , the biggest magnitude (Ex. 13);
 ‖B‖∞₂ = maxᵢ √( Σⱼ bᵢⱼ² ) , the biggest row-length.
When dimensions are big, the other four norms ‖B‖pq get little use because they cost much too much to compute. (Do you see why?) ‖B‖₂ = √(the biggest eigenvalue of BᵀB) costs a little too much too, but has too many uses, both in theory and practice, to be disregarded.

Maxima of Ratios of Matrix Norms

When dimensions are not too big, and sometimes even if they are, different norms may approximate each other well enough that only the one easier to compute gets used. Suppose two norms ‖x‖a and ‖x‖b are given for the same space of vectors x. Let μab := max over x ≠ o of ‖x‖a/‖x‖b. For ‖x‖₁, ‖x‖₂ and ‖x‖∞ these ratios should already have been tabulated in Ex. 3 in the section of these notes titled "Duality, or Polarity with respect to the Norm"; and Ex. 4 there established that max over wᵀ ≠ oᵀ of ‖wᵀ‖a/‖wᵀ‖b = μba for dual norms. Of course μab·μba ≥ 1.

There are analogous maxima for ratios of matrix norms. Exercise 19: What are all six nontrivial maxima for ratios of the matrix norms ‖·‖F, ‖·‖Σ and ‖·‖M? (Yes, the matrices' dimensions will figure in some of those maxima.)

Bounds for ratios of vector norms turn out to provide bounds also for ratios of induced operator norms. Subscripts clutter the notation unavoidably. We write ‖L‖cd := max over x ≠ o of ‖L·x‖c/‖x‖d, and then

 μabcd := max over L ≠ O of ‖L‖ab/‖L‖cd = μac·μdb .

This can be found with the aid of Exs. 4 and 15. Proof: Let L ≠ O be a linear operator that maximizes ‖L‖ab/‖L‖cd, and then choose wᵀ ≠ oᵀ and x ≠ o to maximize wᵀ·L·x/(‖wᵀ‖a·‖x‖b) = ‖L‖ab, as Ex. 15 permits. It also provides ‖L‖cd = max over yᵀ ≠ oᵀ and z ≠ o of yᵀ·L·z/(‖yᵀ‖c·‖z‖d)
≥ wᵀ·L·x/(‖wᵀ‖c·‖x‖d), which implies that

 μabcd = ‖L‖ab/‖L‖cd ≤ ( wᵀ·L·x/(‖wᵀ‖a·‖x‖b) ) / ( wᵀ·L·x/(‖wᵀ‖c·‖x‖d) ) = (‖wᵀ‖c/‖wᵀ‖a)·(‖x‖d/‖x‖b) ≤ μac·μdb .

To promote this inequality μabcd ≤ μac·μdb up to equality, we construct a maximizing L out of a functional yᵀ ≠ oᵀ chosen to maximize ‖yᵀ‖b/‖yᵀ‖d = μdb (cf. Ex. 4), and a vector z ≠ o chosen to maximize ‖z‖a/‖z‖c = μac. Let L := z·yᵀ, to find from Ex. 15 again that μabcd ≥ ‖L‖ab/‖L‖cd = (‖z‖a·‖yᵀ‖b)/(‖z‖c·‖yᵀ‖d) = μac·μdb. End of proof.

Exercise 20: Use the six maxima μpq tabulated in Ex. 3 to tabulate all 72 nontrivial maxima μabcd of ratios of pairs of operator norms obtained when a, b, c and d range over the set {1, 2, ∞}. (This task is for a computer program that manipulates symbols like √n.)

Isometries

An Isometry Q is a linear map from a normed vector space to itself that preserves the norm: ‖Q·x‖ = ‖x‖ for all vectors x. The space's isometries form a Multiplicative Group because a product of isometries is an isometry. Operator norm ‖L‖ is unchanged if L is post-multiplied by an isometry for the domain of L and/or pre-multiplied by an isometry for its target-space.

For ‖·‖∞ and ‖·‖₁ the group is generated by all Permutations and all Sign Changers (diagonal matrices whose diagonal entries all have magnitude 1), so the group consists of all square matrices of the right dimension in whose every row and every column only one element is nonzero and its magnitude is 1. These spaces have 2ⁿ·n! real isometries of dimension n. Those and infinitely many more belong to the group of isometries Q for ‖·‖₂; these are the Orthogonal matrices (Qᵀ = Q⁻¹) for real spaces, the Unitary matrices (Q* = Q⁻¹) for complex spaces. Orthogonal matrices represent linear operators that rotate and/or reflect real Euclidean space. Proper Rotations are generated by either Q := exp(S) or Q := (I+S)·(I−S)⁻¹ as S = −Sᵀ runs through all real Skew-Symmetric matrices. Neither of these formulas for proper rotations Q, each of which must have det(Q) = +1, is fully satisfactory. The formula Q := exp(S) is
many-to-one: log(exp(S)) ≠ S if ‖S‖₂ > π. The Cayley Transform formula Q := (I+S)·(I−S)⁻¹ is one-to-one, because S = (Q−I)·(I+Q)⁻¹, but it cannot generate any proper rotation Q that has −1 as an eigenvalue (necessarily of even multiplicity) except by taking a limiting value as the elements of S approach infinities in suitably correlated ways.

A simple orthogonal reflection W = Wᵀ = W⁻¹ := I − w·wᵀ is determined by its mirror-plane, whose equation is wᵀ·x = 0 and whose normal w has been scaled to have length ‖w‖₂ = √2. You should confirm easily that W·w = −w but W·x = x if wᵀ·x = 0. Numerical analysts call them Householder Reflections because Alston S. Householder demonstrated their virtues for solving Least-Squares problems on computers in the mid-1950s; then they became staples for eigenvalue and singular-value computations too. Every n-by-n orthogonal matrix Q can be expressed as a product of at most n such reflections (an even number of them if Q is a proper rotation), but the reflections in the product need not be determined uniquely by Q.

Any linear map L from one Euclidean space to another can be reduced to its unique canonical form by isometries in its domain and target spaces. This canonical form of the matrix L is a similarly dimensioned (perhaps not square) diagonal matrix V of sorted nonnegative Singular Values satisfying L = Q·V·Pᵀ, the Singular Value Decomposition, in which Q and P are square orthogonal matrices not necessarily determined uniquely by L, though its singular values on the diagonal of V are determined uniquely if sorted in descending order. In a more compact SVD, diagonal V is square of dimension r := rank(L), so only the nonzero singular values of L = Q·V·Pᵀ appear, and QᵀQ = PᵀP = I, r-by-r. This compact SVD asserts algebraically a geometrical relationship called Autonne's Theorem: Every linear map L of rank r from one Euclidean space to another is a Dilatation, described as follows: L selects r vectors (columns of P) from an orthonormal basis for Domain(L) and associates them one-to-one with r vectors (columns of Q) constituting an
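A Householder reflection W = I − w·wᵀ with ‖w‖₂ = √2, as described above, can be built and its advertised properties verified in a few lines. A sketch (mine, not the notes'; Python with numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

# Scale a random direction so that ‖w‖₂ = √2, then form W = I − w·wᵀ.
u = rng.standard_normal(4)
w = np.sqrt(2.0) * u / np.linalg.norm(u)
W = np.eye(4) - np.outer(w, w)

assert np.allclose(W, W.T)             # W = Wᵀ : symmetric
assert np.allclose(W @ W, np.eye(4))   # W = W⁻¹, hence orthogonal
assert np.allclose(W @ w, -w)          # reverses its mirror's normal

x = rng.standard_normal(4)
x -= (w @ x) / (w @ w) * w             # project x onto the mirror-plane wᵀ·x = 0
assert np.allclose(W @ x, x)           # fixes the mirror-plane pointwise
```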
orthonormal basis for Range(L); then L projects its domain orthogonally onto the r-dimensional subspace spanned by the selected r vectors, stretches or squashes each of these coordinate directions by its corresponding singular value in V, and copies the result onto Range(L) after aligning the result's stretched-or-squashed coordinate directions along their associated coordinate directions in Range(L). If L maps a Euclidean space to itself, the last realignment amounts to a rotation.

Fritz John's Ellipsoid Theorem

His contribution to the 1948 Courant Anniversary Volume (Inter-Science (Wiley), New York) was a proof of a slightly more general statement than the following:

 Theorem: Any given centrally symmetric convex body 𝔹 in n-space can be circumscribed by an ellipsoid 𝔼 closely enough that √n·𝔹 ⊇ 𝔼 ⊇ 𝔹.

The constant √n cannot be reduced without falsifying the theorem when 𝔹 is a hypercube or more general parallelepiped. Compare this constant with the bigger constant n in Auerbach's theorem, where 𝔼 is drawn from parallelepipeds and 𝔹 can be the hyperoctahedron which is the unit ball for the norm ‖·‖₁ that figures in our Ex. 9.

Fritz John's theorem can be restated in terms of norms by interpreting 𝔹 as the unit ball of a given norm ‖·‖. The restatement is: Any norm ‖x‖ in n-space can be approximated by ‖E·x‖₂ := √( (E·x)ᵀ·(E·x) ) closely enough, if matrix E is chosen appropriately, that 1/√n ≤ ‖E·x‖₂/‖x‖ ≤ 1 for every vector x ≠ o. Fritz John's ellipsoid 𝔼 = { x : (E·x)ᵀ·(E·x) ≤ 1 } = E⁻¹·{ y : yᵀ·y ≤ 1 } is the unit ball for the vector norm ‖E·x‖₂, just as 𝔹 = { x : ‖x‖ ≤ 1 } is the unit ball for the given norm ‖x‖. His more general statement covered arbitrary convex bodies 𝔹, for which √n was increased to n. Restricting his theorem to centrally symmetric bodies simplifies its proof to fit with what has already been presented in class. As he did, we shall characterize 𝔼 as the ellipsoid of least Content (area, volume, …) circumscribing 𝔹. Because Content(𝔼) = Content(the Euclidean unit ball)/|det(E)|, we seek, as he did, a matrix E
that maximizes det(E) subject to the constraint ‖E·x‖₂/‖x‖ ≤ 1 for all x ≠ o. But our argument (first presented in lecture notes for Math 273 in 1974) will go more directly than his did.

First observe that two matrix norms, ‖Z‖ := max over x ≠ o of ‖Z·x‖₂/‖x‖ for maps into the Euclidean space, and ‖Z‖ := max over x ≠ o of ‖Z·x‖/‖x‖₂ for maps out of it, are induced by the two vector norms in question; as usual, context must determine which is meant. Now we seek a characterization of those matrices E that maximize det(E) over the ball ‖E‖ ≤ 1 in n-by-n matrix space, and hope to infer from that characterization that ‖E⁻¹‖ ≤ √n, which will imply √n·𝔹 ⊇ 𝔼 ⊇ 𝔹. At least one maximizing E must exist, because det(E) is a continuous function on a compact set, the unit ball ‖E‖ ≤ 1 in n-by-n matrix space. For such a maximizing E we find that ‖E⁻¹‖ = max ‖E⁻¹·v‖ = max wᵀ·E⁻¹·v over ‖wᵀ‖ = ‖v‖₂ = 1, and this maximum is achieved at some wᵀ and v determined here, as in Ex. 15, to satisfy ‖wᵀ‖ = ‖v‖₂ = 1 and ‖E⁻¹‖ = wᵀ·E⁻¹·v and wᵀ·E⁻¹ = ‖E⁻¹‖·vᵀ. The last equation is satisfied because, in order to achieve this maximum, wᵀ·E⁻¹ and v must be dual to each other with respect to the norm ‖·‖₂. Meanwhile, because ‖wᵀ‖ = 1, every vector y has ‖y‖ ≥ wᵀ·y = wᵀ·E⁻¹·E·y = ‖E⁻¹‖·vᵀ·E·y. This will be used twice below.

Now let Z := v·wᵀ/‖E⁻¹‖ − E/(n+β) for any tiny β > 0, and let f(μ) := log(det(E + μ·Z)) be examined at tiny values μ > 0. Jacobi's formula for the derivative of a determinant says that f′(μ) = df(μ)/dμ = Trace( (E + μ·Z)⁻¹·Z ), provided μ is tiny enough that (E + μ·Z)⁻¹ still exists. Therefore f′(0) = Trace(E⁻¹·Z) = wᵀ·E⁻¹·v/‖E⁻¹‖ − n/(n+β) = β/(n+β) > 0. Since E maximizes f(0) = log(det(E)) subject to the constraint ‖E‖ ≤ 1, that constraint is violated by E + μ·Z for every sufficiently tiny μ > 0; in other words, ‖E + μ·Z‖ > 1 for every sufficiently tiny μ > 0. For every such μ some maximizing vector y = y(μ) exists with ‖y‖ = 1 and ‖(E + μ·Z)·y‖₂² = ‖E + μ·Z‖²·‖y‖² > ‖y‖² = 1 ≥ ‖E·y‖₂². Rearranging this algebraically produces a strict inequality

 0 < ( ‖(E + μ·Z)·y‖₂² − ‖E·y‖₂² )/μ = 2·(E·y)ᵀ·Z·y + μ·‖Z·y‖₂²
  = 2·vᵀ·(E·y)·(wᵀ·y)/‖E⁻¹‖ − 2·‖E·y‖₂²/(n+β) + μ·‖Z·y‖₂²
  = 2·(wᵀ·y)²/‖E⁻¹‖² − 2·‖E·y‖₂²/(n+β) + μ·‖Z·y‖₂²
  ≤ 2/‖E⁻¹‖² − 2·‖E·y‖₂²/(n+β) + μ·‖Z·y‖₂² .

Combine this
with another inequality, ‖E·y‖₂ ≥ ‖(E + μ·Z)·y‖₂ − μ·‖Z·y‖₂ > ‖y‖ − μ·‖Z·y‖₂ = 1 − μ·‖Z·y‖₂, to infer that

 0 < 2/‖E⁻¹‖² − 2·(1 − μ·‖Z·y‖₂)²/(n+β) + μ·‖Z·y‖₂² → 2/‖E⁻¹‖² − 2/(n+β) as μ → 0.

Consequently ‖E⁻¹‖² ≤ n + β for every β > 0, which proves ‖E⁻¹‖ ≤ √n, as we had hoped.

Fritz John's Ellipsoid Theorem has far-reaching implications; here, briefly, are three of them.

Ellipsoidal Bounds for Errors in Computed Approximations

A norm chosen to gauge computational errors should ideally have this property: All errors of about the same norm are about equally inconsequential. Such a norm may be difficult if not impossible to find, and may be found only after at least one attempt at the computation has been tried. But the norms implicit in many approximate computational algorithms typically resemble the vector norms ‖·‖p discussed extensively in these notes; their common characteristic is that the norms of perturbations of a vector are little changed by permutations, which means that errors in one component of a vector will be deemed roughly as inconsequential as errors of the same size in any other component. This can be a serious mistake. For instance, a mathematical model of the processes that control the growth of an elephant from a fertilized ovum will involve amounts of materials ranging from micrograms of hormones to tons of flesh. Were all these amounts reckoned in grams and then arranged in a column vector representing the state of the organism's development, the vector's elements could range from 0.0000001 to 100000000. An error of the order of 1.0 committed during an equation-solving process could alter some elements imperceptibly and alter others overwhelmingly. To govern numerical errors better, and to bring variables closer to values humans can assess easily, we choose different units (micrograms, milligrams, grams, kilograms, tonnes) for different variables. And we may change units as the growth process passes through different phases, like implantation in
the endometrium, birth, and maturation. The choice of units is tantamount to premultiplying the state-vector by a diagonal matrix of scale factors before applying a familiar ‖·‖p-norm to its perturbations.

Sometimes diagonal premultiplication is not general enough. Sometimes later states of an evolving process respond to early perturbations far more severely in some directions than in others, and those directions need not be parallel to coordinate axes. In such cases the severity of an early error x should be gauged by a norm ‖x‖ whose unit ball is squashed in some of those directions, elongated in others. According to Fritz John's theorem, a premultiplying matrix E can be so chosen that ‖x‖ is approximated by ‖E·x‖₂ to within a factor no worse than (Dimension)^(1/4). This much uncertainty about an error estimate is often tolerable provided the dimension of x is not too big. The hard part is finding a satisfactory E.

Over the several decades since Fritz John's Theorem was published, ellipsoidal error-bounds like ‖E·x‖₂ have come to be appreciated for qualities not apparent from his theorem. See
 <http://www.cs.berkeley.edu/~wkahan/Math128/Ellipsoi.pdf> and <…/ODEintVl.pdf>;
 "Uncertain Dynamic Systems" by Fred C. Schweppe (1973, Prentice-Hall, NJ);
 "The wrapping effect, ellipsoid arithmetic, stability and confidence regions" by Arnold Neumaier, pp. 175-190 in Computing Supplementum 9 (1993).

The Banach Space Projections Constant

A Projection is a linear map P of a vector space into itself satisfying P² = P. To avoid trivialities we assume also that I ≠ P ≠ O. Then P cannot have an inverse; otherwise it would imply I = P⁻¹·P = P⁻¹·P² = I·P = P. So the range of P is the proper subspace onto which P projects the whole space; I−P is a projection onto a complementary subspace. (An example is the 2-by-2 idempotent P = [1 1; 0 0]; the specific instance lost to the scan has been replaced by this one.) Note that P need not be an Orthogonal Projection. Orthogonal projections are peculiar to Euclidean spaces, and are special there too: Projection P is orthogonal just when P² = P = Pᵀ, and then ‖P‖₂ = 1. This follows from observing that every eigenvalue of Pᵀ·P = P² = P is
either 0 or 1. Any other projection Q onto the same subspace as Range(P) must have ‖Q‖₂ > 1. This follows from the equations Q·P = P = Pᵀ and P·Q = Q, which say that P and Q are projections each onto the other's range, and from a change to new orthonormal coordinates that transform P into [I O; O O] but Q into [I R; O O] with R ≠ O (in block form), whence ‖Q‖₂ = √(1 + ‖R‖₂²) > 1.

A non-Euclidean normed space is called a Banach space after Stefan Banach, who studied them intensively in the 1920s and 1930s, until the Nazis overran Poland and killed as many of its intellectuals as they could before 1945; he outlasted the Nazis only to die later that year from lung cancer. He had greatly advanced the study of infinite-dimensional spaces; in these notes all spaces' dimensions are finite. A Banach space's norm violates the Parallelogram Law ‖x+y‖² + ‖x−y‖² = 2·‖x‖² + 2·‖y‖² (for all x and y) satisfied by Euclidean norms even if the coordinate system is not orthonormal. Consequently orthogonality is almost entirely absent from Banach spaces. (See Ex. 7 for a feeble exception.) See the class notes on "How to Recognize a Quadratic Form" <MathH110/QF.pdf> for a proof that explains why only Euclidean norms honor the Parallelogram Law.

Each Banach space has a multiplicative operator norm ‖·‖ induced by the vector norm, and when it is computed for a projection P = P² the norm must satisfy ‖P‖ = ‖P·P‖ ≤ ‖P‖², so ‖P‖ ≥ 1. How much bigger than 1 must ‖P‖ be? This question, posed by Banach, was first answered in 1972 by Yehoram Gordon, who established, with the aid of Fritz John's Ellipsoid Theorem, that any r-dimensional subspace in any Banach space is the range of at least one projection P of rank r and norm ‖P‖ ≤ √r, and that no constant smaller than √r can be valid for every r-dimensional subspace of every Banach space. Gordon's proof is too long to reproduce here.

The Smallest Generalized Inverse

Every (possibly rectangular) matrix F has at least one Generalized Inverse G satisfying the one equation F·G·F = F that every generalized inverse must satisfy: x = G·y is a
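The contrast drawn above between an orthogonal projection (‖P‖₂ = 1) and an oblique projection onto the same range (‖Q‖₂ > 1) is visible already in two dimensions. A sketch (mine, not the notes'; Python with numpy assumed):

```python
import numpy as np

# Orthogonal and oblique projections onto the same one-dimensional range:
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # P² = P = Pᵀ : orthogonal
Q = np.array([[1.0, 1.0],
              [0.0, 0.0]])   # Q² = Q ≠ Qᵀ : oblique, same range as P

assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)

norm_P = np.linalg.norm(P, 2)   # biggest singular value
norm_Q = np.linalg.norm(Q, 2)   # = √(1 + ‖R‖₂²) with R = [1] here, i.e. √2
```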
solution of the possibly over- or under-determined linear equation F·x = y if a solution x exists; and if not, G·y is an approximate solution in some sense. If F is rectangular or rank-deficient, it has infinitely many generalized inverses G, almost all of which have enormous magnitudes ‖G‖ gauged by any norm. This follows from the observation that G+Z is another generalized inverse whenever Z satisfies either F·Z = O or Z·F = O. An oversized ‖G‖ induces severe numerical misbehavior because it amplifies small errors in y when G·y is computed; none of its computed digits will contain useful information if ‖G‖ is too big. There are extreme cases when every ‖G‖ is too big: Every generalized inverse G of F must satisfy

 ‖G‖ ≥ 1/min{ ‖ΔF‖ : rank(F−ΔF) < rank(F) } ,

in which the two matrix norms need only be compatible with the vector norms in the domain and target spaces of F. The foregoing assertions are Lemma 1 and Theorem 5 in the class notes on "Huge Generalized Inverses of Rank-Deficient Matrices" <MathH110/GIlite.pdf>. Theorem 8, stated but not proved in those notes, asserts, for the operator norms induced by the vector norms, that at least one generalized inverse G also satisfies

 ‖G‖ ≤ √(rank(F))/min{ ‖ΔF‖ : rank(F−ΔF) < rank(F) } .

My proof uses Fritz John's Ellipsoid Theorem but is still too long to reproduce here.

The foregoing two bounds upon ‖G‖ have valuable practical implications when the data in F are uncertain enough that some nearby F−ΔF of lower rank differs from F by less than its uncertainty. Changing the data F to a nearly indistinguishable matrix F−ΔF of lowest rank may reduce the norm of its nearly minimal generalized inverse enough to forestall numerical obscurity. If this can be accomplished, we can accomplish it by means of a Singular Value Decomposition, after applying whatever coordinate changes in the domain and target spaces of F are necessary to make the spaces' norms approximately Euclidean. Provided dimensions are not too big, Fritz John's Ellipsoid Theorem says that these
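The advice above, to replace a nearly rank-deficient F by a nearby matrix of lower rank and thereby shrink the generalized inverse, is what truncating a Singular Value Decomposition does. A sketch (mine, not the notes') using numpy's pinv with its rcond cutoff, Python assumed:

```python
import numpy as np

rng = np.random.default_rng(4)

# Build a 3-by-2 matrix with singular values exactly 1 and 1e-8: nearly rank 1.
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
F = U[:, :2] @ np.diag([1.0, 1e-8]) @ V.T

G_full = np.linalg.pinv(F)                # keeps both singular values: ‖G‖₂ = 1e8
G_trunc = np.linalg.pinv(F, rcond=1e-4)   # drops σ₂ = 1e-8 as noise:   ‖G‖₂ = 1
```

G_trunc is the pseudoinverse of the nearby rank-1 matrix F−ΔF with ‖ΔF‖₂ = 1e-8, and its norm is smaller by a factor of about 1e8.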
necessary coordinate changes exist, without saying how to find them. Let's hope they amount only to diagonal scaling.

More applications of Fritz John's Ellipsoid Theorem, and another, longer proof for its centrally symmetric case, can be found in Keith Ball's lecture notes "An Elementary Introduction to Modern Convex Geometry", pp. 1-58 of Flavors of Geometry, MSRI Publications Volume 31, edited by Silvio Levy for Cambridge University Press, Cambridge, 1997. Ball's notes are also posted at <http://www.msri.org/publications/books/Book31/files/ball.pdf>. Don't read too much into the title's word "Elementary".

Math H110: Jacobi's Formula for d det(B), October 26 1998

Jacobi's Formula for the Derivative of a Determinant

Jacobi's formula is d det(B) = Trace(Adj(B)·dB), in which Adj(B) is the Adjugate of the square matrix B and dB is its differential. This formula will be derived and then applied to the role of the Wronskian in the solution of linear differential equations, the derivative of a simple eigenvalue, and inverses of nearly singular matrices.

Certain definitions and formulas will be taken for granted. Given an n-by-n matrix B = [βᵢⱼ], its Classical Adjoint or, better, Adjugate Adj(B) = A = [αᵢⱼ] is defined thus: αᵢⱼ := (−1)^(i+j)·det(B without its j-th row and i-th column), so that A·B = B·A = det(B)·I. In other words, αᵢⱼ is the cofactor of βⱼᵢ in det(B); so αᵢⱼ is a polynomial function of the elements of B but independent of βⱼₖ and βₖᵢ for all k, and Σₖ αᵢₖ·βₖⱼ = Σₖ βᵢₖ·αₖⱼ = det(B) if i = j, but 0 otherwise. Rank(B) is the biggest dimension of a submatrix of B whose determinant is nonzero; Rank(B) = n just when det(B) ≠ 0, in which case Adj(B) = det(B)·B⁻¹; and Adj(B) ≠ O just when Rank(B) ≥ n−1. If B is differentiable and Rank(B) = n, then d(B⁻¹) = −B⁻¹·dB·B⁻¹. Also needed is the function Trace(B) := Σⱼ βⱼⱼ and the fact that Trace(P·Q) = Trace(Q·P). For proofs of these formulas see any text on matrices or linear algebra.

Proof of Jacobi's Formula: In det(B) = Σₖ βᵢₖ·αₖᵢ, each element βᵢⱼ of B appears linearly, multiplied
by its cofactor αⱼᵢ; so ∂det(B)/∂βᵢⱼ = αⱼᵢ, and this leads quickly to Jacobi's Formula: d det(B) = Σᵢ Σⱼ (∂det(B)/∂βᵢⱼ)·dβᵢⱼ = Σᵢ Σⱼ αⱼᵢ·dβᵢⱼ = Trace(Adj(B)·dB).

The Wronskian: Consider square matrix solutions X(τ) of a linear differential equation dX/dτ = L(τ)·X with a piecewise continuous coefficient matrix L(τ). Because L(τ) is not assumed to commute with L(θ) when θ ≠ τ (i.e., L(τ)·L(θ) ≠ L(θ)·L(τ)), exp(∫₀^τ L(θ) dθ) need not be a solution X(τ) of the differential equation. None the less, a linear differential equation for the Wronskian det(X(τ)) can be found and solved, to prove assertions about Fundamental Solutions found in many texts about differential equations. Begin with the observation that solutions X(τ) satisfy

 d det(X)/dτ = Trace(Adj(X)·dX/dτ) = Trace(Adj(X)·L·X) = Trace(X·Adj(X)·L) = det(X)·Trace(L) ,

which can be solved for det(X(τ)) = exp( ∫₀^τ Trace(L(θ)) dθ )·det(X(0)). This implies that det(X(τ)) is nonzero for all τ if nonzero for any τ. Consequently X(τ) is invertible for all τ if invertible for any τ, in which case X(τ) is called a Fundamental Solution, since every vector solution x(τ) of dx/dτ = L·x has the form x = X·c for some constant vector c. To verify this claim, observe that X⁻¹·x must be constant because

 X·d(X⁻¹·x)/dτ = X·(d(X⁻¹)/dτ)·x + dx/dτ = −(dX/dτ)·X⁻¹·x + L·x = −L·X·X⁻¹·x + L·x = o .

Strictly speaking, det(X(τ)) is the Wronskian of the columns of X(τ).

Fundamental solutions X(τ) of dX/dτ = L·X, with their nonvanishing Wronskians, figure in the solution of the nonhomogeneous linear differential equation dx/dτ = L·x + f by the Method of Variation of Parameters. The idea is to substitute x = X·p into the last differential equation and solve it for the parameter vector p. This substitution yields dp(τ)/dτ = X(τ)⁻¹·f(τ), whence follows x(τ) = X(τ)·( c + ∫₀^τ X(θ)⁻¹·f(θ) dθ ) for any constant vector c.

Exercise: Given smooth scalar functions h(τ) and g(τ), the homogeneous second-order differential equation y″ − h·y′ − g·y = 0 can in principle be solved for scalar solutions y(τ). The Wronskian of any two solutions y₁ and y₂ is w(y₁(τ), y₂(τ)) := y₁(τ)·y₂′(τ) −
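Jacobi's formula d det(B) = Trace(Adj(B)·dB) is easy to test against a finite difference, computing Adj(B) as det(B)·B⁻¹ for invertible B. A sketch (mine, not the notes'; Python with numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

def adjugate(B):
    # Valid when B is invertible: Adj(B) = det(B)·B⁻¹.
    return np.linalg.det(B) * np.linalg.inv(B)

B = rng.standard_normal((4, 4))
dB = rng.standard_normal((4, 4))
eps = 1e-6

# Central difference of det along direction dB, versus Trace(Adj(B)·dB).
fd = (np.linalg.det(B + eps * dB) - np.linalg.det(B - eps * dB)) / (2 * eps)
jacobi = np.trace(adjugate(B) @ dB)
```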
The Wronskian of any two solutions y₁ and y₂ is W(y₁(τ), y₂(τ)) := y₁(τ)·y₂′(τ) − y₁′(τ)·y₂(τ). Show that W = exp(∫ h(τ) dτ)·constant. Provided it is nonzero, it appears as a divisor in expressions for solutions z(τ) of z″ − h·z′ − g·z = e(τ) obtained in textbooks by variation of parameters. Find those expressions, and rederive them after first converting the scalar second-order differential equations to 2-vector first-order differential equations.

Adjugates of Singular Matrices: Almost all n-by-n matrices are nonsingular because the singular matrices B, satisfying the equation det(B) = 0, lie in a hypersurface of dimension n² − 1 in the n²-dimensional space of n-by-n matrices. Among the singular matrices almost all have rank n−1; the matrices of rank less than n−1 lie in a hypersurface of dimension n² − 4 embedded in the hypersurface of singular matrices. To see why the foregoing assertions are true, consider a matrix C of rank n−2, and suppose for the sake of simplicity that its first n−2 columns are linearly independent. These columns can be embedded in a basis of n columns for the n-dimensional space of n-columns, and then the last two columns of C can be any two columns whose expressions in that basis have zeros for their last two elements. If any one of those four zero elements were replaced by a nonzero, then Rank(C) would increase to n−1. Thus the matrices C of rank n−2 or less must satisfy four equations in n² unknowns, whereas the matrices of rank n−1 or less need satisfy just one equation, det = 0. It turns out that the (n² − 1)-dimensional hypersurface of singular matrices intersects itself in the (n² − 4)-dimensional hypersurface of matrices of rank n−2 or less, but we shall not prove nor need this. Let Rank(B) = n−1, so A := Adj(B) ≠ O. But since B·A = O, all the columns of A must lie in the one-dimensional nullspace of B, and similarly for the rows of A; so A = v·uᵀ ≠ O where B·v = o and uᵀ·B = oᵀ. In other words, almost all singular matrices B have adjugates of rank one: Adj(B) = v·uᵀ ≠ O for suitable eigenvectors v and uᵀ belonging to B's eigenvalue 0. The exceptional n-by-n singular matrices B have rank(B) <
n−1 and Adj(B) = O.

The Derivative of a Simple Eigenvalue: Suppose β is a simple eigenvalue of a matrix B. Replacing B by B − βI allows us to assume that β = 0 for the sake of simplicity; in other words we assume that 0 is a simple eigenvalue of B. This means that det(αI − B) vanishes when α = 0; but, because 0 is a simple eigenvalue, the derivative d det(αI − B)/dα ≠ 0 when α = 0. By Jacobi's formula, d det(αI − B)/dα = Trace( Adj(αI − B)·I ), so we infer that Trace(Adj(B)) ≠ 0. This implies that Adj(B) = v·uᵀ is of rank one, not zero, with eigenvectors v and uᵀ belonging to B's eigenvalue 0, and uᵀ·v = Trace(Adj(B)) ≠ 0. In general, if uᵀ is a row eigenvector and v a column eigenvector belonging to the same simple eigenvalue β of a matrix, then uᵀ·v ≠ 0. This is important because it allows us to compute the differential of this simple eigenvalue using its eigenvectors, as follows: 0 = det(βI − B), so differentiate this equation to get 0 = d det(βI − B) = Trace( Adj(βI − B)·(dβ·I − dB) ) = Trace( v·uᵀ·(dβ·I − dB) ), where B·v = β·v and uᵀ·B = β·uᵀ and uᵀ·v ≠ 0; so 0 = uᵀ·v·dβ − uᵀ·dB·v. Therefore dβ = uᵀ·dB·v/(uᵀ·v) provides the derivative of a simple eigenvalue β of B.

The Derivative of the Adjugate: We need the differential of A = Adj(B). This is obtained easily when B is nonsingular, in which case A = B⁻¹·det(B), and since dB⁻¹ = −B⁻¹·dB·B⁻¹ it soon follows that d Adj(B) = S(B, dB), where for n-by-n matrices Z, S(B, Z) := ( Trace(Adj(B)·Z)·Adj(B) − Adj(B)·Z·Adj(B) )/det(B) provided det(B) ≠ 0. This formula is a little misleading because it involves division by det(B) and degenerates into 0/0 when B is singular. In fact Adj(B) is a polynomial function of the elements of B, so its differential d Adj(B) = S(B, dB) is also a polynomial function of the elements of B, and linear in dB, regardless of whether B is singular. This polynomial derivative of the adjugate figures in the determinant's second differential d²det(B) = d Trace( Adj(B)·dB ) = Trace( d(Adj(B))·dB + Adj(B)·d²B ) = Trace( S(B, dB)·dB + Adj(B)·d²B ), and therefore figures also in the third term of the Taylor Series for det(B + Z·τ) in powers of τ for any n-by-n Z.
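The eigenvalue-derivative formula dβ = uᵀ·dB·v/(uᵀ·v) derived above can be checked against a finite difference. The sketch below (mine; numpy is assumed, and the test matrix is an arbitrary triangular one with simple eigenvalues 2, 3, 5) obtains left eigenvectors of B as right eigenvectors of Bᵀ.

```python
import numpy as np

rng = np.random.default_rng(1)
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])          # simple eigenvalues 2, 3, 5
Z = rng.standard_normal((3, 3))          # an arbitrary perturbation direction dB = tau*Z

w, V = np.linalg.eig(B)                  # right (column) eigenvectors
wl, U = np.linalg.eig(B.T)               # right eigvecs of B-transpose = left eigvecs of B
v = V[:, np.argmin(abs(w - 2.0))].real   # eigenvectors for the simple eigenvalue beta = 2
u = U[:, np.argmin(abs(wl - 2.0))].real

predicted = (u @ Z @ v) / (u @ v)        # d(beta)/d(tau) = (u' Z v)/(u' v)

tau = 1e-7
w2, _ = np.linalg.eig(B + tau * Z)
beta2 = w2[np.argmin(abs(w2 - 2.0))].real
observed = (beta2 - 2.0) / tau           # finite-difference estimate of the same derivative
print(predicted, observed)
```

Note the formula is invariant under independent rescaling of u and v, so the arbitrary normalizations returned by `eig` do no harm; the denominator uᵀ·v is nonzero precisely because the eigenvalue is simple.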
det(B + Z·τ) = det(B) + Trace( Adj(B)·Z )·τ + Trace( S(B,Z)·Z )·τ²/2 + ··· .

Inverses of Nearly Singular Matrices: Our final application of Jacobi's formula is a description of the behavior of inverses of matrices close to almost any singular matrix B. In particular, assume Adj(B) = v·uᵀ ≠ O, as is so for almost all singular matrices B, and let Z be a matrix with B's dimensions. Now Trace( Adj(B)·Z ) = uᵀ·Z·v is nonzero for almost all Z, and for those Z, det(B + Z·τ) = ( uᵀ·Z·v + Trace( S(B,Z)·Z )·τ/2 + ··· )·τ is nonzero for all sufficiently small nonzero τ. For those τ and Z and B we find that (B + Z·τ)⁻¹ = Adj(B + Z·τ)/det(B + Z·τ) = ( v·uᵀ + S·τ + ··· )/( ( uᵀ·Z·v + Trace(S·Z)·τ/2 + ··· )·τ ) = v·uᵀ/(τ·uᵀ·Z·v) + ( uᵀ·Z·v·S − Trace(S·Z)·v·uᵀ/2 )/(uᵀ·Z·v)² + ··· , where S := S(B, Z). Thus, as τ approaches zero and B + Z·τ approaches almost any singular matrix B along almost any fixed direction Z in matrix space, (B + Z·τ)⁻¹ approaches infinity along a fixed direction parallel to v·uᵀ = Adj(B) in matrix space; in fact (B + Z·τ)⁻¹ is approximated by its leading term v·uᵀ/(τ·uᵀ·Z·v) with ever smaller relative error as τ approaches 0, and with absolute error (B + Z·τ)⁻¹ − v·uᵀ/(τ·uᵀ·Z·v) that approaches a constant ( S(B,Z)·uᵀ·Z·v − Trace( S(B,Z)·Z )·v·uᵀ/2 )/(uᵀ·Z·v)². This last expression looks worse than it is, but its reduction to a manageable form must be deferred to another time. The important fact is that inversion maps almost all matrices in a sufficiently tiny ball centered about almost any singular matrix B to two narrow fingers reaching in from infinity in directions nearly parallel to known matrices ±Adj(B) of rank 1. Why should we care? One reason is that attempts to invert matrices B numerically usually yield approximations to B⁻¹ that are instead very much like (B + Z·τ)⁻¹ for some tiny Z·τ comparable to roundoff in the elements of B. If B is very nearly or exactly singular, the computed inverse is recognizable as a matrix nearly proportional to Adj(B) of rank 1; the factor of proportionality is unpredictable except that it is huge, bigger than 1/(roundoff in B).
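The leading-term behavior of (B + Z·τ)⁻¹ near a rank-(n−1) singular B is easy to exhibit numerically. In this sketch (my own, in numpy) a singular B with known null vectors u and v is manufactured from orthogonal factors, so the rank-one prediction v·uᵀ/(τ·uᵀ·Z·v) can be compared directly with the computed inverse.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Build a rank-(n-1) singular B = P * diag(d) * Q with its last singular value zero
P = np.linalg.qr(rng.standard_normal((n, n)))[0]
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
d = np.array([2.0, 1.5, 1.0, 0.0])
B = P @ np.diag(d) @ Q
v = Q.T[:, n-1]                 # B v = o   (v spans B's nullspace)
u = P[:, n-1]                   # u'B = o'  (u spans the nullspace of B-transpose)
Z = rng.standard_normal((n, n)) # almost any perturbation direction

tau = 1e-8
approx = np.outer(v, u) / (tau * (u @ Z @ v))   # the rank-one leading term
exact = np.linalg.inv(B + tau * Z)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(rel_err)                  # small: the inverse is nearly parallel to v*u' = Adj(B)
```

As τ shrinks the relative error shrinks proportionally, while the computed inverse's entries blow up like 1/τ, just as the analysis above predicts.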
Another reason to care is that we often compute eigenvectors by solving a singular set of linear equations (βI − B)·v = o for a nonzero vector v, given a good approximation to the eigenvalue β. In effect we solve (βI − B)·v = e for v with some tiny uncontrolled e comparable to roundoff. The foregoing analysis tells us that the computed solution v is likely to point in the direction of the desired eigenvector unless B nearly has a nonsimple eigenvalue near β. "Likely", "usually", "most", "almost all", "nearly" are weasel words that offend many a pure mathematician who prefers "always", "every", "all" and takes "exactly" for granted. Applied mathematicians encounter situations often where the weasel words are the best that can be expected. In fact the weasel words above are provably the best that can be expected in their contexts, but that is a story for another day.

Math H110: Least-Squares — September 20, 1998

Least-Squares Approximation and Bilinear Forms

The Normal Equations: Suppose a column vector g is given in a Euclidean space into which a given matrix F maps another real space of column vectors u; in the Euclidean space, length ‖g‖ := √(gᵀ·g). Usually matrix F is rectangular with more rows than columns. Our task is to choose u to minimize ‖F·u − g‖, which will then be the distance from g to Range(F). This task is called "Least-Squares" because the u we seek would minimize the sum of squares of the elements of F·u − g. Differentiating the sum of squares produces d( (F·u − g)ᵀ·(F·u − g) ) = 2·(F·u − g)ᵀ·F·du, which vanishes for all infinitesimal perturbations du if and only if (F·u − g)ᵀ·F = oᵀ. (Do you see why?) Transposed, this becomes the Normal Equations of the Least-Squares Problem: Fᵀ·F·u = Fᵀ·g; u must satisfy them to minimize ‖F·u − g‖ and so best approximate g by a vector in Range(F). Do the Normal Equations have at least one solution u = û? If so, is ‖F·û − g‖ ≤ ‖F·u − g‖ for all u, i.e. does û minimize rather than maximize? These questions, among others, are addressed in this note. At first sight one might think the Normal Equations' solution should be û = (FᵀF)⁻¹·Fᵀ·g. But this formula fails if the columns of F are linearly dependent. To see why, observe that F·z ≠ o
⇔ (F·z)ᵀ·(F·z) = zᵀ·Fᵀ·F·z ≠ 0 ⇔ Fᵀ·F·z ≠ o; so FᵀF is invertible (nonsingular) if and only if the columns of F are linearly independent. This hasn't been assumed; in fact matrix F could be rectangular with more columns than rows. Exercise: Show that if the rows of F are linearly independent, a solution is û = Fᵀ·(F·Fᵀ)⁻¹·g, and if not the only solution of the Normal Equations, it is the only solution that minimizes ‖û‖ too. In short, if neither the rows nor the columns of F are linearly independent, neither FFᵀ nor FᵀF need be invertible, and then the existence of a minimizing solution u = û is in question.

Warning: Even when an indicated inverse exists, neither formula û = (FᵀF)⁻¹·Fᵀ·g nor û = Fᵀ·(FFᵀ)⁻¹·g should be used with numerical data unless the computer's arithmetic carries at least twice as many sig. digits as are trusted in the data F, g or desired in the result û. Otherwise roundoff will degrade the result û too badly whenever F is too near a matrix of lower rank. The reason behind this warning will become clear after Singular Values have been discussed. If the arithmetic carries barely more sig. digits than are trusted in the data or desired in the result, û should be computed by means of a QR factorization, which will also be discussed later. Matlab uses such a factorization to compute û, which Matlab calls F\g, whenever F is not square; Least-Squares is built into Matlab.

Existence and Uniqueness of a Minimizing Solution û: We shall use Fredholm's Alternatives (q.v.) to deduce that the Normal Equations always have at least one solution û, and to determine when it is unique. At least one solution exists if and only if wᵀ·Fᵀ·g = 0 whenever wᵀ·Fᵀ·F = oᵀ; so consider any row wᵀ that satisfies the last equation. It must satisfy also 0 = wᵀ·Fᵀ·F·w = (F·w)ᵀ·(F·w), which implies F·w = o, which implies wᵀ·Fᵀ·g = (F·w)ᵀ·g = 0; whereupon Fredholm's Alternative implies that the Normal Equations have at least one solution û. It is unique if and only if the columns of F are linearly independent; otherwise add any nonzero solution z of F·z = o to one û to get another.
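A tiny numerical illustration of the Normal Equations (mine, in numpy; the notes contain no code — and note the Warning above: forming FᵀF is for illustration only, since production code should use QR as Matlab's F\g does):

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.standard_normal((8, 3))   # tall matrix; columns almost surely independent
g = rng.standard_normal(8)

# Solve the Normal Equations F'F u = F'g  (illustration only -- see the Warning)
u = np.linalg.solve(F.T @ F, F.T @ g)

# The residual F u - g is orthogonal ("normal") to Range(F):
print(F.T @ (F @ u - g))          # all entries near zero
```

The same û comes out of a QR-based solver, and perturbing û in any direction can only lengthen the residual, confirming it is a minimum.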
Prof. W. Kahan

How do we know that setting u = û minimizes ‖F·u − g‖? For every u we find ‖F·u − g‖² − ‖F·û − g‖² = ‖(F·u − F·û) + (F·û − g)‖² − ‖F·û − g‖² = ‖F·(u − û)‖² + 2·(u − û)ᵀ·Fᵀ·(F·û − g) = ‖F·(u − û)‖² ≥ 0, since Fᵀ·(F·û − g) = o, with equality instead of inequality just when u is another solution of the Normal Equations.

When the Normal Equations have many solutions û, which does Matlab choose for F\g? It has a near-minimal number of nonzero elements. A different solution minimizes ‖û‖² = ûᵀ·û, as if also the space of vectors u were Euclidean. This doubly minimizing solution û satisfies both the Normal Equations Fᵀ·F·û = Fᵀ·g and an auxiliary equation û = Fᵀ·F·v for some vector v of Lagrange Multipliers. In consequence v satisfies (FᵀF)²·v = Fᵀ·g, an equation with at least one solution v whose existence is assured by an application of Fredholm's first alternative very much like before, except that the hypothesis wᵀ·Fᵀ·F = oᵀ is replaced by wᵀ·(FᵀF)² = oᵀ. (Can you carry out this inference?) Every other solution u of the Normal Equations satisfies ‖u‖² = ‖û + (u − û)‖² = ‖û‖² + ‖u − û‖² + 2·ûᵀ·(u − û) = ‖û‖² + ‖u − û‖² + 2·vᵀ·Fᵀ·F·(u − û) = ‖û‖² + ‖u − û‖² ≥ ‖û‖², so this û = Fᵀ·F·v really is doubly minimizing; moreover it is determined uniquely by the data F, g. (Can you see why?) As we shall see later, after Singular Values have been discussed, there is a matrix F† called the Moore-Penrose Pseudo-Inverse of F such that the doubly minimizing û = F†·g is a linear function of g. Matlab's name for F† is pinv(F). However, whenever neither FᵀF nor FFᵀ is invertible, so F† is interesting, it turns out to be a violently discontinuous function of F. This renders the doubly minimizing û doubly dubious, because the space of vectors u need not be Euclidean. Matlab's F\g can be discontinuous too, even when FFᵀ is invertible and the doubly minimizing û is continuous.
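The doubly minimizing solution can be demonstrated with a deliberately rank-deficient F. In this sketch (mine; numpy assumed) the third column of F is the sum of the first two, so the Normal Equations have a whole line of solutions û + t·z with F·z = o, of which pinv picks the shortest.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2))
# Rank-deficient F: third column = first + second, so F z = o for z = (1, 1, -1)'
F = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
g = rng.standard_normal(6)

u0 = np.linalg.pinv(F) @ g    # the doubly minimizing solution, u0 = pinv(F) g
print(u0, np.linalg.norm(u0))
```

u0 solves the Normal Equations, is orthogonal to the nullspace of F, and is strictly shorter than every other Normal-Equations solution u0 + t·z.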
Linear Regression: Least-Squares approximation has been applied to statistical estimation for over two centuries. An m-by-n matrix F is assumed given with linearly independent columns, so m ≥ n, and a given m-vector g = y + q of data is thought to include a systematic contribution y and a random error q. The question is: how near is y to Range(F)? The answer is obscured by the random error. The elements of this error q are assumed independently distributed with mean 0 and known variance σ². These terms are given meaning by an Averaging or Expectation operator Æ which acts upon every random variable r linearly to produce Æ(r), the average or mean of the population of values of r. Thus Æ(q) = o because every element of q has mean 0, and q has covariance matrix Æ( (q − Æq)·(q − Æq)ᵀ ) = σ²·I, since the square of every element of q has mean σ² but every product of different elements of q has mean 0 because they are independent. The smaller is σ, the less uncertainty does random error q introduce into the data g. Define x := (FᵀF)⁻¹·Fᵀ·y to minimize ‖F·x − y‖, although neither y nor x can be known. As the known g approximates y, so is x approximated by whatever u minimizes ‖F·u − g‖. Get u = (FᵀF)⁻¹·Fᵀ·g; how well can it approximate x? Since Æ(g) = y we find that Æ(u) = x, so u is an unbiased estimate of x. The covariance matrix of u is computable too; it is Æ( (u − x)·(u − x)ᵀ ) = Æ( (FᵀF)⁻¹·Fᵀ·q·qᵀ·F·(FᵀF)⁻¹ ) = (FᵀF)⁻¹·Fᵀ·Æ(q·qᵀ)·F·(FᵀF)⁻¹ = σ²·(FᵀF)⁻¹. The smaller this is, the better does u approximate x on average. The smaller is ‖F·x − y‖, the smaller do we expect ‖F·u − g‖ to be. How small should we expect it to be? A calculation below shows that Æ‖F·u − g‖² = ‖F·x − y‖² + (m−n)·σ². It means that ‖F·u − g‖ is unlikely to exceed σ·√(m−n) much if y lies in or very near Range(F); conversely ‖F·x − y‖ is unlikely to be much smaller than ‖F·u − g‖ if this is many times bigger than σ·√(m−n). Explanation follows.

Proof that Æ‖F·u − g‖² = ‖F·x − y‖² + (m−n)·σ²: The Trace of a square matrix is defined to be the sum of its diagonal elements; evaluate this sum to confirm that Trace(Bᵀ·C) = Trace(C·Bᵀ) for any matrices Bᵀ and C whose products Bᵀ·C and C·Bᵀ are both square, though perhaps of different dimensions. Next define H := F·(FᵀF)⁻¹·Fᵀ
and confirm that Hᵀ = H = H²; H is the orthogonal projector onto Range(F), because p = F·z for some z ⇒ p = H·p, so Range(F) = Range(H), and H·z = o ⇒ zᵀ·H = oᵀ, so Nullspace(H) = Range(H)⊥. Shortly we shall have use for Trace(H) = Trace( (FᵀF)⁻¹·Fᵀ·F ) = Trace(Iₙ) = n. Now we observe that u and x are so defined that F·u − g = (H − I)·g and F·x − y = (H − I)·y, wherein I is the m-by-m identity matrix. Consequently
Æ‖F·u − g‖² = Æ( ((H−I)·g)ᵀ·(H−I)·g ) = Æ Trace( (H−I)·g·gᵀ·(H−I) ) because Trace(Bᵀ·C) = Trace(C·Bᵀ)
= Trace( (H−I)·Æ(g·gᵀ)·(H−I) ) because H−I = (H−I)ᵀ isn't random
= Trace( (H−I)·Æ( y·yᵀ + y·qᵀ + q·yᵀ + q·qᵀ )·(H−I) ) because g = y + q
= Trace( (H−I)·( y·yᵀ + O + O + σ²·I )·(H−I) )
= Trace( (H−I)·y·yᵀ·(H−I) ) + σ²·Trace( (H−I)² )
= Trace( (F·x − y)·(F·x − y)ᵀ ) + σ²·Trace( I − H )
= ‖F·x − y‖² + σ²·(m−n), as was claimed.

Proof that ‖F·u − g‖² is unlikely to be many times bigger than its mean Æ‖F·u − g‖²: More precisely, we shall deduce that ‖F·u − g‖² exceeds λ·Æ‖F·u − g‖² with probability less than 1/λ for every λ > 1. This deduction is an instance of Tchebyshev's Inequality: If a positive random variable p has mean μ := Æ(p), then the probability that p ≥ λ·μ cannot exceed 1/λ for any λ > 1. Here is a proof of Tchebyshev's Inequality. Let P(ξ) be the probability that p ≤ ξ. This P(ξ) is a nondecreasing function increasing from P(0) = 0 to P(∞) = 1, and μ = ∫₀^∞ ξ dP(ξ) by virtue of the definition of Æ. We seek an overestimate for ∫ from λμ to ∞ of dP(ξ), which is the probability that p ≥ λ·μ. We find that ∫ from λμ to ∞ of dP(ξ) ≤ ∫ from λμ to ∞ of (ξ/(λμ)) dP(ξ) ≤ ( ∫₀^∞ ξ dP(ξ) )/(λμ) = μ/(λμ) = 1/λ, which yields the result claimed. This can be a gross overestimate because it uses almost no information about P; for almost all values of λ > 1, and for all values of λ > 1 for almost all probability functions P, the probability that p ≥ λ·μ is actually far tinier than 1/λ. Thus the computed ‖F·u − g‖² is unlikely to be many times bigger than ‖F·x − y‖² + σ²·(m−n), in which σ²·(m−n) is given and ‖F·x − y‖² is unknown, whence something probabilistic can be inferred about the unknown. Another similar application of Least-Squares is to the assumption that y = F·x and g = y + q for a random error q about which σ² is unknown but estimated from ‖F·u − g‖²/(m−n). These applications are treated in Statistics courses.

Abstract
Least-Squares: Suppose a column vector g is given in a Euclidean space into which a given linear operator F maps a real space of abstract vectors u. In the Euclidean space length ‖g‖ := √(gᵀ·g), but no such length is defined yet for Domain(F). Again our task is to choose u to minimize ‖F·u − g‖, which will then be the distance from g to Range(F). Differentiating the sum of squares ‖F·u − g‖² = (F·u − g)ᵀ·(F·u − g) produces d( (F·u − g)ᵀ·(F·u − g) ) = 2·(F·u − g)ᵀ·F·du, which vanishes for all infinitesimal perturbations du if and only if (F·u − g)ᵀ·F = 0ᵀ, the linear functional that annihilates Domain(F). The last equation says that, when ‖F·u − g‖ is minimized, the residual F·u − g must be normal (perpendicular, orthogonal) to Range(F). This explains the word "Normal" in "Normal Equations" and removes any suggestion that other equations are abnormal. Drawing a picture helps: imagine Range(F) to be a plane in Euclidean 3-space containing a vector F·u which, when it comes closest to a given vector g not in the plane, comes to that point in the plane reached by dropping a perpendicular from g. We could transpose (F·u − g)ᵀ·F = 0ᵀ to FᵀF·u = Fᵀ·g if we knew what FᵀF meant.

The trouble with the expression FᵀF is that it is not what it first seems: if F were a matrix, then FᵀF would map Domain(F) to itself, but a change of basis in Domain(F) does not change FᵀF to the expected similar matrix. Here is what happens instead. Let B be a basis for Domain(F); in what follows the same letter names an abstract vector and its coordinate column, with context to distinguish them. Then abstract vector u = B·u for some column vector u, and F·u = F·B·u = F·u for a matrix F := F·B. The Normal Equations (F·u − g)ᵀ·F = 0ᵀ turn into (F·u − g)ᵀ·F = oᵀ, which becomes FᵀF·u = Fᵀ·g after matrix transposition. Now B·C is a new basis for Domain(F), and u = B·C·û for û = C⁻¹·u, and F·u = F̂·û for matrix F̂ := F·C, where C is any invertible matrix of the same dimension as Domain(F). What was FᵀF·u = Fᵀ·g in the old basis becomes F̂ᵀF̂·û = F̂ᵀ·g in the new, replacing matrix FᵀF by F̂ᵀF̂ = Cᵀ·FᵀF·C. This differs from C⁻¹·FᵀF·C, which is how the change in basis would have changed FᵀF if it were the matrix of a map from Domain(F) to
itself. Instead FᵀF is the matrix of a map from Domain(F) to its own dual space. If you doubt that these choices of basis matter, try the following example. Let g = 10101, a scalar, and suppose F = (1 10 100) in some coordinate system. Then get Matlab to compute u = F\g to solve the least-squares problem. Next change to a new basis using a diagonal matrix C = diag(10, 1, 1/16). It changes F to F̂ = F·C, and thus changes the solution of the least-squares problem to û = F̂\g. This maps back to C·û = C·((F·C)\g) in the old basis. Compare with the old solution u. Try again with 6-vectors g and 6-by-3 matrices F at random.

Bilinear Forms: There is no uniquely defined operator FᵀF, just as there is no functional uᵀ determined uniquely by vector u in a non-Euclidean space. The matrices that appear in the Normal Equations are not all matrices that represent linear maps from one space of column vectors to another or itself; matrix FᵀF belongs to a Symmetric Bilinear Form that maps column vectors to row vectors. Consider (F·u)ᵀ·(F·v). It maps pairs u, v of vectors from Domain(F) to real scalars, and does so as a linear function of each vector separately; this is the definition of a Bilinear Form. And since (F·u)ᵀ·(F·v) is unaltered when u and v are swapped, it is a Symmetric Bilinear Form. There are many notations for bilinear forms: H(u,v), ⟨H·u, v⟩, v·H·u, … . They all mean this: H(u,·) is a linear functional in the space dual to vectors v, and H(u,v) is its scalar value; H(·,v) is a linear functional in the space dual to vectors u, and H(u,v) is its scalar value. Given a basis B for vectors u = B·u and a basis E for vectors v = E·v, there is a matrix H for which H(u,v) = H(B·u, E·v) = (H·u)ᵀ·v = vᵀ·H·u. Changing bases from B to B·C and E to E·D changes u to û = C⁻¹·u, v to v̂ = D⁻¹·v, and H to Ĥ = Dᵀ·H·C, so that H(u,v) = vᵀ·H·u = v̂ᵀ·Ĥ·û. Exercise: Express the elements of matrix H in terms of the effect H has upon the elements of bases B and E. A Symmetric bilinear form maps vectors u and v from the same space to scalars, and does so in a way independent of the order of u and v; thus H(u,v) = H(v,u).
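The basis-change example above can be reproduced in numpy (a sketch of mine; the diagonal C is interpreted from the garbled original as diag(10, 1, 1/16), though any invertible diagonal shows the same effect). Here `lstsq` returns the minimum-norm solution of the underdetermined system, and "minimum norm" depends on the coordinates in which it is measured:

```python
import numpy as np

F = np.array([[1.0, 10.0, 100.0]])
g = np.array([10101.0])
C = np.diag([10.0, 1.0, 1.0 / 16.0])   # the change of basis (values as read from the example)

u = np.linalg.lstsq(F, g, rcond=None)[0]       # minimum-norm solution in the old basis
uh = np.linalg.lstsq(F @ C, g, rcond=None)[0]  # same problem posed in the new basis
u_back = C @ uh                                # mapped back to the old basis

print(u)        # the old solution
print(u_back)   # a different vector, though it also solves F u = g exactly
```

Both vectors satisfy F·u = g exactly, yet they differ: without a Euclidean structure on Domain(F) there is no basis-independent notion of "shortest" solution, which is the point of the example.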
A symmetric bilinear form has a symmetric matrix H = Hᵀ in any basis. (Why?) Changing the basis changes H to matrix Ĥ = Cᵀ·H·C for some invertible C; the two matrices H and Ĥ are called Congruent. This congruence is an Equivalence, so it preserves rank, i.e. rank(Ĥ) = rank(H). Congruence also preserves a thing called Signature, as we'll see when we come to Sylvester's Inertia Theorem.

Math H110: Huge Generalized Inverses of Rank-Deficient Matrices — Prof. W. Kahan

Normally we think of the solution x = F⁻¹·y of a linear system F·x = y as a continuous function of the given data F and y. When no F⁻¹ exists, any of a host of algebraic methods can be used to solve the system for x at least approximately, although every solution's usefulness may be undermined by violent variability. This note assesses that violence quantitatively by relating it to the data's nearness to perturbed data of lower rank. Real number data is assumed. Let F be a possibly rectangular matrix whose rank may be less than both of its dimensions, for all we know. A Generalized Inverse G of F can be defined in many ways, all characterized thus: x = G·y is a solution of the equation F·x = y whenever this equation has a solution, even if the solution x is not determined uniquely by the equation. Consequently:

Lemma 0: The generalized inverses G of F are the solutions G of the equation F·G·F = F.
Proof: y = F·x runs through F's range as x runs through its domain; then x = G·y must be a solution of the equation F·x = y, whence F·x = F·G·y = F·G·F·x follows for all x, so F·G·F = F.

Lemma 1: Every matrix F, including F = O, has at least one generalized inverse G.
This will be proved later. But first let us consider a widely used instance, the Moore-Penrose Pseudo-Inverse F†. It is the linear operator that solves a Least-Squares problem: given F, and y in F's target space but perhaps not in F's range, find the vector x in F's domain that minimizes ‖F·x − y‖ and, if the minimizing x is not yet unique, also minimizes ‖x‖. Here ‖v‖ := √(vᵀ·v) is the Euclidean length of real vector v.

Lemma 2: The Least-Squares problem's solution x = F†·y is obtainable from
the Singular Value Decomposition F = Q·V·Pᵀ, in which Qᵀ·Q = I and Pᵀ·P = I and V is a nonnegative diagonal matrix, perhaps rectangular, exhibiting the singular values of F, by computing F† = P·V†·Qᵀ, where V† is obtained from the diagonal V by transposing it and then replacing therein every nonzero diagonal element by its reciprocal. For example, if V has rows (3 0), (0 0), (0 0), then V† has rows (1/3 0 0), (0 0 0). Lemma 2 will be proved later. From its formula for pseudo-inverses will follow:

Lemma 3: F·F†·F = F, F†·F·F† = F†, (F†·F)ᵀ = F†·F and (F·F†)ᵀ = F·F†; and these four equations characterize F† because it is the unique matrix satisfying all of them.

Lemma 3's first equation says that F† is a generalized inverse of F, and the second says that F is a generalized inverse of F†; in fact F†† = F. In general, generalized inverses are not reciprocal in this way; later we shall see how to construct a generalized inverse G of F that violates any chosen subset of the latter three equations of Lemma 3, unless F = O or F has full rank equal to F's least dimension. Unless F⁻¹ exists, in which case G = F⁻¹, there are infinitely many generalized inverses G of F, almost all of them with gargantuan elements, since G + Z is also a generalized inverse for every Z satisfying F·Z = O or Z·F = O. February 4, 2008

Now a gargantuan generalized inverse G is hazardous to approximate because subsequent computations of G·y may encounter massive cancellation, leaving little more than error as a residue. And even when computation is performed exactly, or without roundoff, G·y can be changed utterly by tiny perturbations of y. Consequently ample incentive exists to choose generalized inverses of near-minimal magnitudes; an instance is the Moore-Penrose Pseudo-Inverse, minimal because it solves the Least-Squares problem of Lemma 2. In fact:

Lemma 4: Among generalized inverses G of F with minimal ‖G‖ := max over y ≠ o of ‖G·y‖/‖y‖ (the Operator norm based upon Euclidean vector norms ‖G·y‖ and ‖y‖), one turns out to be F†, and only it minimizes the Root-Sum-Squares norm ⟪G⟫ := √Trace(Gᵀ·G) = √(Σi Σj Gij²). The operator norm ‖G‖ here turns out to be the biggest singular value of G, and ‖F†‖ turns out to be the reciprocal of F's smallest nonzero singular value; ⟪F†⟫ turns out to be the Root-Sum-Squares of the reciprocals of all F's nonzero singular values. All this will become clear later.

Choosing a generalized inverse of near-minimal magnitude cannot always avoid troubles with gargantuan magnitudes; they may be unavoidable if F is too near another matrix F−∆F of lower rank, because then every generalized inverse G of F must be huge. A quantitative statement of this cause-and-effect relationship involves matrix norms, thus:

Theorem 5: Every generalized inverse G of F has ‖G‖ ≥ 1/min ‖∆F‖ such that rank(F−∆F) < rank(F).

This will be proved later for any matrix norms satisfying Multiplicative Dominance relations ‖G·y‖ ≤ ‖G‖·‖y‖ and ‖F·x‖ ≤ ‖F‖·‖x‖ required for Compatibility with vector norms; all plausible matrix norms are compatible that way. Theorem 5 says that the existence of at least one moderately sized generalized inverse of F implies that F is relatively well separated from all matrices F−∆F of lower rank, when separation is gauged by the chosen matrix norm. Theorem 5 says nothing if the norms have been chosen to make F and G look good. For example, choose new coordinate bases in the domain and target spaces of F to represent it by a new matrix F̂ consisting of an identity matrix bordered perhaps by zero matrices; the dimension of the identity matrix must equal the rank of F. Then Ĝ = F̂ᵀ is the representative in the new coordinates of a generalized inverse G of F in the old coordinates, and choosing any of the customary operator norms in the new coordinates will induce corresponding operator norms in the original coordinates such that ‖F‖ = 1 and ‖G‖ = 1. This is what is meant by norms "chosen to make F and G look good". Such a choice is not always praiseworthy.
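Lemma 4's norm claims are concrete enough to verify on a small example. This sketch (mine, in numpy) builds a 5-by-3 matrix of rank 2 from chosen singular values 3 and 0.5, so both norms of F† are known in advance.

```python
import numpy as np

rng = np.random.default_rng(5)
# A 5x3 matrix of rank 2 with known singular values 3 and 0.5
U = np.linalg.qr(rng.standard_normal((5, 5)))[0]
W = np.linalg.qr(rng.standard_normal((3, 3)))[0]
S = np.zeros((5, 3))
S[0, 0], S[1, 1] = 3.0, 0.5
F = U @ S @ W.T

Fp = np.linalg.pinv(F)
# F-dagger is a generalized inverse: F G F = F
print(np.allclose(F @ Fp @ F, F))
# operator (biggest-singular-value) norm of F-dagger = 1/(smallest nonzero sigma of F)
print(np.linalg.norm(Fp, 2))            # should be 1/0.5 = 2
# Root-Sum-Squares norm of F-dagger = sqrt of sum of reciprocal squares of nonzero sigmas
print(np.linalg.norm(Fp, 'fro'))        # should be hypot(1/3, 1/0.5)
```

By Corollary 6 below, the value 0.5 here is also the distance from F to the nearest matrix of lower rank, so 1/0.5 is the smallest any generalized inverse of F can be in the operator norm.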
Ideally the chosen norm should serve the intended application; for instance, ideally, all vector perturbations of roughly the same length, gauged by the chosen norm, should be roughly equally inconsequential or unlikely. Worthwhile ideals are not easy to achieve; achieving this one, when possible, is a long story for another day. For now let us assume the norm is fixed in advance. For any given F we seek as small a generalized inverse G as can be computed at a tolerable cost. When the norm is the biggest-singular-value norm used in Lemma 4, then a minimal generalized inverse G barely big enough to comply with Theorem 5 can be identified easily:

Corollary 6: The Moore-Penrose Pseudo-Inverse F† is a generalized inverse of F satisfying ‖F†‖ = 1/min ‖∆F‖ such that rank(F−∆F) < rank(F); here ‖F†‖ is the biggest singular value of F†, which is the reciprocal of F's smallest nonzero singular value, which in turn is the distance from F to the nearest F−∆F of lower rank.

Exercise: Show that F† = lim as β → 0 of (Fᵀ·F + β²·I)⁻¹·Fᵀ = lim as β → 0 of Fᵀ·(F·Fᵀ + β²·I)⁻¹.

For other operator norms, derived from non-Euclidean vector norms, a near-minimal generalized inverse of F ≠ O can be difficult to construct unless F⁻¹ exists, in which case:

Theorem 7: ‖F⁻¹‖ = 1/min ‖∆F‖ such that det(F−∆F) = 0, for all operator norms ‖·‖.

Far harder than the proofs of the theorems, corollary and lemmas above is the proof of:

Theorem 8: For every operator norm, at least one generalized inverse G of F, besides satisfying Lemma 0 and Theorem 5, also satisfies ‖G‖ ≤ √rank(F)/min ‖∆F‖ such that rank(F−∆F) < rank(F).

Theorem 8 shows that Theorem 5 is not excessively optimistic. I found a proof in 1973 but have not published it yet because I still hope some day to discover a much shorter proof. Here are proofs for the other assertions above.

First is Lemma 1's assertion that every F has at least one generalized inverse G. One proof deduces that the equation F·G·F = F in Lemma 0 has a solution G by applying one of Fredholm's criteria. In general the linear equation A·g = f must have at least one solution
g if and only if cᵀ·f = 0 whenever cᵀ·A = oᵀ. In our case the role of the linear operator A is played by the map G ↦ F·G·F, and cᵀ·f matches Trace(Cᵀ·F). In short, to apply Fredholm's criterion we must decide whether Trace(Cᵀ·F) = 0 for every matrix C satisfying Trace(Cᵀ·F·Z·F) = 0 for all Z. Since the trace of a product is unchanged by cyclic permutation of its factors, the last equation implies that Trace(F·Cᵀ·F·Z) = 0 for all Z, whence F·Cᵀ·F = O, whence follows (Cᵀ·F)² = O, which implies that Cᵀ·F has no nonzero eigenvalue, whence follows that Trace(Cᵀ·F) = 0, as required to establish that a solution G of F·G·F = F exists.

Another proof of G's existence follows changes of coordinate bases in the domain and target spaces of F to exhibit its linear operator's canonical form under Equivalence: F̂ = [ I O ; O O ], wherein the dimension of the identity matrix I matches Rank(F) and the zero matrices O fill out the rest of the rows and columns; the rest of the rows, or the rest of the columns, or both, can be absent if F has full rank equal to one of its dimensions. For these bases every Ĝ = [ I R ; S Z ], with arbitrary matrices R, S and Z so dimensioned (if not absent) that the matrix products F̂·Ĝ and Ĝ·F̂ both exist, is a generalized inverse. Then F̂·Ĝ·F̂ = F̂, and this equation persists in the form F·G·F = F after restoration of the original coordinate bases.

Exercise: Continuing from the last paragraph, prove that every generalized inverse G of F has Rank(G) ≥ Rank(F), with equality if and only if F is a generalized inverse of G too.

To prove Lemma 2 we use analogous coordinate changes, but this time to new orthonormal bases obtained from the orthogonal matrices P and Q of a singular value decomposition F = Q·Γ·Pᵀ.
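The canonical-form construction above can be played out directly in numpy (my sketch; block sizes are arbitrary). Whatever R, S, Z are chosen, the block matrix Ĝ satisfies F̂·Ĝ·F̂ = F̂, and the property survives an arbitrary change of bases:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 4, 3, 2
# canonical form under equivalence: Fhat = [ I O ; O O ] with rank r
Fhat = np.zeros((m, n))
Fhat[:r, :r] = np.eye(r)

# every Ghat = [ I R ; S Z ] (n-by-m) is a generalized inverse, whatever R, S, Z are
R = rng.standard_normal((r, m - r))
S = rng.standard_normal((n - r, r))
Z = rng.standard_normal((n - r, m - r))
Ghat = np.block([[np.eye(r), R], [S, Z]])
print(np.allclose(Fhat @ Ghat @ Fhat, Fhat))     # the defining equation F G F = F

# restoring arbitrary (invertible) bases: F = A Fhat B^-1 has generalized inverse G = B Ghat A^-1
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
F = A @ Fhat @ np.linalg.inv(B)
G = B @ Ghat @ np.linalg.inv(A)
print(np.allclose(F @ G @ F, F))
```

With generic R, S, Z the resulting G has rank 3 > rank(F) = 2, illustrating the Exercise above: Rank(G) ≥ Rank(F), with equality only for the reciprocal generalized inverses.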
In this decomposition the diagonal matrix Γ has the same dimensions as matrix F, whose singular values appear on Γ's diagonal; the square diagonal matrix V exhibits all the nonzero singular values. We might as well assume F = Γ at the outset, since these coordinate changes preserve Euclidean length in both the domain and target spaces of F, leaving the Least-Squares problem unchanged. Now, with F = [ V O ; O O ] diagonal, the Least-Squares problem's solution is almost obviously x = F†·y for F† = [ V⁻¹ O ; O O ] with F's dimensions transposed, so that F†·F = [ I O ; O O ] is a square diagonal matrix, as is F·F†. Also this diagonal F† satisfies all four of Lemma 3's equations, which persist after restoration of the original coordinate bases. These equations determine F† uniquely because they do so in the coordinate systems that diagonalize F, as is easily verified: in these coordinate systems any G = [ V⁻¹ R ; S Z ] is a generalized inverse, because it satisfies the first equation F·G·F = F for any R, S and Z. Only Z = S·V·R can satisfy the second equation G·F·G = G too; only S = O can satisfy the first and third, (G·F)ᵀ = G·F; only R = O can satisfy the first and fourth, (F·G)ᵀ = F·G. Consequently only F† satisfies all four equations. This completes the proof of Lemma 3 and the second sentence after it.

When any generalized inverse G of F is represented as above in orthonormal coordinate bases that diagonalize F, the Root-Sum-Squares norm satisfies ⟪G⟫² = ⟪V⁻¹⟫² + ⟪R⟫² + ⟪S⟫² + ⟪Z⟫², which exceeds ⟪F†⟫² = ⟪V⁻¹⟫² unless G = F†. This confirms the second assertion in Lemma 4. The first assertion's proof begins with the observation that clearing S and Z to zeros cannot increase ‖G‖ := max over y ≠ o of ‖G·y‖/‖y‖, nor can it be increased after that by clearing R; therefore every generalized inverse G has its ‖G‖ ≥ ‖V⁻¹‖ = ‖F†‖ = 1/(F's least nonzero singular value). This completes the proof of Lemma 4 and the paragraph after it.

Exercise: Fill in all unobvious details of the proofs above.
Exercise: Suppose that F = L·C·Rᵀ, in which L, C and R all have the same rank and C is square and invertible. Show that L† = (Lᵀ·L)⁻¹·Lᵀ and R† = (Rᵀ·R)⁻¹·Rᵀ, and that F† = R†ᵀ·C⁻¹·L†, so it is computable by rational arithmetic alone.

Current versions of Matlab compute an approximation pinv(F) to F† from a singular value decomposition of F after all singular values tinier than a roundoff-related threshold have been reset to zeros. Otherwise reciprocals of these tiny singular values would introduce
gargantuan numbers that would swamp everything else in F† and render it useless numerically. A Matlab user can substitute his own threshold θ for Matlab's by invoking pinv(F, θ). This is tantamount to perturbing F, changing it to F−∆F with ‖∆F‖ < θ, to get ‖(F−∆F)†‖ ≤ 1/θ, provided θ > 0 of course. If F has no singular values below that threshold, ∆F = O; but otherwise Rank(F−∆F) < Rank(F) and ‖F†‖ > 1/θ. How should the threshold θ be chosen? If insights into data revealed by computed results are not to be confounded by an accident of the computational algorithm, then these results must be insensitive to ostensibly small changes in the threshold. This will be the case only if θ falls into a relatively wide gap between F's small singular values and much tinier ones, tiny enough to be discarded. Otherwise the choice of θ becomes problematical.

Singular value decompositions reveal almost everything knowable about linear operators from one Euclidean space to another. What if the spaces are not both Euclidean? Vectors' lengths may be gauged by any of various norms, each of which satisfies all the familiar laws ‖v‖ > 0 except ‖o‖ = 0, ‖μ·v‖ = |μ|·‖v‖, and ‖u + v‖ ≤ ‖u‖ + ‖v‖, but violates the Euclidean norm's Parallelogram Law ‖u+v‖² + ‖u−v‖² = 2·‖u‖² + 2·‖v‖² (see our class notes on "How to Recognize a Quadratic Form"). This violation deprives the vector space of a wealth of Isometries — length-preserving linear transformations, the rotations and reflections represented by orthogonal matrices — possessed by Euclidean spaces. Here are two instances: the only isometries available for the biggest-magnitude norm, and for the sum-of-magnitudes norm, are the permutations and the sign reversals of column-vectors' elements. This is why non-Euclidean normed spaces (they are called Banach spaces) have turned out much more difficult to analyse in the course of about a century of study.
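numpy's `pinv` plays the same role as Matlab's pinv(F, θ) described above, with a relative threshold passed as `rcond`. The sketch below (mine) uses a diagonal matrix whose singular values 1, 0.1, 1e-12 leave a wide gap in which to place the threshold, so the result is insensitive to its exact value:

```python
import numpy as np

# singular values 1, 1e-1, 1e-12: a wide gap between "small" and "tiny"
F = np.diag([1.0, 1e-1, 1e-12])

# keeping the tiny singular value produces gargantuan entries ~1e12 in the pseudo-inverse
G_raw = np.linalg.pinv(F)

# a threshold anywhere in the gap (here 1e-6, relative to the largest singular value)
# discards it: tantamount to perturbing F by less than the threshold
G_cut = np.linalg.pinv(F, rcond=1e-6)

print(np.linalg.norm(G_raw, 2))   # about 1e12
print(np.linalg.norm(G_cut, 2))   # about 10 = 1/(smallest retained singular value)
```

Note G_cut is then a generalized inverse not of F but of the nearby lower-rank F−∆F: F·G_cut·F reproduces diag(1, 0.1, 0), not F itself.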
counterpart for linear operators between spaces that are not both Euclidean though the spaces Operator norm HFH maxV O Fvv does resemble the biggest singular value in some respects and is easier to compute for the biggest magnitude and sumofmagnitudes vector norms provided both spaces use the same norm All operator norms satisfy the product identity HuwTH uwT for rank1 operators as well as a multiplicative dominance relation HLFH S HLHHFH satis ed by all wellfounded matrix norms Exercise Prove these recalling that the dual space s vector norm is an operator norm wT maxWED levlv Symmetrically v maXWT0T levlIIWTII 39 use this to prove all operator norms HFH maXWT0T wTFwT too Exercise Prove HFxH SFHxH compatibility and FZ S FZ multiplicative dominance for though Some matrix norms are not operator norms the rootsumsquares norm is an instance Here is why Were there a pair of vector norms for which IFI is the operator norm it would satisfy the product identity luwTI uHwTH for vectors norm H in the operators target space and functionals norm H TH in the domain s dual space Actually uwT TracequuwT HuHHwTH for Euclidean norms in both spaces and their operator norm is the biggestsingularvalue which is less that the rootsumsquares of singular values for all but rankl operators In short is generally too big to be an operator norm In general a matrix norm that is not an operator norm but is Compatible satisfies the multiplicative dominance relation with the vector norms in the domain and target space can be proved always at least as big as the operator norm for those two spaces Exercise Prove that the Sumofallmagnitudes norm HlFHl 2i Zj lfijl cannot be an operator norm for 2by2 matrices F fij though it is compatible with the sumofmagnitudes norm in its target space and the biggest element norm in its domain Hint try F 1 I The operator norm for those two spaces is tedious to compute 1 71 Exercise The Biggestof allelements norm GI maxij lgijl is an operator norm39 for which 
vector norms and why What is the least constant u for which matrix norm ulGl is compatible with the biggestelement norm in both G s domain and target spaces Similarly what the sumofmagnitudes norm in both spaces Exercise Explain why if B is the matrix of a linear operator from a normed space to itself with a compatible HBH lt l then 17B is nonsingular However if the square matrix B belongs to a linear operator beween two different normed spaces then 17B can be singular despite that a compatible HBH lt l HIH 39 give an example February 4 2008 840 pm Page 5 Math H110 Huge Generalized Inverses of RankDeficient Matrices Prof W Kahan Consider now two normed vector space V and W and two matrix norms one for matrices F that map V to W and a second norm for matrices G that map W to V and suppose both matrix norms are compatible with the spaces vector norms HFvH S FHvH for all V in V and HGwH S GHwH for all w in W Were amatrix norm incompatible multiplying it by a scalar constant big enough would make it compatible do you see how For such norms a lower bound for all generalized inverses of F comes from Theorem 5 Here is its proof So long as RankF gt RankF7AF 2 RankF7AFGF the nullspace of FiAFGF must be a subspace of F s domain with greater dimension than the nullspace of F Therefore vectors x must exist satisfying FiAFGFx o i Fx whence follows 0 Fx FGFx AFGFx and then 0 lt HFxH lAFGFxH S AFHGFxH S AFGHFxH because of compatibility Divide out HFxH and then minimize AF subject to RankFAF lt RankF to complete the proof of Theorem 5 It combines with Lemma 4 to yield Corollary 6 Historically Corollary 6 is several decades older than Theorem 5 which seems three or four decades old See GW Stewart On the Early History of the Singular Value Decomposition in SIAM Review 35 1993 pp 551566 for more chronology Both statements generalize Theorem 7 said to have been known to S Banach in the 1920s certainly known to MG Krein in the 1940s and resurrected by numerical analyst N Gastinel in 
the early 1960s For Theorem 7 s proof we assume that F is a conventionally invertible square matrix and we seek the singular matrix FiAF nearest F in the sense that operator norm HAFH is minimized Theorem 5 says the minimum can t be smaller than lHFJH so constructing AF to achieve equality will prove the desired result Note that the notation used for norms here is still as usual overloaded because HAFH and HF IH can be gauged by different operator norms when the spaces between which F and F 1 operate in opposite directions are gauged by different vector norms None the less you should be able to see why HFHHF IH 2 HIH l 39 try it We will devise aminimizing AF va ofrank l as follows Since HFTIH maxHXH1 HFile let x v be amaximizing vector then HFilvH HFTIH and HvH l We also know that HFilvH maXHyTH1 yTF 1v and can now choose a maximizing functional yT uT then HFilvH uTF 1v and HuTH l Finally set wT uTHFTIH to get wTFilv l Now AF va has HAFH M MWTH lHF IH and FiAFF 1v v 7 vaFilv 0 which implies that FiAF is a singular matrix nearest F Theorem 7 s proof ends Alas my proof of Theorem 8 is much too long to reproduce here February 4 2008 840 pm Page 6 Math H110 Huge Generalized Inverses of RankDeficient Matrices Prof W Kahan Perturbations How does its generalized inverse change with F Put this way the question begs a crucial question Is its generalized inverse determined uniquely by F It is in important special cases For instance an easily veri ed identity satis ed by any invertible square matrix F is F5F 1 7 F 1 7FESF 1ESFF 1 7F 1ESFFESF 1 provided F5FT1 I F71 5FT1 FT1 exists too as it must when HF 1 395FH lt l for a suitable see the previous exercise operator norm as is surely the case when HFTIH 39115FH lt l Why Then HF5F 1 7 F IHHF IH s 11HF 15FH 7 1 with equality for an apt 5F ofrank 1 Exercise Confirm the last sentence Then exhibita dFd l for which HdF 1drH HF IHZHdFdTH 0 Thus FT1 and its derivative dF ldr become huge together only as an invertible F approaches some 
singular matrix To behave analogously when F is not invertible its generalized inverse must be determined uniquely by some conditions besides the one in Lemma 0 Nonmetric no norms conditions that sometimes determine a generalized inverse G of F uniquely do exist For instance when F is square the three equations FGF F GFG G and FG GF always have at most one solution G but sometimes none and when a solution G exists it can vary arbitrarily violently as F changes even though RankF does not change I have heard this generalized inverse G called Drazin s SemiInverse when it exists which it does if and only if Fz 0 whenever sz o for any integer k gt 0 This means G exists if and only if F s Jordan Normal Farm has no nonzero Jordan block with diagonal all zero To see why suppose sz o for some k gt 0 and that a solution G exists Then Gz GFsz Gk sz o whence Fz FZGZ 0 too This restricts Jordan s Normal Form of F to have no eigenvalue 0 witha Jordan block bigger than 1by1 Conversely suppose the Jordan blocks of F are constrained that way The characteristic polynomial of F is then detBl 7 F 20513quot ujlwk with um 1 uo at 0 and some k 2 0 and the Cayley Hamilton Theorem ensures that this polynomial vanishes when F is substituted for B The constraint implies 20513 quJTK O with K mink 1 0 or 1 Now G 21513 quj luOYF turns out to be the solution of the three equations in question can you confirm this There is no other solution because if Z also satisfies the three equations FZF F zrz z and FZ ZF then z ZF2Z Z3F2 z3r4G2 ZF3FG2 FGZ G it is unique Note that this solution G when it exists is a rational function of F Exercise Given a rank1 square matrix F uvT with Euclidean HuH HVTH 1 and vTu cos9 at 0 show that Fl FT but Drazin s semiinverse G Fcosz9 Evidently it can be enormously bigger than F1 Among uniquely de ned generalized not ordinary inverses the MoorePenrose Pseudo Inverse F1 is the most commonly used How does it change when F is perturbed Because F1 is a rational function of F 
formulas generalizing the identity near the top of this page must exist presenting the change in a way that allows a limit process to express the derivative dFldr when it exists in terms of dFd l Here is such a formula with E in place of F5F Lemma 9 E17 F1 7F1E7FE1 I7F1FE7FTE1TE1 FlFlTE7FTI7EEl Proof The identity s righthand side expands into ten terms Two of them condense as does FlFlTFT to F1 and persist Four of them condense as does FlFFTElTEl to FTElTEl and cancel the remaining four terms thus confirming the identity I presented it in 1971 at an IFIP Congress in Ljubljiana but it had already been discovered in Lund by P A Wedin for his 1969 thesis most of which he published in 1973 in BIT 13 February 4 2008 840 pm Page 7 Math H110 Huge Generalized Inverses of RankDeficient Matrices Prof W Kahan Lemma 9 leads to the following overestimate of the biggestsingularvalue norm of El7Fl Theorem 10 If E i F then HEl7FlHHE7FH s HElH4HElHZ39HFlHZHlFlH s VimaxfiiEtH HFlH2 Proof Lemma 9 s identity exhibits El 7Fl R L 7 S wherein S FlE7FEl R CIE7FTElTEl and L FlFlTE7FT I in which the orthogonal projector I l 7 FlF CDT 12 satisfies FCI O CDFl and HCDH 1 except when I O 39 similarly for 1 l 7 EEl Now we can estimate NET 7 FW by using the easily verified formula HZTHZ 7 HZHZ 7 HszH 7 the biggest eigenvalue of ZTZ Since RTs 7RTL 7 0 we find El 7 FlTEl 7 Fl 7 RTR L7STL7S whence NET 7 Fle s HRHZ HL7sH2 And since LsT 7 0 we find L7SL7ST7LLT SST whence HL7SHZS HLH2HSH2 Because HRH s 1HE7FHHElH2 HLH SHFleHE7FH1 and HSH siiFTHHE7FHHETH putting it all together yields HEl7FlHHE7FH s NEW llElllZllFlllz HFW as claimed P A Wedin s more penetrating proof got a more complicated estimate with l 2 in place of g El F5Fl can change violently for a very tiny 5F if RankF5F gt RankF since then Corollary 6 implies HF5FlH 2 lHSFH rendering Theorem 10 s overestimate gargantuan and useless On the other hand when RankF5F RankF and HSFH lt lHFlH so HSFH is too tiny for any perturbation of its size to 
drop the rank of F5F then Lemma 11 If positive lFlH17HFlHH5FH2HF5FlH provided RankF5FRankF too Proof Let F7AF be the matrix of rank less than RankF5F nearest F5F when gauged by the Biggest singularvalue norm Then Corollary 6 implies both H5FAFH HF5F 7FAFH lHF5FlH and lHFlH S HAFH H5FAF 7 SFH S H5FAFH HSFH lHF5FlH HSFH which turns into the lemma s inequality Inequalities more general than this because they allow 5F to be somewhat bigger and sharper than this and Theorem 10 s inequalities can be found in Wedin 1973 BIT Nordisk Tidskn39ftfar Infaman39ansbehandling 13 pp 217232 and in Stewart 1977 SIAMReview 19 pp 634662 which surveys the subject in depth They go far deeper than necessary for this course Lemma 11 and Theorem 10 implythat HF5Fl7FlH SV H5FHHFlH217HFlHHSFH2 so long as RankF5F RankF and HSFH lt lHFlH Together with Lemma 9 these yield Theorem 12 Provided RankF does not change as F varies Fl is acontinuously differentiable rational function of F with dFldr 7 7FldFd17Fl I7FlFdFdtTFlTFl FlFlTdFdrTI7FFl and HdFldrH s ViHdFddiHFTHZ and since HFlH ldistance from F to the nearest matrix of lower rank neither HFlH nor HdFld EHHdFd EH can become gargantuan unless F approaches a matrix of lower rank This illustrates an important general theme among algebraic problems Each such problem may be embedded in a family of more general problems a linear system of equations or a system of polynomial equations If such an embedding loses sight of a significant combinatorial attribute of the given problem like the rank of a matrix or like the multiplicity of an eigenvalue or a polynomial s zero the admission of even infinitesimal perturbations that upset that combinatorial attribute can turn a tame problem into a knotty if not naughty one The threshold 9 in Matlab s pinv provides a way to restore a matrix s rank after it was upset by rounding errors if successful this compensation improves the computed result enormously If compensation fails serious rethinking is in order 
Math H110        Huge Generalized Inverses of Rank-Deficient Matrices        Prof. W. Kahan        February 4, 2008

Normally we think of the solution x = F⁻¹y of a linear system Fx = y as a continuous function of the given data F and y. When no F⁻¹ exists, any of a host of algebraic methods can be used to solve the system for x at least approximately, although every solution's usefulness may be undermined by violent variability. This note assesses that violence quantitatively by relating it to the data's nearness to perturbed data of lower rank. Real number data is assumed.

Let F be a possibly rectangular matrix whose rank may be less than both of its dimensions, for all we know. A Generalized Inverse G of F can be defined in many ways, all characterized thus: x = Gy is a solution of the equation Fx = y whenever this equation has a solution, even if the solution x is not determined uniquely by the equation. Consequently ...

Lemma 0: The generalized inverses G of F are the solutions G of the equation FGF = F.
Proof: y = Fx runs through F's range as x runs through its domain; then x = Gy must be a solution of the equation Fx = y, whence Fx = FGy = FGFx follows for all x, so FGF = F.

Lemma 1: Every matrix F, including F = O, has at least one generalized inverse G.

This will be proved later. But first let us consider a widely used instance, the Moore-Penrose Pseudo-Inverse F†. It is the linear operator that solves a Least-Squares problem: Given F, and y in F's target space but perhaps not in F's range, find the vector x in F's domain that minimizes ‖Fx − y‖ and, if the minimizing x is not yet unique, also minimizes ‖x‖. Here ‖v‖ := √(vᵀv) is the Euclidean length of real vector v.

Lemma 2: The Least-Squares problem's solution x = F†y is obtainable from the Singular Value Decomposition F = Q·Σ·Pᵀ, in which QᵀQ = I and PᵀP = I and Σ is a nonnegative diagonal matrix (perhaps rectangular) exhibiting the singular values of F, by computing F† = P·Σ†·Qᵀ, where Σ† is obtained from the diagonal Σ by transposing it and then replacing therein every nonzero diagonal element by its reciprocal. Lemma 2 will be proved
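Lemma 2's recipe is easy to try out numerically. The sketch below is my own illustration, not part of the notes; it builds F† from NumPy's singular value decomposition and checks that F†y solves a small least-squares problem (the 3-by-2 matrix is a hypothetical example):

```python
import numpy as np

def pinv_via_svd(F, tol=1e-12):
    """Lemma 2's recipe: F = Q*Sigma*P^T, then F_dagger = P*Sigma_dagger*Q^T,
    where Sigma_dagger transposes Sigma and reciprocates its nonzero entries."""
    Q, s, Pt = np.linalg.svd(F, full_matrices=False)   # thin SVD of F
    s_dag = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Pt.T @ np.diag(s_dag) @ Q.T

# A 3-by-2 full-column-rank example: x = F_dagger @ y minimizes ||Fx - y||.
F = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, 1.0])
x = pinv_via_svd(F) @ y
print(np.allclose(F.T @ (F @ x - y), 0))   # normal equations hold: True
```

Agreement with NumPy's own `np.linalg.pinv` confirms the construction on this example.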
later. From its formula for pseudo-inverses will follow ...

Lemma 3: F·F†·F = F, F†·F·F† = F†, (F†·F)ᵀ = F†·F and (F·F†)ᵀ = F·F†; and these four equations characterize F†, because it is the unique matrix satisfying all of them.

Lemma 3's first equation says that F† is a generalized inverse of F, and the second says that F is a generalized inverse of F†; in fact F†† = F. In general, generalized inverses are not reciprocal in this way; later we shall see how to construct a generalized inverse G of F that violates any chosen subset of the latter three equations of Lemma 3, unless F = O or F has full rank equal to F's least dimension.

Unless F⁻¹ exists (in which case G = F⁻¹), there are infinitely many generalized inverses G of F, almost all of them with gargantuan elements, since G + Z is also a generalized inverse for every Z satisfying FZ = O or ZF = O.

Now a gargantuan generalized inverse G is hazardous to approximate, because subsequent computations of Gy may encounter massive cancellation, leaving little more than error as a residue. And even when computation is performed exactly, or without roundoff, Gy can be changed utterly by tiny perturbations of y. Consequently, ample incentive exists to choose generalized inverses of near-minimal magnitudes; an instance is the Moore-Penrose Pseudo-Inverse, minimal because it solves the Least-Squares problem of Lemma 2. In fact ...

Lemma 4: Among generalized inverses G of F, one with minimal ‖G‖ := max_{y≠o} ‖Gy‖/‖y‖, the operator norm based upon Euclidean vector norms ‖Gy‖ and ‖y‖, turns out to be F†; and only F† minimizes the Root-Sum-Squares norm |G| := √Trace(GᵀG) = √(∑ᵢ ∑ⱼ gᵢⱼ²). The operator norm ‖G‖ here turns out to be the biggest singular value of G, and ‖F†‖ turns out to be the reciprocal of F's smallest nonzero singular value; |F†| turns out to be the Root-Sum-Squares of the reciprocals of all F's nonzero singular values. All this will become clear later.
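The four equations of Lemma 3, and the remark that G + Z is again a generalized inverse, can be confirmed numerically. This sketch is my own illustration (hypothetical random matrices, NumPy's `pinv`), not part of the notes:

```python
import numpy as np

# Check Lemma 3's four equations for F_dagger = np.linalg.pinv(F), and the
# remark that G + Z is again a generalized inverse whenever Z F = O.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # 4x3, rank 2
G = np.linalg.pinv(F)
checks = [np.allclose(F @ G @ F, F),        # F G F = F
          np.allclose(G @ F @ G, G),        # G F G = G
          np.allclose((G @ F).T, G @ F),    # (G F)^T = G F
          np.allclose((F @ G).T, F @ G)]    # (F G)^T = F G
print(all(checks))                          # True

# c lies in the nullspace of F^T, so Z = z c^T satisfies Z F = O;
# G + Z is then a (much bigger) generalized inverse of F.
U, s, Vt = np.linalg.svd(F)
c = U[:, -1]                                # F^T c = o since Rank(F) = 2 < 4
Z = 1e3 * np.outer(Vt[0], c)
print(np.allclose(F @ (G + Z) @ F, F))      # True
```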
Choosing a generalized inverse of near-minimal magnitude cannot always avoid troubles with gargantuan magnitudes; they may be unavoidable if F is too near another matrix F − ΔF of lower rank, because then every generalized inverse G of F must be huge. A quantitative statement of this cause-and-effect relationship involves matrix norms; thus ...

Theorem 5: Every generalized inverse G of F has ‖G‖ ≥ 1/min{ ‖ΔF‖ such that Rank(F−ΔF) < Rank(F) }.

This will be proved later for any matrix norms satisfying the Multiplicative Dominance relations ‖Gy‖ ≤ ‖G‖·‖y‖ and ‖Fx‖ ≤ ‖F‖·‖x‖ required for Compatibility with vector norms; all plausible matrix norms are compatible that way. Theorem 5 says that the existence of at least one moderately sized generalized inverse of F implies that F is relatively well separated from all matrices F−ΔF of lower rank, when separation is gauged by the chosen matrix norm.

Theorem 5 says nothing if the norms have been chosen to make F and G look good. For example, choose new coordinate bases in the domain and target spaces of F to represent it by a new matrix F′ consisting of an identity matrix bordered perhaps by zero matrices; the dimension of the identity matrix must equal the rank of F. Then G′ := F′ᵀ is the representative, in the new coordinates, of a generalized inverse G of F in the old coordinates; and choosing any of the customary operator norms ‖·‖′ in the new coordinates will induce corresponding operator norms ‖·‖ in the original coordinates such that ‖F‖ = ‖F′‖′ = 1 and ‖G‖ = ‖G′‖′ = 1. This is what is meant by norms "chosen to make F and G look good". Such a choice is not always praiseworthy. Ideally the chosen norm should serve the intended application; for instance: ideally, all vector perturbations of roughly the same length, gauged by the chosen norm, should be roughly equally inconsequential or unlikely. Worthwhile ideals are not easy to achieve; achieving this one, when possible, is a long story for another day. For now let us assume the norm is fixed in advance. For any given F we seek as small
a generalized inverse G as can be computed at a tolerable cost. When the norm is the biggest-singular-value norm used in Lemma 4, then a minimal generalized inverse G, barely big enough to comply with Theorem 5, can be identified easily:

Corollary 6: The Moore-Penrose Pseudo-Inverse F† is a generalized inverse of F satisfying ‖F†‖ = 1/min{ ‖ΔF‖ such that Rank(F−ΔF) < Rank(F) }; here ‖F†‖ is the biggest singular value of F†, which is the reciprocal of F's smallest nonzero singular value, which in turn is the distance from F to the nearest F−ΔF of lower rank.

Exercise: Show that F† = lim_{β→0⁺} (β·I + FᵀF)⁻¹·Fᵀ = lim_{β→0⁺} Fᵀ·(β·I + F·Fᵀ)⁻¹.

For other operator norms, derived from non-Euclidean vector norms, a near-minimal generalized inverse of F ≠ O can be difficult to construct unless F⁻¹ exists, in which case ...

Theorem 7: 1/‖F⁻¹‖ = min{ ‖ΔF‖ such that det(F−ΔF) = 0 } for all operator norms ‖·‖.

Far harder than the proofs of the theorems, corollary and lemmas above is the proof of ...

Theorem 8: For every operator norm, at least one generalized inverse G of F, besides satisfying Lemma 0 and Theorem 5, also satisfies ‖G‖ ≤ √Rank(F) / min{ ‖ΔF‖ such that Rank(F−ΔF) < Rank(F) }.

Theorem 8 shows that Theorem 5 is not excessively optimistic. I found a proof in 1973 but have not published it yet because I still hope some day to discover a much shorter proof.

Here are proofs for the other assertions above. First is Lemma 1's assertion that every F has at least one generalized inverse G. One proof deduces that the equation FGF = F in Lemma 0 has a solution G by applying one of Fredholm's criteria: in general, the linear equation Ag = f must have at least one solution g if and only if cᵀf = 0 whenever cᵀA = oᵀ. In our case the linear operator A does to g what F·G·F does to G, and cᵀf matches Trace(CᵀF). In short, to apply Fredholm's criterion we must decide whether Trace(CᵀF) = 0 for every matrix C satisfying Trace(CᵀFZF) = 0 for all Z. Since the trace of a product is unchanged by cyclic permutation of its factors, the last equation implies that Trace(FCᵀFZ) = 0 for all Z, whence FCᵀF = O, whence follows (CᵀF)² = O, which implies that CᵀF has no nonzero eigenvalue, whence follows that Trace(CᵀF) = 0, as required to establish that a solution G of FGF = F exists.
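Corollary 6 and the limit-formula exercise can both be spot-checked numerically. The sketch below is my own illustration with a hypothetical random rank-2 matrix; the limit is the Tikhonov-style regularized inverse:

```python
import numpy as np

# Corollary 6: the spectral norm of F_dagger equals the reciprocal of F's
# smallest NONZERO singular value.  Exercise: (b*I + F^T F)^{-1} F^T should
# approach F_dagger as b -> 0+.
rng = np.random.default_rng(1)
F = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # 5x4, rank 2
s = np.linalg.svd(F, compute_uv=False)
Fp = np.linalg.pinv(F)
print(np.isclose(np.linalg.norm(Fp, 2), 1.0 / s[1]))   # s[1] = least nonzero sigma
for b in [1e-2, 1e-4, 1e-6]:
    ridge = np.linalg.solve(b * np.eye(4) + F.T @ F, F.T)
    print(b, np.linalg.norm(ridge - Fp, 2))             # error shrinks with b
```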
Another proof of G's existence follows changes of coordinate bases in the domain and target spaces of F to exhibit its linear operator's canonical form under Equivalence: F = [ I O ; O O ], wherein the dimension of the identity matrix I matches Rank(F) and the zero matrices O fill out the rest of the rows and columns. (The rest of the rows, or the rest of the columns, or both, can be absent if F has full rank equal to one of its dimensions.) For these bases, every generalized inverse is G = [ I S ; R Z ] with arbitrary matrices R, S and Z so dimensioned (if not absent) that the matrix products FG and GF both exist. Then FGF = F, and this equation persists in the form FGF = F after restoration of the original coordinate bases.

Exercise: Continuing from the last paragraph, prove that every generalized inverse G of F has Rank(G) ≥ Rank(F), with equality if and only if F is a generalized inverse of G too.

To prove Lemma 2 we use analogous coordinate changes, but this time to new orthonormal bases obtained from the orthogonal matrices P and Q of a singular value decomposition F = Q·Σ·Pᵀ, in which the diagonal matrix Σ has the same dimensions as matrix F and F's singular values appear on Σ's diagonal; the square diagonal matrix V exhibits all the nonzero singular values, so Σ = [ V O ; O O ]. We might as well assume F = Σ at the outset, since these coordinate changes preserve Euclidean length in both domain and target spaces of F, leaving the Least-Squares problem unchanged. Now, with F diagonal, the Least-Squares problem's solution is almost obviously x = F†y for F† = [ V⁻¹ O ; O O ] with F's dimensions transposed, so that F†F = [ I O ; O O ] is a square diagonal matrix, as is FF†. Also this diagonal F† satisfies all four of Lemma 3's equations, which persist after restoration of the original coordinate bases.
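The block structure just described can be exercised directly. In this sketch of my own (a hypothetical 3-by-4 diagonal F), any blocks R, S, Z give a generalized inverse, and the special choice Z = S·V·R recovers the second Penrose equation while the third still fails for S ≠ O:

```python
import numpy as np

# F in the diagonal form of Lemma 2's proof; G = [[inv(V), R], [S, Z]].
# FGF = F holds for ANY blocks R, S, Z; GFG = G exactly when Z = S V R.
V = np.diag([3.0, 0.5])
F = np.zeros((3, 4)); F[:2, :2] = V                 # F = [V O; O O], rank 2
R = np.array([[2.0], [-1.0]])                       # 2x1, arbitrary
S = np.array([[1.0, 4.0], [0.5, -2.0]])             # 2x2, arbitrary
Z = S @ V @ R                                       # the special choice
G = np.block([[np.linalg.inv(V), R], [S, Z]])       # 4x3
print(np.allclose(F @ G @ F, F))                    # True for any R, S, Z
print(np.allclose(G @ F @ G, G))                    # True because Z = S V R
print(np.allclose((G @ F).T, G @ F))                # False: S != O spoils it
```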
These equations determine F† uniquely because they do so in the coordinate systems that diagonalize F, as is easily verified: in these coordinate systems any G = [ V⁻¹ R ; S Z ] is a generalized inverse, because it satisfies the first equation FGF = F for any R, S and Z. Only Z = S·V·R can satisfy the second equation GFG = G too; only S = O can satisfy the first and third ((GF)ᵀ = GF); only R = O can satisfy the first and fourth ((FG)ᵀ = FG). Consequently only F† satisfies all four equations. This completes the proof of Lemma 3 and of the second sentence after it.

When any generalized inverse G of F is represented as above in orthonormal coordinate bases that diagonalize F, the Root-Sum-Squares norm satisfies |G|² = |V⁻¹|² + |R|² + |S|² + |Z|², which exceeds |F†|² = |V⁻¹|² unless G = F†. This confirms the second assertion in Lemma 4. The first assertion's proof begins with the observation that clearing S and Z to zeros cannot increase ‖G‖ = max_{y≠o} ‖Gy‖/‖y‖, nor can it be increased after that by clearing R; therefore every generalized inverse G has its ‖G‖ ≥ ‖V⁻¹‖ = ‖F†‖ = 1/(F's least nonzero singular value). This completes the proof of Lemma 4 and of the paragraph after it.

Exercise: Fill in all unobvious details of the proofs above.

Exercise: Suppose that F = L·C·Rᵀ, in which L, C and R all have the same rank and C is square and invertible. Show that L† = (LᵀL)⁻¹Lᵀ, R† = (RᵀR)⁻¹Rᵀ and F† = R†ᵀ·C⁻¹·L†, so F† is computable by rational arithmetic alone.

Current versions of Matlab compute an approximation pinv(F) to F† from a singular value decomposition of F after all singular values tinier than a roundoff-related threshold have been reset to zeros; otherwise, reciprocals of these tiny singular values would introduce gargantuan numbers that would swamp everything else in F† and render it useless numerically. A Matlab user can substitute his own threshold θ for Matlab's by invoking pinv(F, θ). This is tantamount to perturbing F, changing it to F−ΔF with ‖ΔF‖ < θ, to get ‖(F−ΔF)†‖ ≤ 1/θ, provided θ > 0 of course. If F has no singular values below that threshold, then ΔF = O; but otherwise Rank(F−ΔF) < Rank(F) and ‖F†‖ > 1/θ. How should the threshold θ be chosen?
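The rational formula of the full-rank-factorization exercise above can be spot-checked numerically; the sketch below is my own illustration with hypothetical random factors:

```python
import numpy as np

# Check F_dagger = (R_dagger)^T C^{-1} L_dagger for F = L C R^T, where
# L_dagger = (L^T L)^{-1} L^T and R_dagger = (R^T R)^{-1} R^T.
rng = np.random.default_rng(2)
r = 2
L = rng.standard_normal((5, r))
C = rng.standard_normal((r, r)) + 3.0 * np.eye(r)   # safely invertible
R = rng.standard_normal((4, r))
F = L @ C @ R.T                                     # 5x4 of rank 2
Lp = np.linalg.solve(L.T @ L, L.T)                  # (L^T L)^{-1} L^T
Rp = np.linalg.solve(R.T @ R, R.T)                  # (R^T R)^{-1} R^T
Fp = Rp.T @ np.linalg.inv(C) @ Lp
print(np.allclose(Fp, np.linalg.pinv(F)))           # True
```

Only matrix inversion of full-rank Gram matrices is involved, which is why exact rational arithmetic would suffice.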
If insights into data revealed by computed results are not to be confounded by an accident of the computational algorithm, then these results must be insensitive to ostensibly small changes in the threshold. This will be the case only if θ falls into a relatively wide gap between F's small singular values and much tinier ones, tiny enough to be discarded. Otherwise the choice of θ becomes problematical.

Singular value decompositions reveal almost everything knowable about linear operators from one Euclidean space to another. What if the spaces are not both Euclidean? Vectors' lengths may be gauged by any of various norms, each of which satisfies all the familiar laws ‖v‖ > 0 except ‖o‖ = 0, ‖µ·v‖ = |µ|·‖v‖, and ‖u+v‖ ≤ ‖u‖ + ‖v‖, but violates the Euclidean norm's Parallelogram Law ‖u+v‖² + ‖u−v‖² = 2‖u‖² + 2‖v‖² (see our class notes on "How to Recognize a Quadratic Form"). This violation deprives the vector space of a wealth of Isometries, the length-preserving linear transformations (the rotations and reflections represented by orthogonal matrices) possessed by Euclidean spaces. Here are two instances: the only isometries available for the biggest-magnitude norm, and for the sum-of-magnitudes norm, are the permutations and sign reversals of column-vectors' elements. This is why non-Euclidean normed spaces (they are called Banach spaces) have turned out much more difficult to analyse in the course of about a century of study. The singular value decomposition has no useful counterpart for linear operators between spaces that are not both Euclidean, though the operator norm ‖F‖ := max_{v≠o} ‖Fv‖/‖v‖ does resemble the biggest singular value in some respects, and is easier to compute for the biggest-magnitude and sum-of-magnitudes vector norms, provided both spaces use the same norm. All
operator norms satisfy the product identity ‖u·wᵀ‖ = ‖u‖·‖wᵀ‖ for rank-1 operators, as well as a multiplicative dominance relation ‖L·F‖ ≤ ‖L‖·‖F‖ satisfied by all well-founded matrix norms.

Exercise: Prove these, recalling that the dual space's vector norm is an operator norm ‖wᵀ‖ := max_{v≠o} |wᵀv|/‖v‖. Symmetrically ‖v‖ = max_{wᵀ≠oᵀ} |wᵀv|/‖wᵀ‖; use this to prove that all operator norms satisfy ‖F‖ = max_{wᵀ≠oᵀ} ‖wᵀF‖/‖wᵀ‖ too.

Exercise: Prove ‖Fx‖ ≤ ‖F‖·‖x‖ (compatibility) and ‖FZ‖ ≤ ‖F‖·‖Z‖ (multiplicative dominance) for operator norms.

Some matrix norms are not operator norms; the Root-Sum-Squares norm |·| is an instance. Here is why: were there a pair of vector norms for which |F| is the operator norm, it would satisfy the product identity |u·wᵀ| = ‖u‖·‖wᵀ‖ for vectors' norm ‖·‖ in the operator's target space and functionals' norm in the domain's dual space. Actually |u·wᵀ| = √Trace(w·uᵀu·wᵀ) = ‖u‖·‖wᵀ‖ for Euclidean norms in both spaces, and their operator norm is the biggest-singular-value norm, which is less than the Root-Sum-Squares of the singular values for all but rank-1 operators. In short, |·| is generally too big to be an operator norm. In general, a matrix norm that is not an operator norm but is Compatible (satisfies the multiplicative dominance relation) with the vector norms in the domain and target space can be proved always at least as big as the operator norm for those two spaces.

Exercise: Prove that the Sum-of-all-magnitudes norm |||F||| := ∑ᵢ ∑ⱼ |fᵢⱼ| cannot be an operator norm for 2-by-2 matrices F = [fᵢⱼ], though it is compatible with the sum-of-magnitudes norm in its target space and the biggest-element norm in its domain. Hint: try F = [ 1 1 ; 1 −1 ]. The operator norm for those two spaces is tedious to compute.

Exercise: The Biggest-of-all-elements norm |G|_max := maxᵢⱼ |gᵢⱼ| is an operator norm; for which
vector norms, and why? What is the least constant µ for which the matrix norm µ·|G|_max is compatible with the biggest-element norm in both G's domain and target spaces? Similarly, what is it for the sum-of-magnitudes norm in both spaces?

Exercise: Explain why, if B is the matrix of a linear operator from a normed space to itself with a compatible ‖B‖ < 1, then I−B is nonsingular. However, if the square matrix B belongs to a linear operator between two different normed spaces, then I−B can be singular despite a compatible ‖B‖ < 1 = ‖I‖; give an example.

Consider now two normed vector spaces V and W, and two matrix norms, one for matrices F that map V to W and a second for matrices G that map W to V, and suppose both matrix norms are compatible with the spaces' vector norms: ‖Fv‖ ≤ ‖F‖·‖v‖ for all v in V, and ‖Gw‖ ≤ ‖G‖·‖w‖ for all w in W. (Were a matrix norm incompatible, multiplying it by a big enough scalar constant would make it compatible; do you see how?) For such norms, a lower bound for all generalized inverses of F comes from Theorem 5. Here is its proof:

So long as Rank(F) > Rank(F−ΔF) ≥ Rank((F−ΔF)·G·F), the nullspace of (F−ΔF)·G·F must be a subspace of F's domain with greater dimension than the nullspace of F. Therefore vectors x must exist satisfying (F−ΔF)·G·Fx = o ≠ Fx, whence follows o ≠ Fx = F·G·Fx = ΔF·G·Fx, and then 0 < ‖Fx‖ = ‖ΔF·G·Fx‖ ≤ ‖ΔF‖·‖G·Fx‖ ≤ ‖ΔF‖·‖G‖·‖Fx‖ because of compatibility. Divide out ‖Fx‖, and then minimize ‖ΔF‖ subject to Rank(F−ΔF) < Rank(F), to complete the proof of Theorem 5. It combines with Lemma 4 to yield Corollary 6.

Historically, Corollary 6 is several decades older than Theorem 5, which seems three or four decades old; see G.W. Stewart, "On the Early History of the Singular Value Decomposition", SIAM Review 35 (1993) pp. 551-566, for more chronology. Both statements generalize Theorem 7, said to have been known to S. Banach in the 1920s, certainly known to M.G. Krein in the 1940s, and resurrected by numerical analyst N. Gastinel in the early 1960s.

For Theorem 7's proof we assume that F is a conventionally invertible square matrix, and we seek the singular matrix F−ΔF nearest F in the sense that the operator norm ‖ΔF‖ is minimized. Theorem 5 says the minimum can't be smaller than 1/‖F⁻¹‖, so constructing a ΔF to achieve equality will prove the desired result.
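The exercise above about I−B can be explored numerically. In this sketch of my own (the matrices are hypothetical illustrations), the first part sums the Neumann series for (I−B)⁻¹ under a compatible norm below 1; the second exhibits a singular I−B even though B's operator norm from (R², ‖·‖₁) to (R², ‖·‖∞), which is maxᵢⱼ|bᵢⱼ|, is below 1:

```python
import numpy as np

# 1) Same space, compatible ||B|| < 1  =>  I - B nonsingular, and
#    (I - B)^{-1} = I + B + B^2 + ...  (Neumann series).
B1 = np.array([[0.2, 0.3], [0.1, 0.4]])
assert np.linalg.norm(B1, np.inf) < 1          # max-row-sum operator norm
S = np.eye(2)
term = np.eye(2)
for _ in range(200):                           # partial sums of the series
    term = term @ B1
    S = S + term
print(np.allclose(S, np.linalg.inv(np.eye(2) - B1)))   # True

# 2) Between two DIFFERENT normed spaces, I - B can be singular although a
#    compatible norm of B is < 1: from l1 to l_inf the operator norm is
#    max_ij |b_ij|.  This B is a reflection with eigenvalues +1 and -1.
B2 = np.array([[0.6, 0.8], [0.8, -0.6]])
assert np.abs(B2).max() == 0.8                 # < 1 = ||I|| in this norm
print(abs(np.linalg.det(np.eye(2) - B2)) < 1e-12)      # True: singular
```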
Note that the notation used for norms here is still, as usual, overloaded, because ‖ΔF‖ and ‖F⁻¹‖ can be gauged by different operator norms when the spaces between which F and F⁻¹ operate in opposite directions are gauged by different vector norms. None the less, you should be able to see why ‖F‖·‖F⁻¹‖ ≥ ‖I‖ = 1; try it.

We will devise a minimizing ΔF = v·wᵀ of rank 1 as follows. Since ‖F⁻¹‖ = max_{‖x‖=1} ‖F⁻¹x‖, let x = v be a maximizing vector; then ‖F⁻¹v‖ = ‖F⁻¹‖ and ‖v‖ = 1. We also know that ‖F⁻¹v‖ = max_{‖yᵀ‖=1} yᵀF⁻¹v, and can now choose a maximizing functional yᵀ = uᵀ; then ‖F⁻¹v‖ = uᵀF⁻¹v and ‖uᵀ‖ = 1. Finally set wᵀ := uᵀ/‖F⁻¹‖ to get wᵀF⁻¹v = 1. Now ΔF := v·wᵀ has ‖ΔF‖ = ‖v‖·‖wᵀ‖ = 1/‖F⁻¹‖, and (F−ΔF)·F⁻¹v = v − v·wᵀF⁻¹v = o, which implies that F−ΔF is a singular matrix nearest F. Theorem 7's proof ends.

Alas, my proof of Theorem 8 is much too long to reproduce here.

Perturbations. How does its generalized inverse change with F? Put this way, the question begs a crucial question: is its generalized inverse determined uniquely by F? It is in important special cases. For instance, an easily verified identity satisfied by any invertible square matrix F is (F+δF)⁻¹ − F⁻¹ = −(F+δF)⁻¹·δF·F⁻¹ = −F⁻¹·δF·(F+δF)⁻¹, provided (F+δF)⁻¹ exists too, as it must when ‖F⁻¹·δF‖ < 1 for a suitable (see the previous exercise) operator norm, as is surely the case when ‖F⁻¹‖·‖δF‖ < 1. (Why?) Then ‖(F+δF)⁻¹ − F⁻¹‖/‖F⁻¹‖ ≤ 1/( 1/‖F⁻¹·δF‖ − 1 ), with equality for an apt δF of rank 1.

Exercise: Confirm the last sentence. Then exhibit a dF/dτ ≠ O for which ‖dF⁻¹/dτ‖ = ‖F⁻¹‖²·‖dF/dτ‖ ≠ 0.

Thus F⁻¹ and its derivative dF⁻¹/dτ become huge together, only as an invertible F approaches some singular matrix.
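For the Euclidean operator norm, Theorem 7's construction specializes to the familiar fact that dropping the smallest singular triple yields the nearest singular matrix. A sketch of my own, with a hypothetical random F:

```python
import numpy as np

# Delta_F = v w^T of rank 1, built from F's smallest singular triple, has
# ||Delta_F|| = sigma_min(F) = 1/||F^{-1}||, and F - Delta_F is singular.
rng = np.random.default_rng(3)
F = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(F)
v = U[:, -1]                    # maximizer of ||F^{-1}x|| over ||x|| = 1
w = s[-1] * Vt[-1]              # w^T = u^T/||F^{-1}||, u = right sing. vector
dF = np.outer(v, w)
print(np.isclose(np.linalg.norm(dF, 2), s[-1]))             # ||dF|| = sigma_min
print(np.linalg.svd(F - dF, compute_uv=False)[-1] < 1e-10)  # F - dF singular
print(np.isclose(np.linalg.norm(dF, 2) * np.linalg.norm(np.linalg.inv(F), 2), 1.0))
```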
To behave analogously when F is not invertible, its generalized inverse must be determined uniquely by some conditions besides the one in Lemma 0. Non-metric (no norms) conditions that sometimes determine a generalized inverse G of F uniquely do exist. For instance, when F is square, the three equations FGF = F, GFG = G and FG = GF always have at most one solution G, but sometimes none; and when a solution G exists it can vary arbitrarily violently as F changes, even though Rank(F) does not change. I have heard this generalized inverse G called Drazin's Semi-Inverse. It exists if and only if Fz = o whenever Fᵏz = o for any integer k > 0; this means G exists if and only if F's Jordan Normal Form has no Jordan block bigger than 1-by-1 with diagonal all zero. To see why, suppose Fᵏz = o for some k > 0 and that a solution G exists; then Gz = Gᵏ⁺¹·Fᵏz = o (using FG = GF and GFG = G repeatedly), whence Fz = F²·Gz = o too. This restricts Jordan's Normal Form of F to have no eigenvalue 0 with a Jordan block bigger than 1-by-1. Conversely, suppose the Jordan blocks of F are constrained that way. The characteristic polynomial of F is then det(β·I − F) = ∑_{j=k}^{n} µⱼ·βʲ with µₙ = 1, µₖ ≠ 0 and some k ≥ 0, and the Cayley-Hamilton Theorem ensures that this polynomial vanishes when F is substituted for β. The constraint implies ∑_{j=k}^{n} µⱼ·F^(j−k+κ) = O with κ := min(k, 1) = 0 or 1. Now G := ( −∑_{j=k+1}^{n} µⱼ·F^(j−k−1)/µₖ )²·F turns out to be the solution of the three equations in question; can you confirm this? There is no other solution because, if Z also satisfies the three equations FZF = F, ZFZ = Z and FZ = ZF, then Z = Z²F = Z²F²G = ZFG = ZF²G² = FG² = G; it is unique. Note that this solution G, when it exists, is a rational function of F.

Exercise: Given a rank-1 square matrix F = u·vᵀ with Euclidean ‖u‖ = ‖vᵀ‖ = 1 and vᵀu = cos θ ≠ 0, show that F† = Fᵀ but Drazin's semi-inverse is G = F/cos²θ. Evidently G can be enormously bigger than F†.

Among uniquely defined generalized (not ordinary) inverses, the Moore-Penrose Pseudo-Inverse F† is the most commonly used. How does it change when F is perturbed? Because F† is a rational function of F (so long as Rank(F) stays fixed), formulas generalizing the identity near the top of this page must exist, presenting the change in a way that allows a limit process to express the derivative dF†/dτ, when it exists, in terms of dF/dτ. Here is such a formula, with E in place of F+δF:

Lemma 9: E† − F† = −F†·(E−F)·E† + (I − F†F)·(E−F)ᵀ·E†ᵀE† + F†F†ᵀ·(E−F)ᵀ·(I − EE†).

Proof:
The identity's right-hand side expands into ten terms. Two of them condense (as does F†F†ᵀFᵀ to F†) and persist. Four of them condense (as does F†FFᵀE†ᵀE† to FᵀE†ᵀE†) and cancel the remaining four terms, thus confirming the identity. I presented it in 1971 at an IFIP Congress in Ljubljana, but it had already been discovered in Lund by P.-Å. Wedin for his 1969 thesis, most of which he published in 1973 in BIT 13.

(Math H110: Huge Generalized Inverses of Rank-Deficient Matrices, Prof. W. Kahan, February 4, 2008)

Lemma 9 leads to the following overestimate of the biggest-singular-value norm of E† − F†:

Theorem 10: If E ≠ F then ‖E†−F†‖/‖E−F‖ ≤ √( ‖E†‖⁴ + ‖E†‖²·‖F†‖² + ‖F†‖⁴ ) ≤ √3·max(‖E†‖, ‖F†‖)².

Proof: Lemma 9's identity exhibits E† − F† = R + L − S wherein S := F†(E−F)E†, R := Φ(E−F)ᵀE†ᵀE† and L := F†F†ᵀ(E−F)ᵀΩ, in which the orthogonal projector Φ := I − F†F = Φᵀ = Φ² satisfies FΦ = O = ΦF† and ‖Φ‖ = 1 except when Φ = O; similarly Ω := I − EE†. Now we can estimate ‖E†−F†‖ by using the easily verified formula ‖Zᵀ‖² = ‖Z‖² = ‖ZᵀZ‖ = the biggest eigenvalue of ZᵀZ. Since RᵀS = RᵀL = O we find (E†−F†)ᵀ(E†−F†) = RᵀR + (L−S)ᵀ(L−S), whence ‖E†−F†‖² ≤ ‖R‖² + ‖L−S‖². And since LSᵀ = O we find (L−S)(L−S)ᵀ = LLᵀ + SSᵀ, whence ‖L−S‖² ≤ ‖L‖² + ‖S‖². Because ‖R‖ ≤ 1·‖E−F‖·‖E†‖², ‖L‖ ≤ ‖F†‖²·‖E−F‖·1 and ‖S‖ ≤ ‖F†‖·‖E−F‖·‖E†‖, putting it all together yields ‖E†−F†‖/‖E−F‖ ≤ √( ‖E†‖⁴ + ‖E†‖²·‖F†‖² + ‖F†‖⁴ ) as claimed. (P.-Å. Wedin's more penetrating proof got a more complicated but sharper estimate, with (1+√5)/2 in place of √3.)

E† = (F+δF)† can change violently for a very tiny δF if Rank(F+δF) > Rank(F), since then Corollary 6 implies ‖(F+δF)†‖ ≥ 1/‖δF‖, rendering Theorem 10's overestimate gargantuan, and useless. On the other hand, when Rank(F+δF) = Rank(F) and ‖δF‖ < 1/‖F†‖, so that ‖δF‖ is too tiny for any perturbation of its size to drop the rank of F+δF, then ...

Lemma 11: If positive, ‖F†‖/(1 − ‖F†‖·‖δF‖) ≥ ‖(F+δF)†‖, provided Rank(F+δF) = Rank(F) too.

Proof: Let F−ΔF be the matrix of rank less than Rank(F+δF) nearest F+δF when gauged by the biggest-singular-value norm. Then Corollary 6 implies both ‖δF+ΔF‖ = ‖(F+δF) − (F−ΔF)‖ = 1/‖(F+δF)†‖ and 1/‖F†‖ ≤ ‖ΔF‖ = ‖(δF+ΔF) − δF‖ ≤ ‖δF+ΔF‖ + ‖δF‖ = 1/‖(F+δF)†‖ + ‖δF‖, which turns into the lemma's inequality.
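These identities and bounds are easy to probe numerically. Below is a small sanity check, assuming numpy is available; the vectors, sizes and seed are arbitrary choices of mine, not from the notes. It confirms the rank-1 exercise above, Lemma 9's identity, and Theorem 10's bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exercise: for rank-1 F = u v^T with ||u|| = ||v|| = 1 and v^T u = cos(theta) != 0,
# the pseudo-inverse is F^T while Drazin's semi-inverse is F / cos(theta)^2.
u = np.array([1.0, 0.0, 0.0])
v = np.array([np.cos(0.3), np.sin(0.3), 0.0])   # unit vector with v^T u = cos(0.3)
F = np.outer(u, v)
G = F / np.dot(v, u) ** 2                        # Drazin's semi-inverse
assert np.allclose(F @ G @ F, F)                 # FGF = F
assert np.allclose(G @ F @ G, G)                 # GFG = G
assert np.allclose(F @ G, G @ F)                 # FG = GF
assert np.allclose(np.linalg.pinv(F), F.T)       # F-dagger = F^T here

# Lemma 9's identity and Theorem 10's bound, on a random rectangular pair E, F2:
E = rng.standard_normal((5, 3))
F2 = rng.standard_normal((5, 3))
Ep, Fp = np.linalg.pinv(E), np.linalg.pinv(F2)
rhs = (-Fp @ (E - F2) @ Ep
       + (np.eye(3) - Fp @ F2) @ (E - F2).T @ Ep.T @ Ep
       + Fp @ Fp.T @ (E - F2).T @ (np.eye(5) - E @ Ep))
assert np.allclose(Ep - Fp, rhs)                 # Lemma 9
nE, nF = np.linalg.norm(Ep, 2), np.linalg.norm(Fp, 2)
bound = np.sqrt(nE**4 + nE**2 * nF**2 + nF**4) * np.linalg.norm(E - F2, 2)
assert np.linalg.norm(Ep - Fp, 2) <= bound + 1e-12   # Theorem 10
```
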
Inequalities more general than these (because they allow δF to be somewhat bigger), and sharper than these and Theorem 10's, can be found in Wedin (1973), BIT (Nordisk Tidskrift för Informationsbehandling) 13, pp. 217-232, and in Stewart (1977), SIAM Review 19, pp. 634-662, which surveys the subject in depth. They go far deeper than necessary for this course. Lemma 11 and Theorem 10 imply that ‖(F+δF)† − F†‖ ≤ √3·‖δF‖·‖F†‖²/(1 − ‖F†‖·‖δF‖)² so long as Rank(F+δF) = Rank(F) and ‖δF‖ < 1/‖F†‖. Together with Lemma 9 these yield ...

Theorem 12: Provided Rank(F) does not change as F varies, F† is a continuously differentiable (rational) function of F with
  dF†/dτ = −F†(dF/dτ)F† + (I − F†F)(dF/dτ)ᵀF†ᵀF† + F†F†ᵀ(dF/dτ)ᵀ(I − FF†)
and ‖dF†/dτ‖ ≤ √3·‖dF/dτ‖·‖F†‖²; and since ‖F†‖ = 1/(distance from F to the nearest matrix of lower rank), neither ‖F†‖ nor ‖dF†/dτ‖/‖dF/dτ‖ can become gargantuan unless F approaches a matrix of lower rank.

This illustrates an important general theme among algebraic problems. Each such problem may be embedded in a family of more general problems, say a linear system of equations, or a system of polynomial equations. If such an embedding loses sight of a significant combinatorial attribute of the given problem, like the rank of a matrix, or like the multiplicity of an eigenvalue or of a polynomial's zero, then the admission of even infinitesimal perturbations that upset that combinatorial attribute can turn a tame problem into a knotty (if not naughty) one. The tolerance threshold in Matlab's pinv provides a way to restore a matrix's rank after it was upset by rounding errors; if successful, this compensation improves the computed result enormously. If compensation fails, serious rethinking is in order.

Math H110: Chiò's Trick for Linear Equations with Integer Coefficients (Prof. W. Kahan, October 24, 1998)

Gaussian Elimination solves a system Ax = b of linear equations for x by a sequence of rational operations, during which rounding errors occur unless some extra effort is put
into performing the arithmetic exactly. This effort may be worthwhile if the data (A, b) is known exactly, particularly if the data consists entirely of integers. In this case the arithmetic generates rational numbers stored as pairs of integers in lowest terms, i.e. with no common divisor. Reduction to lowest terms is time-consuming but necessary to prevent the pairs of integers from growing enormously too wide, and it reveals a curious phenomenon: the divisor integers take relatively few distinct values. Chiò's trick exploits this phenomenon to perform elimination using exclusively integer arithmetic with integers scarcely bigger than they have to be. Since its appearance in 1853, Chiò's trick has been rediscovered repeatedly, once by the Rev. C.L. Dodgson (who wrote Alice in Wonderland and other amusements under his pen-name Lewis Carroll), and often with incomplete, incorrect or extremely complicated proofs of validity. Let's see whether these notes can do better.

Gaussian Elimination as Triangular Factorization

Nowadays many texts explain the relation between Gaussian Elimination and the Triangular Factorization PA = LU, wherein P is a permutation matrix that takes account of pivotal row exchanges, L is unit lower triangular (it has 1's on its diagonal), and U is upper triangular. Although P is a product of row exchanges performed as they are determined during the elimination process, we can imagine that they had been applied in advance to produce a matrix PA whose linearly dependent rows, if any, are its last. It is customary to take for granted also that the linearly dependent columns of PA, if any, are its last few, as can be arranged by reordering columns if necessary. In short, all except perhaps the last few of the leading principal submatrices of PA are assumed invertible; otherwise the factorization PA = LU becomes either impossible or non-unique. Our treatment of Chiò's trick is also simplified by the assumption that all except perhaps the last few of the leading principal submatrices of A are
invertible. This simplification is tantamount to the application, in advance, of whatever row and/or column exchanges would otherwise be found during Chiò's elimination process to be necessary to ensure the assumption's validity; this is not a significant restriction so far as matrix computations are concerned. However, this restriction does prevent our treatment of Chiò's trick from being applied directly to explain how a similar trick works during the computation of greatest common divisors of pairs of polynomials and the reduction of fractions of rational functions.

Our treatment begins with a representation for the intermediate stages reached during the process of Gaussian elimination or, equivalently, triangular factorization. For each k = 1, 2, 3, ..., after the first k unknowns have been eliminated from all but the first k of the equations Ax = b, the remaining equations take the form S_k·z_k = g_k, wherein z_k is obtained from x by deleting its first k components and S_k is a Schur Complement derived from A by partitioning as follows:

  (A b) = [ V_k  R_k  r_k ]  =  [ L_k  O ] · [ U_k  L_k⁻¹R_k  L_k⁻¹r_k ]
          [ C_k  H_k  h_k ]     [ E_k  I ]   [ O    S_k       g_k      ]

Here V_k = L_k·U_k is the leading k-by-k principal submatrix of A; its triangular factors are L_k and U_k. The existence of V_k⁻¹ is assured by our simplifying assumption, which also ensures the existence of the factorization V_k = L_k·U_k. Because L_k is unit triangular, det(L_k) = 1, and we find det(U_k) = det(L_k)·det(U_k) = det(L_k·U_k) = det(V_k) ≠ 0, so U_k⁻¹ exists and U_k⁻¹·L_k⁻¹ = V_k⁻¹. Also E_k = C_k·U_k⁻¹. The Schur complement of V_k in A is S_k, determined together with g_k from (S_k g_k) = (H_k h_k) − C_k·V_k⁻¹·(R_k r_k). Check it all out!
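The Schur-complement representation can be probed numerically. The sketch below, assuming numpy and using a made-up random system, verifies that eliminating the first k unknowns leaves S_k·z_k = g_k with the same trailing solution as Ax = b.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
A = rng.standard_normal((n, n))
b = rng.standard_normal((n, 1))

Vk, Rk = A[:k, :k], A[:k, k:]      # leading k-by-k block and its row-mates
Ck, Hk = A[k:, :k], A[k:, k:]
rk, hk = b[:k], b[k:]

# Schur complement of Vk in A, and the correspondingly reduced right-hand side:
Sk = Hk - Ck @ np.linalg.solve(Vk, Rk)
gk = hk - Ck @ np.linalg.solve(Vk, rk)

# The reduced system Sk zk = gk has the same trailing solution as A x = b:
x = np.linalg.solve(A, b)
zk = np.linalg.solve(Sk, gk)
assert np.allclose(zk, x[k:])
```
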
What happens when k advances to k+1? L_k becomes the leading k-by-k principal submatrix of L_{k+1}, and U_k does likewise for U_{k+1}. The rest is best explained by partitioning

  (S_k g_k) = [ β_k  p_kᵀ  π_k ]
              [ q_k  W_k   w_k ]

First, β_k becomes the last diagonal element u_{k+1,k+1} of U_{k+1}, and the rest of the last column of U_{k+1} comes from the first column of L_k⁻¹R_k; the rest of L_k⁻¹(R_k r_k) supplies the first k rows of the next stage's data, whose last row comes from (p_kᵀ π_k). The last row of L_{k+1}'s leading columns is formed by appending 1 to the first row of E_k, whose remaining rows form the first k columns of E_{k+1}, whose last column is q_k/β_k. Finally, another pass of elimination produces the recurrence

  (S_{k+1} g_{k+1}) = (W_k w_k) − (q_k/β_k)·(p_kᵀ π_k) .

VERIFY THE FOREGOING PARAGRAPH TO CONFIRM YOUR UNDERSTANDING OF THE PROCESSES OF ELIMINATION AND TRIANGULAR FACTORIZATION.

For future reference note that β_k = u_{k+1,k+1} = det(U_{k+1})/det(U_k) = det(V_{k+1})/det(V_k). Since V_k⁻¹ = Adj(V_k)/det(V_k), we see that S_k = H_k − C_k·Adj(V_k)·R_k/det(V_k) is a rational function of the elements of A with common denominator det(V_k). Therefore all elements of (T_k u_k) := det(V_k)·(S_k g_k) = det(V_k)·(H_k h_k) − C_k·Adj(V_k)·(R_k r_k) are polynomials in the elements of A, and the reduced equations S_k·z_k = g_k are equivalent to equations T_k·z_k = u_k all of whose coefficients are, like those of (A b), integers. This is what Chiò's trick computes, but not directly. If we tried to compute the polynomials T_k and u_k directly, using only additions, subtractions and multiplications but no divisions, the arithmetic work would grow horrendously with k. Chiò's trick works faster by using divisions too.

Footnote: Though V_k⁻¹ = Adj(V_k)/det(V_k) and S_k = H_k − C_k·Adj(V_k)·R_k/det(V_k) are rational functions of the elements of A with common denominator det(V_k), some elements of V_k⁻¹ and S_k may, after reduction to lowest terms, have denominators that properly divide det(V_k). This certainly happens when V_k is triangular, for instance.

Chiò's Trick

It is an algorithm that, for each k = 1, 2, 3, ... in turn, shall be shown to compute the coefficients (T_k u_k) of a set of equations T_k·z_k = u_k with the same solution z_k as the reduced set S_k·z_k = g_k obtained from Gaussian Elimination; but the elements of T_k = det(V_k)·S_k are polynomials in the elements of A instead of rational functions like the Schur complement S_k = H_k − C_k·V_k⁻¹·R_k. Let us disregard the right-hand side columns b, h_k, r_k, g_k, u_k for the time being, since they're just along for the ride. Chiò's algorithm defines a
sequence of matrices A_k with elements a_k(i,j) thus: a_0(0,0) := 1; A_1 := A, so a_1(i,j) = a(i,j) for all i > 0 and j > 0; and for k = 1, 2, 3, ... in turn,

  a_{k+1}(i,j) := ( a_k(k,k)·a_k(i,j) − a_k(i,k)·a_k(k,j) )/a_{k−1}(k−1,k−1)   for all i > k and j > k.

Of course the algorithm would fail if any a_{k−1}(k−1,k−1) = 0, so this will have to be proved impossible because of our simplifying assumption about invertible leading principal submatrices.

Partitioning A_k = [ ω_k  m_kᵀ ; f_k  M_k ] provides a compact description of Chiò's algorithm: A_{k+1} := (ω_k·M_k − f_k·m_kᵀ)/ω_{k−1}. Our inductive proof of its effectiveness begins with the induction hypotheses that ω_k = det(V_k) and that det(V_{k−1})·S_{k−1} = T_{k−1} = A_k, i.e. that

  A_k = [ ω_k  m_kᵀ ] = det(V_{k−1})·[ β_{k−1}  p_{k−1}ᵀ ]
        [ f_k  M_k  ]               [ q_{k−1}  W_{k−1}  ]

These hold at k = 1 because ω_0 = 1 and A_1 = A = S_0 = T_0, so ω_1 = a(1,1) = β_0 = det(V_1) ≠ 0. Now suppose the hypotheses hold for k = 1, 2, ..., K, and recall from above for future reference that β_{K−1} = det(V_K)/det(V_{K−1}) = ω_K/ω_{K−1}. Then

  A_{K+1} = (ω_K·M_K − f_K·m_Kᵀ)/ω_{K−1}                      ... from Chiò's algorithm
          = ω_K·( W_{K−1} − q_{K−1}·p_{K−1}ᵀ/β_{K−1} )        ... from the second induction hypothesis, because β_{K−1} = ω_K/ω_{K−1}
          = det(V_K)·S_K = T_K                                ... from the recurrence for S_K and the definition of T_K.

This confirms the second induction hypothesis for k = K+1, from which the observation that ω_{K+1} = ω_K·β_K = det(V_{K+1}) confirms the first. End of proof.

To deal with the right-hand side columns b, ..., u_k, just do unto (A b) whatever row operations Chiò's algorithm does unto A. Thus Chiò's algorithm reduces the given linear system Ax = b through a sequence of ever smaller systems T_k·z_k = u_k whose elements are polynomials in the data (A b); in fact (T_k u_k) = det(V_k)·(H_k h_k) − C_k·Adj(V_k)·(R_k r_k) is of total degree k+1. If the division by ω_{k−1} were omitted from Chiò's algorithm, the total degree would be 2^k instead. This is why, when k is big, Chiò's trick saves a lot of work during exact computation with integers or symbolic algebraic data.
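In modern terminology Chiò's recurrence is what is usually called fraction-free (Bareiss) elimination. Here is a minimal sketch in Python; the 4-by-4 integer matrix is a made-up example, and the helper names are mine, not the notes'. Every division is exact, so only integers ever appear, and the last pivot produced is det(A).

```python
from fractions import Fraction

def chio_eliminate(A):
    """Chio's fraction-free elimination on a square integer matrix.

    Implements a_{k+1}(i,j) = (a_k(k,k)*a_k(i,j) - a_k(i,k)*a_k(k,j)) / a_{k-1}(k-1,k-1)
    with a_0(0,0) = 1; each '//' is an exact division.  Assumes, as in the notes,
    that no leading principal submatrix is singular.  The final (n-1,n-1) entry
    of the returned matrix is det(A)."""
    n = len(A)
    a = [row[:] for row in A]
    prev = 1                          # a_{k-1}(k-1,k-1), starting from a_0(0,0) = 1
    for k in range(n - 1):
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = (a[k][k] * a[i][j] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return a

def det_fractions(A):
    """Cross-check: exact Gaussian elimination in rational arithmetic."""
    n = len(A)
    m = [[Fraction(x) for x in row] for row in A]
    d = Fraction(1)
    for k in range(n):
        d *= m[k][k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= f * m[k][j]
    return d

A = [[2, 3, 1, 5],
     [1, 4, 2, 6],
     [3, 1, 4, 2],
     [5, 2, 1, 3]]
det = chio_eliminate(A)[3][3]
assert det == det_fractions(A)        # both give det(A) = -36
```
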
Math H110: Geometry of Elementary Operations (Prof. W. Kahan, September 22, 1998)

Notation

We write x = Bx for a vector in a geometrical space, perhaps the plane or our familiar 3-dimensional space, wherein basis B connects x with the column vector x of its components via an invertible linear transformation. Similarly wᵀ = wᵀB⁻¹ is a linear functional acting upon vectors in the geometrical space and represented by a row vector wᵀ via the same basis B. Hence wᵀx = wᵀx; in fact, all expressions involving geometrical entities like x and wᵀ in this note are computed by replacing those entities' symbols by their numerical representatives x and wᵀ. So long as we stick with just one basis B we might as well avoid the bother of typing boldface characters; whenever we see x we shall let the context determine whether it refers to the column vector x or to the geometrical vector x. The equation rᵀx = constant confines x to a line if the whole space is 2-dimensional, a plane if 3-dimensional, an hyperplane if the dimension exceeds 3. In this note the word "plane" will be used as an abbreviation for line or plane or hyperplane.

Elementary Projectors

For any vector c and functional rᵀ such that rᵀc ≠ 0, let P := c·rᵀ/(rᵀc). Then P projects any vector z onto y = Pz as follows: Find the plane, in the family of parallel planes whose equations are rᵀx = constant, that passes through z; its equation is rᵀx = rᵀz. Find the line through the origin o parallel to c; its parametric equation is x = c·λ, wherein λ runs through all real scalars. The line and plane intersect at y = c·rᵀz/(rᵀc) = Pz, because y is parallel to c and rᵀy = rᵀz. (Some writers call P a projector, some a projection.) In short, the elementary projector P collapses the whole vector space to a line through o parallel to c, and the direction of collapse is parallel to the planes rᵀx = constant. Draw a picture! Note that I ≠ P = P² ≠ O; this relation characterizes every projector.

Another projector is Q := I − P = I − c·rᵀ/(rᵀc). Confirm that it satisfies I ≠ Q = Q² ≠ O too. It collapses the whole space into the plane rᵀx = 0, and the direction of collapse is parallel to c: first because rᵀQz = oᵀz = 0, and second because z − Qz = Pz is parallel to c. Draw a picture again! Unlike the projectors in most introductory texts, ours need not be orthogonal; we can use any direction c not in the plane rᵀx = 0, and project parallel to the plane onto c via P, or else parallel to c onto the plane via Q.
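A quick numerical illustration of these projectors, assuming numpy; the particular r, c and z below are arbitrary choices of mine.

```python
import numpy as np

# Elementary projector P = c r^T/(r^T c) and its complement Q = I - P,
# for a deliberately non-orthogonal pair (r^T c = -0.5 != 0):
r = np.array([1.0, 2.0, -1.0])
c = np.array([0.5, 1.0, 3.0])
P = np.outer(c, r) / np.dot(r, c)
Q = np.eye(3) - P
assert np.allclose(P @ P, P) and not np.allclose(P, np.eye(3))   # I != P = P^2 != O
assert np.allclose(Q @ Q, Q)                                     # I != Q = Q^2 != O

# P z is where the plane r^T x = r^T z meets the line through o parallel to c:
z = np.array([2.0, -1.0, 4.0])
y = P @ z
assert np.isclose(r @ y, r @ z)              # y lies in the plane through z
assert np.allclose(np.cross(y, c), 0.0)      # y lies on the line parallel to c
assert np.isclose(r @ (Q @ z), 0.0)          # Q z lies in the plane r^T x = 0
```
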
Our Affine vector space need not be Euclidean. The projectors P and Q are called complementary (not spelt "complimentary") because P + Q = I. They decompose an arbitrary vector z into two components: Pz is the component parallel to c, and Qz is the component in the plane rᵀx = 0. Exercise: Verify PQ = QP = O; what does this mean geometrically?

Elementary Reflectors

A reflector is a linear operator W that satisfies W² = I ≠ W. (Some writers call W a reflection or, if British, a reflexion.) An example is −I, which reflects through the origin o: it reflects z to −z. All elementary reflectors have the form W := I − 2c·rᵀ/(rᵀc) with rᵀc ≠ 0, so W = I − 2P = 2Q − I. Every elementary reflector leaves a plane unchanged; the plane's equation is rᵀx = 0, since Wz = z for every vector z in this plane. Every vector z not in this plane has a nonzero component Pz parallel to c pointing out of this plane; W reverses that component: z = Pz + Qz, so Wz = WPz + WQz = −Pz + Qz.

  For example, take rᵀ = (−1 1 0) and c = (−1 1 0)ᵀ, so W = [ 0 1 0 ; 1 0 0 ; 0 0 1 ]. This W is a permutation matrix that swaps the first two elements of a (row or column) vector. In general, every permutation of two elements is a reflector, leaving unchanged the plane of vectors with those elements equal, and reversing vectors with only those elements nonzero and of opposite sign. In most introductory texts reflectors are all orthogonal reflectors with c = rᵀᵀ in an Euclidean space, but our reflectors work in a more general Affine space and can reverse any direction c not in the mirror-plane rᵀx = 0. Draw a picture!

Why are Elementary Reflectors and Projectors called "Elementary"?

Let C = (c₁ c₂ ... c_k) be a row of k linearly independent vectors c_j drawn from an n-dimensional vector space; we assume 0 < k < n. This C serves as a basis for a k-dimensional subspace S of the n-dimensional space. Many a basis C' = (c₁ c₂ ... c_k  c_{k+1} ... c_n) for the n-space can
be built up from C by augmenting it successively by vectors c_{k+1}, c_{k+2}, ..., c_n, each chosen to be linearly independent of all its predecessors though otherwise arbitrary. Let Rᵀ be the column of the first k linear functionals (rows) r₁ᵀ, r₂ᵀ, ..., r_kᵀ in C'⁻¹, whence it follows that RᵀC = I_k, the k-by-k identity matrix, and Rᵀ annihilates the appended columns c_{k+1}, ..., c_n. (Can you see why?) Finally let P := C·Rᵀ. It is a projector because O ≠ P² = P ≠ I. (Why is P ≠ I?) Complementary to P is the projector Q := I − P, and W := I − 2P = 2Q − I is a reflector because W² = I ≠ W. (What is the inverse of a reflector W?)

Q and W leave unchanged the (n−k)-dimensional subspace S' spanned by C'' := (c_{k+1} ... c_n), which turns out to be also the plane of all vectors s that satisfy Rᵀs = o. (Can you see why?) In short, the n-space has been decomposed into a sum of two complementary subspaces S and S', neither of which determines the other (because they need not be orthogonal); projector P collapses the space into S along lines parallel to S', the complementary projector Q collapses the space into S' along lines parallel to S, and reflector W reflects the space in the mirror S' along lines parallel to S. Although these operators preserve subspaces more general than the lines or planes preserved by elementary operators, these non-elementary operators can be decomposed into elementary ones, as we shall see now.

Let P_j := c_j·r_jᵀ for j = 1, 2, ..., k. Evidently P_j = P_j² is an elementary projector; moreover P_i·P_j = O if i ≠ j (can you see why?). These elementary projectors are said to annihilate each other. Thus P = P₁ + P₂ + ... + P_k is the sum of elementary and mutually annihilating projectors. Similarly W is a product of k elementary reflectors W_j := I − 2P_j in any order; confirm that they commute. Finally we have a nontrivial theorem: Every projector can be decomposed into a sum of elementary and mutually annihilating projectors. Every reflector can be decomposed into a product of elementary and commuting reflectors. Every reflector or projector decomposes the space into complementary subspaces S and S' as above. This could be proved now, but its proof will become simpler after we have studied eigenvalues and eigenvectors.
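The construction above is easy to replay numerically. The sketch below, assuming numpy and using a random basis of my own choosing, builds P = C·Rᵀ, decomposes it into mutually annihilating elementary projectors, and checks the associated reflector.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 3
Cfull = rng.standard_normal((n, n))   # a basis C' for the n-space (invertible w.p. 1)
C = Cfull[:, :k]                      # C = (c_1 ... c_k), a basis for the subspace S
RT = np.linalg.inv(Cfull)[:k, :]      # R^T = first k rows of C'^{-1}
assert np.allclose(RT @ C, np.eye(k))            # R^T C = I_k
assert np.allclose(RT @ Cfull[:, k:], 0.0)       # R^T annihilates c_{k+1}..c_n

P = C @ RT
assert np.allclose(P @ P, P)                     # O != P^2 = P != I

# P decomposes into elementary, mutually annihilating projectors P_j = c_j r_j^T:
Pj = [np.outer(C[:, j], RT[j, :]) for j in range(k)]
assert np.allclose(sum(Pj), P)
for i in range(k):
    for j in range(k):
        if i != j:
            assert np.allclose(Pj[i] @ Pj[j], 0.0)

W = np.eye(n) - 2 * P                 # the corresponding reflector
assert np.allclose(W @ W, np.eye(n))  # W^2 = I
```
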
Elementary Dilatators (Expansions and Compressions)

For any elementary projector P = c·rᵀ/(rᵀc) as defined above, and for any nonzero scalar μ, let M_μ := I + (μ−1)·P = Q + μ·P. The elementary reflector W = M_{−1} is a special case. In general the dilatator M_μ expands (if |μ| > 1) or compresses (if |μ| < 1) the space parallel to c while keeping the plane rᵀx = 0 unchanged. If μ < 0 the dilatator also reflects the space in that unchanged plane. Other writers use "dilator", "dilation", "dilatation", "expander", "expansion", "compressor" or "compression" in place of "dilatator", or restrict them to Euclidean space with c = rᵀᵀ orthogonal (perpendicular) to the plane rᵀz = 0 left unchanged. Our c need not be orthogonal. Draw pictures to illustrate diverse dilatators! Exercise: Confirm that M_μ·M_ν = M_{μν}, and therefore M_μ⁻¹ = M_{1/μ}.

  Example: For rᵀ = (0 1 0) and c = (0 1 0)ᵀ, M_μ = [ 1 0 0 ; 0 μ 0 ; 0 0 1 ] is the elementary operation that multiplies the second row (or column) of a 3-by-3 matrix by μ. Every elementary row operation that multiplies a row by a nonzero constant is a dilatator analogous to this one.

Elementary Shears

Choose now any nonzero vector s and functional tᵀ constrained by tᵀs = 0. In other words, choose first any plane tᵀx = 0 through o, and then any nonzero vector s in that plane. Now an elementary shear S_μ := I + μ·s·tᵀ slides an arbitrary vector z to y = S_μ·z by adding to z a multiple of s proportional to the distance between the two parallel planes tᵀx = 0 and tᵀx = tᵀz, one through o and the other through z. Draw pictures! A good way to visualize the effect of an elementary shear is to imagine a deck of playing cards stacked straight, so that the stack's sides look like rectangles. Shearing the deck slides the cards, keeping their respective edges parallel, in such a way that the stack's sides remain parallelograms; the bottom card stays fixed in the plane tᵀx = 0. Exercise: Verify S_μ·S_β = S_{μ+β}, and therefore S_μ⁻¹ = S_{−μ}.

  Example: For tᵀ = (0 1 0) and s = (σ 0 −τ)ᵀ, S_μ = I + μ·s·tᵀ = [ 1 σμ 0 ; 0 1 0 ; 0 −τμ 1 ] is the elementary row operation that adds σμ times the second row to the first, and −τμ times the second row to the third, of a 3-by-3 matrix.
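Both examples, and the two exercises, can be checked directly; a minimal numpy sketch, with arbitrary values for μ, σ and τ:

```python
import numpy as np

# Dilatator M_mu = I + (mu - 1) P for r^T = (0 1 0), c = (0 1 0)^T:
r = np.array([0.0, 1.0, 0.0])
c = np.array([0.0, 1.0, 0.0])
P = np.outer(c, r) / np.dot(r, c)
def M(mu):
    return np.eye(3) + (mu - 1.0) * P
assert np.allclose(M(7.0), np.diag([1.0, 7.0, 1.0]))   # multiplies row 2 by mu
assert np.allclose(M(2.0) @ M(3.0), M(6.0))            # M_mu M_nu = M_{mu nu}
assert np.allclose(np.linalg.inv(M(4.0)), M(0.25))     # M_mu^{-1} = M_{1/mu}

# Shear S_mu = I + mu s t^T with t^T s = 0, for t^T = (0 1 0), s = (sigma 0 -tau)^T:
sigma, tau, mu = 0.7, 1.3, 2.0
s = np.array([sigma, 0.0, -tau])
t = np.array([0.0, 1.0, 0.0])
assert np.isclose(np.dot(t, s), 0.0)                   # s lies in the plane t^T x = 0
def S(m):
    return np.eye(3) + m * np.outer(s, t)
expected = np.array([[1.0, sigma * mu, 0.0],
                     [0.0, 1.0,        0.0],
                     [0.0, -tau * mu,  1.0]])
assert np.allclose(S(mu), expected)           # adds multiples of row 2 to rows 1, 3
assert np.allclose(S(1.0) @ S(2.5), S(3.5))   # S_mu S_beta = S_{mu+beta}
assert np.allclose(np.linalg.inv(S(mu)), S(-mu))
```
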
Every elementary row operation that adds multiples of one row to others is analogously premultiplication by a shear. The inverse operation (subtracting) is a shear too; such operations upon columns amount to postmultiplications by shears. Some writers restrict an elementary shear to have only one nonzero element in s and one in tᵀ, but this restriction gains nothing. What is common to all elementary operators? Each of them, of every kind, is determined by an operator of rank 1.

Conclusion

Every nonsingular (invertible) square matrix is a product of Elementary Reflectors, Dilatators and Shears. This is so because a product of these elementary row operations suffices to reduce the matrix to its Reduced Row Echelon Form, which must be the identity matrix I if the original matrix is invertible; hence the original matrix is a product of those operations' inverses, all of them elementary too. Similarly, every invertible linear operator mapping a finite-dimensional space to itself is a product of Elementary Reflectors, Dilatators and Shears. This is so because it is true for the square nonsingular matrix that represents the linear operator in any basis for the space. Work through the details! The following may help.

REVIEW: Change of Basis as a Nonsingular Matrix

Suppose B is one basis for a vector space and E another. Since every basis vector e in E is expressible in terms of a column vector c = B⁻¹e, collecting those columns produces a matrix C = B⁻¹E that appears in changes of coordinates as follows. Let x = Bx be any vector in the space and x its column vector of coordinates in the basis B. Since E is a basis too, x = Ex' for some other column vector x'. The relation between x and x' is this: Bx = Ex', so x = B⁻¹Ex' = Cx'. And C⁻¹ exists because x' = E⁻¹x = E⁻¹Bx = C⁻¹x is determinable from every x.
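These coordinate-change rules, together with the rules for functionals and operators reviewed next, can be sketched numerically (numpy, with random bases of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
C = rng.standard_normal((n, n))     # change-of-basis matrix, E = B C (invertible w.p. 1)
x = rng.standard_normal(n)          # coordinates of some vector in basis B
uT = rng.standard_normal(n)         # row representing a functional in basis B

x_new = np.linalg.solve(C, x)       # x' = C^{-1} x, coordinates in basis E
uT_new = uT @ C                     # u'^T = u^T C, the functional's row in basis E
assert np.isclose(uT @ x, uT_new @ x_new)   # u^T x is basis-independent

# An operator's matrix changes by similarity, L' = C^{-1} L C, and y' = L' x':
L = rng.standard_normal((n, n))
y = L @ x
L_new = np.linalg.solve(C, L @ C)
assert np.allclose(L_new @ x_new, np.linalg.solve(C, y))   # y' = C^{-1} y
```
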
Conversely, any invertible matrix C is the matrix that changes coordinates to x' = E⁻¹x, with one basis E = BC, from x = B⁻¹x with another basis B; and x = Cx'. Exercise: If uᵀ = uᵀB⁻¹ in one basis B, but uᵀ = u'ᵀE⁻¹ in another basis E = BC, how do we compute u'ᵀ given uᵀ and C? How do we compute uᵀx = u'ᵀx' given uᵀ and x and C?

Now let L be any linear operator that maps the vector space to itself. This L is representable by a matrix L determined from a basis B of the vector space thus: if x = Bx maps to y = Lx, representable by column vector y = B⁻¹y, then y = B⁻¹Lx = B⁻¹LBx = Lx for the matrix L = B⁻¹LB. A change of basis from B to E = BC changes the matrix representing L from L to L' = E⁻¹LE = C⁻¹LC, and the coordinates of y become y' = L'x'; can you confirm this? Exercise: If linear operator Q maps a space with basis B linearly to another space with basis H (of perhaps different dimension), what matrix Q represents Q in these bases?

Thus a nonsingular matrix can represent a few things: a change of basis, an invertible map from a space to itself, an invertible map from one space to another of the same dimension. Out of context, a matrix does not say what it represents. The mathematically interesting question about it is this: Given a matrix that represents a linear map between two spaces, and given a characterization of the spaces but not the relevant basis or bases, what geometrical properties of the linear map can be inferred without knowing the basis or bases?

Math H110: How to Recognize a Quadratic Form (Prof. W. Kahan, October 31, 2000)

What is a Quadratic Form?

It is a scalar-valued functional f(v) := ⟨v,v⟩ obtained from a Symmetric Bilinear Form ⟨u,v⟩, a functional that is both symmetric ( ⟨u,v⟩ = ⟨v,u⟩ ) and linear ( ⟨u, μv + βw⟩ = μ·⟨u,v⟩ + β·⟨u,w⟩ for all real scalars μ and β and all vectors u, v and w ) in a real vector space. If column vectors u and v represent vectors u = Bu and v = Bv in some basis B, then ⟨u,v⟩ = uᵀYv = vᵀYu for some symmetric matrix Y = Yᵀ, and f(v) = vᵀYv. The symmetric bilinear operator maps the space of vectors u and v linearly to its dual space: ⟨u,·⟩ is a linear
functional in the space dual to vectors v, and ⟨u,v⟩ = ⟨v,u⟩ is its scalar value; ⟨·,v⟩ is a linear functional in the space dual to vectors u, and ⟨u,v⟩ = ⟨v,u⟩ is its scalar value. (See the notes on Least-Squares Approximation and Bilinear Forms.) Matrix Y represents this operator in the basis B, representing ⟨u,·⟩ by uᵀY and ⟨·,v⟩ by vᵀY. Changing B to a new basis BC⁻¹ changes u's representative u to Cu (in order to keep u = BC⁻¹·Cu), and similarly changes v to Cv; and then, to keep ⟨u,v⟩ = (Cu)ᵀ·C⁻ᵀYC⁻¹·(Cv) unchanged, Y changes to C⁻ᵀYC⁻¹. In other words, the Congruent matrices Y and C⁻ᵀYC⁻¹ represent the same symmetric bilinear form and the same quadratic form f in different coordinate systems (bases).

A quadratic form f(v) = Z(v,v) is obtainable from any non-symmetric bilinear form Z(u,v) ≠ Z(v,u), represented by a non-symmetric matrix Z thus: f(v) = vᵀZv even though uᵀZv ≠ vᵀZu. No good purpose is served this way; f depends upon only the symmetric part of Z, defined thus: ⟨u,v⟩ := ( Z(u,v) + Z(v,u) )/2, and so Y = (Z + Zᵀ)/2. Moreover, only the symmetric bilinear form can be recovered from f, via the identity ⟨u,v⟩ = ( f(u+v) − f(u−v) )/4.

Exercise: Suppose Q is a matrix, perhaps neither square nor real, that satisfies (Qx)ᵀ(Qx) = xᵀx for all real column vectors x of the right dimension. Must QᵀQ = I? Why?

Another way to recover a symmetric bilinear form from its quadratic form is by differentiation: df(v) = f'(v)dv = 2⟨v,dv⟩. In other words, the derivative of a quadratic form f(v) is a linear functional f'(v) (of course) that is also linear in the form's argument v, and symmetric in so far as f'(v)u = f'(u)v. In matrix terms, f'(v) = 2vᵀY and f'(v)u = 2vᵀYu = 2uᵀYv = f'(u)v. Conversely, if y(v) satisfies y(o) = 0 and if its derivative y'(v) = 2⟨v,·⟩ is linear in v, then y''(v) = 2⟨·,·⟩ is a constant symmetric bilinear operator (symmetric because all continuous second derivatives are symmetric), and y(v) = ∫ₒᵛ y'(u)·du = ⟨v,v⟩ is a quadratic form. In matrix terms: y'(v)u = 2vᵀYu is linear in v; y''(v)(u,w) = 2wᵀYu = 2uᵀYw = y''(v)(w,u) because Y = Yᵀ must be symmetric; and y(v) = ∫ₒᵛ 2uᵀY·du = vᵀYv regardless of the path of integration. In short, a quadratic form can be recognized as such by determining whether its derivative depends linearly on the form's argument.

Quadratic forms can be defined over complex vector spaces, but in two ways. There are complex quadratic forms that take complex scalar values, algebraically the same as above but quite different from what follows. And there are real-valued quadratic forms, very much like what follows, but each obtained from an Hermitian bilinear form H(u,v) that is linear in one argument, say u, and Conjugate-Linear in the other: H(u, μv + βw) = μ̄·H(u,v) + β̄·H(u,w), where μ̄ and β̄ are the complex conjugates of μ and β. A bilinear operator H is Hermitian just when H(v,u) is the complex conjugate of H(u,v). In matrix terms H(u,v) = v*Hu with an Hermitian matrix H = H* (its complex-conjugate transpose); H is congruent to C⁻*HC⁻¹. The Hermitian quadratic form f(v) := ⟨v,v⟩ = v*Hv obtained from H is real for all complex v, and consequently slightly more complicated than the real quadratic forms on real spaces to which this note is confined.
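Before moving on, the recovery identity ⟨u,v⟩ = ( f(u+v) − f(u−v) )/4 is easy to check numerically; a sketch assuming numpy, with a random non-symmetric Z of my own choosing:

```python
import numpy as np

# f(v) = v^T Z v depends only on the symmetric part Y = (Z + Z^T)/2, and the
# symmetric bilinear form is recovered from f by <u,v> = (f(u+v) - f(u-v))/4.
rng = np.random.default_rng(4)
n = 4
Z = rng.standard_normal((n, n))     # deliberately non-symmetric
Y = (Z + Z.T) / 2

def f(v):
    return v @ Z @ v

u = rng.standard_normal(n)
v = rng.standard_normal(n)
assert np.isclose(f(v), v @ Y @ v)                    # f sees only Y
assert np.isclose((f(u + v) - f(u - v)) / 4, u @ Y @ v)   # <u,v> = u^T Y v

# The whole matrix Y is recoverable entry by entry from f alone:
I = np.eye(n)
Yrec = np.array([[(f(I[i] + I[j]) - f(I[i] - I[j])) / 4 for j in range(n)]
                 for i in range(n)])
assert np.allclose(Yrec, Y)
```
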
by determining whether its derivative depends linearly on the form s argument Quadratic forms can be defined over complex vector spaces but in two ways There are complex quadratic forms that take complex scalar values algebraically the same as above but quite different from what follows There are realvalued quadratic forms very much like what follows but each obtained from an H emiiian bilinear form Huv that is linear in one argument say u and ConjugateLinear in the other Huuv Bw EHuv BHuw where E and B are the complex conjugates of LL and B Bilinear operator H is Hermitian just when Hvu HIE In matrix terms Huv vHu with an Hermitian matrix H H its complexconjugate transpose H is congruent to C 1HC 1 The Hermitian quadratic form fv va VHv obtained from H is real for all complex v and consequently slightly more complicated than real quadratic forms on real spaces to which this note is confined Prof W Kahan Page 12 The r n nm an xxvna quotAnnl ML 17mm AI Anlmr A n A Math H110 How to Recognize a Quadratic Form October 31 2000 647 am Every quadratic form f satis es an identity called the Parallelogram Law fx y fx 7 y 7 2fx my This law is easy to deduce when f is obtained from a given bilinear form do so The law gets its name from the case Wfv HvH of ordinary length de ned in an Euclidean space by the Pythagorean formula Conversely Theorem Any real continuous scalar function f that satis es the parallelogram law for all vectors x and y in a real linear space must be a quadratic form This seems plausible setting x y 0 implies f0 0 and then setting x 0 and letting y 7gt 0 in fy 7 f7y 0 implies f390 0T if the derivative exists and then doing it again with fx y 7 2fx fx 7 y 2fy suggests that fquotx is independent of x But a suggestion is not a proof The Theorem s proof found by C Jordan and J von Neumann early in the 20th century is unusual enough to be worth reproducing here First we need a Lemma If a continous scalar functional 9x satis es 90 0 and cxy 9x7y 29x for all 
Proof: Start by discovering that

  φ(x+y) = φ( (x+y)/2 + (x+y)/2 ) + φ( (x+y)/2 − (x+y)/2 )    ... since φ(o) = 0
         = 2·φ( (x+y)/2 )                                     ... by hypothesis
         = φ( (x+y)/2 + (x−y)/2 ) + φ( (x+y)/2 − (x−y)/2 )    ... by hypothesis
         = φ(x) + φ(y)   for all vectors x and y.

Next, for positive integers n = 1, 2, 3, ... in turn, use this discovery to verify by induction that φ(nx) = φ((n−1)x) + φ(x) = ((n−1) + 1)·φ(x) = n·φ(x). Then from 0 = φ(nx − nx) = φ(nx) + φ(−nx), verify that φ(−nx) = −n·φ(x) for all x. And from φ(x) = φ(n·(x/n)) = n·φ(x/n), infer that φ(x/n) = φ(x)/n. Similarly φ((m/n)·x) = (m/n)·φ(x) for every rational m/n. Since every real number μ is a limit of rational numbers, and since φ is continuous, φ(μx) = μ·φ(x) for every real μ and every vector x. This ensures that φ(μx + βy) = μ·φ(x) + β·φ(y), confirming that φ must be a linear functional, φ(x) = cᵀx for some c. End of lemma's proof.

To prove the Theorem, take any given scalar function f that satisfies the parallelogram law above, and from f construct the functional ⟨x,y⟩ := ( f(x+y) − f(x−y) )/4. Evidently ⟨x,o⟩ = 0, and ⟨x,x⟩ = f(2x)/4 = f(x), and ⟨y,x⟩ = ⟨x,y⟩, via the Parallelogram Law. Moreover,

  4⟨z,x⟩ + 4⟨z,y⟩ = f(z+x) − f(z−x) + f(z+y) − f(z−y)
                  = f( (z + (x+y)/2) + (x−y)/2 ) + f( (z + (x+y)/2) − (x−y)/2 )
                    − f( (z − (x+y)/2) + (x−y)/2 ) − f( (z − (x+y)/2) − (x−y)/2 )
                  = 2·f( z + (x+y)/2 ) − 2·f( z − (x+y)/2 )        ... Parallelogram Law
                  = 8⟨z, (x+y)/2⟩   for all vectors x, y and z.

Now identify ⟨z,·⟩ with the φ of the lemma to deduce that ⟨·,·⟩ is a linear functional of its second argument, and (because ⟨x,y⟩ = ⟨y,x⟩) also a linear functional of its first. In short, ⟨·,·⟩ is a symmetric bilinear functional of its two arguments; in any coordinate system in which x and y are represented by column vectors x and y, we conclude ⟨x,y⟩ = yᵀYx for some real symmetric matrix Y = Yᵀ, as claimed.

Exercise: How do ⟨·,·⟩ and the coordinate system (basis) determine the matrix Y?
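As a closing check, the parallelogram law and the key identity ⟨z,x⟩ + ⟨z,y⟩ = 2⟨z,(x+y)/2⟩ from the proof above can be verified numerically for any quadratic form; a sketch assuming numpy, with a random symmetric Y of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
Y = rng.standard_normal((n, n))
Y = (Y + Y.T) / 2                      # symmetric matrix: f(v) = v^T Y v

def f(v):
    return v @ Y @ v

def bform(x, y):                       # <x,y> = (f(x+y) - f(x-y))/4
    return (f(x + y) - f(x - y)) / 4

x, y, z = (rng.standard_normal(n) for _ in range(3))
assert np.isclose(f(x + y) + f(x - y), 2 * f(x) + 2 * f(y))      # parallelogram law
assert np.isclose(bform(z, x) + bform(z, y), 2 * bform(z, (x + y) / 2))
assert np.isclose(bform(x, y), x @ Y @ y)                        # recovers x^T Y y
```
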
