Note for GEOS 567 with Professor Richardson at UA
Geosciences 567, Chapter 3 (RMR/GZ)

CHAPTER 3: INVERSE METHODS BASED ON LENGTH

3.1 Introduction

This chapter is concerned with inverse methods based on the length of various vectors that arise in a typical problem. The two most common such vectors are the data-error (misfit) vector and the model-parameter vector. Methods based on the first give rise to classic least squares solutions, while methods based on the second give rise to what are known as minimum length solutions. Improvements over simple least squares and minimum length solutions include the use of information about noise in the data and a priori information about the model parameters; these are known as weighted least squares and weighted minimum length solutions, respectively. The chapter ends with material on how to handle constraints and on the variances of the estimated model parameters.

3.2 Data Error and Model Parameter Vectors

The data-error and model-parameter vectors play an essential role in the development of inverse methods. They are given by

    data error vector:  e = d^obs - d^pre    (3.1)

and

    model parameter vector:  m    (3.2)

The dimension of the error vector e is N x 1, while the dimension of the model parameter vector m is M x 1. In order to utilize these vectors, we next consider the notion of the size, or length, of vectors.

3.3 Measures of Length

The norm of a vector is a measure of its size, or length. There are many possible definitions of norms; we are most familiar with the Cartesian L2 norm. Some examples of norms follow:

    L1 norm:  ||e||_1 = sum_{i=1}^{N} |e_i|    (3.3)

    L2 norm:  ||e||_2 = [sum_{i=1}^{N} |e_i|^2]^{1/2}    (3.4)

    Ln norm:  ||e||_n = [sum_{i=1}^{N} |e_i|^n]^{1/n}    (3.5)

and finally

    L-infinity norm:  ||e||_inf = max_i |e_i|    (3.6)

Inverse methods based on different norms can, and often do, give different answers. The reason is that different norms give different weight to outliers. For example, the L-infinity norm gives all the weight to the largest misfit, while low-order norms give more nearly equal weight to errors of different sizes. The L2 norm gives the familiar Cartesian length of a vector.
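As a quick numerical illustration of how the choice of norm weights outliers (a sketch assuming NumPy; the misfit values are invented, not from the notes):

```python
import numpy as np

# Misfit vector with one outlier in the last entry.
e = np.array([0.1, -0.2, 0.1, 3.0])

L1 = np.sum(np.abs(e))          # equal weight to each residual
L2 = np.sqrt(np.sum(e**2))      # Cartesian length; outliers count quadratically
Linf = np.max(np.abs(e))        # all weight on the single largest misfit

print(L1, L2, Linf)             # 3.4, ~3.01, 3.0
```

The L-infinity norm sees only the outlier, while the L1 norm treats all four residuals on an equal footing; L2 sits in between.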
Consider the total misfit E between observed and predicted data. It has units of length squared, and can be found either as the square of the L2 norm of the error vector e (Equation 3.1) or, equivalently, as the dot (inner) product of e with itself:

    E = e^T e = [e_1 e_2 ... e_N][e_1; e_2; ...; e_N] = sum_{i=1}^{N} e_i^2    (3.7)

Inverse methods based on the L2 norm are also closely tied to the notion that errors in the data have Gaussian statistics. They give considerable weight to large errors, which would be considered unlikely if the errors were in fact distributed in Gaussian fashion.

Now that we have a way to quantify the misfit between predicted and observed data, we are ready to define a procedure for estimating the values of the elements of m. The procedure is to take the partial derivative of E with respect to each element of m and set the resulting equations to zero. This produces a system of M equations that can be manipulated in a way that, in general, leads to a solution for the M elements of m. The next section shows how this is done for the least squares problem of finding a best-fit straight line to a set of data points.

3.4 Minimizing the Misfit: Least Squares

3.4.1 Least Squares Problem for a Straight Line

Consider the least squares fitting of a straight line to (z, d) pairs (after Figure 3.1 of Menke, page 36): the error e_i for each observation is the difference between the observed and predicted datum, e_i = d_i^obs - d_i^pre. The ith predicted datum d_i^pre for the straight-line problem is given by

    d_i^pre = m_1 + m_2 z_i    (3.8)

where the two unknowns m_1 and m_2 are the intercept and slope of the line, respectively, and z_i is the value along the z axis where the ith observation is made. For N points we have a system of N such equations that can be written in matrix form as

    [d_1; d_2; ...; d_N] = [1 z_1; 1 z_2; ...; 1 z_N][m_1; m_2]    (3.9)

or, in the by now familiar matrix notation,

    d = G m
    (N x 1) = (N x 2)(2 x 1)

The total misfit E is given by

    E = e^T e = sum_{i=1}^{N} (d_i^obs - d_i^pre)^2 = sum_{i=1}^{N} (d_i^obs - m_1 - m_2 z_i)^2

Dropping the "obs"
in the notation for the observed data, we have

    E = sum_{i=1}^{N} [d_i^2 - 2 d_i m_1 - 2 d_i m_2 z_i + 2 m_1 m_2 z_i + m_1^2 + m_2^2 z_i^2]    (3.10)-(3.12)

Then, taking the partials of E with respect to m_1 and m_2, respectively, and setting them to zero yields

    dE/dm_1 = 2 N m_1 - 2 sum d_i + 2 m_2 sum z_i = 0    (3.13)

and

    dE/dm_2 = -2 sum d_i z_i + 2 m_1 sum z_i + 2 m_2 sum z_i^2 = 0    (3.14)

Rewriting Equations (3.13) and (3.14) yields

    N m_1 + m_2 sum z_i = sum d_i    (3.15)

and

    m_1 sum z_i + m_2 sum z_i^2 = sum d_i z_i    (3.16)

Combining the two equations in matrix notation in the form Am = b yields

    [N sum z_i; sum z_i sum z_i^2][m_1; m_2] = [sum d_i; sum d_i z_i]    (3.17)

or simply

    A m = b    (3.18)
    (2 x 2)(2 x 1) = (2 x 1)

Note that by the above procedure we have reduced the problem from one with N equations in two unknowns (m_1 and m_2, in Gm = d) to one with two equations in the same two unknowns (in Am = b). The matrix equation Am = b can also be rewritten in terms of the original G and d once one notices that the matrix A can be factored as

    A = [1 1 ... 1; z_1 z_2 ... z_N][1 z_1; 1 z_2; ...; 1 z_N] = G^T G    (3.19)
    (2 x 2) = (2 x N)(N x 2)

Also, b above can be rewritten similarly as

    b = [1 1 ... 1; z_1 z_2 ... z_N][d_1; d_2; ...; d_N] = G^T d    (3.20)

Thus, substituting Equations (3.19) and (3.20) into Equation (3.17), one arrives at the so-called normal equations for the least squares problem:

    G^T G m = G^T d    (3.21)

The least squares solution m_LS is then found as

    m_LS = (G^T G)^{-1} G^T d    (3.22)

assuming that (G^T G)^{-1} exists.

In summary, we used the forward problem (Equation 3.9) to give us an explicit relationship between the model parameters (m_1, m_2) and a measure of the misfit to the observed data, E. Then we minimized E by taking the partial derivatives of the misfit function with respect to the unknown model parameters, setting the partials to zero, and solving for the model parameters.

3.4.2 Derivation of the General Least Squares Solution

We start with any system of linear equations, which can be expressed in the form

    d = G m    (1.13)
    (N x 1) = (N x M)(M x 1)

Again, let

    E = e^T e = (d - d^pre)^T (d - d^pre) = (d - Gm)^T (d - Gm)    (3.23)

    E = sum_{i=1}^{N} (d_i - sum_j G_ij m_j)(d_i - sum_k G_ik m_k)    (3.24)

As before, the procedure is to write out the above equation with all its
cross terms, take the partials of E with respect to each of the elements of m, and set the corresponding equations to zero. For example, following Menke (page 40, Equations 3.6-3.9), we obtain an expression for the partial of E with respect to m_q:

    dE/dm_q = 2 sum_{k=1}^{M} m_k sum_{i=1}^{N} G_iq G_ik - 2 sum_{i=1}^{N} G_iq d_i = 0    (3.25)

We can simplify this expression by recalling Equation (2.4) from the introductory remarks on matrix manipulation in Chapter 2:

    c_ij = sum_k a_ik b_kj    (2.4)

Note that the first summation on i in Equation (3.25) looks similar in form to Equation (2.4), but the subscripts on the first G term are backwards. If we further note that interchanging the subscripts is equivalent to taking the transpose of G, we see that the summation on i gives the qk-th entry of G^T G:

    sum_{i=1}^{N} G_iq G_ik = sum_{i=1}^{N} [G^T]_qi G_ik = [G^T G]_qk    (3.26)

Thus Equation (3.25) reduces to

    dE/dm_q = 2 sum_{k=1}^{M} m_k [G^T G]_qk - 2 sum_{i=1}^{N} G_iq d_i = 0    (3.27)

Now we can further simplify the first summation by recalling Equation (2.6) from the same section:

    d_i = sum_j G_ij m_j    (2.6)

To see this clearly, we rearrange the order of terms in the first sum as follows:

    sum_{k=1}^{M} m_k [G^T G]_qk = [G^T G]_q1 m_1 + ... + [G^T G]_qM m_M = [G^T G m]_q    (3.28)

which is the qth entry of G^T G m. Note that G^T G m has dimension (M x N)(N x M)(M x 1) = M x 1; that is, it is an M-dimensional vector. In a similar fashion, the second summation on i reduces to [G^T d]_q, the qth entry of the (M x N)(N x 1) = M x 1 dimensional vector G^T d. Thus, for the qth equation, we have

    dE/dm_q = 2 [G^T G m]_q - 2 [G^T d]_q = 0    (3.29)

Dropping the common factor of 2 and combining the q equations into matrix notation, we arrive at

    G^T G m = G^T d    (3.30)

The least squares solution for m is thus given by

    m_LS = (G^T G)^{-1} G^T d    (3.31)

The least squares operator G^{-1}_LS is thus given by

    G^{-1}_LS = (G^T G)^{-1} G^T    (3.32)

Recalling basic calculus, we note that m_LS above is the solution that minimizes E, the total misfit. Summarizing: setting the M partial derivatives of E with respect to the elements of m to zero leads to the least squares solution.

We have just derived the least squares solution by taking the partial derivatives of E with respect to m_q and then combining the terms for q = 1, 2, ..., M.
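The normal-equations recipe (3.31) can be sketched numerically (assuming NumPy; the data values are synthetic, invented for illustration):

```python
import numpy as np

# Synthetic straight-line data: d = 1.0 + 2.0*z (intercept m1, slope m2).
z = np.array([0.0, 1.0, 2.0, 3.0])
d = 1.0 + 2.0 * z

# Build G for d_i = m1 + m2*z_i (Equation 3.9).
G = np.column_stack([np.ones_like(z), z])

# Least squares solution m_LS = (G^T G)^{-1} G^T d (Equation 3.31).
m_ls = np.linalg.inv(G.T @ G) @ (G.T @ d)

print(m_ls)  # [1. 2.]
```

In practice one would use np.linalg.solve(G.T @ G, G.T @ d) or np.linalg.lstsq rather than forming the explicit inverse, which is numerically less stable; the explicit form is used here only to mirror Equation (3.31).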
An alternative but equivalent formulation begins with Equation (3.23), but written out as

    E = (d - Gm)^T (d - Gm)    (3.23)
      = d^T d - d^T G m - m^T G^T d + m^T G^T G m    (3.33)

Then taking the partial derivative of E with respect to m^T turns out to be equivalent to what was done in Equations (3.25)-(3.30) for m_q, namely

    dE/dm^T = -G^T d + G^T G m = 0    (3.34)

which leads to

    G^T G m = G^T d    (3.30)

and

    m_LS = (G^T G)^{-1} G^T d    (3.31)

It is also perhaps interesting to note that we could have obtained the same solution without taking partials. To see this, consider the following four steps.

Step 1. We begin with

    G m = d    (1.13)

Step 2. We premultiply both sides by G^T:

    G^T G m = G^T d    (3.30)

Step 3. We premultiply both sides by (G^T G)^{-1}:

    (G^T G)^{-1} G^T G m = (G^T G)^{-1} G^T d    (3.35)

Step 4. This reduces to

    m_LS = (G^T G)^{-1} G^T d    (3.31)

as before. The point, however, is that this route does not show why m_LS is the solution that minimizes E, the misfit between the observed and predicted data. All of this assumes that (G^T G)^{-1} exists, of course. We will return to the existence and properties of (G^T G)^{-1} later.

Next we will look at two examples of least squares problems, to show a striking similarity that is not obvious at first glance.

3.4.3 Two Examples of Least Squares Problems

Example 1: Best-Fit Straight-Line Problem

We have, of course, already derived the solution for this problem in the last section. Briefly, then, for the system of equations d = Gm given by

    [d_1; d_2; ...; d_N] = [1 z_1; 1 z_2; ...; 1 z_N][m_1; m_2]    (3.9)

we have

    G^T G = [1 1 ... 1; z_1 z_2 ... z_N][1 z_1; ...; 1 z_N] = [N sum z_i; sum z_i sum z_i^2]    (3.36)

and

    G^T d = [1 1 ... 1; z_1 z_2 ... z_N][d_1; ...; d_N] = [sum d_i; sum d_i z_i]    (3.37)

Thus the least squares solution is given by

    [m_1; m_2]_LS = [N sum z_i; sum z_i sum z_i^2]^{-1} [sum d_i; sum d_i z_i]    (3.38)

Example 2: Best-Fit Parabola Problem

The ith predicted datum for a parabola is given by

    d_i = m_1 + m_2 z_i + m_3 z_i^2    (3.39)

where m_1 and m_2 have the same meanings as in the straight-line problem and m_3 is the coefficient of the quadratic term. Again, the problem can be written in the form d = Gm, where now we have

    [d_1; d_2; ...; d_N] = [1 z_1 z_1^2; 1 z_2 z_2^2; ...; 1 z_N z_N^2][m_1; m_2; m_3]    (3.40)

and
    G^T G = [N sum z_i sum z_i^2; sum z_i sum z_i^2 sum z_i^3; sum z_i^2 sum z_i^3 sum z_i^4],    G^T d = [sum d_i; sum d_i z_i; sum d_i z_i^2]    (3.41)

As before, we form the least squares solution as

    m_LS = (G^T G)^{-1} G^T d    (3.31)

Although the forward problems of predicting data for the straight-line and parabolic cases look very different, the least squares solution is formed in a way that emphasizes the fundamental similarity between the two problems. For example, notice how the straight-line problem is buried within the parabola problem: the upper-left 2 x 2 part of G^T G in Equation (3.41) is the same as Equation (3.36), and the first two entries of G^T d in Equation (3.41) are the same as Equation (3.37).

Next we consider a four-parameter example.

3.4.4 Four-Parameter Tomography Problem

Finally, let's consider a four-parameter problem, this one based on the concept of tomography.

(Figure: a 2 x 2 grid of blocks with velocities v_1, v_2, v_3, v_4, each block of width h; travel times t_1 and t_2 are measured along the two horizontal rows, and t_3 and t_4 along the two vertical columns.)

The travel times are of the form t_1 = h/v_1 + h/v_2 = h(s_1 + s_2), where s_i = 1/v_i is the slowness of the ith block.    (3.42)

In matrix form,

    [t_1; t_2; t_3; t_4] = h [1 1 0 0; 0 0 1 1; 1 0 1 0; 0 1 0 1][s_1; s_2; s_3; s_4]    (3.43)

    d = G m    (1.13)

Then

    G^T G = h^2 [2 1 1 0; 1 2 0 1; 1 0 2 1; 0 1 1 2]    (3.44)

and

    G^T d = h [t_1 + t_3; t_1 + t_4; t_2 + t_3; t_2 + t_4]    (3.45)

So the normal equations are

    G^T G m = G^T d    (3.21)

    h^2 [2 1 1 0; 1 2 0 1; 1 0 2 1; 0 1 1 2][s_1; s_2; s_3; s_4] = h [t_1 + t_3; t_1 + t_4; t_2 + t_3; t_2 + t_4]    (3.46)

or

    h [2 1 1 0; 1 2 0 1; 1 0 2 1; 0 1 1 2][s_1; s_2; s_3; s_4] = [t_1 + t_3; t_1 + t_4; t_2 + t_3; t_2 + t_4]    (3.47)

Example: if s_1 = s_2 = s_3 = s_4 = 1 and h = 1, then t_1 = t_2 = t_3 = t_4 = 2. By inspection, s = (1, 1, 1, 1)^T is a solution, but so is s_1 = s_4 = 2, s_2 = s_3 = 0, or s_1 = s_4 = 0, s_2 = s_3 = 2.

Solutions are nonunique. Look back at G: are all of the columns (or rows) independent? No. What does that imply about G and G^T G? Rank < 4. What does that imply about (G^T G)^{-1}? It does not exist. So does m_LS exist? No.

Other ways of saying this: the vectors g_i do not span the space of m, or, the experimental setup is not sufficient to uniquely determine the solution. Note that this analysis can be done without any data, based strictly on the experimental design.

Another way to look at it: are the columns of G independent? No. For example, the coefficients 1, -1, -1, and 1 will make the columns of G sum to zero. What pattern does that suggest is not resolvable?
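This design analysis is easy to check numerically (a sketch assuming NumPy; h = 1):

```python
import numpy as np

# Ray-path matrix for the four-parameter tomography problem (h = 1).
G = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)

print(np.linalg.matrix_rank(G))   # 3, not 4: the columns are dependent
print(np.linalg.det(G.T @ G))     # 0 to rounding: (G^T G)^{-1} does not exist

# Columns combined with coefficients (1, -1, -1, 1) sum to zero,
# so this model-space pattern is invisible to the experiment.
null_dir = np.array([1.0, -1.0, -1.0, 1.0])
print(G @ null_dir)               # [0. 0. 0. 0.]
```

Note that none of this required any data d: the rank deficiency is a property of the experimental design alone.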
Now that we have derived the least squares solution and considered some examples, we next turn our attention to the determinacy of the system of equations given by Equation (1.13):

    d = G m    (1.13)

This will begin to permit us to classify systems of equations based on the nature of G.

3.5 Determinacy of Least Squares Problems (See Pages 46-52, Menke)

3.5.1 Introduction

We have seen that the least squares solution to d = Gm is given by

    m_LS = (G^T G)^{-1} G^T d    (3.31)

There is no guarantee, as we saw in Section 3.4.4, that the solution even exists. It fails to exist when the matrix G^T G has no mathematical inverse. We note that G^T G is square (M x M), and it is at least mathematically possible to consider inverting G^T G. N.B.: the dimension of G^T G is M x M, independent of the number of observations made.

Mathematically, we can say that G^T G has an inverse, and that it is unique, when G^T G has rank M. The rank of a matrix was considered in Section 2.2.3. Essentially, if G^T G has rank M, then it has enough information in it to resolve M things, in this case model parameters. This happens when all M rows, or equivalently (since G^T G is square) all M columns, are independent. Recall also that independent means you cannot write any row (or column) as a linear combination of the other rows (columns).

G^T G will have rank < M if the number of observations N is less than M. Menke gives the example (pp. 45-46) of the straight-line fit to a single data point as an illustration. If (G^T G)^{-1} does not exist, an infinite number of estimates will all fit the data equally well. Mathematically, G^T G has rank < M if |G^T G| = 0, where |G^T G| is the determinant of G^T G.

Now let us introduce Menke's nomenclature, based on the nature of G^T G and on the prediction error. In all cases, the number of model parameters is M and the number of observations is N.

3.5.2 Even-Determined Problems (M = N)

If a solution exists, it is unique, and the prediction error d^obs - d^pre is identically zero. For example,
    [1 1; 0 1][m_1; m_2] = [4; 3]    (3.48)

for which the solution is m = (1, 3)^T.

3.5.3 Overdetermined Problems (Typically N > M)

With more observations than unknowns, one typically cannot fit all the data exactly. The least squares problem falls in this category. Consider the following example:

    [1 1; 0 1; 1 -1][m_1; m_2] = [4; 3; 1]    (3.49)

This overdetermined case consists of adding one equation to Equation (3.48) in the previous example. The least squares solution is m_LS = (2.5, 2)^T; the data can no longer be fit exactly.

3.5.4 Underdetermined Problems (Typically M > N)

With more unknowns than observations, m has no unique solution. A special case of the underdetermined problem occurs when you can fit the data exactly, which is called the purely underdetermined case. The prediction error for the purely underdetermined case is exactly zero; i.e., the data can be fit exactly. An example of such a problem is

    [2 1][m_1; m_2] = 1    (3.50)

Possible solutions include (0, 1)^T, (0.5, 0)^T, (5, -9)^T, (1/3, 1/3)^T, and (0.4, 0.2)^T. The solution with the minimum length, in the L2-norm sense, is (0.4, 0.2)^T.

The following example, however, is also underdetermined, but no choice of m_1, m_2, m_3 will produce zero prediction error; thus it is not purely underdetermined:

    [1 2 1; 2 4 2][m_1; m_2; m_3] = [1; 3]    (3.51)

You might want to verify the above examples. Can you think of others?

Although I have stated that overdetermined (underdetermined) problems typically have N > M (N < M), it is important to realize that this is not always the case. Consider the following:

    [1; 2; 3; 4] = [1 0 0; 1 0 0; 0 1 1; 0 2 2][m_1; m_2; m_3]    (3.52)

For this problem, m_1 is overdetermined; that is, no choice of m_1 can exactly fit both d_1 and d_2 (unless d_1 happens to equal d_2). At the same time, m_2 and m_3 are underdetermined. This is the case even though there are two equations (i.e., the last two) in only two unknowns (m_2, m_3): the two equations are not independent, since two times the next-to-last row of G equals the last row. Thus this problem is both overdetermined and underdetermined at the same time. For this reason, I am not very satisfied with Menke's nomenclature. As we will see later, when we deal with vector spaces, the key will be the singular values, much like eigenvalues, and associated eigenvectors for the matrix G.
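The purely underdetermined example (3.50) can be explored numerically (a sketch assuming NumPy); the pseudoinverse returns precisely the minimum-length member of the infinite family of exact-fit solutions:

```python
import numpy as np

G = np.array([[2.0, 1.0]])   # one equation, two unknowns: 2*m1 + m2 = 1
d = np.array([1.0])

# Several of the infinitely many solutions that fit the datum exactly:
for m in ([0.0, 1.0], [0.5, 0.0], [5.0, -9.0], [0.4, 0.2]):
    assert np.allclose(G @ m, d)

# The pseudoinverse picks the minimum L2-norm solution among them:
m_min = np.linalg.pinv(G) @ d
print(m_min)  # [0.4 0.2]
```

For this 1 x 2 system, np.linalg.pinv(G) equals G^T (G G^T)^{-1}, anticipating the minimum length operator derived in Section 3.6.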
3.6 Minimum Length Solution

The minimum length solution arises from the purely underdetermined case (N < M), in which the data can be fit exactly. In this section we will develop the minimum length operator using Lagrange multipliers, borrowing the basic ideas of minimizing the length of a vector introduced in Section 3.4 on least squares.

3.6.1 Background Information

We begin with two pieces of information:

1. First, (G^T G)^{-1} does not exist. Therefore, we cannot calculate the least squares solution m_LS = (G^T G)^{-1} G^T d.

2. Second, the prediction error e = d^obs - d^pre is exactly equal to zero.

To solve underdetermined problems, we must add information that is not already in G. This is called a priori information. Examples might include the constraint that density be greater than zero for rocks, or that v, the seismic P-wave velocity at the Moho, falls within the range 5 < v < 10 km/s, etc.

Another a priori assumption is called solution simplicity: one seeks solutions that are as simple as possible. By analogy to seeking a solution with the simplest misfit to the data (i.e., the smallest, as in the least squares problem), one can seek a solution that minimizes the total length of the model parameter vector m.

At first glance, there may not seem to be any reason to do this. It does make sense in some cases, however. Suppose, for example, that the unknown model parameters are the velocities of points in a fluid. A solution that minimized the length of m would also minimize the kinetic energy of the system; thus it would be appropriate in this case to minimize m. It also turns out to be a nice property when one is doing nonlinear problems and the m one is using is actually a vector of changes to the solution at the previous step; then it is nice to have small step sizes.

The requirement of solution simplicity will lead us, as shown later, to the so-called
minimum length solution.

3.6.2 Lagrange Multipliers (See Page 50 and Appendix A.1, Menke)

Lagrange multipliers come to mind whenever one wishes to solve a problem subject to some constraints. In the purely underdetermined case, these constraints are that the data misfit be zero. Before considering the full purely underdetermined case, consider the following discussion of Lagrange multipliers (mostly after Menke).

Lagrange Multipliers With 2 Unknowns and 1 Constraint

Consider E(x, y), a function of two variables. Suppose that we want to minimize E(x, y) subject to some constraint of the form phi(x, y) = 0. The steps using Lagrange multipliers are as follows.

Step 1. At the minimum of E, small changes in x and y lead to no change in E:

    dE = (dE/dx) dx + (dE/dy) dy = 0    (3.53)

Step 2. The constraint equation, however, says that dx and dy cannot be varied independently, since the constraint equation is independent of (different from) E. Since phi(x, y) = 0 for all x, y, then so must d(phi) = 0:

    d(phi) = (d phi/dx) dx + (d phi/dy) dy = 0    (3.54)

Step 3. Form the weighted sum of (3.53) and (3.54):

    dE + lambda d(phi) = [dE/dx + lambda d phi/dx] dx + [dE/dy + lambda d phi/dy] dy = 0    (3.55)

where lambda is a constant. Note that (3.55) holds for arbitrary lambda.

Step 4. If lambda is chosen, however, in such a way that

    dE/dx + lambda d phi/dx = 0    (3.56)

then it follows that

    dE/dy + lambda d phi/dy = 0    (3.57)

since at least one of dx, dy (in this case dy) is arbitrary, i.e., dy may be chosen nonzero. When lambda has been chosen as indicated above, it is called the Lagrange multiplier. Therefore, (3.55) above is equivalent to minimizing E + lambda*phi without any constraints, i.e.,

    d(E + lambda phi)/dx = 0    (3.58)

and

    d(E + lambda phi)/dy = 0    (3.59)

Step 5. Finally, one must still solve the constraint equation:

    phi(x, y) = 0    (3.60)

Thus the solution (x, y) that minimizes E subject to the constraint phi(x, y) = 0 is given by (3.56), (3.57), and (3.60). That is, the problem has been reduced to the three equations

    dE/dx + lambda d phi/dx = 0    (3.56)
    dE/dy + lambda d phi/dy = 0    (3.57)
    phi(x, y) = 0    (3.60)

in the three unknowns x, y, lambda.

Extending the Problem to M Unknowns and N Constraints

The above procedure, used for a problem with
two variables and one constraint, can be generalized to M unknowns in a vector m, subject to N constraints phi_j(m) = 0, j = 1, ..., N. This leads to the following system of M equations (i = 1, ..., M):

    dE/dm_i + sum_{j=1}^{N} lambda_j d phi_j/dm_i = 0    (3.61)

with N constraints of the form

    phi_j(m) = 0    (3.62)

3.6.3 Application to the Purely Underdetermined Problem

With the background we now have in Lagrange multipliers, we are ready to reconsider the purely underdetermined problem. First, we pose the following problem: find m such that m^T m is minimized, subject to the N constraints that the data misfit be zero:

    phi_i = e_i = d_i - sum_{j=1}^{M} G_ij m_j = 0,  i = 1, ..., N    (3.63)

That is, minimize

    Phi(m) = m^T m + sum_{i=1}^{N} lambda_i e_i    (3.64)

with respect to the elements m_i of m. We can expand the terms in Equation (3.64) and obtain

    Phi(m) = sum_{k=1}^{M} m_k^2 + sum_{i=1}^{N} lambda_i [d_i - sum_{j=1}^{M} G_ij m_j]    (3.65)

Then, for the qth element,

    d Phi/dm_q = sum_k 2 m_k (dm_k/dm_q) - sum_i lambda_i sum_j G_ij (dm_j/dm_q)    (3.66)

but

    dm_k/dm_q = delta_kq  and  dm_j/dm_q = delta_jq    (3.67)

Thus

    d Phi/dm_q = 2 m_q - sum_{i=1}^{N} lambda_i G_iq = 0,  q = 1, 2, ..., M    (3.68)

In matrix notation over all q, Equation (3.68) can be written as

    2m - G^T lambda = 0    (3.69)

where lambda is an N x 1 vector containing the N Lagrange multipliers lambda_i, i = 1, ..., N. Note that G^T lambda has dimension (M x N)(N x 1) = M x 1, as required to be able to subtract it from m. Now, solving explicitly for m yields

    m = G^T lambda / 2    (3.70)

The constraints in this case are that the data be fit exactly; that is,

    d = G m    (1.13)

Substituting (3.70) into (1.13) gives

    d = G m = G G^T lambda / 2    (3.71)

which implies

    d = (1/2) G G^T lambda    (3.72)

where G G^T has dimension (N x M)(M x N), or simply N x N. Solving for lambda when (G G^T)^{-1} exists yields

    lambda = 2 (G G^T)^{-1} d    (3.73)

The Lagrange multipliers are not ends in and of themselves, but upon substitution of Equation (3.73) into (3.70) we obtain

    m = G^T lambda / 2 = G^T (G G^T)^{-1} d

Rearranging, we arrive at the minimum length solution m_ML:

    m_ML = G^T (G G^T)^{-1} d    (3.74)

where G G^T is an N x N matrix, and the minimum length operator G^{-1}_ML is given by

    G^{-1}_ML = G^T (G G^T)^{-1}    (3.75)

The above procedure, then, is one that determines the solution with the minimum length (L2 norm, [m^T m]^{1/2}) among the infinite number of solutions that fit the data exactly. In practice, one does not actually calculate the values of the
Lagrange multipliers, but goes directly to (3.74) above.

The above derivation shows that the length of m is minimized by the minimum length operator. It may make more sense to seek a solution that deviates as little as possible from some prior estimate of the solution <m>, rather than from zero. The zero vector is, in fact, the prior estimate <m> for the minimum length solution given in Equation (3.74). If we wish to explicitly include <m>, then Equation (3.74) becomes

    m_ML = <m> + G^T (G G^T)^{-1} [d - G<m>]
         = <m> + G^{-1}_ML [d - G<m>]
         = G^{-1}_ML d + [I - G^{-1}_ML G]<m>    (3.76)

We note immediately that Equation (3.76) reduces to Equation (3.74) when <m> = 0.

3.6.4 Comparison of Least Squares and Minimum Length Solutions

In closing this section, it is instructive to note the similarity in form between the minimum length and least squares solutions.

Least squares:

    m_LS = (G^T G)^{-1} G^T d    (3.31)
    G^{-1}_LS = (G^T G)^{-1} G^T    (3.32)

Minimum length:

    m_ML = <m> + G^T (G G^T)^{-1} [d - G<m>]    (3.76)
    G^{-1}_ML = G^T (G G^T)^{-1}    (3.75)

The minimum length solution exists when (G G^T)^{-1} exists. Since G G^T is N x N, this is the same as saying when G G^T has rank N, that is, when the N rows (or N columns) are independent. In this case, your ability to predict, or calculate, each of the N observations is independent.

3.6.5 Example of a Minimum Length Problem

Recall the four-parameter, four-observation tomography problem we introduced in Section 3.4.4. At that time, we noted that the least squares solution did not exist because (G^T G)^{-1} does not exist: G does not contain enough information to solve for four model parameters. In the same way, G does not contain enough information to fit an arbitrary four observations, and (G G^T)^{-1} does not exist either for this example. The basic problem is that the four paths through the structure do not provide independent information.

However, if we eliminate any one observation, let's say the fourth, then we reduce the problem to one for which the minimum length solution exists. In this new case, we have three observations and four unknown model parameters, and hence N < M. G, which
still has enough information to determine three observations uniquely, is now given by (taking h = 1)

    G = [1 1 0 0; 0 0 1 1; 1 0 1 0]    (3.77)

And G G^T is given by

    G G^T = [2 0 1; 0 2 1; 1 1 2]    (3.78)

Now (G G^T)^{-1} does exist, and we have

    G^T (G G^T)^{-1} = [0.25 -0.25 0.5; 0.75 0.25 -0.5; -0.25 0.25 0.5; 0.25 0.75 -0.5]    (3.79)

If we assume a true model given by m = (1.0, 0.5, 0.5, 0.5)^T, then the data are given by d = (1.5, 1.0, 1.5)^T. The minimum length solution m_ML is given by

    m_ML = G^T (G G^T)^{-1} d = (0.875, 0.625, 0.625, 0.375)^T    (3.80)

Note that the minimum length solution is not the "true" solution. This is generally the case, since the "true" solution is only one of an infinite number of solutions that fit the data exactly, and the minimum length solution is the one of shortest length: the length squared of the "true" solution is 1.75, while the length squared of the minimum length solution is 1.6875. Note also that the minimum length solution differs from the "true" solution by (0.125, -0.125, -0.125, 0.125)^T. This is the same direction in model space, i.e., (1, -1, -1, 1)^T, that represents the linear combination of the original columns of G in the example in Section 3.4.4 that adds to zero. We will return to this subject when we have introduced singular value decomposition and the partitioning of model and data space.
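This example is easy to verify numerically (a sketch assuming NumPy):

```python
import numpy as np

# Three-observation tomography problem (fourth ray removed, h = 1).
G = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)

m_true = np.array([1.0, 0.5, 0.5, 0.5])
d = G @ m_true                        # [1.5, 1.0, 1.5]

# Minimum length solution m_ML = G^T (G G^T)^{-1} d (Equation 3.74).
m_ml = G.T @ np.linalg.inv(G @ G.T) @ d

print(m_ml)                           # [0.875 0.625 0.625 0.375]
print(G @ m_ml)                       # [1.5 1. 1.5]: the data are fit exactly
print(m_true @ m_true, m_ml @ m_ml)   # 1.75 vs 1.6875: m_ML is shorter
```

Both models fit the data exactly, but m_ML has the smaller L2 length, and the difference between them lies along the unresolvable (1, -1, -1, 1) direction.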
3.7 Weighted Measures of Length

3.7.1 Introduction

One way to improve our estimates using either the least squares solution

    m_LS = (G^T G)^{-1} G^T d    (3.31)

or the minimum length solution

    m_ML = <m> + G^T (G G^T)^{-1} [d - G<m>]    (3.76)

is to use weighted measures of the misfit vector

    e = d^obs - d^pre    (3.81)

or the model parameter vector m, respectively. The next two subsections deal with these two approaches.

3.7.2 Weighted Least Squares

Weighted Measures of the Misfit Vector e

We saw in Section 3.4 that the least squares solution m_LS was the one that minimized the total misfit between predicted and observed data in the L2-norm sense; that is,

    E = e^T e = sum_{i=1}^{N} e_i^2    (3.7)

is minimized. Consider a new E defined as follows:

    E = e^T W_e e    (3.82)

where W_e is an as-yet-unspecified N x N weighting matrix. W_e can take any form, but one convenient choice is

    W_e = [cov d]^{-1}    (3.83)

where [cov d]^{-1} is the inverse of the covariance matrix for the data. With this choice of weighting matrix, data with large variances are weighted less than data with small variances. While this is true in general, it is easier to show in the case where W_e is diagonal. This happens when [cov d] is diagonal, which implies that the errors in the data are uncorrelated. The diagonal entries of [cov d]^{-1} are then given by the reciprocals of the diagonal entries of [cov d]. That is, if

    [cov d] = diag(sigma_1^2, sigma_2^2, ..., sigma_N^2)    (3.84)

then

    [cov d]^{-1} = diag(sigma_1^{-2}, sigma_2^{-2}, ..., sigma_N^{-2})    (3.85)

With this choice of W_e, the weighted misfit becomes

    E = e^T W_e e = sum_{i=1}^{N} e_i sum_{j=1}^{N} W_ij e_j    (3.86)

But

    W_ij = delta_ij / sigma_i^2    (3.87)

where delta_ij is the Kronecker delta. Thus we have

    E = sum_{i=1}^{N} e_i^2 / sigma_i^2    (3.88)

If the ith variance sigma_i^2 is large, then the component of the error vector in the ith direction, e_i, has little influence on the size of E. This is not the case in the unweighted least squares problem, where an examination of Equation (3.7) clearly shows that each component of the error vector contributes equally to the total misfit.

Obtaining the Weighted Least Squares Solution m_WLS

If one uses E = e^T W_e e as the weighted measure of error, we will see below that this leads to the weighted least squares solution

    m_WLS = (G^T W_e G)^{-1} G^T W_e d    (3.89)

with a weighted least squares operator G^{-1}_WLS given by

    G^{-1}_WLS = (G^T W_e G)^{-1} G^T W_e    (3.90)

While this is true in general, it is easier to arrive at Equation (3.89) in the case where W_e is a diagonal matrix and the forward problem d = Gm is the least squares problem for a best-fitting straight line (see Equation 3.9).

Step 1.

    E = e^T W_e e = sum_i e_i sum_j W_ij e_j = sum_i W_ii e_i^2
      = sum_i W_ii (d_i - m_1 - m_2 z_i)^2
      = sum_i W_ii [d_i^2 - 2 m_1 d_i - 2 m_2 d_i z_i + m_1^2 + 2 m_1 m_2 z_i + m_2^2 z_i^2]    (3.91)-(3.92)

Step 2. Then

    dE/dm_1 = -2 sum d_i W_ii + 2 m_1 sum W_ii + 2 m_2 sum z_i W_ii = 0    (3.93)

and

    dE/dm_2 = -2 sum z_i d_i W_ii + 2 m_1 sum z_i W_ii + 2 m_2 sum z_i^2 W_ii = 0    (3.94)

This can be written in
matrix form as

    [sum W_ii  sum z_i W_ii; sum z_i W_ii  sum z_i^2 W_ii][m_1; m_2] = [sum d_i W_ii; sum z_i d_i W_ii]    (3.95)-(3.96)

Step 3. The left-hand side can be factored as

    [1 1 ... 1; z_1 z_2 ... z_N] diag(W_11, W_22, ..., W_NN) [1 z_1; 1 z_2; ...; 1 z_N] [m_1; m_2]    (3.97)

or simply

    [sum W_ii  sum z_i W_ii; sum z_i W_ii  sum z_i^2 W_ii] = G^T W_e G    (3.98)

Similarly, the right-hand side can be factored as

    [1 1 ... 1; z_1 z_2 ... z_N] diag(W_11, W_22, ..., W_NN) [d_1; d_2; ...; d_N]    (3.99)

or simply

    [sum d_i W_ii; sum z_i d_i W_ii] = G^T W_e d    (3.100)

Step 4. Therefore, using Equations (3.98) and (3.100), Equation (3.96) can be written as

    G^T W_e G m = G^T W_e d    (3.101)

The weighted least squares solution m_WLS from Equation (3.89) is thus

    m_WLS = (G^T W_e G)^{-1} G^T W_e d    (3.102)

assuming that (G^T W_e G)^{-1} exists, of course.

3.7.3 Weighted Minimum Length

The development of a weighted minimum length solution is similar to that of the weighted least squares problem. The steps are as follows. First, recall that the minimum length solution minimizes m^T m. By analogy with weighted least squares, we can choose to minimize

    m^T W_m m    (3.103)

instead of m^T m. For example, if one wishes to use

    W_m = [cov m]^{-1}    (3.104)

then one must replace m above with

    m - <m>    (3.105)

where <m> is the expected, or a priori, estimate of the parameter values. The reason for this is that the variances must represent fluctuations about zero. In the weighted least squares problem, it is assumed that the error vector e, which is being minimized, has a mean of zero. Thus, for the weighted minimum length problem, we replace m by its departure from the expected value <m>. Therefore, we introduce a new function L to be minimized:

    L = [m - <m>]^T W_m [m - <m>]    (3.106)

If one then follows the procedure in Section 3.6 with this new function, one eventually (it is left to the student as an exercise) is led to the weighted minimum length solution m_WML, given by

    m_WML = <m> + W_m^{-1} G^T (G W_m^{-1} G^T)^{-1} [d - G<m>]    (3.107)

and the weighted minimum length operator G^{-1}_WML is given by

    G^{-1}_WML = W_m^{-1} G^T (G W_m^{-1} G^T)^{-1}    (3.108)

This expression differs from Equation (3.38), page 54 of Menke, which uses W_m rather than W_m^{-1}; I believe Menke's equation is wrong.
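The effect of the data weighting in Section 3.7.2 can be sketched numerically (assuming NumPy; the data, the outlier, and the variances are invented for illustration):

```python
import numpy as np

# Straight-line data with one noisy outlier in the last observation.
z = np.array([0.0, 1.0, 2.0, 3.0])
d = np.array([1.0, 3.0, 5.0, 17.0])        # true line d = 1 + 2z, last point corrupted
sigma2 = np.array([0.1, 0.1, 0.1, 100.0])  # assumed data variances

G = np.column_stack([np.ones_like(z), z])
We = np.diag(1.0 / sigma2)                 # W_e = [cov d]^{-1} (Equation 3.83)

# Weighted least squares (Equation 3.102) vs ordinary least squares (3.31).
m_wls = np.linalg.solve(G.T @ We @ G, G.T @ We @ d)
m_ls = np.linalg.solve(G.T @ G, G.T @ d)

print(m_wls)  # close to [1, 2]: the high-variance outlier is down-weighted
print(m_ls)   # [-1. 5.]: pulled strongly toward the outlier
```

The weighted estimate recovers nearly the true intercept and slope because the outlier's large assumed variance gives it almost no influence on E.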
Note that the solution depends explicitly on the expected, or a priori, estimate of the model parameters <m>. The second term represents a departure from the a priori estimate <m>, based on the inadequacy of the forward problem G<m> to fit the data d exactly.

Other choices for W_m include:

1. D^T D, where D is a derivative matrix (a measure of the flatness of m) of dimension (M - 1) x M:

    D = [-1 1 0 ... 0; 0 -1 1 ... 0; ...; 0 ... 0 -1 1]    (3.109)

2. D^T D, where D is an (M - 2) x M roughness (second-derivative) matrix given by

    D = [1 -2 1 0 ... 0; 0 1 -2 1 ... 0; ...; 0 ... 0 1 -2 1]    (3.110)

Note that for both choices of D presented, D^T D is an M x M matrix of rank less than M: for the first-derivative case it is of rank M - 1, while for the second it is of rank M - 2. This means that W_m does not have a mathematical inverse. This can introduce some nonuniqueness into the solution, but does not preclude finding a solution. Finally, note that many choices for W_m are possible.

3.7.4 Weighted Damped Least Squares

In Sections 3.7.2 and 3.7.3, we considered weighted versions of the least squares and minimum length solutions. Both unweighted and weighted problems can be very unstable if the matrices that have to be inverted are nearly singular. In the weighted problems, these are

    G^T W_e G    (3.111)

and

    G W_m^{-1} G^T    (3.112)

for the least squares and minimum length problems, respectively. In this case, one can form a weighted penalty, or cost, function given by

    E + epsilon^2 L    (3.113)

where E is from Equation (3.91) for weighted least squares and L is from Equation (3.106) for the weighted minimum length problem. One then goes through the exercise of minimizing Equation (3.113) with respect to the model parameters m, and obtains what is known as the weighted damped least squares solution m_WD. It is, in fact, a weighted mix of the weighted least squares and weighted minimum length solutions. One finds that m_WD is given by either

    m_WD = <m> + (G^T W_e G + epsilon^2 W_m)^{-1} G^T W_e [d - G<m>]    (3.114)

or

    m_WD = <m> + W_m^{-1} G^T (G W_m^{-1} G^T + epsilon^2 W_e^{-1})^{-1} [d - G<m>]    (3.115)

where the weighted damped least squares operator
G_WD^{-1} is given by

    G_WD^{-1} = [G^T W_e G + ε^2 W_m]^{-1} G^T W_e                     (3.116)

or

    G_WD^{-1} = W_m^{-1} G^T [G W_m^{-1} G^T + ε^2 W_e^{-1}]^{-1}      (3.117)

The two forms for G_WD^{-1} can be shown to be equivalent. The ε^2 term has the effect of damping the instability. As we will see later in Chapter 6, using singular value decomposition, the above procedure minimizes the effects of small singular values in G^T W_e G or G W_m^{-1} G^T. In the next section we will learn two methods of including a priori information and constraints in inverse problems.

3.8 A Priori Information and Constraints

(See Menke, pages 55-57.)

3.8.1 Introduction

Another common type of a priori information takes the form of linear equality constraints:

    F m = h                                                            (3.118)

where F is a P x M matrix and P is the number of linear constraints considered. As an example, consider the case for which the mean of the model parameters is known. In this case, with only one constraint, we have

    (1/M) Σ_{i=1}^{M} m_i = h_1                                        (3.119)

Then Equation (3.118) can be written as

    F m = (1/M) [1  1  ...  1] [m_1  m_2  ...  m_M]^T = h_1            (3.120)

As another example, suppose that the jth model parameter m_j is actually known in advance. That is, suppose

    m_j = h_1                                                          (3.121)

Then Equation (3.118) takes the form

    F m = [0  0  ...  0  1  0  ...  0] [m_1  ...  m_M]^T = h_1         (3.122)

where the single 1 is in the jth column. Note that for this example it would be possible to remove m_j as an unknown, thereby reducing the system of equations by one. It is often preferable to use Equation (3.122), however, even in this case, rather than rewriting the forward problem in a computer code.

3.8.2 A First Approach to Including Constraints

We will consider two basic approaches to including constraints in inverse problems. Each has its strengths and weaknesses. The first includes the constraint matrix F in the forward problem, and the second uses Lagrange multipliers. The steps for the first approach are as follows.

Step 1. Include F m = h as rows in a new G that operates on the original m:

    [ G ]       [ d ]
    [   ]  m  = [   ]                                                  (3.123)
    [ F ]       [ h ]

where the augmented matrix is (N + P) x M, m is M x 1, and the right-hand side is (N + P) x 1.

Step 2. The new (N + P) x 1 misfit vector e becomes

    e = [ d^obs - d^pre ]                                              (3.124)
        [ h - h^pre     ]

Performing a least squares inversion would
minimize the new e^T e based on Equation (3.124). The difference

    h - h^pre                                                          (3.125)

which represents the misfit to the constraints, may be small, but it is unlikely that it would vanish, which it must if the constraints are to be satisfied.

Step 3. Introduce a weighted misfit

    e^T W_e e                                                          (3.126)

where W_e is a diagonal matrix of the form

    W_e = diag(1, 1, ..., 1, big, big, ..., big)                       (3.127)

with N ones followed by P large ("big") entries. That is, it has relatively large values for the last P entries, which are associated with the constraint equations. Recalling the form of the weighting matrix used in Equation (3.83), one sees that Equation (3.127) is equivalent to assigning the constraints very small variances. Hence a weighted least squares approach in this case will give large weight to fitting the constraints. The size of the big numbers in W_e must be determined empirically: one seeks a number that leads to a solution that satisfies the constraints acceptably but does not make the matrix in Equation (3.111), which must be inverted to obtain the solution, too poorly conditioned. Matrices with a large range of values in them tend to be poorly conditioned.

Consider the example of the smoothing constraint (here P = M - 2):

    D m = 0                                                            (3.128)

where the dimensions of D, m, and 0 are (M - 2) x M, M x 1, and (M - 2) x 1, respectively. The augmented equations are

    [ G ]       [ d ]
    [   ]  m  = [   ]                                                  (3.129)
    [ D ]       [ 0 ]

Let's use the following weighting matrix:

    W_e = [ I_N       0         ]                                      (3.130)
          [ 0     θ^2 I_{M-2}   ]

where θ^2 is a constant. The weighted least squares solution is then, with the dimensions of the three matrices in the first set of brackets being M x (N + P), (N + P) x (N + P), and (N + P) x M, respectively,

    m = { [G^T | D^T] W_e [ G ] }^{-1} [G^T | D^T] W_e [ d ]
                          [ D ]                        [ 0 ]

      = [G^T G + θ^2 D^T D]^{-1} G^T d                                 (3.131)

where [G^T G + θ^2 D^T D] is M x M and G^T d is M x 1. Equivalently,

    m = { [ G  ]^T [ G  ] }^{-1} [ G  ]^T [ d ]
        { [ θD ]   [ θD ] }      [ θD ]   [ 0 ]                        (3.132)

The three matrices within (3.132) have dimensions M x (N + P), (N + P) x M, and M x 1, respectively, which produce an M x 1 vector when evaluated. In this form we can see that this is simply the m_LS for the problem

    [ G  ]       [ d ]
    [    ]  m  = [   ]                                                 (3.133)
    [ θD ]       [ 0 ]

By varying θ we can trade off the misfit and
the smoothness of the model.

3.8.3 A Second Approach to Including Constraints

Whenever the subject of constraints is raised, Lagrange multipliers come to mind. The steps for this approach are as follows.

Step 1. Form a weighted sum of the misfit and the constraints:

    Φ(m) = e^T e + 2 [F m - h]^T λ                                     (3.134)

which can be expanded as

    Φ(m) = Σ_{i=1}^{N} [d_i - Σ_{j=1}^{M} G_ij m_j]^2 + 2 Σ_{i=1}^{P} λ_i [Σ_{j=1}^{M} F_ij m_j - h_i]   (3.135)

where Φ indicates a difference from Equation (3.43) on page 56 in Menke, where there are P linear equality constraints, and where the factor of 2 has been added as a matter of convenience, to make the form of the final answer simpler.

Step 2. One then takes the partial derivatives of Equation (3.135) with respect to all the entries in m and sets them to zero. That is,

    ∂Φ(m)/∂m_q = 0,    q = 1, 2, ..., M                                (3.136)

which leads to

    2 Σ_{j=1}^{M} m_j Σ_{i=1}^{N} G_iq G_ij - 2 Σ_{i=1}^{N} G_iq d_i + 2 Σ_{i=1}^{P} λ_i F_iq = 0,    q = 1, 2, ..., M   (3.137)

where the first two terms are the same as in the least squares case in Equation (3.25), since they come directly from e^T e, and the last term shows why the factor of 2 was added in Equation (3.135).

Step 3. Equation (3.137) is not the complete description of the problem. To the M equations in Equation (3.137), the P constraint equations must also be added. In matrix form this yields

    [ G^T G   F^T ] [ m ]   [ G^T d ]
    [             ] [   ] = [       ]                                  (3.138)
    [ F       0   ] [ λ ]   [ h     ]

where the system matrix is (M + P) x (M + P) and the two vectors are (M + P) x 1.

Step 4. The above system of equations can be solved as

    [ m ]   [ G^T G   F^T ]^{-1} [ G^T d ]
    [   ] = [             ]      [       ]                             (3.139)
    [ λ ]   [ F       0   ]      [ h     ]

As an example, consider constraining a straight line to pass through some point (z', d'). That is, for N observations we have

    d_i = m_1 + m_2 z_i,    i = 1, ..., N                              (3.140)

subject to the single constraint

    d' = m_1 + m_2 z'                                                  (3.141)

Then Equation (3.118) has the form

    F m = [1  z'] [ m_1 ]  = d'                                        (3.142)
                  [ m_2 ]

We can then write out Equation (3.139) explicitly and obtain the following:

    [ m_1 ]   [ N      Σz_i     1  ]^{-1} [ Σd_i     ]
    [ m_2 ] = [ Σz_i   Σz_i^2   z' ]      [ Σz_i d_i ]                 (3.143)
    [ λ   ]   [ 1      z'       0  ]      [ d'       ]

Note the similarity between Equation (3.143) and Equation (3.36), the least squares solution for fitting a straight line to a set of points without any constraints:

    [ m_1 ]      [ N      Σz_i   ]^{-1} [ Σd_i     ]
    [     ]    = [               ]      [          ]                   (3.36)
    [ m_2 ]_LS   [ Σz_i   Σz_i^2 ]      [ Σz_i d_i ]

If you wanted to get the same result for the straight line passing through a point using the first approach, with W_e,
you would assign

    W_ii = 1,    i = 1, ..., N                                         (3.144)

and

    W_{N+1,N+1} = big                                                  (3.145)

which is equivalent to assigning a small variance, relative to the unconstrained part of the problem, to the constraint equation. The solution obtained with Equation (3.103) should approach the solution obtained using Equation (3.143).

Note that it is easy to constrain lines to pass through the origin using Equation (3.143). In this case we have

    (z', d') = (0, 0)                                                  (3.146)

and Equation (3.143) becomes

    [ m_1 ]   [ N      Σz_i     1 ]^{-1} [ Σd_i     ]
    [ m_2 ] = [ Σz_i   Σz_i^2   0 ]      [ Σz_i d_i ]                  (3.147)
    [ λ   ]   [ 1      0        0 ]      [ 0        ]

The advantage of using the Lagrange multiplier approach to constraints is that the constraints will be satisfied exactly. It often happens, however, that the constraints are only approximately known, and using Lagrange multipliers to fit the constraints exactly may not be appropriate. An example might be a gravity inversion in which the depth to bedrock at one point is known from drilling. Constraining the depth to be exactly the drill depth may be misleading if the depth in the model is an average over some area; the exact depth at one point may not then be the best estimate of the depth over the area in question. A second disadvantage of the Lagrange multiplier approach is that it adds one equation to the system of equations in Equation (3.143) for each constraint. This can add up quickly, making the inversion considerably more difficult computationally.

An entirely different class of constraints, called linear inequality constraints, takes the form

    F m ≥ h                                                            (3.148)

These can be solved using linear programming techniques, but we will not consider them further in this class.

3.8.4 Seismic Receiver Function Example

The following is an example of using smoothing constraints in an inverse problem. Consider a general problem in time series analysis with a delta function input. Then the output from the "model" is the Green's function of the system. The inverse problem is this: given the Green's function, find the parameters of the model.

    [Figure: a delta-function (impulse) input passes through the model; the output is the Green's function.]

In a
little more concrete form:

    [Figure: the mapping from model space m (parameters c_1, c_2, c_3, ..., c_M) to data space d, written F(m) = d.]

If d is very noisy, then m_LS will have a high-frequency component that tries to "fit the noise," but this component will not be real. How do we prevent this? So far we have learned two ways: use m_WLS if we know [cov d], or, if not, place a smoothing constraint on m. An example of this approach, using receiver function inversions, can be found in Ammon, C. J., G. E. Randall, and G. Zandt, On the nonuniqueness of receiver function inversions, J. Geophys. Res., 95, 15,303-15,318, 1990. The important points are as follows:

- This approach is used in the real world.
- The forward problem is written d_j = F_j(m), j = 1, 2, 3, ..., N.
- This is nonlinear, but after linearization (discussed in Chapter 4) the equations are the same as those discussed previously, with minor differences.
- Note the correlation between the roughness in the model and the roughness in the data.
- The way to choose the weighting parameter σ is to plot the trade-off between smoothness and waveform fit.

3.9 Variances of Model Parameters

(See pages 58-60, Menke.)

3.9.1 Introduction

Data errors are mapped into model parameter errors through any type of inverse. We noted in Chapter 2 (Equations 2.61-2.63) that if

    m^est = M d + v                                                    (2.61)

and if [cov d] is the data covariance matrix, which describes the data errors, then the a posteriori model covariance matrix is given by

    [cov m] = M [cov d] M^T                                            (2.63)

The covariance matrix in Equation (2.63) is called the a posteriori model covariance matrix because it is calculated after the fact. It gives what are sometimes called the formal uncertainties in the model parameters. It is different from the a priori model covariance matrix of Equation (3.85), which is used to constrain the underdetermined problem. The a posteriori covariance matrix in Equation (2.63) shows explicitly the mapping of data errors into uncertainties in the model parameters. Although the mapping will be clearer once we consider the generalized inverse in Chapter 7, it is
instructive at this point to consider applying Equation (2.63) to the least squares and minimum length problems.

3.9.2 Application to Least Squares

We can apply Equation (2.63) to the least squares problem and obtain

    [cov m] = [G^T G]^{-1} G^T [cov d] {[G^T G]^{-1} G^T}^T            (3.149)

Further, if [cov d] is given by

    [cov d] = σ^2 I_N                                                  (3.150)

then

    [cov m] = [G^T G]^{-1} G^T (σ^2 I) {[G^T G]^{-1} G^T}^T
            = σ^2 [G^T G]^{-1} G^T G {[G^T G]^{-1}}^T
            = σ^2 {[G^T G]^{-1}}^T
            = σ^2 [G^T G]^{-1}                                         (3.151)

since the transpose of a symmetric matrix returns the original matrix.

3.9.3 Application to the Minimum Length Problem

Application of Equation (2.63) to the minimum length problem leads to the following for the a posteriori model covariance matrix:

    [cov m] = G^T [G G^T]^{-1} [cov d] {G^T [G G^T]^{-1}}^T            (3.152)

If the data covariance matrix is again given by

    [cov d] = σ^2 I_N                                                  (3.153)

we obtain

    [cov m] = σ^2 G^T [G G^T]^{-2} G                                   (3.154)

where

    [G G^T]^{-2} = [G G^T]^{-1} [G G^T]^{-1}                           (3.155)

3.9.4 Geometrical Interpretation of Variance

There is another way to look at the variance of model parameter estimates for the least squares problem, one that considers the prediction error, or misfit to the data. Recall that we defined the misfit E as

    E = e^T e = [d - d^pre]^T [d - d^pre] = [d - G m]^T [d - G m]      (3.23)

which explicitly shows the dependence of E on the model parameters m. That is, we have

    E = E(m)                                                           (3.156)

If E(m) has a sharp, well-defined minimum, then we can conclude that our solution m^est is well constrained. Conversely, if E(m) has a broad, poorly defined minimum, then we conclude that our solution m^est is poorly constrained. (After Figure 3.10, page 59, of Menke:)

    [Figure: (a) The best estimate m^est of model parameter m occurs at the minimum of E(m). If the minimum is relatively narrow, then random fluctuations in E(m) lead to only small errors Δm in m^est. (b) If the minimum is wide, then large errors in m can occur.]

One way to quantify this qualitative observation is to realize that the width of the minimum for E(m) is related to the curvature, or second derivative, of E(m) at the minimum. For the least squares problem we have

    ∂^2 E / ∂m^2 = (∂^2/∂m^2) [d - G m]^2                              (3.157)

Evaluating the right-hand
side, we have for the qth term

    ∂^2 E / ∂m_q^2 = (∂^2/∂m_q^2) Σ_{i=1}^{N} [d_i - Σ_{j=1}^{M} G_ij m_j]^2               (3.158)

    = (∂/∂m_q) { -2 Σ_{i=1}^{N} [d_i - Σ_{j=1}^{M} G_ij m_j] G_iq }                        (3.159)

    = (∂/∂m_q) { -2 [ Σ_{i=1}^{N} G_iq d_i - Σ_{j=1}^{M} (Σ_{i=1}^{N} G_iq G_ij) m_j ] }   (3.160)

Using the same steps as in the derivation of the least squares solution in Equations (3.24)-(3.29), it is possible to see that Equation (3.160) contains the qth term of G^T [d - G m]. Combining the q equations into matrix notation yields

    (∂/∂m) [d - G m]^2 = -2 G^T [d - G m]                              (3.161)

Evaluating the first derivative on the right-hand side of Equation (3.161), we have for the (q, q')th term

    (∂/∂m_q') Σ_{i=1}^{N} G_iq [d_i - Σ_{j=1}^{M} G_ij m_j]            (3.162)

    = -Σ_{i=1}^{N} G_iq Σ_{j=1}^{M} G_ij (∂m_j/∂m_q')                  (3.163)

    = -Σ_{i=1}^{N} G_iq G_iq'                                          (3.164)

the negative of which we recognize as the (q, q') entry in G^T G. Therefore, combining with the factor of -2 in Equation (3.161), we can write the M x M matrix equation

    (∂^2/∂m^2) [d - G m]^2 = 2 G^T G                                   (3.165)

From Equations (3.157)-(3.165) we can conclude that the second derivative of E in the least squares problem is proportional to G^T G. That is,

    ∂^2 E / ∂m^2 |_{m = m_LS} = constant x G^T G                       (3.166)

Furthermore, from Equation (3.151) we have that [cov m] is proportional to [G^T G]^{-1}. Therefore, we can associate large values of the second derivative of E, given by (3.166), with (1) sharp curvature of E, (2) a narrow well for E, and (3) good (i.e., small) model variance. As Menke points out, [cov m] can be interpreted as being controlled either by (1) the variance of the data times a measure of how error in the data is mapped into the model parameters, or (2) a constant times the curvature of the prediction error at its minimum.

I like Menke's summary (page 60) of this material very much, so I've reproduced his closing paragraph for you as follows:

"The methods of solving inverse problems that have been discussed in this chapter emphasize the data and model parameters themselves. The method of least squares estimates the model parameters with smallest prediction length. The method of minimum length estimates the simplest model parameters. The ideas of data and model parameters are very concrete and straightforward, and the methods based on them are simple and easily understood. Nevertheless, this
viewpoint tends to obscure an important aspect of inverse problems: namely, that the nature of the problem depends more on the relationship between the data and model parameters than on the data or model parameters themselves. It should, for instance, be possible to tell a well-designed experiment from a poorly designed one without knowing what the numerical values of the data or model parameters are, or even the range in which they fall."

Before considering the relationships implied in the mapping between model parameters and data in Chapter 5, we extend what we now know about linear inverse problems to nonlinear problems in the next chapter.
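As a numerical companion to Section 3.7.4, the sketch below implements the weighted damped least squares solution of Equation (3.114), using the roughness matrix of Equation (3.110) for W_m. The test problem (a small random G with W_e = I and a zero prior) is my own illustration, not from the notes:

```python
import numpy as np

def roughness_matrix(M):
    """(M - 2) x M second-difference matrix of Equation (3.110): rows [1, -2, 1]."""
    D = np.zeros((M - 2, M))
    for i in range(M - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def weighted_damped_ls(G, d, m_prior, We, Wm, eps2):
    """Equation (3.114): m_WD = <m> + [G^T We G + eps^2 Wm]^{-1} G^T We (d - G <m>)."""
    A = G.T @ We @ G + eps2 * Wm
    return m_prior + np.linalg.solve(A, G.T @ We @ (d - G @ m_prior))

rng = np.random.default_rng(0)
N, M = 8, 5
G = rng.normal(size=(N, M))
m_true = np.array([1.0, 1.2, 1.3, 1.2, 1.0])
d = G @ m_true                       # noise-free synthetic data

D = roughness_matrix(M)
Wm = D.T @ D                         # M x M but rank M - 2, as noted in the text
m_wd = weighted_damped_ls(G, d, np.zeros(M), np.eye(N), Wm, 1e-10)
```

With ε^2 near zero and noise-free data, m_WD reduces to the ordinary least squares solution and recovers m_true; increasing ε^2 pulls the solution toward the (here zero) prior while smoothing it.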
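The straight-line-through-a-point example of Equations (3.140)-(3.143) is easy to verify numerically. The sketch below builds the bordered system of Equation (3.139) directly and then repeats the fit with the first (big-weight) approach of Section 3.8.2; the data values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
z = np.linspace(0.0, 10.0, 12)
d = 2.0 + 0.5 * z + 0.1 * rng.normal(size=z.size)    # noisy straight-line data

z_p, d_p = 4.0, 3.0                                   # the point (z', d') to pass through
G = np.column_stack([np.ones_like(z), z])
F = np.array([[1.0, z_p]])

# Bordered system of Equation (3.139): [[G^T G, F^T], [F, 0]] [m; lambda] = [G^T d; d'].
A = np.block([[G.T @ G, F.T], [F, np.zeros((1, 1))]])
rhs = np.concatenate([G.T @ d, [d_p]])
m1, m2, lam = np.linalg.solve(A, rhs)

# First approach (Section 3.8.2): append the constraint as a heavily weighted row.
big = 1e8                                             # weight chosen empirically
Gw = np.vstack([G, np.sqrt(big) * F])
dw = np.concatenate([d, [np.sqrt(big) * d_p]])
m_w = np.linalg.lstsq(Gw, dw, rcond=None)[0]
```

The Lagrange multiplier solution satisfies m_1 + m_2 z' = d' to machine precision, while the weighted solution only approaches it as `big` grows, exactly as discussed in the text.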
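Equation (3.151) can be checked by propagating synthetic data errors through the least squares operator. The Monte Carlo sketch below (the straight-line setup is invented for illustration) compares the scatter of repeated estimates against σ^2 [G^T G]^{-1}:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
z = np.linspace(0.0, 1.0, N)
G = np.column_stack([np.ones(N), z])   # straight-line forward problem
m_true = np.array([1.0, -2.0])
sigma = 0.3

# Analytic a posteriori covariance, Equation (3.151).
cov_analytic = sigma**2 * np.linalg.inv(G.T @ G)

# Monte Carlo: apply the least squares operator to many noisy realizations
# of the data and measure the scatter of the resulting estimates.
L = np.linalg.inv(G.T @ G) @ G.T       # least squares operator [G^T G]^{-1} G^T
ests = np.array([L @ (G @ m_true + sigma * rng.normal(size=N))
                 for _ in range(20000)])
cov_mc = np.cov(ests.T)
```

The sample covariance of the 20,000 estimates matches the analytic formula to within Monte Carlo error, illustrating how [cov d] = σ^2 I maps into [cov m].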
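Finally, the geometrical interpretation of Section 3.9.4 — that the curvature of E(m) is 2 G^T G, Equation (3.165) — can be confirmed with a finite-difference Hessian; the small random test problem is again my own illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 10, 3
G = rng.normal(size=(N, M))
d = rng.normal(size=N)

def E(m):
    """Misfit of Equation (3.23): E = (d - Gm)^T (d - Gm)."""
    r = d - G @ m
    return r @ r

# Central-difference Hessian of E at an arbitrary point; the result is the
# same everywhere (up to rounding) because E is quadratic in m.
m0 = rng.normal(size=M)
h = 1e-3
I = np.eye(M)
H = np.zeros((M, M))
for q in range(M):
    for p in range(M):
        H[q, p] = (E(m0 + h * I[q] + h * I[p]) - E(m0 + h * I[q] - h * I[p])
                   - E(m0 - h * I[q] + h * I[p]) + E(m0 - h * I[q] - h * I[p])) / (4 * h**2)
```

H agrees with 2 G^T G, so sharp curvature of E (a narrow misfit well) corresponds to a large G^T G and hence, by Equation (3.151), to small model variance.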