Class notes (62 pages) for MATH 693A at San Diego State University, Fall term; uploaded October 20, 2015.
Numerical Optimization, Lecture Notes: Line Search Methods, Step Length Selection.
Peter Blomgren, blomgren.peter@gmail.com
Department of Mathematics and Statistics, Dynamical Systems Group, Computational Sciences Research Center, San Diego State University, San Diego, CA 92182-7720. http://terminus.sdsu.edu. Fall 2009.

Outline: (1) Line Search Methods: recap; step length selection. (2) Step Length Selection: line search under the strong Wolfe conditions.

Quick Recap (Last Time): Rates of convergence for our different optimization strategies. We showed that for the simple quadratic model $f(\bar{x}) = \frac{1}{2}\bar{x}^T Q \bar{x} - \bar{b}^T \bar{x}$ the steepest descent method is indeed linearly convergent. The result generalizes to general nonlinear objective functions for which $\nabla f(\bar{x}^*) = 0$ and $\nabla^2 f(\bar{x}^*)$ is positive definite. We stated the result for Newton's method, which says that it is locally quadratically convergent.

Further, quasi-Newton methods, where the search direction is $\bar{p}_k = -B_k^{-1}\nabla f(\bar{x}_k)$, exhibit superlinear convergence as long as the matrix sequence $\{B_k\}$ converges to the Hessian $\nabla^2 f(\bar{x}^*)$ in the search direction:
$$\lim_{k\to\infty} \frac{\|(B_k - \nabla^2 f(\bar{x}^*))\,\bar{p}_k\|}{\|\bar{p}_k\|} = 0.$$
Coordinate descent methods: slower than steepest descent; useful if the coordinates are decoupled and/or computation of the gradient is not possible or too expensive.

Step Length Selection, Unconstrained Optimization in the Line Search Universe: global optimization problem, local strategies, line search algorithms (search direction, step length), sufficient descent conditions, convergence (global convergence, local rate).
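The recap's claim, linear convergence of steepest descent on a strictly convex quadratic, is easy to check numerically. A minimal sketch (the matrix, right-hand side, and starting point are arbitrary illustrative choices, not from the notes):

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, iters=100):
    """Minimize f(x) = 0.5 x^T Q x - b^T x with exact line search.
    For a quadratic, the exact step along -g is alpha = (g^T g)/(g^T Q g)."""
    x = x0.astype(float)
    xstar = np.linalg.solve(Q, b)        # unique minimizer (Q SPD)
    errs = []
    for _ in range(iters):
        g = Q @ x - b                    # gradient of the quadratic
        if np.linalg.norm(g) < 1e-14:
            break
        alpha = (g @ g) / (g @ (Q @ g))  # exact minimizer along -g
        x = x - alpha * g
        errs.append(np.linalg.norm(x - xstar))
    return x, errs

# An ill-conditioned Q gives slow *linear* convergence: the error ratio
# errs[k+1]/errs[k] stays roughly constant, consistent with the
# (kappa-1)/(kappa+1) bound from the lecture.
Q = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x, errs = steepest_descent_quadratic(Q, b, np.array([5.0, 1.0]))
```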
Step Length: We look at techniques for finding a minimizer of the 1D function $\phi(\alpha) = f(\bar{x}_k + \alpha\bar{p}_k)$, i.e., finding a step length $\alpha_k$ which satisfies a sufficient decrease condition such as the Wolfe conditions. We already have one such algorithm:

Algorithm (Backtracking Line Search):
1. Set $\bar\alpha > 0$, $\rho \in (0,1)$, $c \in (0,1)$; set $\alpha = \bar\alpha$.
2. While $f(\bar{x}_k + \alpha\bar{p}_k) > f(\bar{x}_k) + c\,\alpha\,\bar{p}_k^T\nabla f(\bar{x}_k)$:
3.   $\alpha = \rho\,\alpha$
4. End While
5. Set $\alpha_k = \alpha$.

Assumptions: We must assume that $\bar{p}_k$ is a descent direction, i.e., that $\phi'(0) < 0$; thus all our steps will be in the positive direction. When the objective $f$ is quadratic, $f(\bar{x}) = \frac{1}{2}\bar{x}^T Q\bar{x} + \bar{b}^T\bar{x} + c$, the optimal step can be found explicitly:
$$\alpha_k = -\frac{\nabla f(\bar{x}_k)^T\bar{p}_k}{\bar{p}_k^T Q\,\bar{p}_k}.$$
For general nonlinear $f$ we must use an iterative scheme to find the step length $\alpha_k$. How the line search is performed impacts the robustness and efficiency of the overall optimization method.

Classification: It is natural to classify line search methods based on how many derivatives they need. (0) Methods based on function values only tend to be inefficient, since they need to narrow the minimizer down to a small interval. (1) Gradient information makes it easier to determine whether a certain step is good, i.e., whether it satisfies a sufficient reduction condition. (>1) Methods requiring more than one derivative are quite rare: in order to compute the second derivative, the full Hessian $\nabla^2 f(\bar{x}_k)$ is needed, which is usually too high a cost.

Our Focus: The best bang-for-the-buck line search algorithms use gradient information, so those will be the focus of our discussion. A line search algorithm roughly breaks down into the following components: (1) the initial step length $\alpha_0$ is selected; (2) an interval $[\alpha_{\min}, \alpha_{\max}]$ containing acceptable step lengths is identified (bracketing phase); (3) the final step length is selected from the acceptable set (selection phase).
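The backtracking algorithm above can be sketched in a few lines. This is a minimal illustration (the Rosenbrock test function and the evaluation point are my own choices, not from the notes):

```python
import numpy as np

def backtracking(f, grad_f, x, p, alpha_bar=1.0, rho=0.5, c=1e-4):
    """Backtracking line search: shrink alpha by rho until the Armijo
    (sufficient-decrease) condition f(x+a p) <= f(x) + c a grad^T p holds."""
    fx = f(x)
    slope = grad_f(x) @ p          # phi'(0); must be negative (descent)
    alpha = alpha_bar
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

# Rosenbrock at a point where the full step alpha = 1 badly overshoots.
f = lambda x: 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2
g = lambda x: np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                        200*(x[1] - x[0]**2)])
x = np.array([-1.2, 1.0])
p = -g(x)                          # steepest descent direction
alpha = backtracking(f, g, x, p)
```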
Interpolation: First we note that the Armijo condition can be written in terms of $\phi$ as
$$\phi(\alpha_k) \le \phi(0) + c_1\,\alpha_k\,\phi'(0),$$
where $c_1 = 10^{-4}$ in practice. This is stronger, but not much stronger, than requiring descent. Our new algorithms will be efficient in the sense that the gradient $\nabla f(\bar{x}_k)$ is computed as few times as possible. If the initial step length $\alpha_0$ satisfies the Armijo condition, then we accept $\alpha_0$ as the step length and terminate the search. As we get close to the solution this will happen more and more often for Newton and quasi-Newton methods with $\alpha_0 = 1$. Otherwise, we search for an acceptable step length in $(0, \alpha_0)$.

At this stage we have computed three pieces of information, $\phi(0)$, $\phi'(0)$, and $\phi(\alpha_0)$; we use this information to build a quadratic model
$$\phi_q(\alpha) = \left[\frac{\phi(\alpha_0) - \phi(0) - \alpha_0\,\phi'(0)}{\alpha_0^2}\right]\alpha^2 + \phi'(0)\,\alpha + \phi(0).$$
Note that $\phi_q(0) = \phi(0)$, $\phi_q'(0) = \phi'(0)$, and $\phi_q(\alpha_0) = \phi(\alpha_0)$. We set $\phi_q'(\alpha) = 0$ to find the minimizer of the model, our next $\alpha$ to try:
$$\alpha_1 = -\frac{\phi'(0)\,\alpha_0^2}{2\left[\phi(\alpha_0) - \phi(0) - \phi'(0)\,\alpha_0\right]}.$$
We now check the Armijo condition $\phi(\alpha_1) \le \phi(0) + c_1\alpha_1\phi'(0)$. If it fails, then we create a cubic function
$$\phi_c(\alpha) = a\alpha^3 + b\alpha^2 + \alpha\,\phi'(0) + \phi(0)$$
which interpolates $\phi(0)$, $\phi'(0)$, $\phi(\alpha_0)$, and $\phi(\alpha_1)$. The next iterate $\alpha_2$ is then the minimizer of $\phi_c(\alpha)$ which lies in $(0, \alpha_1)$; it is given as one of the roots of the quadratic equation $\phi_c'(\alpha) = 3a\alpha^2 + 2b\alpha + \phi'(0) = 0$, namely
$$\alpha_2 = \frac{-b + \sqrt{b^2 - 3a\,\phi'(0)}}{3a}.$$
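The quadratic interpolation step above is a one-liner. A minimal sketch; the sample $\phi$ is my own check case, not from the notes:

```python
def quad_interp_step(phi0, dphi0, alpha0, phi_a0):
    """Minimizer of the quadratic interpolant matching phi(0), phi'(0),
    and phi(alpha0):
        alpha1 = -phi'(0) alpha0^2 / (2 [phi(alpha0) - phi(0) - phi'(0) alpha0])."""
    return -dphi0 * alpha0**2 / (2.0 * (phi_a0 - phi0 - dphi0 * alpha0))

# Sanity check on an exactly quadratic phi(a) = (a - 2)^2, where the
# interpolant is exact and the step should land on the minimizer a = 2.
phi = lambda a: (a - 2.0)**2
a1 = quad_interp_step(phi(0.0), -4.0, 1.0, phi(1.0))   # phi'(0) = -4
```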
In the extremely rare case that $\alpha_2$ does not satisfy the Armijo condition $\phi(\alpha_2) \le \phi(0) + c_1\alpha_2\phi'(0)$, we create a new cubic model interpolating $\phi(0)$, $\phi'(0)$, $\phi(\alpha_1)$, and $\phi(\alpha_2)$, i.e., $\phi(0)$, $\phi'(0)$, and the two most recent $\alpha$'s.

At this point we must introduce the following safeguards to guarantee that we make sufficient progress: if $|\alpha_{k+1} - \alpha_k| < \epsilon_1$ or $|\alpha_{k+1}| < \epsilon_2$, then set $\alpha_{k+1} = \alpha_k/2$.

The algorithm described assumes that computing the derivative is significantly more expensive than computing function values. However, it is often (but not always) possible to compute the directional derivative, or a good estimate thereof, with minimal extra cost. In those cases we build the cubic interpolant so that it interpolates $\phi(\alpha_{k-1})$, $\phi'(\alpha_{k-1})$, $\phi(\alpha_k)$, and $\phi'(\alpha_k)$; this is a Hermite polynomial of degree 3 (see Math 541). The cubic Hermite polynomial $H_3$ satisfying $H_3(\alpha_{k-1}) = \phi(\alpha_{k-1})$, $H_3'(\alpha_{k-1}) = \phi'(\alpha_{k-1})$, $H_3(\alpha_k) = \phi(\alpha_k)$, $H_3'(\alpha_k) = \phi'(\alpha_k)$ can be written down explicitly, straight from Math 541. The minimizer of $H_3(\alpha)$ in $[\alpha_{k-1}, \alpha_k]$ is either at one of the end points, or else in the interior, found by setting $H_3'(\alpha) = 0$. The interior point is given by
$$\alpha_{k+1} = \alpha_k - (\alpha_k - \alpha_{k-1})\left[\frac{\phi'(\alpha_k) + d_2 - d_1}{\phi'(\alpha_k) - \phi'(\alpha_{k-1}) + 2d_2}\right],$$
where
$$d_1 = \phi'(\alpha_{k-1}) + \phi'(\alpha_k) - 3\,\frac{\phi(\alpha_{k-1}) - \phi(\alpha_k)}{\alpha_{k-1} - \alpha_k}, \qquad d_2 = \left[d_1^2 - \phi'(\alpha_{k-1})\,\phi'(\alpha_k)\right]^{1/2}.$$
Either $\alpha_{k+1}$ is accepted as the step length, or the search process continues. Cubic interpolation gives quadratic convergence in the step length selection algorithm.
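The interior-minimizer formula above translates directly into code. A minimal sketch; the test cubic $\phi(a) = a^3 - 3a + 2$, whose local minimizer is $a = 1$, is my own check case:

```python
import math

def cubic_hermite_min(a0, phi_a0, dphi_a0, a1, phi_a1, dphi_a1):
    """Interior minimizer of the cubic Hermite interpolant of phi on [a0, a1],
    using the d1/d2 formula from the notes."""
    d1 = dphi_a0 + dphi_a1 - 3.0 * (phi_a0 - phi_a1) / (a0 - a1)
    d2 = math.copysign(1.0, a1 - a0) * math.sqrt(d1*d1 - dphi_a0*dphi_a1)
    return a1 - (a1 - a0) * (dphi_a1 + d2 - d1) / (dphi_a1 - dphi_a0 + 2.0*d2)

# phi(a) = a^3 - 3a + 2, phi'(a) = 3a^2 - 3: data at a=0 and a=2 should
# recover the local minimizer a = 1 exactly (the interpolant is the cubic).
a_next = cubic_hermite_min(0.0, 2.0, -3.0, 2.0, 4.0, 9.0)
```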
The Initial Step: For Newton and quasi-Newton methods, the search vector $\bar{p}_k$ contains an intrinsic sense of scale, being formed from the local descent and curvature information; hence the initial trial step length should always be $\alpha_0 = 1$, otherwise we break the quadratic (respectively superlinear) convergence properties. For other search directions, such as steepest descent and conjugate gradient (to be described later), which do not have a sense of scale, other methods must be used to select a good first trial step.

Strategy 1: Assume that the rate of change in the current iteration will be the same as in the previous iteration, and select
$$\alpha_0 = \alpha_{k-1}\,\frac{\nabla f(\bar{x}_{k-1})^T\bar{p}_{k-1}}{\nabla f(\bar{x}_k)^T\bar{p}_k}.$$

Strategy 2: Use the minimizer of the quadratic interpolant to $f(\bar{x}_{k-1})$, $f(\bar{x}_k)$, and $\phi'(0) = \nabla f(\bar{x}_k)^T\bar{p}_k$ as the initial $\alpha$:
$$\alpha_0 = \frac{2\left[f(\bar{x}_k) - f(\bar{x}_{k-1})\right]}{\nabla f(\bar{x}_k)^T\bar{p}_k}.$$
If this strategy is used with a quadratically or superlinearly convergent algorithm, the choice of $\alpha_0$ must be modified slightly to preserve the convergence properties: $\alpha_0^{\text{new}} = \min(1,\ 1.01\,\alpha_0)$; this ensures that the step length $\alpha_0 = 1$ will eventually always be tried.

Algorithm (Line Search for the Strong Wolfe Conditions):
1. Set $\alpha_0 = 0$; choose $\alpha_1 > 0$, $\alpha_{\max}$, $c_1$, and $c_2$; $i = 1$.
2. While TRUE:
3.   Compute $\phi(\alpha_i)$.
4.   If $\phi(\alpha_i) > \phi(0) + c_1\alpha_i\phi'(0)$, or [$\phi(\alpha_i) \ge \phi(\alpha_{i-1})$ and $i > 1$]: set $\alpha^* = \mathrm{zoom}(\alpha_{i-1}, \alpha_i)$ and terminate the search.
5.   Compute $\phi'(\alpha_i)$.
6.   If $|\phi'(\alpha_i)| \le -c_2\,\phi'(0)$: set $\alpha^* = \alpha_i$ and terminate the search.
7.   If $\phi'(\alpha_i) \ge 0$: set $\alpha^* = \mathrm{zoom}(\alpha_i, \alpha_{i-1})$ and terminate the search.
8.   Choose $\alpha_{i+1} \in (\alpha_i, \alpha_{\max})$; $i = i + 1$.
9. End While

In the first stage of the algorithm, either an acceptable step length, or a range $(\alpha_j, \alpha_{j+1})$ containing an acceptable step length, is identified; as long as none of the terminating conditions are satisfied, the step length is increased. If in the first stage we identified a range, the second stage invokes a function zoom, which will identify an acceptable step from the interval.
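The two initial-step strategies above can be sketched as a small helper. This is a minimal illustration (the function name and the numerical check values are my own, not from the notes):

```python
import numpy as np

def initial_step(f_k, f_km1, g_k, p_k,
                 g_km1=None, p_km1=None, alpha_km1=None, strategy=2):
    """Heuristics for the first trial step on scale-free directions
    (steepest descent, CG). Strategy 1 reuses the previous rate of change;
    strategy 2 minimizes a quadratic interpolant and then caps near 1 so
    that alpha0 = 1 is eventually always tried."""
    slope = g_k @ p_k                                    # phi'(0) < 0
    if strategy == 1:
        return alpha_km1 * (g_km1 @ p_km1) / slope
    alpha0 = 2.0 * (f_k - f_km1) / slope
    return min(1.0, 1.01 * alpha0)

# Example: f dropped from 1.5 to 1.0, slope phi'(0) = -10  =>  alpha0 = 0.101.
g_k = np.array([3.0, 1.0])
p_k = -g_k
a0 = initial_step(1.0, 1.5, g_k, p_k)
```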
Notes: The sufficient-decrease test failing establishes that $\alpha_i$ is too long a step; thus an acceptable $\alpha^*$ must be in the range $(\alpha_{i-1}, \alpha_i)$. If the curvature condition $|\phi'(\alpha_i)| \le -c_2\phi'(0)$ holds, then both strong Wolfe conditions hold, since sufficient decrease must also hold at that point. Finally, if $\phi'(\alpha_i) \ge 0$, then the step is too large, since we are going uphill at that point.

The zoom function takes two arguments, $\mathrm{zoom}(\alpha_{\text{low}}, \alpha_{\text{high}})$, satisfying the following: (1) the interval bounded by $\alpha_{\text{low}}$ and $\alpha_{\text{high}}$ contains step lengths which satisfy the strong Wolfe conditions; (2) $\alpha_{\text{low}}$ is the $\alpha$ corresponding to the lower function value, i.e., $\phi(\alpha_{\text{low}}) < \phi(\alpha_{\text{high}})$; (3) $\alpha_{\text{low}}$ and $\alpha_{\text{high}}$ satisfy $\phi'(\alpha_{\text{low}})\,(\alpha_{\text{high}} - \alpha_{\text{low}}) < 0$. See the figure on slide 23.

Algorithm (zoom function):
01. While TRUE:
02.   Interpolate to find $\alpha$ between $\alpha_{\text{low}}$ and $\alpha_{\text{high}}$.
03.   Compute $\phi(\alpha)$.
04.   If $\phi(\alpha) > \phi(0) + c_1\alpha\phi'(0)$, or $\phi(\alpha) \ge \phi(\alpha_{\text{low}})$:
05.     $\alpha_{\text{high}} = \alpha$
06.   Else:
07.     Compute $\phi'(\alpha)$.
08.     If $|\phi'(\alpha)| \le -c_2\,\phi'(0)$:
09.       set $\alpha^* = \alpha$ and return $\alpha^*$.
10.     If $\phi'(\alpha)\,(\alpha_{\text{high}} - \alpha_{\text{low}}) \ge 0$:
11.       $\alpha_{\text{high}} = \alpha_{\text{low}}$
12.     $\alpha_{\text{low}} = \alpha$
13. End While

In practical applications $c_1 = 10^{-4}$ and $c_2 = 0.9$; enforcing the strong Wolfe conditions requires a similar amount of work compared with the (weak) Wolfe conditions. The advantage of the strong conditions is that by decreasing $c_2$ we can force the accepted step lengths to be closer to the local minima of $\phi$; this is particularly helpful in applications of steepest descent or conjugate gradient methods.

Figure: a possible scenario for zoom; since $\phi(\alpha_{\text{low}}) < \phi(\alpha_{\text{high}})$, we must have negative slope at $\alpha_{\text{low}}$.
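The two-stage search and zoom above can be put together in one routine. A minimal sketch, assuming bisection for the interpolation step inside zoom (the notes use polynomial interpolation there); the 1D test function is my own:

```python
def wolfe_search(phi, dphi, alpha1=1.0, alpha_max=10.0, c1=1e-4, c2=0.9,
                 max_iter=25):
    """Line search enforcing the strong Wolfe conditions: bracketing stage
    followed by zoom. phi(a) = f(x_k + a p_k), dphi(a) its derivative."""
    phi0, dphi0 = phi(0.0), dphi(0.0)

    def zoom(lo, hi):
        for _ in range(max_iter):
            a = 0.5 * (lo + hi)                  # bisection as interpolation
            if phi(a) > phi0 + c1*a*dphi0 or phi(a) >= phi(lo):
                hi = a
            else:
                da = dphi(a)
                if abs(da) <= -c2 * dphi0:       # strong Wolfe satisfied
                    return a
                if da * (hi - lo) >= 0:
                    hi = lo
                lo = a
        return lo

    a_prev, a = 0.0, alpha1
    for i in range(max_iter):
        pa = phi(a)
        if pa > phi0 + c1*a*dphi0 or (i > 0 and pa >= phi(a_prev)):
            return zoom(a_prev, a)
        da = dphi(a)
        if abs(da) <= -c2 * dphi0:
            return a
        if da >= 0:
            return zoom(a, a_prev)
        a_prev, a = a, min(2.0 * a, alpha_max)   # extend the bracket
    return a

# 1D test: f(t) = t^4 - 2 t^2, start at t = 2 with direction p = -1.
f  = lambda t: t**4 - 2*t**2
fp = lambda t: 4*t**3 - 4*t
x, p = 2.0, -1.0
phi  = lambda a: f(x + a*p)
dphi = lambda a: fp(x + a*p) * p
a = wolfe_search(phi, dphi)
```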
Figure: illustrating the $\phi(\alpha) \ge \phi(\alpha_{\text{low}})$ condition; since we know we can push the objective down to $\phi(\alpha_{\text{low}})$, we reject $\alpha$ even though it satisfies the Armijo condition.

Numerical Optimization, Lecture Notes: Recap; Trust-Region Newton.
Peter Blomgren, blomgren.peter@gmail.com. Department of Mathematics and Statistics, Dynamical Systems Group, Computational Sciences Research Center, San Diego State University, San Diego, CA 92182-7720. http://terminus.sdsu.edu. Fall 2009.

Outline: (1) Recap: Hessian modifications; trust-region algorithm. (2) Trust-Region Newton: Newton-Dogleg; Newton 2D-Subspace-Minimization; Newton Nearly-Exact Solution; Trust-Region Newton-PCG.

Hessian Modifications: We discussed strategies for modifying the Hessian in order to make it positive definite. If we use the Frobenius matrix norm, the smallest change is of the type "change negative eigenvalues to small positive ones":
$$B = A + \Delta A, \qquad \Delta A = Q\,\mathrm{diag}(\tau_i)\,Q^T, \qquad \tau_i = \begin{cases} 0, & \lambda_i \ge \delta, \\ \delta - \lambda_i, & \lambda_i < \delta. \end{cases}$$
If, on the other hand, we use the Euclidean norm, the smallest change is a multiple of the identity matrix, i.e., "shift the eigenvalue spectrum so all eigenvalues are positive":
$$B = A + \Delta A, \qquad \Delta A = \tau I, \qquad \tau = \max\left(0,\ \delta - \lambda_{\min}(A)\right).$$

Trust-Region Algorithm (sketch): set $k = 1$, choose $\epsilon > 0$, $\Delta_0$, and $\eta \in [0, \tfrac14)$; while the optimality condition is not satisfied: get an approximate solution $\bar{p}_k$ of the subproblem (today's discussion); evaluate the ratio $\rho_k$ of actual to predicted reduction; update the radius (grow, e.g. $\Delta_{k+1} = \min(2\Delta_k, \bar\Delta)$, when $\rho_k$ is large and $\|\bar{p}_k\| = \Delta_k$; shrink when $\rho_k$ is small); if $\rho_k > \eta$, set $\bar{x}_{k+1} = \bar{x}_k + \bar{p}_k$, else $\bar{x}_{k+1} = \bar{x}_k$; $k = k + 1$.

Trust-Region Newton, Positive Definiteness Not Required: Trust-region methods do not require that the model Hessian be positive definite. It is possible to use the exact Hessian $B_k = \nabla^2 f(\bar{x}_k)$ directly, and find the search direction $\bar{p}_k$ by solving the trust-region subproblem
$$\min_{\bar{p}\in\mathbb{R}^n}\ f(\bar{x}_k) + \nabla f(\bar{x}_k)^T\bar{p} + \tfrac12\,\bar{p}^T B_k\,\bar{p}, \qquad \|\bar{p}\| \le \Delta_k.$$
Some of the techniques we discussed (e.g. dogleg) do require a positive definite model Hessian, however.
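The two Hessian modifications recapped above are easy to realize with an eigendecomposition. A minimal sketch (function name and the 2x2 test matrix are my own choices, not from the notes):

```python
import numpy as np

def modify_hessian(A, delta=1e-3, norm="frobenius"):
    """Make symmetric A positive definite. The Frobenius-optimal fix lifts
    only the small/negative eigenvalues to delta; the Euclidean-optimal fix
    shifts the whole spectrum by a multiple of the identity."""
    lam, Q = np.linalg.eigh(A)              # A = Q diag(lam) Q^T
    if norm == "frobenius":
        tau = np.where(lam >= delta, 0.0, delta - lam)
        return A + (Q * tau) @ Q.T          # Q diag(tau) Q^T
    tau = max(0.0, delta - lam.min())       # Euclidean: Delta A = tau I
    return A + tau * np.eye(A.shape[0])

A = np.array([[1.0,  0.0],
              [0.0, -2.0]])                 # indefinite: eigenvalues {1, -2}
B = modify_hessian(A)                       # eigenvalues become {delta, 1}
```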
We have seen quite a few ideas floating around; let us review them in the context of our methods: (i) the dogleg method, (ii) 2D-subspace minimization, (iii) the nearly exact solution, and (iv) the CG method.

Newton-Dogleg: When $B_k$ is positive definite, the dogleg method minimizes the model over the dogleg path connecting
$$\bar{p}_k^U = -\frac{\nabla f(\bar{x}_k)^T\nabla f(\bar{x}_k)}{\nabla f(\bar{x}_k)^T B_k\,\nabla f(\bar{x}_k)}\,\nabla f(\bar{x}_k) \qquad\text{and}\qquad \bar{p}_k^B = -B_k^{-1}\nabla f(\bar{x}_k),$$
the unconstrained minimizer of the quadratic model along the steepest descent direction $-\nabla f(\bar{x}_k)$, and the full step. This gives good approximate solutions to the trust-region subproblems which can be computed efficiently. However, when $B_k$ is not positive definite we cannot safely compute $\bar{p}_k^B$; further, the denominator $\nabla f(\bar{x}_k)^T B_k\,\nabla f(\bar{x}_k)$ could be zero.

In order to make the dogleg method work for general non-positive-definite $B_k$'s, we can use the Hessian modification from last time, replacing $B_k \leftarrow B_k + E_k$ (positive definite), and use this matrix in the dogleg solution. There is a price to pay: when the matrix $B_k$ is modified, the importance of different directions is changed in different ways. This may negatively impact the benefits of the trust-region approach. Modifications of the type $E_k = \tau I$ behave somewhat more predictably than modifications of the type $E_k = \mathrm{diag}(\tau_1, \tau_2, \ldots, \tau_n)$. Usage of the dogleg method for nonconvex problems is somewhat dicey, and even though it may work, it is not the preferred method.

Newton 2D-Subspace Minimization: In much the same way we modified the dogleg method, we can adapt the 2D-subspace minimization subproblem to work in the case of indefinite $B_k$. The subproblem
$$\min_{\bar{p}}\ f(\bar{x}_k) + \nabla f(\bar{x}_k)^T\bar{p} + \tfrac12\,\bar{p}^T B_k\,\bar{p}, \quad \|\bar{p}\| \le \Delta_k, \quad \bar{p} \in \mathrm{span}\!\left[\nabla f(\bar{x}_k),\ B_k^{-1}\nabla f(\bar{x}_k)\right]$$
can be applied when $B_k$ is positive definite; with a modified, positive definite $B_k + E_k$ replacing $B_k$ it can also be used when $B_k$ is not.

Newton, Nearly Exact Solution of the Trust-Region Subproblem: Recall the theoretical basis for the exact solution, from Lecture 9.
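The dogleg construction above can be sketched directly. A minimal illustration for the positive definite case (test matrix and radius are my own choices, not from the notes):

```python
import numpy as np

def dogleg_step(g, B, Delta):
    """Dogleg approximate solution of the trust-region subproblem, assuming
    B positive definite: follow the Cauchy segment toward p_U, then bend
    toward the full step p_B = -B^{-1} g, truncating at the boundary."""
    pB = -np.linalg.solve(B, g)
    if np.linalg.norm(pB) <= Delta:
        return pB                             # full step fits in the region
    pU = -(g @ g) / (g @ (B @ g)) * g         # minimizer along -g
    if np.linalg.norm(pU) >= Delta:
        return -Delta * g / np.linalg.norm(g) # truncated steepest descent
    # otherwise solve ||pU + t (pB - pU)|| = Delta for t in [0, 1]
    d = pB - pU
    a, b, c = d @ d, 2 * (pU @ d), pU @ pU - Delta**2
    t = (-b + np.sqrt(b*b - 4*a*c)) / (2*a)
    return pU + t * d

B = np.array([[2.0, 0.0], [0.0, 1.0]])
g = np.array([1.0, 1.0])
p = dogleg_step(g, B, Delta=0.8)              # boundary step here
```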
Theorem: The vector $\bar{p}^*$ is a global solution of the trust-region problem
$$\min_{\|\bar{p}\| \le \Delta_k}\ f(\bar{x}_k) + \nabla f(\bar{x}_k)^T\bar{p} + \tfrac12\,\bar{p}^T B_k\,\bar{p}$$
if and only if $\bar{p}^*$ is feasible and there is a scalar $\lambda \ge 0$ such that the following conditions are satisfied:
1. $(B_k + \lambda I)\,\bar{p}^* = -\nabla f(\bar{x}_k)$;
2. $\lambda\,(\Delta_k - \|\bar{p}^*\|) = 0$;
3. $(B_k + \lambda I)$ is positive semi-definite.

In this approach we are already using the Hessian modification in the specific form $E_k = \lambda I$; good for small problems.

Trust-Region Newton-CG: The trust-region subproblem
$$\min_{\bar{p}\in\mathbb{R}^n}\ f(\bar{x}_k) + \nabla f(\bar{x}_k)^T\bar{p} + \tfrac12\,\bar{p}^T B_k\,\bar{p}, \qquad \|\bar{p}\| \le \Delta_k,$$
can be solved using the Conjugate Gradient (CG) method with two additional termination criteria, one of which we have seen already. For each subproblem we must solve $B_k\bar{p}_k = -\nabla f(\bar{x}_k)$. We apply CG with the following stopping criteria: (standard) the system has been solved to the desired accuracy; (previously seen) negative curvature encountered; (new) the size of the approximate solution exceeds the trust-region radius. In the case of negative curvature we follow the direction to the boundary of the trust region; the result is Steihaug's method:

Algorithm (CG-Steihaug): Given $\epsilon > 0$, set $\bar{p}_0 = 0$, $\bar{r}_0 = \nabla f(\bar{x}_k)$, $\bar{d}_0 = -\bar{r}_0$; if $\|\bar{r}_0\| < \epsilon$, return $\bar{p}_0$. While TRUE: if $\bar{d}_j^T B\bar{d}_j \le 0$ (negative curvature), find $\tau \ge 0$ such that $\bar{p} = \bar{p}_j + \tau\bar{d}_j$ satisfies $\|\bar{p}\| = \Delta$, and return $\bar{p}$; set $\alpha_j = \bar{r}_j^T\bar{r}_j / \bar{d}_j^T B\bar{d}_j$ and $\bar{p}_{j+1} = \bar{p}_j + \alpha_j\bar{d}_j$; if $\|\bar{p}_{j+1}\| \ge \Delta$ (step outside the trust region), find $\tau \ge 0$ such that $\bar{p} = \bar{p}_j + \tau\bar{d}_j$ satisfies $\|\bar{p}\| = \Delta$, and return $\bar{p}$; set $\bar{r}_{j+1} = \bar{r}_j + \alpha_j B\bar{d}_j$; if $\|\bar{r}_{j+1}\| \le \epsilon\,\|\bar{r}_0\|$, return $\bar{p}_{j+1}$; set $\beta_{j+1} = \bar{r}_{j+1}^T\bar{r}_{j+1} / \bar{r}_j^T\bar{r}_j$ and $\bar{d}_{j+1} = -\bar{r}_{j+1} + \beta_{j+1}\bar{d}_j$. End while.

When we get close to the optimum, the trust-region constraint becomes inactive, the model becomes a good approximation of the objective, and the radius of the trust region grows. At this juncture we need to pay particular attention to how the $\epsilon$ in CG-Steihaug is selected: it should be given by a forcing sequence, e.g. $\epsilon_k \sim \|\nabla f(\bar{x}_k)\|$, which gives us quadratic convergence.

Good properties of TR-Newton-CG: globally convergent (the first step, in the $-\nabla f(\bar{x}_k)$ direction, identifies the Cauchy point, and the subsequent steps improve on it); no matrix factorizations are necessary. Advantages over LS-Newton-CG: step lengths are controlled by the trust region, and directions of negative curvature are explored.
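Steihaug's method as written above can be sketched compactly. A minimal illustration; the small SPD test matrix is my own choice, not from the notes:

```python
import numpy as np

def cg_steihaug(B, g, Delta, eps=1e-8, max_iter=100):
    """Steihaug CG for min g^T p + 0.5 p^T B p subject to ||p|| <= Delta.
    Stops on convergence, on negative curvature, or at the boundary."""
    p = np.zeros_like(g)
    r, d = g.copy(), -g.copy()
    if np.linalg.norm(r) < eps:
        return p

    def to_boundary(p, d):
        # largest tau >= 0 with ||p + tau d|| = Delta
        a, b, c = d @ d, 2 * (p @ d), p @ p - Delta**2
        return (-b + np.sqrt(b*b - 4*a*c)) / (2*a)

    for _ in range(max_iter):
        Bd = B @ d
        curv = d @ Bd
        if curv <= 0:                              # negative curvature
            return p + to_boundary(p, d) * d
        alpha = (r @ r) / curv
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= Delta:        # step leaves the region
            return p + to_boundary(p, d) * d
        r_next = r + alpha * Bd
        if np.linalg.norm(r_next) < eps:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

B = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
p = cg_steihaug(B, g, Delta=10.0)                  # interior: solves B p = -g
```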
Less-than-great properties of TR-Newton-CG: Any direction of negative curvature is accepted, and the accepted direction can give an insignificant reduction in the model. There is an extension of CG known as the Lanczos method, and it is possible to build a TR-Newton-Lanczos algorithm which does not terminate when encountering the first direction of negative curvature, but continues to search for a direction of sufficient negative curvature. TR-Newton-Lanczos is more robust, but comes at the cost of a more expensive solution of the subproblem. We leave the discussion of the Lanczos algorithm to Math 643 (to be offered in Spring 2049).

Preconditioning: As we have seen in other, very similar settings, adding preconditioning to the CG solver can cut the number of iterations quite drastically. It would seem like a good and natural idea to add preconditioning to the trust-region Newton-CG scheme. We have to be a little careful. For the standard CG-Steihaug the following is true:

Theorem: The sequence of vectors generated by CG-Steihaug satisfies
$$0 = \|\bar{p}_0\|_2 < \|\bar{p}_1\|_2 < \cdots < \|\bar{p}_j\|_2 < \|\bar{p}_{j+1}\|_2 < \cdots \le \Delta.$$
This does not hold for preconditioned (PCG) Steihaug; this means that the sequence can leave the trust region and then come back.

It is possible to define a weighted norm in which the PCG iterates grow monotonically; this weighted norm depends on the preconditioner. If we express the preconditioning of $B_k$ in terms of a nonsingular matrix $D$, which guarantees that the eigenvalues of $D^{-T}B_k D^{-1}$ have a favorable distribution, then the subproblem takes the form
$$\min_{\bar{p}\in\mathbb{R}^n}\ f(\bar{x}_k) + \nabla f(\bar{x}_k)^T\bar{p} + \tfrac12\,\bar{p}^T B_k\,\bar{p}, \qquad \|D\bar{p}\| \le \Delta_k.$$
If we formally make the change of variables $\hat{p} = D\bar{p}$ and set $\hat{g}_k = D^{-T}\nabla f(\bar{x}_k)$, $\hat{B}_k = D^{-T}B_k D^{-1}$, the subproblem transforms into
$$\min_{\hat{p}\in\mathbb{R}^n}\ f(\bar{x}_k) + \hat{g}_k^T\hat{p} + \tfrac12\,\hat{p}^T\hat{B}_k\hat{p}, \qquad \|\hat{p}\| \le \Delta_k,$$
to which we can apply CG-Steihaug. As usual, we never make this change
of variables explicitly. Instead, the CG-Steihaug algorithm is modified so that wherever we would have a multiplication by $D^{-1}$ or $D^{-T}$, we solve the appropriate linear system. Note: if $D^{-T}B_k D^{-1} = I$, the preconditioning is perfect. Usually $D^{-T}B_k D^{-1} = I + E$, and if we multiply by $D^T$ from the left and $D$ from the right, we see $B_k = M + R$, so that $M \approx B_k$ and $R$ captures the inexactness of the preconditioning.

We can get a good general-purpose preconditioner by using a variant of the Cholesky factorization $LL^T = B_k$. We have discussed two ideas in connection with the Cholesky factorization. Last time we talked about how to modify it to get an approximate factorization of an indefinite matrix, i.e.
$$[L, L^T] = \text{choldecomp}(B_k) = \text{cholesky}\!\left(B_k + \mathrm{diag}(\tau_1, \ldots, \tau_n)\right) \quad\text{or}\quad \text{cholesky}\!\left(B_k + \lambda I\right).$$
We have also, in general terms, talked about the incomplete Cholesky factorization, which preserves the sparsity pattern of $B_k$ by not allowing fill-ins. Putting the two together we get something like the following (do not implement this one): for each column $j = 1, \ldots, n$, compute a safeguarded pivot $d_{jj} \ge \delta > 0$ from the current diagonal entry, then for $i = j+1, \ldots, n$ set $l_{ij} = c_{ij}/d_{jj}$ only where $a_{ij} \ne 0$ (only allow fill where the matrix itself is nonzero), and $l_{ij} = 0$ otherwise.

Wrapping up: We have looked at Newton methods, with quadratic convergence if and only if we implement and solve all the subproblems in the right way, for both the line-search and trust-region approaches, and we have developed quite a powerful framework of algorithms that are suitable and quite stable for large problems. Are we done? Not quite. We have four main topics left: (1) estimation of derivatives: how to proceed if the gradient and/or the Hessian is not available in analytic form; (2) quasi-Newton methods: how to proceed if the Hessian is not available or too expensive; (3) application to nonlinear least squares problems; (4) application to nonlinear equations: if we can minimize, we can also solve $F(\bar{x}) = 0$.
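The zero-fill incomplete Cholesky idea above (without the diagonal safeguard, which the slide warns is needed for indefinite matrices) can be sketched in dense form. A minimal, assumption-laden illustration; the SPD test matrix is my own:

```python
import numpy as np

def incomplete_cholesky(A):
    """Zero-fill incomplete Cholesky IC(0): the usual Cholesky loop, but
    entries where A itself is zero are dropped (no fill-in). A sketch only;
    production codes add diagonal safeguards for (near-)indefinite A."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        for i in range(j + 1, n):
            if A[i, j] != 0.0:                   # keep A's sparsity pattern
                L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

# On a matrix whose exact factor creates no fill, IC(0) is exact.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L = incomplete_cholesky(A)
```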
Numerical Optimization, Lecture Notes 24: Nonlinear Least Squares Problems; Orthogonal Distance Regression.
Peter Blomgren, blomgren.peter@gmail.com. Department of Mathematics and Statistics, Dynamical Systems Group, Computational Sciences Research Center, San Diego State University, San Diego, CA 92182-7720. http://terminus.sdsu.edu. Fall 2009.

Outline: (1) Summary: linear least squares; nonlinear least squares. (2) Orthogonal Distance Regression: error models; weighted least squares; ODR; ODR as nonlinear least squares, exploiting structure.

Summary, Linear Least Squares: Our study of nonlinear least squares problems started with a look at linear least squares, where each residual is linear and the Jacobian therefore is constant. The objective of interest is
$$f(\bar{x}) = \tfrac12\,\|J\bar{x} + \bar{r}_0\|_2^2;$$
solving for the stationary point, $\nabla f(\bar{x}) = 0$, gives the normal equations
$$J^T J\,\bar{x}^* = -J^T\bar{r}_0.$$
We have three approaches to solving the normal equations for $\bar{x}^*$, in increasing order of computational complexity and stability: Cholesky factorization of $J^T J$; QR factorization of $J$; and singular value decomposition of $J$.

Summary, Nonlinear Least Squares (1 of 4): Problem:
$$\bar{x}^* = \arg\min_{\bar{x}\in\mathbb{R}^n} f(\bar{x}) = \arg\min_{\bar{x}\in\mathbb{R}^n} \frac12\sum_{j=1}^m r_j^2(\bar{x}), \qquad m > n,$$
where the residuals are of the form $r_j(\bar{x}) = y_j - \phi(\bar{x};\,t_j)$. Here the $y_j$ are the measurements, taken at the locations/times $t_j$, and $\phi(\bar{x};\,t_j)$ is our model. The key approximation for the Hessian:
$$\nabla^2 f(\bar{x}) = J(\bar{x})^T J(\bar{x}) + \sum_{j=1}^m r_j(\bar{x})\,\nabla^2 r_j(\bar{x}) \approx J(\bar{x})^T J(\bar{x}).$$

(2 of 4) Line search algorithm: Gauss-Newton, with the subproblem
$$\left[J(\bar{x}_k)^T J(\bar{x}_k)\right]\bar{p}_k^{GN} = -J(\bar{x}_k)^T\bar{r}_k.$$
Guaranteed descent direction; fast convergence as long as the Hessian approximation holds up; equivalence to a linear least squares problem, used for an efficient, stable solution.

(3 of 4) Trust-region algorithm: Levenberg-Marquardt, with the subproblem
$$\bar{p}_k^{LM} = \arg\min_{\bar{p}\in\mathbb{R}^n} \tfrac12\,\|J(\bar{x}_k)\bar{p} + \bar{r}_k\|^2 \quad\text{subject to } \|\bar{p}\| \le \Delta_k.$$
Slight advantage over Gauss-Newton: global convergence with the same local convergence properties; also locally equivalent to a linear least squares problem.

(4 of 4) Hybrid algorithms: When implementing Gauss-Newton or Levenberg-Marquardt we should implement a safeguard for the large-residual case, where the Hessian approximation fails. If after some reasonable number of iterations we realize that the residuals are not going to zero, then we are better off switching to a general-purpose algorithm for nonlinear optimization, such as a quasi-Newton (BFGS) or Newton method.

Fixed Regressor Models vs. Errors-In-Variables Models: So far we have assumed that there are no errors in the variables describing where/when the measurements are made; i.e., in the data set $\{(t_j, y_j)\}$, where the $t_j$ denote times of measurement and the $y_j$ the measured values, we have assumed that the $t_j$ are exact and the measurement errors are in the $y_j$. Under this assumption the discrepancies between the model and the measured data are
$$\epsilon_j = y_j - \phi(\bar{x};\,t_j), \qquad j = 1, 2, \ldots, m.$$
Today we will take a look at the situation where we take errors in the $t_j$ into account. These models are known as errors-in-variables models, and their solutions are referred to, in the linear case, as total least squares optimization, and in the nonlinear case, as orthogonal distance regression.

Least Squares vs. Orthogonal Distance Regression. Figure: (left) an illustration of how the error is measured in standard fixed-regressor least squares optimization; (right) an example of orthogonal distance regression, where we measure the shortest distance to the model curve. (The right figure is actually not correct. Why?)

Orthogonal Distance Regression: For the mathematical formulation of orthogonal distance regression we introduce perturbations (errors) $\delta_j$ for the
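The Gauss-Newton and Levenberg-Marquardt steps summarized above differ only in the damping term. A minimal sketch that solves the damped normal equations through an equivalent stacked least squares problem (the test Jacobian and residual are my own choices, not from the notes):

```python
import numpy as np

def lm_step(J, r, lam):
    """Levenberg-Marquardt step: solve (J^T J + lam I) p = -J^T r via the
    stacked system [J; sqrt(lam) I] p ~ [-r; 0], which avoids forming the
    normal equations explicitly. lam = 0 recovers the Gauss-Newton step."""
    m, n = J.shape
    A = np.vstack([J, np.sqrt(lam) * np.eye(n)])
    b = np.concatenate([-r, np.zeros(n)])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

J = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
r = np.array([1.0, 1.0, 1.0])
p_gn = lm_step(J, r, 0.0)          # Gauss-Newton step
```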
variables $t_j$, in addition to the errors $\epsilon_j$ for the $y_j$'s. We relate the measurements and the model in the following way:
$$y_j = \phi(\bar{x};\,t_j + \delta_j) + \epsilon_j,$$
and define the minimization problem
$$(\bar{x}^*, \bar{\delta}^*) = \arg\min_{\bar{x},\bar{\delta}} \frac12\sum_{j=1}^m \left[w_j^2\left(y_j - \phi(\bar{x};\,t_j + \delta_j)\right)^2 + d_j^2\,\delta_j^2\right],$$
where $\bar{d}$ and $\bar{w}$ are two vectors of weights, which denote the relative significance of the error terms.

The Weights: The weight vectors $\bar{d}$ and $\bar{w}$ must either be supplied by the modeler or estimated in some clever way. If all the weights are the same, $w_j = d_j = c$, then each term in the sum is simply the shortest distance between the point $(t_j, y_j)$ and the curve $\phi(\bar{x};\,t)$, as illustrated in the previous figure. (In order to get the orthogonal-looking figure, I set $w_j = 10^5$ and $d_j = 1/4$, thus adjusting for the different scales in the $t$- and $y$-directions.) The shortest path between the point and the curve will be normal (orthogonal) to the curve at the point of intersection. We can think of the scaling/weighting as adjusting for measuring time in fortnights, seconds, milliseconds, microseconds, or nanoseconds.

In Terms of Residuals: By identifying the $2m$ residuals
$$r_j(\bar{x}, \bar{\delta}) = \begin{cases} w_j\left[y_j - \phi(\bar{x};\,t_j + \delta_j)\right], & j = 1, \ldots, m, \\ d_{j-m}\,\delta_{j-m}, & j = m+1, \ldots, 2m, \end{cases}$$
we can rewrite the optimization problem as
$$(\bar{x}^*, \bar{\delta}^*) = \arg\min_{\bar{x},\bar{\delta}} \frac12\sum_{j=1}^{2m} r_j^2(\bar{x}, \bar{\delta}).$$

ODR as Least Squares: If we take a cold, hard stare at this expression, we realize that it is now a standard least squares problem with $2m$ terms and $n + m$ unknowns $(\bar{x}, \bar{\delta})$. We can use any of the techniques we have previously explored for the solution of the nonlinear least squares problem. However, a straightforward implementation of these strategies may prove to be quite expensive, since the number of terms has doubled to $2m$ and the number of independent variables has grown from $n$ to $n + m$. Recall that usually $m \gg n$, so this is a drastic growth of the problem.
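The stacked residual vector defined above is mechanical to assemble. A minimal sketch; the toy linear model and unit weights are my own illustrative choices, not from the notes:

```python
import numpy as np

def odr_residuals(x, delta, t, y, w, d, model):
    """Stack the 2m ODR residuals: r_j = w_j (y_j - model(x, t_j + delta_j))
    for j = 1..m, followed by r_{m+j} = d_j delta_j for the perturbations."""
    top = w * (y - model(x, t + delta))
    bottom = d * delta
    return np.concatenate([top, bottom])

# Toy model phi(x; t) = x0 + x1 t, evaluated with zero perturbations.
model = lambda x, t: x[0] + x[1] * t
t = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 1.1, 1.9])
r = odr_residuals(np.array([0.0, 1.0]), np.zeros(3), t, y,
                  np.ones(3), np.ones(3), model)
```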
ODR as Least Squares, Exploiting Structure: Fortunately, we can save a lot of work by exploiting the structure of the Jacobian of the least squares problem originating from orthogonal distance regression: many entries are zero. In particular,
$$\frac{\partial r_j}{\partial \delta_i} = 0 \ \ (j \le m,\ i \ne j), \qquad \frac{\partial r_{m+j}}{\partial x_i} = 0, \qquad \frac{\partial r_{m+j}}{\partial \delta_i} = d_j \text{ if } i = j, \text{ else } 0.$$
Let
$$v_j = w_j\,\frac{\partial \phi(\bar{x};\,t_j + \delta_j)}{\partial t}, \qquad V = \mathrm{diag}(v_1, \ldots, v_m), \qquad D = \mathrm{diag}(d_1, \ldots, d_m);$$
then we can write the Jacobian of the residual function in matrix form:
$$J(\bar{x}, \bar{\delta}) = \begin{bmatrix} \hat{J} & V \\ 0 & D \end{bmatrix},$$
where $D$ and $V$ are $m \times m$ diagonal matrices and $\hat{J}$ is the $m \times n$ matrix defined by
$$\hat{J}_{ji} = w_j\,\frac{\partial \phi(\bar{x};\,t_j + \delta_j)}{\partial x_i}.$$
We can now use this matrix in, e.g., the Levenberg-Marquardt algorithm.

ODR as Least Squares, Levenberg-Marquardt (1 of 2): If we partition the step vector $\bar{p}$ and the residual vector $\bar{r}$ into
$$\bar{p} = \begin{bmatrix} \bar{p}_x \\ \bar{p}_\delta \end{bmatrix}, \qquad \bar{r} = \begin{bmatrix} \bar{r}_1 \\ \bar{r}_2 \end{bmatrix},$$
where $\bar{p}_x \in \mathbb{R}^n$, $\bar{p}_\delta \in \mathbb{R}^m$, and $\bar{r}_1, \bar{r}_2 \in \mathbb{R}^m$, then we can write the Levenberg-Marquardt subproblem in partitioned form:
$$\begin{bmatrix} \hat{J}^T\hat{J} + \lambda I_n & \hat{J}^T V \\ V\hat{J} & V^2 + D^2 + \lambda I_m \end{bmatrix} \begin{bmatrix} \bar{p}_x \\ \bar{p}_\delta \end{bmatrix} = -\begin{bmatrix} \hat{J}^T\bar{r}_1 \\ V\bar{r}_1 + D\bar{r}_2 \end{bmatrix}.$$
Since the (2,2) block $V^2 + D^2 + \lambda I_m$ is diagonal, we can eliminate the $\bar{p}_\delta$ variables from the system:

(2 of 2)
$$\bar{p}_\delta = -\left(V^2 + D^2 + \lambda I_m\right)^{-1}\left(V\bar{r}_1 + D\bar{r}_2 + V\hat{J}\,\bar{p}_x\right).$$
This leads to the $n \times n$ system $A\,\bar{p}_x = \bar{b}$, where
$$A = \hat{J}^T\hat{J} + \lambda I_n - \hat{J}^T V\left(V^2 + D^2 + \lambda I_m\right)^{-1} V\hat{J},$$
$$\bar{b} = -\hat{J}^T\bar{r}_1 + \hat{J}^T V\left(V^2 + D^2 + \lambda I_m\right)^{-1}\left(V\bar{r}_1 + D\bar{r}_2\right).$$
Hence the total cost of finding the LM step is only marginally more expensive than for the standard least squares problem.

ODR as LSQ ($2m \times (n+m)$) $\to$ Levenberg-Marquardt $\to$ LSQ ($m \times n$): The derived system may be ill-conditioned, since we have formed a modified version of the normal equations ($\hat{J}^T\hat{J}$ + stuff). With some work we can recast it as an $m \times n$ linear least squares problem
$$\bar{p}_x = \arg\min_{\bar{p}} \|\tilde{A}\bar{p} - \tilde{b}\|_2, \qquad \tilde{A} = \left[\hat{J}^T\right]^\dagger A, \qquad \tilde{b} = \left[\hat{J}^T\right]^\dagger\,\bar{b},$$
where the mystery factor $[\hat{J}^T]^\dagger$ is the pseudo-inverse of $\hat{J}^T$.
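The block elimination above can be sanity-checked numerically against a direct solve of the full $(n+m)\times(n+m)$ system. A minimal sketch; all matrices and data below are small illustrative choices of my own:

```python
import numpy as np

def odr_lm_step(Jhat, V, D, r1, r2, lam):
    """Solve the partitioned ODR Levenberg-Marquardt system by eliminating
    the delta-block, whose (2,2) block V^2 + D^2 + lam I is diagonal.
    V and D are the diagonals passed as 1-D arrays; returns (p_x, p_delta)."""
    S = V**2 + D**2 + lam                          # diagonal (2,2) block
    A = (Jhat.T @ Jhat + lam * np.eye(Jhat.shape[1])
         - Jhat.T @ ((V**2 / S)[:, None] * Jhat))
    b = -Jhat.T @ r1 + Jhat.T @ (V * (V * r1 + D * r2) / S)
    p_x = np.linalg.solve(A, b)
    p_delta = -(V * r1 + D * r2 + V * (Jhat @ p_x)) / S
    return p_x, p_delta

Jhat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([1.0, 2.0, 3.0])
D = np.array([1.0, 1.0, 1.0])
r1 = np.array([1.0, 2.0, 3.0])
r2 = np.array([0.5, 0.5, 0.5])
p_x, p_d = odr_lm_step(Jhat, V, D, r1, r2, 0.1)
```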
Expressed in terms of the QR factorization $\hat{J} = QR$, we have $\hat{J}^T = R^T Q^T$ and $[\hat{J}^T]^\dagger = QR^{-T}$, since $\hat{J}^T\,[\hat{J}^T]^\dagger = R^T Q^T Q R^{-T} = I$.

Software and References:
MINPACK: implements the Levenberg-Marquardt algorithm; available for free from http://www.netlib.org/minpack.
ODRPACK: implements the orthogonal distance regression algorithm; available for free from http://www.netlib.org/odrpack.
Other: the NAG (Numerical Algorithms Group) library and HSL (formerly the Harwell Subroutine Library) contain several robust nonlinear least squares implementations.
GvL: Golub and van Loan's Matrix Computations, 3rd edition, chapter 5, has a comprehensive discussion of orthogonalization and least squares, explaining in gory detail much of the linear algebra (e.g. the SVD and QR factorization) we swept under the rug.