# Intro Nonlinear Programming (IOE 519)

These 33 pages of class notes for IOE 519 (Industrial Engineering) at the University of Michigan, taught by Staff in Fall, were uploaded by Loy King on Thursday, October 29, 2015.


*IOE 519 NLP, Winter 2009*

## 8 Barrier Methods for Constrained Optimization

Consider the constrained optimization problem

$$\text{P}: \quad \min_x f(x) \quad \text{s.t.}\quad g_i(x) \le 0,\ i = 1,\dots,m, \quad h_i(x) = 0,\ i = 1,\dots,l,$$

whose feasible region we denote by $S = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1,\dots,m,\ h_i(x) = 0,\ i = 1,\dots,l\}$.

Barrier methods are designed to solve P by instead solving a sequence of specially constructed unconstrained optimization problems. In a barrier method, we presume that we are given a point $x^0$ that lies in the interior of the feasible region $S$, and we impose a very large cost on feasible points that lie ever closer to the boundary of $S$, thereby creating a "barrier" to exiting the feasible region.

In this subsection we will restrict our attention to instances of problem P that have inequality constraints only:

$$S = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1,\dots,m\}.$$

The algorithm we develop can be easily extended to problems that also have linear equality constraints, in the spirit of the previous section. Barrier methods can also be applied to problems with general equality constraints; however, an appropriate problem transformation is required, and different implementations of barrier methods deal with this issue differently.

A *barrier function* for P is any continuous function $b(x)$ defined on the interior of the feasible set $S$ such that $b(x) \to \infty$ as $\max_{i=1,\dots,m} g_i(x) \to 0^-$. Two common examples of barrier functions are

$$b(x) = -\sum_{i=1}^m \ln(-g_i(x)) \qquad \text{and} \qquad b(x) = -\sum_{i=1}^m \frac{1}{g_i(x)}.$$

The idea in a barrier method is to dissuade points $x$ from ever approaching the boundary of the feasible region. We consider solving

$$\text{B}(\theta): \quad \min f(x) + \theta\, b(x) \quad \text{s.t.}\quad g(x) < 0,\ x \in \mathbb{R}^n,$$

for a sequence of $\theta_k > 0$ such that $\theta_k \to 0$. Note that the constraints $g(x) < 0$ are effectively unimportant in B($\theta$), as they are never binding. (Other barrier functions also qualify, e.g., $b(x) = -1/\max_i g_i(x)$.)

Let $r(\theta, x) = f(x) + \theta\, b(x)$. Let the sequence $\{\theta_k\}$ satisfy $\theta_{k+1} < \theta_k$ and $\theta_k \to 0$ as $k \to \infty$, and let $x^k$ denote the exact solution to B($\theta_k$). Let $N_\delta(\bar{x})$ denote the ball of radius $\delta$ centered at the point $\bar{x}$. The next result concerns convergence of the barrier method.

**Theorem 8.1 (Barrier Convergence Theorem).** Suppose $f(x)$, $g(x)$, and $b(x)$ are continuous functions
and suppose B($\theta$) has a solution for any $\theta > 0$. Let $\{x^k\}$, $k = 1,\dots,\infty$, be a sequence of solutions of B($\theta_k$). Suppose there exists an optimal solution $x^*$ of P for which $N_\delta(x^*) \cap \{x : g(x) < 0\} \ne \emptyset$ for every $\delta > 0$. Then any limit point $\bar{x}$ of $\{x^k\}$ solves P.

**Proof.** Let $\bar{x}$ be any limit point of the sequence $\{x^k\}$. Without loss of generality, assume that $\bar{x}$ is in fact the limit of the sequence. From the continuity of $f(x)$ and $g(x)$, $\lim_{k\to\infty} f(x^k) = f(\bar{x})$ and $\lim_{k\to\infty} g(x^k) = g(\bar{x}) \le 0$. Thus $\bar{x}$ is feasible for P. If $\bar{x}$ is in the interior of $S$, then $\lim_{k\to\infty} \theta_k b(x^k) = 0$. If $\bar{x}$ lies on the boundary of $S$, then by our assumption $\lim_{k\to\infty} b(x^k) = \infty$. In either case,

$$\liminf_{k\to\infty} \theta_k b(x^k) \ge 0,$$

which implies that

$$\liminf_{k\to\infty}\left[f(x^k) + \theta_k b(x^k)\right] = f(\bar{x}) + \liminf_{k\to\infty} \theta_k b(x^k) \ge f(\bar{x}).$$

If $\bar{x}$ were not a global minimizer, there would exist a feasible vector $\hat{x}$ such that $f(\hat{x}) < f(\bar{x})$ that can be approached arbitrarily closely through the interior of the feasible set. That is, there would exist a point $\tilde{x}$ in the domain of the barrier function such that $f(\tilde{x}) < f(\bar{x})$. By the definition of $x^k$,

$$f(x^k) + \theta_k b(x^k) \le f(\tilde{x}) + \theta_k b(\tilde{x}) \quad \text{for all } k,$$

which, by taking the limit as $k \to \infty$, implies that $f(\bar{x}) \le f(\tilde{x})$. This is a contradiction, therefore proving that $\bar{x}$ is a global minimizer of P. $\blacksquare$

Advantages of barrier methods:

- They avoid the combinatorial issues associated with finding the active constraints.
- They converge under mild conditions.
- They provide estimates of the multipliers at the optimum (see below).
- For many barrier functions, including the two examples above, the barrier problem is convex if P is a convex problem.

Barrier problems are typically solved via Newton's method, which converges rapidly if initialized close to the solution of the barrier problem. Since, for sufficiently small $\theta$, the solution of B($\theta$) is sufficiently close to the optimal solution of P, one may wonder why one should bother with solving a *sequence* of barrier problems instead of solving just one with a low value of $\theta$. However, the Hessian $\nabla^2_{xx} b$ of the barrier function tends to become ill-conditioned when $\theta$ is very small, making the problem difficult to solve. Solving a sequence of barrier problems provides a remedy: when the solution $x^{k-1}$ of the previous barrier problem is used as an initial point for solving the current barrier problem, the problem tends to be easier to solve if $\theta_k$ is not much smaller than $\theta_{k-1}$, since then, under reasonable assumptions, $x^{k-1}$ is close to $x^k$.
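As a concrete illustration of the scheme above (this sketch is not from the notes: the test problem, tolerances, and $\theta$-update factor are all illustrative choices), here is the logarithmic-barrier method applied to the one-dimensional problem $\min (x-2)^2$ s.t. $x - 1 \le 0$, whose optimum is $x^* = 1$ with multiplier $u^* = 2$. Each inner problem is convex in one variable, so it is solved by bisecting on the derivative of $r(\theta, x)$:

```python
def barrier_method(theta0=1.0, shrink=0.1, iters=8):
    """Log-barrier method for: minimize (x-2)^2  s.t.  g(x) = x - 1 <= 0.
    (Illustrative problem; the true optimum is x* = 1 with multiplier u* = 2.)"""
    x = 0.0                           # strictly feasible start: g(0) = -1 < 0
    theta = theta0
    for _ in range(iters):
        # Inner problem B(theta): minimize r(x) = (x-2)^2 - theta*ln(1-x).
        # r is convex on x < 1, so bisect on r'(x) = 2(x-2) + theta/(1-x).
        rp = lambda z: 2.0 * (z - 2.0) + theta / (1.0 - z)
        lo, hi = -10.0, 1.0 - 1e-12   # r'(lo) < 0 < r'(hi) for these thetas
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if rp(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        x = 0.5 * (lo + hi)
        # Multiplier estimate u^k = theta_k * gamma'(g(x^k)) = theta_k / (1 - x^k)
        # for the log barrier gamma(y) = -ln(-y).
        u = theta / (1.0 - x)
        theta *= shrink               # theta_{k+1} < theta_k, theta_k -> 0
    return x, u
```

Running it, `x` approaches the boundary point $1$ from the interior while the multiplier estimate `u` approaches $2$, in line with Lemma 8.2 below.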
Suppose, more generally, that

$$b(x) = \sum_{i=1}^m \gamma(g_i(x)),$$

where $\gamma : \mathbb{R} \to \mathbb{R}$, and assume that $\gamma(y)$ is continuously differentiable for all $y < 0$. Then

$$\nabla b(x) = \sum_{i=1}^m \gamma'(g_i(x)) \nabla g_i(x),$$

and if $x^k$ solves B($\theta_k$), then $\nabla f(x^k) + \theta_k \nabla b(x^k) = 0$; that is,

$$\nabla f(x^k) + \theta_k \sum_{i=1}^m \gamma'(g_i(x^k)) \nabla g_i(x^k) = 0. \tag{7}$$

Let us define

$$u_i^k = \theta_k\, \gamma'(g_i(x^k)). \tag{8}$$

Then (7) becomes

$$\nabla f(x^k) + \sum_{i=1}^m u_i^k \nabla g_i(x^k) = 0. \tag{9}$$

Therefore we can interpret the $u^k$ as a sort of vector of Karush-Kuhn-Tucker multipliers. In fact, we have:

**Lemma 8.2 (Karush-Kuhn-Tucker Multipliers in Barrier Methods).** Let P satisfy the conditions of the Barrier Convergence Theorem. Suppose $\gamma(y)$ is continuously differentiable, and let $u^k$ be defined by (8). Then if $x^k \to \bar{x}$, and $\bar{x}$ satisfies the linear independence condition for the gradient vectors of the active constraints, then $u^k \to \bar{u}$, where $\bar{u}$ is a vector of Karush-Kuhn-Tucker multipliers for the optimal solution $\bar{x}$ of P.

**Proof.** Let $x^k \to \bar{x}$, and let $I = \{i : g_i(\bar{x}) = 0\}$ and $N = \{i : g_i(\bar{x}) < 0\}$. For all $i \in N$, $u_i^k = \theta_k \gamma'(g_i(x^k)) \to 0$ as $k \to \infty$, since $\theta_k \to 0$ and $g_i(\bar{x}) < 0$, and $\gamma'(g_i(\bar{x}))$ is finite since $\gamma(\cdot)$ is continuously differentiable. Also, $u_i^k \ge 0$ for all $i$ and all $k$ sufficiently large. Suppose $u^k \to \bar{u}$ as $k \to \infty$. Then $\bar{u} \ge 0$ and $\bar{u}_i = 0$ for all $i \in N$. From the continuity of all functions involved, (9) implies that

$$\nabla f(\bar{x}) + \sum_{i=1}^m \bar{u}_i \nabla g_i(\bar{x}) = 0.$$

Thus $\bar{u}$ is a vector of Karush-Kuhn-Tucker multipliers. It remains to show that $u^k \to \bar{u}$ for some unique $\bar{u}$. Suppose $\{u^k\}$ has no accumulation point. Then $\|u^k\| \to \infty$. Define $\hat{u}^k = u^k / \|u^k\|$; then $\|\hat{u}^k\| = 1$ for all $k$, and so the sequence $\{\hat{u}^k\}$ has some accumulation point $\hat{u}$ satisfying $\|\hat{u}\| = 1$. For all $i \in N$, $\hat{u}_i^k \to 0$ for $k$ large, whereby $\hat{u}_i = 0$ for $i \in N$. Dividing (9) by $\|u^k\|$,

$$\frac{\nabla f(x^k)}{\|u^k\|} + \sum_{i=1}^m \hat{u}_i^k \nabla g_i(x^k) = 0,$$

and taking limits gives $\sum_{i \in I} \hat{u}_i \nabla g_i(\bar{x}) = 0$ with $\hat{u} \ne 0$, contradicting the linear independence condition. Hence $\{u^k\}$ is bounded, and by the linear independence condition the limiting system (9) determines its limit $\bar{u}$ uniquely, so $u^k \to \bar{u}$. $\blacksquare$

## 5 Overview of algorithms for unconstrained optimization

### 5.1 General optimization algorithm

Recall that we are attempting to solve the problem

$$\text{P}: \quad \min f(x) \quad \text{s.t.}\quad x \in \mathbb{R}^n,$$

where $f(x)$ is differentiable. Solutions to optimization problems are almost always impossible to
obtain directly or in closed form, with a few exceptions. Hence, for the most part, we will solve these problems with iterative algorithms. These algorithms typically require the user to supply a starting point $x^0$. Beginning at $x^0$, an iterative algorithm will generate a sequence of points $\{x^k\}_{k=0}^\infty$, called *iterates*. In deciding how to generate the next iterate $x^{k+1}$, the algorithms use information about the function $f$ at the current iterate $x^k$, and sometimes past iterates $x^0,\dots,x^{k-1}$. In practice, rather than constructing an infinite sequence of iterates, algorithms stop when an appropriate termination criterion is satisfied, indicating either that the problem has been solved within a desired accuracy, or that no further progress can be made.

Most algorithms for unconstrained optimization we will discuss fall into the category of directional search algorithms.

**General directional search optimization algorithm**

- **Initialization:** Specify an initial guess of the solution, $x^0$.
- **Iteration:** For $k = 0, 1, \dots$: if $x^k$ is optimal, stop. Otherwise:
  - determine $d^k$, a search direction;
  - determine $\alpha_k > 0$, a step size;
  - determine $x^{k+1} = x^k + \alpha_k d^k$, a new estimate of the solution.

#### 5.1.1 Choosing the direction

Typically, we require that $d^k$ is a descent direction of $f$ at $x^k$, that is,

$$f(x^k + \alpha d^k) < f(x^k) \quad \forall \alpha \in (0, \epsilon]$$

for some $\epsilon > 0$. For the case when $f$ is differentiable, we have shown in Theorem 4.1 that any $d^k$ such that $\nabla f(x^k)^T d^k < 0$ is a descent direction whenever $\nabla f(x^k) \ne 0$. Often, the direction is chosen to be of the form

$$d^k = -D^k \nabla f(x^k),$$

where $D^k$ is a positive definite symmetric matrix. (Why is it important that $D^k$ is positive definite?)

The following are the two basic methods for choosing the matrix $D^k$ at each iteration; they give rise to the two classic algorithms for unconstrained optimization we are going to discuss in class:

- **Steepest descent:** $D^k = I$, $k = 0, 1, 2, \dots$
- **Newton's method:** $D^k = [H(x^k)]^{-1}$, provided $H(x^k)$ is positive definite.

#### 5.1.2 Choosing the stepsize

After $d^k$ is fixed, $\alpha_k$ ideally would solve the one-dimensional optimization problem

$$\min_{\alpha \ge 0} f(x^k + \alpha d^k).$$

This
optimization problem is usually also impossible to solve exactly. Instead, $\alpha_k$ is computed via an iterative procedure referred to as a *line search*, either to approximately solve the above optimization problem or to ensure a sufficient decrease in the value of $f$.

#### 5.1.3 Testing for optimality

Based on the optimality conditions, $x^k$ is locally optimal if $\nabla f(x^k) = 0$ and $H(x^k)$ is positive definite. However, such a point is unlikely to be found exactly. In fact, most of the analysis of the algorithms in the above form deals with their limiting behavior, i.e., analyzes the limit points of the infinite sequence of iterates generated by the algorithm. Thus, to implement the algorithm in practice, more realistic termination criteria need to be implemented. They often hinge, at least in part, on approximately satisfying, to a certain tolerance, the first-order necessary condition for optimality discussed in the previous section.

We begin by commenting on how the line search can be implemented in practice, and then discuss methods for choosing $d^k$ in more detail.

### 5.2 Stepsize selection: one-dimensional optimization

In this subsection we address practical methods of implementing the line search called for in the general optimization algorithm. Essentially, we are interested in methods for solving one-dimensional optimization problems. There are many methods for solving such problems; we are going to describe two: the bisection method and the Armijo rule.

#### 5.2.1 A bisection algorithm for a line search of a convex function

Suppose that $f(x)$ is a continuously differentiable convex function and that we seek to solve

$$\bar{\alpha} = \arg\min_{\alpha > 0} f(\bar{x} + \alpha \bar{d}),$$

where $\bar{x}$ is our current iterate and $\bar{d}$ is the current direction generated by an algorithm that seeks to minimize $f(x)$. We assume that $\bar{d}$ is a descent direction, i.e., $\nabla f(\bar{x})^T \bar{d} < 0$. Let

$$F(\alpha) = f(\bar{x} + \alpha \bar{d}),$$

whereby $F(\alpha)$ is a convex function in the scalar variable $\alpha$, and our problem is to solve for

$$\bar{\alpha} = \arg\min_{\alpha > 0} F(\alpha).$$

Applying the necessary and sufficient optimality condition to the convex function $F(\alpha)$, we want to find a value $\bar{\alpha}$ for which $F'(\bar{\alpha}) = 0$. It is elementary to show
that $F'(\alpha) = \nabla f(\bar{x} + \alpha \bar{d})^T \bar{d}$. Therefore, since $\bar{d}$ is a descent direction, $F'(0) < 0$. Also, since $F(\alpha)$ is a convex function of $\alpha$, $F'(\alpha)$ is a monotone nondecreasing function of $\alpha$. Suppose that we know a value $\hat{\alpha} > 0$ such that $F'(\hat{\alpha}) > 0$. Consider the following bisection algorithm for solving the equation $F'(\alpha) = 0$:

- **Step 0.** Set $k = 0$. Set $\alpha_l := 0$ and $\alpha_u := \hat{\alpha}$.
- **Step k.** Set $\tilde{\alpha} = \frac{\alpha_u + \alpha_l}{2}$ and compute $F'(\tilde{\alpha})$.
  - If $F'(\tilde{\alpha}) > 0$, reset $\alpha_u := \tilde{\alpha}$. Set $k \leftarrow k + 1$.
  - If $F'(\tilde{\alpha}) < 0$, reset $\alpha_l := \tilde{\alpha}$. Set $k \leftarrow k + 1$.
  - If $F'(\tilde{\alpha}) = 0$, stop.

Below are some observations on which the convergence of the algorithm rests:

- After every iteration of the bisection algorithm, the current interval $[\alpha_l, \alpha_u]$ must contain a point $\bar{\alpha}$ such that $F'(\bar{\alpha}) = 0$.
- At the $k$th iteration of the bisection algorithm, the length of the current interval $[\alpha_l, \alpha_u]$ is $L = \left(\tfrac{1}{2}\right)^k \hat{\alpha}$.
- A value of $\alpha$ such that $|\alpha - \bar{\alpha}| \le \epsilon$ can be found in at most $\left\lceil \log_2 \left(\hat{\alpha}/\epsilon\right) \right\rceil$ steps of the bisection algorithm.

Suppose that we do not have available a convenient value $\hat{\alpha}$ for which $F'(\hat{\alpha}) > 0$. One way to proceed is to pick an initial guess of $\hat{\alpha}$ and compute $F'(\hat{\alpha})$. If $F'(\hat{\alpha}) > 0$, then proceed to the bisection algorithm; if $F'(\hat{\alpha}) \le 0$, then reset $\hat{\alpha} \leftarrow 2\hat{\alpha}$ and repeat the process.

In practice, we need to run the bisection algorithm with a stopping criterion. Some relevant stopping criteria are:

- Stop after a fixed number of iterations. That is, stop when $k = \bar{K}$, where $\bar{K}$ is specified by the user.
- Stop when the interval becomes small. That is, stop when $\alpha_u - \alpha_l \le \epsilon$, where $\epsilon$ is specified by the user.
- Stop when $|F'(\tilde{\alpha})|$ becomes sufficiently small. That is, stop when $|F'(\tilde{\alpha})| \le \epsilon$, where $\epsilon$ can be pre-specified by the user, or determined by the local properties of the function $F(\alpha)$. This criterion typically works best in practice.

Finally, our discussion so far assumed that the domain of the optimization problem was the entire $\mathbb{R}^n$. If our optimization problem is

$$\text{P}: \quad \min f(x) \quad \text{s.t.}\quad x \in X,$$

where $X$ is an open set, then the line search problem is

$$\min_\alpha f(\bar{x} + \alpha \bar{d}) \quad \text{s.t.}\quad \bar{x} + \alpha \bar{d} \in X.$$

In this case, we must ensure that all iterate values of $\alpha$ in the bisection algorithm satisfy $\bar{x} + \alpha \bar{d} \in X$.
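The bisection algorithm just described, together with the doubling scheme for finding an initial $\hat{\alpha}$ with $F'(\hat{\alpha}) > 0$, can be sketched as follows (function and parameter names are mine, and the interval-width stopping criterion is one of the several options listed above):

```python
def bisection_line_search(f_grad, x, d, tol=1e-8, max_iter=100):
    """Bisection line search for a convex f along a descent direction d.

    f_grad(x) returns the gradient of f at x (a list of floats).
    Solves F'(alpha) = 0, where F(alpha) = f(x + alpha*d) and
    F'(alpha) = grad f(x + alpha*d)^T d.
    """
    def Fp(alpha):
        g = f_grad([xi + alpha * di for xi, di in zip(x, d)])
        return sum(gi * di for gi, di in zip(g, d))

    assert Fp(0.0) < 0.0, "d must be a descent direction"
    ahat = 1.0
    for _ in range(60):               # doubling scheme to bracket the root
        if Fp(ahat) > 0.0:
            break
        ahat *= 2.0
    lo, hi = 0.0, ahat
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        s = Fp(mid)
        if s > 0.0:
            hi = mid
        elif s < 0.0:
            lo = mid
        else:
            return mid
        if hi - lo <= tol:            # stop when the bracket is small
            break
    return 0.5 * (lo + hi)
```

For instance, for $f(x) = x_1^2 + 4x_2^2$ at $\bar{x} = (2, 1)$ with $\bar{d} = -\nabla f(\bar{x}) = (-4, -8)$, the exact minimizing stepsize is $\bar{\alpha} = 80/544$, which the routine recovers to the requested tolerance.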
As an example, consider the following problem:

$$\text{P}: \quad \min f(x) = -\sum_{i=1}^m \ln\left(b_i - a_i^T x\right) \quad \text{s.t.}\quad b - Ax > 0.$$

Here the domain of $f(x)$ is $X = \{x \in \mathbb{R}^n : b - Ax > 0\}$. Given a point $\bar{x} \in X$ and a direction $\bar{d}$, the line search problem is

$$\text{LS}: \quad \min_\alpha h(\alpha) = -\sum_{i=1}^m \ln\left(b_i - a_i^T(\bar{x} + \alpha \bar{d})\right) \quad \text{s.t.}\quad b - A(\bar{x} + \alpha \bar{d}) > 0.$$

Standard arithmetic manipulation can be used to establish that

$$b - A(\bar{x} + \alpha \bar{d}) > 0 \quad \text{if and only if} \quad \check{\alpha} < \alpha < \hat{\alpha},$$

where

$$\check{\alpha} := -\min_{i:\, a_i^T \bar{d} < 0} \frac{b_i - a_i^T \bar{x}}{-a_i^T \bar{d}} \qquad \text{and} \qquad \hat{\alpha} := \min_{i:\, a_i^T \bar{d} > 0} \frac{b_i - a_i^T \bar{x}}{a_i^T \bar{d}},$$

and the line search problem then is

$$\text{LS}: \quad \text{minimize } h(\alpha) = -\sum_{i=1}^m \ln\left(b_i - a_i^T(\bar{x} + \alpha \bar{d})\right) \quad \text{s.t.}\quad \check{\alpha} < \alpha < \hat{\alpha}.$$

#### 5.2.2 Armijo rule, or backtracking

Very often in practice, performing an exact line search by a method such as the bisection method is too expensive computationally if we require good precision. Recall that we need to perform a line search at every iteration of our steepest descent optimization algorithm. On the other hand, sacrificing accuracy of the line search can cause inferior performance of the overall algorithm. The Armijo rule is one of several *inexact* line search methods which guarantees a sufficient degree of accuracy to ensure the algorithm's convergence.

The Armijo rule requires two parameters: $0 < \mu < 0.5$ and $0 < \beta < 1$. Suppose we are minimizing a function $F(\alpha)$ such that $F'(0) < 0$, which is indeed the case for the line search problems arising in descent algorithms. The first-order approximation of $F(\alpha)$ at $\alpha = 0$ is given by $F(0) + \alpha F'(0)$. Define

$$\hat{F}(\alpha) = F(0) + \mu\, \alpha F'(0)$$

(see figure). A stepsize $\bar{\alpha}$ is considered acceptable by the Armijo rule only if $F(\bar{\alpha}) \le \hat{F}(\bar{\alpha})$, that is, if taking a step of size $\bar{\alpha}$ guarantees a sufficient decrease of the function. (The parameter $\beta$ generates the candidate stepsizes: one tries $\bar{\alpha}, \beta\bar{\alpha}, \beta^2\bar{\alpha}, \dots$ and accepts the first candidate satisfying the condition, hence the name "backtracking.")

## 9 Penalty Methods for Constrained Optimization

Consider the constrained optimization problem

$$\text{P}: \quad \min_x f(x) \quad \text{s.t.}\quad g_i(x) \le 0,\ i = 1,\dots,m,\quad x \in \mathbb{R}^n.$$

We will write $g(x) = (g_1(x),\dots,g_m(x))^T$ for convenience.

**Definition.** A function $p(x): \mathbb{R}^n \to \mathbb{R}$ is called a *penalty function* for P if $p(x)$ satisfies:

- $p(x) = 0$ if $g(x) \le 0$, and
- $p(x) > 0$ if $g(x) \not\le 0$.

**Example:** $p(x) = \sum_{i=1}^m \left(\max\{0, g_i(x)\}\right)^2$.
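Stepping back to §5.2.2 for a moment, the Armijo backtracking rule can be sketched in a few lines (the names and the default values of $\mu$, $\beta$, and the initial trial stepsize are illustrative choices, which the notes leave open):

```python
def armijo_step(F, Fp0, mu=1e-4, beta=0.5, alpha0=1.0, max_backtracks=50):
    """Armijo (backtracking) rule: return the first alpha in
    {alpha0, beta*alpha0, beta^2*alpha0, ...} satisfying
        F(alpha) <= F(0) + mu * alpha * F'(0),
    where F(alpha) = f(x + alpha*d) and F'(0) < 0 for a descent direction d."""
    assert Fp0 < 0.0, "need a descent direction (F'(0) < 0)"
    F0 = F(0.0)
    alpha = alpha0
    for _ in range(max_backtracks):
        if F(alpha) <= F0 + mu * alpha * Fp0:   # sufficient decrease
            return alpha
        alpha *= beta                           # shrink and try again
    return alpha
```

For example, with $F(\alpha) = (1 - 2\alpha)^2$ (i.e., $f(x) = x^2$, $\bar{x} = 1$, $\bar{d} = -2$, so $F'(0) = -4$), the full step $\alpha = 1$ is rejected and one backtrack yields an acceptable $\alpha = 0.5$.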
We then consider solving the penalty program

$$\text{P}(c): \quad \min_x f(x) + c\, p(x) \quad \text{s.t.}\quad x \in \mathbb{R}^n$$

for an increasing sequence of constants $c$, as $c \to \infty$. Note that in the program P($c$) we are assigning a penalty to the violated constraints. The scalar quantity $c$ is called the *penalty parameter*.

Let $\{c_k\}$, $k = 1,\dots,\infty$, be a sequence of penalty parameters that satisfies $c_{k+1} > c_k$ for all $k$ and $\lim_{k\to\infty} c_k = \infty$. Let $q(c, x) = f(x) + c\, p(x)$, let $x^k$ be the exact solution to the program P($c_k$), and let $x^*$ denote any optimal solution of P. The following lemma presents some basic properties of penalty methods.

**Lemma 9.1 (Penalty Lemma).**

1. $q(c_k, x^k) \le q(c_{k+1}, x^{k+1})$
2. $p(x^k) \ge p(x^{k+1})$
3. $f(x^k) \le f(x^{k+1})$
4. $f(x^*) \ge q(c_k, x^k) \ge f(x^k)$

**Proof.** (i) We have

$$q(c_{k+1}, x^{k+1}) = f(x^{k+1}) + c_{k+1}\, p(x^{k+1}) \ge f(x^{k+1}) + c_k\, p(x^{k+1}) \ge f(x^k) + c_k\, p(x^k) = q(c_k, x^k).$$

(ii) $f(x^k) + c_k\, p(x^k) \le f(x^{k+1}) + c_k\, p(x^{k+1})$ and $f(x^{k+1}) + c_{k+1}\, p(x^{k+1}) \le f(x^k) + c_{k+1}\, p(x^k)$. Thus

$$(c_{k+1} - c_k)\, p(x^k) \ge (c_{k+1} - c_k)\, p(x^{k+1}),$$

whereby $p(x^k) \ge p(x^{k+1})$.

(iii) We have $f(x^{k+1}) + c_k\, p(x^{k+1}) \ge f(x^k) + c_k\, p(x^k)$. But $p(x^k) \ge p(x^{k+1})$ from (ii), which implies that $f(x^{k+1}) \ge f(x^k)$.

(iv) Since $p(x^*) = 0$, $f(x^*) = f(x^*) + c_k\, p(x^*) \ge f(x^k) + c_k\, p(x^k) \ge f(x^k)$. $\blacksquare$

The next result concerns convergence of the penalty method.

**Theorem 9.2 (Penalty Convergence Theorem).** Suppose that $f(x)$, $g(x)$, and $p(x)$ are continuous functions. Let $\{x^k\}$, $k = 1,\dots,\infty$, be a sequence of solutions to P($c_k$). Then any limit point $\bar{x}$ of $\{x^k\}$ solves P.

**Proof.** Let $\bar{x}$ be a limit point of $\{x^k\}$. From the continuity of the functions involved, $\lim_{k\to\infty} f(x^k) = f(\bar{x})$. Also, from the Penalty Lemma,

$$q^* := \lim_{k\to\infty} q(c_k, x^k) \le f(x^*).$$

Thus

$$\lim_{k\to\infty} c_k\, p(x^k) = \lim_{k\to\infty} \left[q(c_k, x^k) - f(x^k)\right] = q^* - f(\bar{x}).$$

But $c_k \to \infty$, which implies from the above that

$$\lim_{k\to\infty} p(x^k) = 0.$$

Therefore, from the continuity of $p(x)$ and $g(x)$, $p(\bar{x}) = 0$, and so $g(\bar{x}) \le 0$; that is, $\bar{x}$ is a feasible solution of P. Finally, from the Penalty Lemma, $f(x^k) \le f(x^*)$ for all $k$, and so $f(\bar{x}) \le f(x^*)$, which implies that $f(\bar{x}) = f(x^*)$, and therefore $\bar{x}$ is an optimal solution of P. $\blacksquare$

An often-used class of penalty functions is

$$p(x) = \sum_{i=1}^m \left(\max\{0, g_i(x)\}\right)^q, \quad \text{where } q \ge 1. \tag{10}$$

We note the following:

- If $q = 1$, $p(x)$ in (10) is called the *linear penalty function*. This function may not be differentiable at points where $g_i(x) = 0$ for some $i$.
- Setting $q = 2$ is the most common form of (10) that is used in practice, and is called the *quadratic penalty function*.
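As an illustration of the quadratic penalty function and the convergence theorem (this example is mine, not from the notes), consider $\min x^2$ s.t. $g(x) = 1 - x \le 0$, whose optimum is $x^* = 1$ with multiplier $u^* = 2$. Each $q(c_k, \cdot)$ here is convex and differentiable, so the inner problem is solved by bisecting on its derivative:

```python
def quadratic_penalty(cs=(1.0, 10.0, 100.0, 1e4, 1e6)):
    """Quadratic-penalty method for: minimize x^2  s.t.  g(x) = 1 - x <= 0.
    (Illustrative problem; the optimum is x* = 1 with multiplier u* = 2.)
    q(c, x) = x^2 + c * max(0, 1 - x)^2 is convex and differentiable."""
    def qp(c, z):                 # derivative d/dx of q(c, x); increasing in x
        return 2.0 * z - 2.0 * c * max(0.0, 1.0 - z)
    for c in cs:                  # increasing penalty parameters c_k -> infinity
        lo, hi = -1.0, 2.0        # qp(c, lo) < 0 < qp(c, hi)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if qp(c, mid) < 0.0:
                lo = mid
            else:
                hi = mid
        x = 0.5 * (lo + hi)
        # Multiplier estimate u^k = c_k * 2 * max(0, g(x^k)) for gamma(y) = y^2.
        u = 2.0 * c * max(0.0, 1.0 - x)
    return x, u
```

Note that each minimizer $x^k = c_k/(1 + c_k)$ is *infeasible*, approaching the optimum from outside the feasible region as $c_k \to \infty$, while the multiplier estimate approaches $u^* = 2$.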
Notice that the penalty function $p(x)$ is in actuality a function only of $g^+(x)$, where $g_i^+(x) = \max\{0, g_i(x)\}$ is the nonnegative part of $g_i(x)$, $i = 1,\dots,m$. Then we can write $p(x) = \gamma(g^+(x))$, where $\gamma(y)$ is a function of $y \in \mathbb{R}^m_+$. Two examples of this type of penalty function are

$$\gamma(y) = \sum_{i=1}^m y_i,$$

which corresponds to the linear penalty function, and

$$\gamma(y) = \sum_{i=1}^m y_i^2,$$

which corresponds to the quadratic penalty function.

Note that even if $\gamma(y)$ is continuously differentiable, $p(x)$ might not be continuously differentiable, since $g_i^+(x)$ is not differentiable at points $x$ where $g_i(x) = 0$ for some $i$. However, if we assume that

$$\frac{\partial \gamma(y)}{\partial y_i} = 0 \quad \text{whenever } y_i = 0,\ i = 1,\dots,m, \tag{11}$$

then $p(x)$ is differentiable whenever the functions $g_i(x)$, $i = 1,\dots,m$, are differentiable, and we can write

$$\nabla p(x) = \sum_{i=1}^m \frac{\partial \gamma(g^+(x))}{\partial y_i} \nabla g_i(x). \tag{12}$$

Now let $x^k$ solve P($c_k$). Then $x^k$ satisfies $\nabla f(x^k) + c_k \nabla p(x^k) = 0$, that is,

$$\nabla f(x^k) + c_k \sum_{i=1}^m \frac{\partial \gamma(g^+(x^k))}{\partial y_i} \nabla g_i(x^k) = 0.$$

Let us define

$$u_i^k = c_k\, \frac{\partial \gamma(g^+(x^k))}{\partial y_i}. \tag{13}$$

Then

$$\nabla f(x^k) + \sum_{i=1}^m u_i^k \nabla g_i(x^k) = 0,$$

and so we can interpret the $u^k$ as a sort of vector of Karush-Kuhn-Tucker multipliers. In fact, we have:

**Lemma 9.3.** Suppose $\gamma(y)$ is continuously differentiable and satisfies (11), and that $f(x)$ and $g(x)$ are differentiable. Let $u^k$ be defined by (13). Then if $x^k \to \bar{x}$, and $\bar{x}$ satisfies the linear independence condition for gradient vectors of active constraints, then $u^k \to \bar{u}$, where $\bar{u}$ is a vector of Karush-Kuhn-Tucker multipliers for the optimal solution $\bar{x}$ of P.

**Proof.** From the Penalty Convergence Theorem, $\bar{x}$ is an optimal solution of P. Let $I = \{i : g_i(\bar{x}) = 0\}$ and $N = \{i : g_i(\bar{x}) < 0\}$. For $i \in N$, $g_i(x^k) < 0$ for all $k$ sufficiently large, so $g_i^+(x^k) = 0$ for all $k$ sufficiently large, whereby, using (11), $u_i^k = 0$ for $i \in N$. From (13) and the definition of a penalty function, it follows that $u_i^k \ge 0$ for $i \in I$ for all $k$ sufficiently large. Suppose $u^k \to \bar{u}$ as $k \to \infty$. Then $\bar{u}_i = 0$ for all $i \in N$. From the continuity of all functions involved,

$$\nabla f(x^k) + \sum_{i=1}^m u_i^k \nabla g_i(x^k) = 0$$

implies

$$\nabla f(\bar{x}) + \sum_{i=1}^m \bar{u}_i \nabla g_i(\bar{x}) = 0.$$

From the above remarks, we also have $\bar{u} \ge 0$ and $\bar{u}_i = 0$ for all $i \in N$. Thus $\bar{u}$ is a vector of Karush-Kuhn-Tucker multipliers. It therefore remains to show that $u^k \to \bar{u}$ for some unique $\bar{u}$, which is done along the same lines as in Lemma 8.2. $\blacksquare$

**Remark.** The quadratic penalty function satisfies condition (11), but
the linear penalty function does not satisfy (11).

### 9.1 Exact Penalty Methods

The idea in an exact penalty method is to choose a penalty function $p(x)$ and a constant $c$ so that the optimal solution $\bar{x}$ of P($c$) is also an optimal solution of the original problem P.

**Theorem 9.4.** Suppose P is a convex program for which the Karush-Kuhn-Tucker conditions are necessary. Suppose that

$$p(x) = \sum_{i=1}^m g_i^+(x).$$

Then, as long as $c$ is chosen sufficiently large, the sets of optimal solutions of P($c$) and P coincide. In fact, it suffices to choose $c > \max_i \{u_i^*\}$, where $u^*$ is a vector of Karush-Kuhn-Tucker multipliers.

**Proof.** Suppose $\bar{x}$ solves P, with Karush-Kuhn-Tucker multipliers $u^*$. For any $x \in \mathbb{R}^n$, by the convexity of $f$ and the $g_i$ and the gradient inequality,

$$f(x) + \sum_{i=1}^m u_i^* g_i(x) \ge f(\bar{x}) + \left(\nabla f(\bar{x}) + \sum_{i=1}^m u_i^* \nabla g_i(\bar{x})\right)^T (x - \bar{x}) + \sum_{i=1}^m u_i^* g_i(\bar{x}) = f(\bar{x}),$$

using the Karush-Kuhn-Tucker conditions (stationarity and complementary slackness). Therefore, since $c > \max_i\{u_i^*\}$ and $g_i(x) \le g_i^+(x)$,

$$q(c, x) = f(x) + c \sum_{i=1}^m g_i^+(x) \ge f(x) + \sum_{i=1}^m u_i^* g_i^+(x) \ge f(x) + \sum_{i=1}^m u_i^* g_i(x) \ge f(\bar{x}) = q(c, \bar{x}).$$

Thus $q(c, \bar{x}) \le q(c, x)$ for all $x$, and therefore $\bar{x}$ solves P($c$).

Next, suppose that $\hat{x}$ solves P($c$). Then if $\bar{x}$ solves P, we have

$$f(\hat{x}) + c \sum_{i=1}^m g_i^+(\hat{x}) = q(c, \hat{x}) \le q(c, \bar{x}) = f(\bar{x}),$$

and so

$$f(\hat{x}) \le f(\bar{x}) - c \sum_{i=1}^m g_i^+(\hat{x}). \tag{14}$$

However, if $\hat{x}$ is not feasible for P, then $\sum_i g_i^+(\hat{x}) > 0$, and by the gradient inequality and the Karush-Kuhn-Tucker conditions,

$$f(\hat{x}) \ge f(\bar{x}) + \nabla f(\bar{x})^T(\hat{x} - \bar{x}) = f(\bar{x}) - \sum_{i=1}^m u_i^* \nabla g_i(\bar{x})^T(\hat{x} - \bar{x}) \ge f(\bar{x}) - \sum_{i=1}^m u_i^* \left(g_i(\hat{x}) - g_i(\bar{x})\right) = f(\bar{x}) - \sum_{i=1}^m u_i^* g_i(\hat{x}) \ge f(\bar{x}) - \sum_{i=1}^m u_i^* g_i^+(\hat{x}) > f(\bar{x}) - c \sum_{i=1}^m g_i^+(\hat{x}),$$

which contradicts (14). Thus $\hat{x}$ is feasible for P. That being the case,

$$f(\hat{x}) \le f(\bar{x}) - c \sum_{i=1}^m g_i^+(\hat{x}) = f(\bar{x})$$

from (14), and so $\hat{x}$ solves P. $\blacksquare$

### 9.2 Penalty Methods for Inequality and Equality Constraints

The presentation of penalty methods has assumed that the problem P has no equality constraints. If the problem has equality constraints $h_i(x) = 0$, we could convert them into inequality constraints by writing $h_i(x) \le 0$ and $-h_i(x) \le 0$, but such a conversion usually violates good judgment in that it unnecessarily complicates the problem. Furthermore, it can cause the linear independence condition to be automatically violated for every feasible solution. Therefore, instead, let us consider the constrained optimization problem P with both inequality and equality constraints:

$$\text{P}: \quad \min_x f(x) \quad \text{s.t.}\quad g(x) \le 0,\ h(x) = 0,\ x \in \mathbb{R}^n,$$

where $g(x)$ and $h(x)$ are vector-valued functions, that is, $g(x) = (g_1(x),\dots,g_m(x))^T$ and $h(x) = (h_1(x),\dots,h_l(x))^T$, for notational convenience.

**Definition.** A function $p(x): \mathbb{R}^n \to \mathbb{R}$ is called a *penalty function* for P if $p(x)$ satisfies:

- $p(x) = 0$ if $g(x) \le 0$ and
$h(x) = 0$, and
- $p(x) > 0$ if $g(x) \not\le 0$ or $h(x) \ne 0$.

The main class of penalty functions for this general problem is of the form

$$p(x) = \sum_{i=1}^m \left(\max\{0, g_i(x)\}\right)^q + \sum_{i=1}^l |h_i(x)|^q, \quad \text{where } q \ge 1.$$

All of the results of this section extend naturally to problems with equality constraints and penalty functions of the above form. For example, in the analogous result of Theorem 9.4, it suffices to choose $c > \max\{u_1^*,\dots,u_m^*, |v_1^*|,\dots,|v_l^*|\}$.

### 9.3 Augmented Lagrangian Methods

A modification of the classical penalty methods involves explicitly incorporating the KKT multipliers into the method. This complicates the algorithm, but has the potential of ameliorating the ill-conditioning encountered in the penalty method and improving rates of convergence.

Consider a problem of the form

$$\min f(x) \quad \text{s.t.}\quad h(x) = 0.$$

The Lagrangian of this problem is defined to be

$$L(x, v) = f(x) + v^T h(x).$$

If the KKT conditions are necessary at a local minimizer $x^*$, then for $v = v^*$ (the set of KKT multipliers), $x^*$ is a stationary point of the Lagrangian, but usually not a minimizer. It *is* a minimizer of the problem

$$\min f(x) + (v^*)^T h(x) \quad \text{s.t.}\quad h(x) = 0.$$

Applying the penalty approach to the latter problem, consider

$$\min_x A(x, v, p) = f(x) + v^T h(x) + \frac{p}{2} h(x)^T h(x)$$

to obtain $x(v, p)$. The above function is referred to as the *augmented Lagrangian*. Intuitively, a minimizer of $A(x, v, p)$ is close to the solution $x^*$ if

- $v$ is close to $v^*$ and $p > 0$ is sufficiently large, or
- $p$ is very large.

**Example.** Consider $\min f(x) = \frac{1}{2}(x_1^2 + x_2^2)$ subject to $x_1 = 1$. The optimal solution is $x^* = (1, 0)^T$, with $v^* = -1$. The augmented Lagrangian is

$$A(x, v, p) = \frac{1}{2}(x_1^2 + x_2^2) + v(x_1 - 1) + \frac{p}{2}(x_1 - 1)^2.$$

By setting its gradient to zero, the unique unconstrained minimizer is

$$x_1(v, p) = \frac{p - v}{p + 1}, \qquad x_2(v, p) = 0.$$

Thus, for all $p > 0$,

$$\lim_{v \to v^*} x_1(v, p) = 1, \qquad \lim_{v \to v^*} x_2(v, p) = 0.$$

On the other hand, for all $v$,

$$\lim_{p \to \infty} x_1(v, p) = 1, \qquad \lim_{p \to \infty} x_2(v, p) = 0.$$

An augmented Lagrangian algorithm is initialized with some $x^0$, $v^0$, and $p_0 > 0$, and proceeds as follows:

1. If $\nabla_x L(x^k, v^k) = 0$, stop.
2. Compute $x^{k+1}$ by solving $\min_x A(x, v^k, p_k)$.
3. Determine $v^{k+1}$ and $p_{k+1}$: for example, take $v^{k+1} = v^k + p_k h(x^{k+1})$ and $p_{k+1} \ge p_k$.
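The algorithm above, run on the worked example (where the subproblem minimizer $x_1 = (p - v)/(p + 1)$, $x_2 = 0$ is available in closed form), can be sketched as follows; the fixed penalty parameter and iteration count are illustrative choices:

```python
def augmented_lagrangian(v0=0.0, p=10.0, iters=6):
    """Augmented-Lagrangian iteration for the example:
        minimize (1/2)(x1^2 + x2^2)  s.t.  h(x) = x1 - 1 = 0,
    with optimum x* = (1, 0) and multiplier v* = -1.
    Each subproblem min_x A(x, v, p) is solved in closed form."""
    v = v0
    for _ in range(iters):
        x1 = (p - v) / (p + 1.0)     # exact minimizer of A(., v, p)
        x2 = 0.0
        h = x1 - 1.0                 # constraint violation
        v = v + p * h                # multiplier update v_{k+1} = v_k + p*h(x_{k+1})
    return (x1, x2), v
```

Note the contrast with the pure penalty method: here $p$ stays *fixed* at a moderate value, and convergence of $(x_1, v)$ to $(1, -1)$ is driven by the multiplier updates, avoiding the ill-conditioning caused by $p \to \infty$.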
## 4 Optimality conditions for unconstrained problems

The definitions of global and local solutions of optimization problems are intuitive, but usually impossible to check directly. Hence we will derive easily verifiable conditions that are either *necessary* for a point to be a local minimizer (thus helping us to identify candidates for minimizers), or *sufficient* (thus allowing us to confirm that the point being considered is a local minimizer), or sometimes both.

$$\text{P}: \quad \min f(x) \quad \text{s.t.}\quad x \in X,$$

where $x = (x_1,\dots,x_n)^T \in \mathbb{R}^n$, $f: \mathbb{R}^n \to \mathbb{R}$, and $X$ is an open set (usually $X = \mathbb{R}^n$).

### 4.1 Optimality conditions: the necessary and the sufficient

Necessary condition for local optimality: "if $\bar{x}$ is a local minimizer of P, then $\bar{x}$ must satisfy ..." Such conditions help us identify all candidates for local optimizers.

**Theorem 4.1.** Suppose that $f$ is differentiable at $\bar{x}$. If there is a vector $d$ such that $\nabla f(\bar{x})^T d < 0$, then for all $\lambda > 0$ sufficiently small, $f(\bar{x} + \lambda d) < f(\bar{x})$. ($d$ is called a *descent direction* if it satisfies the latter condition.)

**Proof.** We have

$$f(\bar{x} + \lambda d) = f(\bar{x}) + \lambda \nabla f(\bar{x})^T d + \lambda \|d\|\, \alpha(\bar{x}, \lambda d),$$

where $\alpha(\bar{x}, \lambda d) \to 0$ as $\lambda \to 0$. Rearranging,

$$\frac{f(\bar{x} + \lambda d) - f(\bar{x})}{\lambda} = \nabla f(\bar{x})^T d + \|d\|\, \alpha(\bar{x}, \lambda d).$$

Since $\nabla f(\bar{x})^T d < 0$ and $\alpha(\bar{x}, \lambda d) \to 0$ as $\lambda \to 0$, $f(\bar{x} + \lambda d) - f(\bar{x}) < 0$ for all $\lambda > 0$ sufficiently small. $\blacksquare$

**Corollary 4.2.** Suppose $f$ is differentiable at $\bar{x}$. If $\bar{x}$ is a local minimizer, then $\nabla f(\bar{x}) = 0$ (such a point is called a *stationary point*).

**Proof.** If $\nabla f(\bar{x}) \ne 0$, then $d = -\nabla f(\bar{x})$ is a descent direction, whereby $\bar{x}$ cannot be a local minimizer. $\blacksquare$

The above corollary is a *first-order* necessary optimality condition for an unconstrained minimization problem. However, a stationary point can be a local minimizer, a local maximizer, or neither. The following theorem will provide a *second-order* necessary optimality condition. First, a definition:

**Definition 4.3.** An $n \times n$ matrix $M$ is called *symmetric* if $M_{ij} = M_{ji}$ $\forall i, j$. A symmetric $n \times n$ matrix $M$ is called:

- *positive definite* if $x^T M x > 0$ $\forall x \in \mathbb{R}^n$, $x \ne 0$;
- *positive semidefinite* if $x^T M x \ge 0$ $\forall x \in \mathbb{R}^n$;
- *negative definite* if $x^T M x < 0$ $\forall x \in \mathbb{R}^n$, $x \ne 0$;
- *negative semidefinite* if $x^T M x \le 0$ $\forall x \in \mathbb{R}^n$;
- *indefinite* if $\exists x, y \in \mathbb{R}^n$ such that $x^T M x > 0$ and $y^T M y < 0$.

**Example 1** is
positive definite. **Example 2** is positive definite; to see this, note that for $x \ne 0$,

$$x^T M x = 8x_1^2 - 2x_1 x_2 + x_2^2 = 7x_1^2 + (x_1 - x_2)^2 > 0.$$

Since $M$ is a symmetric matrix, all its eigenvalues are real numbers. It can be shown that $M$ is positive semidefinite if and only if all of its eigenvalues are nonnegative, positive definite if all of its eigenvalues are positive, etc.

**Theorem 4.4.** Suppose that $f$ is twice continuously differentiable at $\bar{x} \in X$. If $\bar{x}$ is a local minimizer, then $\nabla f(\bar{x}) = 0$ and the Hessian $H(\bar{x})$ is positive semidefinite.

**Proof.** From the first-order necessary condition, $\nabla f(\bar{x}) = 0$. Suppose $H(\bar{x})$ is not positive semidefinite. Then $\exists d$ such that $d^T H(\bar{x}) d < 0$. We have

$$f(\bar{x} + \lambda d) = f(\bar{x}) + \lambda \nabla f(\bar{x})^T d + \frac{1}{2}\lambda^2 d^T H(\bar{x}) d + \lambda^2 \|d\|^2 \alpha(\bar{x}, \lambda d) = f(\bar{x}) + \frac{1}{2}\lambda^2 d^T H(\bar{x}) d + \lambda^2 \|d\|^2 \alpha(\bar{x}, \lambda d),$$

where $\alpha(\bar{x}, \lambda d) \to 0$ as $\lambda \to 0$. Rearranging,

$$\frac{f(\bar{x} + \lambda d) - f(\bar{x})}{\lambda^2} = \frac{1}{2} d^T H(\bar{x}) d + \|d\|^2 \alpha(\bar{x}, \lambda d).$$

Since $d^T H(\bar{x}) d < 0$ and $\alpha(\bar{x}, \lambda d) \to 0$ as $\lambda \to 0$, $f(\bar{x} + \lambda d) - f(\bar{x}) < 0$ for all $\lambda > 0$ sufficiently small, a contradiction. $\blacksquare$

**Example 3.** Let

$$f(x) = \frac{1}{2}x_1^2 + x_1 x_2 + 2x_2^2 - 4x_1 - 4x_2 - x_2^3.$$

Then

$$\nabla f(x) = \begin{pmatrix} x_1 + x_2 - 4 \\ x_1 + 4x_2 - 4 - 3x_2^2 \end{pmatrix}, \qquad H(x) = \begin{pmatrix} 1 & 1 \\ 1 & 4 - 6x_2 \end{pmatrix}.$$

$\nabla f(x) = 0$ has exactly two solutions, $\bar{x} = (4, 0)$ and $\tilde{x} = (3, 1)$. But

$$H(\tilde{x}) = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}$$

is indefinite; therefore, the only possible candidate for a local minimizer is $\bar{x} = (4, 0)$.

Necessary conditions only allow us to come up with a list of *candidate* points for minimizers.

Sufficient condition for local optimality: "if $\bar{x}$ satisfies ..., then $\bar{x}$ is a local minimizer of P."

**Theorem 4.5.** Suppose that $f$ is twice differentiable at $\bar{x}$. If $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is positive definite, then $\bar{x}$ is a strict local minimizer.

**Proof.**

$$f(x) = f(\bar{x}) + \frac{1}{2}(x - \bar{x})^T H(\bar{x})(x - \bar{x}) + \|x - \bar{x}\|^2 \alpha(\bar{x}, x - \bar{x}).$$

Suppose that $\bar{x}$ is not a strict local minimizer. Then there exists a sequence $x^k \to \bar{x}$ such that $x^k \ne \bar{x}$ and $f(x^k) \le f(\bar{x})$ for all $k$. Define $d^k = \frac{x^k - \bar{x}}{\|x^k - \bar{x}\|}$. Then

$$\frac{f(x^k) - f(\bar{x})}{\|x^k - \bar{x}\|^2} = \frac{1}{2}(d^k)^T H(\bar{x})\, d^k + \alpha(\bar{x}, x^k - \bar{x}) \le 0.$$

Since $\|d^k\| = 1$ for every $k$, there exists a subsequence of $\{d^k\}$ converging to some point $d$ such that $\|d\| = 1$, by Theorem 3.9. Assume without loss of generality that $d^k \to d$. Then

$$0 \ge \lim_{k \to \infty} \left[\frac{1}{2}(d^k)^T H(\bar{x})\, d^k + \alpha(\bar{x}, x^k - \bar{x})\right] = \frac{1}{2} d^T H(\bar{x})\, d,$$

which is a contradiction with the positive definiteness of $H(\bar{x})$. $\blacksquare$

Remarks:

- If $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is negative definite, then $\bar{x}$ is a local maximizer.
- If $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is positive semidefinite, we cannot be sure whether $\bar{x}$ is a local minimizer.
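The classification procedure of Example 3 can be made concrete with a short sketch (the function names are mine, and the $2\times 2$ determinant test used in `classify` is equivalent, for symmetric $2\times 2$ matrices, to the eigenvalue test mentioned above):

```python
def grad(x):
    """Gradient of Example 3:
    f(x) = x1^2/2 + x1*x2 + 2*x2^2 - 4*x1 - 4*x2 - x2^3."""
    return [x[0] + x[1] - 4.0,
            x[0] + 4.0 * x[1] - 4.0 - 3.0 * x[1] ** 2]

def hessian(x):
    """Hessian of the same f."""
    return [[1.0, 1.0],
            [1.0, 4.0 - 6.0 * x[1]]]

def classify(H):
    """Classify a symmetric 2x2 matrix H = [[a, b], [b, c]]:
    PD iff a > 0 and det > 0; ND iff a < 0 and det > 0; det < 0 => indefinite."""
    a, b = H[0]
    _, c = H[1]
    det = a * c - b * b
    if det < 0.0:
        return "indefinite"
    if a > 0.0 and det > 0.0:
        return "positive definite"
    if a < 0.0 and det > 0.0:
        return "negative definite"
    return "semidefinite"
```

Checking the two stationary points: the gradient vanishes at both $(4, 0)$ and $(3, 1)$, but the Hessian is positive definite at the former and indefinite at the latter, so only $(4, 0)$ survives as a candidate minimizer.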
**Example 4.** Stationary points are candidates for optimality; to find them, we solve the system $\nabla f(x) = 0$. Solving the system results in two stationary points, $x^a$ and $x^b$. Here $H(x^a)$ is indefinite, hence $x^a$ is neither a local minimizer nor a local maximizer; $H(x^b)$ is positive definite, hence $x^b$ is a local minimizer. Therefore, the function has only one local minimizer. Does this mean that it is also a global minimizer?

### 4.2 Convexity and minimization

Definitions:

- Let $x, y \in \mathbb{R}^n$. Points of the form $\lambda x + (1 - \lambda)y$ for $\lambda \in [0, 1]$ are called *convex combinations* of $x$ and $y$. More generally, a point $y$ is a convex combination of points $x^1,\dots,x^c$ if $y = \sum_{i=1}^c \alpha_i x^i$, where $\alpha_i \ge 0$ $\forall i$ and $\sum_{i=1}^c \alpha_i = 1$.
- A set $S \subseteq \mathbb{R}^n$ is called *convex* if $\forall x, y \in S$ and $\forall \lambda \in [0, 1]$, $\lambda x + (1 - \lambda)y \in S$.
- A function $f: S \to \mathbb{R}$, where $S$ is a nonempty convex set, is a *convex function* if

  $$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda) f(y) \quad \forall x, y \in S,\ \forall \lambda \in [0, 1].$$

- A function $f$ as above is called a *strictly convex function* if the inequality above is strict for all $x \ne y$ and $\lambda \in (0, 1)$.
- A function $f: S \to \mathbb{R}$ is called *concave* (*strictly concave*) if $-f$ is convex (strictly convex).

Consider the problem

$$\text{CP}: \quad \min f(x) \quad \text{s.t.}\quad x \in S.$$

**Theorem 4.6.** Suppose $S$ is a nonempty convex set, $f: S \to \mathbb{R}$ is a convex function, and $\bar{x}$ is a local minimizer of CP. Then $\bar{x}$ is a global minimizer of $f$ over $S$.

**Proof.** Suppose $\bar{x}$ is not a global minimizer, i.e., $\exists y \in S$ such that $f(y) < f(\bar{x})$. Let $y(\lambda) = \lambda \bar{x} + (1 - \lambda)y$, which is a convex combination of $\bar{x}$ and $y$ for $\lambda \in [0, 1]$, and therefore $y(\lambda) \in S$ for $\lambda \in [0, 1]$. Note that $y(\lambda) \to \bar{x}$ as $\lambda \to 1$. From the convexity of $f$,

$$f(y(\lambda)) = f(\lambda \bar{x} + (1 - \lambda)y) \le \lambda f(\bar{x}) + (1 - \lambda) f(y) < \lambda f(\bar{x}) + (1 - \lambda) f(\bar{x}) = f(\bar{x})$$

for all $\lambda \in (0, 1)$. Therefore $f(y(\lambda)) < f(\bar{x})$ for all $\lambda \in (0, 1)$, and so $\bar{x}$ is not a local minimizer, resulting in a contradiction. $\blacksquare$

Notes:

- A problem of minimizing a convex function over a convex feasible region, such as we considered in the theorem, is a *convex programming problem*.
- If $f$ is strictly convex, a local minimizer is the unique global minimizer.
- If $f$ is
strictly concave, a local maximizer is the unique global maximizer.

The following results help us to determine when a function is convex.

**Theorem 4.7.** Suppose $X \subseteq \mathbb{R}^n$ is a nonempty open convex set, and $f: X \to \mathbb{R}$ is differentiable. Then $f$ is convex if and only if it satisfies the *gradient inequality*:

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \forall x, y \in X.$$

**Proof.** Suppose $f$ is convex. Then, for any $\lambda \in (0, 1]$,

$$f(\lambda y + (1 - \lambda)x) \le \lambda f(y) + (1 - \lambda) f(x) \quad \Longrightarrow \quad \frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y) - f(x).$$

Letting $\lambda \to 0$, we obtain $\nabla f(x)^T (y - x) \le f(y) - f(x)$, establishing the "only if" part.

Now suppose that the gradient inequality holds $\forall x, y \in X$. Let $w$ and $z$ be any two points in $X$, let $\lambda \in [0, 1]$, and set $x = \lambda w + (1 - \lambda)z$. Then

$$f(w) \ge f(x) + \nabla f(x)^T (w - x) \qquad \text{and} \qquad f(z) \ge f(x) + \nabla f(x)^T (z - x).$$

Taking a convex combination of the above inequalities,

$$\lambda f(w) + (1 - \lambda) f(z) \ge f(x) + \nabla f(x)^T \left[\lambda(w - x) + (1 - \lambda)(z - x)\right] = f(x) + \nabla f(x)^T \cdot 0 = f(\lambda w + (1 - \lambda)z),$$

so that $f(x)$ is convex. $\blacksquare$

In one dimension, the gradient inequality has the form $f(y) \ge f(x) + f'(x)(y - x)$ $\forall x, y \in X$.

The following theorem provides another necessary and sufficient condition, for the case when $f$ is twice continuously differentiable.

**Theorem 4.8.** Suppose $X$ is a nonempty open convex set, and $f: X \to \mathbb{R}$ is twice continuously differentiable. Then $f$ is convex if and only if the Hessian of $f$, $H(x)$, is positive semidefinite $\forall x \in X$.

**Proof.** Suppose $f$ is convex. Let $\bar{x} \in X$ and let $d$ be any direction. Then for $\lambda > 0$ sufficiently small, $\bar{x} + \lambda d \in X$. We have

$$f(\bar{x} + \lambda d) = f(\bar{x}) + \lambda \nabla f(\bar{x})^T d + \frac{1}{2}\lambda^2 d^T H(\bar{x})\, d + \|\lambda d\|^2 \alpha(\bar{x}, \lambda d),$$

where $\alpha(\bar{x}, \lambda d) \to 0$ as $\lambda \to 0$. Using the gradient inequality, we obtain

$$\frac{1}{2}\lambda^2 d^T H(\bar{x})\, d + \lambda^2 \|d\|^2 \alpha(\bar{x}, \lambda d) \ge 0.$$

Dividing by $\lambda^2 > 0$ and letting $\lambda \to 0$, we obtain $d^T H(\bar{x})\, d \ge 0$, proving the "only if" part.

Conversely, suppose that $H(z)$ is positive semidefinite for all $z \in X$. Let $x, y \in X$ be arbitrary. Invoking the second-order version of Taylor's theorem, we have

$$f(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2}(y - x)^T H(z)(y - x)$$

for some $z$ which is a convex combination of $x$ and $y$ (and hence $z \in X$). Since $H(z)$ is positive semidefinite, the gradient inequality holds, and hence $f$ is convex. $\blacksquare$

In one dimension, the Hessian is the second derivative of the function, and the positive semidefiniteness condition can be stated as $f''(x) \ge 0$ $\forall x \in X$. One can also show the following sufficient (but not necessary) condition:

**Theorem 4.9.** Suppose $X$ is a nonempty open convex set, and $f: X \to \mathbb{R}$ is twice continuously differentiable. Then $f$ is strictly convex if the Hessian of $f$, $H(x)$, is positive definite $\forall x \in X$.

For convex unconstrained optimization problems, the optimality conditions of the previous subsection can be simplified significantly, providing a single necessary and sufficient condition for global optimality.

### 5.3 Steepest descent algorithm for minimization

The steepest descent algorithm is a version of the general optimization algorithm that chooses

$$d^k = -\nabla f(x^k)$$

at the $k$th iteration. As a source of motivation, note that $f(x)$ can be approximated by its linear expansion: $f(\bar{x} + d) \approx f(\bar{x}) + \nabla f(\bar{x})^T d$. It is not hard to see that, so long as $\nabla f(\bar{x}) \ne 0$, the direction

$$\tilde{d} = \frac{-\nabla f(\bar{x})}{\|\nabla f(\bar{x})\|}$$

is the unit-length direction that minimizes the above approximation. Indeed, for any direction $d$ with $\|d\| = 1$, the Schwartz inequality yields

$$\nabla f(\bar{x})^T d \ge -\|\nabla f(\bar{x})\|\,\|d\| = -\|\nabla f(\bar{x})\| = \nabla f(\bar{x})^T \tilde{d}.$$

Of course, if $\nabla f(\bar{x}) = 0$, then $\bar{x}$ is a candidate for a local minimizer, i.e., $\bar{x}$ satisfies the first-order necessary optimality condition. The direction $\tilde{d} = -\nabla f(\bar{x})$ is called the direction of *steepest descent* at the point $\bar{x}$. Note that $d = -\nabla f(\bar{x})$ is a descent direction as long as $\nabla f(\bar{x}) \ne 0$; to see this, simply observe that $d^T \nabla f(\bar{x}) = -\nabla f(\bar{x})^T \nabla f(\bar{x}) < 0$ so long as $\nabla f(\bar{x}) \ne 0$. A natural consequence of this is the following algorithm, called the steepest descent algorithm.

**Steepest Descent Algorithm**

- **Step 0.** Given $x^0$, set $k := 0$.
- **Step 1.** $d^k := -\nabla f(x^k)$. If $d^k = 0$, then stop.
- **Step 2.** Choose the stepsize $\alpha_k$ by performing an exact (or inexact) line search.
- **Step 3.** Set $x^{k+1} \leftarrow x^k + \alpha_k d^k$, $k \leftarrow k + 1$. Go to Step 1.

Note, from Step 2 and the fact that $d^k = -\nabla f(x^k)$ is a descent direction, it follows that $f(x^{k+1}) < f(x^k)$.

The following theorem establishes that, under certain assumptions on $f$, the steepest descent algorithm converges regardless of the initial starting point $x^0$, i.e., it exhibits *global convergence*.

**Theorem 5.1 (Convergence Theorem).** Suppose that $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable on the set $S = \{x \in \mathbb{R}^n : f(x) \le f(x^0)\}$, and that $S$ is a closed and
The following theorem establishes that, under certain assumptions on $f$, the steepest descent algorithm converges regardless of the initial starting point $x^0$, i.e., it exhibits global convergence.

Theorem 5.1 (Convergence Theorem) Suppose that $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable on the set $S = \{x \in \mathbb{R}^n : f(x) \le f(x^0)\}$, and that $S$ is a closed and bounded set. Suppose further that the sequence $\{x^k\}$ is generated by the steepest descent algorithm with stepsizes $\alpha_k$ chosen by an exact line search. Then every point $\bar{x}$ that is a limit point of the sequence $\{x^k\}$ satisfies $\nabla f(\bar{x}) = 0$.

Proof: The proof of this theorem is by contradiction. By the Weierstrass Theorem, at least one limit point of the sequence $\{x^k\}$ must exist. Let $\bar{x}$ be any such limit point. Without loss of generality, assume that $\lim_{k\to\infty} x^k = \bar{x}$, but that $\nabla f(\bar{x}) \ne 0$. This being the case, there is a value of $\bar{\alpha} > 0$ such that
$$\delta := f(\bar{x}) - f(\bar{x} + \bar{\alpha}\bar{d}) > 0, \quad \text{where } \bar{d} = -\nabla f(\bar{x}).$$
Then also $\bar{x} + \bar{\alpha}\bar{d} \in \operatorname{int} S$.

Let $\{d^k\}$ be the sequence of directions generated by the algorithm, i.e., $d^k = -\nabla f(x^k)$. Since $f$ is continuously differentiable, $\lim_{k\to\infty} d^k = \bar{d}$. Then, since $\bar{x} + \bar{\alpha}\bar{d} \in \operatorname{int} S$ and $x^k + \bar{\alpha} d^k \to \bar{x} + \bar{\alpha}\bar{d}$, for $k$ sufficiently large we have
$$f(x^k + \bar{\alpha} d^k) \le f(\bar{x} + \bar{\alpha}\bar{d}) + \frac{\delta}{2} = f(\bar{x}) - \delta + \frac{\delta}{2} = f(\bar{x}) - \frac{\delta}{2}.$$
However, since the objective values decrease monotonically to $f(\bar{x})$ and the stepsize is chosen by exact line search,
$$f(\bar{x}) \le f(x^k + \alpha_k d^k) \le f(x^k + \bar{\alpha} d^k) \le f(\bar{x}) - \frac{\delta}{2},$$
which is of course a contradiction. Thus $\bar{d} = -\nabla f(\bar{x}) = 0$. $\blacksquare$

Under additional assumptions on $f$, it can also be shown that the steepest descent algorithm will demonstrate global convergence under the Armijo line search rule (the additional assumption basically ensures that the gradient of $f$ does not change too rapidly).

An example: Suppose $f(x)$ is a simple quadratic function of the form
$$f(x) = \tfrac{1}{2}x^T Q x + q^T x,$$
where $Q$ is a positive definite symmetric matrix. The optimal solution of P is easily computed as $x^* = -Q^{-1}q$ (since $Q$ is positive definite, it is non-singular), and direct substitution shows that the optimal objective function value is
$$f(x^*) = -\tfrac{1}{2}q^T Q^{-1} q.$$
For convenience, let $x$ denote the current point in the steepest descent algorithm, and let $d$ denote the current direction, which is the negative of the gradient, i.e.,
$$d = -\nabla f(x) = -Qx - q.$$
Now let us compute the next iterate of the steepest descent algorithm. If $\alpha$ is the generic stepsize, then
$$f(x + \alpha d) = \tfrac{1}{2}(x + \alpha d)^T Q (x + \alpha d) + q^T(x + \alpha d) = f(x) - \alpha\, d^T d + \tfrac{1}{2}\alpha^2\, d^T Q d.$$
Optimizing the value of $\alpha$ in this last expression yields
$$\alpha = \frac{d^T d}{d^T Q d},$$
and the next iterate of the algorithm then is
$$x' = x + \alpha d = x + \frac{d^T d}{d^T Q d}\, d, \quad \text{where } d = -Qx - q,
$$and the objective value at the new point is
$$f(x') = f(x + \alpha d) = f(x) - \alpha\, d^T d + \tfrac{1}{2}\alpha^2\, d^T Q d = f(x) - \frac{1}{2}\frac{(d^T d)^2}{d^T Q d}.$$
Suppose that
$$Q = \begin{pmatrix} 4 & -2 \\ -2 & 2 \end{pmatrix} \quad \text{and} \quad q = \begin{pmatrix} 2 \\ -2 \end{pmatrix}.$$
Then
$$x^* = -Q^{-1}q = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad \text{and} \quad f(x^*) = -1.$$
Suppose that $x^0 = (0, 0)^T$. Then we have $x^1 = (-0.4, 0.4)^T$, $x^2 = (0, 0.8)^T$, etc., and the even-numbered iterates satisfy
$$x^{2k} = \begin{pmatrix} 0 \\ 1 - 0.2^k \end{pmatrix}, \quad \text{and so} \quad \|x^{2k} - x^*\| = 0.2^k \quad \text{and} \quad f(x^{2k}) - f(x^*) = 0.2^{2k}.$$
Therefore, starting from the point $x^0 = (0,0)^T$, the distance from the current iterate to the optimal solution goes down by a factor of 0.2 after every two iterations of the algorithm; a similar observation can be made about the progress of the objective function values. The graph in the original notes plots the progress of the sequence $f(x^k) - f(x^*)$ as a function of the iteration number; the $y$-axis is drawn on a logarithmic scale, which allows us to visualize the progress of the algorithm better as the values of $f(x^k) - f(x^*)$ approach zero.

[Figure: Convergence of steepest descent iterates.]

Although it is easy to find the optimal solution of the quadratic optimization problem in closed form, the above example is relevant in that it demonstrates the typical performance of the steepest descent algorithm. Additionally, most functions behave as near-quadratic functions in a neighborhood of the optimal solution, making the example even more relevant.
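The behavior of this example can be checked numerically. The following short script (not part of the original notes) runs steepest descent with exact line search on the quadratic above and measures the contraction of $\|x^k - x^*\|$ over every two iterations:

```python
import numpy as np

# Data of the worked example: f(x) = 0.5 x^T Q x + q^T x
Q = np.array([[4.0, -2.0], [-2.0, 2.0]])
q = np.array([2.0, -2.0])
x_star = -np.linalg.solve(Q, q)            # optimal solution, here (0, 1)

x = np.zeros(2)                            # start at x^0 = (0, 0)
dists = [np.linalg.norm(x - x_star)]
for _ in range(10):
    d = -(Q @ x + q)                       # steepest descent direction
    alpha = (d @ d) / (d @ Q @ d)          # exact line search stepsize
    x = x + alpha * d
    dists.append(np.linalg.norm(x - x_star))

# ||x^{2k} - x*|| shrinks by a factor of 0.2 every two iterations
ratios = [dists[k + 2] / dists[k] for k in range(0, 8, 2)]
```

Running this reproduces the factor 0.2 claimed in the notes for this particular starting point.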
Termination criteria: Ideally, the algorithm will terminate at a point $x^k$ such that $\nabla f(x^k) = 0$. However, the algorithm is not guaranteed to be able to find such a point in a finite amount of time. Moreover, due to rounding errors in computer calculations, the calculated value of the gradient will have some imprecision in it. Therefore, in practical algorithms the termination criterion is designed to test whether the above condition is satisfied approximately, so that the resulting output of the algorithm is an approximately optimal solution.

A natural termination criterion for steepest descent could be $\|\nabla f(x^k)\| \le \epsilon$, where $\epsilon > 0$ is a pre-specified tolerance. However, depending on the scaling of the function, this requirement can be either unnecessarily stringent or too loose to ensure near-optimality (consider a problem concerned with minimizing distance, where the objective function can be expressed in inches, feet, or miles). Another alternative that might alleviate the above consideration is to terminate when $\|\nabla f(x^k)\| \le \epsilon |f(x^k)|$; this, however, may lead to problems when the objective function at the optimum is zero. A combined approach is then to terminate when
$$\|\nabla f(x^k)\| \le \epsilon\,(1 + |f(x^k)|).$$
The value of $\epsilon$ is typically taken to be at most the square root of the machine tolerance (e.g., $\epsilon = 10^{-8}$ if 16-digit computing is used), due to the error incurred in estimating derivatives.

5.4 Newton's method for minimization

Again we want to solve
$$\text{P}: \quad \min f(x), \quad x \in \mathbb{R}^n.$$
Newton's method can also be interpreted in the framework of the general optimization algorithm, but it truly stems from Newton's method for solving systems of nonlinear equations. Recall that if $\phi: \mathbb{R}^n \to \mathbb{R}^n$, to solve the system of equations $\phi(x) = 0$ one can apply an iterative method: starting at a point $\bar{x}$, approximate the function $\phi$ by
$$\phi(\bar{x} + d) \approx \phi(\bar{x}) + \nabla\phi(\bar{x})^T d,$$
where $\nabla\phi(\bar{x})^T \in \mathbb{R}^{n \times n}$ is the Jacobian of $\phi$ at $\bar{x}$, and, provided that the Jacobian is non-singular, solve the system of linear equations
$$\nabla\phi(\bar{x})^T d = -\phi(\bar{x})$$
to obtain $d$. Set the next iterate $x' = \bar{x} + d$, and continue. This method is well studied, and is well known for its good performance when the starting point $\bar{x}$ is chosen appropriately. Newton's method for minimization is precisely an application of this equation-solving method to the system of first-order optimality conditions $\nabla f(x) = 0$.

Here is another view of the motivation behind Newton's method for optimization. At $x = \bar{x}$, $f(x)$ can be approximated by
$$q(x) = f(\bar{x}) + \nabla f(\bar{x})^T(x - \bar{x}) + \tfrac{1}{2}(x - \bar{x})^T H(\bar{x})(x - \bar{x}),$$
which is the quadratic Taylor expansion of $f$ at $x = \bar{x}$. Here $q(x)$ is a quadratic function, which is minimized by solving $\nabla q(x) = 0$, i.e.,
$$\nabla f(\bar{x}) + H(\bar{x})(x - \bar{x}) = 0,$$
which yields
$$x = \bar{x} - H(\bar{x})^{-1}\nabla f(\bar{x}).$$
The direction $-H(\bar{x})^{-1}\nabla f(\bar{x})$ is called the Newton direction, or the Newton step. This leads to the following algorithm for solving P.

Newton's Method
Step 0: Given $x^0$,
set $k \leftarrow 0$.
Step 1: $d^k = -H(x^k)^{-1}\nabla f(x^k)$. If $d^k = 0$, then stop.
Step 2: Choose stepsize $\alpha_k = 1$.
Step 3: Set $x^{k+1} \leftarrow x^k + \alpha_k d^k$, $k \leftarrow k + 1$. Go to Step 1.

Proposition 5.2 If $H(x)$ is positive definite, then $d = -H(x)^{-1}\nabla f(x)$ is a descent direction.

Proof: It is sufficient to show that $\nabla f(x)^T d = -\nabla f(x)^T H(x)^{-1}\nabla f(x) < 0$. Since $H(x)$ is positive definite, if $v \ne 0$ then
$$0 < (H(x)^{-1}v)^T H(x)(H(x)^{-1}v) = v^T H(x)^{-1}v,$$
so $H(x)^{-1}$ is positive definite; taking $v = \nabla f(x) \ne 0$ completes the proof. $\blacksquare$

Note that:
- Work per iteration: $O(n^3)$.
- The iterates of Newton's method are, in general, equally attracted to local minima and local maxima; indeed, the method is just trying to solve the system of equations $\nabla f(x) = 0$.
- The method assumes $H(x^k)$ is nonsingular at each iteration. Moreover, unless $H(x^k)$ is positive definite, $d^k$ is not guaranteed to be a descent direction.
- There is no guarantee that $f(x^{k+1}) < f(x^k)$. (Step 2 could be augmented by a line search of $f(x^k + \alpha d^k)$ over the value of $\alpha$; then the previous consideration would not be an issue.)
- What if $H(x^k)$ becomes increasingly singular, or is not positive definite? Use a modified direction (this is discussed in Section 5.6.2).

In general, points generated by Newton's method as it is described above may not converge. For example, $H(x^k)^{-1}$ may not exist. Even if $H(x)$ is always non-singular, the method may not converge, unless started "close enough" to the right point.

Example 1: Let $f(x) = 7x - \ln x$. Then $\nabla f(x) = f'(x) = 7 - \frac{1}{x}$ and $H(x) = f''(x) = \frac{1}{x^2}$. It is not hard to check that $x^* = \frac{1}{7} = 0.142857143$ is the unique global minimizer. The Newton direction at $x$ is
$$d = -H(x)^{-1}\nabla f(x) = -x^2\left(7 - \frac{1}{x}\right) = x - 7x^2,$$
and is defined so long as $x > 0$. So Newton's method will generate the sequence of iterates $\{x^k\}$ with
$$x^{k+1} = x^k + (x^k - 7(x^k)^2) = 2x^k - 7(x^k)^2.$$
The original notes tabulate the sequences generated by this method for several starting points: starting points in the interval $(0, 2/7)$ (such as $x^0 = 0.1$ or $x^0 = 0.01$) converge rapidly to $0.142857143$, while for $x^0 = 1$ the next iterate is $x^1 = 2 - 7 = -5$, which is not in the domain of the objective function, so the algorithm has to terminate. (Indeed, since $x^{k+1} - \tfrac{1}{7} = -7(x^k - \tfrac{1}{7})^2$, the error contracts exactly when $|x^k - \tfrac{1}{7}| < \tfrac{1}{7}$.)

[Figure: Convergence of Newton iterates, $|x^k - x^*|$ on a logarithmic scale versus iteration number, for the two sequences that did converge.]
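The one-dimensional iteration $x^{k+1} = 2x^k - 7(x^k)^2$ of Example 1 is easy to experiment with. The following sketch (not from the notes) illustrates both a convergent run and a run that leaves the domain:

```python
def newton_example1(x0, iters=30):
    """Pure Newton iteration for f(x) = 7x - ln(x).

    The Newton step simplifies to x <- 2x - 7x^2; the iterate must stay
    in the domain x > 0 for the method to continue.
    """
    x = x0
    for _ in range(iters):
        x = 2 * x - 7 * x * x
        if x <= 0:                 # left the domain of f: the method fails
            return None
    return x

good = newton_example1(0.1)        # converges to x* = 1/7
bad = newton_example1(1.0)         # first step gives 2 - 7 = -5: failure
```

Since the error satisfies $e^{k+1} = -7(e^k)^2$, the convergent run reaches machine precision in about six iterations, a concrete preview of the quadratic convergence discussed in Section 5.5.3.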
Example 2: $f(x) = -\ln(1 - x_1 - x_2) - \ln x_1 - \ln x_2$. The gradient and Hessian are
$$\nabla f(x) = \begin{pmatrix} \frac{1}{1 - x_1 - x_2} - \frac{1}{x_1} \\[4pt] \frac{1}{1 - x_1 - x_2} - \frac{1}{x_2} \end{pmatrix}, \qquad H(x) = \frac{1}{(1 - x_1 - x_2)^2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} \frac{1}{x_1^2} & 0 \\ 0 & \frac{1}{x_2^2} \end{pmatrix}.$$
The unique minimizer is $x^* = (\tfrac{1}{3}, \tfrac{1}{3})^T$, with $f(x^*) = 3\ln 3 = 3.295836866$. A table in the original notes traces the iterates from several starting points: the convergent runs reach $(0.333333333, 0.333333333)$ within a handful of iterations, while from other starting points the iterates leave the domain of $f$ and the method fails.

[Figure: Convergence of Newton iterates, $\|x^k - x^*\|$ on a logarithmic scale versus iteration number.]

Termination criteria: Since Newton's method is working with the Hessian as well as the gradient, it would be natural to augment the termination criterion we used in the steepest descent algorithm with the requirement that $H(x^k)$ is positive semidefinite, or, taking into account the potential for computational errors, that $H(x^k) + \epsilon I$ is positive semidefinite for some $\epsilon > 0$ (this parameter may be different than the one used in the condition on the gradient).

5.5 Comparing performance of the steepest descent and Newton algorithms

5.5.1 Rate of convergence

Suppose we have a converging sequence $\lim_{k\to\infty} s^k = \bar{s}$, and we would like to characterize the speed, or rate, at which the iterates $s^k$ approach the limit $\bar{s}$. A converging sequence of numbers $\{s^k\}$ exhibits linear convergence if for some $0 \le C < 1$,
$$\lim_{k\to\infty} \frac{|s^{k+1} - \bar{s}|}{|s^k - \bar{s}|} = C.$$
$C$ in the above expression is referred to as the rate constant; if $C = 0$, the sequence exhibits superlinear convergence. A converging sequence of numbers $\{s^k\}$ exhibits quadratic convergence if
$$\lim_{k\to\infty} \frac{|s^{k+1} - \bar{s}|}{|s^k - \bar{s}|^2} = \delta < \infty.$$
Examples:
- Linear convergence: $s^k = \left(\tfrac{1}{10}\right)^k$: $0.1, 0.01, 0.001$, etc. Here $\bar{s} = 0$ and $C = 0.1$.
- Superlinear convergence: for instance $s^k = \tfrac{1}{k!}$, for which $\tfrac{s^{k+1}}{s^k} = \tfrac{1}{k+1} \to 0$.
- Quadratic convergence: $s^k = \left(\tfrac{1}{10}\right)^{2^k}$: $0.1, 0.01, 0.0001, 0.00000001$, etc. Here $\tfrac{|s^{k+1}|}{|s^k|^2} = 1$.

[Figure: Illustration of rates of convergence of the above sequences as a function of the iteration number $k$.]

This discussion is relevant for optimization algorithms: since an algorithm for nonlinear optimization generates a sequence of iterates rather than an exact solution, it makes sense to discuss the rate of convergence of the sequence $\{f(x^k)\}$ (with limit $f(\bar{x})$), or of the sequence $\{\|x^k - \bar{x}\|\}$ (with limit 0).
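These definitions can be checked numerically on the example sequences (a quick verification, not part of the notes; $1/k!$ is used here as a representative superlinear sequence):

```python
from math import factorial

# Linear:      s_k = 0.1^k      -> |s_{k+1}| / |s_k|   is constant (C = 0.1)
# Superlinear: s_k = 1/k!       -> |s_{k+1}| / |s_k|   tends to 0
# Quadratic:   s_k = 0.1^(2^k)  -> |s_{k+1}| / |s_k|^2 is constant
linear = [0.1 ** k for k in range(1, 8)]
superlinear = [1 / factorial(k) for k in range(1, 12)]
quadratic = [0.1 ** (2 ** k) for k in range(0, 5)]

lin_ratios = [linear[k + 1] / linear[k] for k in range(6)]
sup_ratios = [superlinear[k + 1] / superlinear[k] for k in range(10)]
quad_ratios = [quadratic[k + 1] / quadratic[k] ** 2 for k in range(4)]
```

The linear ratios stay at 0.1, the superlinear ratios shrink toward 0, and the quadratic ratios stay at 1 even as the terms themselves collapse to $10^{-16}$.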
5.5.2 Rate of convergence of the steepest descent algorithm for the case of a quadratic function

In this section we explore answers to the question of how fast the steepest descent algorithm converges. Recall that in the earlier example we observed linear convergence of both the sequence $\{f(x^k)\}$ and the sequence $\{\|x^k - x^*\|\}$. We will show now that the steepest descent algorithm with stepsizes selected by exact line search in general exhibits linear convergence, but that the rate constant depends very much on the ratio of the largest to the smallest eigenvalue of the Hessian matrix at the optimal solution $x = x^*$. In order to see how this dependence arises, we will examine the case where the objective function $f(x)$ is itself a simple quadratic function of the form
$$f(x) = \tfrac{1}{2}x^T Q x + q^T x,$$
where $Q$ is a positive definite symmetric matrix. We will suppose that the eigenvalues of $Q$ are
$$A = a_1 \ge a_2 \ge \dots \ge a_n = a > 0,$$
i.e., $A$ and $a$ are the largest and smallest eigenvalues of $Q$.

We already derived that the optimal solution of P is $x^* = -Q^{-1}q$, with the optimal objective function value $f(x^*) = -\tfrac{1}{2}q^T Q^{-1}q$. Moreover, if $x$ is the current point in the steepest descent algorithm, then the next iterate of the steepest descent algorithm with exact line search is
$$x' = x + \frac{d^T d}{d^T Q d}\, d, \quad \text{where } d = -\nabla f(x) = -Qx - q,$$
and
$$f(x') = f(x) - \frac{1}{2}\frac{(d^T d)^2}{d^T Q d}.$$
Therefore
$$\frac{f(x') - f(x^*)}{f(x) - f(x^*)} = 1 - \frac{\tfrac{1}{2}\frac{(d^T d)^2}{d^T Q d}}{\tfrac{1}{2}x^T Q x + q^T x + \tfrac{1}{2}q^T Q^{-1}q}.$$
Since $d = -(Qx + q)$, the denominator equals $\tfrac{1}{2}(Qx + q)^T Q^{-1}(Qx + q) = \tfrac{1}{2}d^T Q^{-1}d$, and hence
$$\frac{f(x') - f(x^*)}{f(x) - f(x^*)} = 1 - \frac{(d^T d)^2}{(d^T Q d)(d^T Q^{-1}d)} = 1 - \frac{1}{\beta}, \quad \text{where } \beta = \frac{(d^T Q d)(d^T Q^{-1}d)}{(d^T d)^2}.$$
In order for the convergence constant $1 - \tfrac{1}{\beta}$ to be good (which will translate to fast linear convergence), we would like the quantity $\beta$ to be small. The following result provides an upper bound on the value of $\beta$.

Kantorovich Inequality: Let $A$ and $a$ be the largest and the smallest eigenvalues of $Q$, respectively. Then
$$\beta \le \frac{(A + a)^2}{4Aa}.$$
We will skip the proof of this inequality. Let us apply this inequality to the above analysis. Continuing, we have
$$\frac{f(x') - f(x^*)}{f(x) - f(x^*)} = 1 - \frac{1}{\beta} \le 1 - \frac{4Aa}{(A + a)^2} = \frac{(A - a)^2}{(A + a)^2} = \left(\frac{A/a - 1}{A/a + 1}\right)^2.$$
Note by definition that $A/a$ is always at least 1. If $A/a$ is small (not much bigger than 1), then the convergence constant will be much smaller than 1. However, if $A/a$ is large, then the convergence constant will be only slightly smaller than 1.
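The implication of this bound can be tabulated directly (an illustrative computation, not from the notes; the sample ratios $A/a$ are arbitrary choices): for a given condition ratio $\kappa = A/a$, the worst-case number of iterations needed to reduce the optimality gap by a factor of 0.10 follows from the rate constant.

```python
import math

def iters_to_reduce_gap(kappa, factor=0.10):
    """Worst-case iteration count implied by the Kantorovich bound.

    The per-iteration convergence constant is at most
    ((kappa - 1)/(kappa + 1))^2, so we need the smallest n with c^n <= factor.
    """
    c = ((kappa - 1) / (kappa + 1)) ** 2
    return math.ceil(math.log(factor) / math.log(c))

rows = [(k, iters_to_reduce_gap(k)) for k in (10, 100, 1000)]
```

The counts come out to roughly 6, 58, and 576 iterations, growing linearly in $\kappa$, exactly the pattern of the table in the notes.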
[A table in the original notes shows, for sample values of the ratio $A/a$, the convergence constant $\left(\frac{A/a - 1}{A/a + 1}\right)^2$ and the number of iterations needed to reduce the optimality gap by a factor of 0.10.] Note that the number of iterations needed to reduce the optimality gap by 0.10 grows linearly in the ratio $A/a$.

[Two figures in the original notes sketch possible iterations of the steepest descent algorithm on level sets of differing eccentricity.]

Some remarks:
- We analyzed the convergence of the function values; the convergence of the algorithm iterates $\{x^k\}$ can be easily shown to be linear with the same rate constant.
- The bound on the rate of convergence is attained in practice quite often, which is unfortunate.
- The ratio of the largest to the smallest eigenvalue of a matrix is called the condition number of the matrix.
- What about non-quadratic functions? If the Hessian at the locally optimal solution is positive definite, the function behaves as a near-quadratic function in a neighborhood of that solution, and the convergence exhibited by the iterates of the steepest descent algorithm will also be linear. The analysis of the non-quadratic case gets very involved; fortunately, the key intuition is obtained by analyzing the quadratic case.
- What about backtracking line search? Also linear convergence; the rate constant depends in part on the backtracking parameters.

5.5.3 Rate of convergence of the pure Newton's method

We have seen from our examples that, even for convex functions, Newton's method in its pure form (i.e., with stepsize of 1 at every iteration) does not guarantee descent at each iteration, and may produce a diverging sequence of iterates. Moreover, each iteration of Newton's method is much more computationally intensive than that of steepest descent. However, under certain conditions the method exhibits a quadratic rate of convergence, making it the ideal method for solving convex optimization problems.

Recall that a method exhibits quadratic convergence when $\|e^k\| = \|x^k - \bar{x}\| \to 0$ and
$$\lim_{k\to\infty} \frac{\|e^{k+1}\|}{\|e^k\|^2} = \delta < \infty.$$
Roughly speaking, if the iterates converge quadratically, the
accuracy (i.e., the number of correct digits) of the solution doubles in a fixed number of iterations.

There are many ways to state and prove results regarding the convergence of Newton's method. We provide one that gives a particular insight into the circumstances under which the pure Newton's method demonstrates quadratic convergence. Let $\|v\|$ denote the usual Euclidean norm of a vector, namely $\|v\| = \sqrt{v^T v}$. Recall that the operator norm of a matrix $M$ is defined as follows:
$$\|M\| = \max_{x:\,\|x\| = 1} \|Mx\|.$$
As a consequence of this definition, $\|Mx\| \le \|M\|\,\|x\|$ for any $x$.

Theorem 5.3 (Quadratic convergence) Suppose $f(x)$ is twice continuously differentiable and $x^*$ is a point for which $\nabla f(x^*) = 0$. Suppose $H(x)$ satisfies the following conditions:
- there exists a scalar $h > 0$ for which $\|H(x^*)^{-1}\| \le \frac{1}{h}$;
- there exist scalars $\beta > 0$ and $L > 0$ for which $\|H(y) - H(x)\| \le L\|y - x\|$ for all $x$ and $y$ satisfying $\|x - x^*\| \le \beta$ and $\|y - x^*\| \le \beta$.

Let $x$ satisfy $\|x - x^*\| \le \delta\gamma$, where $0 < \delta < 1$ and $\gamma = \min\left\{\beta, \frac{2h}{3L}\right\}$, and let $x^N = x - H(x)^{-1}\nabla f(x)$. Then:
(i) $\|x^N - x^*\| \le \|x - x^*\|^2 \cdot \dfrac{L}{2(h - L\|x - x^*\|)}$;
(ii) $\|x^N - x^*\| \le \delta\|x - x^*\| < \|x - x^*\|$, and hence the iterates converge to $x^*$;
(iii) $\|x^N - x^*\| \le \|x - x^*\|^2 \cdot \dfrac{3L}{2h}$.

The proof relies on the following two elementary facts.

Proposition 5.4 Suppose that $M$ is a symmetric matrix. Then the following are equivalent:
1. $h > 0$ satisfies $\|M^{-1}\| \le \frac{1}{h}$;
2. $h > 0$ satisfies $\|Mv\| \ge h\|v\|$ for any vector $v$.

Proposition 5.5 Suppose that $f(x)$ is twice differentiable. Then
$$\nabla f(z) - \nabla f(x) = \int_0^1 H(x + t(z - x))\,(z - x)\,dt.$$
Proof: Let $\phi(t) = \nabla f(x + t(z - x))$. Then $\phi(0) = \nabla f(x)$ and $\phi(1) = \nabla f(z)$, and $\phi'(t) = H(x + t(z - x))(z - x)$. From the fundamental theorem of calculus, we have
$$\nabla f(z) - \nabla f(x) = \phi(1) - \phi(0) = \int_0^1 \phi'(t)\,dt = \int_0^1 H(x + t(z - x))(z - x)\,dt. \;\blacksquare$$

Proof of Theorem 5.3: We have
$$x^N - x^* = x - H(x)^{-1}\nabla f(x) - x^* = x - x^* + H(x)^{-1}\left(\nabla f(x^*) - \nabla f(x)\right)$$
$$= x - x^* + H(x)^{-1}\int_0^1 H(x + t(x^* - x))(x^* - x)\,dt \quad \text{(from Proposition 5.5)}$$
$$= H(x)^{-1}\int_0^1 \left[H(x + t(x^* - x)) - H(x)\right](x^* - x)\,dt.$$
Therefore
$$\|x^N - x^*\| \le \|H(x)^{-1}\|\int_0^1 \|H(x + t(x^* - x)) - H(x)\|\,\|x^* - x\|\,dt \le \|x^* - x\|\,\|H(x)^{-1}\|\int_0^1 L\,t\,\|x^* - x\|\,dt = \|x - x^*\|^2\,\|H(x)^{-1}\|\,\frac{L}{2}.$$
We now bound $\|H(x)^{-1}\|$. Let $v$ be any vector. Then
$$\|H(x)v\| = \|H(x^*)v + (H(x) - H(x^*))v\| \ge \|H(x^*)v\| - \|(H(x) - H(x^*))v\| \ge h\|v\| - L\|x - x^*\|\,\|v\| = (h - L\|x - x^*\|)\|v\|,$$
using Proposition 5.4. Invoking Proposition 5.4 again, we see that this implies that
$$\|H(x)^{-1}\| \le \frac{1}{h - L\|x - x^*\|}.$$
Combining this with the above yields
$$\|x^N - x^*\| \le \|x - x^*\|^2\,\frac{L}{2(h - L\|x - x^*\|)},$$
which is (i) of the theorem. Because $L\|x - x^*\| \le \delta\frac{2h}{3} < \frac{2h}{3}$, we have
$$\|x^N - x^*\| \le \|x - x^*\|\,\frac{L\|x - x^*\|}{2(h - L\|x - x^*\|)} \le \|x - x^*\|\,\frac{\delta\frac{2h}{3}}{2\left(h - \frac{2h}{3}\right)} = \delta\|x - x^*\|,$$
which establishes (ii) of the theorem. Finally, we have
$$\|x^N - x^*\| \le \|x - x^*\|^2\,\frac{L}{2(h - L\|x - x^*\|)} \le \|x - x^*\|^2\,\frac{L}{2\left(h - \frac{2h}{3}\right)} = \|x - x^*\|^2\,\frac{3L}{2h},$$
which establishes (iii) of the theorem. $\blacksquare$

Notice that the results regarding the convergence and rate of convergence in the above theorem are local, i.e., they apply only if the algorithm is initialized at certain starting points (the ones "sufficiently close" to the desired limit). In practice, it is not known how to pick such starting points, or to check whether the proposed starting point is adequate (with the very important exception of self-concordant functions).

5.6 Further discussion and modifications of Newton's method

5.6.1 Global convergence for strongly convex functions with a two-phase Newton's method

We have noted that, to ensure descent at each iteration, Newton's method can be augmented by a line search. This idea can be formalized, and the efficiency of the resulting algorithm can be analyzed (see, for example, Convex Optimization by Stephen Boyd and Lieven Vandenberghe, available at http://www.stanford.edu/~boyd/cvxbook.html, for a fairly simple presentation of the analysis). Suppose that $f(x)$ is strongly convex on its domain, i.e., assume there exists $m > 0$ such that the smallest eigenvalue of $H(x)$ is greater than or equal to $m$ for all $x$, and that the Hessian is Lipschitz continuous everywhere on the domain of $f$. Suppose we apply Newton's method with the stepsize at each iteration determined by the backtracking procedure of Section 5.2.2; that is, at each iteration of the algorithm we first attempt to take a full Newton step, but reduce the stepsize if the decrease in the function value is not sufficient. Then there exist positive numbers $\eta$ and $\gamma$ such that:
- if $\|\nabla f(x^k)\| \ge \eta$, then $f(x^{k+1}) - f(x^k) \le -\gamma$; and
- if $\|\nabla f(x^k)\| < \eta$, then the stepsize $\alpha_k = 1$ will be selected, and the next iterate will
satisfy $\|\nabla f(x^{k+1})\| < \eta$, and so will all the further iterates. Moreover, quadratic convergence will be observed in this phase.

As hinted above, the algorithm will proceed in two phases: while the iterates are far from the minimizer, a dampening of the Newton step will be required, but there will be a guaranteed decrease in the objective function values. This phase, referred to as the dampened Newton phase, cannot take more than $\frac{f(x^0) - f(x^*)}{\gamma}$ iterations, since each of its iterations decreases $f$ by at least $\gamma$. Once the norm of the gradient becomes sufficiently small, no dampening of the Newton step will be required in the rest of the algorithm, and quadratic convergence will be observed, thus making it the quadratically convergent phase. Note that it is not necessary to know the values of $\eta$ and $\gamma$ to apply this version of the algorithm.

The two-phase Newton's method is globally convergent; however, to ensure global convergence, the function being minimized needs to possess particularly nice global properties.

5.6.2 Other modifications of Newton's method

We have seen that if Newton's method is initialized sufficiently close to the point $\bar{x}$ such that $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is positive definite (i.e., $\bar{x}$ is a local minimizer), then it will converge quadratically, using stepsizes of $\alpha = 1$. There are three issues in the above statement that we should be concerned with:
- What if $H(x^k)$ is singular, or nearly singular?
- How do we know if we are close enough, and what do we do if we are not?
- Can we modify Newton's method to guarantee global convergence?

In the previous subsection we assumed away the first issue and, under an additional assumption, showed how to address the other two. What if the function $f$ is not strongly convex, and $H(x^k)$ may approach singularity? There are two popular approaches (which are actually closely related) to address these issues.

The first approach ensures that the method always uses a descent direction. For example, instead of the direction $-H(x^k)^{-1}\nabla f(x^k)$, use the direction $-(H(x^k) + \epsilon_k I)^{-1}\nabla f(x^k)$, where $\epsilon_k \ge 0$ is chosen so that the smallest eigenvalue of $H(x^k) + \epsilon_k I$ is bounded below by a fixed number $\delta > 0$. It is important to
choose the value of $\delta$ appropriately: if it is chosen to be too small, the matrix employed in computing the direction can become ill-conditioned when $H(x^k)$ is nearly singular; if it is chosen to be too large, the direction becomes nearly that of the steepest descent algorithm, and hence only linear convergence can be guaranteed. Hence the value of $\epsilon_k$ is often chosen dynamically.

The second approach is the so-called trust region method. Note that the main idea behind Newton's method is to represent the function $f(x)$ by its quadratic approximation $q_k(x)$ around the current iterate $x^k$.
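The first approach above, replacing $H(x^k)$ by $H(x^k) + \epsilon_k I$, can be sketched as follows (an illustrative implementation, not from the notes; the particular rule for choosing $\epsilon_k$ from the smallest eigenvalue is one simple option among many):

```python
import numpy as np

def modified_newton_direction(grad, hess, delta=1e-6):
    """Return -(H + eps*I)^{-1} g with eps >= 0 chosen so that the
    smallest eigenvalue of H + eps*I is at least delta, guaranteeing
    a descent direction even when H is indefinite or near-singular."""
    lam_min = np.linalg.eigvalsh(hess).min()
    eps = 0.0 if lam_min >= delta else delta - lam_min
    return -np.linalg.solve(hess + eps * np.eye(len(grad)), grad)

# Indefinite Hessian: the pure Newton direction could be an ascent direction,
# but the modified direction still satisfies g . d < 0
g = np.array([1.0, 1.0])
H = np.array([[1.0, 0.0], [0.0, -2.0]])
d = modified_newton_direction(g, H)
```

Note how a tiny `delta` leaves the shifted matrix with smallest eigenvalue exactly `delta`, producing a huge step component, a concrete instance of the ill-conditioning caveat above.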
