# Numerical Analysis MATH 128A

These 120 pages of class notes were uploaded by Kavon Feest on Thursday, October 22, 2015. They belong to MATH 128A at the University of California, Berkeley, taught by Staff in Fall. Since upload they have received 16 views. For similar materials see /class/226600/math-128a-university-of-california-berkeley in Mathematics (M) at the University of California, Berkeley.

Math 128A/B Lecture Notes on Real Root-Finding. June 30, 2009. Prof. W. Kahan, Math. Dept. and E.E. & Computer Science Dept., University of California at Berkeley.

Abstract: Begun in 1986, these course notes concern the solution of one real equation f(z) = 0 for one real root z, also called a real zero z of the function f(x). They supplement, not supplant, textbooks, and deal mathematically with troublesome practical details not discussed in my reprint [1979] about a calculator's SOLVE key, which should be read first; it offers easy-to-read advice about real root-finding in general to anyone who wishes merely to use a root-finder to solve an equation in hand. These course notes are harder to read, intended for the would-be designer of a root-finder; they exercise what undergraduates may learn about Real Analysis from texts like Bartle [1976]. Collected here are proofs, mostly short, for mathematical phenomena, some little known, worth knowing during the design of robust and rapid root-finders.

Almost all Numerical Analysis texts cover the solution of one real equation f(z) = 0 for one real root z by a variety of iterative algorithms like x → U(x) for some function U that has z = U(z) as a fixed-point. The best known iteration is Newton's: x → x − f(x)/f′(x). Another is Secant iteration: (pair x, y) → (w, x), where w := x − f(x)·(x − y)/(f(x) − f(y)). But no text I know mentions some of the most interesting questions:

- Is some simple combinatorial (homeomorphically invariant) condition both necessary and sufficient for convergence of x → U(x)? Yes; §5.
- Is that condition relevant to the design of root-finding software? Yes; §6.
- Do other iterations x → U(x) besides Newton's exist? Not really; §3.
- Must there be a neighborhood of z within which Newton's iteration converges if f′(x) and x − f(x)/f′(x) are both continuous? Maybe not; §7.
- Do useful conditions less restrictive than convexity suffice globally for the convergence of Newton's and Secant iteration? Yes; §8.
- Why are these less restrictive conditions not projective invariants, as are convexity and the convergence of Newton's and Secant iterations? I don't know; §A3.
- Is slow convergence to a multiple root worth accelerating? Probably not; §7.
- Can slow convergence from afar be accelerated with no risk of overshooting, and thus losing, the desired root? In certain common cases, yes; §10.
- When should iteration be stopped? Not for the reasons usually cited; §6.
- Which of Newton's and Secant iterations converges faster? Depends; §7.
- Which of Newton's and Secant iterations converges from a wider range of initial guesses at z? Secant, unless z has even multiplicity; §9. Therefore: Why Use Tangents When Secants Will Do?

Have all the foregoing answers been proved? Yes. Most were proved over twenty years ago [1979] and influenced the design of the SOLVE key on Hewlett-Packard calculators.

Work in Progress: NOT READY FOR DISTRIBUTION

Contents:
1. Overview
2. Three Phases of a Search
3. Models and Methods
4. Global Convergence Theory from Textbooks
5. Global Convergence Theory
6. A One-Sided Contribution to Software Strategy
7. Local Behavior of Newton's and Secant Iterations
8. Sum-Topped Functions
9. The Projective Connection between Newton's and Secant Iterations
10. Accelerated Convergence to Clustered Zeros (incomplete)
11. All Real Zeros of a Real Polynomial (to appear)
12. Zeros of a Real Cubic (to appear)
13. Error Bounds for Computed Roots (to appear)
Conclusion
A1. Appendix: Divided Differences, Briefly
A2. Appendix: Functions of Restrained Variation
A3. Appendix: Projective Images
A4. Appendix: Parabolas
A5. Appendix: Running Error Bounds for Polynomial Evaluation (to appear)
Citations

Adobe Acrobat™ Reader PDF files: http://www.cs.berkeley.edu/~wkahan/Math128/RealRoots.pdf and SOLVEkey.pdf

## 1. Overview

Before a real root z of an equation f(z) = 0 can be found, six questions demand attention:

1. Which equation? Infinitely many equations, some far easier to solve than others, have the same root z.
2. What method? Usually an iterative method must be chosen; there are infinitely many of them, too.

3. Where should the search for a root begin? A global theory of the iteration's convergence helps compensate for a vague guess at z.

4. How fast can the iteration be expected to converge? A local theory helps here. Convergence much slower than expected is ominous.

5. When should iteration be stopped? Error-analysis helps here. And the possibility that no z exists may have to be faced.

6. How will the root's accuracy be assessed? Error-analysis is indispensable here, and it can be done in more than one way.

The questions are not entirely independent, nor can they always be answered in order. If question 2 is answered by some available software that contains its own root-finder, the method it uses should influence the answer to question 1. Question 5 may depend upon question 6, which may be easier to answer after z has been found. Anyway, these questions do not have tidy answers. Instead, the following notes answer questions that resemble the foregoing six, and the reader must decide whether available answers pertain well enough to his own questions.

Different contexts may call for different answers. Two contexts are worth distinguishing during the design of root-finding software. General-purpose root-finders have to be designed without knowing the equations they will be asked to solve; special-purpose root-finders are designed to solve one equation F(z, p) = 0 for a root z = z(p) regarded as a function of the parameters p over some preassigned range. General-purpose root-finders must be robust above all: they cope with very diverse equations and with poor first guesses at roots that need not be unique or, in other cases, need not exist; speed matters only because a root-finder that runs too slowly will be abandoned by impatient users before it finds a root. Speed is the reason for a special-purpose root-finder's existence, and to that end it exploits every advantage that mathematical analysis can wrest from the given expression F(x, p). Applicability to many such special cases justifies the inclusion of much of the theory presented in these notes.

Root-finders are almost always iterative; they generate a sequence of approximations intended to converge to a desired root. For reasons outlined in §2, §3 gives the infinite variety of iterative methods short shrift. Whereas textbooks concentrate mostly upon questions of local convergence, answerable often by appeals to Taylor series, these notes concentrate mostly upon questions of global convergence. Does global convergence theory differ from local? It's a distinction with a small difference. Local theories, touched in §3 and §4, describe what happens, and how fast, in every sufficiently small neighborhood of a root; this kind of theory applies to practically all cases. A global convergence theory provides ways to tell whether a root exists, whether an iteration will converge to it from afar, and whether slow convergence from afar can be sped up without jeopardizing convergence to the desired root; these questions have usable answers only in special cases. The special cases discussed in these notes arise often enough to make their study generally worthwhile.

Most iterations discussed in these notes have the form x_{n+1} = U(x_n), which may seem very general but isn't really; there is a sense (see Thesis 3.1 below) in which every such scalar (not vector) iteration is really Newton's iteration in disguise. Textbooks, and our §4, treat iterations whose U is a Contraction (|U′| < 1) throughout some domain supposed to contain the desired root and all but finitely many initial iterates. Finding that domain can be as hard as finding the root, and futile too, because contraction in a wide domain surrounding the root is a condition merely sufficient, not necessary, for convergence. There is a conceptually simpler combinatorial condition, necessary and sufficient for convergence from every starting point in a wide domain; see Sharkovsky's No-Swap Theorem 5.1 below.
This theorem provides an invaluable one-sided criterion by which to decide when a program must intervene to force an iteration to converge. That decision may be necessitated by the intrusion of rounding errors, whose worst effects can be avoided only by using appropriate criteria to stop the iteration. Such criteria and other software issues are discussed in §6.

Newton's iteration x_{n+1} = x_n − f(x_n)/f′(x_n) and Secant iteration x_{n+1} = x_n − f(x_n)/f∆(x_n, x_{n−1}) are treated next; here f∆ is a First Divided Difference whose analogy with the first derivative f′ is explained below in Appendix A1 on Divided Differences. Both iterations have such similar local convergence properties that they are treated together in Theorems 7.4, 7.5 and 7.6. The weakest known global conditions sufficient for convergence are named in Theorem 8.2 and Corollary 8.3; roughly speaking, they require that |f′| not vary too much. A connection with the classical theory of Functions of Bounded Variation is covered in Appendix A2. Both iterations have similar global convergence properties because those properties are invariants of certain plane Projective Maps that are the subject of yet another Appendix, A3. Unfortunately, the aforementioned weakest known global conditions sufficient for convergence are not invariant under projective maps; to find usable weaker invariant conditions remains an open problem.

The projective invariance of Newton's and Secant iteration is the source of an astonishing Theorem 9.2, which says roughly that if f reverses sign wherever it vanishes in some interval, and if Newton's iteration converges within that interval from every starting point therein, then Secant iteration converges too, from every two starting points in that interval. (Of course, they converge then to the unique zero of f in the interval.) This theorem has no converse: Secant iteration can converge but not Newton's. The discovery of this theorem over twenty years ago had a profound effect upon the design of root-finders built into Hewlett-Packard calculators.

Slow convergence of Newton's and Secant iteration to a multiple root is a problem that has received more attention in the literature than it deserves in the light of Theorem 7.6, which is too little known. This theorem provides good reasons to expect computed values of f(x) to drop below the noise due to roundoff, or else below the underflow threshold, rapidly, no matter how slowly iterates x converge; so iteration cannot be arbitrarily prolonged. Slow convergence from afar to a simple root that appears, from afar, to belong to a tight cluster of roots is a problem deserving more attention. The problem is not how to accelerate the iteration but how not to accelerate it too far beyond the desired root. In cases covered by Theorem 10.1, the problem has a simple solution that roughly halves the number of Newton iterations when they converge slowly. A similar solution works for Secant iteration, but the details of its proof are not yet complete.

I have tried to prove every unobvious unattributed assertion in these notes. The proofs are as brief as I could make them, and not merely by leaving steps out. Still, the proofs should be skipped on first reading; to make doing so easier, each proof is terminated by END OF PROOF. To ease the location of this document's sections, theorems, lemmas, corollaries and examples, they will be numbered consecutively when the notes are complete. Yet to be transcribed are sections about finding all real zeros of a polynomial, all zeros of a real cubic, error bounds for computed zeros, and running error bounds for computed values of a polynomial. Meanwhile the author will welcome corrections and suggestions, especially for shorter and more perspicuous proofs.
## 2. Three Phases of a Search

Root-finding software invoked to solve f(z) = 0 searches for a root z by following a procedure generally more complicated than the mere iteration of some formula until it converges. Watching such software at work (when it works) we can usually discern three phases:

Phase 1, Flailing: Initial iterates x approximate the desired root z poorly. They may move towards it, or wander, or jump about as if at random, but they do not converge rapidly.

Phase 2, Converging: Differences between successive iterates x dwindle, rapidly we hope.

Phase 3, Dithering: Indistinguishable from Flailing except that different iterates x differ much less from a root, and may very nearly repeat. Dithering is due entirely to roundoff.

Dithering is a symptom of an attempt to solve f(z) = 0 more accurately than roundoff will allow. Ultimately accuracy is limited by what roundoff contributes unavoidably to the computed values of f(x). Accuracy much worse than that should be blamed upon an inept implementation of the iteration formula, or upon some other defect in the software, or else upon intentional premature termination of the iteration because its accuracy was judged adequate. Judgments like this posit the existence of a trustworthy error estimate, which is a nontrivial requirement. It looks easy at first: the possession of a Straddle (two iterates x≪ and x≫ where f(x≪)·f(x≫) < 0) suffices, if f is continuous, to locate a root z between them with an error less than |x≪ − x≫|. However, the purchase of a sufficiently close straddle may cost almost twice as much computation as a simple iteration that converges from one side, unless error analysis can be brought to bear. Error analysis will be discussed at length later; without it, dithering could waste a lot of time.

Converging is what we hope the chosen iteration does quickly; and usually it does, and when it does, the search for a zero can spend relatively little time in Phase 2. Why then is so much of the literature about numerical methods concerned with this phase? Perhaps because it is the easiest phase to analyze.
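The three phases can be watched in even a tiny experiment (a sketch, not part of these notes; the cubic f(x) = x³ − 2x − 5 and the deliberately poor first guess are my choices):

```python
# Newton's iteration on f(x) = x**3 - 2*x - 5, started from a poor guess.
f  = lambda x: x**3 - 2.0 * x - 5.0
df = lambda x: 3.0 * x**2 - 2.0

x, diffs = 10.0, []
for _ in range(30):
    x_new = x - f(x) / df(x)
    diffs.append(abs(x_new - x))      # size of each correction
    x = x_new

# Phase 1: early corrections are large and shrink only slowly.
# Phase 2: corrections soon dwindle rapidly (roughly squaring each time).
# Phase 3: once f(x) sinks to roundoff level, corrections collapse to
# about an ulp (here, essentially to zero); iterating further is Dithering.
```

Printing `diffs` shows the qualitative pattern directly: a stretch of sizable steps, a brief burst of rapid shrinkage, then stagnation at roundoff level.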
Ultimately, (superlinear) fast convergence is rarely difficult to accomplish, as we shall see; Newton's iteration usually converges quadratically. Convergence faster than that is an interesting topic omitted from these notes because it reduces only the time spent in Phase 2; higher-order convergence is worth its higher cost only if extremely high accuracy is sought. We shall devote more consideration than usual to Phase 1 because it is the least understood and potentially most costly. A long time spent Flailing is a symptom of mismatch between the given equation f(z) = 0 and the root-finder chosen to solve it.

Footnote: A Straddle is to the Navy what a Bracket is to the Army: a pair of shots fired, one beyond and the other short of a target, to intimidate it or to gauge its range. Straddle and Bracket have distinct meanings in these course notes.

## 3. Models and Methods

Every iterative method for solving f(z) = 0 is motivated by a model; this is a family of easily solved equations from which is drawn a sequence of ever better approximations to the given equation over a sequence of ever narrowing intervals around the desired root z. For example, the given equation may be rewritten as an equivalent equation g(z) = h(z) with the same root z, but with h(x) slowly varying (approximately constant) when x is near z, and with g(x) easily inverted. The last equation is turned into an iteration by solving g(x_{n+1}) = h(x_n) for each new approximation x_{n+1} to replace the previous approximation x_n to z. If h′(x)/g′(x) is continuous and |h′(z)/g′(z)| < 1, the iteration can easily be proved to converge to z from any initial x₀ close enough to z. Look at (x_{n+1} − z)/(x_n − z) = h∆(x_n, z)/g∆(x_{n+1}, z) as x_n → z; here h∆ is a divided difference analogous to the derivative h′ and explained in Appendix A1.

For instance, take the equation 3·eᶻ = e³·z. It can be solved for z = 3 + ln(z/3), to construct an iteration x_{n+1} = 3 + ln(x_n/3), or for z = 3·e^{z−3}, to construct an iteration x_{n+1} = 3·exp(x_n − 3). Each iteration is attracted to a different root z. Find them. Why are there no more roots?
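The two rewritings above are easy to try numerically (a quick sketch; the starting guess and iteration counts are mine):

```python
import math

# The equation 3*exp(z) = exp(3)*z, rewritten two ways as z = U(z).
def iterate(u, x, steps=200):
    # Repeat x -> u(x) a fixed number of times; return the final iterate.
    for _ in range(steps):
        x = u(x)
    return x

za = iterate(lambda x: 3.0 + math.log(x / 3.0), 2.0)   # x -> 3 + ln(x/3)
zb = iterate(lambda x: 3.0 * math.exp(x - 3.0), 2.0)   # x -> 3*exp(x-3)

# Same starting guess, two different roots; each rewriting is attracted
# to the root at which its own |U'(z)| < 1.
resid_a = 3.0 * math.exp(za) - math.exp(3.0) * za
resid_b = 3.0 * math.exp(zb) - math.exp(3.0) * zb
```

The first iteration settles on the root z = 3 exactly; the second settles on the other root near 0.179, and both residuals vanish to roundoff.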
More generally, the given equation f(z) = 0 may be rewritten g_n(z) = h_n(z) in a way that can change with every iteration that solves g_n(x_{n+1}) = h_n(x_n) for x_{n+1}, and can depend also upon previous iterates x_{n−1}, x_{n−2}, …. These dependencies are motivated by a model all the same, but now reinterpreted as a family of convenient curves from which is drawn a sequence of ever better approximations to the graph of the given function f over a sequence of ever narrowing intervals around the desired root z. The wider the interval over which f resembles a member of that family, and the closer the resemblance, the faster the iteration derived from the model converges. A substantial body of theory connects the qualities of a model to the ultimate speed of the derived iteration's convergence; see Traub [1964] or Ostrowski [1973]. Like most of today's texts on Numerical Analysis, these notes draw little more from that theory than two items of terminology:

Rate and Order are two measures of the ultimate speed with which a sequence x₁, x₂, x₃, …, x_n, … may converge to its limit z as n → ∞. Its Rate := lim inf (−ln|x_n − z|)/n, and its Order := lim inf (−ln|x_n − z|)^{1/n}. Linear convergence has Order 1 and a positive finite Rate, which means the number of digits to which x_n and z agree grows ultimately linearly with n; slower than linear convergence is almost intolerable. For most practical purposes we expect Superlinear convergence, with Rate = ∞ and Order > 1, which means that ultimately every iteration multiplies the number of agreeing digits by Order, on average. Here are examples:

Newton's iteration x_{n+1} = x_n − f(x_n)/f′(x_n) approximates the graph of f by its tangent T_n at a point (x_n, f(x_n)); the iteration tries to move closer to (z, 0) by moving the point of tangency to the point (x_{n+1}, f(x_{n+1})) on the graph above T_n's intersection with the x-axis. Convergence is typically Quadratic (Order 2). Similarly, the Secant iteration x_{n+1} = x_n − f(x_n)/f∆(x_n, x_{n−1}) = x_n − f(x_n)·(x_n − x_{n−1})/(f(x_n) − f(x_{n−1})) approximates the graph of f by its secant through the two points (x_n, f(x_n)) and (x_{n−1}, f(x_{n−1})), and replaces the latter by the point (x_{n+1}, f(x_{n+1})) above where the secant cuts the x-axis. The iteration's Order ≈ 1.618 typically.
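Both iterations can be sketched in a few lines (an illustration only, far from the robust root-finder these notes aim at; the function names and the test equation x² = 2 are mine):

```python
def newton(f, df, x, steps=20):
    # x -> x - f(x)/f'(x): cut the tangent at (x, f(x)) with the x-axis.
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

def secant(f, x0, x1, steps=40):
    # (x, y) -> (w, x) with w = x - f(x)*(x - y)/(f(x) - f(y)).
    for _ in range(steps):
        fx0, fx1 = f(x0), f(x1)
        if fx1 == fx0:            # secant is horizontal (or converged)
            break
        x0, x1 = x1, x1 - fx1 * (x1 - x0) / (fx1 - fx0)
    return x1

f = lambda x: x * x - 2.0         # simple real zero z = sqrt(2)
root_n = newton(f, lambda x: 2.0 * x, 1.0)
root_s = secant(f, 1.0, 2.0)
```

Counting the iterations each needs to reach machine precision exhibits the Orders quoted above: roughly doubling versus multiplying by about 1.618 the number of agreeing digits per step.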
[Figures: Newton's iteration x_{n+1} = N(x_n), where N(v) := v − f(v)/f′(v), cuts the tangent to the graph of f with the x-axis; Secant iteration x_{n+1} = S(x_n, x_{n−1}), where S(u, w) := u − f(u)·(u − w)/(f(u) − f(w)), cuts the secant with the x-axis.]

David Muller's method fits a parabola through three points on the graph of f and replaces one of them by a point on the graph above the nearer intersection of the parabola with the horizontal axis. An hyperbola with vertical and horizontal asymptotes can also be fitted through three points on the graph of f, and provides an iteration simpler than Muller's and better suited to finding a simple zero z close to a pole of f. (A pole of f is an argument θ at which f(θ) = ∞.) The hyperbola is the graph of µ·(x − x_{n+1})/(x − θ) for constants µ, θ, x_{n+1} chosen by making that expression interpolate (match) f(x) at three consecutive iterates x_n, x_{n−1}, x_{n−2}. Both these iterations converge typically at Order ≈ 1.839.

Given two iterates x≪ and x≫ that straddle a sign-change of f (because f(x≪)·f(x≫) < 0), we may well wish to continue the iteration in such a way that straddling persists even if preserving it slows convergence. The simplest way is Binary Chop; this method models f by a step-function that disregards everything about f but its sign, and in each iteration replaces either x≪ or x≫ by x∨ := (x≪ + x≫)/2 according to sign(f(x∨)), so that straddling persists. Regula Falsi differs from Binary Chop only by determining x∨ as the place where a secant through (x≪, f(x≪)) and (x≫, f(x≫)) cuts the horizontal axis. Both methods usually converge linearly, too slowly. Regula Falsi can converge arbitrarily slower than Binary Chop when the graph of f is more nearly L-shaped than straight; so D. Wheeler's method (see program F2 in Wilkes et al. [1951]) speeds up Regula Falsi by halving whichever of f(x≪) or f(x≫) has not been supplanted after two iterations.
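Binary Chop and Regula Falsi, as just described, might be sketched like this (the names and the test equation x³ = 2 are mine; no claim of robustness):

```python
def binary_chop(f, lo, hi, steps=60):
    # Model f by a step-function: keep only the sign of f at the midpoint.
    assert f(lo) * f(hi) < 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def regula_falsi(f, lo, hi, steps=60):
    # Like Binary Chop, but cut at the secant's crossing of the x-axis.
    assert f(lo) * f(hi) < 0
    for _ in range(steps):
        cut = lo - f(lo) * (hi - lo) / (f(hi) - f(lo))
        if f(lo) * f(cut) <= 0:
            hi = cut
        else:
            lo = cut
    return cut

f = lambda x: x**3 - 2.0
r1 = binary_chop(f, 0.0, 2.0)
r2 = regula_falsi(f, 0.0, 2.0)
```

On this example Regula Falsi converges one-sidedly (one endpoint never moves), the behavior Wheeler's halving trick is designed to repair.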
C.J.F. Ridder's method, promoted by W.H. Press et al. [1994], chooses constants µ, β and x∧ to make the expression µ·(x − x∧)·e^{β·x} interpolate f(x) at x∨ := (x≪ + x≫)/2, x≪ and x≫, and then retains whichever pair of x≪, x∨, x∧, x≫ most closely straddles the sign-change of f; one of the pair is always x∧. This method is plausible when the graph of f may be very nearly L-shaped but not necessarily monotonic. Ridder's and Wheeler's methods usually converge superlinearly; for the latter see Dahlquist et al. [1974].

Vastly many more models and iterative methods have been published. Do we need all of them? Perhaps not; most of them converge superlinearly, so they spend similar small numbers of iterations in Phase 2. Reducing these small numbers by increasing the Order of convergence is relatively straightforward if enough derivatives of f are available. For instance, convergence typically at Order 3 is obtained by fitting osculatory hyperbolas instead of tangents to the graph of f, to derive Halley's iteration x_{n+1} = x_n − 2·f(x_n)·f′(x_n)/(2·f′(x_n)² − f(x_n)·f″(x_n)). Widening the range of initial guesses from which convergence will follow is harder, but worth a try when dawdling in Phase 1 indicates a mismatch between the model and the equation to be solved. Acquaintance with many models improves our prospects of finding one that matches the given equation well. Alternatively, when possession of a software package implies the use of its root-finder, awareness of the models that motivated its root-finder may suggest how to recast equations so as to match its models better. Because all models include the straight line (graph of a linear equation) as a special or limiting case, equations f(z) = 0 incur fewer iterations in Phases 1 and 2 according as f is more nearly linear over a wider range around z. This observation motivates attempts to recast a given equation into an equivalent but more nearly linear form. A successful attempt will be described below, after Theorem 8.2.
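Halley's iteration, as displayed above, can be sketched as follows (the example equation is mine, and so is the guard against a vanishing denominator):

```python
def halley(f, df, d2f, x, steps=20):
    # x -> x - 2*f*f' / (2*f'**2 - f*f''): Order 3 near a simple zero.
    for _ in range(steps):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        denom = 2.0 * dfx * dfx - fx * d2fx
        if denom == 0.0:
            break
        x = x - 2.0 * fx * dfx / denom
    return x

# f(x) = x**2 - 2 again: each step roughly triples the agreeing digits.
r = halley(lambda x: x * x - 2.0, lambda x: 2.0 * x, lambda x: 2.0, 1.0)
```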
The two motivations, one to fit a model as closely to the equation as is practical, the other to linearize the equation as nearly and widely as possible, may become indistinguishable in the final analysis of a real or complex root-finder's performance. Here is a reason for thinking so:

Thesis 3.1 (Newton's Iteration is Ubiquitous): Suppose that U is differentiable throughout some neighborhood Ω of a root z of the given equation f(z) = 0. If the iteration x_{n+1} = U(x_n) converges in Ω to z from every starting point x₀ in Ω, then this iteration is Newton's iteration applied to some equation g(z) = 0 equivalent on Ω to the given equation; in other words, U(x) = x − g(x)/g′(x), and g(x) → 0 in Ω only as x → z.

Defense: g(x) := ±exp(∫ dx/(x − U(x))), with a constant of integration that may jump when x passes from one side of z to the other, reflecting the fact that U is unchanged when g(x) is replaced by, say, 3·g(x) for all x on one side of z. There is no need for g′(z) to exist, since it need not be computed when g(z) = 0; however, the jump in the constant of integration can often be so chosen that g′(x) is continuous as x passes through z. The iteration's convergence in Ω to z alone implies first that x − U(x) vanishes only at x = z in Ω, and then that x − U(x) has the same sign as x − z. (The opposite sign would compel the iteration to flee from z.) Therefore the integral decreases monotonically as x approaches z from either side. To complete the defense we shall infer from the differentiability of U that the integral descends to −∞, implying that g(x) → 0 as x → z, as claimed. For the sake of simpler arithmetic, shift the origin to make z = 0, and write Ω′ for what remains of Ω when 0 is removed from it. This makes U(x)/x < 1 at all x in Ω′. Since U′(0) exists, there also must exist some constant C for which 1 − C < U(x)/x < 1 for all x in Ω′, whence it follows that the integral ∫ dx/(x − U(x)) < (another constant) + (1/C)·ln|x| → −∞ as x → 0 in Ω′ from one side or the other. END OF DEFENSE
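As a concrete check of the Defense's recipe (my example, not the author's): take U(x) := x/2 + 1, whose iteration converges to z = 2 from every x₀. Here x − U(x) = x/2 − 1, so the integral gives g(x) = exp(2·ln|x/2 − 1|) = (x/2 − 1)², and Newton's iteration applied to g reproduces U exactly:

```python
# U(x) = x/2 + 1 has fixed-point z = 2; the Defense's integral yields
# g(x) = (x/2 - 1)**2, and Newton's iteration on g is exactly U.
U  = lambda x: 0.5 * x + 1.0
g  = lambda x: (0.5 * x - 1.0) ** 2
dg = lambda x: 0.5 * x - 1.0          # g'(x), by the chain rule

newton_on_g = lambda x: x - g(x) / dg(x)
samples = [-3.0, 0.5, 1.9, 7.0]       # any x != 2 will do
pairs = [(U(x), newton_on_g(x)) for x in samples]
```

As the Thesis promises, g vanishes in this neighborhood only at the limit z = 2.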
What if U were merely continuous instead of differentiable? Then g could be discontinuous at z, like g(x) := (if x ≥ 0 then 1 + √(x/2) else x/2), whose z = 0. In general, then, must g(z⁺)·g(z⁻) = 0?

Don't read too much significance into Thesis 3.1. It does suggest that an iteration derived from a family of curves that osculate (match tangent and curvature of) the graph of f more closely than tangents do could equivalently have been derived as Newton's iteration applied to a function g whose graph is more nearly linear than the graph of f around the zero z that g and f have in common. For instance, Halley's third-order iteration above is Newton's applied to g(x) := f(x)/√|f′(x)|. But Thesis 3.1 does not say which derivation will be the more convenient.

Thesis 3.1 implies that most of these notes will never generalize to the iterative solution of systems of equations, nor to multi-point iterations: U(x) = x − g′(x)⁻¹·g(x) generally cannot be solved for a vector-valued function g of a vector x, and iterations x_{n+1} = U(x_n, x_{n−1}, …, x_{n−k}) generally do not behave like Newton's if k ≥ 1, so Theorem 9.2 will come as a surprise.
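The claim that Halley's iteration is Newton's applied to g(x) := f(x)/√|f′(x)| can be spot-checked numerically (my example; g′ is estimated by a central difference, so agreement holds only up to differencing error):

```python
import math

# Spot-check: one Halley step equals one Newton step applied to
# g(x) = f(x)/sqrt(|f'(x)|), here for f(x) = x**3 - 2.
f   = lambda x: x**3 - 2.0
df  = lambda x: 3.0 * x**2
d2f = lambda x: 6.0 * x

def halley_step(x):
    return x - 2.0 * f(x) * df(x) / (2.0 * df(x)**2 - f(x) * d2f(x))

def newton_step_on_g(x, h=1e-6):
    g = lambda t: f(t) / math.sqrt(abs(df(t)))
    dg = (g(x + h) - g(x - h)) / (2.0 * h)   # central-difference g'(x)
    return x - g(x) / dg

a, b = halley_step(1.7), newton_step_on_g(1.7)
```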
## 4. Global Convergence Theory from Textbooks

The behavior of iterations x_{n+1} = U(x_n), also called Discrete Dynamical Systems, has become much better understood over the past half century. Iterations x_{n+1} = U(x_n, x_{n−1}, x_{n−2}, …, x_{n−k}) fall under the same rubric when rewritten as vector iterations X_{n+1} = U(X_n) in which the vector X_n := (x_n, x_{n−1}, x_{n−2}, …, x_{n−k}). Large values of k offer little advantage: although they do promise ultimately faster convergence, "ultimately" need not arrive much sooner than the achievement of adequate accuracy by simpler means. Anyway, so much less is known about the global behavior of iterations with k ≥ 1 that we shall keep k = 0 except when discussing the Secant iteration, whose k = 1. And almost all variables will be kept real.

Presumably the roots z of the given equation f(z) = 0 are also roots of the equation z = U(z), so the desired roots lie among the fixed-points of U, if any exist. The existence of fixed-points, some of which may be spurious because they are not roots, is a nontrivial issue. For example, the fixed-points of Newton's iteration, for which U(x) := x − f(x)/f′(x), include the poles of f as well as those zeros z of f at which f′(z) ≠ 0, plus those zeros of both f and f′ at which a justifiable redefinition of U sets U(z) := z. (Justification will be tendered later.) Fortunately, poles are repulsive and zeros are usually attractive fixed-points of Newton's iteration, in general.

A fixed-point z = U(z) is called Attractive if it belongs to some nondegenerate interval Ω from whose every other point x₀ the iteration x_{n+1} = U(x_n) converges to z, though iterates may stray outside Ω while converging. A fixed-point z = U(z) is called Repulsive if it belongs to some nondegenerate interval Ω throughout which |U(U(x)) − z| > |x − z| when x ≠ z; then, even if Ω contains only every other iterate of x_{n+1} = U(x_n), consecutive iterates in Ω still move away from z. A fixed-point can be both attractive from one side and repulsive from the other, as are all the nonzero fixed-points of U(x) := x·sin²(1/x); its fixed-point z = 0 is neither attractive nor repulsive.

Global convergence theory is concerned with the existence of attractive fixed-points. In general, the best known conditions sufficient for at least one fixed-point to exist figure in the following:

Lemma 4.1: If U maps a closed interval Ω continuously into itself, then Ω contains at least one fixed-point z = U(z).

Proof: If neither endpoint of Ω is a fixed-point of U, then U must map both endpoints strictly inside Ω, in which case they constitute a Straddle for the equation U(z) − z = 0. END OF PROOF

Ω must include its two endpoints lest the fixed point lie not in Ω but on its boundary. If Ω is infinite it must include its endpoints at +∞ and/or −∞, and the continuity of U there must be understood in an appropriate sense: U is deemed continuous at +∞ if either of U(1/w) and 1/U(1/w) approaches a finite limit as w → 0⁺; similarly for −∞. And Ω must have distinct endpoints; the lemma may be rendered inapplicable if −∞ and +∞ are declared equal, thereby turning Ω topologically into a circle that can be mapped continuously to itself by a rotation without a fixed-point.
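The attractive/repulsive classification of Newton's fixed-points can be watched on a small example (mine): for f(x) := 1 − 1/x, with zero z = 1 and a pole at 0, Newton's U simplifies to U(x) = 2x − x², whose fixed-points are exactly 0 and 1:

```python
# Newton's iteration for f(x) = 1 - 1/x simplifies to U(x) = 2x - x**2.
# Its fixed-points are the zero z = 1 (attractive) and the pole 0 (repulsive).
U = lambda x: 2.0 * x - x * x

x = 0.1               # start well away from both fixed-points
for _ in range(10):
    x = U(x)          # iterates rush toward the attractive zero z = 1

y = 1e-6              # start very close to the repulsive pole at 0
for _ in range(10):
    y = U(y)          # iterates flee the pole, roughly doubling each step
```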
Lemma 4.1 is a special case of the Brouwer/Schauder theorem, valid for compact convex regions Ω in spaces of arbitrarily high, even infinite, dimension. Do not misconstrue the interval Ω as something introduced merely for the sake of additional inessential generality. Such a misapprehension could arise from the observation that U may be extended continuously to the whole real axis, and without introducing any new finite fixed-point, by declaring U′ := 1 or else U′ := 0 in each interval outside Ω. However, generality is not the motive for not thus dispatching Ω. It is essential to the following theory because U will be assumed to satisfy convergence conditions that need not be satisfied everywhere in general, yet they must be satisfied in an interval Ω wide enough to support useful inferences.

The foregoing lemma is easier to prove than apply: given U and Ω, the confirmation that U(Ω) is contained in Ω is tantamount to an assertion about the extrema of U in Ω; why should such an assertion cost much less computation than the location of a fixed point? Besides, the mere existence of fixed points cannot ensure that the iteration x_{n+1} = U(x_n) will converge to any of them. For example, in −1 ≤ x ≤ 1, U(x) := sin(πx) has three fixed-points, z = 0 and z = ±0.736484448…, all repulsive; U(U(x)) has seven therein, all repulsive; iteration cannot converge to any of them except by an unlikely accident. In general, if we desired no more than to find a fixed-point whose existence is guaranteed by the lemma's hypotheses, we should proceed from those hypotheses to the construction of a fixed-point by Binary Chop guided in accordance with the following, now obvious,

Corollary 4.2: If U maps a closed interval Ω continuously into itself, and if x in Ω is not a fixed-point of U, then there is at least one fixed-point z = U(z) in Ω on the same side of x as U(x). (It makes Binary Chop foolproof.)
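Corollary 4.2's guidance can be sketched directly (a minimal illustration; the example U(x) = cos(x) on Ω = [0, 1] is mine):

```python
import math

def fixed_point_chop(U, lo, hi, steps=60):
    # Binary Chop on U(x) - x: keep the half-interval across whose ends
    # U(x) - x changes sign, as Corollary 4.2 guarantees it must.
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (U(lo) - lo) * (U(mid) - mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# U(x) = cos(x) maps [0, 1] continuously into itself, so Lemma 4.1
# guarantees a fixed-point; Binary Chop pins it down: z = cos(z).
z = fixed_point_chop(math.cos, 0.0, 1.0)
```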
But such is not our purpose now. Our purpose is to investigate whether and how the iteration x_{n+1} = U(x_n) converges. (Faster than Binary Chop, we hope.) The best known conditions sufficient for this iteration to converge require U to be a Contraction: |U(x) − U(y)| < |x − y| for all distinct x and y in some interval Ω. (A contraction U must be continuous, if not differentiable, with |U′| < 1 almost everywhere in Ω, and its interval Ω can contain at most one fixed-point z = U(z). Can you see why?)

Lemma 4.3: If U contracts Ω into itself, then the iteration x_{n+1} = U(x_n) converges in Ω to the fixed-point z = U(z) from every initial guess x₀ in Ω.

Proof outlined: Iterating contraction U decreases |x_n − z| monotonically, so the iterates have one or two points of accumulation, v and w, which, if different, would have to be swapped by U, thereby satisfying 0 < |v − w| = |U(w) − U(v)| < |w − v| paradoxically; instead v = w = z. END OF PROOF

But a contraction need contract no interval into itself; ln(x) for x ≥ 1 is an example. Under what conditions can we ascertain that an interval Ω is contracted into itself? Conditions typical of the kind that appear in textbooks appear in the following lemmas.

Lemma 4.4: Suppose −1 < U′ ≤ 0 throughout an interval Ω that includes both x₀ and x₁ := U(x₀); then the iteration x_{n+1} = U(x_n) converges in Ω to the one fixed-point z = U(z) therein. Convergence is alternating. (Proof is left to the reader.)

Lemma 4.5: Suppose 0 ≤ U′ ≤ µ < 1 for a positive constant µ throughout an interval Ω that includes both x₀ and x₁ + (x₁ − x₀)·µ/(1 − µ), where x₁ := U(x₀); then the iteration x_{n+1} = U(x_n) converges monotonically to the unique fixed-point z = U(z) in Ω. (Proof is left to the reader.)

Lemma 4.6: Suppose −1 < U′ ≤ µ < 1 for a positive constant µ throughout an interval Ω that includes both x₀ and x₁ + (x₁ − x₀)·µ/(1 − µ), where x₁ := U(x₀); then the iteration x_{n+1} = U(x_n) converges, with decreasing |x_n − z|, to the unique fixed-point z = U(z) in Ω.
x_{n+1} := U(x_n) converges, with decreasing |x_n − z|, to the unique fixed-point z = U(z) in Ω.

Proof: Since U is a contraction on Ω, the fixed-point z = U(z) is unique if it exists in Ω. That z does exist in Ω between x₀ and x̄ = x₁ + (x₁ − x₀)·µ/(1−µ) follows from the observation that (x₁ − U(x̄))/(x₀ − x̄) = (U(x₀) − U(x̄))/(x₀ − x̄) ≤ µ unless x₁ = x₀ = z; consequently (x̄ − U(x̄))·(x₀ − U(x₀)) ≤ 0, and therefore x − U(x) changes sign at some x = z between x̄ and x₀. In fact z lies between x̄ and (x₀ + x₁)/2, since (x₁ − z)/(x₀ − z) = (U(x₀) − U(z))/(x₀ − z) > −1; consequently z − (x₀ + x₁)/2 has the same sign as z − x₀, which has the same sign as x̄ − x₀. To complete the proof we show that U contracts a subinterval of Ω including x₀ into itself, and then invoke Lemma 4.3. To simplify the argument suppose x₀ < x₁ (otherwise reverse the signs of x and U), so that x₀ < (x₀ + x₁)/2 < z ≤ x̄. A short calculation, obtaining upper and lower bounds for U(x) on each of the pieces into which suitably chosen points v and w (with x₀ < v < z < w < x̄) cut the interval, confirms that U maps the subinterval x₀ ≤ x ≤ w into itself; Lemma 4.3 then finishes the proof. END OF PROOF.

The foregoing lemma is nearly the most general of its kind, and yet often too difficult to apply. Difficulty arises from the possibility that µ and the minimum width |x₁ − x₀|/(1−µ) of Ω may chase after each other. For example, given x₀ and x₁ := U(x₀) and |U′(x₀)| < 1, we have to make a guess at Ω at least as wide as |x₁ − x₀|/(1 − |U′(x₀)|); then somehow we must estimate the range of U′ over Ω, hoping it will be narrow enough to satisfy a lemma's requirements. But if that
estimated range is too wide, say if µ ≥ sup U′(Ω) is so big that x₁ + (x₁ − x₀)·µ/(1−µ) lies beyond Ω, we must widen Ω to include this point, thereby perhaps increasing µ and forcing Ω to be widened again, and so on. This can go on forever for examples like U(x) := √(1 + x²) + z − √(1 + z²) when 0 < x₀ < z, although its iteration always converges. The chase need never end because the lemmas' requirements, that −1 < U′ ≤ µ < 1 in Ω, merely suffice for convergence; they are not necessary. For example, iteration converges from every x₀ to z = 0 for U(x) := arctan(x), with U′(z) = 1, and for U(x) := x − tanh³(x), with U′(z) = 1, though both examples converge sublinearly (extremely slowly): |x_n − z| = O(1/√n).

The foregoing three lemmas are really local convergence theorems posing as global. They are applicable only in a sufficiently small neighborhood Ω of a fixed-point z = U(z) at which |U′(z)| < 1, in which case |x_n − z| ultimately decreases with every iteration, converging to zero linearly, like |U′(z)|ⁿ, or superlinearly if U′(z) = 0. However, finding a neighborhood to which a lemma above is applicable can be almost as hard as finding z. Besides, convergence can occur without ultimate monotonic decline in |x_n − z|, as when U(x) := e⁻ˣ − 1; for this example the iteration converges to z = 0 alternatingly, sublinearly and invariably, as we shall see.

Apparently the global theory of iterations' convergence presented in most textbooks answers questions that the designers of root-finding software are unlikely to ask. More likely designers will ask: "What pattern of behavior distinguishes convergent iterations from the others? How can my software exploit this pattern? When should iteration cease?" These questions are addressed in the next two sections of these notes. Answers to these questions augment, not supplant, what has to be known by a would-be designer of software to solve efficiently some narrowly specified class of equations for their real roots. For instance, polynomial equations have properties, surveyed succinctly in texts like Householder's [1970], that should be taken into account by a computer program that needs good initial guesses to start an iteration towards a desired root. After iteration is under way, the foregoing questions' answers come to the fore.
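The sublinear behavior claimed above for U(x) = arctan(x) is easy to watch numerically. The little experiment below is my own illustration, not part of the notes; the sample size and the scaling constant come from the expansion arctan(x) = x − x³/3 + …, which makes the n-th iterate shrink roughly like √(3/(2n)) rather than geometrically:

```python
import math

x = 1.0
n = 10000
for _ in range(n):
    x = math.atan(x)

# After 10000 iterations the iterate is still only about 0.012 away from
# z = 0: sublinear convergence.  Since arctan(x) = x - x**3/3 + ..., the
# error shrinks roughly like sqrt(3/(2*n)), so this ratio should be near 1.
scaled = x * math.sqrt(2.0 * n / 3.0)
```

For comparison, a linearly convergent iteration with |U′(z)| = 1/2 would have reached roundoff level after some fifty iterations; here ten thousand iterations still leave an error near 10⁻².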
5. Global Convergence Theory

What pattern of behavior distinguishes convergent iterations from the others? This question matters to software designers because, by mimicking this pattern in our root-finding software, we hope to enhance its prospects for success. The pattern is slightly more complicated than a monotonic decline in |x_n − z| as n increases. To suppress superfluous complexity we shall try to describe only the pattern's essentials.

What is essential? It is whatever persists after inessential changes of variables, i.e. after homeomorphisms. Consider any change from x to a new variable X = Ξ(x) which is continuous and invertible, and therefore monotonic, on the domain Ω of x; we shall let x = ξ(X) denote the inverse change of variable, also continuous and monotonic on its domain Ξ(Ω). Usually both changes of variable shall be differentiable too, in which case Ξ′(x) and ξ′(X) = 1/Ξ′(ξ(X)) must keep the same constant nonzero sign inside their domains. U(x) changes into H(X) := Ξ(U(ξ(X))). If the iteration x_{n+1} := U(x_n) converges from x₀ to z = U(z), we expect X_{n+1} := H(X_n) to converge too, from X₀ = Ξ(x₀) to Z = Ξ(z) = H(Z), though divergence (either to +∞ or to −∞) may have to be redefined as convergence to infinity in case z is an infinite endpoint of Ω, or Z an infinite endpoint of Ξ(Ω). Besides fixed-points and convergence, what qualities must each of U and H inherit from the other, independently of Ξ?

Continuity. Separation: x lies between U(x) and U(U(x)) if and only if X = Ξ(x) lies between H(X) and H(H(X)). Differentiability: H′(X) = Ξ′(U(ξ(X)))·U′(ξ(X))·ξ′(X) if all derivatives are finite. When they exist, both derivatives H′(X) and U′(ξ(X)) have the same sign, but they usually have different values, except at stationary points, where they both vanish, and at fixed-points: whenever z = U(z), and consequently Z = Ξ(z) = H(Z), then also H′(Z) = U′(z). Then, if both fixed-points z and Z are finite, and if the respective iterations x_{n+1} := U(x_n) and X_{n+1} := H(X_n) converge to them, both converge at the same Rate := lim inf_{n→∞} (−ln|x_n − z|)/n = −ln|U′(z)| ≥ 0. Sublinear convergence has Rate zero; linear convergence has a positive Rate. And
when this Rate is infinite, both iterations may be shown to converge with the same superlinear Order := lim inf_{n→∞} ln|x_{n+1} − z| / ln|x_n − z| ≥ 1; higher Order implies faster convergence.

Like the foregoing qualities, conditions for convergence should ideally be inheritable by each of U and H from the other. By this criterion, typical textbook conditions, like the uninheritable bounds upon U′ in Lemmas 4.4 to 4.6 above, are not ideal. Ideal conditions follow.

Theorem 5.1 (Sharkovsky's No-Swap Theorem): Suppose U maps a closed interval Ω continuously into itself; then the iteration x_{n+1} := U(x_n) converges to some fixed-point z = U(z) from every x₀ in Ω if and only if these four conditions, each of which implies all the others, hold throughout Ω:

No-Swap Condition: U exchanges no two distinct points of Ω; in other words, if U(U(x)) = x in Ω then U(x) = x too.

No-Separation Condition: No x in Ω can lie strictly between U(x) and U(U(x)); in other words, if (x − U(x))·(x − U(U(x))) ≤ 0 then U(x) = x.

No-Crossover Condition: If U(x) ≤ y ≤ x ≤ U(y) in Ω, then U(x) = y = x = U(y).

One-Sided Condition: If x₁ := U(x₀) ≠ x₀ in Ω, then all subsequent iterates x_{n+1} := U(x_n) also differ from x₀ and lie on the same side of it as does x₁. (Compare Corollary 4.2 above.)

These conditions have been rediscovered several times since they were first established by A.N. Sharkovsky [1964, 1965]. The proof that each implies all the others is too long to reproduce fully here, but elementary enough to leave to the diligent reader, helped by the following suggestions. Think of Ω×Ω as a square whose lower-left-to-upper-right diagonal is touched or crossed at every fixed-point by the graph of U, which enters the square through its left side and exits through its right. That graph and its reflection in the diagonal touch or cross nowhere else when the No-Swap condition holds. When the No-Separation condition is violated, all attempts to draw both graphs must
violate the No-Swap condition too. Similarly for the No-Crossover condition; therefore these three are equivalent conditions. The One-Sided condition obviously implies No-Separation, and a violation of One-Sidedness can be shown soon to violate No-Crossover too. Thus all four named conditions are equivalent to each other, though not yet proved equivalent to convergence from every starting point in Ω; that proof follows the next lemma.

Besides pertaining to an iterating function U, the One-Sided condition is satisfied by any sequence x₀, x₁, x₂, x₃, …, regardless of its provenance, whose every member x_n lies on the same side of all subsequent members x_{n+m}, m > 0. In other words, that sequence is One-Sided just when, first, if any two members are equal, so are all members between and after them; and secondly, for every integer n ≥ 0, no members of the sequence of differences x_{n+1} − x_n, x_{n+2} − x_n, x_{n+3} − x_n, … have opposite nonzero signs. Note that every subsequence of a One-Sided sequence is One-Sided too. Some One-Sided sequences are Ultimately Monotonic in the sense that all but finitely many differences x_{n+1} − x_n have the same sign; such sequences obviously converge, perhaps to infinity. Other One-Sided sequences are the subject of the next lemma.

Lemma 5.2 (The No-Man's-Land Lemma): If the One-Sided sequence x₀, x₁, x₂, x₃, … is not ultimately monotonic, then it can be partitioned into two disjoint infinite subsequences, one of which ascends strictly monotonically to a limit no larger than the limit to which the other descends strictly monotonically; if these limits differ, the gap between them is a no-man's-land containing no member of this sequence.

Proof outlined: The ascending subsequence consists of those x_n < x_{n+1}, and the descending subsequence consists of those x_n > x_{n+1}. For instance, if x_m is a local maximum and x_j the subsequent local minimum in the sequence, whereupon x_{m−1} < x_m > x_{m+1} > … > x_{j−1} > x_j < x_{j+1} (m < j), then
x_{m−1} and x_j are consecutive members of the ascending subsequence (note that One-Sidedness implies x_{m−1} < x_j), while x_m, x_{m+1}, …, x_{j−1} are consecutive members of the descending subsequence. It soon follows that each subsequence is strictly monotonic and bounded by the other. END OF PROOF.

Return to the proof of Sharkovsky's No-Swap theorem: suppose U satisfies the four named conditions of his theorem on Ω. Then the iteration x_{n+1} := U(x_n) generates a One-Sided sequence. If it did not converge then, according to the No-Man's-Land lemma, it would have two points of accumulation with no iterate between them; and then, because U is continuous, it would swap them, contrary to the No-Swap condition. Therefore the iteration does converge.

I am indebted to the late Prof. Rufus Bowen for pointing out Sharkovsky's work. It answers easily many convergence questions that would be awkward without it. Here are two examples.

Example 5.3: Suppose U(x) := e⁻ˣ − 1 and Ω is the whole real axis; the iteration x_{n+1} := U(x_n) converges to z = 0 from every starting point because U′ < 0, so U has just one fixed-point, and U cannot swap two points in Ω. No-Swap follows from the fact that the graphs of U and its inverse intersect just once, which follows from the fact that e⁻ˣ − 1 + ln(1+x) cannot vanish if −1 < x ≠ 0, which follows, after differentiation, from eˣ > 1 + x. Convergence is alternating because U′(0) = −1 < 0, and x_n = O(√(6/n)) because U(U(x)) = x − x³/6 + … . END OF EXAMPLE 5.3.

Example 5.4: Suppose f is a rational function with simple real interlacing zeros and poles, one of them a pole at ∞. An instance is f(x) := p(x)/p′(x), where p(x) is a polynomial all of whose zeros are real. Another instance is f(x) := det(xI − A)/det(xI − Ā) = Π_i (x − z_i) / Π_j (x − σ_j), in which A is an Hermitian matrix, Ā is obtained from it by striking off its last row and column, and the I's are identity matrices; the zeros z_i lie among the eigenvalues of A, and the poles σ_j are the distinct eigenvalues of Ā that are not also eigenvalues of A. That they interlace, i.e. z₀ < σ₁ < z₁ < σ₂ < z₂ < … < σ_K < z_K, is a well-known theorem attributed to Cauchy. We do not
know the zeros z_i but, like Y. Saad [1974], propose to compute them by running Newton's iteration x_{n+1} := x_n − f(x_n)/f′(x_n). Does it converge? If so, to what? These are thorny questions, considering how spiky is the graph of f; and yet Newton's iteration can be proved to converge to some zero z_i from every real starting value except a countable nowhere-dense set of starting values, from which the iteration must converge accidentally, after finitely many steps, to a pole σ_j. The proof outlined below is extracted from one first presented in my report [1979].

For the proof's sake express f in the forms f(x) = x − β − Σ_j ω_j/(x − σ_j) = 1/( Σ_i ν_i/(x − z_i) ), in which the coefficients β, ω_j and ν_i are determined, as sums, products and quotients of differences among the zeros z_i and poles σ_j, by matching the behavior of f(x) as x approaches each pole or zero. By counting negative differences we find every ω_j > 0 and every ν_i > 0, and by matching behavior at ∞ we find Σ_i ν_i = 1. Newton's iterating function now takes the forms

N(x) := x − f(x)/f′(x), except at poles σ_j of f,
= ( β + Σ_j (2x − σ_j)·ω_j/(x − σ_j)² ) / ( 1 + Σ_j ω_j/(x − σ_j)² )
= ( Σ_i z_i·ν_i/(x − z_i)² ) / ( Σ_i ν_i/(x − z_i)² ) if no z_i = x.

From these we infer easily that N maps the whole real axis continuously into an interval whose endpoints are the outermost zeros z₀ and z_K; and every zero z_i is a strongly attractive fixed-point of N because N′(z_i) = 0, and every pole σ_j is a strongly repulsive fixed-point because N′(σ_j) = 2, and N has no more fixed-points. To conclude that the iteration always converges, almost always to a zero z_i, we have to confirm that N cannot swap two points. If N did swap x ≠ y, the equations y = N(x) and x = N(y) could be turned into Σ_i ν_i·(y − z_i)/(x − z_i)² = 0 and Σ_i ν_i·(x − z_i)/(y − z_i)² = 0, which, when subtracted and divided by y − x, would simplify to 0 = Σ_i ν_i·( (x − z_i)⁻² + (x − z_i)⁻¹·(y − z_i)⁻¹ + (y − z_i)⁻² ) > 0, which is impossible. END OF EXAMPLE 5.4.

The foregoing example is an instance of a general algebraic decision procedure based upon Sharkovsky's No-Swap theorem. Suppose an interval Ω and a
rational function U are given. Then the question "Does the iteration x_{n+1} := U(x_n) converge in Ω from every initial x₀ in Ω?" can be decided by performing finitely many rational operations, without solving any nonlinear polynomial equation. U satisfies the No-Swap condition if and only if the simplified form of the rational function (U(U(x)) − x)/(U(x) − x) has no zeros in Ω which are not also zeros of U(x) − x. This can be tested by removing common divisors from certain polynomials and then counting their sign-changes in Ω by computing Sturm sequences. Whether U maps Ω continuously into itself can also be determined from certain polynomials' sign-changes in Ω, counted by computing Sturm sequences. The details were worked out by R. Fateman [1977] in a program written to run on the computerized algebra system MACSYMA. The procedure is practical only on a fairly big computer, because some of the polynomials in question can have large degrees, as large as the square of the degree of the numerator or denominator of U.

Sharkovsky's No-Swap theorem is the simplest of a family of relationships he discovered among the properties of the fixed-points z_k = U^k(z_k) of a continuous iterating function U and of its compounds U^k(x) := U(U(U(…U(x)…))) (k times). For instance, if U³ has a fixed-point that is not a fixed-point of U, then for every integer k > 1 there are fixed-points of U^k that are not fixed-points of U^m for any divisor m of k. For an elementary treatment of Sharkovsky's relationships see Huang [1992]. For a brief discussion of these and related results, and other proofs, see Misiurewicz [1997].

6. A One-Sided Contribution to Software Strategy

Suppose an iterating function U has been chosen because its fixed-points z = U(z) coincide with the roots of a given equation to be solved, and because the iteration x_{n+1} := U(x_n) is expected to converge to a root quickly. When should this iteration be stopped or amended? When it appears to have converged well
enough, or about as well as it ever will? When it will converge too slowly? When it will not converge? How can nonconvergence be predicted? A portent, at least when U is continuous, is a violation of the One-Sided condition in Sharkovsky's No-Swap theorem. That condition is the only one of the theorem's conditions that software can check. Until One-Sidedness fails, or until so many iterations have been executed as must arouse suspicions that convergence will be too slow, the software has no better option than to persist in the chosen iteration x_{n+1} := U(x_n). How can software detect slow convergence or a failure of One-Sidedness? The answer to this question, at least for continuous iterating functions U, is Brackets.

A Bracket is an ordered pair (x≪, x≫) of arguments, normally both iterates, between which all subsequent iterates must lie if they are to constitute a One-Sided sequence. A bracket is usually a straddle, but this is not obligatory: U(x) − x need not take opposite signs at the ends of a bracket. Initially x≪ and x≫ are set to the endpoints, possibly infinite, of the interval Ω in which a fixed-point of U is being sought. Subsequently, as suggested by the No-Man's-Land lemma, x≪ is the most recent of any iterates x_n that satisfied x_n < U(x_n), and x≫ is the most recent x_n that satisfied U(x_n) < x_n, if any. Consequently, once a bracket becomes a straddle it stays a straddle. Normally every iteration narrows the bracket by moving one end closer to the other. Normally at least one end of the bracket converges monotonically to the sought fixed-point of U.

Software must cope with whatever abnormal behavior a bracket exposes. For instance, bracket (x≪, x≫) need not be a straddle: U(x≪) − x≪ and U(x≫) − x≫ may have the same sign, at first because U does not map Ω into itself, and later perhaps because Ω contains no fixed-point of U, or more than one. A new iterate U(x_n) may stray outside the current bracket, perhaps because x_n is too close to a strongly repulsive fixed-point, or perhaps because U violates the No-Swap condition, or
because U does not map Ω into itself. Normal behavior consistent with the No-Man's-Land lemma may require software intervention too, if the width of the bracket does not shrink fast enough; as may happen because convergence is alternating but very slow, or because both ends of the bracket are converging to different limits swapped by U, or because one end stopped moving after the iteration's convergence became monotonic.

Tactics can be chosen to cope with aberrations only after they have been diagnosed. For instance, splitting the difference, as in Binary Chop, copes well with alternating slow nonconvergence; a better expedient is Steffensen's, which is tantamount to one step of Secant iteration to solve U(z) − z = 0. Occasional extrapolation helps to accelerate monotonic but slow behavior; a way to do it is Aitken's Δ² process, which takes U(x) ≈ z + (x − z)·µ to be an approximate model, for unknown constants z and µ determined from three consecutive iterates: z ≈ x_{n+2} − (x_{n+2} − x_{n+1})²/(x_{n+2} − 2·x_{n+1} + x_n). Such expedients afford software the possibility of extracting tolerable rates of convergence from iterations that would otherwise converge too slowly or not at all. The programmer's options, and the occasions that call for them, would bewilder but for diagnostic information furnished by brackets and by Sharkovsky's theorem, at least when U is continuous.

Diagnosis is complicated when U may be discontinuous. Then a straddle may enclose a jump or a pole instead of a root or fixed-point. Reasons to doubt whether a pole can always be distinguished from a root by solely numerical means will be presented later. Diagnosis is interesting also when U(x) may be undefined for some arguments x. What should software do if an attempt to compute U(x_n) produces instead an error-indication like INVALID OPERATION? In the past that has served as an excuse to abandon computation, but nowadays the temptation to
quit should be resisted. Unless it is trapped, an Invalid operation like 0/0 or √−3 on most computers today will produce a NaN, and subsequent arithmetic operations upon it will almost all propagate it. It can be detected because the predicate NaN = NaN is False; this ostensible paradox merely confirms that NaN is Not a Number. Consequently, when U(x_n) turns out to be NaN instead of a number, the appropriate inference is that x_n has fallen outside U's domain. The appropriate response is to supplant x_n by something else closer to x_{n−1}, and therefore presumably inside U's domain. Then computation can be resumed.

A policy of continued computation past an invalid operation may seem reckless, and sometimes it is. However, the opposite policy, which abandons computation after any Invalid operation, is tantamount to abandoning the search for an equation's root merely because the computer signaled "Look elsewhere for what you seek." That policy of abandonment frustrates software users who wish to solve an equation without first ascertaining the boundary of its domain. Why should its domain be much more obvious than the equation's root? Except for examples contrived for classroom purposes, an equation's domain is generally found by an exploration that resembles the search for a root. Combining both searches, by forgiving Invalid operations, makes more sense than abandonment does. Searching continued past Invalid operations is now the policy built into the SOLVE keys on Hewlett-Packard calculators starting with the hp-18C Business Consultant and the hp-28C; see McClellan [1987]. Consequently they can be used with far less fuss than other, unforgiving software requires to solve difficult equations. Here is my favorite example.

We wish to decide whether the equation tan(z)/arcsin(z) − z⁴ = 0 has a positive root z or not. Unforgiving software will fail to find it despite repeated attempts, each of which starts, say, Newton's iteration x_{n+1} := N(x_n), whose iterating function is N(x) := x − ( tan(x)/arcsin(x) − x⁴ ) / ( (1 + tan²(x))/arcsin(x) − tan(x)/( arcsin(x)²·√((1−x)·(1+x)) ) − 4·x³ ), from small positive initial guesses
like x₀ = 0.1. For the sake of realism we must pretend not to know that the equation's domain is the interval 0 < x ≤ 1. Whatever its domain, the iteration behaves as if doomed to move through it from left to right and escape: N(x) > 1 whenever 0.46137 < x < 0.99964. A few such escapes, followed by Invalid operations, suggest fairly persuasively that no positive root z exists; but in fact z = 0.9999060… . From random initial guesses x₀ scattered uniformly between 0 and 1, Newton's iteration is more than 1000 times more likely to encounter an Invalid operation than to converge to this z. Despite these odds, the hp-28C solves this equation quickly, by means of a modified Secant iteration, from any initial guesses between 0 and 1, thereby vindicating a policy of continued computation past forgiven Invalid operations.

Sharkovsky's No-Swap theorem contributes more than a convergence criterion to the strategy and theory of iteration. It changes our attitudes. Rather than focus exclusively upon conditions sufficient for convergence, we also make use of criteria that tell us when an iteration may not converge unless we do something more than merely iterate. As we pursue this line of thought we come to understand why successful root-finding software need not always find a root, especially if none exists. Satisfactory software should almost always find a root if any are to be found, and usually find it fast, and come to a conclusion soon if a root is not going to be found. Deemed unsatisfactory are indecisive iterations that meander interminably. Our foray into iteration theory is a search for conditions under which an iteration won't meander. We'll find some later.

What if the object sought is nowhere to be found? Root-finding software can cope with this possibility by finding something other than a root, provided the substitution is made manifest to the user of the software. An obvious candidate to supplant a
zero of f that cannot be found is a local minimum of |f|. However, this substitution poses two challenges, one for the designer of the software and one for its user. The designer must devise an algorithm whose efficiency is not too much degraded by the necessity to switch, sometimes repeatedly, between two tasks: seeking a nonzero minimum and seeking a zero. After the software has found one, the user may be unable to decide which of the two has been found, in some cases. For example, an f can be contrived, written with parentheses that must not be removed (DON'T REMOVE PARENTHESES!), with a double zero at 14/3 = 4.666…, whose value and whose derivative f′ will be calculated exactly, unblemished by roundoff, on every computer or calculator built in the Western world, for all x close enough to 14/3; and therefore neither calculated value can vanish when computed in floating-point arithmetic, since 14/3 is not a floating-point number on any of those machines. Consequently, if 2ε := 1.000…001 − 1 is a small positive number, like roundoff in numbers near 1, no way exists to distinguish f and its derivative from f + ε⁴ and its derivative using only their values computed in floating-point arithmetic. In other words, software that finds a positive local minimum of |f| instead of a double zero deserves no opprobrium if it cannot tell which it has found from numerical values alone.

Discriminating between a pole and a zero, across which a function changes sign, can be difficult too, in certain very rare cases. For a suitably contrived q, computed exactly, with a simple zero at 14/3, the numerical values of f(x) := 1/q(x) and of F(x) := 1/( q(x) + 2ε⁴/q(x) ) are the same, although f has a pole and F a zero at x = 14/3.

Despite a few ambiguous cases, root-finding software can describe its find to its user sufficiently well to make the attempt worthwhile. The software can deliver its latest bracket (x≪, x≫) and, to help the user interpret it, an indicator that points to one of the following cases:

- A zero z = x≪ = x≫ has been found, because the computed f(z) = 0.
- A sign-reversal has been found: x≪ and x≫ differ only
in their last significant digits, and f(x≪)·f(x≫) < 0. Three sub-cases have to be distinguished:
  - Probably a zero, since |f(x)| grows as (x − x≪)·(x − x≫) increases from 0.
  - Probably a pole, since |f(x)| drops as (x − x≪)·(x − x≫) increases from 0.
  - Otherwise, probably a jump-discontinuity.
- A local minimum of |f(x)| has been found. Three sub-cases have to be distinguished:
  - Probably a double zero, since |f(x)| grows rapidly as (x − x≪)·(x − x≫) increases from 0.
  - Apparently f(x) is a nonzero constant when x is near or between x≪ and x≫.
  - Otherwise, probably a nonzero local minimum of |f(x)| at some x near x≪ and x≫.

Good root-finding software, able to present all those possibilities to its users without violating Albert Einstein's maxim "Everything should be made as simple as possible, but not simpler," has to be more complicated to use than any single user might like, and harder to design than most programmers will like. Well-designed software is characterized by its inputs and outputs. The necessary outputs, as we have seen, are now obvious:

- The latest bracket (x≪, x≫), found in lieu of a zero, and, to help interpret it,
- An integer indicator for use in an indexed branch or Case statement.

The inputs needed by good root-finding software are unobvious because the equations to be solved are so diverse. Equations are like canapés: one leads to another. Often the equation to be solved has the form f(z, p) = 0 with a parameter p that will take several values, for each of which a root z(p) has to be computed. For some equations the derivative ∂f(x, p)/∂x is easy to compute, for others difficult. Often the equation has more than one root; some users seek all the roots, other users wish to avoid all but one root. Sometimes high accuracy is desired, often not. Only a cluttered menu can cater to all tastes. To promote parsimony, I offer here my suggested list of inputs to good root-finding software:

- The name of the program that computes either f(x, p) or else { f(x, p), ∂f(x, p)/∂x }.
- One or two initial guesses x₀, x₁ to start the search for a root z of f(z, p) = 0.
- An initial bracket (x≪, x≫) to constrain that
search; it can be (−∞, +∞).
- A place for optional parameters p to be passed to the named program f.

Initial guesses are essential inputs, even if brackets are supplied, because, for example, when a root z(p) is plotted as a function of a slowly changing parameter p, the old value of z(p) is often a good first guess at the new z(p). The program that provides initial guesses should be able to find a record, in SAVEd or static variables, of the old p and z(p), for use when the new p is not too different; dz/dp = −(∂f/∂p)/(∂f/∂x) at x = z(p) usually helps too.

In programming languages that allow argument lists of variable lengths, the parameters p can be the root-finder's last arguments, and then can be passed verbatim to the named program f as its last arguments, thereby avoiding unnecessary prejudice against parameters of mixed types (arrays, lists, pointers, procedures, strings, integers, floating-point). The conveyance of optional parameters p has to be fast, not encumbered by excessive (de)referencing overheads, because the root-finder invokes f many times for each computation of z(p). This kind of computation, either the inversion of a given function f(z, p) or the conversion of an implicit definition f(z, p) = 0 into an ostensibly explicit reference to the solution z(p), is the root-finder's most frequent application and deserves software engineers' attention.

Conspicuous omissions from my list deserve explanation. The list includes no upper limit upon the number of iterations. There are three reasons to omit it. First, such a limit is difficult to choose: might the search have succeeded had it been allowed two more iterations? Second, abandoning a search prematurely may be justified after the expiry of some preassigned quantum of time worth more than the root being sought; but f can take longer to compute for some arguments than for others, so a stopping criterion should count clock-ticks, not iterations. Third, by using brackets, good software
need never get stuck in an interminable sequence of iterations; besides, as we shall see in the course of developing the theory below, well-designed software can practically always ensure that f becomes negligible after a moderate number of iterations, no matter how slowly they converge. By stopping after f becomes negligible, or else after the clock runs out, we can omit iteration counts from our stopping criteria.

Also conspicuously absent from my list of inputs are two tolerances to serve in stopping criteria: one for the negligibility of f, and a second for the negligibility of the difference between consecutive iterates. Such tolerances will be chosen cavalierly if they must be constants chosen in advance. Chosen properly, they generally depend upon the same arguments as f depends upon; therefore these tolerances should be computed inside the program that computes f. For example, consider

f(x) := x¹² + 12·x¹¹ + 66·x¹⁰ + 220·x⁹ + 495·x⁸ + 792·x⁷ + 924·x⁶ + 792·x⁵ + 495·x⁴ + 220·x³ + 66·x² + 12·x + 1

and pretend not to notice that this is an unfortunate way to compute (x + 1)¹². Error analysis reveals that the difference due to roundoff between f(x) and its computed value must be smaller than roughly Δf(x) := 12·(|x| + 1)¹¹·2ε, but not often enormously smaller. Here 2ε := 1.000…001 − 1 is the roundoff threshold for the computer's floating-point arithmetic, typically 2ε = 2⁻⁵² ≈ 2.22·10⁻¹⁶ for 8-byte floating-point. For arguments x near the zero z = −1 of f, its error bound Δf ≈ 5.5·10⁻¹² is not enormously bigger than observed errors, almost as big as 2·10⁻¹³, in computed values of f. How can someone be expected to guess either constant, 5.5·10⁻¹² or 2·10⁻¹³, in advance?

Computing or guessing a tolerance Δf for the negligibility of f, within the program that computes f, lets Δf serve in a simple way to stop the search for a zero as soon as f becomes negligible: whenever the computed f would be no bigger than Δf, return 0 in place of f. This immediately stops the root-finder at what it thinks is a zero. Techniques for computing Δf include Running Error-Analysis and Interval Arithmetic, both described in
a text by Higham [2002] with ample references to the literature. These techniques can add considerably to the time needed to compute f alone, so they should not be employed indiscriminately. The subprogram can record, in SAVEd or static variables, the last few arguments at which f was computed, and compute Δf only at new arguments close enough to an old one that stopping is a plausible possibility. In any event, iterations prolonged much beyond the time when |f| ≤ Δf will waste time dithering, so computing Δf too often may waste less time than not computing Δf at all.

My list of inputs also omits a tolerance Δx for the difference between consecutive iterates, or the width |x≫ − x≪| of a bracket or straddle, because the use of Δx to stop iteration lends itself too easily to misinterpretation. The clear intention is to stop when the iteration has come within ±Δx of the desired zero z; and that is what happens when convergence is so fast, as it usually is, that |x_{n+1} − x_n| is rather bigger than |x_{n+1} − z|; but then little is gained by stopping the iteration before |f| ≤ Δf. Only when convergence is slow can Δx be used to stop iteration in time to save much time, but then this stopping criterion becomes treacherous. If convergence is slow because z is a multiple zero (see §10 on Accelerated Convergence to Clustered Zeros below), then |x_{n+1} − x_n| can stay arbitrarily smaller than |x_n − z|, even though |f(x_{n+1})| usually plunges below any practical threshold Δf fairly soon (see Theorem 7.6); then not much is gained by stopping sooner, say when |x_{n+1} − x_n| ≤ Δx, beyond the illusion that |x_{n+1} − z| ≤ Δx too.

If roundoff interferes severely with convergence, not even a straddle (x≪, x≫) can be trusted to contain z, not even approximately. For example, recall f(x) = … = (x + 1)¹² above. The uncertainty Δf in f propagates into an uncertainty ±Δf^{1/12} in the computed zero z ≈ −1; for 8-byte floating-point arithmetic, carrying the equivalent of
about 15 sig. dec., the computed z is uncertain in its second decimal after the point. In fact, root-finders frequently stop with a straddle x«, x» whose ends differ only in their 13th decimal or beyond, but which both differ from 1 by more than 0.07. How could a tolerance Δx be chosen meaningfully in a case like this? Generally an appraisal of uncertainty in a computed zero z of f begins with an estimate of the uncertainty Δf in the computed value of f; after that, uncertainty in z is either trivial or very difficult to ascertain (see Higham 2002). Including a tolerance Δx among the root-finder's inputs to stop iteration sooner deceives users occasionally while contributing little to speed and less to error-analysis, in my experience, so I have omitted it from my root-finding software. Other programmers think otherwise. Rather than argue longer here about where, outside the root-finder, error-analysis should play its role, I prefer to develop root-finding iterations that find roots fast enough to render early termination (before |f| ≤ Δf) of the iteration uninteresting.

Still, if an iteration's convergence is normally superlinear and never worse than linear, here is a strategy that may save an iteration or two if monotonic convergence shrinks brackets too slowly. Suppose the Difference Quotients (x_{k+1} − x_k)/(x_k − x_{k−1}) → 0 as k → ∞. Provided roundoff is insignificant, these quotients will constitute a decreasing sequence as x_k → z; after L := (x_{k+1} − x_k)/(x_k − x_{k−1}) < 1 we can soon deduce that the error |z − x_{k+1}| ≤ |x_{k+1} − x_k|·L/(1 − L). Therefore iteration can be stopped after at least two or three consecutive difference quotients, all less than 1, have strictly decreased to a latest difference quotient L small enough that |x_{k+1} − x_k|·L/(1 − L) < Δx. If this ever happens, x_{k+1} can be delivered with a reasonable expectation that |x_{k+1} − z| < Δx, and without having to compute f(x_{k+1}), though checking that it is negligible would be prudent. Don't omit the divisor (1 − L), lest iteration be stopped far too soon when convergence is slowly slowing.
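This stopping strategy can be sketched in a few lines of Python; the function below and its interface are illustrative inventions of mine, not part of any root-finder discussed in these notes.

```python
def can_stop(xs, dx_tol, need=3):
    """Stopping test based on difference quotients, per the strategy above.

    xs holds the iterates x_0, ..., x_k (most recent last).  Stop only after
    `need` consecutive difference quotients L_j = (x_{j+1}-x_j)/(x_j-x_{j-1}),
    all strictly between 0 and 1 and strictly decreasing, have shrunk so far
    that the geometric-tail error bound |z - x_k| <= |x_k - x_{k-1}|*L/(1-L)
    falls below dx_tol.  The divisor (1-L) is kept, as the text warns.
    """
    if len(xs) < need + 2:
        return False
    steps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    if any(s == 0 for s in steps):
        return True  # iteration has stalled exactly; nothing more to gain
    quots = [steps[i + 1] / steps[i] for i in range(len(steps) - 1)]
    tail = quots[-need:]
    if any(not (0 < q < 1) for q in tail):
        return False
    if any(tail[i + 1] >= tail[i] for i in range(need - 1)):
        return False  # quotients must still be strictly decreasing
    L = tail[-1]
    return abs(steps[-1]) * L / (1 - L) < dx_tol
```

For a superlinearly convergent sequence the quotients collapse rapidly, so the test fires as soon as the bound |x_{k+1} − x_k|·L/(1 − L) drops below the tolerance, well before the steps themselves would reveal the error.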
7. Local Behavior of Newton's and Secant Iterations

The two best-known iterations for solving a given equation f(z) = 0 come from approximations to the graph of f by linear graphs, one a tangent and the other a secant. They have the following iterating functions:
  N(x) := x − f(x)/f′(x)   for Newton's iteration x_{n+1} := N(x_n), and
  S(x, y) := x − f(x)/f[x, y] = S(y, x)   for Secant iteration x_{n+1} := S(x_n, x_{n−1}).
See the Appendix on Divided Differences for an explanation of the first divided difference f[x, y] := (f(x) − f(y))/(x − y) if y ≠ x, and f[x, x] := f′(x). Programmers can handle 0/0 in these formulas by stopping both iterations as soon as f(x_n) = 0, and otherwise by perturbing x_n slightly whenever x_n = x_{n−1} during Secant iteration. For a mathematician, the limiting value S(x, x) = N(x) is the obvious expedient. Not so obvious is how to redefine N(z) when f′(z) = f(z) = 0, because then N(x) might oscillate too wildly to approach a limit as x approaches z, as happens for the example f(x) := ∫₀ˣ t·sin²(1/t) dt. None the less, redefining N(z) := z whenever f(z) = 0 can be justified by the next lemma.

Lemma 7.1: Suppose f′ is finite throughout some neighborhood of a zero z of f, and N(x) approaches a limit as x → z. Then N(x) → z, so defining N(z) := z conserves the continuity of N near z whenever possible.

Proof: If necessary, shrink the neighborhood around z to exclude any other point at which N is undefined or infinite; then this neighborhood excludes every zero of f′ except perhaps z, and by Rolle's theorem excludes also every zero of f other than z. Consequently the derivative (ln|f(x)|)′ = f′(x)/f(x) = 1/(x − N(x)) must be finite throughout this neighborhood except at x = z. Therefore ln|f| is eligible for an application of the Mean Value Theorem of the Differential Calculus to its first divided difference: for any distinct v and w on the same side of z in this neighborhood, some x between v and w must satisfy
  ln|f(v)/f(w)|/(v − w) = (ln|f(v)| − ln|f(w)|)/(v − w) = f′(x)/f(x) = 1/(x − N(x)).
Now suppose, for the sake of argument, that N(x) → L ≠ z as x → z; we shall infer a
contradiction. For all distinct v and w close enough to z, and much closer to z than L is, but not separated by z, we would find ln|f(v)/f(w)|/(v − w) = 1/(x − N(x)) ≈ 1/(z − L) at some x between v and w. The last approximation could be kept as close as we please by keeping v and w close enough to z. But then, by fixing one of v and w and letting the other tend to z, we would infer that ln|f(z)| is finite, so z could not be a zero of f. But it is; therefore L = z. END OF PROOF.

Now that N(x) and S(x, y) are defined properly, and practically always continuous around the zero z, we turn to their local convergence properties. Their convergence to a simple zero z is typified by their behavior when f(x) := (x − z)/(x − ð) with ð ≠ z; for this example N(x) − z = (x − z)²/(ð − z) and S(x, y) − z = (x − z)(y − z)/(ð − z). Simple computations confirm first that Newton's iteration, (x_{n+1} − z)/(ð − z) = ((x_n − z)/(ð − z))², converges quadratically to z from every x_0 closer to z than the pole ð is; and second that Secant iteration, (x_{n+1} − z)/(ð − z) = ((x_n − z)/(ð − z))·((x_{n−1} − z)/(ð − z)),
contrived with these properties f X and NX are continuous everywhere fX gt 0 for all X at 0 and z fz f z Nz 0 However around 2 every open neighborhood Q no matter how small contains in nitely many closed subintervals all of positive width from each of which Newton s iteration tends to twocycle jumping back and forth across 2 forever instead of converging to z The construction of this perverse f begins with an integervalued stepifunction kX IntegerNearest lnleln2 and a quartic polynomial qX 1x 13 9V2x 13 1 3V8x 14 monotone increasing over lNZ S X S WZ This q meets the following speci cations ql 2 q39l 1 qquotl 0 q2158 1 4ql2 q392 12 8 2q39l2 Note that lNZ S 2100le S WZ note too that kX is ambiguous when X is a halfiinteger but then either choice k X i 12 is acceptable Finally de ne f0 f 0 0 and fX signX q 2100le 4kx for X at 0 The continuity of fX and of f X q39 2100le 2kx are easily con rmed along with the identities fX r x r 2kltxgtlxl 4kltxgt and fX f X r 2kltxgtlxl 2kltxgt gt 0 for Xe 0 The ranges of values taken by lfXlX2 and by f XXl over all X at 0 are the same respectively as the ranges of qXX2 and q39XX over lNZ S X S WZ so as X 7gt 0 we nd lfXl S 3X2 and f X S 12le con rming continuity at X 0 And NX X fXf X N 2100le 2kx is continuous there too because lNXl S 2le similarly The design of fX ensures that NX X and N39X 0 whenever X i2k for every integer k moreover 2k09935 lt lNXl lt 2kl0064 whenever 2k09935 lt M lt 2kl0064 so from any X0 in those intervals Newton s iteration tends rapidly to a twowycle 2k lt7gt 2k as claimed Numerical eXperiments suggest that such a twoicycle though with a large negative k is the likeliest outcome of iteration from a randomly chosen X0 END OF COUNTERiEXAMPLE Work in Progress NOT READY FOR DISTRIBUTION Page 2663 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am In general the convergence of Newton s and Secant iterations cannot be taken for granted Their local convergence depends upon whether as x 7gt z and y 7gt 
z the limiting values of certain rst divided differences like NIxz Nx Nzxz Nx zxz 7gt N39z and slxzy Sxy szyxz Sxy zxz a asx ywx I exist and are small enough In particular convergence is superlinear if these derivatives vanish because then lxn zllxn zl 7gt 0 as the iterations converge also the Order of convergence depends then upon whether limiting values exist for certain second divided differences Nllxzz NIxz N39zxz Nx zxz2 and sllxzyz slxzy slxzz yz Sxy zxzyz from which bounds for quotients Xn1 zllxn 2 2 and Xn1 zlKxn zxn1 zl respectively can be obtained Such bounds will be obtained from rst and second derivatives and divided differences of f by invoking recondite identities like Identities 73 fSu w Su w u Su w w fIISu w u w This includes the limiting case fNv Nv v2 fIINv v v Taking fz 0 into account yields the identity Sxy zxzyz fIIxyzfIxy and its limiting case Nx zxz2 f fIIxxzf x The identities proofs are entirely mechanical and left to readers who have reviewed the notation and formulas in the rst two pages of the Appendix on Divided Differences Conditions suf cient locally for convergence have been found in two ancient theorems of which at least one applies in almost all practical situations The rst theorem is as old as Taylor series Theorem 74 Suppose f is continuous throughout some neighborhood Q of a zero 2 of f at which f z 0 Then N39z0 therefore Newton s iteration converges superlinearly to z from every initial x0 close enough to z Similarly Secant iteration converges superlinearly to z from every initial x0 and x1 close enough to z If fquot exists and is bounded throughout Q then Nquotz fquotzf z and the convergence of Newton s iteration is at least quadratic Order 2 and the convergence of Secant iteration has Order at least 1 W52 1618 Proof As u 7gt z and w 7gt 2 independently the continuity of f39 carries fIu w 7gt f z Consequently Nx zxz f x flx zf x 7gt 0f z 0 as x 7gt z and so N39z 0 as claimed whence Newton s iteration converges superlinearly 
Similar reasoning shows that Sxy zxz fIxy fIxzfIxy 7gt 0 as x 7gt z and y 7gt z so Secant iteration converges superlinearly too Work in Progress NOT READY FOR DISTRIBUTION Page 2763 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am When fquot exists and is bounded some constant Cgt lfquotxf zl throughout Q Therefore Nx zxz2 fllxxzf x lies between iC for all x close enough to z and therefore lxn1 zxn z2l lt C if x0 is close enough to z convergence is at least quadratic as claimed And Sxy zxzyz fllxyzflxy also lies between iC for all x and y close enough to z so lxn1 zxn zxn1 zl lt C if x0 and x1 are close enough to z thereby vindicating the claimed Order of convergence here is an outline of how that works cf Ostrowski 1966 or Dahlquist et al 1974 For that constant C gt lfquotxf zl throughout Q let Dn lnlC xn zl then the Secant iteration s lxn1 zxn zxn1 zl lt C means that Dn1 gt Dn Dn1 gt 0 if x0 and x1 are close enough to z Next Dn1 gt FnDl Fn1D0 by induction where the Fibonacci numbers Fn Fn1 Fn2 911 Q39n391Q 19 for another constant C 1 V52 1 lC Thus Dn approaches 00 at least as fast as some multiple of CH END OF PROOF Continuity of f in Theorem 74 cannot be replaced by mere existence of f and its consequent Darboux Continuity lest N oscillate violently for examples like fx IOX sin2ltdt whose f0 0 and fO 12 In general a function perhaps too wildly oscillatory to be continuous is called Darboux Continuous if among the values it takes on every closed subinterval of its domain lie all values between those taken at that subinterval s ends Every derivative has that property For more about Darboux Continuity see Bruckner and Ceder 1965 The ultimate speeds of convergence of Newton s and Secant iteration should not be compared by considering only their orders of convergence As many a textbook points out nowadays the two iterations yield correct decimal digits ultimately at about the same rate if the computation of the derivative f39 too adds about 44 to the 
time taken to compute f alone If f costs much more than that Secant iteration goes faster in the likeliest cases But Theorem 74 says nothing about the iterations speeds when fz 0 in which case a different approach is needed Theorem 75 Suppose lf xl increases as x moves away from 2 through some neighborhood Q on one side ofa zero 2 of f Then 0 lt Nx zxz lt l and so Newton s iteration converges monotonicallyto z from every initial x0 in Q Similarly 0 ltSxy zxz ltl for all x and y in Q and so Secant iteration converges monotonically to z from every initial x0 and x1 in Q In other words this theorem s hypothesis is that the graph of fx is convex towards the xiaxis as is the case for example when f fgt 0 inside Q Theorems like this appear in many texts for instance Ostrowski 1960 et seq ch 9 and 10 and Dahlquist et al 1974 p 225 Texts written in France attribute theorems like this to Dandelin andor Fourier as if it had not been geometrically obvious before them Let the reader compare the limpidity of his own proofibyi pictures with the turgidity that follows Work in Progress NOT READY FOR DISTRIBUTION Page 2863 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am Proof Regardless of whether f z 0 the growth of lf Xl as X moves away from 2 implies that fX can t reverse sign and therefore 0 lt fX f yf X lt l at that y strictly between 2 and X where f y fIXz Therefore 0 lt NX zXz fXfIX zf X lt l for every X at z in Q This implies that the iteration Xn1 NXIQ converges monotonically to a limit between 2 and X0 inclusive from every initial X0 in Q Where is that limit Since fXnf XIQ Xn Xn1 7gt 0 and lf XIQl S lf X0l so does fXIQ 7gt 0 whence Xn 7gt 2 as claimed Similarly SXy zXzyzfIIXyzfIXy for all X and y in Q from one of Identities 73 what we do with this depends upon which of X and y lies closer to z If X lies strictly between y and 2 then SXy zXz flyX fIXzflyX if y lies strictly between X and 2 then SXy zXz yzXz fIXy flyzfIXy Either way the quotient in question 
lies strictly between 0 and l so the Secant iteration s Xn1 SXn Xn1 converges monotonically to some limit between 2 and the closer of any two starting iterates X0 and X1 in Q Where is that limit Since fXnfIXn Xn1 Xn Xn1 7gt 0 and lfIXn Xn1l S lf X0l so does fXIQ 7gt 0 whence Xn 7gt 2 as claimed END OF PROOF For practical purposes Theorems 74 and 75 tell us to eXpect Newton s and Secant iteration to converge ultimately superlinearly or monotonically or both if started close enough to z Alas the speed of convergence is not mentioned in Theorem 75 and for good reason its conveXity hypothesis is compatible with arbitrarily slow convergence For eXample when fX lem for any constant m gt 1 Newton s iteration yields Xn l lmn X0 convergent arbitrarily slowly for m big enough however fXIDKXO l lmmnlt e39n tends to 0 quickly When m is a negative constant tiny enough fXIQfX0 tends to 0 arbitrarily slowly although Xn diverges to z oo quickly Both XI1 and fXIQ converge arbitrarily slowly if m eXceeds 12 by little enough but then the conveXity hypothesis is violated What light do these eXamples shed upon the general case The case m gt 1 turns out to be typical of what happens when the graph of fX is conveX towards the XaXis and XI converges to a nite zero 2 of both f and f39 Theorem 76 Under the conveXity hypothesis of Theorem 75 the iterates Xn may converge to z arbitrarily slowly though monotonically but fXn tends monotonically to 0 at least so fast that 2112nfXn2 S fXO2 X0 ZX0 X1 The 2 in 2n fXIQ cannot be replaced by a bigger constant since fXn1fXn 7gt 2 when Secant iteration is applied to the eXample fX X eXplX with X0 gt X1 gt 0 An eXample fX appropriate for Newton s iteration is too complicated to reproduce here Proof For de niteness restrict attention to nonnegative functions fX and f X increasing over an interval 2 S X S X0 gt z and for Secant iteration suppose too that X1 lies inside that interval Theorem 75 implies z lt Xn1 lt Xn 0 fz lt fXnH lt fXn and 0 S f z lt fXn1 
lt f Xn without constraining the rapidity with which Xn 7gt z Given any such sequence Xn convergent monotonically downwards to z no matter how slowly convergent do conveX functions fX Work in Progress NOT READY FOR DISTRIBUTION Page 2963 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am exist from which Newton s or Secant iteration would have generated that sequence of iterates To answer this question a sequence of values fn and f n will be derived from xn and then a continuously once differentiable convex function fx satisfying fxn fn and f39xn f n will be constructed out of parabolic arcs this is a function from which Newton s or Secant iteration generates the given sequence of iterates xn Consider Secant iteration rst because it is easier The values fn will have to satisfy xn xn xn xn1fnfn fn1 which xes fn n1 xn xn1xn1 xn recursively for n l 2 3 starting from any arbitrarily chosen f0 gt 0 Since x0 gt x1 gt x2 gt gt xn gt xn also f0 gt f1 gt f2 gt gt fn gt fn gt 0 and less obviously 0 lt fn 39 fn1 Xn 39 Xn1 Xn 39 Xn1 Xn 39 Xn2fnl 39 fnXnl 39 Xn lt fnl 39 fnXnl 39 Xn Therefore leeway exists to choose a positive descending sequence of values f n satisfying fn fn1xn xn1lt f n lt fn1 fnxn1 xn lt f n1 for n l 2 3 After choices for all values fn and f n have been assigned fx is de ned in each subinterval xn S x S xn1 as the function whose graph is a convex parabolic are subject to the constraints fxn fn lt fxn1 fn1 and fxIQ f n lt f xn1 f n1 The existence ofthis parabola its axis need not be vertical is the gist of Lemma A4l in Appendix A4 Parabolas The triangle QRS in that lemma has Q at xn fr R at xn1 fn1 and sides QS and RS with slopes f n and f n1 respectively The are lies inside the triangle and joins Q to R Taken together all such arcs make up the graph of a function fx over the interval 2 lt x S x0 This fx is convex and continuously once but not likely twice differentiable What remains to be proved is that this fx 7gt 0 as x 7gt z it will be 
proved later A different fx is needed for Newton s iteration whose descending iterates xn determine all quotients fnf n xn xn gt 0 but leave the values fn and f39n partially arbitrary Let us choose any positive f0 and any positive fn lt fn1xn xn1xn1 Xn1 recursively for n l 2 3 thereby determining also f n fnxn Xn1 Obviously 0 lt fn lt fn1 less obviously 0 lt fn39 fnlXn39 XnH 1 39 fnlfn fVn lt Fn fnXn39 Xn1 lt fnl 39 fnXnl 39 Next de ne fx in each subinterval xn S x S xn1 to be the function whose graph is a convex parabolic are subject to the constraints fxn fn lt fxn1 fn1 and f xIQ f n lt f xn1 f n1 as before Once again all such arcs make up the graph of a convex and continuously once but not likely twice differentiable function fx over the interval 2 lt x S x0 What remains to be proved is that this fx 7gt 0 as x 7gt z What remains to be proved not just for the functions f constructed above but for every f that satis es the theorem s convexity hypothesis is that the values fxn tend to fz 0 faster than the terms of a geometric progression with common ratio l2 Attention is still restricted to nonnegative functions fx and f39x increasing over the nite interval 2 S x S x0 gt z and for Work in Progress NOT READY FOR DISTRIBUTION Page 3063 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am Secant iteration x1 lies inside that interval Now the abbreviations fh fxn and FH f xn stand for values computed during the iteration and because 2 lt xn lt xn1 they satisfy both 0 lt fn lt fn1 and 0 lt f n lt fn1 fIQxIH x1 lt f n1 the latter because f is increasing Consider Secant iteration xn xn xn xn1fnfn fn1 rst because it is easier It has 2fn fn1 21 xn1 xIQxn xn1 S 1xn xn1xn1 xn Newton s iteration Xn1 xn fnf n has fnxn Xn1 f n lt fn1 fIQ xn1 x1 from which follows again 2fn fn1 lt 21 xn1 xIQxn xn1 S 1xn xn1xn1 xn For both iterations repeated multiplication implies 2n fnfo2 S xn xn1x0 X1 now sum over n END OF PROOF Theorems 74 75 and 76 are best regarded as 
contributions to local convergence theory since they say too little about convergence from afar Monotonic convergence is what disquali es the global pretensions of the latter two although their convexity hypothesis might hold in a wide neighborhood Q More often the rst few if not all iterates of a convergent iteration approach 2 nonimonotonically in which cases the convexity hypothesis can hold in at most a bounded domain Therefore theorems 75 and 76 unable to discriminate between noni monotonic convergence and interminable meandering are too often applicable only locally For example if f is a cubic polynomial monotonic over anoni nite including 00 or oo or both interval Q but not convex thereon the iterations cannot meander in Q but will either escape from it or converge to a zero of f therein this follows from Theorem 82 below not from theorems 75 and 76 above On the other hand if f is a quintic polynomial monotonic over a noni nite interval Q but not convex thereon Newton s iteration can meander in Q forever fx 5x5 l8x3 45x is an instance with all the real axis for Q and with f 2 1584 but alternate iterates xn approach 1 and 1 if ever 1 S lxnl lt 1076570927 What distinguishes monotonic cubics from other 39 I 39J 39 39 The J39 quot quot will become clear later when we deduce Theorem 82 from hypotheses that are the weakest and thus most widely applicable conditions now known to suf ce for convergence Work in Progress NOT READY FOR DISTRIBUTION Page 3163 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am 8 Sum Topped Functions A function SumiTopped on an interval Q is by de nition a function qX that lies between 0 and qu qw inclusive throughout every closed subinterval u S X S w of Q The term SumiTopped has been born out of desperation for lack of a better term Also lacking is a neat characterization of sumitopped functions Some of their properties are obvious for instance functions sumitopped on an interval Q are also sumitopped on every subinterval 
of Q but not vice versa If q is sumitopped on Q so is pq for every real constant p positive or negative Monotonic functions that do not reverse sign are sumitopped but nonimonotonic sumitopped functions exist too Here are some examples plot them to illustrate their diversity Any quadratic q on an interval none of which falls between two simple zeros of q 3 coseX on the whole real XaXis lX sinXl on the whole real XaXis ll X2 on the interval lw S X S w for any w gt 0 and 200X X u on the positive real XaXis for any constant p S 2 2113 2V3 063 Some properties of sumitopped functions are almost obvious Lemma 81 A function q sumitopped on Q cannot reverse sign by taking both positive and negative values therein and if qz 0 at some z in Q then lqXl is anon4lecreasing function of lXzl while X is in Q Proof If qu qwS 0 for some u and w in Q then since qu qw qX qXZ 0 for all X between u and w inclusive setting XW implies that qu qw0 And if qz0 at some z in Q then because qy must lie between 0 and qz qX qX for all y between z and X we infer that 0 S qyqX S 1 if qX 0 therefore qX may vanish throughout some closed subinterval of Q but must then become nonzero and monotonic as X departs from that subinterval END OF PROOF In the light of this lemma the unobviously sumitopped functions q on Q are the non7 monotonic ones that retain the same nonzero sign throughout suppose q gt 0 to simplify the following eXposition Whether a continuous nonimonotonic q is sumitopped is determined solely by the values it achieves at its local eXtrema maXima and minima in Q Suppose all its local minima are qJ qvj gt 0 and all its local maXima strictly inside Q are Qi qmi for v0 lt m1 lt v1 lt m2 lt lt mK lt vK all in Q Then q is sumitopped if and only if every Q S minjlti qj minJ 2 i qj This decision procedure can be inconvenient as it is for large K and gets worse when q has in nitely many eXtrema or is discontinuous Among sumitopped functions the easiest to recognize are those of Restrained 
Variation which are eXplained below in appendiX A2 Before digressing to that eXplanation let us see how sumitopped functions gure in Newton s and Secant iteration Theorem 82 A Sum Topped Derivative Suppose f39 is continuous and sumitopped throughout a closed interval Q Then Newton s iteration Xn1 Xn fXnf Xn started from any X0 in Q either converges in Q to the zero z of f or leaves Q the iteration cannot meander in Q endlessly Work in Progress NOT READY FOR DISTRIBUTION Page 3263 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am Proof At the cost perhaps of replacing f by f we may assume that f 2 0 throughout Q since Lemma 81 prevents P from reversing sign Newton s iterating function NX X fXf X is therefore continuous except possibly where f X vanishes This possibility must be dispatched rst later we shall deal with cases in which f never vanishes in Q The theorem s hypotheses allow f andor f to vanish at most once in Q To see why observe rst that f must vanish between any two distinct zeros of f Next suppose f39u 0 Then Lemma 81 implies that there must be some closed subinterval 11 S X S 1391 of Q throughout which f X 0 and fu fX fu while 11 S X S 1391 but f X is positive and decreasing and fX lt fu while X lt 11 in Q and f X is positive and increasing and fX gt fu while X gt 1391 in Q Only in the subinterval it may be a single point u need N be rede ned Wherever f39z 0 fz de ne Nz 2 39 Wherever f39u 0 7E fu de ne Nu signfu 00 as if f u 0 At most one of these two cases can arise In the rst case Theorem 75 guarantees the convergence of Newton s iteration to an endpoint of the subinterval of Q wherein fz 0 The same theorem dispatches the second case 39 too because so long as fXn has the same nonzero sign as fu iteration must move monotonically in the direction that decreases lfl until one of the following three eventualities occurs i An iterate escapes from Q perhaps by jumping to ice or else ii Iterates stay in Q and converge monotonically to 00 or 
−∞ in Q; or else (iii) an f(x_n) reverses sign, and subsequent iterates reverse course and converge to z. Only eventuality (iii) delivers a finite root z of f(z) = 0 in Q, and f′(z) ≠ 0 there. Whether eventuality (ii) delivers a root depends upon whether the limit to which |f(x)| declines, as x approaches that infinite endpoint of Q at which f′ vanishes, is zero. Eventuality (i) must arise also when neither f nor f′ vanishes in Q, since then too the iteration must move monotonically in a direction that decreases |f|.

Now only one case is left to consider. Suppose henceforth that f′ > 0 throughout Q, and f(z) = 0 < f′(z) at some z in Q. Now N must be continuous in Q, and its sole fixed-point therein is z = N(z). If finitely many iterates lie on one side of z and infinitely many on the other side in Q, then the iteration must converge, ultimately monotonically, because (except for finitely many initial iterates) every subsequent iteration with x_n ≠ z maintains 0 ≤ (x_{n+1} − z)/(x_n − z) < 1 and 0 ≤ f(x_{n+1})/f(x_n) < 1, as is easily confirmed; of course the iteration converges to z. But if the iteration neither escaped from Q nor converged to z, as we shall assume henceforth for the sake of argument, infinitely many iterates would have to fall on both sides of z, which would have to lie strictly inside Q. We shall complete the proof of Theorem 8.2 by demonstrating that its hypotheses are not consistent with the last assumption. By virtue of Theorem 7.4, the iterates could not come arbitrarily close to z; they would all have to stay at least some positive distance away from z. Let u and w be the iteration's points of accumulation nearest z on both sides, say u < z < w. Then every open neighborhood of u would contain infinitely many iterates, as would every open neighborhood of w, but any closed interval strictly between u and w could contain at most finitely many iterates. Since N(u) would be another point of accumulation, and N(u) > u, we would find N(u) ≥ w
too. Similarly for N(w) ≤ u. Let's scrutinize the last two inequalities: they would imply respectively that 0 < (w − u)·f′(u) ≤ −f(u) and 0 < (w − u)·f′(w) ≤ f(w). Adding them would produce (w − u)·(f′(u) + f′(w)) ≤ f(w) − f(u) = ∫ᵤʷ f′(x) dx, which simplifies to 0 ≥ ∫ᵤʷ (f′(u) + f′(w) − f′(x)) dx. But the theorem's hypotheses force the integrand to be nonnegative and continuous, so it would have to vanish at every x between u and w inclusive, which would force f′(u) = f′(w) = 0, contrary to the supposition f′ > 0 made when this case began to be considered. END OF PROOF.

In showing that N cannot swap distinct points u and w of Q, the foregoing proof resembles an application of Sharkovsky's No-Swap Theorem, but the resemblance is superficial for two reasons. First, the theorem's hypotheses merely suffice for its conclusion; they are not necessary. Second, N was not required to map Q to itself; determining whether such a requirement has been fulfilled can be harder than solving the given equation f(z) = 0. An easier expedient is to incorporate whatever may be known about f and Q into a bracketing procedure that decides whether an excursion out of Q should stop the iteration or be returned to Q. After that, the only hazard to prevent is the possibility that, left alone, the iteration may meander in Q forever. This hazard is precluded if f′ is sum-topped; but, as we have seen just before Theorem 8.2, deciding whether f′ is sum-topped can be inconvenient. Fortunately, some oft-encountered sum-topped configurations are easy to recognize.

Corollary 8.3 (A Weak Convexity Condition): Suppose f = g − h is a differentiable difference between two convex functions, one non-decreasing and the other non-increasing, throughout a closed interval Q. Then Newton's iteration x_{n+1} := x_n − f(x_n)/f′(x_n), started from any x_0 in Q, either converges in Q to the zero z of f or leaves Q; the iteration cannot meander in Q endlessly.

Proof: See Corollary A2.3 in Appendix A2, Functions of Restricted Variation; apparently f′ is one of those, and therefore continuous and sum-topped over Q. Therefore Theorem 8.2 applies. END OF PROOF.

Since f
determines neither Q nor the splitting g − h = f uniquely, arbitrariness can complicate the application of Corollary 8.3. Take the admittedly contrived example f(x) := arctan(x), for which Newton's iteration converges to z = 0 from any x_0 strictly between the points ±1.3917452…, swapped by N, but diverges otherwise. These points cannot serve as endpoints for Q in Corollary 8.3; indeed, no Q that includes both points ±1.3917452… in its interior can sustain a splitting g − h = f satisfying the theorem's requirements, because f′(0) = 1 is too big for f′ to satisfy the sum-topped condition 0 ≤ f′(v) ≤ f′(u) + f′(w), whenever v lies between u and w both in Q, that every splittable f must satisfy. On the other hand, for every L > 0 the interval Q := [−L, 1/L] sustains such a splitting, thus:
  g(x) := arctan(x) − x/(1 + L²) for −L ≤ x ≤ 0;  g(x) := x/(1 + 1/L²) for 0 ≤ x ≤ 1/L;  and h(x) := g(x) − arctan(x).
But then N maps Q to itself only when 0.86033359… ≤ L ≤ 1.1623398…. Otherwise the iteration may escape from Q, and after that it may come back and converge, or else diverge, according to whether it started between ±1.3917452… or not. In general, the weak conditions in Theorem 8.2 and Corollary 8.3 are not necessary for convergence but are at best sufficient. Their virtue is their ease of application, compared with attempts to apply Sharkovsky's No-swap theorem to N.

Example 8.4: Corollary 8.3 was discovered first, before Theorem 8.2, in 1976, while I was helping Dr. D.W. Harms and R.E. Martin to design a financial calculator; see Martin (1977). The equation f(z) = 0, to be solved for a positive root z = 1 + (interest rate) or z = 1 + (discount rate), was put into the theorem's partitioned form f = g − h thus:
  f(x) := C_m·x^m + … + C₃·x³ + C₂·x² + C₁·x + C₀ − (c₁/x + c₂/x² + c₃/x³ + … + c_k/x^k),
with nonnegative coefficients C and c representing cash flows, perhaps investments and returns, or borrowings and repayments. Q was the positive real axis, and was mapped to itself by Newton's iterating function N for this f. However, because m and k could be huge, many thousands, a
complicated initial guess x_0 had to be contrived to prevent instances of intolerably deferred convergence. The complexity of x_0 cast a shadow over the design's integrity. R. Carone and I got rid of that complexity when we worked on the hp-12C financial calculator, introduced in 1982 and still selling twenty years later. It solves a different but equivalent equation f(z) = 0 for its real root z = ln(1 + interest rate) or z = ln(1 + discount rate). The partitioned form f = g − h required for Theorem 8.2 is obtained thus:
  f(x) := ln(C_m·e^{mx} + … + C₃·e^{3x} + C₂·e^{2x} + C₁·eˣ) − ln(C₀ + c₁·e^{−x} + c₂·e^{−2x} + c₃·e^{−3x} + … + c_k·e^{−kx}),
with the same coefficients as before. The convexity of g and h is less obvious than before; Q is all the real axis. Because this f(x) is so nearly linear when |x| is big, the iteration's dependence upon the initial guess x_0 has become so mild that a crude guess provably suffices. END OF EXAMPLE 8.4.

The hypotheses of Theorem 8.2 and Corollary 8.3 are the weakest global conditions known to be sufficient to prevent Newton's iteration from meandering forever. Their hypotheses suffice also to prevent Secant iteration from meandering, as we shall see.

9. The Projective Connection between Newton's and Secant Iterations

Before proving that Theorem 8.2 and Corollary 8.3 above apply as well to Secant as to Newton's iteration, we explore a connection that the reader may have anticipated: roughly, if Newton's iteration converges to a simple zero of f, so does Secant iteration. This connection grows out of the straight lines, the tangents and secants, that figure in both iterations. The straightness of both kinds of lines is preserved by a family of Projective Maps of the plane to itself; consequently both iterations' convergence is invariant under these maps, as is the convexity hypothesis in Theorems 7.5 and 7.6 above. See Appendix A3, Projective Images, and especially Lemma A3.2, for details, of which very few will figure directly in what follows.

Lemma 9.1: An
Lemma 9.1 (An Intermediate Value): If S(u,w) := u − f(u)·(u − w)/(f(u) − f(w)) does not lie between u and w, i.e. if f(u)·f(w) > 0, and if f′(x) is finite throughout u ≤ x ≤ w, then at some v strictly between u and w either N(v) := v − f(v)/f′(v) = S(u,w) or f(v) = f′(v) = 0.

[Figure: the secant through (u, f(u)) and (w, f(w)) meets the x-axis at S(u,w), where the tangent to f at some v between u and w meets it too.]

Proof: There is a trivial case when u = v = w and S(u,w) = N(v). A different special case can arise with f(u) = f(w) ≠ 0; in this case S(u,w) = ∞ = N(v) at some v strictly between u and w where Rolle's theorem implies f′(v) = 0. The lemma generalizes this special case. For finite s := S(u,w) the proof is constructed from a projective map that preserves u and w but pushes s off to ∞. Then, like scaffolding under a newly built bridge, the projective map is removed to leave only a slender proof standing. Let φ(x) := f(x)/(s − x). Since s does not lie between u and w, φ(x) and φ′(x) are finite throughout u ≤ x ≤ w. And φ(u) = φ(w) because of how s was defined, so Rolle's theorem implies φ′(v) = 0 at some v strictly between u and w. φ′(v) = f′(v)/(s − v) + f(v)/(s − v)² = 0 implies that this v is where either N(v) = s or f(v) = f′(v) = 0. END OF PROOF

Digression: Where between u and w may the lemma's v fall? In general v need not be unique. However, in the special case that sign(f″(x)) stays constant throughout u < x < w, the aforementioned projective map can be used to show that the equation N(v) = S(u,w) has just one root v between u and w, and then the smaller the variation of log|f″(x)| the more closely can v be located. Its location is obtained from the first of Identities 7.3,

 f(S(u,w)) = (S(u,w) − u)·(S(u,w) − w)·fII(S(u,w), u, w) and f(N(v)) = (N(v) − v)²·fII(N(v), v, v) .

When f″ is nearly constant, so that f is nearly a quadratic polynomial, combining these identities with the equation N(v) = S(u,w) of Lemma 9.1 implies that its root

 v ≈ S(u,w) + sign(u − S(u,w))·√( (S(u,w) − u)·(S(u,w) − w)·fII(S(u,w), u, w)/fII(N(v), v, v) )
 ≈ S(u,w) + sign(u − S(u,w))·√( (S(u,w) − u)·(S(u,w) − w) ) .

END OF DIGRESSION

Lemma 9.1 joins Newton's iterating function N and the Secant's S by a bridge that breaks only over a zero of f across which f does not reverse sign; otherwise the
bridge bears a big load.

Theorem 9.2: Suppose f′ and N are continuous throughout a closed finite interval Q, strictly inside which f does not vanish without reversing sign there too. If Newton's iteration converges in Q from every initial x0 in Q, then it converges to the sole zero z of f in Q, and Secant iteration also converges in Q to z from every two starting points x0 and x1 in Q.

That Newton's iteration always converges within Q is an essential assumption independent of the others; see Non-Theorem 7.2 above. Unless z is an endpoint of Q, the assumption that f reverses sign across its zero z is essential; otherwise two consecutive Secant iterates astride z could send a third to ∞. The assumption that N is continuous is essential too; otherwise, as Example A3.3 shows, the theorem's "converges" would have to be replaced by a complicated assertion about convergent subsequences of iterates like the one in my report 1979'. This theorem was discovered in 1977, in time to affect decisions made during the design of the root-finder behind the SOLVE key on Hewlett-Packard handheld calculators beginning with the hp-34C described in my reprint 1979". The proof is long but, because it cannot now be found elsewhere, it is presented here despite its length.

Proof of Theorem: Because N maps Q continuously into itself (otherwise Newton's iteration could escape from Q), it must contain at least one fixed-point z = N(z), which has to be a zero of f. This zero z cannot be a subinterval of Q because f reverses sign at z. Another zero is ruled out by Rolle's theorem, which would imply a point between them where f′ would vanish and N would jump out of Q to ∞. In fact f′ cannot vanish in Q except perhaps at z; elsewhere f is strictly monotonic in Q. At the possible cost of replacing f by −f, we may assume that f is strictly increasing throughout Q. Finally, N satisfies all four conditions that U satisfies in Sharkovsky's No-swap Theorem 5.1 above. These conditions will figure at several places in the rest of the proof, which is presented
below as a sequence of shorter propositions.

Proposition 9.3: All Secant iterates x_{n+1} = S(x_n, x_{n−1}) stay in Q. This follows from Intermediate Value Lemma 9.1 above and the assumption that N stays in Q.

Proposition 9.4: For n ≥ 1 we might as well assume that every x_{n+1} ≠ x_n ≠ z. Otherwise nothing would be left to prove, because x_{n+1} = S(x_n, x_{n−1}) = x_n if and only if f(x_n) = 0, and hence x_{n+1} = x_n = z. The possibility that x1 = x0 and x2 = N(x1) is harmless.

Proposition 9.5: If a subsequence of differences x_{n+1} − x_n → 0, then for every integer k ≥ 0 fixed in advance the corresponding subsequence x_{n+k} → z. Since a divided difference lies between the minimum and maximum values taken by the derivative f′ on Q, the corresponding subsequence f(x_n) = (x_n − x_{n+1})·fI(x_n, x_{n−1}) → 0, and therefore x_n → z since f is strictly increasing; and then x_{n+1} → z too. For each n in the subsequence, Intermediate Value Lemma 9.1 implies that some y_n exists between x_n and x_{n+1} satisfying either x_{n+2} = y_n or x_{n+2} = N(y_n); either way the subsequence x_{n+2} → z too, because N is continuous. Repeat as often as necessary to infer that x_{n+k} → z. (Does the continuity of N then imply by itself that all x_{n+k} → z too, no matter how k varies with n?)

Definitions: A Variance is an iterate v_n = S(x_{n−1}, x_{n−2}) for which f(x_{n−1})/f(v_n) < 0; and then both z and x_{n+1} = S(v_n, x_{n−1}) must lie strictly between v_n and x_{n−1}. A Permanence is an iterate p_n = S(x_{n−1}, x_{n−2}) for which f(x_{n−1})/f(p_n) > 1; and then both z and x_{n+1} = S(p_n, x_{n−1}) must lie strictly on the side of p_n opposite from x_{n−1}. The Wraith of Permanence p_n is its nearest solution w_n of N(w_n) = x_{n+1} strictly between p_n and x_{n−1}; the existence of w_n is assured by Intermediate Value Lemma 9.1.

Proposition 9.6: For n ≥ 2 every iterate x_n = S(x_{n−1}, x_{n−2}) is a Permanence or a Variance. The possibility that 0 < f(x_{n−1})/f(x_n) < 1 is ruled out by the strictly increasing nature of f, as follows. For the sake of argument suppose 0 < f(x_{n−1}) < f(x_n). This supposition would imply z < x_{n−1} < x_n since f is
increasing, and then fI(x_{n−1}, x_{n−2}) = f(x_{n−1})/(x_{n−1} − x_n) < 0, which is contradictory. The other impossibility, 0 > f(x_{n−1}) > f(x_n), is dispatched similarly. Therefore every iterate x_n can be renamed either p_n or v_n.

Proposition 9.7: If two consecutive iterates v_n = S(x_{n−1}, x_{n−2}) and v_{n+1} = S(v_n, x_{n−1}) are both Variances, then v_n lies strictly between v_{n+1} and x_{n−1}, and then both x_{n+2} and z lie strictly between v_{n+1} and v_n, and also (v_n − x_{n−1})/(x_{n+2} − v_{n+1}) > 4.

[Figure: v_{n+1}, x_{n+2} and z lie between v_n and x_{n−1} on the real axis.]

Only the last inequality requires unobvious confirmation. The definition of Variance implies that f(x_{n−1})·f(v_n) < 0 and f(v_n)·f(v_{n+1}) < 0, so sign(f(v_{n+1})) = sign(f(x_{n−1})); and then, since f is monotonic, f(x_{n−1})/f(v_{n+1}) > 1 because v_{n+1} is closer to z than x_{n−1} is. Consequently

 (v_n − x_{n−1})/(x_{n+2} − v_{n+1}) = ( 1 − f(x_{n−1})/f(v_n) )·( 1 − f(v_n)/f(v_{n+1}) )
  ≥ 1 + 2·√( f(x_{n−1})/f(v_{n+1}) ) + f(x_{n−1})/f(v_{n+1}) > 4

as claimed.

Proposition 9.8: If among the Secant iterates x_n at most finitely many are Variances, or if at most finitely many are Permanences, then the iteration converges to z. If all but the first finitely many iterates are Permanences, they must converge monotonically in Q to something, and it must be z by Proposition 9.5. If all but the first finitely many iterates are Variances, then the subsequences x_{2n} and x_{2n+1} must ultimately converge monotonically in opposite directions with |x_{2n} − x_{2n+1}| → 0 at least as fast as 1/4ⁿ, thanks to Proposition 9.7, so the iteration converges to z as claimed.

Henceforth only those sequences x_n containing infinitely many Permanences and infinitely many Variances need be considered. Think of Permanences as punctuation marks separating strings of consecutive Variances. What matters most about such a string is whether its length is even or odd. Even lengths, including 0, will be treated first.

Proposition 9.9: If a Permanence p_n is followed by an even number 2k ≥ 0 of consecutive Variances v_{n+1}, v_{n+2}, …, v_{n+2k} before the next Permanence p_{n+2k+1}, then the
numbers

 x_{n−1}, w_n, p_n, v_{n+2}, v_{n+4}, …, v_{n+2k}, w_{n+2k+1}, p_{n+2k+1}, z, v_{n+2k−1}, …, v_{n+3}, v_{n+1}

are exhibited here in strictly monotonic order, perhaps reversed. If 2k = 0 then x_{n−1}, w_n, p_n, w_{n+1} and p_{n+1} lie on the same side of z. If 2k = 2 only v_{n+1} lies on the side of z opposite the other four iterates and two Wraiths. For 2k ≥ 2 this proposition follows from Proposition 9.7.

Proposition 9.10: If at most finitely many strings of Variances have odd lengths, the iterates x_n converge to z. Discard as many of the earliest iterates as necessary and renumber the rest to obtain a sequence of iterates x_{n+1} = S(x_n, x_{n−1}) in which no string of Variances has odd length. Proposition 9.9 implies that the Permanences and their immediately preceding iterates constitute a monotonic subsequence bounded by z. In other words, if the successive Permanences are p_{n1}, p_{n2}, p_{n3}, …, then

 x_{n1−1}, p_{n1}, x_{n2−1}, p_{n2}, x_{n3−1}, p_{n3}, …, z

are exhibited here in monotonic order, but perhaps not strictly so. This subsequence of iterates must converge, and by Proposition 9.5 it must converge to z. Recall now the Permanences' Wraiths: for j = 1, 2, 3, … each Wraith w_{nj} lies between x_{nj−1} and p_{nj} and satisfies N(w_{nj}) = x_{nj+1}. Evidently the Wraiths converge to z, and since N is continuous, so must the subsequence of iterates x_{n1+1}, x_{n2+1}, x_{n3+1}, … Among these lie all the initial Variances in strings of consecutive Variances, each string having nonzero even length. With the aid of Proposition 9.9 again we conclude that the Variances converge to z too. (Were N not continuous, the Variances might not all converge to z; see Example A3.3.)

Only the possibility that infinitely many strings of Variances have odd lengths remains to be addressed to complete the proof of Theorem 9.2. For this purpose we introduce three more Definitions: A Scout s_n is a Permanence followed by a string of Variances of odd length. A Guard g_{n+1} is the first Variance following (in the sequence of iterates) a Scout s_n. A Convoy is the set of Wraiths belonging to the Permanences that come
after (in the sequence of iterates) a Guard but not after its subsequent Scout.

For instance, if Scout s_n is followed by 2k+1 Variances g_{n+1}, v_{n+2}, …, v_{n+2k+1}, followed by Permanence p_{n+2k+2}, then the numbers

 x_{n−1}, w_n, s_n, v_{n+2}, …, v_{n+2k}, z, p_{n+2k+2}, w_{n+2k+2}, v_{n+2k+1}, …, v_{n+3}, g_{n+1}

appear here in monotonic order according to the definitions of Permanences, Variances and Wraiths. If 2k+1 = 1 then v_{n+2}, …, v_{n+2k} do not appear here. Only the desired zero z and the Wraiths w_n and w_{n+2k+2} are not iterates. That last Permanence p_{n+2k+2} might be a Scout too, or it might not. We have to confirm next that every Wraith belongs to a Convoy escorted by a Scout ranging ahead of it and a Guard bringing up the rear, and that alternate Convoys approach z from opposite sides.

Proposition 9.11: In the sequence of iterates suppose s_n and s_m are consecutive Scouts with m > n. Then m ≥ n+2, and the numbers x_{n−1}, w_n, g_{m+1}, z, s_m, w_m, g_{n+1} appear here in monotonic order; and the Convoy of Wraiths w_j for n < j ≤ m lie numerically between Guard g_{n+1} and the next Scout s_m, on the other side of which lie first z and then g_{m+1} and then w_n.

In the sequence of iterates, Scout s_n is followed by some odd number 2k+1 of Variances g_{n+1}, v_{n+2}, …, v_{n+2k+1}, followed by Permanence p_{n+2k+2}, followed perhaps by more strings of Variances of even lengths separated by Permanences, up to the Permanence-and-Scout s_m, followed by an odd number of Variances g_{m+1}, … How are all these numbers ordered numerically? It is easy to verify that

 x_{n−1}, w_n, s_n, v_{n+2}, …, v_{n+2k}, z, s_m, w_m, p_{n+2k+2}, w_{n+2k+2}, v_{n+2k+1}, …, g_{n+1}

appear here in monotonic order, except that if m = n+2k+2 then p_{n+2k+2} and w_{n+2k+2} are redundant and should be dropped. If m > n+2k+2 then every string of Variances between (in the sequence of iterates) p_{n+2k+2} and s_m has even length, so Proposition 9.9 ensures that every Permanence after (in the sequence of iterates) g_{n+1} but not after s_m has its Wraith strictly between the Guard g_{n+1} and the Scout s_m of this Convoy of Wraiths, all on the side of s_m opposite z. Moreover, w_m is this Convoy's Wraith nearest z. The Guard g_{m+1}
following Scout s_m falls somewhere on the other side of z; where? Here Intermediate Value Lemma 9.1 combines with Sharkovsky's No-Swap Theorem 5.1 to explain why this new Guard g_{m+1} must come between z and the previous Convoy's Wraith w_n nearest z. If that were not so, if g_{m+1} fell on the side of w_n opposite z, then the numbers

 g_{m+1} = N(w_m), w_n, z, w_m, g_{n+1} = N(w_n)

would appear here in monotonic order and violate the No Crossover Condition that N must satisfy if Newton's iteration is to converge to z from every x0 in Q. Thus has every claim in Proposition 9.11 been vindicated, and without saying which of g_{m+1} and s_n lies between the other and z; it is impossible to say, and does not matter. What matters is that alternate Convoys of Wraiths proceed monotonically towards z from opposite sides; on each side every Convoy is separated by its Guard from the preceding Convoy on the same side.

Proposition 9.12: If the sequence of Secant iterates contains infinitely many Guards, then the Secant iteration converges to the desired zero z. Let g≪ ≤ z be the least upper bound for those Convoys and their Guards less than z; they constitute a subsequence of iterates and Wraiths converging monotonically upward to g≪. Similarly for the greatest lower bound g≫ ≥ z. For every Guard g_n > g≫ there is a Wraith w_{n−1} < g≪ for which g_n = N(w_{n−1}); as the subsequence w_{n−1} → g≪, the corresponding subsequence g_n = N(w_{n−1}) → g≫, and so, because N is continuous, g≫ = N(g≪). Similarly N(g≫) = g≪. Now the No Swap Condition satisfied by N implies that g≫ = g≪ = z. Then all the Wraiths must converge to z, pushing their Permanences, including the Scouts, ahead of them to converge to z also. Then Permanences and Guards squeeze the rest of the Variances to converge too.

Propositions 9.8, 9.10 and 9.12 leave no alternative but convergence for the Secant iteration, and hence prove Theorem 9.2. END OF PROOF
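Digression: Lemma 9.1, which carried much of the proof above, is easy to check numerically. The sketch below is ours, with the function f(x) = x² − 2 and the interval [2, 3] chosen only for illustration (there f(u)·f(w) > 0, so S(u,w) falls outside the interval); bisection then locates the v strictly between u and w at which N(v) = S(u,w), as the lemma promises.

```python
def f(x):  return x * x - 2.0            # f(u)*f(w) > 0 on [2, 3]
def fp(x): return 2.0 * x

def S(u, w):                             # Secant iterating function
    return u - f(u) * (u - w) / (f(u) - f(w))

def N(v):                                # Newton iterating function
    return v - f(v) / fp(v)

def lemma_v(u, w, s):
    # Bisect g(v) = N(v) - s for a root strictly between u and w.
    a, b = u, w
    ga = N(a) - s
    for _ in range(200):
        v = 0.5 * (a + b)
        gv = N(v) - s
        if ga * gv <= 0.0:
            b = v
        else:
            a, ga = v, gv
    return 0.5 * (a + b)

s = S(2.0, 3.0)        # s = 1.6 lies outside [2, 3], as the lemma requires
v = lemma_v(2.0, 3.0, s)
```

Here N(v) − s changes sign across [2, 3], so the bisection is legitimate; the computed v is roughly 2.348, strictly between u and w. END OF DIGRESSION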
Note that Theorem 9.2 just proved has no converse: in many situations Secant iteration converges from all starting points but Newton's does not. f(x) = 5x⁵ − 18x³ + 45x is a strongly monotonic (f′ > 15.84) example for which Secant iteration always converges but Newton's iteration gets trapped when 1 ≤ |x_n| < 1.076570927…, as we have already seen after Theorem 7.6; Proposition 9.7 prevents Secant iteration from meandering in this example. Another example is f(x) = arctan(x), discussed after Corollary 8.3, where we saw that Newton's iteration converges if started between ±1.3917452… but diverges otherwise. Apparently Secant iteration converges if started anywhere in a wider interval between about ±2.25, but can cycle on four points x_{4n} = 4.75048222…, x_{4n+1} = 1.12143673…, x_{4n+2} = −x_{4n} and x_{4n+3} = −x_{4n+1}, and certainly diverges from starting points both greater than about 2.5.

Theorem 9.2 shows how slightly an ability to solve f(z) = 0 depends upon the computability of the derivative f′(x). This is not to say that Secant iteration obsoletes Newton's. Instead the theorem simplifies the choice between them. Secant iteration is preferable to Newton's when
 • computing the derivative f′ adds more than about 44% to the cost of computing f, and
 • the desired zero z is one across which f reverses sign, and
 • the desired accuracy requires at least several iterations, and
 • the contribution of roundoff to f is not so bad that its effect has to be minimized.

The last consideration arises out of Secant iteration's greater susceptibility than Newton's to roundoff, especially if its contribution has been seriously underestimated. If roundoff has been assessed reasonably well, and if iteration can be stopped as soon as the computed value of |f| drops below or near its uncertainty due to roundoff, that last consideration becomes unimportant. Anyway, the global convergence properties of the two iterations rarely provide a strong reason to prefer one over the other.
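Digression: the quintic example above is easy to replay; the sketch below is ours, with starting points chosen for illustration. Newton's iteration from x0 = 1 is trapped exactly, since f(1)/f′(1) = 32/16 = 2 sends 1 to −1 and back forever, while Secant iteration from nearby starting points settles onto the zero z = 0.

```python
def f(x):  return 5.0 * x**5 - 18.0 * x**3 + 45.0 * x
def fp(x): return 25.0 * x**4 - 54.0 * x**2 + 45.0

# Newton's iteration from x0 = 1 swaps between +1 and -1 forever.
xn = 1.0
for _ in range(20):
    xn = xn - f(xn) / fp(xn)

# Secant iteration from 1.0 and 0.8 converges to z = 0.
x_prev, x = 1.0, 0.8
for _ in range(60):
    if f(x) == f(x_prev):
        break
    x_prev, x = x, x - f(x) * (x - x_prev) / (f(x) - f(x_prev))
```

After twenty Newton steps |xn| is still exactly 1, while the Secant iterates have long since collapsed onto 0, illustrating the absence of a converse to Theorem 9.2. END OF DIGRESSION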
Finally, Theorem 9.2 also contributes to Theorem 8.2 another corollary that is easy to prove.

Corollary 9.13: Suppose f is continuous and sum-topped throughout a closed interval Q; or suppose f = g − h is a differentiable difference between two convex functions, one non-decreasing and the other non-increasing, throughout a closed interval Q. Then Secant iteration x_{n+1} = x_n − f(x_n)/fI(x_n, x_{n−1}), started from any x0 and x1 in Q, either converges in Q to the zero z of f or leaves Q; the iteration cannot meander in Q endlessly.

10. Accelerated Convergence to Clustered Zeros

Where do multiple zeros come from? They would be extremely rare if the equations we solve were chosen at random; multiple zeros z imply an unlikely coincidence f′(z) = f(z) = 0. Since they are not so rare, their sources must be systematic. One such source is optimization. Suppose we wish to minimize the largest root z(p) of an equation F(p, z) = 0 containing a parameter p. Values p at which dz/dp = 0 are candidates, but they need not yield the desired minimum. It may occur when the two largest roots coincide, as is the case for F(p, z) = z² − (p + 1/p)·z + 1; its optimal p = 1. For near-optimal values of p the two largest roots nearly coincide.

Where else may clustered zeros come from? Consider an analytic function f(x) with several real and complex zeros z1, z2, …, zm in a region G in the complex plane, and suppose that G lies deep inside a far larger region C that contains no other zeros nor singularities of f. Let the average of those zeros be µ := Σ_j z_j/m; then f(x) and its derivative must closely resemble another analytic function F(x) := f(x)·(x − µ)^m/Π_j (x − z_j) and its derivative at all x in C far enough from G. For all such x their respective Newton's iterating functions N(x) = x − 1/(f′(x)/f(x)) and

 x − 1/( f′(x)/f(x) + m/(x − µ) − Σ_j 1/(x − z_j) )

must resemble each other closely too. In other words, to Newton's iterating function any collection of several zeros may appear, from far enough away, like clustered zeros practically indistinguishable from a
multiple zero at that distance. We have seen already, before and during Theorem 7.6, that convergence to a multiple zero can be slow. Consequently we should expect convergence to a cluster from afar to be retarded too. Usually it is, but not always.

Take f(x) = 3e^x − e³·x for example. All its zeros are simple. Two of them, z = 0.17856… and Z = 3, are real, and infinitely many are complex, falling not far from 3 + ln(2kπ/3) ± i·(2k + 1/2)·π for positive integers k. From any x0 < 1, Newton's iteration x_{n+1} = x_n − f(x_n)/f′(x_n) converges to z almost immediately, because z − 0.003 < x2 < z no matter how huge and negative x0 is. From any big x0 > Z, Newton's iteration converges to Z slowly at first, taking about x0 iterations to get between Z and Z + 0.001, because x_{n+1} ≈ x_n − 1 for a while. Thus, from far away on the positive but not the negative real axis, z and Z look to Newton's iteration like roots of infinite multiplicity towards which it must move very slowly. A simple way to cure this lethargy is to replace f(x) by x − 3 − ln(x/3), which has the same real zeros but none complex.

In general lethargic convergence has no simple cure. And when found, a cure rarely saves much time. No matter how slowly Newton's or Secant iterates x_n converge, usually 2ⁿ·f(x_n) → 0 because of Theorem 7.6. Then f(x_n) → 0 so fast that it must soon fall below the threshold of rounding-error noise in f, or else below the computer's underflow threshold. Since the amount of time that can be saved is usually limited, no cure for lethargic convergence is worthwhile if it adds much to the cost of Newton's or Secant iteration; neither is a cure satisfactory if it spawns disagreeable consequences like convergence to an undesired zero.

When the multiplicity m > 1 of a desired zero z of f is known, superlinear convergence can be achieved by applying Newton's or Secant iteration to |f|^{1/m}·sign(f) instead of f; then Newton's iteration takes the form x_{n+1} = x_n − m·f(x_n)/f′(x_n). However, z is
usually computable more accurately as a simple zero of the derivative f^(m−1), if this can be computed. Could m be known but not f^(m−1)? Perhaps that's why an m > 1 is usually unknown. An unknown m known to exceed 1 is probably 2, since larger multiplicities are extremely unlikely.

An unknown multiplicity can be estimated. For instance, if z is a zero of f with multiplicity m, then 1/( 1/(ln|f(x)|)′ )′ → m as x → z. This appears to require the computation of f″, but that can be circumvented by introducing a multiplicity estimate m_n into an accelerated version of Newton's iteration thus:

 m_n := Max{ 1, Integer Nearest (x_n − x_{n−1})/( f(x_n)/f′(x_n) − f(x_{n−1})/f′(x_{n−1}) ) } ;
 x_{n+1} := x_n − m_n·f(x_n)/f′(x_n) .

When x_n converges to a zero z of an analytic function f(x), it converges at least quadratically, and m_n converges to the zero's multiplicity, which must be an integer. This convergence is faster than if Secant iteration had been applied to f(x)/f′(x), of which z is a simple zero. From far enough away, however, a cluster of zeros, complex as well as real, of f can appear so much like a multiple zero to Newton's iteration that m_n may actually approximate the number of zeros in the cluster. Only if and when iterates approach z can its own lower multiplicity m become manifest. Alas, the first few accelerated iteration steps can overshoot the zero nearest the starting point too easily, after which subsequent iterates may diverge, or converge to a zero other than the one desired, especially if an extremal real zero was desired. Take f(x) = 3e^x − e³·x for example again: starting from x0 > 5, the foregoing acceleration scheme practically always skips over the larger zero Z = 3 and converges to the smaller zero z = 0.17856… In general no way is known to moderate the growth of m_n so as to prevent this kind of undesired overshoot in all cases.

There is a special but common case that can be accelerated modestly without overshoot. Define N(x) := x − f(x)/f′(x), Newton's iterating function, and W(x) := x − 2·f(x)/f′(x), the Doubled-Newton iterating function. This W(x) can be iterated with no harm from overshoot in the circumstances described next.
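Digression: the doubled step W and its escape back to N can be sketched as code. Everything below is our illustrative sketch, not the calculators' root-finder: iterate x ← W(x) until f reverses sign, then finish with plain Newton steps x ← N(x); the trial function is the running example f(x) = 3e^x − e³·x with x0 = 8.

```python
import math

E3 = math.exp(3.0)
def f(x):  return 3.0 * math.exp(x) - E3 * x   # real zeros near 0.17856 and 3
def fp(x): return 3.0 * math.exp(x) - E3

def doubled_newton(x, steps=60):
    # Iterate x <- W(x) = x - 2*f(x)/f'(x) until f reverses sign,
    # then revert to plain Newton steps x <- N(x) = x - f(x)/f'(x).
    fx = f(x)
    for _ in range(steps):
        w = x - 2.0 * fx / fp(x)
        fw = f(w)
        if fx * fw < 0.0:                      # overshot the zero: revert
            x = w
            for _ in range(steps):
                if fp(x) == 0.0 or f(x) == 0.0:
                    break
                x -= f(x) / fp(x)
            return x
        x, fx = w, fw
    return x

Z = doubled_newton(8.0)    # W descends 8 -> 6.03 -> 4.20 -> 2.91, then Newton
```

The W-steps descend roughly twice as fast as Newton's would from x0 = 8, overshoot Z = 3 only slightly at 2.91, and the Newton steps after reversion finish the job. END OF DIGRESSION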
Theorem 10.1: Suppose that f′(y) ≥ 0 ≥ f(y) at the left-hand end of a finite interval y ≤ x ≤ x0, throughout which f″ is a positive non-decreasing function; also assume f(x0) > 0. Then in that interval:
 1) The equation f(z) = 0 has just one root z ≥ y, and W(x) < N(x) when x > z.
 2) N(x) ≥ z for all x > y, and then W(x) > y unless W(x) = y = z.
 3) If x > z then N(W(x)) ≤ N(x), with equality only when f‴ ≡ 0.

Starting from x0 > z, this theorem justifies the following procedure: iterate x_n = W(x_{n−1}), thereby descending faster than Newton's iteration would, until x_n < z, detected when f(x_n) and f(x_{n−1}) have opposite signs, and then revert to x_{n+1} = N(x_n). Doing so locates the largest real zero of a polynomial whose other zeros, real and complex, all have lesser real parts; and locates the largest real zero Z = 3 of examples like f(x) = 3e^x − e³·x discussed above, usually faster than would Newton's iteration all the way. The theorem's clause 3 assures us that the last iterate x_n = W(x_{n−1}) before reversion to x_{n+1} = N(x_n) cannot jump past z so far, and hence so close to y, that the first Newton iterate x_{n+1} would jump way back beyond x_{n−1}; on the contrary, x_{n+1} comes closer to z than N(x_{n−1}) would come.

I first proved this theorem in the early 1960's, but rather differently, for the special case of a polynomial p(x) whose (at least two) zeros are all real, the largest being z; one of f = ±p or f = ±p′ can easily be shown to satisfy the hypotheses of this theorem. A proof for this polynomial case can be found in Stoer & Bulirsch 1980; for this case parts 1 and 2 of the theorem had been quoted by Jim Wilkinson 1965, who had learned them from Hans Maehly as well as me. Soon afterwards Werner Greub liberated the theorem from polynomials by suggesting that the crucial hypothesis was merely f‴(x) ≥ 0, from which the following proof evolved.

Proof of Theorem 10.1: 1) As x − y increases through positive values so does f′(x), because f″ > 0. Therefore at
most one root z ≥ y can exist in the given interval; f(x) increases through 0 to a positive value f(x0) as x increases from y to x0, so z ≥ y does exist in the interval. And then obviously N(x) − W(x) = f(x)/f′(x) > 0 for all x > z therein.

2) N′(x) = f(x)·f″(x)/f′(x)² has the same sign as x − z if x > y; therefore the minimum value taken by N(x) is N(z) = z. A non-decreasing derivative is a continuous and therefore integrable derivative, and f″(x) is non-decreasing as x increases beyond y, so

 0 ≤ ∫_y^x ∫_y^u ( f″(u) − f″(v) ) dv du = (x − y)·( f′(x) + f′(y) ) − 2·f(x) + 2·f(y) .

This implies W(x) ≥ y − 2·f(y)/f′(x) ≥ y too, with strict inequality unless y = z, in which special case Theorem 7.5 above implies N(x) → z and W(x) → z as x → z. In the further special case of a quadratic f (constant f″ > 0) with a double zero y = z, we find W(x) ≡ z.

3) When z ≤ W(x) < x the inequality N(W(x)) ≤ N(x) is obvious, but proving it is hard when y < W(x) < z < x. The proof might be easier if N(x) − N(W(x)) = f(x)/f′(x) + f(W(x))/f′(W(x)) increased monotonically, but it needn't; for example, try f(x) = x²·(x² + 2)/4 − 3/4. Worse, N(x) − N(W(x)) vanishes like O((x − z)³), so three differentiations or integrations would be needed to infer the desired inequality from an hypothesis like f‴ ≥ 0. We shall simplify the work a little by proving that

 ( N(x) − N(W(x)) )·f′(W(x)) = f′(W(x))·f(x)/f′(x) + f(W(x)) ≥ 0 .

To exploit the symmetry of W(x) and x about N(x), let us abbreviate Q := f(x)/f′(x), n := N(x) = x − Q and w := W(x) = n − Q; then f″(n+q) − f″(n−q) ≥ 0 when 0 ≤ q ≤ Q, because f″ is non-decreasing. Integrate:

 0 ≤ ∫_q^Q ( f″(n+p) − f″(n−p) ) dp = f′(x) − f′(n+q) − f′(n−q) + f′(w) .
iterates of W approach 2 faster than correspondingly numbered iterates of N would because N39X gt 0 for X gt z see the proof of 2 above so once the iterates of W get closer to 2 than the iterates of N they stay closer How much faster do iterates of W descend than iterates of N would Because WX S NNX if X is close enough to z at y and usually for all X gt z in the interval W usually descends at least twice as fast as N until 2 is overstepped But not always fX XlX in the interval 2 0 lt X lt X0 lt 1 is a countereXample with WX lt NNX for X lt 21 WS 0618 but not otherwise Another eXample fX eX for oo y z lt X lt X0 lt 00 has WX NNX X2 A more nearly typical eXample is fX 3eX e3 X for which iterates descend from X0 8 thus Table 1 For fX 3eX e3 X Iterates of N Iterates of W X0 8 8 X1 7015757 6031524 X2 6052129 4195981 X3 5132988 2912537 X4 4302929 3006191 X5 3631900 3000029 X6 3198687 3 X7 3025447 X8 3000476 X9 3000000 While it converges faster than Newton s iteration the doubled iteration Xn1 WXn can still converge arbitrarily slowly to a multiple zero but its values fXn tend to zero usually more than twice as fast as Newton s would and always at least twice as fast as Theorem 76 describes Corollary 102 Assume the hypotheses of Theorem 101 again and suppose also that the procedure described just after it is followed Then either the iteration ultimately reverts to Newton s and converges quadratically or else the doubled iteration Xn WXn1 converges slower but monotonically and fXn tends monotonically to 0 at least so fast that 2114 lfXIQl S lfX0l X0 ZX0 X1 Work in Progress NOT READY FOR DISTRIBUTION Page 4663 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am Proof Either y at z or y z If y at 2 then f z 0 and ultimately xn Wxn1 falls upon 2 and stops or falls between 2 and y In the latter event the iteration reverts to Newton s iteration which after stepping backward once to Nxn between 2 and Nxn1 converges monotonically according to Theorem 75 and 
quadratically according to Theorem 7.4. If y = z then f′(z) = f(z) = 0, and the doubled iteration x_{n+1} = W(x_n) converges monotonically towards z. This iteration is the same as Newton's applied to solve the equation √f(z) = 0. Is √f convex? To find out, consider the Riemann–Stieltjes integral ∫ f df″, which exists since f″ is non-decreasing. If x > z then

 0 ≤ ∫_z^x 2·f(t) df″(t) = 2·f″(x)·f(x) − f′(x)² = 4·√f(x)³·(√f(x))″ .

Therefore √f satisfies the convexity hypothesis f satisfied in Theorem 7.6, whence follows its conclusion for √f, which is this Corollary's inequality. END OF PROOF

The doubled Newton iteration x_{n+1} = W(x_n) works so well in the circumstances for which it was intended that it encourages us to double Secant iteration too. There are two ways to double the Secant iterating function S(x, X) := x − f(x)/fI(x, X) = x − f(x)·(x − X)/(f(x) − f(X)). One is the obvious way, x − 2·f(x)/fI(x, X). The unobvious way applies Secant iteration to the equation √f = 0 to get an iterating function

 R(x, X) := x − (X − x)/( √( f(X)/f(x) ) − 1 ) .

Both doubled iterations work. The latter is faster because, if z < x < X and so 0 < f(x) < f(X), then R(x, X) < x − 2·f(x)/fI(x, X) < S(x, X), as is easy to verify. Therefore we concentrate upon R.

The hypotheses of Theorem 10.1 are presumed to be still in force: f′(y) ≥ 0 ≥ f(y) at the left-hand end of a finite interval y ≤ x ≤ x0 throughout which f″ is a positive non-decreasing function; also f(x0) > 0, so the equation f(z) = 0 has just one root z ≥ y in that interval. The procedure that follows that theorem is now supplanted by this: starting from x0 > x1 > z, iterate x_n = R(x_{n−1}, x_{n−2}), thereby descending faster than Secant iteration would, until x_n < z, detected when f(x_{n−1})·f(x_n) < 0; and then revert to x_{n+1} = S(x_n, x_{n−1}). Once again, as in Theorem 10.1 part 3, we seek reassurance that the last doubled iterate x_n cannot overshoot z so far as might set subsequent Secant iterates back behind S(x_{n−1}, x_{n−2}).

Conjecture 10.3: Assume the hypotheses of Theorem 10.1 again, and also the definitions of S and R; then S( S(R(u,w), u), R(u,w) ) ≤ S(u,w) if z < u < w.

Discussion: Intermediate Value Lemma 9.1 lets us define v(u,w) to lie strictly between u
and w, and satisfy N(v(u,w)) = S(u,w) > z, whenever either y ≤ w < u < z or z < u < w. This v is defined uniquely because N(x) is monotone decreasing when y ≤ x ≤ z, increasing when x ≥ z. THERE ARE NUMEROUS DETAILS STILL TO BE SUPPLIED HERE.

Corollaries: Convergence of f(x_n) to 0 is faster than 1/3ⁿ for S, or 1/4ⁿ for R.

Example:

 Table 2: For f(x) = 3e^x − e³·x
      Iterates of S   Iterates of R
 x0   9               9
 x1   8               8
 x2   7.427732        6.479176
 x3   6.706504        5.205631
 x4   6.057988        4.250802
 x5   5.407616        3.166181
 x6   4.800048        2.812781
 x7   4.243171        2.976582
 x8   3.766543        3.003603
 x9   3.396594        2.999936
 x10  3.154697        3.000000
 x11  3.037465        3
 x12  3.004024
 x13  3.000111
 x14  3.000000

Work still to be written out:

11. All Real Zeros of a Real Polynomial: Finding only the real zeros of a real polynomial of high degree is applicable to Tarski resolution of rational inequalities, geometrical computation, and construction of numerical ODE formulas. Sturm Sequences (Turnbull 1952) are costly to compute, or vulnerable to roundoff, or both. A better way, using Rolle's Theorem and running error-bounds, is attractive when the real zeros are far fewer than the polynomial's degree, as is usually the case.

12. Zeros of a Real Cubic: How to find the zeros of a real cubic quickly and accurately using Newton's iteration from an artfully chosen starting guess.

13. Error Bounds for Computed Roots using §A5's Running Error Bounds.

Conclusion

These notes were written at the behest of two mathematicians who inhabit my body. The pure mathematician savors surprises. The applied mathematician tries to avoid them by predicting how computational procedures will behave. Both mathematicians rejoice when they prove a procedure to be surprisingly predictable. But the latter's joy must be short-lived, for two reasons. First, compared with the procedures they explain, our proofs are too long;
they augur ill for our understanding of more complicated procedures. Second, more complicated procedures will arise inevitably from attempts to circumvent limitations in the simple procedures we have come to understand at last. Thus these notes contain the seeds of their own obsolescence. (We say "mature" when we wish to avoid the pejorative "obsolescent".) The material in these notes will soon be mature, if it isn't already. The corresponding material in most textbooks is too mature. Bringing textbooks up to date is a formidable challenge compounded by limitations upon space and time, both the author's and the readers'. Until a brave author rises to this challenge, the burden of these notes will continue to be added to my students' load. They and I pray that their load will be lightened soon.

Surely Sharkovsky's Theorem 5.1 deserves to appear in texts. So does Corollary 8.3, and an example of its application, if not also Theorem 8.2, because they suggest how to reformulate equations to make them easier to solve by Newton's and Secant iteration. Theorem 9.2 deserves at least a footnote, more if someone finds a shorter proof, because it justifies the use of Secant iteration instead of Newton's. Error analysis, dull but necessary, deserves more space in texts too; without it, who can tell when to quit iterating, or how much the result is worth?

A1. Appendix: Divided Differences, Briefly

This topic is discussed at length in Numerical Analysis texts like Conte & de Boor 1980, but usually in the context of Interpolation, and always in a different notation. For an ancient subject the persistence of diverse notations suggests that none are satisfactory, and licenses us to introduce another notation more nearly analogous to a widely used notation for derivatives. Inspired by formulas attributed to Hermite, we define, for any sufficiently smoothly differentiable function f(x), its First Divided Difference fI(u,w) :=
:= ∫₀¹ f′(u + (w−u)·t) dt, and its Second Divided Difference f^II(u,v,w) := ∫₀¹ ∫₀ᵗ f″(u + (v−u)·t + (w−v)·s) ds dt. For positive integers k generally, the kth divided difference is the uniformly weighted average of the kth derivative over a simplex, the convex hull of k+1 arguments, then divided by k! ; however k > 2 will not be needed in these notes. In general the argument x of f(x) could be a vector, but in these notes it will almost always be a real scalar. Then that simplex, the convex hull of the k+1 real arguments, degenerates into an interval of the real x-axis over which the kth divided difference becomes a positively (not necessarily uniformly) weighted average of the kth derivative divided by k!. For instance, if u < v < w then it follows that
 f^II(u,v,w) = ( ∫ᵤᵛ (ξ−u)·f″(ξ) dξ/(v−u) + ∫ᵥʷ (w−ξ)·f″(ξ) dξ/(w−v) )/(w−u) .
Because it is an average, the kth divided difference lies between the largest and least values taken in that interval by the kth derivative divided by k!. This Mean Value property figures in nearly all applications of divided differences in these lecture notes. Divided differences turn up elsewhere as coefficients in Newton's Interpolating Polynomials (which see below), or during root-finding or optimization, or when differential equations are solved using finite differences. Because the argument x of f(x) is a scalar, the foregoing integrals can always be simplified into expressions with no integral signs. For instance,
 f^I(u,w) = (f(u) − f(w))/(u − w) if w ≠ u ,
  = f′(u) if w = u ;
 f^I(w,u) = f^I(u,w) (arguments' order doesn't matter);
  = f′(v) at some v strictly between u and w if they are unequal.
The first two equations above constitute an alternative definition of f^I in so far as they describe it independently of whether f′ exists strictly between u and w; and then the last equation turns out to be valid so long as f′(x) does exist at every x strictly between u and w and f(x) is continuous at u and w, even if f′ is not integrable. Similarly the next two lines describe, or alternatively define, f^II independently of whether f″ exists:
 f^II(u,v,w) = (f^I(u,v) − f^I(v,w))/(u − w) if w ≠ u ,
  = ∂f^I(u,v)/∂u = (f′(u) − f^I(u,v))/(u − v) if w = u ≠ v ,
 = f(u)/((u−v)·(u−w)) + f(v)/((v−w)·(v−u)) + f(w)/((w−u)·(w−v)) if u ≠ v ≠ w ≠ u ;
 f^II(v,w,u) = f^II(u,w,v) = f^II(u,v,w) (arguments' order doesn't matter);
  = f″(y)/2 at some y between min(u,v,w) and max(u,v,w) if f″ exists.
Don't confuse f^II(u,v,w) with (f′)^I(u,w) = (f′(u) − f′(w))/(u−w) = f^II(u,u,w) + f^II(u,w,w).

Strictly speaking we should write f^I({u,w}) instead of f^I(u,w), because it is best construed as a function of an unordered pair {u,w} that replaces the single argument x of f(x); similarly we should write f^II({u,v,w}) instead of f^II(u,v,w). The extra braces are superfluous in divided differences of functions of one argument, but a necessary nuisance in partial divided differences of functions of more than one argument. For instance, given any function g(x,y) of two scalar arguments, we must distinguish g^I({u,w}, y) = (g(u,y) − g(w,y))/(u−w) from g^I(x, {u,w}) = (g(x,u) − g(x,w))/(u−w) by our placement of the braces, to show which argument was split into a pair; ∂g(x,y)/∂x = g^I({x,x}, y) and ∂g(x,y)/∂y = g^I(x, {y,y}) are distinguished by the same imperative. Similarly the mixed partial divided difference g^{I,I}({u,w}, {v,s}) := (g(u,v) − g(w,v) − g(u,s) + g(w,s))/((u−w)·(v−s)) has to be distinguished from g^II(x, {u,v,w}) := g(x,u)/((u−v)·(u−w)) + g(x,v)/((v−w)·(v−u)) + g(x,w)/((w−u)·(w−v)), much as we distinguish ∂²g/∂x∂y = g^{I,I}({x,x}, {y,y}) from ∂²g/∂y² = 2·g^II(x, {y,y,y}). The factor 2 will be vindicated in a moment; and if discontinuity invalidates ∂²g/∂x∂y = ∂²g/∂y∂x, it may render g^{I,I}({x,x}, {y,y}) ambiguously dependent upon the order of limiting processes.

Return to functions f(x) of one argument. A composed function f(x) = h(p(x)) has a derivative f′(x) = h′(p(x))·p′(x) derived from a Chain Rule that works analogously for divided differences: f^I(u,w) = h^I(p(u), p(w))·p^I(u,w). And just as derivatives compound to form higher-order derivatives like f″(x) = (f′(x))′, divided-difference operations compound to form higher-order divided differences; for instance the alternative definition of f^II above amounts to saying that every second divided difference is a first divided difference of a first divided difference, in as many as three ways. Since derivatives are limiting values
of divided differences, ∂f^I(u,w)/∂u = f^II(u,u,w) and ∂f^I(u,w)/∂w = f^II(u,w,w), provided the derivatives in question exist. Setting u = v = w vindicates the factor 2 in f″(v) = d f^I(v,v)/dv = f^II(v,v,v) + f^II(v,v,v) = 2·f^II(v,v,v).

Like differentiation, divided differencing maps certain families of functions into themselves. Divided differences of polynomials are polynomials, albeit with more arguments. Divided differences of rational functions of scalar arguments are rational; likewise algebraic. Irrational algebraic functions are handled by implicit divided differencing, just like implicit differentiation and derived in the same way from the Chain Rule. With the aid of that rule, any algorithm that computes an algebraic function f(x) can be expanded mechanically into a similar algorithm that computes the divided difference f^I(u,w) = (f(u) − f(w))/(u−w) at almost the same cost as computing f(u) and f(w), but without ever dividing by u−w. A simple example is √^I(u,w) = 1/(√u + √w). Ideally such expansions should be performed on request by computerized algebra software like Derive, Macsyma, Maple and Mathematica, which ought to manipulate divided differences as well as derivatives; but they don't. Consequently the computing public remains largely unable to exploit a valuable but little-known application of divided differences, namely the suppression of numerical instability attributable to systematic cancellation.

Many a numerical computation turns out to be the computation of a divided difference in disguise. Attempts to compute f^I(u,w) naively from the obvious formula (f(u) − f(w))/(u−w) can be thwarted by roundoff and then cancellation when u is too near w. If nonzero, the divisor u−w is no problem because its cancellation occurs without error; but the computed value of f(u) is generally rounded to, say, f(u) + Δf(u), and therefore the value computed naively for f^I(u,w), when f(u) and f(w) would mostly cancel, turns out to be f^I(u,w) + (Δf(u) − Δf(w))/(u−w) instead, overwhelmed by the last quotient if u−w
barely exceeds roundoff. For example, consider the solution of a quadratic equation A·z² − 2B·z + C = 0. Its solutions z are (B ± √(B² − A·C))/A. The solution of smaller magnitude, z = C·sign(B)·√^I(B², B² − A·C), is vulnerable to roundoff and cancellation when |A·C| << B², unless the divided difference √^I is expanded, as was mentioned above, to yield z = C/(B + sign(B)·√(B² − A·C)), which stays accurate if |A·C| << B². Sometimes the accuracy of transcendental expressions can be insulated from cancellation with the aid of ancient formulas motivated by divided differences; for example, tan^I(u,w) = (tan(u) − tan(w))/(u−w) is best computed from the formula tan^I(u,w) = (1 + tan(u)·tan(w))·tan^I(u−w, 0) when u nearly equals w. Sometimes an inverse divided difference can render cancellation harmless. For instance, because ln^I(v,1) = ln(v)/(v−1) does not suffer from cancellation when v nearly equals 1, the computation of exp^I(u,0) = (exp(u) − 1)/u can be protected from cancellation in the numerator by the use of the formula exp^I(u,0) = 1/ln^I(exp(u), 1) instead.

These transcendental examples work because they exploit the few occasions when transcendental functions take simple rational values at rational arguments. In general, transcendental functions afflict divided differences, but not derivatives, in two ways. First, many transcendental functions have simple (perhaps algebraic) derivatives but no simple expanded divided differences undefiled by cancellation; for example d²ln(v)/dv² = −1/v², but no known simple finite formula for ln^II(u,v,w) stays accurate no matter how u, v and w approach each other. Secondly, the divided difference of a non-polynomial rational function of a vector argument generally involves logarithms and/or arctangents. For example, let column vectors x = [y; z] and u = [v; w], and let f(x) = y/z; then its derivative f′(x) = [1/z, −y/z²] is a rational row vector, but Hermite's formula for its first divided difference yields a transcendental f^I(x,u) whose components involve ln^I(z,w) and the second divided differences ln^II(z,z,w) and ln^II(z,w,w) of the logarithm.

Newton's Interpolating Polynomials approximate functions of scalar or vector arguments:
 f(x) = f(u) + f^I(u,x)·(x−u)
  = f(u) + f^I(u,v)·(x−u) + f^II(u,v,x)·(x−v)·(x−u)
  = f(u) + f^I(u,v)·(x−u) + f^II(u,v,w)·(x−v)·(x−u) + f^III(u,v,w,x)·(x−w)·(x−v)·(x−u) , etc. The
polynomial in x obtained by substituting 0 for f^III interpolates (matches) f(x) at x = u, x = v and x = w; elsewhere it differs from f(x) by a remainder term f‴(y)·(x−w)·(x−v)·(x−u)/6, in which y falls somewhere inside the convex hull of {u, v, w, x}. Interpolation is osculatory if two of u, v, w coincide. This polynomial's degree is minimal only for a scalar argument x.

A2. Appendix: Functions of Restrained Variation
This digression concerns a way to sum an undulating function's fluctuations. The Total Variation of a real function Q(x) over a closed finite interval u ≤ x ≤ w is defined to be V(u,w; Q) := ∫ᵤʷ |dQ(x)| = ∫ᵤʷ |Q′(x)| dx, though the last equation is valid only if |Q′(x)| exists and is integrable. In general, a function whose Total Variation over some interval is finite is called a function of Bounded Variation thereon. Such functions figure in Measure Theory, Stieltjes integrals and Fourier series. They can have none but jump discontinuities, and at most countably many of these (Bartle 1976); in particular a derivative of bounded variation must be a continuous derivative. Obviously V(u,w; Q) ≥ |Q(w) − Q(u)|, with equality just when Q is monotonic. Where Q(x) is continuous so is V(u,x; Q); wherever Q(x) jumps so do V(u,x; Q) and V(x,w; Q), and by the same amount, but the former always increases and the latter always decreases as x increases. Hence Total Variation is Additive over abutting sub-intervals: if u ≤ x ≤ w then V(u,x) + V(x,w) = V(u,w). It is a Semi-Norm because it satisfies the Triangle Inequality 0 ≤ V(u,w; P±Q) ≤ V(u,w; P) + V(u,w; Q). If V(u,w; Q) < ∞ and u ≤ x ≤ w, then Q(x) admits infinitely many Splittings into a difference Q = P − M between two nondecreasing functions P(x) := (R(x) + Q(x) + V(u,x; Q))/2 and M(x) := (R(x) − Q(x) − V(x,w; Q) + V(u,w; Q))/2, in which R can be any nondecreasing function. Conversely, any nondecreasing P and M determine both Q = P − M of bounded variation and the function R(x) := P(x) + M(x) − V(u,x; P−M) = P(x) + M(x) + V(x,w; P−M) − V(u,w; P−M) that appears in Q's splitting; this R is nondecreasing because P+M varies faster than P−M. If Q and R are
continuous, so are P and M, and vice-versa. If Q and R have integrable derivatives, so do P and M, and vice-versa. But when Q′ is so violently oscillatory that V(u,w; Q) = ∞, then Q is unsplittable, as are examples like Q(x) = x²·cos(1/x²) around 0. Among functions Q of bounded variation, the ones that will interest us have a splitting Q = P − M that is special because all three of Q, P and M have the same sign and keep it throughout the interval u ≤ x ≤ w. We shall call such a function Q a function of Restrained Variation.

Lemma A2.1: A Function of Restrained Variation Q can be split into a difference Q = P − M between two nondecreasing functions P and M, one nonnegative and the other nonpositive throughout the closed finite interval u ≤ x ≤ w, if and only if V(u,w; Q) ≤ |Q(u) + Q(w)|.

Proof: If necessary replace Q by −Q to get Q ≥ 0. If r := Q(u) + Q(w) − V(u,w; Q) ≥ 0, then choose any nondecreasing R ≥ 0, and P(u) ≥ 0 and M(w) ≤ 0, subject only to the constraint 2P(u) − 2M(w) + R(w) − R(u) = r, and construct functions P(x) := P(u) + (V(u,x; Q) + Q(x) − Q(u) + R(x) − R(u))/2 ≥ 0 and M(x) := M(w) − (V(x,w; Q) + Q(x) − Q(w) + R(w) − R(x))/2 ≤ 0; evidently they are nondecreasing and satisfy P(x) − M(x) = Q(x) too, as desired. If r = 0 this splitting is determined uniquely, with P(u) = M(w) = 0 and R(x) − R(u) = R(w) − R(x) = 0. On the other hand, if a splitting Q = P − M already exists with nondecreasing P and M and P ≥ 0 ≥ M, then this splitting also determines the nondecreasing R(x) := P(x) + M(x) − V(u,x; Q) as explained before the lemma; therefore 0 ≤ R(w) − R(u) = P(w) + M(w) − P(u) − M(u) − V(u,w; Q), whence V(u,w; Q) ≤ Q(w) + 2M(w) − 2P(u) + Q(u) ≤ Q(u) + Q(w), as claimed. END OF PROOF.

What are here called functions of restrained variation are also called tame by Aharoni et al. (1992), who characterized them by means of a discretized version of the foregoing lemma, which now shortens the proof of their characterization:

Lemma A2.2 (Tame Functions, Aharoni et al. 1992): Q(x) is a nonnegative function of restrained variation over the interval u ≤ x ≤ w if and only if Q(x₀) − Q(x₁) + Q(x₂) − ... − Q(x₂ₖ₋₁) + Q(x₂ₖ) ≥ 0 for every integer k ≥ 0 whenever
u ≤ x₀ ≤ x₁ ≤ x₂ ≤ ... ≤ x₂ₖ₋₁ ≤ x₂ₖ ≤ w.

Proof: If Q = P − M for some nondecreasing P ≥ 0 and M ≤ 0, then every alternating sum Q(x₀) − Q(x₁) + ... + Q(x₂ₖ) = P(x₀) + Σⱼ₌₁ᵏ (P(x₂ⱼ) − P(x₂ⱼ₋₁)) − Σⱼ₌₀ᵏ⁻¹ (M(x₂ⱼ₊₁) − M(x₂ⱼ)) − M(x₂ₖ) is nonnegative term-by-term, which confirms the lemma's "only if" part. Except for setting k = 0 to prove Q ≥ 0, the "if" part is harder to prove. Its proof is easier when Q(x) takes its locally extreme values at only finitely many points in the interval u ≤ x ≤ w, including its endpoints among them. Then we assign x₀ := u, x₂ₖ := w, and for 0 < j ≤ k we set all other x₂ⱼ to be all consecutive points where Q₂ⱼ := Q(x₂ⱼ) is locally minimal, and x₂ⱼ₋₁ to be all consecutive points where Q₂ⱼ₋₁ := Q(x₂ⱼ₋₁) is locally maximal; these points interlace, including possibly x₁ = u if Q₀ = Q(u) is locally maximal, and/or x₂ₖ₋₁ = w if Q₂ₖ = Q(w) is locally maximal. Because the lemma's alternating sums are all nonnegative, we soon find that Q(u) + Q(w) = Q₀ + Q₂ₖ ≥ (Q₁ − Q₀) + (Q₁ − Q₂) + (Q₃ − Q₂) + ... + (Q₂ₖ₋₁ − Q₂ₖ₋₂) + (Q₂ₖ₋₁ − Q₂ₖ) = V(u,w; Q). Applying Lemma A2.1 completes the proof for the case when Q has just finitely many extrema. When Q has infinitely many extrema the last equation is invalid, but salvaged by taking its left-hand side's supremum over all partitions u = x₀ ≤ x₁ ≤ x₂ ≤ ... ≤ x₂ₖ₋₁ ≤ x₂ₖ = w. END OF PROOF.

Restrained variation has only one consequence significant for Newton's or Secant iterations; it is the following corollary, whose now nearly obvious proof is left to the reader.

Corollary A2.3: A function Q of restrained variation over an interval Ω is also of restrained variation over every subinterval of Ω, and is sum-topped thereon.

"Sum-topped" is case k = 1 of Lemma A2.2. This corollary's converse is false. A function can be of restrained variation over two abutting intervals and yet not over their union. A function can be sum-topped but not of restrained variation; Q(x) = 3 + cos(x) is an example over any interval wider than 2π. But a sum-topped unimodal function is of restrained variation. A function unimodal over an
interval Ω has at most one extremum, maximum or minimum, strictly inside Ω. Our interest in functions of restrained variation is now mainly historical. About two decades ago they were the first non-monotonic functions to be recognized as sum-topped, and in practice they are still easier to recognize as such from their splittings than are most other sum-topped functions. Their relevance to Newton's and Secant iteration is apparent in Corollary 8.3.

A3. Appendix: Projective Images
The redefinition S(x,x) := N(x) connects Newton's iteration xₙ₊₁ := N(xₙ) := xₙ − f(xₙ)/f′(xₙ) to Secant iteration xₙ₊₁ := S(xₙ, xₙ₋₁) := xₙ − f(xₙ)/f^I(xₙ, xₙ₋₁), but not so tightly as they are connected by a shared family of invariants under certain Projective transformations. In general, plane projective transformations are those that map straight lines to straight lines; thus they map tangents to tangents and secants to secants, which is why some of them are pertinent to Newton's and Secant iteration. The pertinent ones constitute a four-parameter family of projective maps, each of which takes a pair (x, f(x)) to a pair (X, F(X)) in such a way that both Projective Images f(x) and F(X) are linear functions of their respective arguments, or else neither are. Each of these maps is determined by the values of four constants β, μ, b and m chosen almost arbitrarily, subject to two inequality constraints:
 Constraint I: c := β·m + b·μ ≠ 0 ; and
 Constraint II: b/m does not lie strictly inside the interval Ω in which we seek a zero z of f.
After these constants have been chosen, the projective map (x, f(x)) → (X, F(X)) and its inverse (x, f(x)) ← (X, F(X)) are defined thus:
 X = ξ(x) := (μ·x + β)/(b − m·x) ;  F(X) := f(x(X))/(b − m·x(X)) = f(x(X))·(μ + m·X)/c ;
 x = x(X) := (b·X − β)/(μ + m·X) ;  f(x) = F(ξ(x))·(b − m·x) = F(ξ(x))·c/(μ + m·ξ(x)) .
In the last two lines the last equation is derived from the first, which is a Möbius (Bilinear-Rational) transformation, with the aid of a valuable identity (b − m·x)·(μ + m·X) = c ≠ 0. It and Constraint II prevent b − m·x from reversing sign while x runs through Ω, and
prevent μ + m·X from reversing sign while X runs through the interval ξ(Ω). Whether this Möbius map preserves or reverses order in those intervals depends upon the sign of c in Constraint I, because the same sign turns up in dX/dx = ξ′(x) = c/(b − m·x)² and dx/dX = x′(X) = c/(μ + m·X)².

What do projective images F and f have in common? F has as many zeros strictly inside ξ(Ω) as f has strictly inside Ω. A zero at an end of an interval can evaporate if that end is mapped to ∞; for example, consider f(x) := x and ξ(x) := 1/x for x ≥ 0, whence F(X) = 1 for X ≥ 0. Similarly, F and f have the same number of poles strictly inside their intervals. Therein F also has as many Inflexion-points (where F″ = 0) and Notches (where F″ = ∞) as f has, since F″(X) = f″(x(X))·c/(μ + m·X)³. Other properties F and f share are less obvious.

Under composition the projective transformations form a non-Abelian (non-commutative) Group isomorphic to the multiplicative group of nonsingular 2-by-2 matrices. In other words, suppose Xⱼ = ξⱼ(x) := (μⱼ·x + βⱼ)/(bⱼ − mⱼ·x) for j = 1, 2, 3 are the Möbius parts of three projective transformations of which the third is composed from the first and second, ξ₃(x) = ξ₂(ξ₁(x)); then
 [ b₃ −m₃ ; β₃ μ₃ ] = [ b₂ −m₂ ; β₂ μ₂ ]·[ b₁ −m₁ ; β₁ μ₁ ] , and c₃ = det[ b₃ −m₃ ; β₃ μ₃ ] = c₂·c₁ ≠ 0.
In this isomorphism, the projective map associated with the constants β, μ, b, m, c = b·μ + β·m has an inverse that
subgroup separately and then infer it for their compositions Often the assertion is unobvious only for the Reciprocal subgroup Such is the case for the neXt two assertions Lemma A31 Bf r3 r39392r5 m3 3 r r ff quot2ffquot3 is invariant for projective maps X fX ltgt X FX of nonlinear functions in other words nonlinear projective images fX and FX satisfy PEfXPEFX after substitution of the projective map s Mobius part say X XX Lemma A32 Newton s iterating function NfX X fXf X and Secant iterating function SfX y X fXXyfX fy are constructed from f by operators N and S that commute with projective maps X fX ltgt X FX in other words NFXX XNfX and SFXX Xy XSfX y wherein X XX is the projective map s Mobius part The tedious but easy proof of both lemmas is left to the reader For example a Negative7 Reciprocal projective map X fX ltgt X FX de ned by X lX and FX XflX has NFX X FXF39X X XflX flX f lXX lNfX as claimed in Lemma A32 It implies that whether the iterations converge or meander is another invariant of projective maps and motivates us to learn more about them The Mobius part of a projective map is determined by what it does to any three distinct values 11 v w of X It must map them to some three distinct values U V W respectively of X and vice versa It can be constructed from these triples by solving a bilinear Cross Ratio equation like x uxv wxX WV U X UV wxx wxv u for either X XX or X XX thereby determining the constants B u b and m eXcept for a common factor One member of the triple u v w can be 00 if the crossiratio equation is replaced by an appropriate limit similarly for U V W The sign of c Bm by which determines whether the Mobius transformation preserves or reverses order is the same as the sign of Work in Progress NOT READY FOR DISTRIBUTION Page 5763 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am uvvwwuUVVWWU b mub mvb mw2g3 if both triples u v w and U V W are entirely nite Moreover Constraint II bm does not lie strictly inside the interval Q 
which ensures that XQ be an interval too requires both triples u v w and U V W to have either the same or the opposite linear order whenever u v and w lie in Q or equivalently whenever U V and W lie in XQ Many a theoretical problem is simpli ed by a projective map that transforms a nite interval Q into a semiin nite XQ One example is the proof of the Intermediate Value Lemma 9 1 Here is another example designed to show why the continuity of Newton s iterating function N is an hypothesis necessary for the conclusion of Theorem 92 Example A33 Twice differentiable function f will strictly increase throughout a nite interval Q From any starting point in Q Newton s iteration xn Nxn will always converge in Q to a zero 2 of f On the other hand Secant iteration won t converge from some starting points in Q This ostensible violation of Theorem 92 merely violates one of its hypotheses this example s N is discontinuous at z This example fx is the projective image of a simpler FX constructed over the semiin nite interval oo S X S X0 lln2 1442695 thus For n 0 l 2 3 in turn let X3n llnn2 X3n1 X3n X3n32 X3n2 oo Evidently X0 2 X311 gt X3n1 gt X3n3 gt 0 and X311 2X3n3 l X3n6 gt 0 Next de ne FX x explX if x lt 0 0 if X 0 I X311 X3n32 X3n1 S X S X311 and 1 PH qn T2X 39 X3nl 39 X3n3X3nl 39 X3n3 if X3n3 3 X S X3nl where PH 1 FX3nl FX3n3 2 X3n39 X3n64 gt0 qn 1 FX3nl FX3n32 X3n39 2X3n3 X3n64 gt0 and Tt tanhtan11t2 if 1 ltt lt l signt otherwise T is in nitely differentiable with T t gt 0 for l lt t lt l and T39 0 otherwise Til i1 Consequently the graph of FX over 0 lt X S X0 looks like a rising staircase with rounded comers and risers and treads that shrink to zero as X 7 0 Between subintervals over which F is constant are subintervals X3n3 lt X lt X3n1 over which F39X gt 0 and FX increases monotonically from FX3n3 X3n3 39 X3n62 to FX3nl FX3n X3n39 X3n32 as X increases In the middle of each such subinterval the derivative F39 rises to its local maximum n2X3n 2X3n3 X3n6X3n X31143 which 
approaches 0 roughly like ln as n 7gt 00 Consequently FX 7gt 0 and F39X 7gt 0 roughly like explX or faster as X 7 0 It soon follows that FX is twice differentiable wherever it is de ned namely oo S X S X0 The completed de nition of Newton s iterating function NFX X FXF39X including NF0 0 NFoo l and NFX oo when X3n1 S X S X3n remains discontinuous at 0 because NFX runs from w up to a small positive value and back to oo as X runs Work in Progress NOT READY FOR DISTRIBUTION Page 5863 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am through each subinterval X3n3 S X S X3n None the less Newton s iteration converges to Z 0 ultimately monotonically and usually slowly from every starting iterate in the domain of F But Secant iteration need not converge The completed de nition of the Secant iterating function SFX Y X FXFIX Y has to include the limiting values SFX X NFX SFX oo X FX and SFXY oo when FXFY 0 7E XY Then Xn SXn Xn1 for n l 2 3 by design Starting from X0 and X1 every third Secant iterate X3n2 oo 39 thus Secant iteration does not converge although the subsequence of nite Secant iterates converges slowly to Z 0 To transform the semifin nite interval oo S X S X0 into a nite interval 1 S X S 25887 set X xX X2X or X XX 2XlX and fX lXFXX This projective map turns F into a twice differentiable strictly increasing function f while preserving the iterations nonconvergence Newton s iteration converges to z 0 from every starting iterate between 1 and X0 xX0 but starting from X0 and X1 xX1 every third Secant iterate X3n2 xX3n2 l Thus Secant iteration need not converge though I have proved 1979 that a subsequence of its iterates always imitates Newton s by converging to z END EX A33 Inverse to the problem of constructing a projective map is the problem of detecting one Given fX and FX what test reveals whether they are projective images of each other An easy test works if they have at least three but not too many of the following special points zeros poles in 
eXionipoints notches For instance suppose the triple u v w includes one zero and two in eXionipoints of f and U V W does likewise respectively for F then solving the crossiratio equation above determines a prospective Mobius transformation X XX that passes the test if fXFXX is a linear function namely b mX If this XX fails the test all other matching triples of consecutive special points have to be tried and fail too before f and F can be deemed not to be projective images this is why we hope f and F have not too many special points Another test can be fashioned out of Lemma A3l s projective differential invariant 2Er 3 r r rw r2 r393r After the substitution X XX of their Mobius transformation nonlinear projective images fX and FX must satisfy PEfX PEFX Conversely if the equation PEfX PEFX is satis able by a Mobius transformation X XX for which fX3 f X FXX3 Fquot XX simpli es to apositive constant 92 and fXFXX simpli es to a linear function b mX then fX and FX are projective images For example PEfX 12 4 lnX 9lnX and PEFX 12 4 lnX 9lnX when fX lnX and FX X lnX so the equation PEfX PEFX has two solutions X XX of which only one is a Mobius transformation XX 1x neXt 92 fX3 f X FXX3 FquotXX 1 and b mX fXFXX x whence u b 0 and B m c l in the projective map X fX ltgt X FX For another example the projective map X XlX ltgt X XlX can have either of two Mobius parts either XX X ll and c l or XX ll X and c l Work in Progress NOT READY FOR DISTRIBUTION Page 5963 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am This test is complicated slightly by the possibility that in nitely many Mobius transformations may be compatible with a given pair of of projective images For instance Bf BF 16 when fX eXpX and FX X eXplX and then the equation BfX BFX is satis ed by all Mobius transformations X XX but 92 fX3 f X FX3 Fquot X eXp4X 4X is a constant only if X lX 6 is a constant so the only compatible Mobius transformations are XX lo X whereupon b mX fXFXX e6 6 X whence B m 
e6 u 0 b 6ef and c e20 in projective maps X fX ltgt X FX wherein o is a parameter Another oneiparameter family of projective maps with Mobius part XX constantX 0 has projective images fX Xk and FX Xl39k and Bf BF l6ll2kl2 for any constant k I know no other oneiparameter family nor other projective images with constant Bf We have seen that Bf and the convergence of Newton s and Secant iterations applied to solve fz 0 are invariants of proj ective maps Are they related Is there some condition that Bf can satisfy in an interval Q to prevent the iterations from meandering in Q forever Because f3 fquot 92 F3 Fquot another invariant is the sign of ff if it is constant it gures in Theorem 75 Otherwise monotonicity is not a projective invariant so neither are Theorem 82 nor Corollary 83 do invariant versions of them eXist Work in Progress NOT READY FOR DISTRIBUTION Page 6063 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am A4 Appendix Parabolas This Appendix is provided for students who have taken a course on Cartesian Geometry in High School but not yet in College Lemma A41 Let any nondegenerate triangle s vertices be Q R S then one parabola C passes through Q and R and is tangent there to sides QS and RS Proof In Cartesian X ywoordinates let the triangle s sides have equations aX by c 0 for QS AXByC0 for RS and eXfyg0 for QR Thende ne HHX y aX by cAX By C eX fy g2 For every choice of the constant p the equation HH X y 0 is the equation of a Conic Section CH an ellipse parabola hyperbola or pair of straight lines that passes through Q and R The set ofall such conics CM is called a Pencil ofconics Every CH passes through Q because aX by c eX fy g 0 at Q similarly CH passes through R Therefore no CH degenerates into a single point nor the empty set The differential dHHX y aX by cAdX de AX By CadX bdy 2ueX fy ge dX fdy must vanish along CH this means that if X y lies on CH because HH X y 0 then dX dy points along the tangent to CM at X y when dHHX y 0 too At 
Q dHHXy 0AXByCadXbdy0 0 but AXByC 0 so adX b dy 0 which means that the tangent to CM at Q is parallel to QS therefore QS is tangent to CM at Q Similarly RS is tangent to CM at R The neXt step is to select the lemma s parabola C CH from the pencil of conics by choosing the appropriate value for u For this purpose HHX y must be eXpanded HHX y aA H62X2 aB bA ZHefXy bB pf2y2 terms linear in X and y Its Discriminant aB bA 2m 4aA ue2bB m2 aB bA2 4pbe afBe Ar vanishes just when u takes the nite nonzero value In aB bA2 4be afBe AD It is nite and nonzero because no two sides of the triangle QRS are parallel so no factor of u can vanish With this choice for u the vanished discriminant implies that HHX y i other terms linear in X and y 2 terms linear in X and y so HH X y 0 is the equation of either a pair of parallel straight lines or a parabola The pair is ruled out by the intersection of its tangents QS and RS so CH must be a parabola END OF PROOF The parabola is a conveX curve because it lies entirely on one side of its every tangent as can be veri ed easily The triangle is a conveX gure too and its side QR lies inside the parabola Therefore an arc of the lemma s parabola C stays inside QRS as the arc runs from Q to R This parabola gures in the proof of Theorem 76 Work in Progress NOT READY FOR DISTRIBUTION Page 6163 Math 128AB Lecture Notes on Real RootFinding June 30 2009 637 am Citations R Aharoni A Regueros J Prashker amp D Mahalel 1992 A Global Convergence Theorem Result for the OneDimensional NewtonRaphson and Secant Methods with an Application to Equilibrium Assignment 17 pp draft stamped DEC 03 1992 from the Mathematics amp Civil Engineering departments Technion Haifa Israel RG Bartle 1976The Elements ofReal Analysis 2d ed Wiley New York pp 225227 AM Bruckner amp JG Ceder 1965 DarbouX Continuity Jahresbericht aler Deutschen MathemiVerein 67 I Abt ss 93117 SD Conte amp C de Boor 1980 Elementary Numerical Analysis anAlgorithmic Approach 3d ed McGrawHill New York pp 
62-71.
G. Dahlquist, Å. Björck & N. Anderson (1974) Numerical Methods, Prentice-Hall, New Jersey; pp. 227-232.
R.J. Fateman (1977) "An Algorithm for Deciding the Convergence of the Rational Iteration x_{n+1} = f(x_n)", ACM Trans. Math. Software 3, pp. 272-278.
N.J. Higham (2002) Accuracy and Stability of Numerical Algorithms, 2nd ed., Soc. Indust. & Appl. Math., Philadelphia.
A.S. Householder (1970) The Numerical Treatment of a Single Nonlinear Equation, McGraw-Hill, New York.
X.C. Huang (1992) "From Intermediate Value Theorem to Chaos", Mathematics Magazine 65 #2 (April 1992), pp. 91-103.
W. Kahan (1979) "No Period Two Implies Convergence, or Why Use Tangents when Secants Will Do", 69 pp., Memorandum No. UCB/ERL M79/61, 10 Oct. 1979, Electronics Research Lab., Univ. of Calif., Berkeley CA 94720. Out of print.
W. Kahan (1979″) "Personal Calculator Has Key to Solve Any Equation f(x) = 0", Hewlett-Packard Journal 30 #12 (Dec. 1979), pp. 20-26. The calculator was the hp-34C. A scanned copy is at http://www.cs.berkeley.edu/~kahan/Math128/SOLVEkey.pdf
R.E. Martin (1977) "Printing Financial Calculator Sets New Standards for Accuracy and Capability", Hewlett-Packard Journal 29 #2 (Oct. 1977), pp. 22-28. The calculator was the hp-92.
P.J. McClellan (1987) "An Equation Solver for a Handheld Calculator", Hewlett-Packard Journal 38 #8 (Aug. 1987), pp. 30-34. There were two calculators, the hp-18C and 28C.
M. Misiurewicz (1997) "Remarks on Sharkovsky's Theorem", Amer. Math. Monthly 104 #9 (Nov. 1997), pp. 846-847.
A.M. Ostrowski (1960) Solution of Equations and Systems of Equations; also 2nd ed. (1966) and 3rd ed. (1973), Academic Press, New York; ch. 3.
W.H. Press, S.A. Teukolsky, W.T. Vetterling & B.P. Flannery (1994) Numerical Recipes in Fortran: the Art of Scientific Computing, 2nd ed. corrected, Cambridge Univ. Press; p. 351.
Y. Saad (1974) "Shifts of Origin for the QR Algorithm", in Information Processing 74, Proc. IFIP Congress of Aug. 5-10 1974 in Stockholm, pp. 527-531; North Holland Publ., Amsterdam.
A.N. Sharkovsky (1964) "Coexistence of cycles of a continuous
mapping of the line into itself" (Russian with English summary), Ukrain. Math. Zh. 16 no. 1, pp. 61-71; Math. Rev. 28 (1964) #3121. Translated in Internat'l J'l Bifurc. Chaos Appl. Sci. Engrg. 5 (1995), pp. 1263-1273.
A.N. Sharkovsky (1965) "On cycles and the structure of a continuous mapping" (Russian), Ukrain. Math. Zh. 17 no. 3, pp. 104-111; Math. Rev. 32 (1966) #4213.
J. Stoer & R. Bulirsch (1980) Introduction to Numerical Analysis, translated from the German versions of 1972 and 1976 by R. Bartels, W. Gautschi & C. Witzgall; Springer-Verlag, New York; pp. 274-277.
J.F. Traub (1964) Iterative Methods for the Solution of Equations, Prentice-Hall, New Jersey.
H.W. Turnbull (1952) Theory of Equations, 5th ed., Oliver & Boyd, Edinburgh; pp. 103-107.
M.V. Wilkes, D. Wheeler & S. Gill (1951) The Preparation of Programs for an Electronic Digital Computer, Addison-Wesley, Cambridge Mass.; pp. 84-85 and 130-132. The computer was the EDSAC at Cambridge, England.
J.H. Wilkinson (1965) The Algebraic Eigenvalue Problem, Oxford Univ. Press; p. 480.

Lecture 1: Introduction to MATLAB
Ryan Hynd (ryanhynd@math.berkeley.edu)
Department of Mathematics, University of California, Berkeley
Math 98: Introduction to MATLAB programming

Announcements
- Syllabus/class web page: http://math.berkeley.edu/~ryanhynd/math98.html
- Class routine: I present several topics for part of the class time; you do exercises in the remaining time.
- Purchase the student version of MATLAB (about $100); purchase the text book: http://decagonpress.com/CSSVersion/index.php
- HW1 posted. It's due in one week. HWs will be assigned and due on Thursday, so our weekends are free.

Calculating in the command window
Some basics:
- When you launch MATLAB you see five basic windows: command window, command history, current directory, workspace, and help browser.
- We shall see that the command window is a programmable calculator. This is where we will perform basic calculations, debug programs, etc.
- When we start writing programs, the current directory window will tell us what folders we have open to use.
- We won't use
command history and workspace. You should think of the command window as a calculator, able to do any arithmetic operation and much more.

Using the help browser. Always consult the help browser:
- MATLAB has thousands of built-in functions and various capabilities. Therefore you should always search around for what has already been done before writing a program.
- Searching in the Help window is usually the best way, perhaps followed by internet searches.
- If you know the name of a MATLAB function but are not sure what it does, type in the command window: help nameoffunction
What does MATLAB have built in as far as a "square root" function?

MATLAB arrays. Perspective:
- Arrays are important in many programming languages (C, C++, Java).
- However, MATLAB was designed for manipulating arrays; they are the fundamental data structure in MATLAB.
We can apply sqrt to a simple 2 x 2 array of numbers, or to a 1 x 100 array such as linspace(0,1).

How will we use arrays?
- Saving a sequence of values of a runtime variable (e.g. the sequence of iterates of Newton's method).
- Representing functions, especially when solving ODEs.
- Matrix algebra.

MATLAB functions. To define f(x) = x^2 in MATLAB we can use the command
  f = @(x) x.^2
and evaluate f(2), f(-3), etc. in the command line.

Vector functions. Another way is to define a function of a vector representing a segment of the domain of f. For instance, define
  x = linspace(-1,1,200)
We can then set g = x.^2, which approximates x -> x^2 on [-1,1].

MATLAB plotting. Perspective:
- When doing numerical calculations, plotting is often the best way to present results.
- This applies to the programming assignments in Math 128A and the assignments of this course.
Plotting x -> x^2 on [-1,1]:
- ezplot(f,[-1,1])
- plot(x,g)

Exercises:
1. Compute 2 + 3, 2*3, 2/3, 2^3, 2^(1/3), log(2 + 3), cos(3)*sin(2) in the command line.
2. Plot x -> e^x on [-2,2].
3. Plot x -> |x| on [-1,1].
4. Use the help command to figure out how to display numbers to many digits of accuracy. Do this for pi.

Math 128 Notes, March 9, 2002, 6:35 am
An Ordinary Differential Equation in
Interval Arithmetic

Let points u in the plane be employed as the diagonally opposite corners of a rectangular interval [u⁻, u⁺] when u⁻ ≤ u⁺ componentwise. In other words, point u = (x, y) belongs to interval [u⁻, u⁺], with u⁻ = (x⁻, y⁻) and u⁺ = (x⁺, y⁺), just when the respective components satisfy x⁻ ≤ x ≤ x⁺ and y⁻ ≤ y ≤ y⁺. Such rectangular intervals with edges parallel to the coordinate axes shall be called "coffins"; their restricted orientation renders them disadvantageous for the purpose of bounding the propagation of uncertainty into the solutions of most differential equations, as this note will illustrate by an example.

The differential equation of uniform counterclockwise rotary motion of the plane about its origin is du/dt = uJ, in which J = [0 1; -1 0]. This rotates any point u at time t = 0 to u·exp(tJ) at time t, which is also the angle of rotation; in fact exp(tJ) = I·cos t + J·sin t. Rectangular intervals should get rotated the same way, except that then their edges would not always lie parallel to the coordinate axes. If we insist that point-sets rotated by the differential equation be circumscribed always by coffins of the kind in the first paragraph above, the coffins will grow too fast. How much too fast?

Suppose that the initial point u is uncertain to the extent that only a coffin [u⁻, u⁺] containing it is known. The differential equation moves points u that lie initially in the interval [u⁻, u⁺] into a circumscribing coffin whose corners can be seen to satisfy du⁻/dt = min(uJ) and du⁺/dt = max(uJ), min/maximized over u in [u⁻, u⁺]. Spelled out in detail these equations say
  dx⁻/dt = min(-y) = -y⁺ ;  dy⁻/dt = min(x) = x⁻ ;
  dx⁺/dt = max(-y) = -y⁻ ;  dy⁺/dt = max(x) = x⁺ .
The size of the coffin [u⁻, u⁺] is determined by u⁺ - u⁻, whose components satisfy d(x⁺ - x⁻)/dt = y⁺ - y⁻ and d(y⁺ - y⁻)/dt = x⁺ - x⁻; i.e. d(u⁺ - u⁻)/dt = (u⁺ - u⁻)·|J| elementwise. This last differential equation's solution transforms u⁺ - u⁻ at time t = 0 to (u⁺ - u⁻)·exp(t|J|), in which exp(t|J|) = I·cosh t + |J|·sinh t grows exponentially. Thus, although the differential equation is not uncertain at all, its effect upon its initial point's uncertainty is changed from mere rotation to explosive exponential growth by the insistence
that uncertainty be bounded by coffins instead of rectangles with arbitrary orientations. Adding uncertainty to the differential equation can only exacerbate the growth.

This example is typical of the way naive interval arithmetic, when used to bound the propagation of uncertainty, can cause it to grow exponentially too fast. The exponentially excessive growth can sometimes be somewhat damped by allowing parallelograms instead of coffins, and general skew parallelepipeds instead of coffin-shaped rectangular parallelepipeds in higher-dimensional spaces. The suppression of exponentially excessive growth generally requires relaxation of the restrictions upon shape as well as orientation. Among methods known so far, growth is least excessive (typically less than linear) when uncertainty is bounded by ellipsoids rather than by parallelepipeds; but that's a story for another day: see http://www.cs.berkeley.edu/~wkahan/Math128/Ellipsoi.pdf

Prof. W. Kahan, Page 1/1

Lecture on Real Root-Finding
Prof. W. Kahan, Univ. of Calif. at Berkeley, 6 April 1998
Extracted from the Lecture Notes on Real Root-Finding: http://www.cs.berkeley.edu/~wkahan/Math128/RealRoots.ps (and .pdf), and http://www.cs.berkeley.edu/~wkahan/Math128/SolveKey.pdf

Given a real equation f(z) = 0, solve it for its real root z (a zero of function f), typically by iteration xₙ₊₁ = U(xₙ) → z = U(z), a fixed point, as n → ∞. How should the iterating function U be constructed?

Common iterations:
  Newton's: xₙ₊₁ = N(xₙ), where N(x) = x - f(x)/f'(x) (the zero of the tangent to the graph of f).
  Secant: xₙ₊₁ = S(xₙ, xₙ₋₁), where S(x, y) = x - f(x)·(x - y)/(f(x) - f(y)) (the zero of a secant through the graph of f).
[Figures of the tangent construction N(v) from v, and the secant construction S(u, w) from u and w, omitted.]

Before a real root z of an equation f(z) = 0 can be found, six questions demand attention:
1. Which equation? Infinitely many equations, some far easier to solve than others, have the same root z.
2. What method? Usually an iterative method must be chosen; there are infinitely many of them too.
3. Where should the search for a root begin? A global theory of the iteration's convergence
helps compensate for a vague guess at z.
4. How fast can the iteration be expected to converge? A local theory helps here. Convergence much slower than expected is ominous.
5. When should iteration be stopped? Error-analysis helps here. And the possibility that no z exists may have to be faced.
6. How will the root's accuracy be assessed? Error-analysis is indispensable here, and it can be done in more than one way.

These questions lead to others.

Questions:
- Is some simple combinatorial (homeomorphically invariant) condition both necessary and sufficient for convergence of x → U(x)? Yes: §5.
- Is that condition relevant to the design of root-finding software? Yes: §6.
- Do other iterations x → U(x) besides Newton's exist? Not really: §3.
- Must there be a neighborhood of z within which Newton's iteration converges, if f'(x) and x - f(x)/f'(x) are both continuous? Maybe not: §7.
- Do useful conditions less restrictive than convexity suffice globally for the convergence of Newton's and Secant iteration? Yes: §8.
- Why are these less restrictive conditions not projective invariants, as are convexity and the convergence of Newton's and Secant iterations? I don't know.
- Is slow convergence to a multiple root worth accelerating? Probably not: §7.
- Can slow convergence from afar be accelerated with no risk of overshooting, and thus losing the desired root? In certain common cases, yes: §10.
- When should iteration be stopped? Not for the reasons usually cited: §6.
- Which of Newton's and Secant iterations converges faster? Depends: §7.
- Which of Newton's and Secant iterations converges from a wider range of initial guesses at z? Secant, unless z has even multiplicity: §9. (See "Why Use Tangents when Secants Will Do" [1979], and the SOLVE key on HP calculators [1979''].)

Do other iterations x → U(x) besides Newton's exist? Not really: §3.

Thesis 3.1 (Newton's Iteration is Ubiquitous): Suppose that U is differentiable throughout some neighborhood Ω of a root z of the given equation f(z) = 0. If the iteration xₙ₊₁ = U(xₙ) converges in Ω to z from every starting point x₀ in Ω, then this iteration is Newton's iteration applied to some equation g(z) = 0 equivalent on Ω to the given equation; in other words, U(x) = x - g(x)/g'(x), and g(x) → 0 in Ω only as x → z.
Defense: g(x) = ±exp( ∫ dx/(x - U(x)) ), with a constant of integration that may jump when x passes from one side of z to the other, reflecting the fact that U is unchanged when g(x) is replaced by, say, 3g(x) for all x on one side of z.

Is some simple combinatorial (homeomorphically invariant) condition both necessary and sufficient for convergence of x → U(x)? Yes: §5.

Theorem 5.1 (Sharkovsky's No-Swap Theorem [1964]; §5): Suppose U maps a closed interval Ω continuously into itself; then the iteration xₙ₊₁ = U(xₙ) converges to some fixed point z = U(z) from every x₀ in Ω if and only if these four conditions, each of which implies all the others, hold throughout Ω:
- No-Swap condition: U exchanges no two distinct points of Ω; in other words, if U(U(x)) = x in Ω then U(x) = x too.
- No-Separation condition: no x in Ω can lie strictly between U(x) and U(U(x)); in other words, if (x - U(x))·(x - U(U(x))) ≤ 0 then U(x) = x.
- No-Crossover condition: if U(x) ≤ y ≤ x ≤ U(y) in Ω, then U(x) = y = x = U(y).
- One-Sided condition: if x₁ = U(x₀) ≠ x₀ in Ω, then all subsequent iterates xₙ₊₁ = U(xₙ) also differ from x₀ and lie on the same side of it as does x₁.
This last condition's violation can be detected by a program, and thus serves to initiate activity designed to prevent a raw iteration's non-convergence.

Example 5.4: Suppose f is a rational function with simple real interlacing zeros and poles, one of them a pole at ∞. An instance is f(x) = p(x)/p'(x), where p(x) is a polynomial all of whose zeros are real. Another instance is f(x) = det(xI - A)/det(xI - Â) = ∏ᵢ (x - zᵢ) / ∏ⱼ (x - θⱼ), in which A is an Hermitian matrix, Â is obtained from it by striking off its last row and column, and the I's are identity matrices; the zeros zᵢ lie among the eigenvalues of A, and the poles θⱼ are the distinct eigenvalues of Â that are not also eigenvalues of A. That they interlace, i.e. z₀ < θ₁ < z₁ < θ₂ < z₂ < ··· < θ_K < z_K, is a well-known theorem attributed to Cauchy. We do not know the zeros zᵢ but, like Y. Saad [1974], propose to compute them by
running Newton's iteration xₙ₊₁ = xₙ - f(xₙ)/f'(xₙ). Does it converge? If so, to what? These are thorny questions, considering how spiky is the graph of f; and yet Newton's iteration can be proved to converge to some zero zᵢ from every real starting value except a countable, nowhere dense set of starting values from which the iteration must converge accidentally, after finitely many steps, to a pole θⱼ.

Do useful conditions less restrictive than convexity suffice globally for the convergence of Newton's and Secant iteration? Yes: §8.

Corollary 8.3 (A Weak Convexity Condition): Suppose f = g - h is a differentiable difference between two convex functions, one non-decreasing and the other non-increasing throughout a closed interval Ω. Then Newton's iteration xₙ₊₁ = xₙ - f(xₙ)/f'(xₙ), started from any x₀ in Ω, either converges in Ω to the zero z of f or leaves Ω; the iteration cannot meander in Ω endlessly.
Application: calculation of the Internal Rate of Return in financial calculators.

Which of Newton's and Secant iterations converges from a wider range of initial guesses at z? Secant, unless z has even multiplicity: §9.

Theorem 9.2: Suppose f' and N are continuous throughout a closed finite interval Ω, strictly inside which f cannot vanish without reversing sign there too. If Newton's iteration converges in Ω from every initial x₀ in Ω, then it converges to the sole zero z of f in Ω, and Secant iteration also converges in Ω to z from every two starting points x₀ and x₁ in Ω. Alas, the proof is very long.

Lemma 9.1 (An Intermediate Value): If s = S(u, w) = u - f(u)·(u - w)/(f(u) - f(w)) does not lie between u and w (i.e. if f(u)·f(w) > 0), and if f'(x) is finite throughout u ≤ x ≤ w, then at some v strictly between u and w either N(v) = v - f(v)/f'(v) = S(u, w) or f(v) = f'(v) = 0.
[Figure: the secant through (u, f(u)) and (w, f(w)) with w > u, its zero s = S(u, w) ≤ u, and the tangent at v, omitted.]
Proof: Let σ(x) = f(x)/(s - x), where s = S(u, w). Since s does not lie between u and w (> u), σ(x) and σ'(x) are finite throughout u ≤ x ≤ w. And σ(u) = σ(w), because of how s was defined; so Rolle's theorem implies σ'(v) = 0 at some v strictly between u and w. Then σ'(v) = f'(v)/(s - v) + f(v)/(s - v)² = 0 implies that this v is where either N(v) = s or f(v) = f'(v) = 0.

Is
slow convergence to a multiple root worth accelerating? Probably not: §7.

Theorem 7.5: Suppose f'(x) increases as x moves away from z through some neighborhood Ω on one side of a zero z of f. Then 0 < (N(x) - z)/(x - z) < 1, and so Newton's iteration converges monotonically to z from every initial x₀ in Ω. Similarly 0 < (S(x, y) - z)/(x - z) < 1 for all x and y in Ω, and so Secant iteration converges monotonically to z from every initial x₀ and x₁ in Ω.

Theorem 7.6: Under the convexity hypothesis of Theorem 7.5, the iterates xₙ may converge to z arbitrarily slowly, though monotonically; but f(xₙ) tends monotonically to 0 at least so fast that f(xₙ) ≤ (f(x₀)/2ⁿ)·(x₀ - z)/(x₀ - x₁).

There is a special but common case that can be accelerated modestly without overshoot. Define N(x) = x - f(x)/f'(x) (Newton's iteration function) and W(x) = x - 2·f(x)/f'(x) (the doubled-Newton iteration function). This W(x) can be iterated with no harm from overshoot in the following circumstances.

Theorem 10.1: Suppose that f(y) ≤ 0 ≤ f'(y) at the left-hand end of a finite interval y ≤ x ≤ x₀ throughout which f' is a positive non-decreasing function; also assume f(x₀) > 0. Then in that interval:
1. The equation f(z) = 0 has just one root z ≥ y, and W(x) < N(x) when x > z.
2. N(x) ≥ z for all x > y, and then W(x) > y unless W(x) = y = z.
3. If x > z then N(W(x)) ≤ N(x), with equality only when f'' ≡ 0.

Semiseparable Linear Systems

For n = 5 we solve Ax = b, where A is a 5 x 5 semiseparable matrix with diagonal entries δᵢ, upper off-diagonal entries built from generators uᵢ, wᵢ, vⱼ, and lower off-diagonal entries of the form μᵢvⱼᴴ. [The 5 x 5 displays of A before and after the rotations are omitted.] A left Givens rotation on the first two rows annihilates the (2,1) entry; the same Givens rotation is also applied to the right-hand side b. A right Givens rotation on the first two columns then restores the semiseparable structure; the same Givens rotation is also applied to the unknown vector x. We can now solve for the first component of the rotated unknown vector. Defining a modified generator v₃ and taking b to be the last n - 2 components of the rotated right-hand side, the remaining equations retain the semiseparable pattern; the last row still reads
μ₅v₂ᴴ, μ₅v₃ᴴ, μ₅v₄ᴴ, δ₅ (with v₂ rescaled by some constant ρ). This is a system similar to the original one, and can be solved similarly.

Angle Subtended at the Eye by Neighboring Stars, W. Kahan

Spherical polar coordinates (R, H, A) convert to Cartesian (X, Y, Z):
  R = radius; H = angle of elevation above the horizon; A = angle of azimuth from compass North;
  R > 0, -π/2 ≤ H ≤ π/2, -π ≤ A ≤ π;
  X = distance North, Y = distance West, Z = distance Up:
  X = R·cos H·cos A ,  Y = R·cos H·sin A ,  Z = R·sin H .
Neighboring stars: put one at (X, Y, Z) ↔ (R, H, A) for any arbitrary R > 0, so X = R·cos H·cos A, Y = R·cos H·sin A, Z = R·sin H; put another at (X+x, Y+y, Z+z) ↔ (R, H+h, A+a):
  X + x = R·cos(H+h)·cos(A+a) ,  Y + y = R·cos(H+h)·sin(A+a) ,  Z + z = R·sin(H+h) .
The angle v subtended at the eye by the stars satisfies (2R·sin(v/2))² = x² + y² + z², so
  (2·sin(v/2))² = (cos(H+h)cos(A+a) - cos H·cos A)² + (cos(H+h)sin(A+a) - cos H·sin A)² + (sin(H+h) - sin H)²
               = 2 - 2·sin(H+h)·sin H - 2·cos(H+h)·cos H·cos a
               = 2(1 - cos h) + 2(1 - cos a)·cos(H+h)·cos H
               = 4·sin²(h/2) + 4·sin²(a/2)·cos(H+h)·cos H .
Conclusion: of the two azimuths A and A+a, only their difference a mod 2π matters; then
  v = arccos( sin(H+h)·sin H + cos(H+h)·cos H·cos a )
    = 2·arcsin √( sin²(h/2) + sin²(a/2)·cos(H+h)·cos H ) .
The first formula malfunctions at small subtended angles. The second is numerically fine at small angles v, but loses almost half the precision carried if v ≈ π, though no cancellation occurs. Example: a = 179.999°, h = 52°, H = -26°; carrying 10 sig. dec., this v = 180° instead of 179.999101...°.

We can do better. Since -π/2 ≤ H ≤ π/2 and -π/2 ≤ H+h ≤ π/2,
  0 ≤ cos(H+h)·cos H = (cos(2H+h) + cos h)/2 = cos²(H + h/2) - sin²(h/2) = cos²(h/2) - sin²(H + h/2) .
Therefore
  tan²(v/2) = sin²(v/2)/(1 - sin²(v/2))
            = ( sin²(h/2) + sin²(a/2)·cos(H+h)·cos H ) / ( cos²(h/2) - sin²(a/2)·cos(H+h)·cos H )
            = ( sin²(h/2)·cos²(a/2) + sin²(a/2)·cos²(H + h/2) ) / ( cos²(h/2)·cos²(a/2) + sin²(a/2)·sin²(H + h/2) ) .
Hence follows a numerically accurate and efficient formula
  v = 2·arctan √( ( th·(1 + TH) + ta·(1 + th) ) / ( 1 + TH·(1 + ta·(1 + th)) ) ) ,
wherein ta = tan²(a/2), th = tan²(h/2) and TH = tan²(H + h/2). Only if h = 0 and H = ±π/2 does TH = ∞; therefore pre-substitute 0·∞ → 0 and ∞/∞ → 1/TH to handle all special cases. (Actually, infinite floating-point values of tan can't arise unless angles are in degrees.) The same formula works with astronomical Declination instead of Elevation and Right Ascension instead of Azimuth. A similar formula would work for distance over the surface of
the Earth, using Latitude instead of Elevation and Longitude instead of Azimuth, if the Earth were perfectly spherical.
September 17, 2004, 1:01 pm

Math 128: Taylor Series for Differential Equation Solvers' Local Error

The Taylor series of an infinitely differentiable vector-valued function y(t) of a scalar t is
  y(t) = y(0) + t·y'(0) + t²·y''(0)/2 + t³·y'''(0)/6 + t⁴·y''''(0)/24 + ··· .
These derivatives can be computed, for a solution of the initial value problem { y(0) = y₀ is given, and y'(t) = f(y(t)) for all t ≥ 0 }, from the derivatives of the given vector-valued function f(y). In fact, from the chain rule,
  y' = f ;  y'' = f'f ;  y''' = f''ff + f'f'f ;  y'''' = f'''fff + 3·f''(f'f)f + f'f''ff + f'f'f'f ;  ··· .
Then each derivative of y(t) can be evaluated at t = 0 by evaluating each derivative of f(y) at y = y(0) = y₀. Note that the higher derivatives of f are symmetric multilinear operators; for instance f''(y) is a symmetric bilinear operator: f''uv = f''vu is a vector-valued linear function of each vector u and v separately. Because linear operators do not necessarily commute, f''(f'f)f ≠ f'f''ff in general, though they are equal if y's vector space is one-dimensional. If y's vector space is N-dimensional, then y and f can be represented by column vectors each with N components, f' by a matrix with N² components, f'' by an array with N³ components of which at most N²(N+1)/2 can be distinct, f''' by an array of N⁴ components, ... . Higher derivatives' arrays become huge when N is large.

Normally the Taylor series would be used to obtain y(t+h) from y(t) for any sufficiently small step-size h:
  y(t+h) = y(t) + h·y'(t) + h²·y''(t)/2 + h³·y'''(t)/6 + h⁴·y''''(t)/24 + ··· ,
in which the derivatives d y(t+h)/dh etc. are computed at h = 0 from the same formulas
  y' = f ;  y'' = f'f ;  y''' = f''ff + f'²f ;  y'''' = f'''fff + 3·f''(f'f)f + f'f''ff + f'³f
as before, except that now f(y) and its derivatives are computed at y = y(t). A similar process generates a formal series for any one-step numerical method's formula that advances an approximate solution y ≈ y(t) through one step h to Y ≈ Y(t+h); but now we differentiate with respect
to h instead of t, to get
  Y(t+h) = y + h·Y' + h²·Y''/2 + h³·Y'''/6 + h⁴·Y''''/24 + ··· ,
in which y = y(t) and the derivatives Y' etc. are derivatives of Y(t+h) with respect to h, evaluated at h = 0. These derivatives depend upon the numerical method's formula. For example, take the implicit Trapezoidal Rule, Y = y + h·( f(y) + f(Y) )/2. Now at t+h,
  Y' = ( f(y) + f(Y) )/2 + h·f'(Y)·Y'/2 ;
  Y'' = f'(Y)·Y' + h·( f''(Y)·Y'Y' + f'(Y)·Y'' )/2 ;
  Y''' = 3·( f''(Y)·Y'Y' + f'(Y)·Y'' )/2 + h·( f'''(Y)·Y'Y'Y' + ··· )/2 ;  etc.
Here every instance of Y or its derivatives is evaluated at t+h. For the Taylor series we set h = 0 in the foregoing formulas and substitute for derivatives of Y in succession to get
  Y' = f ;  Y'' = f'f ;  Y''' = 3·( f''ff + f'²f )/2 ,
in which now the derivatives of Y are evaluated at t, and f and its derivatives are evaluated at y(t). Hence the computed approximation
  Y(t+h) = y(t) + h·f + h²·f'f/2 + h³·( f''ff + f'²f )/4 + ···
can be compared with the local solution
  y(t+h) = y(t) + h·f + h²·f'f/2 + h³·( f''ff + f'²f )/6 + ···
to reveal the 2nd-order Trapezoidal Rule's local truncation error
  y(t+h) - Y(t+h) = -h³·( f''ff + f'²f )/12 + ··· .

Prof. W. Kahan, page 1/2, August 25, 2004, 5:53 am

Another example is the implicit Midpoint Rule, Y = y + h·f( (y+Y)/2 ). At t+h,
  Y' = f( (y+Y)/2 ) + h·f'( (y+Y)/2 )·Y'/2 ;
  Y'' = f'( (y+Y)/2 )·Y' + h·( f''( (y+Y)/2 )·Y'Y'/4 + f'( (y+Y)/2 )·Y''/2 ) ;
  Y''' = 3·f''( (y+Y)/2 )·Y'Y'/4 + 3·f'( (y+Y)/2 )·Y''/2 + h·( ··· ) ;  etc.
Setting h = 0 and substituting for derivatives of Y in succession yields
  Y' = f ;  Y'' = f'f ;  Y''' = ( 3·f''ff + 6·f'²f )/4 ,
in which now the derivatives of Y are evaluated at t, and f and its derivatives at y(t). Hence, by comparison with the Taylor series for y(t+h), we find the 2nd-order Midpoint Rule's local truncation error to be
  y(t+h) - Y(t+h) = h³·( f''ff - 2·f'²f )/24 + ··· .

The foregoing manipulations are tedious enough that only a computerized algebra system should perform them. However, programming Maple or Mathematica or Derive to perform them has been more difficult than it should be. At least some of the difficulty arises, I think, because these languages disallow declaration of a variable's linguistic Type. Besides the derivatives' non-commutative, partially associative
multiplication, their multilinear symmetry has to be taken into account in order to achieve correct simplifications of expressions like the fifth derivative
  y''''' = f''''ffff + 6·f'''(f'f)ff + 4·f''( f''ff + f'f'f )f + 3·f''(f'f)(f'f) + f'f'''fff + 3·f'f''(f'f)f + f'²f''ff + f'⁴f .
(I hope I've gotten it right.)

The foregoing formal (because their convergence is undetermined) Taylor series expansions do not reveal an important property possessed by the computed solution Y of the initial value problem when it is obtained from a Reflexive formula, which is a formula in which Y(t+h) and y(t) are merely swapped when the sign of h is reversed. The Midpoint and Trapezoidal Rules' formulas are reflexive. The composition of T/h steps of a reflexive formula to approximate the true solution y(T) at a fixed T, but using any sufficiently small step-size h so long as T/h is an integer, can be proved to produce a computed approximation Y(T) that depends upon h and differs from y(T) by an error
  y(T) - Y(T) = c₂h² + c₄h⁴ + c₆h⁶ + ···
whose formal expansion in powers of h contains only even powers. The expansion need not converge for any h > 0; instead it is an asymptotic expansion whose behavior is conveyed by its first few terms ever more accurately as h → 0. Recomputations of Y(T) with diminishing step-sizes h, h/2, h/4, h/8, ... provide a sequence to which Richardson's Extrapolation can be applied, as in Romberg integration, to achieve what amounts to higher-order convergence.

General Semiseparable Representation

Block 4 x 4 example:
  A = [ D₁          U₁V₂ᴴ      U₁W₂V₃ᴴ   U₁W₂W₃V₄ᴴ
        P₂Q₁ᴴ       D₂         U₂V₃ᴴ     U₂W₃V₄ᴴ
        P₃R₂Q₁ᴴ     P₃Q₂ᴴ      D₃        U₃V₄ᴴ
        P₄R₃R₂Q₁ᴴ   P₄R₃Q₂ᴴ    P₄Q₃ᴴ     D₄ ] .
In general, A(i,j) = Uᵢ·Wᵢ₊₁·Wᵢ₊₂···Wⱼ₋₁·Vⱼᴴ if j > i.

Low-rank blocks: look at the sub-matrix
  [ U₁W₂V₃ᴴ  U₁W₂W₃V₄ᴴ ;  U₂V₃ᴴ  U₂W₃V₄ᴴ ] = [ U₁W₂ ; U₂ ] · [ V₃ᴴ  W₃V₄ᴴ ] .
More generally, every upper off-diagonal Hankel block in this representation has the form
  [ U₁W₂···Wᵢ ; U₂W₃···Wᵢ ; ··· ; Uᵢ ] · [ Vᵢ₊₁ᴴ  Wᵢ₊₁Vᵢ₊₂ᴴ  ···  Wᵢ₊₁Wᵢ₊₂···Wₙ₋₁Vₙᴴ ] .
The low rank of every off-diagonal Hankel block can be captured in this representation.

Integral Equations on a Curve

Discretizations of integral equations of the
form
  x(t) + ∫₀¹ k(t,s)·x(s) ds = b(t) ,
where k(t,s) is smooth away from the diagonal, lead to matrices with short representations. Example kernels: 1/‖x - y‖ and log ‖x - y‖, where x and y are points on a curve.

[Figure: fine structure of off-diagonal ranks, for an open curve and a closed curve, omitted.]

For sequential elimination, only Hankel block ranks matter.

Run time in seconds (columns: matrix size n; rows: off-diagonal rank):
  rank\n   256    512    1024   2048    4096     8192
  16       0.04   0.08   0.16   0.36    0.67     1.34
  32       0.08   0.19   0.42   0.83    1.66     3.44
  64       0.18   0.48   1.12   2.36    4.8      9.87
  128      0.15   1.01   2.73   6.09    12.91    26.9
  GEPP     0.15   0.72   4.57   30.46   222.57   3499.46

Final statements:
- Representations are purely algebraic.
- Includes both dense and sparse matrices.
- Great potential for preconditioners.
- Interesting relationships with nested dissection and other sparse matrix techniques.
- Useful applications in a number of other important areas.

File Hyp: Hyperbolic Interpolation and Iteration towards a Zero, p. 1/19

§0 Abstract and Introduction

Given a real function f(x) about which we know how to compute its value, we seek one of its zeros z, a root of the equation f(z) = 0, starting from some first guesses. This z should be the limit of a sequence of presumably improving guesses xₙ₊₁ = H_f(xₙ, xₙ₋₁, xₙ₋₂) computed for n = 0, 1, 2, 3, ... in turn by an iterating function H_f defined below. It will be compared with a few others, and its application to an eigenproblem will be analyzed in detail. Several assertions, equations and inequalities will be left for diligent readers to reconfirm.

Some iterating functions are derived in §1 from zeros of interpolating functions that match f(x) at two or three points. If a derivative or two of f can be computed too, confluent versions of those iterating functions become available; Newton's and Halley's are two of several offered in §2. How should an iterating function be chosen from the plethora available? Considerations relevant to that choice are explained in §3. Hyperbolic iterating functions seem apposite when f has poles sprinkled among its sought zeros. The close relation demonstrated in §4 between these iterating functions and Möbius bilinear
rational functions helps reveal how a program's languid convergence may be misdiagnosed. This is why §5 chooses Bi-Confluent Hyperbolic Iteration to converge to a zero from both sides, thereby combatting both languor and roundoff, except, as §6 observes, for slow convergence to a multiple zero. A zero hidden by a pole too nearby can, as §7 explains, be found quickly provided it is first straddled. A practical application of the foregoing theory is the solution of the spectral equation of an eigenvalue problem updated by a rank-1 perturbation; §§8-11 exhibit the details, including an error-analysis that assesses unavoidable uncertainties due to roundoff. Another application, in §12, is another eigenproblem.

Contents
  §0 Abstract and Introduction
  §1 Iterating Functions S_f, H_f and M_f
  §2 Confluent Versions
  §3 Choosing an Iterating Function; Table 1: Iterations' Figures of Merit μ
  §4 Möbius Maps Commute with Hyperbolic Iterations
  §5 Infantile Greed, and Bi- or bi-Confluent Hyperbolic Iteration
  §6 Linear Convergence to a Multiple Zero; Table 2: Iterations' Linear Convergence Ratio ρ to a Zero of Multiplicity m
  §7 Languid Convergence to a Hidden Zero; graph exhibiting a Hidden Zero
  §8 Example: A Spectral Equation (also called a Secular Equation)
  §9 How Straddling is Maintained; Table of Sign-Patterns
  §10 Where to Find the Initial Straddle; Case B: Between; Case O: Outside
  §11 Error-Analysis of the Roots of the Spectral Equation; Staircase Graph of f(x); Deflation
  §12 Another Example: a Tridiagonal Eigenvalue Problem
  §13 Further Reading

WORK IN PROGRESS: NOT YET READY FOR DISTRIBUTION
Prof. W. Kahan, Notes for Math 128B, version dated September 2, 2009, 8:30 am

§1 Iterating Functions S_f, H_f and M_f

If a function f is smooth enough, it may be approximated well by the first several terms of its Newton interpolating polynomial series
  f(x) = f(x₀) + (x - x₀)·f¹(x₀, x₋₁) + (x - x₀)(x - x₋₁)·f²(x₀, x₋₁, x₋₂) + ··· ,
in which this notation is being used for divided differences:
  f¹(x, y) = ( f(x) - f(y) )/(x - y) ;  f²(w, x, y) = ( f¹(w, x) - f¹(x, y) )/(w - y) ;  ··· .
The first two terms (a linear polynomial) vanish when x = S_f(x₀, x₋₁) = x₀ - f(x₀)/f¹(x₀, x₋₁). The iteration xₙ₊₁ = S_f(xₙ, xₙ₋₁) is called Secant Iteration and, if it converges to a simple zero z of f (where f(z) = 0 ≠ f'(z)), converges with order (1 + √5)/2 ≈ 1.618, or rarely faster. This means that if z, x₋₁ and x₀ agree in sufficiently many leading decimal digits, xₙ agrees with z in at least an additional number of decimal digits that grows roughly proportional to ((1 + √5)/2)ⁿ, until roundoff in the computation of f etc. interferes with convergence.

If, unlike a polynomial, f resembles a rational function with poles scattered among its zeros, f may be better approximated by the first few levels of an interpolating continued fraction
  f(x) = f(x₀) + (x - x₀)/( r¹(x₀, x₋₁) + (x - x₋₁)/( r²(x₀, x₋₁, x₋₂) + (x - x₋₂)/··· ) ) ,
in which a nonstandard notation is being used for nonstandard reciprocal divided differences:
  r¹(x, y) = (x - y)/( f(x) - f(y) ) ;  r²(w, x, y) = (x - y)/( r¹(w, x) - r¹(w, y) ) ;  ··· .
The first two levels (a continued fraction) interpolate (match) f(x) at x = x₀, x₋₁ and x₋₂:
  Y_f(x; x₀, x₋₁, x₋₂) = f(x₀) + (x - x₀)/( r¹(x₀, x₋₁) + (x - x₋₁)/r²(x₀, x₋₁, x₋₂) ) .
This interpolant Y_f(x; x₀, x₋₁, x₋₂) vanishes when x = H_f(x₀, x₋₁, x₋₂), determined thus:
  H_f(u, v, w) = u - f(u)·( r¹(u, v)·r²(u, v, w) + u - v )/( r²(u, v, w) + f(u) ) .
The iteration xₙ₊₁ = H_f(xₙ, xₙ₋₁, xₙ₋₂) is called Hyperbolic Iteration and, if it converges to a simple zero z of f, converges with order q ≈ 1.839, the positive root of q³ = q² + q + 1, or rarely faster. This follows from an important and tediously verifiable identity relating H_f(w, x, y) - z to (w - z)(x - z)(y - z), whose limit as w → z, x → z and y → z implies that, as iterates xₙ → z, they satisfy
  (xₙ₊₁ - z)/( (xₙ - z)(xₙ₋₁ - z)(xₙ₋₂ - z) ) → R_f(z) = ( f''(z)²/4 - f'(z)·f'''(z)/6 )/f'(z)² ,
whence a linear recurrence explains why -log|xₙ - z| grows at least as fast as qⁿ as n → ∞.

Both iterating functions have graphical interpretations: S_f uses a straight line (a secant) that cuts the graph of f twice to approximate it; H_f uses an hyperbola, with vertical and horizontal asymptotes, that cuts the graph of f thrice to approximate it. Its hyperbola resembles a straight line ever more widely as f approaches ±∞. Caution: r²(w, x, y) = r²(w, y, x) ≠ r²(x, w, y).

Both our iterations can be applied with a complex analytic function f(x) of a complex variable x to find a complex zero z. However, if f(x) is real for all real x, then our iterations require complex initial guesses lest they never converge to a non-real zero z. This requirement can be circumvented by David Muller's iterating function v = M_f(w, x, y), that is the root v nearest
appr0Ximate it Hf uses an hyperbola with vertical and horizontal asymptotes that cuts the graph of f thrice to appr0Ximate it Its hyperbola resembles a straight line ever more widely as f approaches 00 Caution fw X y fw y X i fX w y Both our iterations can be applied with a compleX analytic function fX of a compleX variable X to nd a compleX zero 2 However if fX is real for all real X then our iterations require compleX initial guesses lest they never converge to a nonreal zero 2 This requirement can be circumvented by David Muller s iterating function v M fw X y that is the root v nearest Prof W Kahan Notes for Math 128 B version dated September 2 2009 830 am File Hyp Hyperbolic Interpolation and Iteration towards a Zero p 319 w of the quadratic equation 0 fw v7wflw X v7Xfllw X y M fwXy w 7 2fW fiwxw7xfiiwxy i V Novxvv7xfiiwm 2 7 4fWfllwXy Graphically M f uses a parabola that cuts the graph of f thrice to approximate it Iterating functions M f and Hf have the same Order q m 1839 of convergence to a simple zero of f but Hf and Sf cannot be expected to converge to a double zero as fast as M f can Note Like fl and fll so do Sf Hf and Rf disregard their arguments order 2 Con uent Versions If computing fX and also its derivative f39X simultaneously costs not much more time than computing fX alone con uent versions of the foregoing iterating functions become worth considering Some of these con uent versions arguments coincide some divided differences are replaced by derivatives thus flX X f39X and fllX X X fquotX2 The con uent version of Secant Iteration is Newton 3 Iteration Xn1 S fXn Xn Xn 7 fXnf39Xn which if convergent to a simple zero converges with Order 2 or rarely faster Hyperbolic Iteration has five con uent versions Xn1 HfXn Xn X1171 converges with Order 1 V2 m 2414 Halley s Iteration Xn1 Hfx Kn Kn armsmm 7 fx fquotxHf39x if converges with Order 3 three more versions introduced below converge with order 2 2 3 Choosing an Iterating Function How shall we decide 
which version to choose? The decision must weigh three considerations, listed here in order of difficulty: Cost, Vulnerability to Roundoff, and Appositeness.

Cost is easiest to assess when the accuracy desired is so high as will entail a large number n of iterations of order Θ > 1. They will garner roughly Θⁿ correct significant digits at the cost of computing time T = n·C, where C is the cost of one iteration. (C is assumed the same for all iterations, as is the case when floating-point arithmetic affords just one precision; in that case the iterates' correct digits grow with time T roughly like Θ^(T/C). Otherwise, if arithmetic's precision is variable, the last iteration is likely to cost more than all the others taken together.) Thus μ = log(Θ)/C is a rough Figure of Merit, the bigger the better, for an iterating function of order Θ and cost C per iteration. A. Ostrowski proposed an assay like μ of an iteration's merit in the middle of the 20th century. μ is very rough because it disregards properties of f, like the values of its derivatives at its zero; so comparing figures of merit of two iterations may be misleading if they depend upon different derivatives, especially when, as we often hope will happen, adequate accuracy is attained with not very many iterations. We shall try not to be misled when we compare a few iterating functions' figures of merit.

Let C₁ be the cost of an iterating function that requires only one new value of f per iteration; let C₂ be the cost of f' and f; let C₃ be the cost of f'', f' and f. With very rare exceptions C₁ < C₂ < C₃, and normally C_K < K·C₁. Ostrowski considered polynomials f of degrees so high that C_K ≈ K·C₁; these contribute the following table's last column.

Table 1: Iterations' Figures of Merit μ
  (columns: Iteration; Order; Cost; μ; μ for polynomials f)
  Secant, xₙ₊₁ = S_f(xₙ, xₙ₋₁);  1.618;  C₁;  0.4812/C₁;  0.4812/C₁
  Newton's, xₙ₊₁ = S_f(xₙ, xₙ);  2;  C₂;  0.6931/C₂;  0.3466/C₁
  Hyperbolic,
xₙ₊₁ = H_f(xₙ, xₙ₋₁, xₙ₋₂);  1.839;  C₁;  0.6094/C₁;  0.6094/C₁
  Confluent Hyperbolic, xₙ₊₁ = H_f(xₙ, xₙ₋₁, xₙ₋₁);  2;  C₂;  0.6931/C₂;  0.3466/C₁
  Confluent Hyperbolic, xₙ₊₁ = H_f(xₙ, xₙ, xₙ₋₁);  2.414;  C₂;  0.8814/C₂;  0.4407/C₁
  Bi-Confluent Hyperbolic (to be defined later);  3;  2C₂;  0.5493/C₂;  0.2747/C₁
  Halley's, xₙ₊₁ = H_f(xₙ, xₙ, xₙ);  3;  C₃;  1.0986/C₃;  0.3662/C₁

Were figures of merit μ all that mattered, this table would enact a severe law of diminishing returns from iterations whose higher order of convergence is achieved at the cost of computing derivatives. For instance, Newton's Iteration would be preferred over Secant Iteration only if f' added less than 44% to the cost of computing f alone. In fact μ is not all that matters; vulnerability to roundoff matters too. Besides limiting how accurately a zero can be computed, roundoff in the computation of f complicates the decision to quit iterating. The last few iterates may waste time dithering in the zero's neighborhood unless roundoff's obscuration is estimated adequately by an error-analysis built into the computation of f. Instead of an error-analysis, an incorporation of f', accurate to a few significant digits, helps an iteration to nearly minimize the magnitude of the iterates' dithering, thus simplifying the program's decision to quit iterating.

Appositeness of an iterating function like S_f or H_f reflects how nearly its provenance accords with properties of f, and affects the iteration's behavior while not yet near the sought zero. For instance, if f has a pole near which an iterate may fall, Hyperbolic and Halley's Iterations escape from the pole's neighborhood much faster than Newton's or Secant Iteration can. We should consider also whether an iteration, perhaps modified slightly, can maintain a Straddle: this is a pair of iterates, not necessarily consecutive, between which f reverses sign an odd number of times. Newton's and Halley's Iterations require programs more complicated than the other iterations need to maintain a straddle, when losing it risks losing the sought zero or converging to an undesired zero.
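The straddle-maintenance idea just described can be sketched in code. Below is a minimal Python sketch, not taken from these notes: a secant iteration safeguarded by bisection, so that the pair of retained endpoints always straddles a sign reversal of f. The function name, stopping rule and tolerances are illustrative choices of mine; the only assumption is that f is continuous on [a, b] with f(a)·f(b) < 0.

```python
def safeguarded_secant(f, a, b, tol=1e-12, max_iter=200):
    """Find a zero of f in [a, b], where f(a) and f(b) have opposite signs.

    A secant step from the straddle endpoints is tried first; whenever it
    would fall outside the current straddle, a bisection step is
    substituted, so the straddle is never lost.
    """
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0.0:
        raise ValueError("f(a) and f(b) must straddle a sign change of f")
    x_prev = a
    for _ in range(max_iter):
        # Tentative secant step through the two straddle endpoints.
        x = b - fb * (b - a) / (fb - fa)
        if not (min(a, b) < x < max(a, b)):
            x = 0.5 * (a + b)          # outside the straddle: bisect instead
        fx = f(x)
        if fx == 0.0 or abs(x - x_prev) <= tol * (1.0 + abs(x)):
            return x
        # Keep the endpoint whose f-value has the opposite sign to f(x),
        # so the retained pair still straddles the zero.
        if fa * fx < 0.0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        x_prev = x
    return x
```

Because every tentative secant step that escapes the current straddle is replaced by a bisection step, the straddle can never be lost, at the price of an occasionally slower step; this is the simplest instance of the kind of safeguarding that Newton's and Halley's iterations need a more complicated program to achieve.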
Moreover, when Newton's iterates converge they almost always converge ultimately monotonically, i.e. from one side, whereas the other iterations converge from both sides of a straddle for at least about as many functions f as not. When two-sided convergence is predictable it simplifies criteria for quitting the iteration: it shrinks nested straddles around a zero about as tightly as roundoff allows until it transgresses the last straddle. The more is known about the properties of f, the better can an iterating function be chosen to take advantage of those properties and also, perhaps, overcome their vitiation by roundoff.

4. Möbius Maps Commute with Hyperbolic Iterations

Four real constants a, b, c and d determine a Möbius Function M(x) := (a·x + b)/(c·x + d), also called a Linear Fractional or Bilinear Rational function. Lest it degenerate into a constant, its constants must be constrained: a·d − b·c ≠ 0. Then another Möbius function W(ξ) := (d·ξ − b)/(a − c·ξ) turns out to be inverse to M(x) in the sense that W(M(x)) = x and M(W(ξ)) = ξ. These functions' range and domain is best construed as the circle obtained from the real axis after it has been closed by its incorporation of a single point at infinity; thus M(−d/c) = ∞, W(∞) = −d/c, M(∞) = a/c and W(a/c) = ∞. Any three distinct points p, q, r on this circle can be mapped by a suitably constructed Möbius map M to any other distinct three points p̂ = M(p), q̂ = M(q), r̂ = M(r) resp. by solving a Bilinear Cross-Ratio Equation
  (x − p)·(q − r)/( (x − r)·(q − p) ) = (M(x) − p̂)·(q̂ − r̂)/( (M(x) − r̂)·(q̂ − p̂) )
for M(x), after deleting every factor (if any) that contains ∞. Except across its pole −d/c, M(x) is strictly monotonic, increasing if a·d − b·c > 0, else decreasing, because the divided difference M[x, y] = (a·d − b·c)/( (c·x + d)·(c·y + d) ) has the same sign as a·d − b·c has. The set of hyperbolas and straight lines that interpolate f(x) to determine Hyperbolic Iterating Functions H_f is a set mapped to itself by M(x). Consequently these Commute, thus: Given functions f(x) and M(x),
determine M's inverse W(ξ), and define Φ(ξ) := ω·f(W(ξ)) for any constant ω ≠ 0. Derive H_Φ(ŵ, x̂, ŷ) from divided differences of Φ just as H_f(w, x, y) is derived from divided differences of f. Then
  H_Φ(M(w), M(x), M(y)) = M( H_f(w, x, y) ).
This means changing coordinates from x to ξ := M(x) and from f(x) to Φ(ξ) := ω·f(W(ξ)) amounts to a Möbius Map that connects (x, f(x)) and (ξ, Φ(ξ)) so tightly that Hyperbolic iterates x_{n+1} := H_f(x_n, ...) starting from x_0 and converging to the zero z of f are images of analogous Hyperbolic iterates ξ_{n+1} := H_Φ(ξ_n, ...) = M(x_{n+1}) starting from ξ_0 := M(x_0) and converging to the zero ζ := M(z) of Φ. Both sequences of iterates {x_n} and {ξ_n = M(x_n)} must converge to their respective destinations at the same speed in the long run.

  "In the long run we are all dead."  (J.M. Keynes, 1883-1946)

Long-running computation is what we wish to avoid. We prefer that iteration enter promptly a regime of fast (Superlinear, Order > 1) convergence that we will terminate soon, as soon as iterates approach the sought zero z about as closely as roundoff allows. Easier said than done. From an ill-fated start an iteration can enter instead a regime of languid convergence, creeping through too many iterates before entering an ultimate regime of fast convergence. If a program could detect its embrace by languor, the program could attempt an escape. Detection is hindered by an involuntary kind of conspiracy between roundoff and Möbius maps that can squeeze or stretch any interval through which many Hyperbolic iterates travel converging slowly to a zero outside it. In one coordinate system, say (ξ, Φ(ξ)), these iterates are separated widely enough for some regularity in their successive differences to suggest an extrapolation that may escape from languor. In another (x, f(x)) coordinate system the iterates appear as crowded as if they were about as near their destination as roundoff allows, though actually the destination is not that near at all. We desire that our root-finding program cope well with these phenomena.
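The claims of §4 about M, its inverse W and the divided difference M[x,y] can be spot-checked numerically. The sketch below is mine (constants a, b, c, d chosen arbitrarily), not part of the notes.

```python
def mobius(a, b, c, d):
    """Return M(x) = (a*x+b)/(c*x+d), its inverse W, and the
    divided difference M[x,y] = (a*d-b*c)/((c*x+d)*(c*y+d)),
    assuming a*d - b*c != 0."""
    assert a * d - b * c != 0
    M = lambda x: (a * x + b) / (c * x + d)
    W = lambda xi: (d * xi - b) / (a - c * xi)   # W(M(x)) == x
    Mdd = lambda x, y: (a * d - b * c) / ((c * x + d) * (c * y + d))
    return M, W, Mdd

M, W, Mdd = mobius(2.0, 1.0, 1.0, 3.0)           # a*d - b*c = 5 > 0
x, y = 0.7, -0.4
# Inverse property, and monotonicity: M[x,y] has the sign of a*d - b*c.
print(W(M(x)), Mdd(x, y), (M(x) - M(y)) / (x - y))
```

The last two printed numbers agree, confirming that the closed form for M[x,y] really is the divided difference of M.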
5. Infantile Greed, and Bi- or bi-Confluent Hyperbolic Iteration

We want it all, on the cheap, and soon. Sometimes we get it. We wish to compute each zero of f at least about as accurately as the data and the arithmetic's precision deserve, but without performing the extra computation of error-analyses to estimate how much accuracy is deserved. And we wish not to wait while iterates grossly contaminated by roundoff dither. Consequently a root-finding program purporting to grant our wishes cannot rely upon an iterative method that will ultimately converge monotonically, from one side, lest short steps confound the program: it cannot distinguish numerous short steps taken to converge very slowly to an unknown destination far away from short steps taken as iterates drift through a region around a zero where |f(x)| is smaller than its computed value because of roundoff. Without computing a modest overestimate (like the Error-Analysis in §11 of §8's example) of roundoff's contribution to f, the only way to be about as sure as roundoff allows of a zero's location is to straddle it about as tightly as roundoff allows. To this end a root-finder's iterative method must, after it finds a straddle, maintain it through a nested sequence of tighter straddles that shrink onto a zero inside them until roundoff thwarts the method. This kind of method is the kind we will prefer. However, what "thwarts" means here entails subtleties.

Roundoff cannot thwart an iterative method if it samples only computed values of f, not its derivative nor any other information about it, until the straddle's ends coincide or are adjacent floating-point numbers. To reach this state, if its precision overreaches the accuracy the data deserve, often takes longer than we wish to wait. We will prefer some other method.

Roundoff can thwart an iterative method that would, in the absence of roundoff, always shrink a straddle and ultimately shrink it fast (superlinearly). Roundoff thwarts the method
when, as computed, an iteration-step would lose the straddle or not shrink it. Then is the time to quit the method. It must depend upon information like a computable derivative f', and/or assumptions about the smoothness and monotonicity of f, beyond merely how to compute its values. And the method should not so exaggerate the effects of roundoff that iteration is stopped before the sought zero has been located at least about as accurately as it deserves. Such a method can be expected to succeed only under favorable circumstances. Here are two such methods.

Bi- and bi-Confluent Hyperbolic Iterations apply the iteration function computed as
  H_f(y, x, x) = H_f(x, x, y) = x − f(x)/( f'(x) − f(x)·f[x,x,y]/f[x,y] )
to both sides of a straddle. Suppose u and v straddle f's sought simple zero z, but no other zero nor singularity nor zero of f'. Therefore f is strictly monotonic within the straddle, so f(u)·f(v) < 0, and f'(u), f[u,v] and f'(v) all have the same nonzero sign. Computing both t := H_f(u,u,v) and w := H_f(u,v,v) will cost very little more than computing either, and both will fall into the straddle between u and v. Appropriate replacements of u and/or v by t and/or w will maintain and shrink the straddle. How shall "Appropriate" be determined? Putting this question's answer into effect constitutes a Bi- or bi-Confluent Hyperbolic Iteration-step.

The obvious answer comes from the computation of both f(t) and f(w), together with f'(t) and f'(w), for use in the next iteration. The computation is wasteful only if f(t)·f(w) > 0, in
which case either (t, f(t), f'(t)) or (w, f(w), f'(w)) will be discarded and the other made part of a new straddle nested in, and noticeably narrower than, the given straddle (u, v). Waste like this cannot afflict very many consecutive Bi-Confluent Hyperbolic iteration-steps; ultimately, as is explained hereunder, waste is inhibited by the influence of the ratio R_f defined above in §1's tediously verifiable identity, namely
  (t − z)/( (u − z)²·(v − z) ) = R_f(u,u,v) = ( f[u,v,z]·f[u,u,z] − f[u,z]·f[u,u,v,z] )/( f[u,v]·f'(u) − f(u)·f[u,u,v] ) ,
  (w − z)/( (u − z)·(v − z)² ) = R_f(u,v,v) = ( f[u,v,z]·f[v,v,z] − f[v,z]·f[u,v,v,z] )/( f[u,v]·f'(v) − f(v)·f[u,v,v] ) .
Though important, R_f is not intended to be computed during the iteration. R_f cannot be infinite; otherwise t and/or w would be thrown out of the straddle, contrary to the assumed monotonicity of f. Therefore R_f, like f, has no singularity so long as all its arguments stay within the given straddle. As u and v converge to their respective limits (not yet proved to be coincident) in the course of iteration, R_f(u,u,v) and R_f(u,v,v) converge to their respective finite limits, as do H_f(u,u,v) and H_f(u,v,v) to limits each of which must coincide with a limit of u or of v. Because the assumed strict monotonicity of f bounds f[u,v] away from zero, the convergence of H_f to the same limit as one of its arguments, regardless of their order, implies that f converges to zero there; consequently iterates u and v converge to z from opposite sides, and both values of R_f converge to R_f(z,z,z).

If R_f(z,z,z) = 0, iterates will converge to z so exceptionally fast that the iterates computed will be too few for anyone to care whether some were wasteful in the sense described above. If R_f(z,z,z) ≠ 0, its sign will ultimately be matched by the signs of R_f(u,u,v) and R_f(u,v,v), whereupon z will lie between t and w, both within the straddle; neither need be wasted. That nonzero sign determines how z and the Bi-Confluent Hyperbolic iterates will be ordered:
  When all R_f > 0:  either u < w < z < t < v  or  v < t < z < w < u.
  When all R_f < 0:  either u < t < z < w < v  or  v < w < z < t < u.
Either way the iteration produces a sequence of ultimately strictly nested straddles whose ends converge to z with the same Order 3 as Halley's Iteration's. However, unless we value a tight straddle more than a one-sided (but usually no worse) approximation to a sought zero, the Figure of Merit is µ ~ 0.5493/Q2, since each Bi-Confluent Hyperbolic iteration-step performs nearly twice as much arithmetic as a merely Confluent one. This cost can be cut
back if R_f always has the same nonzero sign, deduced in advance from properties of f; such functions will appear in §8 and §12. Knowledge of that sign permits values of t := H_f(u,u,v) and w := H_f(u,v,v) computed from the ends of a straddle to be put in order with z, according to the inequalities displayed above, before f(t) or f(w) is computed. Of two nested straddles, each with one end at u or v and the other at t or w, the narrower provides the next bi-Confluent Hyperbolic iteration-step. It requires only either (f(t), f'(t)) or else (f(w), f'(w)) to be computed. Ultimately this iteration converges alternatingly, with Order 1 + √2 ~ 2.414 if R_f > 0, or else Order ~ 2.058 if R_f < 0.

6. Linear Convergence to a Multiple Zero

Occasionally an iteration converges far slower than is consistent with its Order of convergence exhibited in Table 1. If we wish to preclude such languor we shall have to understand what can cause it. One possible cause is a violation of the assumption that the sought zero z is Simple. Functions with multiple zeros are rare, the more so the higher is the multiplicity. On the other hand, functions f with clusters of zeros well separated from the function's other zeros and poles are commonplace. From afar (far enough from both the cluster and all other zeros and poles) the cluster resembles a multiple zero closely enough to retard an iteration's convergence towards a sought zero z until iterates come sufficiently nearer to it than to all the cluster's other zeros. While retarded, the iteration's behavior often exposes the cause of languor, thus:

While convergence is languid it is roughly Linear, which means the ratio (x_{n+1} − z)/(x_n − z) of successive errors approximates a value ρ ≠ 0 dependent upon only the number m of zeros in the cluster and the choice, S_f or H_f, of iterating function. This ρ is the ratio's limiting value as iterates approach any function's zero z of
actual multiplicity m ≥ 2. How ρ depends upon m and that choice of iterating function is summarized below without proofs, which (when they exist) can be extremely tedious. Tabulated below for each choice of S_f and H_f is a polynomial of which ρ is its sole zero strictly between 0 and 1; when m is big, 1 − ρ is of the order of 1/m. Tabulated too is the value ρ takes when m = 2, which is by far the most common multiplicity m exceeding 1.

Table 2: Iterations' Linear Convergence Ratio ρ to a Zero of Multiplicity m

  Iteration                                                   Polynomial                     ρ at m = 2
  Secant                 x_{n+1} = S_f(x_n, x_{n-1})          ρ^m + ρ^{m-1} − 1              0.618034
  Newton's               x_{n+1} = S_f(x_n, x_n)              m·ρ − (m − 1)                  0.5
  Hyperbolic             x_{n+1} = H_f(x_n, x_{n-1}, x_{n-2}) ρ³ + ρ² + ρ − 1  (at m = 2)    0.543689
  Confluent Hyperbolic   x_{n+1} = H_f(x_n, x_{n-1}, x_{n-1}) 2ρ³ − ρ² − 2ρ + 1  (at m = 2)  0.5
  Confluent Hyperbolic   x_{n+1} = H_f(x_n, x_n, x_{n-1})     ρ² + 2ρ − 1  (at m = 2)        0.414214
  Halley's               x_{n+1} = H_f(x_n, x_n, x_n)         (m + 1)·ρ − (m − 1)            0.333333

Unfortunately, languid convergence can be caused by other than clustered or multiple zeros. (Table 2 omits Bi-Confluent Hyperbolic Iteration because it requires a straddle, unobtainable for a double root.) Whether and how to accelerate languid convergence to a cluster are discussed in §7 and §10 of my Lecture Notes on Real Root-Finding posted at <www.cs.berkeley.edu/~wkahan/Math128/RealRoots.pdf>.

7. Languid Convergence to a Hidden Zero

Call z a Hidden Zero of f when f has a pole so close to the zero that they almost cancel out. An example f(x) := (z − x)·(1 + 1/x) has a pole at x = 0 that almost cancels the zero z if it is tiny enough. As z → 0 the graph of y = f(x) approaches the union of two straight lines, one the graph of y = −x − 1, the other a vertical line along the y-axis, as shown hereunder.

  [Figure: graph of y = f(x) = (z − x)·(1 + 1/x) for z = 0.0005, plotted over −2 ≤ x ≤ 1.]

More generally, a plotted graph of a function f with a zero Hidden by a simple pole nearby will fail, except by accident, to reveal the zero unless the plotted points are dense enough to reveal that a nearly
straight, vertical line crosses a graph otherwise almost unexceptional.

Hyperbolic iterations appear called for to cope with the pole adjacent to a sought Hidden zero, though it is unlikely to be found by these or any other iterations unless they start from a straddle and maintain it. Halley's iteration begun between the adjacent pole and zero converges well unless an iterate jumps a little too far past the sought zero, whereupon subsequent iterates will go elsewhere. A program can inhibit Halley's divagation by including ad hoc expedients like retractions, at the risk of needing so many of them that they retard convergence. No ad hoc expedients are needed for the other two confluent Hyperbolic iterations to maintain a straddle. They do so whenever f(x) is strictly monotonic within it: if f(u)·f(v) < 0 and all three of f'(u), f[u,v] and f'(v) have the same sign, then H_f(u,u,v) and H_f(u,v,v) both lie strictly between u and v, which, after the appropriate replacements, give way to a tighter straddle. Nested straddles will ultimately confine the sought zero as tightly as roundoff allows.

This desirable behavior comes at a price: occasionally these iterations can tarry in a regime of languid convergence. The example f graphed above will illustrate how this languor can occur. The Confluent Hyperbolic iterating function in question, namely
  H_f(y, x, x) = H_f(x, x, y) = x − f(x)/( f'(x) − f(x)·f[x,x,y]/f[x,y] ) ,
will be applied from each side to the foregoing example's function f(x) := (z − x)·(1 + 1/x) with 0 < z << 1, but we must assume that no more is known about z than a straddle 0 < u < z < v.
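Before following the analysis, one pair of Confluent Hyperbolic iterates on the hidden-zero example can be computed directly. The sketch below is my own illustration (the straddle endpoints u, v are arbitrary test data); it applies the formula for H_f(x,x,y) displayed above from each side and confirms that both iterates land inside the straddle.

```python
z  = 5e-4                                  # the tiny hidden zero
f  = lambda x: (z - x) * (1 + 1 / x)       # pole at x = 0 nearly cancels it
df = lambda x: -(1 + 1 / x) - (z - x) / x**2

def H(x, y):
    """Confluent hyperbolic iterate H_f(x,x,y) =
    x - f(x)/(f'(x) - f(x)*f[x,x,y]/f[x,y])."""
    fxy  = (f(x) - f(y)) / (x - y)         # divided difference f[x,y]
    fxxy = (df(x) - fxy) / (x - y)         # f[x,x,y]
    return x - f(x) / (df(x) - f(x) * fxxy / fxy)

u, v = 1e-4, 0.5                           # straddle: 0 < u < z < v
t = H(u, v)                                # iterate from the u side
w = H(v, u)                                # iterate from the v side
print(u, t, z, w, v)                       # ordering u < t < z < w < v here
```

The signs f(t) > 0 > f(w) show the straddle is maintained, and the new straddle (t, w) is much narrower than (u, v), as the notes claim for strictly monotonic f.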
Halley's iterates H_f(x,x,x) jump over z. From the far right, H_f(v,v,v) jumps beyond the pole too: H_f(v,v,v) < 0 unless v exceeds z only very slightly, so H_f(v,v,v) must be retracted unless v on the right of z is extremely near it. From the near left, from u between z and the pole, 0 < u < z < H_f(u,u,u), and H_f(u,u,u) − z is of the order of z², which is excellent. If Halley's iteration starts from the near left it will converge alternatingly and rapidly for this simple example. In other, more complicated examples, first iterates can jump from between z and the pole far enough past z that their second iterates jump back beyond the pole and must be retracted lest the straddle be lost. Hidden zeros tend to hide from Halley.

Both Confluent iterates t := H_f(u,u,v) and w := H_f(u,v,v) maintain the straddle; moreover neither jumps past z for this simple example, whose R_f(z,z,z) ~ −1/z < 0. Keeping in mind that 0 < z << 1 and 0 < u < z < v, we find that, coming from v far on the right, v > H_f(u,v,v) > z, and the gap to z shrinks per step by a factor only somewhat smaller than 1; so iteration solely from the right, starting afar, shrinks the straddle repeatedly by such factors until, abruptly, convergence becomes quadratic. Coming from u on the near left, u < H_f(u,u,v) < z with an error roughly proportional to (z − u)², which implies that iteration solely from the left converges quadratically up to the tiny hidden zero z without ever shrinking the straddle much.

Does this example's behavior adumbrate what happens with Hidden zeros generally? If so, it bodes ill for bi-Confluent Hyperbolic Iteration, because its convergence is retarded by its greedy choice of the narrower straddle. Bi-Confluent Hyperbolic Iteration converges to a Hidden zero so much faster that it seems preferable over all other iterative methods whenever a Hidden zero is likely to hide among those that are sought. This preference persists despite the fact that computing both (t, f(t), f'(t)) and (w, f(w), f'(w)) seems costly, or at least wasteful of the one not retained. Because both characterize f from the same data, computing both costs appreciably less than twice the cost of computing either on most of today's computers, which take longer to fetch data from memory than to perform pipelined arithmetic upon them. However, Bi-Confluent Hyperbolic Iteration converges to a Hidden zero only after it has been
straddled. How much should be known about a function f for its Hidden zeros to be straddled more reliably than by chance? At least the poles of f that may hide its Hidden zeros should be easy to find. Therefore the question that deserves to be considered is actually: Are there functions f whose Hidden zeros are hidden only by poles easy to find? Such functions will appear in §8 and §12. They were not contrived. They are not artificial.

  "Man is a tool-using animal."  (Thomas Carlyle, 1795-1881)
  This explains why a man holding a hammer must find nails to hit.

8. Example: A Spectral Equation (also called a Secular Equation)

Here is an example of an equation f(ξ) = 0 which Hyperbolic Iteration is apposite to solve. Suppose we seek all K eigenvalues ξ_j of a real symmetric matrix X := V + α·c·c' differing by a matrix of rank 1 from another real symmetric matrix V whose eigensystem is given; given also are a scalar α ≠ 0 and a column c ≠ o. The eigenvalues sought are the zeros ξ_j of the Characteristic Polynomial det(ξ·I − X), which turns out to cost much more to compute than does the ratio det(ξ·I − X)/det(ξ·I − V) of characteristic polynomials. Consequently our task reduces to the numerical computation of all roots ξ_j of a Spectral Equation f(ξ) = 0 whose
  f(ξ) := Σ_k γ_k²/(ξ − λ_k) − 1/α ,
with given values of α ≠ 0, γ_k ≠ 0, and eigenvalues λ_k of V, all distinct and sorted so that λ_k < λ_{k+1}. Each root ξ_j is known to lie in an open interval E_j one of whose endpoints is λ_j; the other is the nearer of λ_j + α·Σ_k γ_k² and λ_{j+sign(α)} when it exists. Every root is Simple since the derivative f' = −Σ_k γ_k²/(ξ − λ_k)² < 0; moreover f'(ξ_j) < −1/(α²·Σ_k γ_k²), so every root ξ_j is determined sharply by the Spectral Equation's f. Though no root is repeated, some can be Hidden in the sense explored in §7 above. Alas, the given data can come almost arbitrarily close to pathological: α can be arbitrarily tiny or big, distinct values λ_k can
agree in all but their last digits, and coefficients γ_k² can vary arbitrarily widely unless each γ_k tinier than roundoff in ||γ|| is reset to zero and deflated out. In consequence of a near-pathology, an interval E_j can be arbitrarily narrow, or its ξ_j can be arbitrarily nearly Hidden. Pathology and roundoff conspire sometimes to render a numerical computation unreliable. The conspiracy can usually be thwarted by artful origin-shifts, but these are a tiresome story for another document.

Divisions dominate the cost of computing f(ξ) on computers that lack a pipeline dedicated to divisions. Whatever that cost, it is less than doubled by the cost of computing the derivative f'(ξ) too. However, because ∞ is an attractive fixed-point for Newton's and Secant iterating functions S_f, these are ill-suited to solving our Spectral Equation unless started close enough to a sought zero. Confluent Hyperbolic Iterations can generate iterates that converge to ξ_j rapidly from both sides when started from a straddle in E_j, as we shall see. The second derivative f"(ξ) would add no more than f'(ξ) did to each iteration's cost, and would enable Halley's Iteration; but this must be started close enough to the sought zero, or else occasionally inhibited by retractions of iterates that would escape from the current straddle. Flurries of these retractions, by delaying convergence to some zeros, especially Hidden zeros, can complicate load-balancing severely enough to impede parallel computation, for which all eigenvalues are otherwise eminently eligible since each is computable with no reference to any other.

9. How Straddling is Maintained

Subscript j will be dropped from ξ_j and E_j to allay notational clutter in the explanation here. Now f(ξ) = 0, and ξ lies in an open interval E either between two adjacent poles λ_k of f or else outside all poles. How do the properties of the Spectral Equation's f and of H_f combine to maintain straddling? Suppose E contains three points w, x and y, maybe not all distinct, of which two straddle ξ.
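As an illustration of the Spectral Equation (all data below are mine, chosen only for a small example, not from the notes), the sketch evaluates f(ξ), brackets one root between adjacent poles, checks the bound on f'(ξ_j), and also verifies the 2-by-2 closed-form zeros ξ_O and ξ_B that §10 will use for initial straddles.

```python
import math

alpha = 1.0
gam2  = [1.0, 0.5, 2.0]              # gamma_k^2, all nonzero
lam   = [0.0, 1.0, 3.0]              # lambda_k, distinct and sorted

def f(xi):
    """Spectral function f(xi) = sum_k gam2_k/(xi - lam_k) - 1/alpha."""
    return sum(g / (xi - l) for g, l in zip(gam2, lam)) - 1.0 / alpha

def df(xi):
    return -sum(g / (xi - l) ** 2 for g, l in zip(gam2, lam))

# For alpha > 0 one root lies between lam[1] and lam[2], where f
# decreases from +inf to -inf; bisect there.
lo, hi = lam[1] + 1e-9, lam[2] - 1e-9
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
xi1 = 0.5 * (lo + hi)

# Closed-form zeros of a 2-by-2 spectral equation (see Section 10):
def xi_OB(alpha, g1, l1, g2, l2):
    delta = (l1 - l2 + alpha * (g1 - g2)) ** 2 + 4 * alpha**2 * g1 * g2
    s = l1 + l2 + alpha * (g1 + g2)
    root = math.copysign(math.sqrt(delta), alpha)
    return (s + root) / 2, (s - root) / 2      # (xi_O, xi_B)

f2 = lambda xi, g1, l1, g2, l2: g1/(xi - l1) + g2/(xi - l2) - 1/alpha
xiO, xiB = xi_OB(alpha, 1.5, -1.0, 0.5, 2.0)
print(xi1, xiO, xiB)
```

Both closed-form values satisfy the 2-by-2 secular equation to rounding error, ξ_B falls between its two poles and ξ_O outside them, and −f'(ξ_1) exceeds 1/(α²·Σγ_k²) as the text asserts.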
These two straddle v := H_f(w,x,y) too, because f' < 0 and f[x,y] < 0.

On which side of ξ will v lie? An answer comes out of a close examination of f and H_f. Start by dissecting
  R_f(w,x,y) := ( H_f(w,x,y) − ξ )/( (w − ξ)·(x − ξ)·(y − ξ) ) = −R_n(w,x,y)/R_d(w,x,y)
with the aid of the tediously verifiable identity in §1, to find that
  R_n(w,x,y) := f[x,ξ]·f[w,x,y,ξ] − f[x,y,ξ]·f[w,x,ξ]
    = Σ_k Σ_{l<k} γ_k²·γ_l²·(λ_k − λ_l)² / ( (λ_k−w)(λ_k−x)(λ_k−y)(λ_k−ξ)·(λ_l−w)(λ_l−x)(λ_l−y)(λ_l−ξ) ) > 0
and
  R_d(w,x,y) := f[w,x]·f[x,y] − f(x)·f[w,x,y] > 0 .
The last, rather unobvious, inequality follows from our supposition above about a straddle plus three observations. First, R_d(w, ξ, y) > 0. Second, R_d(w,x,y) is a continuous function of its arguments and independent of their order. Third, while two of w, x and y in E straddle ξ, R_d(w,x,y) ≠ 0; otherwise v := H_f(w,x,y) could be thrown out of E instead of falling inside the straddle along with ξ. Taken together, those inequalities about R_n and R_d imply that
  v − ξ = (w − ξ)·(x − ξ)·(y − ξ)·R_f(w,x,y)  with  R_f(w,x,y) < 0 .
This inequality R_f(w,x,y) < 0 and the prerequisites for its validity are important. These are: (i) w, x and y, not necessarily all distinct, lie in E along with ξ; and (ii) either two of w, x and y straddle ξ, or it is close enough to one of them. Amply many examples dissatisfy the second prerequisite and throw H_f(w,x,y) out of E.

The first of two inferences from the inequality R_f < 0 is that, so long as both its prerequisites hold for all the iterates x_n of a (perhaps Confluent) Hyperbolic Iteration, a persistent pattern of signs of x_n − ξ determines the maximum number s̄ of consecutive iterates that can lie on the same side of ξ. Here is a tabulation of s̄ and that pattern:

  Hyperbolic Iterations                                       Fig. of Merit µ   s̄   sign(x_n − ξ) patterns
  Hyperbolic             x_{n+1} = H_f(x_n, x_{n-1}, x_{n-2}) 0.6094/Q1         3    e.g.  + + + − − − ...
  Confluent Hyperbolic   x_{n+1} = H_f(x_n, x_{n-1}, x_{n-1}) 0.6931/Q2         1    + − + − ...
  Confluent Hyperbolic   x_{n+1} = H_f(x_n, x_n, x_{n-1})     0.8814/Q2         2    e.g.  + + − − ...
  Halley's               x_{n+1} = H_f(x_n, x_n, x_n)         1.0986/Q3         1    + − + − ...

This table
explains how Hyperbolic iterates that all satisfy both prerequisites straddle ξ and converge to it from both sides until roundoff interferes. The Confluent Hyperbolic Iteration x_{n+1} := H_f(x_n, x_{n-1}, x_{n-1}) always satisfies both prerequisites if started from a straddle, but converges a little slower than do the others. Their faster convergence could be enjoyed if we could answer an interesting question: How can we predict, from some few of a chosen Iteration's iterates, whether all subsequent iterates will satisfy both of the prerequisites that keep R_f < 0? Prediction is obvious only for that slower Confluent Hyperbolic Iteration, and only if it starts from a straddle. For other Hyperbolic Iterations prediction may become practicable if the following conjecture is true:
  Provided w, x and y satisfy both prerequisites for R_f(w,x,y) < 0, whenever any one of them moves towards ξ, so does v := H_f(w,x,y).
Until this conjecture is decided, I am inclined to eschew non-Confluent Hyperbolic and Halley's Iterations.

A second inference from the inequality R_f < 0 is that, since R_f(u,u,v) < 0 and R_f(u,v,v) < 0 whenever u and v straddle ξ, we can predict
  sign( f(H_f(u,u,v)) ) = sign( f(u) )  and  sign( f(H_f(u,v,v)) ) = sign( f(v) ) .
These predictions figure in straddle-maintaining Bi-Confluent Hyperbolic Iterations that converge to a Hidden zero ξ never slower than quadratically (we hope), and converge ultimately with Order 3 and Figure of Merit µ ~ 0.5493/Q2. Of course, the program must allow for predictions flouted by roundoff. It must terminate the iteration when, because of roundoff, neither prospective computed replacement t := H_f(u,u,v) for u nor w := H_f(u,v,v) for v could narrow the straddle. And the program has to find an initial straddle to initiate Bi-Confluent Hyperbolic Iteration.

10. Where to Find the Initial Straddle

Bi-Confluent Hyperbolic Iteration needs initial points u and v in E_j and on opposite sides of ξ_j. Toward this end, closed-form solutions of the Spectral Equations of 2-by-2
matrices X will be introduced. Given values of just α ≠ 0, γ1² > 0, γ2² > 0 and λ1 ≠ λ2, the two zeros of f(ξ) := γ1²/(ξ − λ1) + γ2²/(ξ − λ2) − 1/α are the functions
  ξ_O(α, γ1², λ1, γ2², λ2) := ( λ1 + λ2 + α·(γ1² + γ2²) + sign(α)·√Δ )/2  and
  ξ_B(α, γ1², λ1, γ2², λ2) := ( λ1 + λ2 + α·(γ1² + γ2²) − sign(α)·√Δ )/2 ,
wherein discriminant Δ := ( λ1 − λ2 + α·(γ1² − γ2²) )² + 4·α²·γ1²·γ2² > 0. The names ξ_O and ξ_B have been so chosen because ξ_B lies strictly Between λ1 and λ2, whereas ξ_O lies strictly Outside them. When ξ_O and ξ_B have very different magnitudes, the smaller is best computed after the bigger from the identity ξ_O·ξ_B = λ1·λ2 + α·(λ1·γ2² + λ2·γ1²), to lessen contamination by roundoff. What follows uses both functions ξ_O and ξ_B.

Back to the original task: Bi-Confluent Hyperbolic Iteration needs two initial iterates u and v in E_j but on opposite sides of an eigenvalue ξ_j of X, whose dimension now exceeds 2. Two Cases, B and O, shall be distinguished according to whether E_j lies Between λ_j and λ_{j+sign(α)}, or else Outside the narrowest interval containing all the eigenvalues λ_k of V.

Case B: When E_j lies Between λ_j and λ_{j+sign(α)}. This is the more common Case. It entails the computation of two nonempty sums-of-squares: σ≪ := Σ γ_k² over indices k for which α·(λ_k − λ_j) ≤ 0, and σ≫ := Σ γ_k² over indices k for which α·(λ_k − λ_j) > 0. In other words, σ≪ sums γ_k² for all indices k whose λ_k lies on the same side of λ_{j+sign(α)} as does λ_j, and σ≫ sums γ_k² for all indices k whose λ_k lies on the same side of λ_j as does λ_{j+sign(α)}. Then ξ_j turns out to lie strictly between
  ξ_B(α, σ≪, λ_j, γ_{j+sign(α)}², λ_{j+sign(α)})  and  ξ_B(α, γ_j², λ_j, σ≫, λ_{j+sign(α)}) ,
both of which fall inside E_j too.

Case O: When E_j lies Outside the narrowest interval containing all eigenvalues λ_k of V. This Case arises only when λ_j is the least (if α < 0) or largest (if α > 0) eigenvalue of V. Let σ := Σ_{k≠j} γ_k²; this sums γ_k² for all indices k with λ_k on the side of λ_j opposite ξ_j, which lies strictly between
  ξ_O(α, γ_j², λ_j, γ_{j−sign(α)}², λ_{j−sign(α)})  and  ξ_O(α, γ_j², λ_j, σ, λ_{j−sign(α)}) .
Both of these
also fall inside E_j, which is the interval strictly between λ_j and λ_j + α·Σ_k γ_k².

The relevant Case provides two estimates ξ̃ inside E_j that straddle ξ_j. Neither of the two need be close to ξ_j, especially if γ_{j±sign(α)}² and γ_j² are tiny compared with Σ_k γ_k², in which case each estimate ξ̃ may lie so close to its end of E_j that too many Hyperbolic Iterations will be squandered to get all iterates far enough away from those ends (poles) that fast convergence can commence. To defend against this contingency, form a third estimate from the average of the two estimates ξ̃, and, after examining the sign of f there, keep that average and whichever of the two estimates ξ̃ lies on the other side of ξ_j to serve as the two initial iterates u and v. Subsequent Bi-Confluent Hyperbolic Iterations will maintain the straddle and shrink it rapidly onto ξ_j, even if it is Hidden, until roundoff retards further shrinkage. The iteration's stopping criterion takes roundoff into account without mentioning it explicitly: unless f(u) = 0 or f(v) = 0, in which case there is no need for more, end iteration as soon as neither computed H_f(u,u,v) nor H_f(u,v,v) falls strictly between u and v, and then assign to ξ_j whichever of these two minimizes the computed |f|.

11. Error-Analysis of the Roots of the Spectral Equation f(ξ) = 0

How uncertain is a computed root ξ_j because of roundoff? Uncertainty is engendered almost entirely by roundoff's contribution to the computed value of f(ξ) := Σ_k γ_k²/(ξ − λ_k) − 1/α. Recall that its data are α ≠ 0, K values γ_k ≠ 0, and K values λ_k, all distinct and sorted so λ_k < λ_{k+1}. Each root ξ_j is known to lie in an open interval E_j one of whose endpoints is λ_j; the other is the nearer of λ_{j+sign(α)} and λ_j + α·Σ_k γ_k², subject to the understanding that λ_0 := −∞ and λ_{K+1} := +∞. Every root ξ_j is Simple because the derivative f' = −Σ_k γ_k²/(ξ − λ_k)² < 0; moreover f'(ξ_j) < −1/(α²·Σ_k γ_k²), so every root ξ_j is determined sharply by the Spectral Equation's f. In fact f is monotone non-increasing in each E_j despite roundoff. Alas, the given data can come almost arbitrarily close to
pathological: α can be arbitrarily tiny or big, distinct values λ_k may agree in all but their last digits, and the coefficients γ_k² can range almost arbitrarily widely unless each γ_k tinier than roundoff in ||γ|| is reset to zero and Deflated out. In consequence of a near-pathology an interval E_j can be arbitrarily narrow, or its ξ_j can come arbitrarily close to either endpoint, where a pole may Hide the zero. Pathology and roundoff can conspire to render the numerical computation of ξ_j unreliable. The analyses hereunder will help anyone who wishes to assess a computed zero's uncertainty due to roundoff, and the effectiveness of an attempt to combat it by Deflation, described below.
will be invoked First for any given J let I be the subscript ofthe pole lei nearest g1 so that either I J or I J signoc Next assign abbreviations ck 7 vizv12 gt 0 rial 7 1amp7 Warm 239 7 2kg and qei 7 nanwig 7x11 Now aeq is the relative uncertainty in the smallest gap g 7M Drop F51 to get simply q 1 Z39ckrk1 Z39 ckrkz Note that 0 lt rk S ri 1 for all k consequently q gt1 How big can q not be In the absence of constraints upon the data at Ak and 7k beyond those assumed so far q can be arbitrarily bigger than 1 as happen to a small KbyK matrix example with K3 J2 700 0c12 21 X1172q 9L312q 11 732qq71 This small example s uncertainty 11g2 qaeleg277t21 for any chosen q no matter how big More generally for arbitrary dimensions and data q 7 nanwarn s a 1 VltZk vie1m and the upper bound is achievable This inequality s proof starts from Cauchy s inequality which implies q 7 1 239 ckrk 1 239 ckrk2 s a 1 12v ckZ39 ckrkz 1 239 ckrk2 with equality just when all numbers rk are the same for all k at l Anyway q 2 1 since all such rk S 1 Now introduce temporary abbreviations c WZ39 ck and 0 WZ39 ckrkz to simplify q 1 901 02 and the confirmation that 1 92 42571 7 992 29 7 c21 962 z 0 Prof W Kahan Notes for Math 128 B version dated September 2 2009 830 am File Hyp Hyperbolic Interpolation and Iteration towards a Zero p 1619 so that a S 1 Wl 922 with equality just when 29 c Work backward to obtain the foregoing more general upper bound upon q It can be arbitrarily big if ly 12 is tiny enough and then the endeavor described above can fail and often does alas then roundoff can de ect some eigenvectors if computed from the simple formula above through angles roughly as big as the gaps worst relative errors The roundingerrorbounds i103 and aeq cost little to compute because they come from quantities already computed for each iteration The bounds are approachable too occasionally bigger than actual errors by less than an order of magnitude However the bounds tend to gross pessimism the more 
so as dimension K increases. They are pessimistic on computers that conform to IEEE Standard 754 for floating-point arithmetic because rounding errors resemble independent random variables with mean zero, in so far as they tend to cancel partially as they accumulate. But roundoff is not random; on these computers, consequently, computed values of f(x) are monotone non-increasing between poles. Here is a snapshot comparing values of a typical Spectral f(x) computed with 4-byte floating-point versus 8-byte:

[Graph: computed f(x) versus x over 1 ≤ x ≤ 2.5; the 4-byte values descend in a staircase.]

This staircase graph is typical, though the sizes of horizontal steps and vertical risers are often far more diverse. Were roundoff random or biased, the graph of f would appear ragged or shifted, and its zeros ξJ would need a trickier program, and noticeably longer, to be computed at least about as accurately as the data and the arithmetic's precision deserve.

Evidently roundoff can debase the relative accuracy of a computed ξJ severely only when |γI| is too tiny. Then, in an attempt to circumvent debasement, we could try Deflation: as if γJ = 0, remove it and λJ from the data temporarily; restore them after all K−1 other roots ξk have been computed. Deflation is motivated by the observation that one of the Spectral equation's roots ξ → λJ as γJ → 0 while all other data is held fixed. Two questions demand attention: Which root ξ → λJ? How big is Deflation's error |ξJ − λJ|? Their answers will tell us how small |γJ| must be to render Deflation's error, accepting λJ in place of ξJ, tolerable, and where to restore λJ in sorted order among the other roots ξk. We know already that either I = J or I = J − sign(α); assume for now that we know which, since the appropriate choices will emerge later. And assume λJ is not too near any other λm. At first sight the error |ξJ − λJ| might be expected to approach zero like γJ², since this is the only way γJ appears in the Spectral
equation f(ξ) := Σk γk²/(ξ − λk) − α = 0. Usually the error does behave that way, but whenever λJ is a kind of double root the error approaches zero much slower, like |γJ| instead. Here is how that can happen. The abbreviations Σ' := Σk≠J, f̄(x) := Σ' γk²/(x − λk) − α, f̄J := f̄(λJ) and f̄J' := f̄'(λJ) = −Σ' γk²/(λJ − λk)² < 0 will help reduce clutter. Note that f̄J and f̄J' are computable. Now 0 = f(ξJ) = γJ²/(ξJ − λJ) + f̄(ξJ) ≈ γJ²/(ξJ − λJ) + f̄J + (ξJ − λJ)·f̄J', whence comes (ξJ − λJ)²·f̄J' + (ξJ − λJ)·f̄J + γJ² ≈ 0. From this equation's two solutions ξJ − λJ we select the one conforming to our assumption that γJ is tiny enough that the limits, namely ξJ − λJ → 0 and f̄(ξJ) → f̄J as γJ → 0, are very near. When f̄J ≠ 0 our selection must be ξJ − λJ ≈ −2·γJ²/( sign(f̄J)·√( f̄J² − 4·γJ²·f̄J' ) + f̄J ) ≈ −γJ²/f̄J, and then whether I = J or I = J − sign(α) is decided by sign(f̄J). When f̄J = 0 our selection becomes ambiguous: ξJ − λJ ≈ +|γJ|/√(−f̄J') or ≈ −|γJ|/√(−f̄J') for I = J or I = J − sign(α) respectively. This ambiguity is not a defect in our analysis but a characteristic of the dispersal of a double root's fragments by a perturbation of its equation; λJ is the double root, half Hidden, of the Secular equation when γJ is infinitesimal. In other words, λJ turns out to be a double eigenvalue of the matrix X := Λ + γ·γ'/α when γJ = 0. A small example with J = 2 and a tiny variable γ2 = y produces series expansions, in powers of y, for the three eigenvalues of X. [The example's data and its three expansions did not survive text extraction.] Generally, when λJ is far enough from all the other λ's, the foregoing approximations to ξJ − λJ permit Deflation's error to be deemed tolerable or not. The situation gets more complicated otherwise, when λJ belongs to a tight cluster of values λk, since f̄J and f̄J' may then approximate f̄(ξJ) and f̄'(ξJ) poorly. A crude but fully general overestimate of the effect upon all of X's eigenvalues of Deflation comes
from recognizing it as a Rank-2 perturbation of X that subtracts from all its eigenvalues amounts between ( −γJ ± √( γJ² + 4·Σk≠J γk² ) )·γJ/(2·α), most of them much tinier. Whether with Deflation or without, whether assessed or not, the accuracies of all computed eigenvalues can be at least about as good as the data and the arithmetic's precision deserve; and yet some gaps ξJ − λk can be too inaccurate for reliable computation of all of X's eigenvectors from the simple formula xJ := Columnk( γk/(ξJ − λk) ). To compute every eigenvector of X reliably without extra-precise arithmetic requires unobvious formulas for them, and for eigenvalues obtained accurately enough with the aid of artful origin-shifts; these are a story for another day.

12. Another Example: a Tridiagonal Eigenvalue Problem

Suppose T is a K-by-K symmetric tridiagonal matrix whose eigenvalues τJ are sought. These are the K zeros of det(T − τ·1), but that is harder to compute than ψ(τ) := det(T − τ·1)/det(T̄ − τ·1̄), where T̄ is obtained from T by striking off its last row and column, after floating-point Over/Underflow is taken into account. In fact ψ(τ) is the last diagonal element of the upper-triangular factor U of T − τ·1 = L·U without pivotal exchanges; L is lower-triangular with a diagonal whose every element is 1. The K poles of ψ are at ∞ and the eigenvalues of T̄, and interlace the zeros of ψ. The count of zeros of ψ less than τ is the same as the count of negative diagonal elements of U; a count of the negative signs among all but the last diagonal element of U counts the finite poles of ψ less than τ, locating them implicitly. Between poles ψ'(τ) < 0, and computed values of ψ(τ) are monotone non-increasing despite roundoff. Hyperbolic Iterating Functions Hψ act much as Hf do for the Spectral equation's f, because a Möbius Map that takes the infinite pole of ψ to a finite location turns ψ into a rational function f with K finite zeros interspersed among K finite poles, between which f is strictly decreasing. However, because the poles of ψ are not explicit, straddles of Hidden zeros of ψ
have to be found by a process of Binary Chop acting on the aforementioned counts. On the other hand Deflation, accomplished by zeroing out negligible off-diagonal elements of T, is simpler to manage for ψ than for f. So is roundoff. Another story for another day.

13. Further Reading

For more about the theory of equation-solving see books by J.F. Traub, by A.M. Ostrowski and by A.S. Householder, all cited in my extensive Lecture Notes on Real Root-Finding posted at <www.cs.berkeley.edu/~wkahan/Math128/RealRoots.pdf>. Those notes also apply Projective maps, built out of Möbius maps, to Secant and Newton's Iterations. Properties of Möbius maps are explored in my lecture notes posted at <…/Math185/Mobius.pdf>. For the properties of eigenvalues of real symmetric matrices see B.N. Parlett's book The Symmetric Eigenvalue Problem (1998; Classics in Applied Mathematics 20, Soc. for Indust. & Appl. Math., Philadelphia). Roundoff is treated at length in N. Higham's book Accuracy and Stability of Numerical Algorithms (2nd ed., 2002; Soc. for Indust. & Appl. Math., Philadelphia). For the tiresome details about unobvious eigenvector formulas and artful origin-shifts, see my notes on Rank-1 Updates of a Symmetric Eigenproblem, to be posted at <…/Smepdt1.pdf>, and the MATLAB programs therein. The unobvious eigenvector formulas come from Ming Gu and Stanley C. Eisenstat (1994), "A Stable and Efficient Algorithm for the Rank-One Modification of the Symmetric Eigenproblem", pp. 1266-1276 of SIAM J. Matrix Anal. Appl. 15. For the best alternative to Bi-Confluent Hyperbolic Iteration see Ren-Cang Li (1994), "Solving Secular Equations Stably and Efficiently", EECS Comp. Sci. Divn. Tech. Rept. No. UCB/CSD-94-851, Univ. of Calif., Berkeley.

Prof. W. Kahan, Mathematics Dept. and E.E. & Computer Sci. Dept., University of California, Berkeley CA 94720, <www.cs.berkeley.edu/~wkahan>
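The eigenvalue counts and Binary Chop of section 12 can be sketched in a few lines. The following Python sketch is illustrative only: the names count_below and eigenvalue_by_chop are mine, and a tiny nudge off an exactly zero pivot is a common safeguard not spelled out above.

```python
def count_below(diag, off, tau):
    """Count eigenvalues of the symmetric tridiagonal T that are < tau,
    by running the unpivoted factorization T - tau*1 = L*U and counting
    the negative diagonal elements (pivots) of U."""
    count, d = 0, 1.0
    for i in range(len(diag)):
        d = diag[i] - tau - (off[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = -1e-300        # nudge an exactly zero pivot
        if d < 0.0:
            count += 1
    return count

def eigenvalue_by_chop(diag, off, j, lo, hi, tol=1e-12):
    """Binary Chop on [lo, hi] for the j-th smallest eigenvalue
    (j = 1, 2, ...), using only the pivot counts above; the initial
    interval [lo, hi] must straddle that eigenvalue."""
    while hi - lo > tol * max(1.0, abs(lo) + abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) >= j:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For instance, the matrix with diagonal (2, 2, 2) and off-diagonal (1, 1) has eigenvalues 2 − √2, 2 and 2 + √2, and the chop recovers each of them from pivot counts alone; a Gershgorin bound supplies a safe initial straddle.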
M. Brand's 2nd Revised Problem has Infinitely Many Solutions

Given are two real symmetric positive definite matrices A = A' and V = V', and a real column b and row c', both nonzero. Desired are the real solutions Z of the equation Z'·A·Z + b·c'·Z − V = O. Note that Z may be rectangular provided it has at least as many rows (the same number as A's) as columns (the same number as V's). The Choleski Factorizations A = U'·U and V = R'·R provide two upper-triangular matrices U and R. Set Z := U⁻¹·S·R, p := R'⁻¹·b ≠ o and g' := c'·U⁻¹ ≠ o' to transform the given equation for Z into an equivalent equation we wish to solve for S: S'·S + p·g'·S − 1 = O. Let e' := (1, 0, 0, ..., 0) be the first row of any identity matrix of appropriate dimensions. Next compute symmetric orthogonal matrices W = W' = W⁻¹ and Y = Y' = Y⁻¹ that map p to W·p = ±‖p‖·e and g' to g'·Y = ±‖g‖·e', wherein ‖p‖ := √(p'·p) and ‖g‖ := √(g'·g); compute W := 2·w·w'/(w'·w) − 1 from w := p ± ‖p‖·e ≠ o, and Y := 2·y·y'/(y'·y) − 1 from y := g ± ‖g‖·e ≠ o. Now W·p·g'·Y = μ·e·e' for a scalar μ := ±‖p‖·‖g‖ whose sign is inherited from the earlier choices of ± signs. Consequently we shall obtain a desired solution S := Y·K·W from any solution K of K'·K + μ·e·e'·K − 1 = O. Every such solution K has first column β·e + f, and its remaining columns constitute an F that has orthonormal columns, i.e. F'·F = 1, each orthogonal to e and to f, but is otherwise arbitrary; f is any solution of F'·f = o orthogonal to e with f'·f ≤ 1 + μ²/4; and β := −μ/2 ± √(1 − f'·f + μ²/4) has either of two values, the other being −(1 − f'·f)/β. Note that F, like Z, must have at least as many rows as columns, but if F and Z are square matrices then F' = F⁻¹ and f = o. Thus every solution Z can be computed by first computing U and R, then p, g' and μ. Then choose any F of the right dimensions with orthonormal columns, and if it is not square choose any f orthogonal to F's columns and not too big (f'·f ≤ 1 + μ²/4). Then choose one of the two available values for β, and construct K, Y, W, S and finally Z. If Z is square, so that f = o, two extremal solutions can be identified: one has F = −1 and β < 0, the other has F = 1 and β > 0. The two values of β are the values of ( −μ ± √(4 + μ²) )/2 and −2/( μ ± √(4 + μ²) ); choose the ± signs to avoid cancellation. Here is a MATLAB program to provide both extremal
solutions in this square case:

  function [Zhi, Zlo] = brand(A, b, c, V)
  b = b(:);  c = c(:);
  U = chol(A);  R = chol(V);
  p = R'\b;  g = U'\c;
  lp = norm(p);  lg = norm(g);  mu = lp*lg;
  w = p;  y = g;
  if w(1) > 0,  w(1) = w(1) + lp;  else,  w(1) = w(1) - lp;  mu = -mu;  end
  if y(1) > 0,  y(1) = y(1) + lg;  else,  y(1) = y(1) - lg;  mu = -mu;  end
  k = ones(length(b)-1, 1);
  d = sqrt(mu*mu + 4) + abs(mu);
  if mu > 0,  blo = -d/2;  bhi = 2/d;  else,  blo = -2/d;  bhi = d/2;  end
  Klo = [blo; -k];  Khi = [bhi; k];
  cw = 2/(w'*w);  cy = 2/(y'*y);
  S = cw*(Klo.*w)*w' - diag(Klo);  Slo = cy*y*(y'*S) - S;
  S = cw*(Khi.*w)*w' - diag(Khi);  Shi = cy*y*(y'*S) - S;
  Zlo = U\(Slo*R);  Zhi = U\(Shi*R);

This program was tested on randomly generated examples, of which some samples follow.

Prof. W. Kahan, page 1/2, April 13, 2001 10:23 am

[Two samples, each listing A, b, c, V and the computed Zhi and Zlo, did not survive text extraction.]

How accurate these results are is hard to say. However, their residuals have been computed and are comparable with the rounding errors generated when they were computed, so these numerical results are compatible with a claim of Backward Stability. In other words, the computed Z's cannot be much worse than if they had been computed exactly from data perturbed only in end figures. The foregoing formulas must suffer excessively from roundoff only when A and V are too nearly singular. This situation may arise artificially when A
and V were derived from other data in a way that happens often: A = B'·B for some rectangular B with at least as many rows as columns. In such a case computing A and then its Choleski factor U is numerically imprudent unless performed in arithmetic at least twice as precise as the data B and the desired result Z. Otherwise U is better computed from a QR Factorization B = Q·U in which Q has the same dimensions as B and orthonormal columns, Q'·Q = 1. Similarly for V and R.
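The square-case recipe can be checked numerically by multiplying back. Below is a sketch of it in Python with NumPy; the function name brand_extremal and the NumPy calls are my transliteration of the recipe, not part of the original note. It builds both extremal solutions, so the residual Z'·A·Z + b·c'·Z − V can be compared against roundoff.

```python
import numpy as np

def brand_extremal(A, b, c, V):
    """Two extremal square solutions Z of Z'AZ + bc'Z - V = 0,
    following the recipe in the text (my transliteration)."""
    U = np.linalg.cholesky(A).T            # A = U'U with U upper-triangular
    R = np.linalg.cholesky(V).T            # V = R'R
    p = np.linalg.solve(R.T, b)            # p  = R'^{-1} b
    g = np.linalg.solve(U.T, c)            # g' = c' U^{-1}
    lp, lg = np.linalg.norm(p), np.linalg.norm(g)
    mu = lp * lg
    w, y = p.copy(), g.copy()
    if w[0] > 0:
        w[0] += lp                         # then W p = +||p|| e
    else:
        w[0] -= lp; mu = -mu               # W p = -||p|| e flips mu's sign
    if y[0] > 0:
        y[0] += lg
    else:
        y[0] -= lg; mu = -mu
    n = len(b)
    W = 2 * np.outer(w, w) / (w @ w) - np.eye(n)   # symmetric orthogonal
    Y = 2 * np.outer(y, y) / (y @ y) - np.eye(n)
    d = (mu * mu + 4.0) ** 0.5 + abs(mu)
    blo, bhi = (-d / 2, 2 / d) if mu > 0 else (-2 / d, d / 2)
    Zs = []
    for beta, s in ((bhi, 1.0), (blo, -1.0)):
        K = s * np.eye(n)                  # K = diag(beta, s, ..., s)
        K[0, 0] = beta                     # beta solves beta^2 + mu*beta - 1 = 0
        S = Y @ K @ W
        Zs.append(np.linalg.solve(U, S @ R))   # Z = U^{-1} S R
    return Zs                              # [Zhi, Zlo]
```

For random positive definite A and V and random nonzero b and c, the norm of each residual Z.T @ A @ Z + np.outer(b, c) @ Z - V comes out a modest multiple of roundoff in the norm of V, consistent with the Backward Stability claim above.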