# Analysis I MATH 341


This 31 page Class Notes was uploaded by John MacGyver on Monday October 26, 2015. The Class Notes belongs to MATH 341 at University of Tennessee - Knoxville taught by Staff in Fall. Since its upload, it has received 11 views. For similar materials see /class/229827/math-341-university-of-tennessee-knoxville in Mathematics (M) at University of Tennessee - Knoxville.

**Note.** Since many elementary MVC textbooks cover the relation between partial derivatives and the total derivative in less depth than I believe is optimal for this course, or do it without reference to matrices, I am providing the key issues here as special notes.

### Partial derivatives

**Definition of partial derivatives.** Everybody who can do single-variable derivatives from Calculus 1 & 2 can already do partial derivatives. E.g., for a 2-variable function $f$ given by the formula

$$ f(x,y) = x^3 + 3x^2y^2 + y^3\sin(x^2+3y), $$

we can choose to treat $y$ like a fixed parameter, view the expression as a function of the variable $x$ alone, and take the derivative with respect to $x$. Alternatively, we can view this expression as a function of $y$ alone, treating $x$ like a parameter, and take the derivative with respect to $y$. These derivatives are called *partial derivatives*, and the adjective "partial" is fitting because they provide only part of the information that should be contained in the derivative.

The prime notation $f'$ from single-variable calculus won't serve us here, because the prime doesn't tell us with which variable, $x$ or $y$, we are dealing. Instead we use the Leibniz notation; in single-variable calculus that would be $\frac{df}{dx}$. However, in order to provide a visual sign that these are partial derivatives, we replace the standard $d$ with curly $\partial$'s. The significance of this notation will become more transparent soon; for the moment it's just a reminder that we are dealing with one variable at a time in a situation where several variables are present. So we can write, for the $f$ given above:

$$ \frac{\partial f(x,y)}{\partial x} = 3x^2 + 6xy^2 + y^3\cos(x^2+3y)\cdot 2x, $$

$$ \frac{\partial f(x,y)}{\partial y} = 6x^2y + 3y^2\sin(x^2+3y) + y^3\cdot 3\cos(x^2+3y). $$

How do the curly $\partial$'s read aloud? For instance "partial df over dx", or "del f over del x".

**Clarification of the function concept and appropriate notation.** To explain and interpret partial derivatives, we need to work a bit on a clean language about functions.
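A quick finite-difference check of the two hand-computed partials above (my own illustration, not part of the original notes; the formula for $f$ is my reading of the scanned text): a partial derivative is just an ordinary derivative taken while the other variable is frozen.

```python
import math

# Sketch: verify the hand-computed partials of the notes' example function
# numerically.  The formula below is my reconstruction of the garbled text.
def f(x, y):
    return x**3 + 3*x**2*y**2 + y**3 * math.sin(x**2 + 3*y)

def fx(x, y):  # hand-computed df/dx (y treated as a parameter)
    return 3*x**2 + 6*x*y**2 + y**3 * math.cos(x**2 + 3*y) * 2*x

def fy(x, y):  # hand-computed df/dy (x treated as a parameter)
    return 6*x**2*y + 3*y**2*math.sin(x**2 + 3*y) + 3*y**3*math.cos(x**2 + 3*y)

x0, y0, t = 0.7, -0.3, 1e-6
fx_num = (f(x0 + t, y0) - f(x0 - t, y0)) / (2*t)   # freeze y, vary x
fy_num = (f(x0, y0 + t) - f(x0, y0 - t)) / (2*t)   # freeze x, vary y
print(fx_num - fx(x0, y0), fy_num - fy(x0, y0))    # both differences tiny
```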
Mind the mantra: *a function is not the same as a formula.* Think of a function as a slot machine that takes certain inputs and assigns a specific output to each legitimate input. The output may be obtained from the input by means of a formula, or by means of several formulas in the case of piecewise-defined functions. But even if one formula provides the output, the function is not the formula; rather, the function is the whole input-output device.

In the example of partial derivatives given above, we are actually talking about three different functions, each of which has its output given by the same formula, but they are distinguished by different input slots. Namely, we are talking about one 2-variable function and two single-variable functions. Firstly, we have the 2-variable function, which we assigned the name $f$. A more elaborate, and maybe weird-looking, name would be $f(\cdot\,,\cdot)$; the dots represent the input slots. We customarily name the quantity that goes in the first slot $x$ and the quantity that goes in the second slot $y$ (not always: in $f(1,2)$ the quantities are explicit numbers, and they don't need to be given another name). The full picture of what the function does is given in the following notation, which pure-math folks love for its conceptual clarity and unambiguity, but which is not so often used in calculus because it is lengthy:

$$ f:\ (x,y) \mapsto f(x,y) = x^3 + 3x^2y^2 + y^3\sin(x^2+3y). $$

Before the colon you have the name of the function, $f$; after the colon, the explanation of what the function does. You see its input slots, with conventional but arbitrary names $x$ and $y$ given to the variables that go in these slots. Then the assignment arrow $\mapsto$, which symbolizes the function's operation of taking the input and converting it into an output. Finally the output, with its generic conventional symbol $f(x,y)$ (function with variables filled in the slots) and the actual formula that tells how the function calculates the output. You'll save yourself a lot of heartache with partial derivatives if you make sure that your notion of "function" in your brain represents this pattern and is NOT reduced to the formula at the end.
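The slot-machine picture can be sketched in code (my own aside, not from the notes; the formula is again my reconstruction): the same formula yields three different "machines", distinguished only by which slots are open.

```python
import math
from functools import partial

# Three different functions built from one formula, distinguished only
# by their input slots.
def f(x, y):                      # the 2-variable function f(. , .)
    return x**3 + 3*x**2*y**2 + y**3 * math.sin(x**2 + 3*y)

f_slice_y = partial(f, y=2.0)     # f(. , 2): second slot permanently filled
f_slice_x = partial(f, 1.0)       # f(1 , .): first slot permanently filled

print(f(1.0, 2.0), f_slice_y(1.0), f_slice_x(2.0))  # all three agree
```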
And yes, I know the function-formula misconception has served you well so far, so it's hard to get rid of it; but now you'd better dump the misconception anyway, lest it become the source of mysterious confusions down the road. Unlike the neatly groomed textbooks, I will not provide an artificially screened and protected environment designed to cater for survival with the beloved misconception.

In a rough metaphor, the function $f$ is like a coke machine, with the two inputs being a coin and a button push. The output $f(x,y)$ is the coke. Now if somebody rigs up the input-output device and keeps the button for diet cherry coke permanently selected, leaving you only one input slot, then this device is a new function. We can give it a completely new name like $g$, or we can call it $f(\cdot\,,y)$, with the second slot already filled with some constant $y$ and only the first slot ready to take an input called $x$. This function is now

$$ f(\cdot\,,y):\ x \mapsto f(x,y) = x^3 + 3x^2y^2 + y^3\sin(x^2+3y). $$

It is a single-variable function, and its derivative at $x$ is what we called $\partial f(x,y)/\partial x$ above. So you could write $\bigl(f(\cdot\,,y)\bigr)'(x) = \partial f(x,y)/\partial x$; but writing it this way serves no other purpose than to illustrate a notation that may yet seem weird to you. The third function we were considering in our example is

$$ f(x,\cdot\,):\ y \mapsto f(x,y) = x^3 + 3x^2y^2 + y^3\sin(x^2+3y). $$

Three different functions, all from the same formula. You may not see the dot-slot notation too often, and it may be abhorrent to physicists, who might rather call the 2-variable function $f(x,y)$ and the 1-variable functions $f(x)$ and $f(y)$ respectively, using the variables to distinguish which function we are talking about and not bothering to give a name to the function itself.

**Interpretation of partial derivatives in the graph of a function.** [A figure belongs here: the upper left panel shows the graph $z = f(x,y)$ of a 2-variable function; two further panels show the slice graphs $z = f(x,y_0)$ with $y = y_0$ fixed and $z = f(x_0,y)$ with $x = x_0$ fixed, annotated "partial derivatives don't see more than this"; a fourth panel combines the slices with their tangent lines.] The upper left corner of this figure represents the graph of a 2-variable function $f$. Below and to
its right, you see the single-variable functions $f(\cdot\,,y_0)$ and $f(x_0,\cdot\,)$ graphed, respectively. The graph of $f(x_0,\cdot\,)$ still appears tilted with respect to the paper plane, as in its original position. The lower right corner of the figure combines the two single-variable graphs again and adds their tangent lines. The partial derivatives $\frac{\partial f}{\partial x}(x_0,y_0)$ and $\frac{\partial f}{\partial y}(x_0,y_0)$ are precisely the slopes of the tangent lines in the graphs of the single-variable functions. If you think of these tangent lines drawn in the 3-dimensional figure, you can put a plane $H$ passing through these lines. This plane has also been plotted. We'd like to consider it as the tangent plane to the graph of the 2-variable function $f$ at the point $(x_0, y_0, f(x_0,y_0))$. In the example drawn here, the plane $H$ does indeed qualify as a tangent plane, and this observation will justify calling the function $f$ from the graph "totally differentiable", in contradistinction to partial differentiability, which only refers to the existence of the partial derivatives.

Look at the lower right figure. The information that enters into the partial derivatives does not see anything the function does in the place where the question marks are. The 2-variable function $f$ could be modified wildly in these quadrants without affecting the single-variable functions from which the partial derivatives are calculated. In particular, it could be modified so wildly that neither $H$ nor any other plane could reasonably be considered tangent to the graph. And this brings us to the limitations of the partial derivative.

**Limitations of the partial derivative, and what we will do about them.** As I have just pointed out: if we were to call a multi-variable function "differentiable" whenever merely all partial derivatives exist, we would end up calling some rather wild functions differentiable that do not deserve to be called differentiable. Take for instance our old friend

$$ f:\ (x,y) \mapsto f(x,y) = \begin{cases} \dfrac{xy}{x^2+y^2} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0). \end{cases} $$

This function is not even continuous at $(0,0)$, but still the single-variable functions $f(\cdot\,,0): x \mapsto 0$ and $f(0,\cdot\,): y \mapsto 0$ are constant and trivially have derivative $0$.
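A numeric illustration of just how wild this "old friend" is (my own aside, not in the notes): both coordinate slices through the origin are identically zero, yet along the diagonal the function sits at $1/2$ arbitrarily close to the origin.

```python
# The classic function f(x,y) = xy/(x^2+y^2), f(0,0) = 0: both partial
# derivatives at the origin exist, yet f is not even continuous there.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

# Along the coordinate axes f is identically 0, so the slice functions
# f(., 0) and f(0, .) are constant with derivative 0:
print([f(t, 0.0) for t in (0.1, 0.01, 0.001)])   # [0.0, 0.0, 0.0]
# But along the diagonal y = x the value is 1/2, no matter how small t is:
print([f(t, t) for t in (0.1, 0.01, 0.001)])     # [0.5, 0.5, 0.5]
```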
We will define a notion of "differentiable" that still gives us the theorem we had in single variables, namely: *if $f$ is differentiable, then it is continuous.* In contrast, existence of the partial derivatives does not even guarantee continuity of a function, as we have just seen.

There is a practical limitation as well, and it is related to the theoretical limitation just mentioned. If the variables $x$ and $y$ are indeed cartesian coordinates of a point $P$, with the function depending on the point geometrically, then the function $P \mapsto f(P)$ has a meaning independent of the coordinate directions we choose. We could construct a continuous function like, e.g. (note the square root this time),

$$ g:\ (x,y) \mapsto g(x,y) = \begin{cases} \dfrac{xy}{\sqrt{x^2+y^2}} & \text{if } (x,y)\neq(0,0),\\[1ex] 0 & \text{if } (x,y)=(0,0). \end{cases} $$

In polar coordinates, $g(x,y) = g(r\cos\varphi, r\sin\varphi) = r\sin\varphi\cos\varphi$. We can slice the graph of $g$ in many directions, not only the two directions that were arbitrarily singled out as coordinate directions, and we get infinitely many single-variable functions just from slices through the origin: for instance $t \mapsto g(t,t) = |t|/\sqrt{2}$, if we slice along the diagonal of the $x$-$y$ plane. This time the function is continuous at the origin, and we also have tangent lines in all coordinate directions; but these tangent lines still do not assemble into a plane (indeed, the diagonal slice has a corner at $t=0$). If the input of the function has a geometric meaning, then partial derivatives single out certain directions arbitrarily, as coordinate directions, at the neglect of other directions. Think about the other meaning of "partial", whose opposite is not "total" but "impartial".

We will also construct a notion of *directional derivative*, which generalizes partial derivatives in the sense that any direction can be used for getting a single-variable slice of the graph. With this notion, partial derivatives will simply be directional derivatives in coordinate directions. But again, even the existence of all directional derivatives at a point is not sufficient for the existence of a tangent plane, as the example $g$ shows. Worse even: even if all directional derivatives at one point
exist and are $0$, so that the tangent lines neatly fit together into a plane (namely a plane $z = \text{const}$), this still does not guarantee the continuity of the function at this point. The function from Homeworks 8/13, $h(x,y) = x^2y^4/(x^4+y^8)$ for $(x,y) \neq (0,0)$ and $h(0,0) = 0$, is an example of this phenomenon. Details later.

### Planes, linear maps, derivatives, and matrices

**Graphs that are planes, and linear maps.** How does a 2-variable function $T$ look whose graph is a plane? Well, all the single-variable functions $T(\cdot\,,y): x \mapsto T(x,y)$ should have graphs that are straight lines, and all these lines should have the same slopes. So $T(x,y) = g(y) + mx$. Since the single-variable function $T(x_0,\cdot\,): y \mapsto T(x_0,y)$ would also have to graph as a line, we need $g(y) = a + ny$ for some constants $a$ and $n$. In other words, it is just the linear functions $T(x,y) = a + mx + ny$ whose graphs are planes.

In natural generalization, we call an $\ell$-variable function $T$ *linear inhomogeneous*, or *affine*, if it is of the form $T(x_1,\dots,x_\ell) = a + m_1x_1 + \dots + m_\ell x_\ell$, with constants $a$ and $m_j$, $j = 1,\dots,\ell$. We call a vector-valued function $\vec T$ linear inhomogeneous (or affine) if each component function $T_i$ is a linear inhomogeneous function, i.e., if we can write, with constants $a_i$ and $m_{ij}$,

$$ \begin{aligned} T_1(x_1,\dots,x_\ell) &= a_1 + m_{11}x_1 + \dots + m_{1\ell}x_\ell\\ T_2(x_1,\dots,x_\ell) &= a_2 + m_{21}x_1 + \dots + m_{2\ell}x_\ell\\ &\ \,\vdots\\ T_k(x_1,\dots,x_\ell) &= a_k + m_{k1}x_1 + \dots + m_{k\ell}x_\ell \end{aligned} \tag{LF} $$

In the case $k=1$, $\ell=2$, which we can neatly graph in $\mathbb{R}^3$, the graph is a plane. Note: the words "linear inhomogeneous" or "affine" are more popular in Linear Algebra. In Calculus we often say simply "linear" instead of "linear inhomogeneous / affine". In Linear Algebra, the word "linear" alone refers to the special case of (LF) where all $a_i$ are zero.

**Matrices.** Matrices have been invented as a more concise notation for situations like (LF). This notation will actually condense (LF) to a form that makes it look very similar to the scalar-valued single-variable case $T(x) = a + mx$. Earlier we had combined the components $x_1,\dots,x_\ell$ into a vector, abbreviated as $\vec x$, which even turns out to have a geometric interpretation; so we are not merely talking about an abbreviation, but rather $\vec x$ is "the real thing", and its components $x_j$ are merely pieces of $\vec x$, defined in terms of an arbitrarily chosen cartesian coordinate system.
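The condensation of (LF) can be shown concretely (a sketch of mine, with made-up constants, not an example from the notes): the whole system of $k$ equations becomes the one line $T(\vec x) = \vec a + M\vec x$.

```python
import numpy as np

# The system (LF) condensed: T(x) = a + M x, which looks just like the
# scalar single-variable case a + m x.  Constants here are arbitrary.
a = np.array([1.0, -2.0])                 # k = 2 constants a_i
M = np.array([[3.0, 0.5, 1.0],            # k x l coefficient array m_ij:
              [0.0, 2.0, -1.0]])          # first index = row, second = column

def T(x):
    return a + M @ x                      # one line instead of k equations

x = np.array([1.0, 2.0, 3.0])
print(T(x))   # row-wise: [1+3+1+3, -2+0+4-3] = [8.0, -1.0]
```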
In a similar spirit, we now arrange the coefficients $m_{ij}$ into a rectangular array, and call this whole array a *matrix*:

$$ M = \begin{pmatrix} m_{11} & \cdots & m_{1\ell}\\ m_{21} & \cdots & m_{2\ell}\\ \vdots & & \vdots\\ m_{k1} & \cdots & m_{k\ell} \end{pmatrix} $$

More specifically, we say $M$ is a $k \times \ell$ matrix, namely a matrix with $k$ rows and $\ell$ columns. We will strictly abide by the following convention: the fiRst index for entries of a matrix indicates the Row, whereas the second index represents the column in which that entry may be found. For a matrix, an equally convenient geometric interpretation as in the case of vectors (viewed as arrows) is not visible at this moment; nevertheless, you should still think of the matrix $M$ as "the real thing", and of its entries $m_{ij}$ as merely pieces, determined from an arbitrarily chosen coordinate system. Vectors are special cases of matrices: namely, they are matrices with but one column (remember the "vertical convention" for vectors). Refer to the Linear Algebra Glossary, page 3, for the definition of multiplication of matrices and the basic rules of matrix arithmetic. They belong here logically, but I won't repeat them here.

### Total Differentiability

Let us consider a function; let it depend on several variables $x_1,\dots,x_\ell$. So we can write

$$ \vec f(x_1,\dots,x_\ell) = \begin{pmatrix} f_1(x_1,\dots,x_\ell)\\ \vdots\\ f_k(x_1,\dots,x_\ell) \end{pmatrix} $$

We have written a vector-valued function for generality, but the scalar-valued case is included if the number $k$ of components of $\vec f$ is $1$. Moreover, the case $\ell = 1$, the single-variable case, is also included as a special case. For ease of notation, we treat the variables $x_1,\dots,x_\ell$ as components of a vector $\vec x = (x_1,\dots,x_\ell)^{\mathsf T}$ and write $\vec f(\vec x)$. Note, however, that the vectors $\vec x$ and $\vec f$ in general are of a different nature; they may even have a different number of components. We now select a particular input point $\vec x$. (For the moment I don't want to call it $\vec x_0$, to avoid confusion between the index $0$ for "some particular point" and the vector component index.) We ask the question whether, for inputs close to $\vec x$, $\vec f$ can be approximated well by some linear inhomogeneous function. Specifically, we
want $\vec f(\vec x + \vec h)$ to be well approximated by $\vec f(\vec x) + T\vec h$ for some matrix $T$. It does not suffice that this quantity $\vec f(\vec x + \vec h) - \vec f(\vec x) - T\vec h$ goes to $\vec 0$ as $\vec h \to \vec 0$, which would only mean continuity of $\vec f$. Rather, it has to go to $\vec 0$ *faster* than $\vec h \to \vec 0$. Here is the formal definition:

**Definition.** Let $U$ be an open subset of $\mathbb{R}^\ell$ and $\vec f$ be a function from $U$ into $\mathbb{R}^k$. Let $\vec x \in U$. We call $\vec f$ *differentiable at* $\vec x$ if there exists a $k \times \ell$ matrix $T$ such that

$$ \lim_{\vec h \to \vec 0} \frac{\|\vec f(\vec x + \vec h) - \vec f(\vec x) - T\vec h\|}{\|\vec h\|} = 0. \tag{TD} $$

A synonym, used to emphasize the distinction to partial derivatives, is *totally differentiable*. We'll see in a moment that there can be at most one such matrix $T$; if $T$ exists, we call it the *total derivative* of $\vec f$ at $\vec x$ and write it as $D\vec f(\vec x)$. Synonyms for "total derivative" are *functional matrix* or *Jacobi matrix*.

With $T = D\vec f(\vec x)$, the function $\vec x + \vec h \mapsto \vec f(\vec x) + T\vec h$ will have as its graph a tangent plane to the graph of $\vec f$ at $\vec x$. At least this holds in the case $k=1$, $\ell=2$, in which we can geometrically draw such a plane in $\mathbb{R}^3$. In other cases, this sentence is merely a way of speaking, which by metaphor carries over our geometric intuition into dimensions never beheld by human eyes. More formally, consider this statement a *definition* of "tangent plane" in these cases of higher dimension.

In the case $k=1$, $\ell=1$ of single-variable calculus, our definition of total differentiability reduces to the old definition of differentiability for single-variable functions. $Df(x_1)$ would have to be a $1\times 1$ matrix, which we usually identify with the number that is the one and only entry of this matrix, and this number is what was called $f'(x_1)$ in single-variable calculus. Indeed, in the single-variable case the norms become absolute values $|\cdot|$, and our definition asserts that the limit

$$ \lim_{h\to 0} \frac{f(x_1+h) - f(x_1) - Th}{h} $$

vanishes for some number ($1\times 1$ matrix) $T$. But this means that $\lim_{h\to 0}\frac{f(x_1+h)-f(x_1)}{h}$ exists and is $T$. So $T$ is the derivative $f'(x_1)$.

Let's drop the arrow from $\vec x$ in the notation now; we study differentiability at a point $x = (x_1,\dots,x_\ell)^{\mathsf T}$. Next we'll see that total differentiability of $\vec f$ implies the existence of the partial derivatives, and that the entries of $T$ are precisely these partial derivatives.
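Definition (TD) can be probed numerically (my own sketch, with an example function of my choosing): for a smooth function, the quotient shrinks proportionally to $\|\vec h\|$ itself.

```python
import math

# Probe of (TD) for f(x,y) = x^2 + y^2 at (1,2), with the candidate
# matrix T = (2x  2y) = (2  4): the quotient should go to 0 as h -> 0.
def f(x, y):
    return x**2 + y**2

x, y = 1.0, 2.0
T = (2.0, 4.0)

qs = []
for s in (1e-1, 1e-2, 1e-3):
    h, k = s, -0.5 * s                     # one particular way h -> 0
    num = abs(f(x + h, y + k) - f(x, y) - (T[0]*h + T[1]*k))
    qs.append(num / math.hypot(h, k))      # the quotient in (TD)
print(qs)                                  # shrinks roughly like s itself
```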
We do this by choosing special vectors $\vec h$, namely those that point in coordinate directions. For simplicity, assume that $f$ is scalar-valued. Let's choose $\vec h = (t,0,\dots,0)^{\mathsf T}$. We are looking for a row matrix $T = (T_1\ T_2\ \cdots\ T_\ell)$ that satisfies the definition. Note that $T\vec h = tT_1 + 0T_2 + \dots + 0T_\ell = tT_1$. Differentiability requires, in particular for our chosen vector, that

$$ \lim_{t\to 0} \frac{f(x_1+t, x_2,\dots,x_\ell) - f(x_1, x_2,\dots,x_\ell) - tT_1}{t} = 0. $$

But this identifies $T_1$ as the partial derivative $\partial f(x)/\partial x_1$. If we had chosen $\vec h = (0,t,0,\dots,0)^{\mathsf T}$ instead, we would have selected the second entry $T_2$ of $T$ and identified it with the partial derivative $\partial f(x)/\partial x_2$, and so on. It is clear from this deliberation that partial derivatives arise from the total derivative by choosing specific vectors $\vec h$ in the definition of differentiability. Total differentiability requires that the limit in the definition exists even without any restriction on how $\vec h$ goes to $\vec 0$.

The same considerations carry over to the vector-valued case. For a function $\vec f$ with component functions $f_1,\dots,f_k$, the limit (TD) in the above definition will be zero if and only if the corresponding limit for each component function is $0$. Our conclusion is: if $\vec f$ is differentiable, then

$$ D\vec f(x) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_\ell}\\[1ex] \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_\ell}\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial f_k}{\partial x_1} & \dfrac{\partial f_k}{\partial x_2} & \cdots & \dfrac{\partial f_k}{\partial x_\ell} \end{pmatrix} $$

So different rows of $D\vec f(x)$ correspond to different components of the function $\vec f$; for scalar-valued functions $f$, the matrix $Df(x)$ is made up of only one row. Different columns of $D\vec f(x)$ correspond to the different variables. While the $k\times\ell$ matrix in this formula can always be constructed when the partial derivatives exist, this matrix only deserves the name $D\vec f(x)$ if $\vec f$ is totally differentiable at $x$. Only if $\vec f$ is totally differentiable does this matrix give an appropriate linear approximation to the function near $x$.

From now on, I'll omit the vector arrow from $x$, regardless of whether $f$ is scalar-valued or vector-valued; I will retain the vector arrow on $\vec f$.

### Proving Total Differentiability

Before entering into the task outlined in the headline, let's note a very easy consequence of differentiability.
**Theorem.** *If $\vec f$ is totally differentiable at $x$, then it is continuous there.*

The proof is easy. If the limit in (TD) is $0$, then in particular the numerator $\|\vec f(x+\vec h) - \vec f(x) - T\vec h\|$ must go to $0$. So we get $\vec f(x+\vec h) - \vec f(x) - T\vec h \to \vec 0$ (or $0$, as the case may be). Since $T\vec h \to \vec 0$ (or $0$) automatically as $\vec h \to \vec 0$, we conclude $\vec f(x+\vec h) \to \vec f(x)$, i.e., $\vec f$ is continuous at $x$.

**Pedestrian differentiability proofs.** In principle, to prove that a function is totally differentiable, you first need to find an appropriate matrix $T$ to be used in definition (TD); then you have to check the limit property that is required in the definition. Finding $T$ is easy, because the matrix formed from the partial derivatives is the only possible candidate, and partial derivatives are easy to calculate. The labor then consists of checking the limit property. We'll see an example below, and another one is in Hwk 18.

**Easy differentiability proofs.** Easy proofs are available if the partial derivatives you have computed, as the only possible entries of the matrix $T$, turn out to be continuous functions in a neighborhood of the point $x$. (That means, of course, that they have to be continuous in the multi-variable sense; continuity of the single-variable functions obtained by freezing all but one variable will not suffice.) In that case, there is a theorem that guarantees that $\vec f$ is differentiable at $x$, and we save a lot of work. Note: when I say "continuous in a neighborhood of $x$", I mean there is a little ball around $x$ in which the functions in question are continuous. A proof of the theorem in question is a very useful exercise to begin understanding the notion of total differentiability, so you do not want to skip over this proof below.

**Example of a pedestrian differentiability proof.** We consider the 2-variable function $f(x,y) = \dfrac{x^2y^2}{x^2+y^2}$ for $(x,y)\neq(0,0)$, and $f(0,0)=0$. For the sake of comparison, we will also study the function $g(x,y) = \dfrac{xy}{x^2+y^2}$ for $(x,y)\neq(0,0)$, and $g(0,0)=0$. We prove that $f$ is differentiable at the origin, but $g$ is not.
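Before the proof, the two functions just defined can be probed numerically (my own aside, not in the notes): the (TD) quotient for $f$ shrinks along the diagonal, while the quotient for $g$ blows up.

```python
import math

# Numeric probe of the pedestrian example: f is totally differentiable
# at the origin, g is not.  The candidate matrix is T = (0 0) for both,
# so the quotient in (TD) is |f(h,k)| / ||(h,k)||.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y**2 / (x**2 + y**2)

def g(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

fq, gq = [], []
for t in (0.1, 0.01, 0.001):          # approach the origin along k = h
    norm = math.hypot(t, t)
    fq.append(f(t, t) / norm)
    gq.append(g(t, t) / norm)
print(fq)   # shrinks to 0: f passes the (TD) test along this path
print(gq)   # blows up like 1/|h|: g fails, as the proof will predict
```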
First we note that $f(x,0) = x^2\cdot 0/(x^2+0) = 0$ for $x\neq 0$, and of course $f(0,0)=0$ also. So the single-variable function $x\mapsto f(x,0)$ is the constant $0$. Its derivative at $x=0$ is $0$. (Its derivative is $0$ everywhere, but it is $x=0$ we are interested in.) We have concluded $\frac{\partial f}{\partial x}(0,0) = 0$. The very same argument applies to show $\frac{\partial f}{\partial y}(0,0) = 0$. The only matrix $T$ that could be $Df(0,0)$ is $(0\ 0)$. So far, the very same could be said for $g$, with the same calculations except for the trivial change in the formula: the only matrix $T$ that could be $Dg(0,0)$ is $(0\ 0)$.

Next we show that $T$ is indeed $Df(0,0)$; thereafter we will see that $T$ does *not* qualify for $Dg(0,0)$. We choose the letters $h$ and $k$ for the components of the vector $\vec h$. We want to show that

$$ \frac{\bigl|f(h,k) - f(0,0) - (0\ 0)\binom{h}{k}\bigr|}{\sqrt{h^2+k^2}} \longrightarrow 0 \quad\text{as } (h,k)\to(0,0). $$

Since $(h,k)=(0,0)$ is not considered in the limit $(h,k)\to(0,0)$, we can use the $x^2y^2/(x^2+y^2)$ formula for $f$, and after simplifying we have to show that

$$ \frac{h^2k^2}{(h^2+k^2)^{3/2}} \longrightarrow 0 \quad\text{as } (h,k)\to(0,0). $$

Here is an easy trick how to do that. Note that by the AGM inequality, $|hk| \le h^2+k^2$ for all real numbers $h,k$. Therefore $\bigl|h^2k^2/(h^2+k^2)^{3/2}\bigr| \le (h^2+k^2)^{1/2}$, and this goes to $0$ trivially as $(h,k)\to(0,0)$. This proves that $f$ is totally differentiable at the origin.

Now if we try to do the same for $g$, we would have to show that $|hk|/(h^2+k^2)^{3/2} \to 0$ as $(h,k)\to(0,0)$. This, however, is not true. For instance, if we approach the origin along the diagonal $k=h$, we get $h^2/(2h^2)^{3/2} = 2^{-3/2}/|h|$, which does not go to $0$. So $g$ is not differentiable at the origin. Of course, we knew this from the onset, because differentiability implies continuity, and we had seen that $g$ isn't even continuous at the origin.

**Proof that continuous partials imply total differentiability.** We do this for a 3-variable function $(x,y,z)\mapsto f(x,y,z)$, using $\vec h = (h,k,l)^{\mathsf T}$. The general case can be worked out completely analogously, except for a bulkier writeup. In the numerator of (TD) we have to consider the difference $\vec f(\vec x + \vec h) - \vec f(\vec x) - T\vec h$, where $T$ is the matrix of partial derivatives. Specifically, we have to deal with the difference

$$ \mathrm{Num} = f(x{+}h, y{+}k, z{+}l) - f(x,y,z) - \frac{\partial f(x,y,z)}{\partial x}h - \frac{\partial f(x,y,z)}{\partial y}k - \frac{\partial f(x,y,z)}{\partial z}l. $$

In line with the fact that we have knowledge about partial derivatives, we write this as a sum of
differences, in such a way that in each of several terms only one variable changes:

$$ \begin{aligned} \mathrm{Num} = {}&\bigl(f(x{+}h, y{+}k, z{+}l) - f(x, y{+}k, z{+}l)\bigr) + \bigl(f(x, y{+}k, z{+}l) - f(x, y, z{+}l)\bigr) + \bigl(f(x, y, z{+}l) - f(x,y,z)\bigr)\\ &- \frac{\partial f(x, y{+}k, z{+}l)}{\partial x}h - \frac{\partial f(x, y, z{+}l)}{\partial y}k - \frac{\partial f(x,y,z)}{\partial z}l\\ &+ \left(\frac{\partial f(x, y{+}k, z{+}l)}{\partial x} - \frac{\partial f(x,y,z)}{\partial x}\right)h + \left(\frac{\partial f(x, y, z{+}l)}{\partial y} - \frac{\partial f(x,y,z)}{\partial y}\right)k \end{aligned} $$

In this layout, the first "staircase" just rewrites the two $f$ terms as a telescoping sum; the second staircase models the partials we have in the formula for Num, but we have changed the arguments to match the ones in the first staircase. The last line merely corrects for the modifications made in the second staircase.

Let's begin with what this last line contributes to the fraction in (TD); to this end, we throw in the denominator $\sqrt{h^2+k^2+l^2}$ again:

$$ \left(\frac{\partial f(x, y{+}k, z{+}l)}{\partial x} - \frac{\partial f(x,y,z)}{\partial x}\right)\frac{h}{\sqrt{h^2+k^2+l^2}} + \left(\frac{\partial f(x, y, z{+}l)}{\partial y} - \frac{\partial f(x,y,z)}{\partial y}\right)\frac{k}{\sqrt{h^2+k^2+l^2}}. $$

The fractions have absolute value $\le 1$, and the differences in the parentheses go to $0$ because the partials are continuous. Now we combine matching steps in the two staircases. The first of them contributes

$$ \frac{f(x{+}h, y{+}k, z{+}l) - f(x, y{+}k, z{+}l) - \dfrac{\partial f(x, y{+}k, z{+}l)}{\partial x}h}{\sqrt{h^2+k^2+l^2}} $$

to the fraction in (TD). Here we notice that $f(x{+}h, y{+}k, z{+}l) - f(x, y{+}k, z{+}l) = \dfrac{\partial f(\xi, y{+}k, z{+}l)}{\partial x}\,h$, by the mean value theorem for the single-variable function $f(\cdot\,, y{+}k, z{+}l)$, where $\xi$ is some number between $x$ and $x+h$. Again we have exhibited a contribution that goes to $0$ as $\vec h \to \vec 0$, by the continuity of the partials.¹ The same reasoning applies for the other two steps of the staircases. So if we put the quantity Num into formula (TD), we obtain a sum of terms each of which goes to $0$ as $(h,k,l)\to(0,0,0)$. And this proves total differentiability of $f$ at $(x,y,z)$.

### Directional Derivative, and Geometric Interpretation of $D\vec f(x)$ as "Vector Eater"

We have seen that total differentiability implies the existence of partial derivatives. To see this, we merely had to choose for the vector $\vec h$ vectors $t(1,0,\dots,0)^{\mathsf T}$, $t(0,1,0,\dots)^{\mathsf T}$, etc.: vectors pointing in coordinate directions. Let us instead use vectors $\vec h = t\vec v$, with $\vec v$ a fixed vector pointing in any direction, coordinate or not. We then get a single-variable function $t \mapsto \vec f(x + t\vec v)$, which is obtained by restricting the multi-variable function $\vec f$ to inputs on
the line $\{x + t\vec v : t\in\mathbb{R}\}$. If the derivative of this single-variable function at $t=0$ exists, we call this quantity the *directional derivative* of $\vec f$ at $x$ in direction $\vec v$, and denote it as $\partial_{\vec v}\vec f(x)$. In formulas:

$$ \partial_{\vec v}\vec f(x) := \frac{d}{dt}\Big|_{t=0} \vec f(x + t\vec v). $$

Some authors use the word "directional derivative" only if $\vec v$ has length $1$, because the quantity in question depends both on the direction and on the length of $\vec v$; only by normalizing (fixing the length a priori) do we get a quantity that depends only on the direction. In this class, however, I will not restrict the length of $\vec v$, and accept the drawback that the word "directional derivative" could then be slightly misleading.

¹ If you have a really excellent Hons Calc 2 vision, you'll see that we actually use that the partials are *uniformly* continuous, which is implied by continuity on a bounded and closed domain. If you don't see this subtlety, ignore it in peace for now, and try again seeing it after the course Math 341.

It is a healthy and, hopefully, simple exercise for you to prove the following. **Theorem.** *If $\vec f$ is totally differentiable at $x$, then it has a directional derivative in each direction $\vec v$, and this derivative equals $D\vec f(x)\,\vec v$.* As with partial derivatives, even the existence of all directional derivatives at a point does not guarantee total differentiability, as is seen in Homework 13.

I used the symbol $\vec v$ for the direction vector, and refrained from enforcing length $1$ on it. The idea I have in mind is that $\vec v$ may be a *velocity*. Think of a function $f: \vec x \mapsto f(\vec x)$ as a temperature function depending on a location $\vec x \in \mathbb{R}^3$. Now if I start out at $\vec x$, thermometer in hand, and move with velocity $\vec v$, I'll be at location $\vec x + t\vec v$ at time $t$. My thermometer records the temperature at each time $t$; that is, it records the temperature at the location where I am at time $t$. The rate of change of this temperature with respect to time is what we called the directional derivative. Of course, if I move faster, I'll experience faster temperature changes; this accounts for the dependence on the length of $\vec v$ that is being hidden by the name "directional derivative".
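The theorem above can be checked numerically (my own sketch, with a smooth example function of my choosing): the derivative of the restricted single-variable function agrees with the row $Df(x)$ applied to $\vec v$.

```python
# For a totally differentiable function, the directional derivative
# (d/dt) f(x + t v) |_{t=0} equals Df(x) v.
def f(x, y):
    return x**2 * y + 3.0 * y            # a smooth example function

def Df(x, y):                            # its row of partials, by hand
    return (2.0 * x * y, x**2 + 3.0)

x, y = 1.0, 2.0
vx, vy = 0.5, -1.0                       # any direction, not normalized
t = 1e-6                                 # centered difference in t

numeric = (f(x + t*vx, y + t*vy) - f(x - t*vx, y - t*vy)) / (2*t)
row = Df(x, y)
exact = row[0]*vx + row[1]*vy            # the row applied to the vector v
print(numeric, exact)                    # both equal -2 (up to rounding)
```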
But more significantly, the rate of change of the temperature will in general depend on the *direction* in which I am moving. This issue is absent in single-variable calculus, because there is only one direction on the real line: negative direction is nothing but $(-1)$ times positive direction, so it contributes no independent information about rates of change.²

The notion of directional derivative is useful to understand why the total derivative has to be such a bulky object like a matrix: it needs to have many pieces of information incorporated in it. In single-variable calculus, every change $dx$ in the input $x$ is a multiple of one standard change. To tell how the output changes, all that is needed is one number $f'(x)$ that gives the amplification of the input change $dx$ into an output change $dy = f'(x)\,dx$ (of course, in linear approximation only). In multivariable calculus, if there is a notion of derivative that is to tell you the rate of change of the output $\vec f(x)$ as you change the input $x$, this thing "derivative" must ask back: "In which direction do you change the input?" So it asks for a vector $\vec v$, and in response it gives you a rate of change.

Seen from this vantage point, it is clear that the derivative $D\vec f(x)$ is not a vector, even though it has as many entries as a vector. Rather, it is a *vector eater*: you must feed it a vector, and it produces for you a rate of change, which is a number or a vector depending on whether $\vec f$ is scalar-valued or vector-valued. This distinction is reflected in the row vs. column distinction: columns represent vectors, rows represent vector eaters. (They are called *forms* in more advanced mathematical contexts, but let's keep the more descriptive word "vector eater", just for fun, for the purposes of this class.) In some MVC textbooks you will see this distinction omitted for simplicity. Such simplification is perfectly good for crunching calculational problems, but it comes at the expense of disconnecting the geometric intuition from the calculational formalism.
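The row vs. column distinction is visible even in array shapes (a small sketch of mine; the numbers reuse the hypothetical example from above): the row "eats" a column and returns a number, while a column alone cannot do that.

```python
import numpy as np

# A derivative row (vector eater) applied to a column vector (the vector
# being eaten) yields a 1 x 1 result: a rate of change.
Df = np.array([[4.0, 4.0]])        # 1 x 2 row: the vector eater
v = np.array([[0.5], [-1.0]])      # 2 x 1 column: the vector
rate = Df @ v                      # matrix product, shape (1, 1)
print(rate.shape, rate[0, 0])      # (1, 1) -2.0
```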
### An outlook far ahead

There are two "upgrades" of MVC that you may encounter in more advanced courses.

You may study "infinitely many variables", called *functional analysis*. In that context, the distinction between vector eaters and vectors becomes much more substantial, and cannot be covered up by simply converting a row into a column.

² In linear algebra language, if you know it: I'd say that there is only one linearly independent direction on the real line.

You may study multi-variable problems in which the variables are coordinates describing a curved surface, like longitude and latitude on a sphere. We can do this already now. But in a more advanced setting, called *differential geometry*, you may want to have the language reflect geometric issues precisely; in particular, you may want to be able to write objects that should be independent of coordinates in a way that doesn't make formulas look as if things did depend on our choice of a coordinate system. This idea is foundational for modern developments of physics; in particular, general relativity theory relies on the formalism developed in differential geometry. Here, the conversion between vector eaters and vectors is explicitly dependent on core geometric information (physically measurable information) and cannot be made without reference to such geometric information; reference that is taken for granted, without any discussion, in the case of $\mathbb{R}^3$.

In both of these situations, you'd create a lot of confusion if you trashed the distinction between vectors and vector eaters. The conviction underlying these course notes is that only a formalism that is ready to accommodate these generalizations naturally at a later time will be a formalism that is genuinely intuitive for the purposes here and now. As a matter of fact, the big book on *Gravitation* by Misner, Thorne and Wheeler contributed a good deal to my own understanding of how intuition and formalism connect in MVC.

### The Gradient

In this section we consider only
scalar-valued functions $f$, and we assume that $f$ is totally differentiable. Much of what we will do can be done already if only the partial derivatives exist, and some books define the gradient only in terms of partial derivatives, regardless of total differentiability or not. But such generality will serve no purpose for us at the moment; rather, it would make the language clumsy when explaining some geometry highlights.

You will often see the partial derivatives being considered as components of a vector, called the *gradient* and written as $\nabla f$. The symbol $\nabla$ is called "nabla". With the vertical convention for vectors, $\nabla f(\vec x) = \bigl(Df(\vec x)\bigr)^{\mathsf T}$: the gradient is the transpose of the functional matrix of a scalar-valued function. Doing this is NOT in defiance of the geometric distinction I have stressed so far. Rather, the gradient has a geometric meaning of its own, which we will explore. The geometric distinction stressed before only amounts to insisting that the gradient is not the *same* as the derivative, but rather is the *transpose* of the derivative.

We will explore the geometric meaning of the gradient here. By implication, if there is geometric significance in distinguishing $Df(\vec x)$ from $\nabla f(\vec x)$, there must be some hidden geometric meaning in the innocent-looking formal operation of transposing a row into a column. It is not so easy to tickle this geometric content out at the level of an MVC course. The difficulty is of the same nature as the difficulty of explaining water to a fish: the fish will understand better when he gets out of the water. But I'll try it anyway, just for the heck of it, and for reference if you want to have another look later.

For reference, let me quote the familiar here:

$$ \nabla f(\vec x) = \bigl(Df(\vec x)\bigr)^{\mathsf T} = \begin{pmatrix} \dfrac{\partial f(\vec x)}{\partial x_1}\\ \vdots\\ \dfrac{\partial f(\vec x)}{\partial x_\ell} \end{pmatrix}, \qquad Df(\vec x) = \left(\dfrac{\partial f(\vec x)}{\partial x_1}\ \cdots\ \dfrac{\partial f(\vec x)}{\partial x_\ell}\right). $$

The directional derivative is $\partial_{\vec v} f(\vec x) = Df(\vec x)\,\vec v = \nabla f(\vec x)\cdot\vec v$. The product in $Df(\vec x)\,\vec v$ is a matrix product; the product in $\nabla f(\vec x)\cdot\vec v$ is the dot product of vectors.

For the moment, we now fix the length of $\vec v$ to be $1$, since we will now be interested in effects of the direction of $\vec v$ only. We ask the question: in which
You may be inclined to use calculus to answer this question, since it is a maximum problem after all. But algebra does it much more easily. We note from the Cauchy-Schwarz inequality that

    ∇f(x) · v ≤ ‖∇f(x)‖ ‖v‖ = ‖∇f(x)‖ .

If v actually has the same direction as ∇f(x), then the dot product is equal to ‖∇f(x)‖: by the geometric definition of the dot product, or by direct calculation with v = ∇f(x)/‖∇f(x)‖. For all other directions, the directional derivative is strictly less than ‖∇f(x)‖. Geometrically, this is because then the cos φ in the definition of the dot product is strictly < 1. Algebraically speaking, we can see the same thing from a second look into the proof of Cauchy-Schwarz: if we do this, we see that equality holds only if the two vectors are parallel, i.e., only if ∇f(x) = s v for some scalar s.

So here is what we conclude: The direction of ∇f(x) is the direction in which we have to go from x in order to experience the greatest rate of change of f. The rate of change we experience in this direction is the length (norm) of ∇f(x). If we move at a right angle to ∇f(x), then the rate of change experienced is 0, because in the dot product the cosine of the angle is 0.

The following discussion is a tad informal and will become more rigorous after we have covered the multi-variable version of the chain rule. Assume we move not along a line x + t v, but along a level set, on which f is constant by definition of level set. The derivative with respect to time t as we are moving is therefore 0. At any moment, the velocity vector will be tangent to the level set, because we are moving within the level set. If the fact that we are not actually exploring f along a straight line but along a bent path doesn't cause trouble (and the chain rule will tell us it doesn't), we should still observe the directional derivative in a direction v which is tangential to the level set. Since this directional derivative is 0, we would have to be moving orthogonal to the gradient, unless the gradient vanishes, in which case it does not specify a direction at all. This means that the gradient will always be orthogonal to the level sets of a function.
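The steepest-ascent conclusion can be illustrated numerically: among unit vectors v = (cos α, sin α), the dot product ∇f · v is largest when v points along the gradient, and the largest value is ‖∇f‖. The gradient vector below is a made-up example.

```python
# Sample the directional derivative grad . v over many unit directions v and
# check that the maximum is (numerically) the norm of the gradient.
import math

grad = (3.0, 4.0)                   # a sample gradient vector
norm = math.hypot(*grad)            # ||grad|| = 5.0

best = max(
    math.cos(a)*grad[0] + math.sin(a)*grad[1]   # grad . v for v = (cos a, sin a)
    for a in [k*2*math.pi/3600 for k in range(3600)]
)
```

`best` comes out just below `norm`, as the Cauchy-Schwarz bound predicts.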
The following facts can be proved rigorously with more advanced methods, but can and should be appreciated at this stage. We consider a continuously differentiable function of two or three variables. (It could be more variables also, but I want to refer to your geometric intuition.) Continuously differentiable means: (a) differentiable, (b) the partial derivatives are continuous functions. In this case the matrix-valued function x ↦ Df(x) is continuous automatically. Then the following facts hold for level sets of f:

For two variables f(x, y): At any point x₀ where ∇f(x₀) is NOT the zero vector, the level set that passes through x₀ looks like a smooth curve (the graph of a continuously differentiable function y = g(x), or x = h(y)) in some ball around that point x₀. Look in particular at the level sets that were the solution of Hwk 11.

For three variables f(x, y, z): At any point x₀ where ∇f(x₀) is NOT the zero vector, the level set that passes through x₀ looks like a smooth surface (the graph of a continuously differentiable function z = g(x, y), or y = h(x, z), or x = k(y, z)) in some ball around that point x₀.

At points where ∇f(x₀) = 0, the level sets may look weird or 'untypical'. The following list of 'building blocks' for level sets in two variables is not exhaustive, but features the most common examples: at points where ∇f(x₀) = 0, the level set may consist of a single isolated point, or it could feature two (or sometimes more) smooth curves that are crossing each other. The level set might also look like a smooth piece of curve, giving no indication of the vanishing gradient.

We call any point where the gradient of f vanishes a critical point of f. The relevance of this notion is the following: If f has a local minimum or a local maximum at an interior point x₀ of the domain of f, then the gradient of f vanishes there. (Can you see why? This can be seen using the single-variable slice functions only.) Conversely, the vanishing of ∇f(x₀) is no guarantee that f has a minimum or a maximum at x₀. As in single variables, where the
vanishing of the derivative doesn't guarantee a minimum or a maximum either. A new alternative to minimum and maximum that occurs with several variables is the possibility of saddle points. A saddle point is one that looks like a single-variable maximum in some directions and like a single-variable minimum in some other directions. (The origin is a saddle point in Hwk 11.) A level line that goes through a saddle point will typically have a crossing there. We will require second derivatives to distinguish minima, maxima and saddle points, and this will be studied later.

Rules for differentiation, in particular the chain rule

The following simple differentiation rules carry over from single-variable calculus and are easy to prove:

o The sum of differentiable functions is differentiable. If h = f + g, then Dh(x) = Df(x) + Dg(x). Similarly for differences.

o The product of scalar-valued differentiable functions is differentiable. If h = fg, then Dh(x) = f(x) Dg(x) + Df(x) g(x). The products on the right hand side are of course scalar times matrix.

o The ratio of scalar-valued differentiable functions is differentiable where the denominator doesn't vanish. If h = f/g, then Dh(x) = ( Df(x) g(x) − f(x) Dg(x) ) / g(x)².

o The single-variable product rule carries over to the dot product of vector-valued functions: if h(t) = f(t) · g(t), then h'(t) = f'(t) · g(t) + f(t) · g'(t).

The one rule that requires discussion and training is the chain rule. Actually, it also carries over without modification from single-variable calculus, if you rely on the total derivative and matrix multiplication consistently. However, most of the time you will use it in a form that involves partial derivatives, and then it 'looks' different from the single-variable version. Our approach here will state the chain rule in matrix form first, then explore what it means, to get some understanding for its inner workings (which includes an informal proof), and finally provide a formal proof.

Let's first review the chain rule from single variables. Remember to distinguish the name f of a function from its output value f(x). So here is a graphical
representation of the functions f: x ↦ f(x) and g: y ↦ g(y), where you should remember that the names x or y chosen for the variables are arbitrary (albeit common in the context in which we are using the functions here):

    x → [f] → f(x)        y → [g] → g(y)

We can concatenate these two functions by feeding the output of the first function f as input into the second function g. The concatenated function bears the name g ∘ f (NOT f ∘ g), because its value for input x is g(f(x)). Compositions are to be read 'from right to left', because in our notation the input value x stands to the right of the function symbol:

    x → [f] → f(x) → [g] → g(f(x))

So g ∘ f is the name for the whole assembly, and (g ∘ f)(x) = g(f(x)).

Now if you change the input x to the function f by a small amount dx, the output f(x) will be changed by an amount that is approximately f'(x) dx. This linear approximation is only useful if dx is sufficiently small. The derivative f'(x) gives the amplification factor for small input errors dx, using linear approximation. Rather than viewing f'(x) as a number that gives an amplification factor, we may view the derivative at x as a linear function itself that does the amplification. The natural name for this linear function would be 'f'(x) times', because that's what it does: it multiplies an input dx by f'(x).

    function f:  x ↦ f(x)
    deviations, in linear approximation near x:  dx ↦ f'(x) dx

Introducing the multiplication point after f'(x), and viewing 'f'(x)·' as a linear error-amplification function, is the key to understanding the chain rule intuitively. This is true for single variables already, but it becomes particularly useful for multi-variable. So let's understand the chain rule in these terms:

    x → [f] → y = f(x) → [g] → g(f(x))
    dx → [f'(x)·] → dy = f'(x) dx → [g'(y)·] → g'(y) dy = g'(f(x)) f'(x) dx

This is NOT a proof of the chain rule, because we use as input into the second error-amplification function not the actual deviation Δy, but instead the approximate deviation dy; an actual proof would need to give an account of how this error influences the outcome. The answer would be: in linear approximation, the
effect cannot be seen; it only shows up if we study better-than-linear approximations, like 2nd order Taylor approximation. But apart from this proof detail, this picture makes us understand why (g ∘ f)'(x) = g'(f(x)) f'(x).

Now the punchline is that the very same argument carries over, almost literally, to the multi-variable setting. This is a benefit of working with the total derivative as the primary object and viewing the partial derivatives as parts of the total derivative, rather than viewing the partial derivatives as the primary pieces of information that need to be somehow organized into a matrix or vector or whatever. The only changes that we need to make are: f and g may be vector-valued; x and y may be vectors now; and instead of f' we have chosen to call the derivative Df. The input-error amplification is not merely achieved by multiplying with a number, but rather by multiplying with a matrix. This distinction is very natural, because deviations in different input variables may have different effects on the output, and matrix multiplication can achieve this effect, whereas multiplication by mere numbers cannot. So let's redo the previous picture in the new notation:

    x → [f] → y = f(x) → [g] → g(f(x))
    dx → [Df(x)·] → dy = Df(x) dx → [Dg(y)·] → Dg(f(x)) Df(x) dx

So D(g ∘ f)(x) = Dg(f(x)) Df(x).

When I put vector symbols over everything, I do not mean to say that all these quantities must be vectors; the scalar case is included as a special case with 1-component vectors. The different vectors may have differently many components, if only the chain fits together. For instance, x may have 3 components and f(x) may have 2 components. Then the input variable y for g must also have two components (else the chain doesn't fit together), but then the output g(y) may have any number of components. The sizes of the matrices Df(x) and Dg(y) are accordingly, and the size restriction on matrix multiplication is automatically satisfied.

Now, with the theory all neat and slick, all we need to understand is what this matrix form of the chain rule means in practice, for the crummy partial derivatives with
which we do all the practical calculations. Let's do this in an example. We take a 3-variable function g (scalar-valued) and assume its arguments x, y, z are themselves dependent on parameters s and t. Let's say x = f₁(s,t), y = f₂(s,t) and z = f₃(s,t). If we insert these into g, we get

    g(x, y, z) = g(f₁(s,t), f₂(s,t), f₃(s,t)) =: h(s,t).

So now h = g ∘ f, where f is a 2-variable function whose values are 3-vectors (but I will omit the arrow on top of the f), and they fit into the 3-variable function g, which in turn has numbers as values. Now we want to calculate ∂h/∂s and ∂h/∂t in terms of the partials of the fᵢ and g. The chain rule says

    Dh(s,t) = Dg(f(s,t)) Df(s,t),

which, written out in detail, means

    [ ∂h/∂s (s,t)   ∂h/∂t (s,t) ]
      = [ ∂g/∂x (f(s,t))   ∂g/∂y (f(s,t))   ∂g/∂z (f(s,t)) ] ·
        [ ∂f₁/∂s (s,t)   ∂f₁/∂t (s,t)
          ∂f₂/∂s (s,t)   ∂f₂/∂t (s,t)
          ∂f₃/∂s (s,t)   ∂f₃/∂t (s,t) ]

This can be written out as two equations:

    ∂h/∂s (s,t) = ∂g/∂x (f(s,t)) ∂f₁/∂s (s,t) + ∂g/∂y (f(s,t)) ∂f₂/∂s (s,t) + ∂g/∂z (f(s,t)) ∂f₃/∂s (s,t)

and a similar equation for the partial with respect to t. Remember that the f(s,t) inside g actually stands for three variables f₁(s,t), f₂(s,t), f₃(s,t). With the identification x = f₁, y = f₂, z = f₃ that is usually done with the physicist's convention about functions, and a common name like u for the output variable of both g and h = g ∘ f, this is often abbreviated as

    ∂u/∂s = ∂u/∂x ∂x/∂s + ∂u/∂y ∂y/∂s + ∂u/∂z ∂z/∂s .

This is how you will find the chain rule in many books and many contexts. I have deliberately started with an involved and detailed notation and then moved to this succinct and easy-to-remember version. The reason is that this easy notation is ambiguous, and it is only the context that resolves the ambiguity. If you come to love the easy notation before having worked through the complicated one, you will find the issue of ambiguity in the curly-∂ notation rather difficult to stomach, and in situations where a hidden ambiguity does cause errors, it will then be very difficult to clear up the confusion. For the moment, let me make one simple comment about this issue. When we write ∂u/∂x, our notation expresses which quantity varies (namely x), but it does not tell us which variables remain fixed (namely y and z).
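The 3-variable chain rule written out above can be spot-checked with a finite-difference computation; the functions g and fᵢ below are hypothetical choices of mine, not the ones in the notes.

```python
# Compare dh/ds computed from the chain rule sum with a centered difference
# quotient of h(s, t) = g(f1(s,t), f2(s,t), f3(s,t)).
import math

def f1(s, t): return s*t
def f2(s, t): return s + t*t
def f3(s, t): return math.sin(s)

def g(x, y, z): return x*y + z*z

def h(s, t): return g(f1(s, t), f2(s, t), f3(s, t))

s, t, eps = 0.3, 0.8, 1e-6
x, y, z = f1(s, t), f2(s, t), f3(s, t)
dg = (y, x, 2*z)                    # (dg/dx, dg/dy, dg/dz) at (x, y, z)
df_ds = (t, 1.0, math.cos(s))       # (df1/ds, df2/ds, df3/ds)
chain = sum(a*b for a, b in zip(dg, df_ds))
numeric = (h(s + eps, t) - h(s - eps, t)) / (2*eps)
```

`chain` and `numeric` agree up to difference-quotient accuracy.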
If the 'duh' answer 'all other variables other than x remain fixed' really is clear enough to tell you that the other variables are y and z, then the context has resolved the ambiguity of the notation; and this happens in many cases, but not in all. In thermodynamics, you can study the pressure of a gas as a function of volume and temperature, or you can study it as a function of volume and energy content. And then, if you take a partial with respect to volume, it is no longer clear whether the temperature or the energy content is to remain fixed. And this might make a difference.

Here is one obvious thing that can be seen from the above chain rule: curly-∂ terms cannot just be canceled, as you would do with the dx's and dy's in single-variable calculus. And this is a very good reason why we use curly ∂'s for partial derivatives: as a reminder that formal cancellation yields WRONG results, not just sometimes, but nearly every time.

Applications of the chain rule

(1) The statement that the gradient is orthogonal to level lines, which we had discussed heuristically above, follows rigorously from the chain rule. Suppose t ↦ f(t) describes a curve within a level set of a function g; then g(f(t)) = c for all t. The derivative of this constant single-variable function is therefore 0. By the chain rule,

    0 = (d/dt) g(f(t)) = Dg(f(t)) f'(t) = ∇g(f(t)) · f'(t).

Now f'(t) is tangent to the curve described by f; if we interpret t as a time, f'(t) is actually the velocity vector. For a 2-variable function g, the level set is typically a curve, and so f must describe part of this curve. For 3-or-more-variable functions, the level set is a surface (or higher dimensional), and the curve described by t ↦ f(t) lies in this surface. But since this argument can be made for any curve within the level surface, we still conclude that ∇g(x) is orthogonal to the tangent of any curve in that surface, and therefore is orthogonal to the entire tangent plane. But this is exactly what we mean when we say a vector is orthogonal to a surface: it is orthogonal to the tangent plane to this
surface.

(2) By taking the composition of the 2-variable function 'product' (u, v) ↦ uv with the 2-vector-valued function x ↦ (f(x), g(x)), one can obtain the single-variable product rule as a consequence of the multi-variable chain rule. (See homework.) Similarly, a power rule for f(x)^g(x) can be obtained.

(3) This example relies on the theorem

    (d/dx) ∫ₐᵇ g(x,t) dt = ∫ₐᵇ ∂g/∂x (x,t) dt.

If we remember that the integral is a limit of Riemann sums, and that we can differentiate sums term by term, this theorem becomes plausible, but is by no means proved. The issue is that this theorem pretends that the derivative of a limit of Riemann sums is the limit of the derivatives of Riemann sums. In reality, this theorem is only true under certain hypotheses. I deliberately do not want to specify these hypotheses here; they are more appropriately dealt with in advanced courses. Easy versions give the result under restrictive hypotheses, which in particular exclude improper integrals ∫₀^∞, but many interesting applications want the result under weaker hypotheses that allow for ∫₀^∞. More useful variants of the theorem rely on a more sophisticated notion of integral. In the present context (and only here) we are focusing on the mechanics of calculation, with a pragmatic applied-science perspective, but all the while being aware that hypotheses are needed and are assumed to hold in our calculation. In most contexts in which you will want to do these calculations, the hypotheses will be satisfied.

So suppose we have f(x) = ∫ₐˣ g(x,t) dt. To find f'(x), we consider F(x,y) = ∫ₐʸ g(x,t) dt. Then ∂F(x,y)/∂x can be handled by differentiation under the integral sign (when the hypotheses for the validity of this procedure are verified). On the other hand, ∂F(x,y)/∂y = g(x,y) by the fundamental theorem of calculus. The chain rule says f'(x) = (d/dx) F(x,x) = (∂F/∂x)(x,x) · (dx/dx) + (∂F/∂y)(x,x) · (dy/dx); the latter derivative is 1 because y = x. Conclusion:

    (d/dx) ∫ₐˣ g(x,t) dt = g(x,x) + ∫ₐˣ ∂g/∂x (x,t) dt.

This kind of example is among the most frequent usages of the MV chain rule in practical calculations with explicit formulas.
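The conclusion of example (3) can be checked on a concrete integrand; g(x,t) = x·t² below is a hypothetical choice for which everything has a closed form: f(x) = ∫₀ˣ x t² dt = x⁴/3, so f'(x) should be 4x³/3 = g(x,x) + ∫₀ˣ t² dt = x³ + x³/3.

```python
# Check f'(x) = g(x,x) + integral of dg/dx, for g(x,t) = x * t**2 and a = 0.
def f(x):
    return x**4 / 3                 # closed form of ∫_0^x x t^2 dt

x = 1.3
g_xx = x * x**2                     # g(x, x) = x^3
integral_of_dg = x**3 / 3           # ∫_0^x (∂g/∂x)(x,t) dt = ∫_0^x t^2 dt
formula = g_xx + integral_of_dg     # should equal f'(x) = 4 x^3 / 3

h = 1e-6
numeric = (f(x + h) - f(x - h)) / (2*h)   # difference quotient for f'(x)
```

`formula` and `numeric` agree, as the theorem predicts for this well-behaved integrand.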
Cleaning up Notation a Bit: Slots vs. variable names

Near the end of the section explaining the chain rule, I referred to different cultures of notation. Mathematicians (in particular pure mathematicians) prefer to give names to functions, and these names differ from the names for the variables that represent the value of a function: f(x,y) is the value of the function f for input variables x, y. As such, it is an expression dependent on x and y, and we can write its partial derivative with respect to x as ∂f(x,y)/∂x. Physicists may give a name to the output variable, say z, with z = f(x,y), and they would write ∂z/∂x instead.

However, so far we have no notation for the partial derivative of a function itself, regardless of the names or values of the input variables. Suppose f(x,y) = x² + 2xy³. How do I write the partial with respect to x at the point (x,y) = (2,−3)? Should I write ∂f(2,−3)/∂x? I don't like this, because there is no x left in the numerator with respect to which I could differentiate. Should I write (∂f(x,y)/∂x)(2,−3)? Better, because now at least the order of operations is clear: first I take a derivative, then I plug in (2,−3). But still, f is the name of a function, and the generic names for its variables are arbitrary: I could have given the very same function by f(u,v) = u² + 2uv³, and then you would have written the same thing as (∂f(u,v)/∂u)(2,−3). The best, I think, that I can do with the previous notation is to write ∂f(x,y)/∂x restricted to (x,y) = (2,−3), and this is clumsy. You will often see it written as ∂f/∂x (x,y), but this latter is a mixed notation: while it clearly conveys that we take a partial derivative of the function f (which we subsequently evaluate at (x,y)), the function f itself does not stipulate that its input variables be given specific names. What we really mean with ∂f/∂x is that we take a partial with respect to the first variable. And it is only because it is customary to call the first variable by the name of x that the notation identifies this fact. There is a 'pure' notation to indicate this: we write ∂₁f for the derivative of f with respect to its first argument. This is analogous to the notation Df for the total
derivative, and to the Newton notation ḟ for the single-variable derivative: each refers to a function, with no regard to what its arguments may be called.

To illustrate this issue, let me give you an example where both notations are needed, and where confusion would arise if we didn't have a clean notation. Some functions have the property that their arguments can be swapped with impunity: for instance, f: (x,y) ↦ x + y and g: (x,y) ↦ xy are such functions. Let's call them symmetric for the moment. More precisely, a 2-variable function f is called symmetric iff f(x,y) = f(y,x) for all x, y. For instance, h(x,y) = xy² + yx² is symmetric, but p(x,y) = x − y is not symmetric. Now we want to show the following claim: If f is a symmetric function, then the function g defined by g(x,y) = ∂f(x,y)/∂x + ∂f(x,y)/∂y is symmetric. You see, since the very hypothesis reads f(x,y) = f(y,x), you'd be doomed if you tried to identify slots by variable names. Here is a clean proof:

    g(x,y) = ∂₁f(x,y) + ∂₂f(x,y), so g = ∂₁f + ∂₂f. We want to show that g(x,y) = g(y,x). Now

    g(y,x) = ∂₁f(y,x) + ∂₂f(y,x) =(*) ∂₂f(x,y) + ∂₁f(x,y) = g(x,y).

It is at the sign marked with (*) that we used the hypothesis that f is symmetric (differentiating the identity f(x,y) = f(y,x) with respect to x gives ∂₁f(x,y) = ∂₂f(y,x), and with respect to y gives ∂₂f(x,y) = ∂₁f(y,x)).

There is one more notation you will encounter. Since the notation with fractions of curly ∂'s is sometimes bulky, a subscript notation is often used instead: ∂ₓ in place of ∂/∂x. So I could have rewritten the above proof with ∂ₓ and ∂ᵧ standing in for the slot derivatives, the default names x and y identifying the first and second slots. Similarly, in the physicist-style variable notation, uₓ stands for ∂u/∂x. When u = f(x,y), you will also see the mixed notation fₓ, in analogy to ∂f/∂x. My best advice is that in your own usage you should avoid mixed notation altogether (i.e., never identify slots by default variable names), but be tolerant of the frequent occurrences when others use such notation. I may be uptight on the notation issue, but students do suffer in courses on partial differential equations when they have fuzzy ideas about multi-variable calculus.

Proof of the chain rule

In this proof, x, h, k, g(x), f(g(x)) etc. are all vectors, even though I don't adorn them with arrows. In a
preliminary consideration, we prove that for a matrix T and a vector h we have the estimate ‖Th‖ ≤ C‖h‖, where the constant C depends on the entries of the matrix T; for instance, we can take C = (Σᵢⱼ Tᵢⱼ²)^(1/2). This is a consequence of the Cauchy-Schwarz inequality: the first entry of the vector Th is T₁₁h₁ + T₁₂h₂ + ... + T₁ₙhₙ, which can be written as a dot product of the vector (T₁₁, T₁₂, ..., T₁ₙ) with h. Therefore its absolute value is less than the product of the norms, i.e., |(Th)₁|² ≤ (Σⱼ T₁ⱼ²) ‖h‖². Similarly for the other components of Th. Adding these up, we get

    ‖Th‖² ≤ (Σᵢⱼ Tᵢⱼ²) ‖h‖².   (*)

Next we want to show that Df(g(x)) Dg(x) is the total derivative of f ∘ g. In other words, we have to show that

    lim_{h→0} ‖f(g(x+h)) − f(g(x)) − Df(g(x)) Dg(x) h‖ / ‖h‖ = 0.

Rewriting this using the ε-δ definition of the limit, we have to show:

    For every ε > 0 there exists δ > 0 such that ‖h‖ < δ implies
    ‖f(g(x+h)) − f(g(x)) − Df(g(x)) Dg(x) h‖ ≤ ε ‖h‖.   (G, 'G' for goal)

Similarly, we rewrite the hypotheses that (H1) f is differentiable at g(x), and (H2) g is differentiable at x:

    For every ε₁ > 0 there exists δ₁ > 0 such that ‖k‖ < δ₁ implies
    ‖f(g(x)+k) − f(g(x)) − Df(g(x)) k‖ ≤ ε₁ ‖k‖;   (H1)
    in particular for k = g(x+h) − g(x).

    For every ε₂ > 0 there exists δ₂ > 0 such that ‖h‖ < δ₂ implies
    ‖g(x+h) − g(x) − Dg(x) h‖ ≤ ε₂ ‖h‖.   (H2)

We now calculate:

    ‖f(g(x+h)) − f(g(x)) − Df(g(x)) Dg(x) h‖
      ≤ ‖f(g(x+h)) − f(g(x)) − Df(g(x)) (g(x+h) − g(x))‖ + ‖Df(g(x)) (g(x+h) − g(x) − Dg(x) h)‖
      ≤ ‖f(g(x+h)) − f(g(x)) − Df(g(x)) (g(x+h) − g(x))‖ + M_f ‖g(x+h) − g(x) − Dg(x) h‖,

where M_f is the constant that comes from the matrix T = Df(g(x)) in the estimate (*). We aim to show that each of the two terms in the sum on the right is ≤ (ε/2)‖h‖, provided ‖h‖ is sufficiently small. For this purpose, we choose ε₂ = ε/(2M_f) in (H2) and require ‖h‖ < δ₂. This takes care of the second term. For the first term, we want to argue that k = g(x+h) − g(x) becomes small if h is small, and then we use (H1). More specifically, since

    ‖g(x+h) − g(x)‖ ≤ ‖g(x+h) − g(x) − Dg(x) h‖ + ‖Dg(x) h‖,

we use first (H2) with ε₂ = 1, requiring ‖h‖ < δ₂' for the corresponding δ₂', and we use that ‖Dg(x) h‖ ≤ M_g ‖h‖. This guarantees ‖g(x+h) − g(x)‖ ≤ (M_g + 1) ‖h‖. So far we have achieved

    ‖f(g(x+h)) − f(g(x)) − Df(g(x)) Dg(x) h‖
      ≤ ‖f(g(x+h)) − f(g(x)) − Df(g(x)) (g(x+h) − g(x))‖ + (ε/2) ‖h‖   (G0)

and ‖g(x+h) − g(x)‖ ≤
(M_g + 1)‖h‖, provided ‖h‖ < min(δ₂, δ₂'). Now we use hypothesis (H1) with ε₁ = ε/(2(M_g + 1)), and we get a corresponding quantity δ₁ such that

    ‖f(g(x+h)) − f(g(x)) − Df(g(x)) (g(x+h) − g(x))‖ ≤ (ε/(2(M_g + 1))) ‖g(x+h) − g(x)‖,

provided ‖g(x+h) − g(x)‖ < δ₁. Now we strengthen our requirement on h to

    ‖h‖ < min( δ₂, δ₂', δ₁/(M_g + 1) ).

This guarantees that ‖g(x+h) − g(x)‖ ≤ (M_g + 1)‖h‖ < δ₁, and therefore we do get

    ‖f(g(x+h)) − f(g(x)) − Df(g(x)) (g(x+h) − g(x))‖
      ≤ (ε/(2(M_g + 1))) ‖g(x+h) − g(x)‖ ≤ (ε/(2(M_g + 1))) (M_g + 1) ‖h‖ = (ε/2) ‖h‖.

Merging this with the previous estimate (G0), we obtain (G).

Continuous differentiability and higher derivatives

It is possible to say: 'A function f is twice differentiable if it is differentiable and its total derivative Df (which is a matrix-valued function x ↦ Df(x)) is differentiable again.' But for the purposes of a first course in multi-variable calculus, this approach tends to lead to a somewhat bulky formalism. Fortunately, there is an easier way out, and it relies on the fact (already visible in single-variable calculus) that the class of differentiable functions with continuous derivative is much more useful than the class of merely differentiable functions. Whereas we have seen that total differentiability as such cannot be described in terms of partial derivatives alone, continuous differentiability (i.e., total differentiability with continuous derivative) can very well be described in terms of partial derivatives alone. This is due to the fact that continuity of all partial derivatives implies total differentiability. So we define: A function f defined on an open set of Rⁿ (scalar-valued or vector-valued) is continuously differentiable (also called 'once continuously differentiable' and abbreviated as C¹) iff all its partial derivatives exist and are continuous functions. This is equivalent to saying that f is totally differentiable and the matrix-valued function Df is continuous. With this in mind, we can now go on to say: A function f is twice continuously differentiable (C²) if all its partial derivatives are once continuously differentiable; similarly, we define k times continuously differentiable functions (Cᵏ functions). A
fundamental theorem says: If f is C², then the order of partial derivatives doesn't matter; more precisely, for an n-variable function f that is C², and any i, j ∈ {1, 2, ..., n}, it holds that

    ∂/∂xᵢ ( ∂f(x₁,...,xₙ)/∂xⱼ ) = ∂/∂xⱼ ( ∂f(x₁,...,xₙ)/∂xᵢ ).

Similarly, for Cᵏ functions, partial derivatives of order up to k may be carried out in any order. We'll skip the proof (even though it's not difficult); refer to a textbook if needed. Just one simple example to illustrate the theorem: take f(x,y) = x²y eˣ. Then ∂f/∂x = 2xy eˣ + x²y eˣ, and ∂²f/∂y∂x = 2x eˣ + x² eˣ. Calculating in the opposite order, we get ∂f/∂y = x² eˣ, and ∂²f/∂x∂y = 2x eˣ + x² eˣ: the same result.

It must be pointed out that the C² hypothesis is crucial. Here is a counterexample when the C² hypothesis fails. Take

    f(x,y) = xy(x² − y²)/(x² + y²) if (x,y) ≠ (0,0),   f(0,0) = 0.

It is easy to check that f(x,0) = 0 and f(0,y) = 0. With this observation, and the quotient rule applied in points outside the origin, we get

    ∂₁f(x,y) = y(x⁴ + 4x²y² − y⁴)/(x² + y²)² if (x,y) ≠ (0,0),   ∂₁f(0,0) = 0,

and an analogous formula for ∂₂f. A quick conversion into polar coordinates shows that these partials are still continuous in the origin: they are r times some trig expression in the angle φ. So f is a C¹ function. It turns out that none of the 2nd partial derivatives has a limit as (x,y) → (0,0), so f is not C². From ∂₁f(0,y) = −y we obtain ∂₂∂₁f(0,0) = −1. From ∂₂f(x,0) = x we obtain ∂₁∂₂f(0,0) = +1. So in this case the order of partials does matter: ∂₂∂₁f(0,0) ≠ ∂₁∂₂f(0,0). Outside the origin, where all 2nd order partial derivatives are continuous, we have

    ∂₂∂₁f(x,y) = (x⁶ + 9x⁴y² − 9x²y⁴ − y⁶)/(x² + y²)³ = ∂₁∂₂f(x,y).

Such counterexamples play a surprisingly insignificant role outside calculus textbooks. The reason is twofold: either applications (like partial differential equations) work with continuous differentiability, and then there is no counterexample; or else one is in a situation in which continuous differentiability is not a useful hypothesis. In such a situation, total differentiability is often not a useful hypothesis either. One then rather deals with a yet more general notion of differentiability that looks at the function as a whole and is not concerned with the discrepancy in a single point (like (0,0) in our counterexample).
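The counterexample above lends itself to a numeric check: approximating the mixed partials at the origin by nested difference quotients (an inner step for the first derivative, a larger outer step for the second) reproduces the two different values −1 and +1.

```python
# f(x,y) = x y (x^2 - y^2) / (x^2 + y^2), f(0,0) = 0: the mixed partials
# at the origin disagree, since f is C^1 but not C^2 there.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x*y*(x*x - y*y) / (x*x + y*y)

def d1(x, y, h=1e-7):   # ∂f/∂x by a centered difference (inner step)
    return (f(x + h, y) - f(x - h, y)) / (2*h)

def d2(x, y, h=1e-7):   # ∂f/∂y by a centered difference (inner step)
    return (f(x, y + h) - f(x, y - h)) / (2*h)

H = 1e-3                                     # outer step
d2d1 = (d1(0.0, H) - d1(0.0, -H)) / (2*H)    # ∂_2 ∂_1 f(0,0), should be -1
d1d2 = (d2(H, 0.0) - d2(-H, 0.0)) / (2*H)    # ∂_1 ∂_2 f(0,0), should be +1
```

The two nested quotients land near −1 and +1 respectively, in line with the closed-form computation.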
Details on this matter must be reserved to graduate-level classes in partial differential equations.

The Hessian

Suppose we have a scalar-valued function f. Then ∇f is a vector-valued function. Its derivative D(∇f) is a matrix-valued function. We call it Hf, the Hessian matrix. So specifically, for a C² function f, Hf(x) is the n × n matrix whose (i,j) entry is

    (Hf(x))ᵢⱼ = ∂²f/∂xᵢ∂xⱼ (x).

Combining the derivative and the gradient in this way allows us to use the two matrix indices for the two partial derivatives. Note that our hypothesis that f is C² guarantees that Hf(x) is a symmetric matrix, i.e., it is equal to its own transpose.

The Hessian will play a similar role in multi-variable minimax problems as the second derivative does in single-variable minimax problems. To this end, we note the following simple formula about a directional second derivative: If f is a scalar-valued C² function, then

    d²/dt² f(x + tv) = vᵀ Hf(x + tv) v.

Proof:

    d/dt f(x + tv) = Df(x + tv) v = (∇f(x + tv))ᵀ v = vᵀ ∇f(x + tv),
    d²/dt² f(x + tv) = vᵀ D(∇f)(x + tv) v = vᵀ Hf(x + tv) v.

In the second line we have used that the derivative may be moved past the multiplication with a constant matrix (think why!).

Minimax problems

Suppose f is a scalar-valued multi-variable function. Analogously to single-variable calculus, we say f has a local maximum (synonym: relative maximum) at x₀ if f(x₀) ≥ f(y) for all y in a certain ball B_r(x₀) about x₀ that are also in the domain of f. Likewise, we say f has a local minimum (synonym: relative minimum) at x₀ if f(x₀) ≤ f(y) for all y in a certain ball B_r(x₀) about x₀ that are also in the domain of f.

Reminder from single-variable calculus: If f: x ↦ f(x) is a differentiable single-variable function and has a local maximum or a local minimum at x₀, and x₀ is in the interior of the domain (this domain used to be an interval), then f'(x₀) = 0. If, moreover, the function is twice differentiable and has a local maximum (resp. minimum) at x₀ in the interior of the domain, then f''(x₀) ≤ 0 (resp. f''(x₀) ≥ 0).
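Returning to the directional-second-derivative formula d²/dt² f(x + tv) = vᵀ Hf(x + tv) v, here is a small numeric sketch for a hypothetical quadratic f (chosen so that its Hessian is a constant matrix).

```python
# For f(x,y) = x^2 + 3xy + 5y^2, the Hessian is the constant matrix
# [[2, 3], [3, 10]]. Compare v^T H v with a second difference quotient
# of t -> f((x,y) + t v).
def f(x, y):
    return x*x + 3*x*y + 5*y*y

H = [[2.0, 3.0], [3.0, 10.0]]
x, y = 0.4, -0.2
v = (1.0, 2.0)

vHv = sum(v[i]*H[i][j]*v[j] for i in range(2) for j in range(2))

h = 1e-4
second_diff = (f(x + h*v[0], y + h*v[1]) - 2*f(x, y)
               + f(x - h*v[0], y - h*v[1])) / h**2
```

For this quadratic, both quantities equal 54 up to roundoff.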
Conversely, if f is C² and x₀ is in the interior of the domain of f, and f'(x₀) = 0 and f''(x₀) < 0 (resp. f''(x₀) > 0), then f has a local maximum (resp. local minimum) at x₀.

Now the good news is that this result carries over to multi-variable calculus if we study directional derivatives in all directions v going out from x₀: If f: x ↦ f(x) is a differentiable multi-variable function with scalar values and has a local maximum or local minimum at x₀, a point in the interior of the domain of f, then ∂f/∂v (x₀) = 0 for every direction vector v; equivalently, ∇f(x₀) = 0. If, moreover, the function is twice continuously differentiable and has a local maximum (resp. minimum) at x₀ in the interior of the domain, then vᵀ Hf(x₀) v ≤ 0 (resp. vᵀ Hf(x₀) v ≥ 0) for all direction vectors v. Conversely, if f is C² and x₀ is in the interior of the domain of f, and ∇f(x₀) = 0 and vᵀ Hf(x₀) v < 0 (resp. vᵀ Hf(x₀) v > 0) for all direction vectors v ≠ 0, then f has a local maximum (resp. local minimum) at x₀.

The first part is nearly obvious: for if f has a local maximum at x₀, then in particular each restriction of f onto a straight line through x₀ has a local maximum there as well. This restriction is a function t ↦ f(x₀ + tv), and it has a local maximum at t = 0. Then the single-variable result about local maxima, together with the formula (d/dt)|_{t=0} f(x₀ + tv) = ∂f/∂v (x₀) and the similar formula for the 2nd derivative involving the Hessian, produces the statement about the necessary conditions for a local maximum.

Not quite so obvious is the statement about the sufficient conditions: if x₀ satisfies the single-variable conditions for a local maximum in every direction, then f does indeed have a local maximum at x₀. The proof of this statement would use that we assumed the function to be twice continuously differentiable, and some version of the mean value theorem (somewhat similar to how we concluded total differentiability from continuous partial derivatives). We won't bother with a formal proof here. Rather, we will study a bit how we use this minimum/maximum test in practice. This endeavour has a number of
interesting and nontrivial quirks of its own.

Consider the function given by f(x,y) = x⁴ + 2x²y² + y⁴ + 2y² − 2x². Does it have local minima and/or maxima? There is no boundary to consider, since the domain is R²; so all points are in the interior. If f has a local minimum or maximum at any point (x,y), then the derivative (or gradient) must vanish there, i.e., both partial derivatives must vanish:

    ∂f/∂x (x,y) = 4x³ + 4xy² − 4x = 4x(x² + y² − 1) = 0,
    ∂f/∂y (x,y) = 4x²y + 4y³ + 4y = 4y(x² + y² + 1) = 0.

The second equation is equivalent to y = 0, since x² + y² + 1 never vanishes. The first equation then says x ∈ {−1, 0, 1}. By definition, those points (x,y) where the derivative of f vanishes are called critical points of f. They are the only candidates where f could have an interior minimum or maximum. We investigate the three critical points (−1,0), (0,0) and (1,0). Let's calculate the Hessian:

    Hf(x,y) = [ 12x² + 4y² − 4   8xy ; 8xy   4x² + 12y² + 4 ].

So

    Hf(±1, 0) = [ 8  0 ; 0  8 ],   Hf(0,0) = [ −4  0 ; 0  4 ],
    vᵀ Hf(±1,0) v = 8v₁² + 8v₂²,   vᵀ Hf(0,0) v = −4v₁² + 4v₂².

Now indeed 8v₁² + 8v₂² > 0 for all vectors v = (v₁, v₂)ᵀ ≠ 0, and therefore f has local minima at (±1, 0). However, Q(v) := −4v₁² + 4v₂² does not have a specific sign for all vectors v. Since this expression fails to be ≥ 0 for all v (e.g., v = (1,0)ᵀ is a counterexample), f cannot have a local minimum at (0,0). But Q also fails to be ≤ 0 for all v (now v = (0,1)ᵀ is a counterexample). So f cannot have a local maximum there either.

Let's study a more complicated example: f(x,y) = x⁴ + x³ + 2xy² + y⁴. Again we look for relative maxima and minima. We have the conditions for the critical points:

    ∂f/∂x (x,y) = 4x³ + 3x² + 2y² = 0,   ∂f/∂y (x,y) = 4xy + 4y³ = 0.

The second equation is equivalent to y = 0 or y² = −x.

Case 1: y = 0. Then the first equation becomes x = 0 or x = −3/4.
Case 2: y² = −x. Then the first equation becomes 4x³ + 3x² − 2x = 0, i.e., x = 0 or x = (−3 ± √41)/8. The latter choice with the + sign, however, can be discarded, because it is positive and so doesn't correspond to any real y.

So we have four critical points: P₀ = (0,0), P₁ = (−3/4, 0), and P₂± = (x₂, ±√(−x₂)) with x₂ = (−3 − √41)/8. We need the Hessian at each of these points:

    Hf(x,y) = [ 12x² + 6x   4y ; 4y   4x + 12y² ].

So Hf(P₀) is the zero matrix, and Hf(P₁) = [ 9/4  0 ; 0  −3 ]. At P₀ the quadratic form vᵀ Hf(P₀) v vanishes identically; the 2nd derivative test is inconclusive.
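The first example above, f(x,y) = x⁴ + 2x²y² + y⁴ + 2y² − 2x², can be double-checked numerically; this is only a sketch of what the claims say at the three critical points.

```python
# Check that the gradient vanishes at (-1,0), (0,0), (1,0), and that the
# quadratic form of the Hessian at (0,0) takes both signs (saddle behavior).
def grad(x, y):
    return (4*x**3 + 4*x*y*y - 4*x, 4*x*x*y + 4*y**3 + 4*y)

crit = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
max_grad = max(abs(c) for p in crit for c in grad(*p))

q = lambda v1, v2: -4.0*v1*v1 + 4.0*v2*v2    # v^T Hf(0,0) v
sign_down, sign_up = q(1.0, 0.0), q(0.0, 1.0)
```

`max_grad` is 0, and the quadratic form at the origin is negative along (1,0) and positive along (0,1), confirming that (0,0) is neither a minimum nor a maximum.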
At P₁, the quadratic form vᵀ Hf(P₁) v is (9/4)v₁² − 3v₂². This is positive for some v (e.g., v = (1,0)ᵀ) and negative for other v. So P₁ can be neither a maximum nor a minimum. A point where the quadratic form is positive for some direction vectors v but negative for others is a saddle point: in some directions the critical point looks like a minimum, in other directions it looks like a maximum.

Finally, at P₂± we claim that the quadratic form

    vᵀ Hf(P₂±) v = (12x₂² + 6x₂) v₁² ± 8√(−x₂) v₁v₂ − 8x₂ v₂²

will be positive for all non-zero vectors v. This means that P₂± are local minima. Now let's look at how I would see this positivity. First consider a general rule-of-thumb approach: the pure squares v₁² and v₂² have positive coefficients (the diagonal entries of the Hessian), so they tend to make the expression positive. The mixed terms could contribute negative terms if the signs of v₁ and v₂ are chosen inconveniently. If their coefficient is small, they may not be able to out-compete the pure squares; but if their coefficient is large, then they will win over the pure squares. How big is too big? Let's complete the squares. Writing the form as a v₁² + 2b v₁v₂ + c v₂² with a = 12x₂² + 6x₂ > 0, b = ±4√(−x₂), c = −8x₂ > 0, we have

    a v₁² + 2b v₁v₂ + c v₂² = a (v₁ + (b/a) v₂)² + ((ac − b²)/a) v₂²,

and a direct computation with x₂ = (−3 − √41)/8 shows ac − b² > 0. So clearly, as a sum of squares, this is non-negative, and it will be strictly positive unless v₂ = 0 and the parenthesis vanishes too; and this latter happens only if v₁ = 0 as well. So, having shown that vᵀ Hf(P₂±) v > 0 for all nonzero direction vectors v, we have identified P₂± as local minima.

Symmetric matrices, quadratic forms and definiteness properties

We study here the algebraic task of distinguishing local minima and maxima and saddle points by means of the Hessian, as encountered in the previous examples. Let H be a symmetric n × n matrix. Remember that symmetric means that H = Hᵀ. With such a matrix there is associated a quadratic expression in n variables v₁, ..., vₙ, namely the expression vᵀHv. Such an expression is called a quadratic form. (No, I don't know of a good motivation for this choice of name.) For instance, writing (u, v, w) instead of (v₁, v₂, v₃), we have
11 h12 h22 h23 U hlluz h22 U2 hfng 271121 2h131lw 271231711 h13 h23 has w You can see that the diagonal entries of H give the coef cients of pure quadratic terms whereas the off diagonal entries give coef cients of mixed terms Given H our task is to nd out whether the quadratic form is positive for all vectors 17 or negative for all vectors 17 or positive for some and negative for other vectrs 17 And yes there are borderline cases eg where the form doesn t take on negative values but can take on a 0 value The following de nitions are common A symmetric matrix H is called POSITIVE DEFINITE if 17TH17 gt 0 for all 17 31 It is called POSITIVE SEMIDEFINITE if 17TH17 0 for all 17 A symmetric matrix H is called NEGATIVE DEFINITE if 17TH17lt 0 for all 17 31 It is called NEGATIVE SEMIDEFINITE if 17TH17 0 for all 17 Clearly H is negative semi de nite if and only if 7H is positive semi de nite A symmetric matrix H is called INDEFINITE if it is neither positve nor negative semide nite in other words if 17TH17 gt 0 for some 17 and 17TH17 lt 0 for some other 17 By considering in particular coordinate direction unit vectors 17 that have a single 1 and otherwise only 0 s as components we make the following easy observations positive de nite gt 0 t d t I I I gt 0 p081 weI seml 8 Im e then all Its d1agonal entrles must be negat1ve de nIte negative semide nite 0 If a matrix is None of the converses hold The off diagonal entries of H contribute mixed terms in the quadratic form and they could change the sign of the quadratic form against what the diag onal entries would vote for lntuitively the diagonal entries get the say about de niteness properties only if the off diagonal entries are not too large We ll quantify this in a moment Didactic decision I will give you two easyto use tests for any size matrix without proof and a proved and explained version for the case of 2 X 2 matrices A thorough study of de niteness properties would require the full Linear Algebra course as a 
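To get a feel for these definitions, here is a small Python sketch (my own illustration, with made-up example matrices) that evaluates the quadratic form v^T H v and shows how large off-diagonal entries can defeat positive diagonal entries:

```python
# Evaluate v^T H v for a symmetric matrix H given as a list of rows.

def quad_form(H, v):
    n = len(v)
    return sum(H[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

# Both matrices have positive diagonal entries, but only H1 is positive
# definite; H2's large off-diagonal entries "win over" the pure squares.
H1 = [[2, 1], [1, 2]]   # v^T H1 v = 2 v1^2 + 2 v1 v2 + 2 v2^2
H2 = [[1, 3], [3, 1]]   # v^T H2 v = v1^2 + 6 v1 v2 + v2^2

assert quad_form(H1, (1, -1)) > 0   # 2 - 2 + 2 = 2
assert quad_form(H2, (1, -1)) < 0   # 1 - 6 + 1 = -4
assert quad_form(H2, (1, 1)) > 0    # 1 + 6 + 1 = 8, so H2 is indefinite
```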
Didactic decision: I will give you two easy-to-use tests for matrices of any size, without proof, and a proved and explained version for the case of 2 × 2 matrices. A thorough study of definiteness properties would require the full Linear Algebra course as a prerequisite, and then some work on top of it. My outline will be such that it is workable now, but becomes more coherent when you revisit it with full Linear Algebra wisdom. It is written in sufficient generality that you will not attempt your own generalization from simpler special cases (which would certainly lead to wrong guesses!). But you may not have the calculational tools to exploit the full generality right now, and then such calculations will not be required from you in this class.

First, let's study a 2 × 2 symmetric matrix [a, b; b, c] and see exactly when it is positive definite. We take the quadratic form au^2 + 2buv + cv^2. We know that a (and also c) must be positive if the matrix is to be positive definite. Assuming a positive, we use completion of squares to control the mixed term:

au^2 + 2buv + cv^2 = a(u + (b/a)v)^2 - (b^2/a)v^2 + cv^2 = a(u + (b/a)v)^2 + ((ac - b^2)/a)v^2.

So if a and ac - b^2 are both positive, then the quadratic form is positive definite: namely, it's clearly ≥ 0, but for it to vanish we need both v = 0 and u + bv/a = 0, i.e., we need u = v = 0. Conversely, if ac - b^2 is not positive, then the quadratic form is not positive for (u, v) = (-b/a, 1).

Conclusion: [a, b; b, c] is positive definite if and only if a > 0 and ac - b^2 > 0. Equivalently, we could show this matrix is positive definite if and only if c > 0 and ac - b^2 > 0.

The quantity ac - b^2 is called the determinant of the 2 × 2 matrix [a, b; b, c]. In linear algebra, a certain number is assigned to every square matrix, and this number is called the determinant of that matrix. There is a neat geometric interpretation of the determinant (as a signed area, a signed volume, or a higher-dimensional signed volume), and there is a variety of effective ways of calculating the determinant of any square matrix of not too large size. But we will skip all these, and I'll just give you some basic facts.

(1) The determinant of a 1 × 1 matrix [a] is simply a. The only reason I am telling you this is that 1 × 1 matrices retrieve the 2nd
derivative test for single-variable minima as a special case of the 2nd derivative (Hessian) test for multi-variable minima.

(2) The determinant of a 2 × 2 matrix is the product of its diagonal entries minus the product of its off-diagonal entries:

det [a11, a12; a21, a22] = a11 a22 - a12 a21.

(3) The determinant of a 3 × 3 matrix is a sum/difference of six products, namely

det [a11, a12, a13; a21, a22, a23; a31, a32, a33]
   = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 - a12 a21 a33 - a13 a22 a31 - a11 a23 a32.

The way to remember this mess is to copy the first two columns to the right of the matrix, then add the three NW-SE (diagonal) products and subtract the three NE-SW (antidiagonal) products.

(4) In this course I will not teach you how to calculate determinants of an n × n matrix for n ≥ 4. I only warn you that you should not attempt to guess-generalize the preceding formulas to larger matrices; it would almost certainly be a wrong guess. A linear algebra course will provide correct ways of getting larger determinants. I do want to tell you that det(-H) = ± det(H) for an n × n matrix H; here the + applies if n is even, and the - applies if n is odd. You may write this more concisely as det(-H) = (-1)^n det(H).

Here is the correct generalization of the above test for positive definiteness of a 2 × 2 matrix:

Theorem (Hurwitz). Given a symmetric matrix A of size n × n, take the following sequence of determinants: start with the top left corner; then in each step add the next row and column and calculate the determinant, until you have calculated the determinant of the full matrix A. Now A is positive definite if and only if all of these determinants are positive.

Example: the 4 × 4 symmetric matrix

A = [a11, a12, a13, a14; a21, a22, a23, a24; a31, a32, a33, a34; a41, a42, a43, a44]

is positive definite if and only if

a11 > 0, and det [a11, a12; a21, a22] > 0, and det [a11, a12, a13; a21, a22, a23; a31, a32, a33] > 0, and det A > 0

(and yes, remember, I haven't told you how to calculate the last determinant, and will not ask you to).
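The minor sequence in Hurwitz's theorem is mechanical enough to code. The sketch below is my own illustration (the helper names `det` and `is_positive_definite` are mine, and cofactor expansion is only practical for small matrices):

```python
# Hurwitz test: all leading principal minors of A must be positive.

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_positive_definite(A):
    """True iff every leading principal minor of A is positive."""
    n = len(A)
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

A = [[2, 1], [1, 2]]   # minors: 2 and 3      -> positive definite
B = [[1, 2], [2, 1]]   # minors: 1 and -3     -> not positive definite
assert is_positive_definite(A)
assert not is_positive_definite(B)
```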
In practice, you do not have to use the test in this order. You could start with any diagonal element, e.g. a33, and then successively add one row & column at a time, in any order (but always the same row and column number together). For example, taking rows and columns in the order 3, 1, 4, 2, we get that the above matrix is positive definite if and only if

a33 > 0, and det [a11, a13; a31, a33] > 0, and det [a11, a13, a14; a31, a33, a34; a41, a43, a44] > 0, and det A > 0.

Warning: Do not modify this test in any other way. In particular, do not swap the > into < and believe this tests for negative definiteness; it doesn't. Rather, to test A for negative definiteness, you test -A for positive definiteness. You may use det(-A) = (-1)^n det(A) to see what sign implications this has for the determinants obtained from A. Also, do not replace > with ≥, hoping to test for positive semidefiniteness; this would still lead to a false theorem.

There is another test that is much easier to use, but that may be inconclusive (whereas the Hurwitz test is never inconclusive).

Theorem (Gershgorin). Given a symmetric matrix A, calculate the sum of absolute values of the off-diagonal entries for each row, and compare it with the diagonal entry in this row. If each diagonal entry is positive and is larger than the sum of the absolute values of the off-diagonal entries in its row, then A is positive definite.

Note: the converse does not hold. If the condition is violated, the matrix may or may not be positive definite.

Example:

[9, -1, 3, -2; -1, 8, 5, 1; 3, 5, 17, 4; -2, 1, 4, 10]

is positive definite because 9 > |-1| + |3| + |-2|, and 8 > |-1| + |5| + |1|, and 17 > |3| + |5| + |4|, and 10 > |-2| + |1| + |4|.
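The row-dominance condition is even easier to automate. A minimal sketch, using the 4 × 4 example matrix from the notes (the function name `dominance_test` is my own):

```python
# Sufficient test from the theorem above: each diagonal entry must be
# positive and exceed the sum of |off-diagonal| entries in its row.

def dominance_test(A):
    """True if the sufficient condition holds; False is inconclusive."""
    n = len(A)
    return all(
        A[i][i] > 0 and A[i][i] > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )

A = [[9, -1, 3, -2],
     [-1, 8, 5, 1],
     [3, 5, 17, 4],
     [-2, 1, 4, 10]]
assert dominance_test(A)   # the example matrix from the notes passes

# Inconclusive case: [[2, 3], [3, 2]] fails the test, so the test alone
# says nothing (this particular matrix happens to be indefinite).
assert not dominance_test([[2, 3], [3, 2]])
```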
This note is for students who have studied and digested all of linear algebra, and it is provided for backwards reference at a later time; it is not part of the required material of the present course. A symmetric matrix is positive definite if and only if all its eigenvalues are positive. Every symmetric matrix A can be written in the form A = QDQ^T, where Q is an orthogonal matrix (i.e., QQ^T = I) and D is a diagonal matrix whose diagonal entries are just the eigenvalues of A (and at the same time the eigenvalues of D); A is positive definite if and only if D is positive definite. These facts are key ingredients for a proof of the Hurwitz test; they are also behind a proof of Gershgorin's test.

Global (absolute) extrema

The information that enters into the derivative tests for local extrema is of a local nature only: to calculate the gradient and the Hessian of a function at one point needs knowledge of that function in a neighborhood of that point only. It does not use info about the function far away from this point. From this it is clear that the question of global maxima/minima (aka absolute maxima/minima) cannot be decided by means of derivative tests. Note that we say a function f has a global (absolute) minimum at x0 if f(x0) ≤ f(x) for all x in the domain of f; unlike for a local (relative) minimum, competitors x are allowed to come from anywhere in the domain, not only from a neighborhood of x0. A similar definition applies for global maxima.

Lower-division calculus provides very few tools for finding global minima or maxima. However, there is one tool that is readily available, and it carries over almost verbatim from single to multi-variable:

Theorem. If f is a continuous function defined on a bounded and closed subset of R^n, then f does have a global minimum and a global maximum somewhere on this set.

This theorem may look like a disappointment, because it merely asserts the existence of a global min and a global max without giving any clue how to find them. Nevertheless, this abstract knowledge is in itself very valuable and can serve as the basis for finding them by means of other tools. Note that all three hypotheses are needed: the function must be continuous (a requirement that is usually seen to be met obviously); the domain must be bounded (a requirement that sometimes causes us a headache); and it must be closed, i.e., it must include its boundary. This means we often have to split up our search into two parts:

(1) An absolute minimum may be in the interior of the domain. As it would trivially be among
the local minima, we could retrieve it by the critical-point test (gradient = 0). Or else,

(2) an absolute minimum may be on the boundary. If we can describe the boundary points in terms of one variable less than the interior points (as we usually can), we may set up another minimum problem for the boundary alone.

Merging the two prongs of our search, we could finally argue that among the (usually finitely many) candidates that our search has netted, either in the interior or on the boundary, the one with the smallest value of f must be the absolute minimum. We may even omit the test with the Hessian if we are after the absolute minimum only. But it is only by the a priori knowledge that an absolute minimum exists that this procedure works. Short of such a priori knowledge, you may well have filtered out many points at which, for various reasons, f could not possibly have an absolute minimum, leaving over, say, half a dozen points which could not be ruled out (points where the gradient does vanish, a few boundary points, and a few points where f isn't differentiable, so the gradient test doesn't apply). If you'd now declare the one with the smallest value of f the absolute minimum, you'd make a logical mistake. It would be like charging the only person without an alibi with having murdered a certain deceased person, but not establishing beforehand that the deceased person actually died as a result of foul play. (In one whodunnit, the person had actually died of natural causes, which meant the defendant was to be acquitted.) The homework gives an example where there is only one candidate for an absolute minimum (and a very promising candidate for that matter), but still there is no absolute minimum, and the sole promising candidate is only a relative minimum.

Example: We want to design a rectangular box of side lengths x, y, z, subject to the constraint (as might be imposed by the post office) that the sum x + y + z is at most 3 feet. Within this constraint, we want to have a box of maximum volume xyz. In attempting to
design this box, we believe (and will prove) that we cannot max out the volume except by maxing out the length constraint x + y + z ≤ 3, so that we have x + y + z = 3. Indeed, for any box of dimensions x, y, z with x + y + z < 3, we can take a larger box with dimensions (1+ε)x, (1+ε)y, (1+ε)z, of sum still ≤ 3, but with larger volume.

We could, for instance, solve for z and maximize xy(3 - x - y) under the constraints x ≥ 0, y ≥ 0, x + y ≤ 3. This domain is a triangle in the xy-plane. We have a continuous function on this triangle, a closed and bounded set; therefore a maximum exists. On the boundary, xy(3 - x - y) is zero, and this is certainly not the absolute maximum. So the absolute maximum is in the interior, and there it must obey the "vanishing gradient" test:

∂/∂x (xy(3 - x - y)) = 3y - 2xy - y^2 = 0,   ∂/∂y (xy(3 - x - y)) = 3x - x^2 - 2xy = 0.

Since we have already disqualified any cases with x = 0 or y = 0 from being the location of the absolute maximum, we can simplify this to the system 3 = 2x + y, 3 = x + 2y, with the solution x = y = 1.
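The elimination approach above can be double-checked numerically. Here is a small Python sketch of my own (the grid resolution is an arbitrary choice) verifying that x = y = 1 solves the critical-point system and that no grid point of the triangle beats it:

```python
# Post-office box example: maximize V(x, y) = x*y*(3 - x - y)
# on the triangle x >= 0, y >= 0, x + y <= 3.

def V(x, y):
    return x * y * (3 - x - y)

# The critical-point system 3 = 2x + y, 3 = x + 2y has solution x = y = 1.
assert 2*1 + 1 == 3 and 1 + 2*1 == 3
assert V(1, 1) == 1   # volume of the optimal 1 x 1 x 1 box

# Coarse grid search over the triangle: no sampled value exceeds V(1, 1).
N = 60
best = max(V(3*i/N, 3*j/N) for i in range(N + 1) for j in range(N + 1 - i))
assert best <= V(1, 1) + 1e-9
```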
Constrained minima and maxima

The previous example had one aesthetic blemish: the roles of x, y, z were entirely symmetric, but we arbitrarily selected z for elimination from the list of independent variables by means of the constraint x + y + z = 3. This is only an aesthetic issue, but in many more complicated examples, experience shows that messy calculations are best avoided by retaining any symmetry among variables that may exist in the problem. Moreover, in some examples, using a constraint to eliminate a variable can make things really complicated calculationally, or may even be impossible practically.

Consider the following type of problem: among all (x, y, z) satisfying a constraint g(x,y,z) = 0, we look for one that maximizes or minimizes the expression f(x,y,z). We say f(x,y,z) has a constrained local (relative) minimum at (x0, y0, z0) under the constraint g(x,y,z) = 0 if (x0, y0, z0) satisfies this constraint, g(x0, y0, z0) = 0, and f(x0, y0, z0) ≤ f(x, y, z) for all (x, y, z) in some neighbourhood of (x0, y0, z0) that also satisfy the constraint. We assume that f and g are C^1 functions, and we assume that the gradient of g does not vanish on the level surface g = 0. This technical assumption in particular guarantees that the level surface g = 0 is smooth, as we will discuss later.

At a constrained local minimum of f we cannot expect the gradient ∇f to vanish; the directional derivative in directions across the level set g = 0 may very well be non-zero. However, if we go in any direction v along the level set g = 0, i.e., tangentially to it, we will expect the directional derivative ∂_v f to vanish at the minimum. More precisely: let t ↦ γ(t) describe a curve within the level set g = 0 that passes through (x0, y0, z0) at t = 0; in formulas, γ(0) = (x0, y0, z0)^T and g(γ(t)) ≡ 0 for all t. Then, since the composite function t ↦ f(γ(t)) has a local minimum at t = 0, we must have (d/dt)|_{t=0} f(γ(t)) = 0. Evaluating this by the chain rule, we have Df(γ(0)) γ'(0) = 0. Now we can do this reasoning for any curve γ passing through (x0, y0, z0) within the level surface, and this way we can represent any direction vector v that is tangential to the level surface by such a curve: v = γ'(0).³

Let's therefore sum up: ∂_v f(x0, y0, z0) = v · ∇f(x0, y0, z0) = 0 for any vector v that is tangential to the level surface g(x,y,z) = 0, or equivalently, for every vector v that is orthogonal to ∇g(x0, y0, z0). Yet rewording this, we can say: ∇f(x0, y0, z0) is orthogonal to every vector v that is orthogonal to ∇g(x0, y0, z0). A moment's reflection may convince you that this simply means that ∇f(x0, y0, z0) must be parallel to ∇g(x0, y0, z0), i.e., there must be a number λ such that ∇f(x0, y0, z0) = λ ∇g(x0, y0, z0). We state this in full generality as a

Theorem. Let g: R^n → R be a C^1 function whose gradient does not vanish on the level set S0 = {x : g(x) = 0}. Let f be a C^1 function of n variables in a neighbourhood of this level set S0. If f has a constrained relative minimum (or a constrained relative maximum) at x0 ∈ S0, then there exists a real number λ (called a Lagrange multiplier) such that ∇f(x0) = λ ∇g(x0).
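As an illustration of the theorem, here is a sketch of my own, reusing the post-office box example in its symmetric form, with f = xyz and g = x + y + z - 3 (the helper names are mine):

```python
# Lagrange condition grad f = lambda * grad g for f(x,y,z) = xyz
# under the constraint g(x,y,z) = x + y + z - 3 = 0.

def grad_f(x, y, z):
    return (y*z, x*z, x*y)

def grad_g(x, y, z):
    return (1, 1, 1)   # gradient of x + y + z - 3

# The system yz = lambda, xz = lambda, xy = lambda, x + y + z = 3
# forces x = y = z = 1 (for positive side lengths), with lambda = 1.
p = (1, 1, 1)
lam = 1

assert sum(p) - 3 == 0                                   # constraint holds
assert grad_f(*p) == tuple(lam * c for c in grad_g(*p))  # grad f = lam*grad g
```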
Note (for those who know the notion of linear independence from linear algebra; it won't be required from you if you don't): If we have several constraint functions g1, ..., gk, all C^1 and such that at each point of the joint level set S0 = {x : g1(x) = ... = gk(x) = 0} the vectors ∇g_l(x), l = 1, ..., k, form a linearly independent set, and if a C^1 function f has a constrained local minimum or maximum at x0, then there exist Lagrange multipliers λ1, ..., λk ∈ R such that

∇f(x0) = λ1 ∇g1(x0) + ... + λk ∇gk(x0).

In short: if you have several constraints, you get a Lagrange multiplier for each constraint. The only subtlety to be observed is a hypothesis which is intended to guarantee that these several constraints are independent, namely the linear-independence hypothesis.

Behind a rigorous proof of this method is the lemma (guaranteed by the theorem of implicit functions, outlined in the next section) that in some neighbourhood of the presumed relative constrained minimum x0, one can indeed, in principle, eliminate one variable for each constraint. So while the proof is akin to actually doing an elimination in principle, practical elimination in formulas is not needed (like what we had done with the post-office box example): the setup with Lagrange multipliers is designed to hide any elimination (which was done only in the proof that the method works) and to obtain a system of equations in which no elimination has been carried out explicitly. The homework gives some examples of how the method works in practice.

We are omitting any discussion of Hessians in connection with Lagrange multipliers; some of this is discussed in the textbook by Marsden and Tromba. In many cases, the constraints restrict the domain to a bounded and closed set (hence guaranteeing the existence of absolute maxima and minima a priori), or to a domain that is at least closed, with a function to minimize that has a certain growth property, so that all x outside a bounded set can be a priori disqualified from consideration for an absolute minimum. In such cases, the distinction into relative minima, relative maxima and saddle points is of lesser importance.

³ We have neither proved here that the level surface g = 0 does have a tangent plane at (x0, y0, z0), nor have we proved that an arbitrary tangential direction vector can be represented as the velocity vector of an appropriately chosen curve within the level surface. Rather, we believe this on intuitive grounds, having merely motivation in mind. Once the result we are aiming at is stated, based on our motivation, we would still have to provide a rigorous proof, and it would be based on an advanced theorem called the implicit function theorem, which we will briefly discuss afterwards. However, mathematically rigorous proofs in this matter are best left to a more advanced course.
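To close, here is a sketch of the several-constraints version on a made-up example of my own: maximizing f(x,y,z) = z subject to the two constraints x² + y² + z² = 1 and x + y = 0. The maximizer (0, 0, 1) and the multipliers are computed by hand and only verified here.

```python
# Two-constraint Lagrange check: grad f = lam1*grad g1 + lam2*grad g2
# at the maximizer p = (0, 0, 1) of f = z on the circle
# g1 = x^2 + y^2 + z^2 - 1 = 0,  g2 = x + y = 0.

p = (0.0, 0.0, 1.0)
grad_f  = (0.0, 0.0, 1.0)
grad_g1 = (2*p[0], 2*p[1], 2*p[2])   # gradient of x^2 + y^2 + z^2 - 1 at p
grad_g2 = (1.0, 1.0, 0.0)            # gradient of x + y at p

lam1, lam2 = 0.5, 0.0                # Lagrange multipliers, found by hand
combo = tuple(lam1*a + lam2*b for a, b in zip(grad_g1, grad_g2))
assert combo == grad_f               # the multiplier equation holds

# Both constraints hold at p; grad_g1 and grad_g2 are linearly independent
# there (one has only a z-component, the other only x- and y-components).
assert p[0]**2 + p[1]**2 + p[2]**2 - 1 == 0 and p[0] + p[1] == 0
```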
