Special Topics in Atmospheric and Oceanic Sciences
Special Topics in Atmospheric and Oceanic Sciences ATOC 7500
Popular in Course
Popular in Marine Science
This 47 page Class Notes was uploaded by Jon Johns on Thursday October 29, 2015. The Class Notes belongs to ATOC 7500 at University of Colorado at Boulder taught by Peter Pilewskie in Fall. Since its upload, it has received 14 views. For similar materials see /class/232047/atoc-7500-university-of-colorado-at-boulder in Marine Science at University of Colorado at Boulder.
Reviews for Special Topics in Atmospheric and Oceanic Sciences
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 10/29/15
SUPPLEMENTARY LECTURE NOTES FOR ATOC 7500 MESOSCALE ATMOSPHERIC MODELING SPRING 2008 Author Tomi Vukicevic CHAPTER 1 DRAFT Inverse Modeling and Data Assimilation in Geosciences A Practial Guide Author Dr Tomislava Vukicevic tomz39slavavukiceviccoloradoedu Af liation Associate Professor Department of Atmospheric and Oceanic Sciences ATOC University of Colorado Boulder Colorado USA Date of this document December 2006 Introduction 1 Why inverse modeling and data assimilation in the Earth System sciences This question could be answered in two ways depending on whether one is starting form the point of view of modeling or the need to produce quasi continuous environmental data on a spatiotemporal grid Both views are presented in the following 11 Modeling in the Earth System Sciences Physical theories in the Earth System sciences are designed to explain and possibly predict natural phenomena The explanation by a theory is also a form of prediction as it states certain consequences for certain causes Both the explanation and prediction typically include quantitative representation of the natural system state Quantitative assessment of the actual true state is fundamentally achievable only by measurements The Geophysical theories are consequently designed to explain and predict the measurements The theories are most often expressed in a form mathematical relationships which define a model The model in general represents governing physical laws and includes a set of quantities which entirely define the state by the model The defining quantities are called control parameters The control parameters are initial and boundary conditions external forces and other physical quantities which define medium or environment for a process that is modeled The quantification of the system state by application of an assumed set of control parameters by the governing laws is called forward model of the system state or simulation of the measurements Obviously under conditions of well known governing laws and accurate quantification of the control parameters the forward model would produce accurate simulation of the measurements and would have ability to predict future states It is common however that the governing physical laws are known but the control parameters values are not This condition occurs in variety of models which are based on application of fundamental laws for macro scale phenomena such as conservation of energy and mass propagation of energy through media and bulk energy and mass transformations Examples of this type of model are found across the Geoscience disciplines such as in the Atmospheric Sciences Oceanography Biogeochemistry and Hydrology What is typically not well known are initial and boundary conditions some external forcing mechanisms and bulk properties of the medium in parameterized energy and mass transformations Because the model simulates the measurements it is natural to ask whether there is a formal and objective way to use the measurements to infer the correct control parameter values for the model This problem is called inverse problem or inverse modeling problem When there is a need to use the measurements to infer the control parameters frequently in a quasi continuous manner over time the inverse modeling problem is often referred to as data assimilation Thus the first way of answering the question Why inverse modeling and data assimilation in the Earth System sciences is To objectively correct modeled state of the system or a component of it by using measurements such that the model could be effectively used to analyze and predict the system The concept of inverse modeling and data assimilation for the purpose of I u and 139 u p 39 g model J has been used first in physical sciences in l7th century in works by Euler Lagrange and Laplace on calculating orbits of celestial bodies Lewis et al 2006 Gauss first formally described a method of data assimilation in his book on Theoria Modus Corporum Coelestium written in 1809 The data assimilation approach by Gauss is summarized in the following quote from the book If astronomical observations and other quantities on which the computation of orbits is based were absolutely correct the elements aso wheter deduced from three or four observations would be strictly accurate so for indeed as the motion is supposed to take place exactly according to the laws of Kepler and therefore if other observations are were used they might be confirmed but not corrected But since all our observations and measurements are nothing more than approximations to the truth the same must be true of all calculations resting on them and the highest aim of all computations made concerning concrete phenomena must be approximate as nearly as practicable to the truth But this can be accomplished in no other way than by suitable combination of more observations than the number absolutely required for the determination of the unknown quantities This problem can only be properly undertaken when an approximate knowledge of the orbit has been already attained which is afterwards to be corrected so as to satis all the observations in the most accurate manner possible In the chapter 3 we come back to the Gauss description of the data assimilation problem as it includes not only statement of the fundamental purpose of solving the problem but also key properties of the measurements and model Since mid l9th century the inverse modeling and data assimilation methodology continued to develop mostly in technical engineering applications where the primary problem was to either optimize a controlled system performance or to devise inference of signal over noise for measurements of dynamical systems Jazwinski 1970 In the Geosciences the data assimilation for improving the model simulations was first explicitly used in l980es in Numerical Weather Prediction NWP where there is direct need to improve initial conditions to improve forecast skill Daley 1990 Kalnay 2004 More recently new research is being performed in which other than initial condition parameters are improved by the inverse modeling and data assimilation methods Braswell et al 2005 Vukicevic et al 2000 12 Environmental data by the data assimilation The repeated data assimilation in the NWP applications naturally resulted in a time record of the Atmospheric states which have been used as data for other than weather forecast purposes For example the weather analysis data are used in physical analyses of short term weather phenomena as well as in climate analysis and research on the climate processes In late l990es the data assimilation emerged as necessary procedure in most other analysis of the Earth System states such as the analysis of oceans land surface soil atmospheric trace gasses and particulates land hydrology and biogeochemistry Every one of these data analysis includes a model which control parameters are repeatedly corrected by the assimilation of measurements Examples of the Earth System data types containing temporally and spatially distributed physical quantities produced by some type of data assimilation are Ocean physical state data of o velocity components 0 pressure 0 density 0 temperature 0 salinity Ocean biological and chemical state data of 0 concentration elds of nutrients o plankton 0 dissolved and particulate matter Atmospheric physical state data of 0 temperature 0 pressure 0 wind 0 humidity o cloud properties 0 precipitation Atmospheric chemical state and particulates data of 0 concentration of trace gasses o aerosols Land surface state data of 0 temperature 0 moisture o uxes of gasses and energy 0 optical and physical characteristics Land biosphere data of 0 land cover 0 physiological characteristics of plants Land hydrology data of o hydraulic conductivity capillary pressure drainage coefficient in wetland areas leakage coefficient for river runoff Soil state data of 0 temperature 0 moisture 0 physical and chemical characteristics These data are produced at operational environmental data centers andor at sponsored national institutes and University research centers In the USA the environmental data centers are sponsored or contained within all major national agencies NOAA NASA DOE and DOD Thus the second way of answering the question Why inverse modeling and data assimilation in the Earth System sciences is To produce quasi continuous elds of spatially and temporally distributed data of the Earth System such that these data could be effectively used for the assessment of the system and for prediction CHAPTER 2 DRAFT Inverse Modeling and Data Assimilation in Geosciences A Practial Guide Author Dr Tomislava Vukicevic tomz39slavavukiceviccoloradoedu Af liation Associate Professor Department of Atmospheric and Oceanic Sciences ATOC University of Colorado Boulder Colorado USA Date of this document December 2006 2 Inverse problem basics 21 Parameter and model space Let a model within scope of the Geoscience problems be denoted m with control parameters m where m is a set of physical quantities For each choice of the parameters which is the quantitative specification of the elements of m there is different realization or simulation with The space or manifold spanned by the different values of m is called parameter space The parameter space is populated with possible values of the parameters The existence of the parameter space ie the many possible values implies both the possibility of different parameter values for the modeled natural system and the stochastic nature of the parameter quantities It is shown later in this chapter that the two causes for the existence of the parameter space cannot be easily distinguished in the inverse problem The consequence of the parameter space is however the same That is the quantitative result of m for different choices of the parameter values renders a space or manifold called the model space Because the fundamental property of model is to predict measurements the existence of space of is interpreted as the existence of different possible simulations of measurements by the model which by design is intended to represent the same governing laws as the natural system that is measured This intent is not necessarily always realized 22 Measurement space The measurement space is simpler to define than the parameter space It is the space spanned by possible values of the measured quantity within the uncertainty range of the measuring procedure The measuring procedure could include multiple measurements of the same quantity or multiple measurements of different quantities but for the same realization of the natural system In the latter case the measurements constitute a multidimensional space similar to the control parameter set In the oscillator model 11 there is only one dimensional phase space represented by 5 The measurable quantity in this example is 11 where r is discrete time In the inverse problem there is explicitly derived dependency of the parameter to measurement uncertainty which is presented in the next section Here the interest is to discuss the consequence of existence of the measurement space spanned by the measurement uncertainties The range of control parameter values which would result from the measurement uncertainty is interpreted as in the forward problem as the range of uncertainty on the parameters This property emphasizes the critical property of the modelization of the natural systems When it is necessary to solve the inverse problem in the process of understanding and modeling of the natural system the uncertainties in the measurements would render the uncertainties in estimates of what controls the system as hypothesized by the system model The parameter space which results from the variable external causes leading to the variable parameter values is related to the measurement space in more complex way than the measurement uncertainties Each individual measurement is a recorded quantity of a response of an instrument to the medium that is measured The medium when measured is at one speci c state after one realization of the possible external cause In order to capture natural variability of the parameters in the inverse problem solution which is caused by conditions external to the model it is necessary to evaluate it from many measurements and different state realizations It is shown later that validity of an evaluated range of actual variability in the inverse problem solution would depend on three factors 1 abundance of measurements 2 size of measurement errors and 3 strength of sensitivity of the forward model to the control parameters Analysis of impact of each of these factors is important subject in specific applications as it addresses potential to distinguish different causes of the natural phenomena by the specific model and available measurements 23Probabilistic nature of information in the inverse problem The property of measurements to always have errors makes them random or stochastic quantities Consequently the model control parameters which would be derived from the inverse problem solution using the measurements would also be random quantities Even without the inverse problem the model control parameters could be random quantities if their values are uncertain The random quantity also called the stochastic quantity is a quantity which exact value is not known or predictable What is known about the random quantity is a possible value from a range with an associated probability Because the measurements and control parameters are by design the stochastic quantities relationships between these quantities in the inverse modeling problem and applications in the data assimilation problems must be derived based on the relationship between the associated probabilities 231 Interpretation of probability First let A be realization of a stochastic physical quantity with the numerical value from within an interval xx dx If there are many realizations of A it would be possible to derive probability of A as chance of occurrence of A A is then an event with probability PA for which the following classical axioms of probability apply PA gt 0 13 P0 0 If A and B are disjoint events PAUBPAPB 14 If A and B are not disjoint events PAUBPAPB PAHB 15 where PA m B is joint probability A distribution of probabilities over the space of possible values of A defines the probability distribution on that space Another important function associated with the stochastic quantity and its probability distribution is the probability density px which satis es PM jpxdx 16 where x represents coordinates indicating that in the general case the event A is a set of physical quantities included in A such as the set of control parameters or set of measurements The probability density function is of critical importance in the description of the stochastic quantities because when it is known they are completely described There is another intuitive way to interpret the probability of stochastic physical quantity in the inverse modeling problem The probability could be defined as in Tarantola 2005 as subjective degree of knowledge of the true value It is somewhat difficult to understand the emphasis on subjective knowledge in the Tranatola s definition but it is instructive to consider the interpretation of probability which uses the reference to the truth In this approach the uncertainty or error which renders the physical 12 quantities stochastic is measured as deviation from the truth It is shown later that even when the truth is not known which is most of the time the uncertainty de ned as deviation from the truth could be sensible approach to interpreting the probability in the results of the inverse modeling problems The probabilistic variables and relationships 13 7 16 are the same for either interpretation of the probability 24 General inverse problem and solution 241 Conditional probability The key relationship which links the probabilities of stochastic quantities in the problem of evaluating the control parameters by inversion from the measurements is most commonly expressed by the Bayes rule 1763 for conditional probabilities PBA PABPB 1 7 PM I where A and B are statistical events The rule is actually derived from the de nition of conditional probability PAnB PAB 133 18 The left hand side is red as probability of A given B From the de nition it follows PA m B PABPB PA m B PBAPA39 1399 The Baye s rule is then readily derived Assuming that the event B is from the control parameter space and A from the measurement space then the rule 17 is red as Conditional probability of the parameter taking values de ned by the event B conditioned on the measurement taking values as defined by the event A is equal to product of conditional probability of the measurement taking values defined by the event A conditioned on the parameter taking values as defined by the eventB and probability of the parameter taking the values as defined by the event B normalized by probability of the measurement taking values as defined by the eventA This relationship apparently allows to evaluate probability of the parameter as defined by B given the measurements as defined by A assuming that right hand side rhs of 17 is known The probabilities PAB PA and PB are hard to evaluate when based on the occurrence of events approach It is far more convenient to assume probability distributions associated with the space to which the events A and B belong ie the measurement and parameter spaces respectively As the distribution is determined by the probability density 16 the problem is then transformed to finding a relationship between the probability densities on the joint parameter and measurement spaces To arrive at the relationship which relates the probability densities instead of the 14 probabilities of individual events we take the approach from Tarantola 2005 of de ning the probability densities in the joint spaces of the parameters and measurements and their conjuction It is possible but not necessary to derive the desired relationship between the probability densities as generalization of the Bayes rule 17 This approach is taken in the literature on estimation and stochasting ltering theory which addresses the inference of state of modeled time evolving systems from discrete stochastic measurements Jazwinski 1970 Sorenson 1985 In the applications in the Geoscience problem examples of the use of equivalent to the Bayes rule for probabilities is described in Cohn 1997 Rodgers 2000 Evensen 2006 and Lewis et al 2005 In the stochastic ltering theory literature the generalization of the Bayes rule is derived by a limiting process in the joint space of the measurements and modeled state Jazwinski 1970 It is beyond the scope of this text to present the theoretical derivation and indebt analysis of the use of conditional instead of the joint posterior probability density functions In the present chapter the approach from Tarantola 2005 is adapted for easy interpretation of the origin of probability density functions on the parameter and measurement spaces which apply within wide scope of the Geoscience problems where the parameters and models of many kinds are used to analyze and predict the state in conjunction with vast variety of measurements 242 Conjuction of probability distributions It is shown in section 21 that there are two sources of information about the natural system under study These are the modeled and measured information Let parameter space be denoted M spanned by points mlm2 This space is transformed into a measurement space by a forward model y m 110 In the damped oscillator example is represented by equation 11 Let the measurement space as simulated by the model be denoted O O is spanned by points y1y2 The joint space Q M X 0 which is characterized by a joint probability density f m y is the space of all possible information available about a natural system under study given the model The joint probability density on Q provides complete description of the uncertainties and natural variability in the parameters and the result of these by the model simulations which is contained in the space 0 The joint probability density f m y could also include effects of modeling errors The modeling errors would result from the use of imperfect model For example the damped oscillator model 11 may be used to simulate damped oscillations which are in reality also driven by some unknown external harmonic force When the force is not included in the equation the model would be in error relative to the actual natural system and consequently relative to the measurements It is not trivial task to design or assume the effect of modeling errors when specifying the joint probability density f m y This problem is illustrated in the exercises The other information about the natural system is contained in the actual measurements which are independent of the model Let this information be in space denoted C There is ajoint probability density on the joint space 9 C gtltM denoted pmd Notice that pmy fmy The union of measurement spaces Oand C de nes total measurement space which is denoted D The joint probability densities f m y and pm y are both de ned on D New information about the system would be obtained when the two joint probability densities are combined by conjuction Tarantola 2005 chapter 15 pmyfm y 1 111 7 vmy pmy p0quot yfm y where 7 leM vm y is constant and Vis so called homogenous probability density pm y is a posteriori probability density on the joint space D X M resulting from the combined probability distributions The knowledge of a posteriori probability density is the most complete available quantitative knowledge of information about the natural system under study By this property the expression lll defines the general inverse modeling problem Evaluate pmy from knowledge of pm y fm y and Vmy pm y contains all available quantitative information of the system in the space M X D from which solution of the inverse modeling problem is to be derived To arrive 17 at the resolution we need to introduce de nitions of marginal and conditional probability densities and a priori information The marginal probability densities for any joint space with the associated joint distribution ga b in the space spanned by points a1 a2 b1 b2 are gm ngabdb 112 g3b Lgabda When the space 611612 is independent of b1b2 then gabg10g3b 113 The conditional probability density is de ned ab gABab g 114 gBb The conditional probability density is interpreted as the probability density of points in the joint space for which b ba Using 114 the joint probability density for the information given the model is fmyfymvm 115 where the marginal probability density in the parameter space is assumed to be equal to the homogenous probability density of the parameters The conditional probability density f y m is made of the results of forward model applied over a space of control parameters without knowledge of the measurements In figure 12 the discrete examples of this probability density are shown for the damped oscillator model The probability density pm y results from the information in the joint space of the control parameters and measurements without knowledge of the model It is natural to assume that these are independent From 113 pmy pMmpDy 116 The probability density pDy results exclusively from information about the uncertainties in the measurements The probability density pM m in turn results from information of the uncertainties or variability in the control parameters which is independent of the measurements This information is called a priori Under the same assumption as in 116 the homogenous probability density in 111 is Vmy MmVDy 0 Substituting 115117 into 111 renders 1W 118 m y my The solution of the general problem 118 is to compute the marginal probability density for the control parameter space Using 1 18 in 112 Mm ID pDypVEnyfym y L19 Tarantola 2005 pM m is interpreted as projection onto M The probability densities on the rhs of 119 are assumed known for the speci c application Geosciences and other physical sciences where parameters and models of many kinds are used to analyze and predict the state in conjunction with vast variety of measurements In the present definition of the joint space M gtltD with the associated joint probability density pmd the conditional probability density in for the parameter space could be derived from application of 114 assuming existence of m m y This assumption is somewhat difficult to interpret in the general case in which the parameter space is not the same as the modeled system state as in the stochastic filtering theory When the assumption is valid it implies Mm y pm ypDy 120 Combining 118 and 120 PMmPDyfy m 1 121 7 VDypDy pmy This expression implies that the conditional probability density of parameters conditioned on the measurements is obtainable from the independent information about quantities in the space of measurements and parameters The posterior probability 20 densities in 119 and 121 are apparently different but in either case the required knowledge about the independent stochastic information in the parameter and measurements spaces is the same Before addressing common choices in general and more specifically in the examples in chapters 4 and 5 it is instructive to consider what type of information may be most useful or interesting to derive from the knowledge of posterior probability density function 243 Estimation criteria In the practice with the Geoscience problems the parameter space is often large multidimensional space In this situation it is unfeasible to either evaluate or visualize 119 or 122 Instead characteristics of the posterior probability density function are used to define single best estimate or central estimate of the parameters Cohn 1997 Jazwinski 1970 Tarantola 2005 The commonly used central estimation criteria are a Maximum likelihood define by a discrete region or continuous point with maximum probability associated with the posterior probability density function The likelihood function is Lon 13 quot H dy b Minimum variance defined by the men of the posterior probability distribution m IM mpM mdm or conditional mean IM mpm ydm 21 c Minimum absolute distance de ned by the median of the posterior density distribution The choice of criterion would depend on the purpose of estimation and characteristics of the speci c problem 244 Conjuction of Gaussian distributions It is common to assume that probability density functions associated with model and measurement spaces are Gaussian or Normal The Gaussian distribution is characterized with only two statistical parameters mean x and covariance C px exp x ltxgtTC 1x xgt 123 27239 det 2 C In the model space 123 is fym exp y mTCy m 124 27239 det CS 3 while in the measurement space my exp y ym mm ym 125 27239 dethd 7 Using 124 and 125 in 119 22 pm kpMmexp m ymTC m ymm 126 Where d denotes actual measurements k is cumulative constant and CDCdCS 127 126 indicates that the convolving of the two Gaussian probability distributions in the measurement space is also Gaussian with the summed up uncertainties from the independent modeled and measured information represented by the covariance C D Problem 1 Derive 127 Appendix When it is further assumed that the a priori probability density function in the parameter space is Gaussian pM m xp m mTC1m 128 1 e 27239detCm7 then pMm const eXp Sm 129 SW glam ym TC31 ltm ym m m C m ml Sm is apparently the weighted sum of squares When the model is linear m E Fm then pM m in 129 becomes Gaussian with the mean and covariance respectively m mW CMFTFCMFT CDde Fm CS FTch cgyl W 130 Problem 2 Derive 130 Appendix 23 In the section on Kalman Filter technique 32 it is shown that solution 130 is also derived for the data assimilation problem by the stochasting ltering theory which addresses the inference of state of modeled time evolving systems from discrete stochastic measurements Jazwinski 1970 This theory is applicable in the Atmospheric sciences and Oceanography when the interest is to produce quanti cation of the atmospheric or oceanic state in geographical discretized space and over time Cohn 1997 Kalnay 2000 Application of the maximum likelihood criterion for the central estimate by 129 implies minimization of Sm The minimization of Sm is commonly referred to as least square problem which is treated in the chapter on variational techniques 33 24 CHAPTER 3 DRAFT Inverse Modeling and Data Assimilation in Geosciences A Practial Guide Author Dr Tomislava Vukicevic tomz39slavavukiceviccoloradoedu Af liation Associate Professor Department of Atmospheric and Oceanic Sciences ATOC University of Colorado Boulder Colorado USA Date of this document December 2006 3 Inverse modeling and data assimilation techniques 31 Monte Carlo Missing brief intro bit of history mention of kind of problems which are solved with this class of techniques It is easy to understand that when a space of stochastic quantity is sampled randomly many times it is possible to derive a distribution on that space from the sample This is the desired result in the inverse problem if obtainable The random sampling is called Monte Carlo sampling Mannon 1999 The difficulty is that for multidimensional spaces such as the space of parameters which is transformed into the simulated measurements in the inverse problems in the Geosciences there are large regions of insignificant resultant probability implying need to have very large samples To reduce the number of samples it is desirable to tend to sample regions which result in 25 significant probability Several sampling techniques which posses this property are brie y described in this section Even when made efficient the Monte Carlo techniques are practical only for problems with relatively small number of parameters order of tens or less and relatively fast models In the exercises chapter 5 the reader could experience performance of an efficient Monte Carlo inverse technique called Markov Chain Monte Carlo MCMC by Metropolis and Ulam 1949 with models of varying degree of computational cost and complexity Compared to other techniques included in this practicum the MCMC is by far the most costly 311 Metropolis In the inverse problem the interest is to sample pMm using 119 This implies sampling of the conjuction of pDd and f d m from random independent samples with probability distribution pM m One of most efficient techniques to do this is called Metropolis Metropolis and Ulam 1949 First it is assumed that each step in sampling is dependent only on the previous step This is called Markov Chain reference Second the sampling is random at each step which is characteristic of the Monte Carlo sampling but the move from one step to the next is controlled in the following way 0 If Lm Z Lml then accept the transition from ml to m o If Lm lt Lml then decide to randomly move to m or to stay at ml with the following probability of accepting the move 26 Lmj PH Lani 131 where Lm IDWdy is the likelihood function Obviously there has to VD y be an initial estimate of f y m to be able to evaluate the likelihood function for the test at mwmj pD y is typically not produced by sampling but is assumed to be known density derived from knowledge about the specific measurement uncertainties The initial estimate of f y m could result from a an independent random sampling typically referred to as burn in period or b approximation by a known density distribution function such as the Gaussian In the Gaussian case the likelihood function from 126 applies but the associated covariance is not known exactly When it is further assumed that the measurements are independent random quantities the requirement to use an approximate covariance results in simpler requirement to specify an initial approximate variance for each measured quantity The approximated likelihood function is then Zm const exp m ymm 7 A391 m ymm 132 where A is diagonal matrix of the combined measurement and model uncertainties as in 127 During the random sampling using the criteria of the Metropolis technique the approximated likelihood function could be refined by introducing new estimates of the variance 27 312 Simulated Annealing When the goal is to produce just the maximum likelihood estimate by random sampling the technique analogous to the physical process of annealing could be used The physical annealing consists of rst heating than slow cooling of solids to ambient temperature to eliminate stress in the material In the numerical technique labeled the simulated annealing an energy function is de ned Em T lnLm 133 where T is equivalent of temperature The energy is always positive The posterior probability density function M pm consrpMme T is at maximum when the energy function is at minimum The technique consists of slow change of temperature toward zero to render the energy minimum The energy function for the probability density distribution in 127 which results from the conjuction of Gaussian distributions is the misf1t function in the measurement space multiplied by a constant EmTTk m yTC m y 134 The Jquot of 39 J r J would render the covariance C D diagonal as in 132 The advantage of 132 or 134 is that the conjuction is explicitly and easily evaluated for anym To test applicability of 132 and 134 or to estimate the approximate variance it is desirable if feasible to roughly evaluate 28 fym or the conjuction pDyfym by a simpler Monte Carlo sampling technique For example this could be performed by the Gibbs technique Geman and Geman 1984 32Kalman Filter The Kalman Filter class of techniques have been developed for solving the problems of system control by estimation of time evolving state of the system from erroneous measurements In this class of problems the state of the system at one time controls the state at subsequent time Kalman 1960 Jazwinski 1970 The Kalman Filter techniques have been introduced in 1990 s in the Geoscience disciplines to address similar problem An excellent introduction to the application of Kalman Filter techniques in the Atmospheric sciences is given in the article by Cohn 1997 Central to the Kalman Filter class of techniques are the use of posterior conditional probability density and assumption that the prior modeled and actual measurement probability densities are Gaussian To connect to the general inverse problem theory as presented in chapter 2 following Tarantola 2005 recall that the posterior conditional probability density resulting from conjuction of information on the space M X D is expressed by 121 from which it follows Mm y py mpMm 135 pm ydy pm ydy pmy The relationship 135 is then used as the solution of the general inverse problem 111 It is already discussed in chapter 2 that the use of either the marginal or 29 conditional posterior probability density does not change the inverse problem The problem is to nd the probability density function which combines prior knowledge about the stochastic control parameters employed by the model with the stochastic measurements The assumption of Gaussian probability densities is less general It is reasonable to pose the question When is the use of Gaussian probability density valid The Central Limit Theorem on the properties of cumulative stochastic quantities partially helps in answering the question It states Central Limit Theorem Cumulative distribution of any set of independent variables with any distribution having a nite mean and variance tends to normal distribution This theorem is readily interpreted in the measurement space as a large number of different measurements of the same physical quantity would tend to produce normally distributed cumulative estimate of that quantity The problem may occur with measured quantities which are positive semidefinite y 2 0 with large probability near or at zero A transformation of variable 9 1ny 137 could help solve the problem if the associated probability density function pq is exactly or approximately Gaussian The validity of Gaussian probability density assumption in the parameter and model spaces is hard to justify for general case and must be addressed for the each specific problem Transformations similar to 137 may be used 30 Assuming the validity of Gaussian probability density functions and the use of conditional probability density as the solution of inverse problem the theory of discrete Kalman Filter for linear class of models is derived in the following way Let a discrete dynamical or state evolution model be W Eerlwierl Gkilgii39il 138 where w is a vector of numerical solution of a set of differential equations in discrete time and space Subscript k denotes discrete time point while superscripts t and 0 refer to truth and observations respectively The symbol w is used to denote the control parameters instead of m to distinguish the speci c type of the control parameter in the state estimation problem The equation 138 expresses transformation of the true state from time step k 1 to k by the discrete linear operator F H and the associated error relative to the true transformation This error is assumed to be a linear stochastic forcing GIHEILI In what follows the exact knowledge of deterministic error relative to the truth is not required but the knowledge of error statistics is The state evolution model typically does not integrate into the observation space This condition requires that the state is transformed or mapped as in 110 In the state estimation problem the transformation is written y ka 85 139 This transformation uses a discrete linear operator H from the true state to the observations The linear operator is assumed because the original Kalman Filter was derived for the linear transformations The linear term 8 in 139 indicates that there 31 are errors in the observations or in terms of the formulation in chapter 2 that the observations are stochastic quantities This error term could also include the errors which are associated with uncertainties in the transformation operator Substitution of 139 into 138 would render one model for simulation of transformation from the state at k ltime to measurements at k The separation of the model in two parts is motivated with goal to produce estimates of the state sequentially at discrete time points as new measurements become available after the prediction is made by the evolution model 138 using an estimate of the true state at previous times The evolution between k 1 and k from the estimate is written w FHw1 GHE1 140 The predicted state is then transformed into the measurement space y kakf g 141 The inverse problem is to find an estimate of the true state w from information about stochastic quantities W y 8 and 8 The conditional probability 136 is used for the solution The prior probability density pM m is the probability density of the predicted state in 140 The probability density of measurements given the model is the probability density of y in 141 The marginal probability density of the measurements is pDy2 The posterior conditional probability density is then written MW yZ1pyZ wi pw y const 0 0 Mn y 142 32 The conditional probability density notation for the prior and measurement terms on the right hand side is used to indicate that the knowledge about them is conditioned on the observations which were used prior to making the new prediction This property implies that the solution 142 is recursive Assuming that all probability densities on the right hand side in 142 are Gaussian the solution requires that the associated mean and covarinaces are speci ed From 140 the mean and covariance of pw y1 are derived respectively w y71gt Fkilltwfgi1 y71 GH 8171 y71gt Fkilw1 Ck LWHW1 wkil Cklg1Fklw1 w1 Gk7182711Tgt 143 FkileileTil 59463 Here it is assumed that the mean of the state evolution model error is identically zero ie the error is the white noise and that the mean of the probability density at k 1 is the central posterior estimate at that time which is then propagated forward in time The mean and covariance of py w and py y71are derived using 141 and 143 along with assumption that the model error is white and that the estimate and model errors are mutually uncorrelated y wgt ka1 R ngxy HwTgt yzy1gtHw s kaz HkW1XV1 w1TgtHkaH1 Rk 144 145 33 Using 143 145 in the de nition of Gaussian probability density function and substituting in 142 results in the expression for the desired posterior conditional probability density 1 7 1 pwk yk constexp EJ 146 where 1 y 7 waTR391y 7 waw wfTC 1w wf 147 y0 waTHPfHT R 1y0 wf It can be shown for example in Cohn 1997 that U T E i D J w w Kkyk ka Ck 1w w Kkyk ka 148 2 T 71 71 1 C HR HC 149 KCHTHCHT 11 150 Substitution of 148 150 shows that the posterior conditional probability density function is Gaussian with mean and covariance respectively w w CkHTHkaHE R 1y kakf 151 C HER IH Cf1 39 The matrix Kin 150 is called Kalman gain matrix because it is applied to the difference between actual and simulated measurements to correct the prior in 151 The mean and covariance in 151 could be obtained directly from the conjuction of Gaussian probability densities in section 244 as application of 130 to the problem in this section where the prior is produced sequentially from prediction using the posterior solution at previous time instance 34 The most celebrated property of the Kalman Filter is that the entire probability density function is predicted to produce the new prior because both moments of the Gaussian probability density function are advanced in time by 143 The Kalman Filter is however difficult to apply exactly even for exactly linear models when the number of elements N in the state vector w is large as it requires prediction of the covariance matrix of the size N 2 The filter could be applied exactly to smaller size problems order of 100 with efficiency that may exceed the performance of a Monte Carlo type algorithm Feasibility of the Kalman Filter theory is extended by introduction of the ensemble approach in section 34 33 Variational techniques It is shown in section 2 that posterior probability density function pdf is expressed pMm const exp Sm SW gluon ym TC31 ltm ym m m C1 m ml 152 under condition that the prior modeled and actual measurement stochastic quantities have the Gaussian pdfs This condition is used in derivation of the Linear Kalman Filter technique in section 4 together with assumption that the models which represent evolution of a dynamical system and the mapping or transformation into the measurement space are both linear These conditions together render solution of the inverse problem 35 in the model state space m w with Gaussian posterior probability density function The mean of posterior distribution in this case is the minimum of the cost function S m The Linear Kalman Filter technique is not however widely used in the Geosciences because it requires numerical evaluation and inversion of very large matrices and because it is derived for strictly linear model Instead of evaluating the entire posterior Gaussian pdf the inverse problem could be reduced to solving for a central estimate such as the mean or maximum In section 23 it is shown that the mean is central estimate under minimum variance criterion for any distribution In the Gaussian posterior pdf case the mean and maximum are the same This property implies that the mean also satis es maximum likelihood criterion If the condition of linear model is eliminated which eliminates validity of the Gaussian posterior pdf but the prior modeled and actual measurement pdfs remain Gaussian the minimum of cost function would still satisfy the maximum likelihood criterion This property suggests the use of cost function minimum to obtain the maximum likelihood central estimate for unknown posterior distribution The inverse problem of this kind is more general than the problem addressed in the Linear Kalman Filter but less general in solution as it does not attempt to evaluate the entire posterior pdf The minimization of a quadratic misfit function such as the cost function in 151 defines an entire class of problems in the estimation theory which is commonly referred to as Least Square problem Crassidis and Junkins 2004 Lewis et al 2006 Tarantola 2005 The least square problem was first introduced by Gauss 1809 to determine 36 planetary orbits from an orbit model and telescope measurements of the line of sight angles In this original work the sum of squares of unweighted differences between model and observations is minimized This kind of least square problem is commonly referred to in modern literature as unconstrained least square problem without weights and prior In the cost function 151 the absence of weights is equivalent to assuming that C D I I is identity matrix while noprior implies C 0 The unconstrained property implies that the minimization is performed directly on the cost function without additional functional relationship on the parameter space The least square problem defined by 151 is thus referred to as constrained weighted with prior The constraint is provided by a model such as prediction model 138 in the previous section The minimization of cost function could be desirable problem to solve even for the linear models if the inversion of large matrices could be avoided The least square problems linear and nonlinear could be solved relatively efficiently by Variational technique Kalnay 2004 Lewis et al 2005 The solution is obtained by use of the variational calculus which involves evaluation of directional gradients of the cost function Crassides and Junkins 2005 Tarantola 2005 Bryson and Ho 1975 There are numerous minimization techniques described in the literature on optimal estimation and control theory which make use of the cost function directional gradients In essence they use the following well known properties of functions Necessary and sufficient condition for a minimum of a function f x which is at least twice differentiable are respectively 2 10 and ggt0 6x 6x 37 where x is vector of independent variables equivalent of parameter space Second order truncated Taylor series expansion for f x at an arbitrary point x reads fxfx x xa f x xquot2 153 6x2 a Differentiation with respect to x and setting x x together with the necessary min 5 condition yields 1 62 f 1 afx xmin x 6x2 F 6x F 154 This relationship shows that a minimum of function is directly proportional to negative of the first directional derivative at an arbitrary point xi in the neighborhood of the minimum The use of second order Taylor expansion approximation is made possible by assumption that this point is not far from the minimum For the exactly quadratic function the second order truncation is exact The cost function in 151 is exactly quadratic if the model is linear and approximately quadratic in the neighborhood of the minimum if the model is nonlinear Consequently the relationship 154 may be used to find the minimum if these conditions are satisfied It is obvious that unless the function is exactly quadratic the minimum obtained in this way may not be global The nature of minimum should be examined for each application specifically if possible by inspection of extent of the neighborhood around the computed minimum within which SxjgtSxmin The neighborhood should be spanning the range of parameter values with large cumulative probability This condition is difficult to inspect 38 for large multi dimensional problems A natural test of improvement brought about by results of the inverse problem with the prognostic type model may be to verify the prediction against new measurements assuming that the modeling errors are small within forecast temporal range If the prediction is better with than without the solution of the inverse problem the inverse technique has skill implying that the assumptions used in the technique are at least approximately valid 331 Variational solution of constrained minimization problem with prior In chapter 1 it is discussed that many models used in the Geosciences simulate time evolution of the natural system state Assume that the model of interest is written as system of ODEs d ZMaG r 155 dr The models which do not simulate time evolution could be written in the same symbolic d form but w1th d Z0 1mply1ng steady state The model 155 is equivalent to the r prognostic model 138 in section 32 but the time derivation is expressed in the continuous form The model and model error operators M 101 and G r respectively are assumed in general nonlinear The system state vector of physical quantities 139 is in discretized space The solution of 155 is subject to initial and boundary conditions respectively 117A Zprior 2 1177 39 where Q denotes boundary of the spatial domain The solution is also dependent on vectors of physical parameters 011 and model error 81 In the inverse problem the measurements are contained in a vector of measured quantities in discrete spatial points at discrete times 1k 6139 A131quot as in section 4 The transformation from the system state space into measurement space is as in section 4 YTkhlkg 157 Unlike in the linear Kalman Filter the transformation function his assumed general nonlinear The cost function 151 for the system 155 157 reads 1 r S 3 apnmTC1a am I ITC1I 10 8TQ8150 weir 17A IW 1W T C 1M 1W Jr HUM yk 3 M20 yk6r rk7lr 158 51 1kis Kroneker delta function in time The prior is represented with four terms instead of one used in section 4 because the prognostic model solution depends on four types of control parameters Each of these parameters could be varied to render the cost function minimum The problem of nding the minimum of 158 under constraint given by 155 can be compactly written S Sx WCFO 159 40 Where X is m X 1 vector of all control parameters The necessary condition at the 65 minimum is a 0 The condition for differential variation in the neighborhood of the x minimum Crassidis and Junkins 2004 is then as axo 160a 6x 5w6 quot 5x0 160b x The minimum solution is then obtained by elimination of differential variations in each component of x from 160b and substitution into 160a This could be very difficult to solve as the functional relationship in the constraint could be very complex and even not known explicitly Lagrange derived transformation of the constrained minimization problem 160 into unconstrained by linearly combining equations in 159 to define new augmented functional FS1sz 161 A is Lagrange multiplier The necessary conditions at the minimum of F read T T ai c 4 6 1 g C03 6 1 10 162 6x 6x 6x 6x 6x 6F 51 0 163 61 l 162 is system of equations for unknown A while the second condition recovers the original constraint Solving the dual system 162163 is an elegant way of automatic differential elimination 41 A simple example from Crassidis and Junkins 2004 illustrates this property Example 7 Find minimum of S6Xi 2 3 under constraint zxy z 9y 42 4z 52 36 0 The differential elimination by solving the dual system 163164 is not applied explicitly in practice with large multidimensional problems but the solution of the system 161 is used In what follows it is shown that for the large multidimensional control parameter space the vector of Largange multipliers is obtained from the solution of 161 and that this solution is directly proportional to the vector of directional gradients of the cost function with respect to each control variable This property allows numerical evaluation of the directional gradient vector which could be then used in the relationship 154 or similar to compute the minimum To derive 161 explicitly for the system 155158 define first the augmented functional 42 F 5 j lrd Z I a G 1dr 164 rrAr dr The Lagrange multiplier is a vector of the same size as 5 The variation of augmented functional by the variation of what controls 5 is as follows was 6E 5E lr M a Gg njdr r 6E l T 50 Cal 2 aml ICS W y6r rt ldr PA 6a 61 j 58Tng g CD1hz yl5rrkdr A2 r T 1 a T 6 T 1 J EIQC17 IQ Zb 1 CD hly67rk d7 619 61 A3 17A 9 T T T 6 6h glim CHIP anm Elfim I all CD1 y67 Tk dr rrAr rrAr A4 165 43 an I A a 5l Egdr PM dr 6 68 r T T j KM EIT 5a7A 5a7A 5f 6M 1 58T 1 PM dr dr 6a 6a 61 68 r T T MIM j 5f d l 6M 1 dr E8T 6 6 Mr H 7 d r 61 68 31 139 Ar El 83 166 616l6a 6 5196 161PA 6158 60 619 61pm 68 167 Substituting 167 into 166 results in four terms within B2 in 166 one for each control parameter variation When combined with AlA4 in 165 it is easy to observe that there is common factor among contributions from each control parameter 11 W T 6h 7 1 d ra 3 CD hZ yk5r rk The variation of augmented functional is now written 44 a 5aTCalaaWgg T 6M J39615Cbllglbal 1Idr Q jagTng 2 6 11 168 8 62 Mr Ar Cm 613 At the minimum 6F 0 leading to a system of ODEs for A as in 162 Given than A is arbitrary additional conditions may be set for 2 without loss of generality The condition I 0 for all 139 results in new system of ODEs for 2 called adjoint system 511 6M T 6h 7 1 d1 a 1 5 CD hon yd e rk 169 where 1 L39 to indicate that the adjoint system is solved in reversed time from end to beginning of interval The adjoint system requires nal and boundary conditions They are set rm 0 0 0 170 The convenience of conditions 169170 is seen by substitution in 168 Given that the control variables are independent the condition at the minimum reads 45 T cm a A 0 T 6M Chlofg IbW 10 9 T Q lg Z G A 0 g MT AT C1ITM 1pm 0 171 where l is solution of l69l70 It is critical to notice that this solution depends on 5 Consequently it in 171 is the solution of 169 for min only This property implies that the dual system 155 and 169 still must be solved simultaneously to make use of 171 The way out of this dif cult condition which is Virtually impossible to overcome in practice with the large multidimensional models in the Geosciences is to observe that the solution of 169 for an arbitrary point 5 is directly proportional to the directional gradient of cost function at the same point This is easily seen from 171 but without setting 6F 0 For example 6F 6 rrAr Flvim glim EILMG 1PM C1IPA Ipnur when prior is neglected for simplicity the resulting relationship reads 6S alrrAr m 172 46 Bennett A 1992 Inverse Methods in Physical Oceanography Cambridge University Press pp 346 Evensen G 2006 Data Assimilation The Ensemble Kalman Filter Springer pp Jazwinski H 1970 Stochastic Processes and Filtering Theory Mathematics in science and Engineering Vol 64 Academic Press pp 376 Kalman RE 1960 A new approach to linear ltering and prediction theory Journal of Basic Engineering TransactionsASME Series D 82 3545 Kalnay E 2003 Atmospheric Modeling Data Assimilation and Predictability Cambridge University Press pp 341 Lewis J S Lakshmivarahan and S Dhall 2005 Dynamic Data Assimilation A least squares problem Cambridge University Press pp 680 Mosegaard K and A Tarantola 2002 Probabilistic Approach to Inverse Problems International Handbook ofEarthquake amp Engineering Seismology PartA Academic Press 23 7265 O Neill A PP Mathieu and C Zehner 2004 Making the most of Earth observation with data assimilation ESA Bulletin No 118 p 32 7 38 Rodgers C D 2000 Inverse Methods for Atmospheric Sounding Theory and Practice World Scientific Pub Co Inc Series on Atmospheric Oceanic andPlanetary Physics pp Tarantola A 2005 Inverse Problem Theory and methods for model parameter estimation SIAM Philadelphia pp 342 van Leeuwen P J 2003 A varianceminimizing lter for largescale applications Mon Wea Rev131 20712084 47
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'