These class notes (284 pages) were uploaded by Jordane Kemmer on Thursday, October 15, 2015. They belong to ST 790M at North Carolina State University, taught by Staff in Fall.

Date Created: 10/15/15
Similarly, (2.25) may be rearranged to give

…

The second row is negligible compared with the first and so is neglected. The first row is a vector linear combination of the random variables …, and so, assuming (2.24), its limit at $\theta = \theta_0$ may be expressed in the form … (2.27), where … is a $Q$-dimensional vector with components indexed by $q = 1, \ldots, Q$, and $R$ is a $J \times Q$ matrix whose $(j, q)$ component is given by … (2.28). Consequently, if we define … $= R(\theta_0)^T W(\theta_0) R(\theta_0)$ (2.29), we have verified conditions A1 and A2 of the information sandwich formula, so the final result is given by combining the formulae (2.22), (2.24), (2.27), (2.28) and (2.29). It should be noted that each of the matrices $W$, $H(\theta)$ and $R$ is explicitly defined once $\theta$ is specified, so the resulting expression for the limiting covariance matrix of $\hat\theta$ is computable. In practice, of course, in making these calculations we substitute the estimate $\hat\theta$ for the unknown value $\theta_0$.

Maximum likelihood estimation

If we assume that we are sampling from a Gaussian process, then it is straightforward in principle to write down the exact likelihood function and hence to maximize it numerically with respect to the unknown parameters. Kitanidis (1983) and Mardia and Marshall (1984) were the first to advocate estimating spatial processes in this way. The evaluation of the likelihood function requires computing the inverse and determinant of the model covariance matrix; if there are $n$ sampling points then this is an $n \times n$ matrix, and the process can be slow if $n$ is large. Nevertheless, the present author has successfully implemented this procedure for $n$ up to 500, so computational difficulties do not seem to be adequate reason to avoid this method. Less clear are the sampling properties of maximum likelihood estimates compared with simpler alternatives such as Cressie's WLS procedure. In this
section we shall first outline the computational procedure and then discuss some of the pros and cons of maximum likelihood estimation.

We can incorporate deterministic linear regression terms with no essential change in the methodology, so the model we shall consider is

$$Z \sim N(X\beta, \Sigma), \tag{2.30}$$

with $Z$ an $n$-dimensional vector of observations, $X$ an $n \times q$ matrix of known regressors ($q < n$, $X$ of full rank), $\beta$ a $q$-vector of unknown regression parameters and $\Sigma$ the covariance matrix of the observations. In many applications we may assume

$$\Sigma = \alpha V(\theta), \tag{2.31}$$

where $\alpha$ is an unknown scale parameter and $V(\theta)$ is a matrix of standardized covariances determined by the unknown parameter vector $\theta$. For example, the exponential variogram structure is equivalent to a covariance function

$$\operatorname{cov}\{Z(s_1), Z(s_2)\} = \begin{cases} c_0 + c_1 & \text{if } s_1 = s_2, \\ c_1 \exp(-|s_1 - s_2|/R) & \text{if } s_1 \neq s_2, \end{cases} \tag{2.32}$$

so we may define $\alpha = c_1$, $\theta_1 = c_0/c_1$ (the nugget:sill ratio), $\theta_2 = R$, and let $V(\theta)$ denote the matrix whose diagonal entries are all $1 + \theta_1$ and whose off-diagonal entries are of the form $v_{ij} = \exp(-d_{ij}/\theta_2)$, where $d_{ij}$ is the distance between the $i$th and $j$th sampling points. Of course, we assume $V(\theta)$ is nonsingular.

With $Z$ defined by (2.30), its density is

$$(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (z - X\beta)^T \Sigma^{-1} (z - X\beta) \right\}.$$

Consequently the negative log likelihood is given by

$$\ell(\beta, \alpha, \theta) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\alpha + \frac{1}{2}\log|V(\theta)| + \frac{1}{2\alpha}(Z - X\beta)^T V(\theta)^{-1} (Z - X\beta). \tag{2.33}$$

As a side calculation, note that if for given $V$ we define $\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Z$, the GLS estimator of $\beta$ based on covariance matrix $V$, we have $(Z - X\hat\beta)^T V^{-1} X = 0$ and so

$$(Z - X\beta)^T V^{-1} (Z - X\beta) = (Z - X\hat\beta)^T V^{-1} (Z - X\hat\beta) + (\hat\beta - \beta)^T X^T V^{-1} X (\hat\beta - \beta), \tag{2.34}$$

which confirms that this choice of $\beta$ indeed minimizes the generalized sum of squares criterion (2.34), and leads to a sum of squares of generalized residuals which we shall denote by

$$G^2 = (Z - X\hat\beta)^T V^{-1} (Z - X\hat\beta). \tag{2.35}$$

Returning to (2.33), we see that if we define $\hat\beta(\theta) = (X^T V(\theta)^{-1} X)^{-1} X^T V(\theta)^{-1} Z$ and the corresponding $G^2$ by $G^2(\theta)$ from (2.35), we have

$$\ell^*(\alpha, \theta) = \ell(\hat\beta(\theta), \alpha, \theta) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\alpha + \frac{1}{2}\log|V(\theta)| + \frac{G^2(\theta)}{2\alpha}. \tag{2.36}$$

It is possible to minimize (2.36) numerically with respect to $\alpha$ and
$\theta$, or alternatively to minimize it analytically with respect to $\alpha$ by defining $\hat\alpha(\theta) = G^2(\theta)/n$. In this case we have to minimize with respect to $\theta$ the function

$$\ell^*(\theta) = \ell^*(\hat\alpha(\theta), \theta) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\frac{G^2(\theta)}{n} + \frac{1}{2}\log|V(\theta)| + \frac{n}{2}. \tag{2.37}$$

The quantity (2.36) or (2.37) is often called a profile negative log likelihood, to reflect the fact that it is computed from the negative log likelihood (2.33) by minimizing analytically over some of the parameters. The method given here is essentially that first proposed by Kitanidis (1983) and by Mardia and Marshall (1984).

To calculate (2.37), the key element is the Cholesky decomposition, which enables us to write $V = LL^T$, where $L$ is a lower triangular matrix. Hence if we write (2.30) and (2.31) in the form

$$Z = X\beta + \eta, \qquad \eta \sim N(0, \alpha V),$$

and define $Z^* = L^{-1}Z$, $X^* = L^{-1}X$, $\eta^* = L^{-1}\eta$, we have

$$Z^* = X^*\beta + \eta^*, \qquad \eta^* \sim N(0, \alpha I),$$

so that the calculation of $\hat\beta$ reduces to an ordinary least squares problem for $(Z^*, X^*)$. Also, the determinant $|V|$ is easy to compute, because this is just the square of $|L|$, and $|L|$ is just the product of the diagonal entries of $L$.

The author's implementation of this is based on the algorithm of Healy (1968) to calculate the Cholesky decomposition, followed by the SVDFIT algorithm of Press et al. (1986, Section …) to solve the ordinary least squares problem, all within the DFPMIN algorithm of Press et al. (1986, Section 10.7) to solve the function minimization problem with respect to $\theta$. DFPMIN is a variable metric algorithm requiring the specification of first-order derivatives of the objective function as well as the function itself, but these can be approximated numerically. In this form the algorithm is somewhat simpler than the original proposals made by Kitanidis (1983) and Mardia and Marshall (1984).

To summarize the main steps of the algorithm:

1. For the current value of $\theta$, compute $V = V(\theta)$ and hence the Cholesky decomposition $V = LL^T$.
2. Calculate $L^{-1}$. This is easy given that $L$ is lower triangular.
3. Calculate $|L|$, which is simply the product of the diagonal entries of $L$. Hence $|V| = |L|^2$.
4. Compute $Z^* = L^{-1}Z$ and $X^* = L^{-1}X$.
5. Solve the ordinary least squares problem $Z^* = X^*\beta + \eta^*$; the residual sum of squares is $G^2(\theta)$.
6. Define $g(\alpha, \theta)$ by (2.36) or $g(\theta)$ by (2.37), so that $g$ is the function which we have to minimize.
7. Repeat each of steps 1 to 6 for each $\theta$, or each $(\alpha, \theta)$ pair, for which $g$ has to be evaluated. The minimum will eventually be achieved at a point $\hat\theta$ or $(\hat\alpha, \hat\theta)$, and this defines the maximum likelihood estimator.
8. Define $H$ to be the Hessian matrix, i.e. the matrix of second-order derivatives of $g$ with respect to the unknown parameters, evaluated at the maximum likelihood estimators. This is also known as the observed information matrix, and in the case of a quasi-Newton algorithm such as DFPMIN it may be obtained approximately from the algorithm itself. (The algorithm does not attempt to evaluate $H$ directly, but maintains an approximation of it which is improved as the algorithm continues.) In this case, in accordance with standard maximum likelihood theory, the inverse matrix $H^{-1}$ is an approximation to the sampling covariance matrix of the parameter estimates. In particular, the square roots of the diagonal entries of $H^{-1}$ are approximate standard errors of the parameter estimates.

Finally, the minimized value of $g$ may be used for likelihood ratio tests in comparing one model with another; we shall see numerous examples of this subsequently.

Remarks

1. Effective operation of the algorithm requires reasonable starting values. One solution is to calculate the approximate WLS estimators first, using these as starting values for the MLE procedure. In the author's experience that level of care is not usually required, but it is important to use starting values that at least represent reasonable guesses of the MLEs. One general piece of advice is to build up gradually from simpler models towards more complicated ones, using the estimates from simpler models to help gauge starting values for the more complicated models.
2. It is also advisable to remember that efficient
operation of quasi-Newton algorithms such as DFPMIN requires that the parameters be at least reasonably well scaled, e.g. the algorithm will not usually work correctly if one parameter is varying on a scale $10^6$ times another. This could require attention, in particular, to choosing a suitable unit for distance.
3. The reader may be wondering why we considered two forms of $g$, one based on (2.36) and the other based on (2.37), instead of just using (2.37), which has fewer parameters. The reason is that for some examples later on, the covariance matrix does not have the form of (2.31), so in this case the simplification afforded by the analytic solution for $\alpha$ is not available.
4. The interpretation of $H$ in step 8 is not precisely in accordance with standard asymptotic theory of maximum likelihood estimates, because it is calculated from a profile log likelihood rather than the original log likelihood function. However, it can be shown that the $H$ matrix in this case has the same interpretation as when it is defined directly from the log likelihood. See Patefield (1977).

Although maximum likelihood estimation appears to be computationally feasible, opinion is still divided concerning its desirability when compared with simpler methods such as the approximate WLS method due to Cressie (section 2.2.3). Asymptotic properties of maximum likelihood estimators were considered by Mardia and Marshall (1984), who showed that the usual asymptotic properties of consistency and asymptotic normality are satisfied under a form of increasing domain asymptotics (see section 2.2.3 for a parallel discussion of the WLS method under this form of asymptotics). However, the conditions given by Mardia and Marshall are not particularly easy to check in the case of an irregular lattice of sampling points, and more seriously, there is no indication of how large the samples need to be for asymptotic results to be reliable indicators of sampling properties. Another issue concerns
possible multimodality of the likelihood surface. An example given by Warnes and Ripley (1987) and repeated by Ripley (1988) suggested that this can be a problem even with the simplest spatial models. In fact, it would appear that the original example given by Warnes and Ripley was in error: Mardia and Watkins (1989) presented an alternative analysis of the same data set, which is discussed in section 2.3 below. Nevertheless, the possibility of multimodality is real, arising from … in the first derivative of the log likelihood, as shown theoretically by Mardia and Watkins for the spherical variogram model and a variant of the exponential model. … the profile likelihood surface (2.37) as well as, or instead of, finding the MLE by optimization. In the present author's experience, multimodality is not usually a difficulty in low-dimensional estimation problems, but even with parameter dimensions of the order of 4 or 5 it can happen that parallel runs of the maximum likelihood estimation routine, starting from different initial values, lead to different parameter estimates. It can also happen that with poor initial values the algorithm will not converge at all. Given the various difficulties that can arise, it is advisable to check the results of the algorithm by rerunning from different starting values, and to be cautious about the results if these difficulties arise.

The theoretical benefit of maximum likelihood is that we can expect the estimates to be more efficient than the alternative methods in large samples. However, it is not clear how big a benefit this is. A simulation study by Zimmerman and Zimmerman (1991) compared the MLE, approximate WLS and a number of alternative estimators, concluding that the MLE is only slightly superior to the approximate WLS procedure from this point of view. It has also been pointed out that the MLE procedure depends on the assumption of a Gaussian
process, and therefore may perform poorly when the true distribution is non-Gaussian, but of course this does not mean that the WLS procedure will perform better in this case. The simulations of Zimmerman and Zimmerman (1991) do not address this issue, since they are restricted to Gaussian processes.

The present author takes the view that the computational complexity of maximum likelihood, or its variant REML (see next subsection), is justified by its being a very widely applicable method of estimation, by which a variety of models can be estimated and compared using either likelihood ratio tests or automatic model selection criteria such as the Akaike Information Criterion (AIC). There is also the advantage that maximum likelihood methods link up naturally with Bayesian procedures, as will be further explored in subsection 2.2.6. The various disadvantages that have been pointed out, such as non-robustness when there are outliers in the data or the possible multimodality of the likelihood surface, are caveats to keep in mind when using the method, but they are not reasons to abandon maximum likelihood estimation.

Multiple replications

The treatment so far has been based on the assumption that inference must be based on a single realization of the random field $Z$. Of course, we can expect to get better estimates if there are multiple replications of $Z$. In the climatological examples of chapter 1, we have treated the data from year to year as independent, which is equivalent to assuming that there are multiple independent replications. The maximum likelihood procedure in this case is of course only slightly different from that in the single replication case, but to make the computational procedure explicit, we explain here what the differences are. Suppose there are $m$ replications, denoted $Z_1, \ldots, Z_m$. Then (2.33) is replaced by

$$\ell_m(\beta, \alpha, \theta) = \frac{mn}{2}\log(2\pi) + \frac{mn}{2}\log\alpha + \frac{m}{2}\log|V(\theta)| + \frac{1}{2\alpha}\sum_{i=1}^m (Z_i - X\beta)^T V(\theta)^{-1} (Z_i - X\beta).$$

Defining $\bar Z = \frac{1}{m}\sum_{i=1}^m Z_i$, we may write

$$\sum_{i=1}^m (Z_i - X\beta)^T V(\theta)^{-1} (Z_i - X\beta) = m(\bar Z - X\beta)^T V(\theta)^{-1} (\bar Z - X\beta) + \sum_{i=1}^m (Z_i - \bar Z)^T V(\theta)^{-1} (Z_i - \bar Z).$$
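The decomposition of the summed quadratic form into a term in the replication mean $\bar Z$ and a within-replication term can be checked numerically. The following is a minimal sketch on synthetic data, not part of the original notes; the function name `quadratic_form_sides` and the sizes `n`, `q`, `m` are illustrative assumptions.

```python
import numpy as np


def quadratic_form_sides(Z, X, beta, V):
    """Return both sides of the replication identity.

    Z is an (m, n) array whose rows are the replications Z_1, ..., Z_m.
    Left side:  sum_i (Z_i - X beta)^T V^{-1} (Z_i - X beta).
    Right side: m (Zbar - X beta)^T V^{-1} (Zbar - X beta)
                + sum_i (Z_i - Zbar)^T V^{-1} (Z_i - Zbar).
    """
    Vinv = np.linalg.inv(V)

    def quad(u):
        return float(u @ Vinv @ u)

    m = Z.shape[0]
    Zbar = Z.mean(axis=0)
    lhs = sum(quad(z - X @ beta) for z in Z)
    rhs = m * quad(Zbar - X @ beta) + sum(quad(z - Zbar) for z in Z)
    return lhs, rhs


# Synthetic inputs (assumed sizes, arbitrary values).
rng = np.random.default_rng(0)
n, q, m = 6, 2, 4
X = rng.normal(size=(n, q))
beta = rng.normal(size=q)
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)      # symmetric positive definite
Z = rng.normal(size=(m, n))

lhs, rhs = quadratic_form_sides(Z, X, beta, V)
print(lhs, rhs, np.isclose(lhs, rhs))
```

The cross term vanishes because the deviations $Z_i - \bar Z$ sum to zero, which is why only the GLS fit to $\bar Z$ and one extra sum of quadratic forms are needed for any number of replications.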
This suggests the following modification of the algorithm for $m = 1$ to compute the profile negative log likelihood function (2.36) or (2.37) in this case:

1. For given $\theta$, solve the GLS problem for $\bar Z$, letting $\bar G^2(\theta)$ be the resulting generalized residual sum of squares.
2. Calculate
$$G^2(\theta) = \bar G^2(\theta) + \frac{1}{m}\sum_{i=1}^m (Z_i - \bar Z)^T V(\theta)^{-1} (Z_i - \bar Z).$$
3. Substitute into (2.36) or (2.37), multiplying by $m$ to obtain the correctly normalized profile log likelihood.

As an example, we consider a data set based on 32 years (1965-1996) of mean winter daily minimum temperatures, confined to the region of latitude 40-45° N and longitude 90 … There is doubt about whether the process was homogeneous even within this smaller region, but for the purpose of the present discussion we shall assume the process is homogeneous. The $X\beta$ component of (2.30) was in most cases omitted from the model, but some models corresponding to a linear spatial trend in the latitude and longitude coordinates were also tried; in this case $\beta$ is a vector of dimension 2. Maximum likelihood estimates were computed for several models, with the following results:

Model           Spatial trend   Number of parameters   NLLH      AIC
Exponential     None            3                      -548.3    -1090.6
Exponential     Linear          5                      -548.3    -1086.6
2-par. Matérn   None            3                      -547.1    -1088.2
3-par. Matérn   None            4                      -548.4    -1088.8
3-par. Matérn   Linear          6                      -548.4    -1084.8
Gaussian        None            3                      -535.3    -1064.6
Wave            None            3                      -532.8    -1059.6
Spherical       None            3                      -548.2    -1090.4

Table 2.3. Evaluation of negative log likelihood (NLLH) and Akaike Information Criterion (AIC) for several models fitted to 32 years of data at 17 stations.

Each of the exponential, Gaussian, wave and spherical models included a nugget parameter, but the Matérn model was fitted both without a nugget (2-parameter version) and with (3-parameter). From the NLLH and AIC values tabulated, it can be seen that the Gaussian and wave models are substantially inferior, but the other six models are indistinguishable by the quality
of fit. The two models with linear spatial trend would be rejected on the grounds that they do not improve on the models with no trend, while the three-parameter Matérn is similarly rejected on comparison with the two-parameter model. Comparison of the exponential and two-parameter Matérn models leads to the following comparisons:

Model         Parameter    Estimate   Standard error
Exponential   R            1814       494
Exponential   …            038        010
Matérn        $\theta_1$   657        407
Matérn        $\theta_2$   28         03

… exponential and two-parameter … (recall the discussion following (2.32)). The unit of distance in which both $R$ and $\theta_1$ are expressed is 100 nautical miles, using the approximate conversion factors 1° latitude = 60 NM, 1° longitude = 0.8 × 60 = 48 NM (0.8 is approximately cos 40°). Recall our earlier remarks about scaling: the unit of distance is taken to be 100 NM rather than 1 NM because the optimization problem is numerically more stable in this case.

Restricted maximum likelihood

The idea of restricted maximum likelihood, or REML, estimation was originally proposed by Patterson and Thompson (1971) in connection with variance components in linear models. However, a number of authors have pointed out that the situation considered by Patterson and Thompson is essentially the same as arises with Gaussian models for spatial data: in both cases there is a linear model with correlated errors, whose covariance matrix depends on some additional parameters. Thus it is natural to try to separate the two parts of the estimation problem, the linear model part and the covariance structure part. Cressie (1993) is one author who has enthusiastically advocated this approach to spatial analysis.

The motivation behind REML estimation is perhaps best expressed in a very simple case. Suppose $Y_1, \ldots, Y_n$ are independent univariate random variables, each $N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma^2$. As is well known, the maximum likelihood estimators of $\mu$ and
$\sigma^2$ are $\bar Y = \frac{1}{n}\sum_i Y_i$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (Y_i - \bar Y)^2$. However, this definition of $\hat\sigma^2$ is a biased estimator, whereas the more usual unbiased estimator of $\sigma^2$ is $s^2 = \frac{1}{n-1}\sum_i (Y_i - \bar Y)^2$. Thus it appears that the maximum likelihood estimator is not the best one to use in this case. Suppose, however, instead of basing the maximum likelihood estimator on the full joint density of $Y_1, \ldots, Y_n$, we base it on the joint density of the vector of contrasts $(Y_1 - \bar Y, Y_2 - \bar Y, \ldots, Y_{n-1} - \bar Y)$, whose distribution does not depend on $\mu$. The maximum likelihood estimator of $\sigma^2$ under this formulation turns out to be the unbiased estimator $s^2$. Thus by constructing an estimate of $\sigma^2$ based on an $(n-1)$-dimensional vector of contrasts, we appear to have done better than the usual maximum likelihood estimator based on the full $n$-dimensional data vector.

This idea can be extended to the general model defined by (2.30) and (2.31). If we let $W = A^T Z$ be a vector of $n - q$ linearly independent contrasts, i.e. the $n - q$ columns of $A$ are linearly independent and $A^T X = 0$, then we find that

$$W \sim N(0, A^T \Sigma A),$$

and the joint negative log likelihood function based on $W$ is of the form

$$\ell_W(\alpha, \theta) = \frac{n-q}{2}\log(2\pi) + \frac{n-q}{2}\log\alpha + \frac{1}{2}\log|A^T V(\theta) A| + \frac{1}{2\alpha} W^T (A^T V(\theta) A)^{-1} W. \tag{2.38}$$

As pointed out by Patterson and Thompson (1971), it is possible to choose $A$ to satisfy $AA^T = I - X(X^T X)^{-1}X^T$, $A^T A = I$. In this case, a further calculation, first given by Harville (1974), shows that (2.38) may be simplified to

$$\ell_W(\alpha, \theta) = \frac{n-q}{2}\log(2\pi) + \frac{n-q}{2}\log\alpha - \frac{1}{2}\log|X^T X| + \frac{1}{2}\log|X^T V(\theta)^{-1} X| + \frac{1}{2}\log|V(\theta)| + \frac{G^2(\theta)}{2\alpha}, \tag{2.39}$$

where $G^2(\theta)$ is the same as in (2.37). This is minimized with respect to $\alpha$ by setting $\hat\alpha = G^2(\theta)/(n-q)$, in which case (2.39) reduces to

$$\ell_W^*(\theta) = \frac{n-q}{2}\log(2\pi) + \frac{n-q}{2}\log\frac{G^2(\theta)}{n-q} - \frac{1}{2}\log|X^T X| + \frac{1}{2}\log|X^T V(\theta)^{-1} X| + \frac{1}{2}\log|V(\theta)| + \frac{n-q}{2}. \tag{2.40}$$

Comparing (2.40) with (2.37), it can be seen that there are two substantive changes: the coefficient of $\log G^2(\theta)$ has been changed from $n/2$ to $(n-q)/2$, and there is an additional term of $\frac{1}{2}\log|X^T V(\theta)^{-1} X|$.

Derivation of (2.39)

We follow Harville (1974). Recall that $A$ is an $n \times (n-q)$ matrix, and let $G$ denote the $n \times q$
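matrix $V^{-1}X(X^T V^{-1} X)^{-1}$, so that $\hat\beta = G^T Z$. Let $B = (A \;\; G)$, in other words the $n \times n$ matrix formed by placing the matrices $A$ and $G$ alongside one another. Then

$$|B| = |B^T B|^{1/2} = \begin{vmatrix} A^T A & A^T G \\ G^T A & G^T G \end{vmatrix}^{1/2} = |A^T A|^{1/2}\, |G^T G - G^T A (A^T A)^{-1} A^T G|^{1/2},$$

the last line depending on a well-known result for the determinant of a block matrix (see, for instance, Mardia, Kent and Bibby (1979), formula (A.2.3j), page 457). However, after noting that $A^T A = I$ and $AA^T = I - X(X^T X)^{-1}X^T$, it may quickly be verified that $G^T G - G^T A (A^T A)^{-1} A^T G = (X^T X)^{-1}$. Thus $|B| = |X^T X|^{-1/2}$.

Recall that the density of $Z$ under (2.30), (2.31) is

$$f_Z(z) = (2\pi)^{-n/2} \alpha^{-n/2} |V|^{-1/2} \exp\left\{ -\frac{1}{2\alpha}(z - X\beta)^T V^{-1} (z - X\beta) \right\}. \tag{2.41}$$

Define $Z' = B^T Z = (Z^T A, Z^T G)^T = (W^T, \hat\beta^T)^T$. The Jacobian of the transformation from $Z$ to $Z'$ is $|B|^{-1} = |X^T X|^{1/2}$. Moreover, using (2.34) and (2.35), we have that

$$(z - X\beta)^T V^{-1} (z - X\beta) = G^2(\theta) + (\hat\beta - \beta)^T X^T V^{-1} X (\hat\beta - \beta).$$

Now $G^2(\theta)$ is a function of elements orthogonal to $\hat\beta$ and hence is itself a function of $W$. Thus a change of variables in (2.41) leads to

$$f_{W,\hat\beta}(w, \hat\beta) = |X^T X|^{1/2} (2\pi)^{-n/2} \alpha^{-n/2} |V|^{-1/2} \exp\left\{ -\frac{G^2(\theta)}{2\alpha} - \frac{1}{2\alpha}(\hat\beta - \beta)^T X^T V^{-1} X (\hat\beta - \beta) \right\}. \tag{2.42}$$

Now integrate (2.42) with respect to $\hat\beta$, leading to

$$f_W(w) = |X^T X|^{1/2} (2\pi)^{-(n-q)/2} \alpha^{-(n-q)/2} |V|^{-1/2} |X^T V^{-1} X|^{-1/2} \exp\left\{ -\frac{G^2(\theta)}{2\alpha} \right\},$$

from which (2.39) follows at once.

2.2.6 Bayesian procedures

Bayesian procedures for spatial statistics have been considered by a number of authors, in particular Le and Zidek (1992) and Handcock and Stein (1993). The latter authors considered the model defined by (2.30) and (2.31) with the improper prior density

$$\pi(\beta, \alpha, \theta) \propto \frac{\pi(\theta)}{\alpha} \tag{2.43}$$

for some prior $\pi(\theta)$. The posterior density takes the form

$$\pi(\beta, \alpha, \theta \mid Z) \propto \frac{\pi(\theta)}{\alpha} (2\pi)^{-n/2} \alpha^{-n/2} |V(\theta)|^{-1/2} \exp\left\{ -\frac{1}{2\alpha}(Z - X\beta)^T V(\theta)^{-1} (Z - X\beta) \right\}.$$

Again defining $\hat\beta = (X^T V(\theta)^{-1} X)^{-1} X^T V(\theta)^{-1} Z$ and ignoring constants, equation (2.34) leads to

$$\pi(\beta, \alpha, \theta \mid Z) \propto \pi(\theta)\, \alpha^{-n/2 - 1} |V(\theta)|^{-1/2} \exp\left\{ -\frac{G^2(\theta)}{2\alpha} \right\} \exp\left\{ -\frac{1}{2\alpha}(\hat\beta - \beta)^T X^T V(\theta)^{-1} X (\hat\beta - \beta) \right\}. \tag{2.44}$$

Integrating out with respect to $\beta$, we obtain

$$\pi(\alpha, \theta \mid Z) \propto \pi(\theta)\, \alpha^{-(n-q)/2 - 1} |V(\theta)|^{-1/2} |X^T V(\theta)^{-1} X|^{-1/2} \exp\left\{ -\frac{G^2(\theta)}{2\alpha} \right\}, \tag{2.45}$$

and a further integration with respect to $\alpha$ leads to

$$\pi(\theta \mid Z) \propto \pi(\theta)\, |V(\theta)|^{-1/2} |X^T V(\theta)^{-1} X|^{-1/2} \{G^2(\theta)\}^{-(n-q)/2}, \tag{2.46}$$

which is the same as equation 32 of Handcock and Stein (1993).

Comparing (2.46) with (2.40), it can be seen that if we ignore $\pi(\theta)$ in (2.46), the mode of the posterior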
density is exactly the REML estimator. This was first pointed out by Harville (1974), and indeed follows at once from (2.42): on writing out the joint density of $Z$ in this form and then integrating out with respect to $\beta$, the result is exactly the same as if we integrate (2.42) with respect to $\hat\beta$. However, a fully Bayesian approach involves not maximizing but integrating with respect to the components of $\theta$, and in this respect the two methods are quite different. The integration with respect to $\theta$ must be performed numerically.

2.2.7 MINQE estimation

Another method of estimation is the method of minimum norm quadratic estimation, or MINQE for short, which was originally developed by C.R. Rao (see, for example, Rao (1979)). In comparison with the other methods we have considered, MINQE is restricted in scope, being confined to a particular class of spatial estimation problems, but within this class of problems it seems competitive with the other methods. The following description is based on the accounts of Kitanidis (1983) and Stein (1987).

Suppose we write the universal kriging model in the form $Z = X\beta + \eta$, …

The advantage of this method, compared with maximum likelihood or REML or even the approximate WLS procedure, is that for fixed $\alpha$ it is a linear estimation procedure and therefore does not require any iteration. Moreover, in many cases it appears that the results are not very sensitive to the specification of $\alpha$. On the other hand, the optimality properties of the procedure hold only when $\alpha$ = …, and this suggests iterating the procedure, using the current estimate … to define $\alpha$ for the next iteration. Kitanidis (1983) showed that if this procedure is iterated to convergence, then the MINQE estimator satisfies the same equations as those satisfied by the maximum likelihood estimator. Zimmerman and Zimmerman (1991) proposed a compromise in which a crude estimator of $\theta$ was taken for the first estimate, and then iterated
once only. With these refinements, the method is competitive in performance with the ML and REML procedures, but more computationally demanding than the approximate WLS procedure.

2.3 Examples

Table 2.5 gives a data set analyzed by Cressie (1989, 1993). The data consist of water levels in the Wolfcamp aquifer in south-west Texas. Eighty-five measurements were taken by drilling into the ground and locating the height; the $x$ and $y$ coordinates represent miles from an arbitrary origin, and the $z$ coordinate is the water level in feet above sea level. The original interest in this example was the proposal to build a nuclear waste repository in Deaf Smith County, which is near the center of the mapped region. It is believed that any leakage of nuclear waste will flow with the water in the aquifer, so there is interest in reconstructing the shape of the whole water surface. In fact, the analysis showed fairly directly that there is a steady slope from the south-west to the north-east, and that leakage from the proposed site would flow directly into the city of Amarillo. This is described in some detail in Chapter 4 of Cressie (1993). We concentrate here on the spatial model-fitting aspect of the problem.

Cressie himself gave two analyses. The first assumed that the process was intrinsically stationary but, because of the obvious anisotropy, used a geometrically anisotropic model. From this he used a kriging algorithm (section 2.4) to reconstruct the surface. This appeared satisfactory, but gave a very irregular reconstructed surface. A second technique was to use median polish kriging (also to be discussed in section 2.4), a crude method of removing an underlying trend. The residuals from this trend did appear to follow a stationary isotropic model and were satisfactorily fitted by the spherical variogram model. Ordinary kriging was then applied to these residuals and added to the trend surface obtained by median polish kriging to obtain a second reconstructed surface. This was similar in general character to the first method, but did produce a noticeably smoother surface.

In Fig. 2.10, we show the variograms for the raw data, computed by both the method of moments (MoM) and the robust method. To account for the anisotropy, separate variograms were computed for pairs of points whose relative orientation lay in the SW-NE quadrant and those in the SE-NW quadrant. The figure is similar to Fig. 72. Also shown are fitted variograms by the approximate WLS method based on (2.16), using the power law model. In contrast to Cressie's analysis, which forced a common value of the power $\lambda$, this analysis fitted the power law model separately to all four plots. In fact, the two MoM fits produced $\lambda = 2.0$ and $\lambda = 1.9$, which is consistent with a common $\lambda$. (Cressie claimed $\lambda = 1.99$ but did not remark on how close this is to the upper boundary for this to be an intrinsically stationary model; the upper boundary is $\lambda = 2$.) In contrast, the two fits using the robust method of variogram calculation produced substantially different variograms (note that the figures are not all plotted on the same scale) and estimates $\lambda = 2.8$ for the SW-NE direction and 2.6 for the SE-NW direction, both well beyond the permitted range. Of course, we could force a valid fit by constraining $\lambda < 2$ in the WLS algorithm, but this would not address the question of whether an intrinsically stationary model is reasonable. I believe that the discrepancies between the MoM and robust variograms, and the results of the power law fit, provide ample evidence that it is not.

Fig. 2.11 indicates an analysis somewhat different from Cressie's, based on a model of the structure of (2.30) in which the $X$ matrix represented regressors given by the $x$ and $y$ coordinates, i.e. we are assuming a linear trend surface with correlated errors. The form of correlation
function was chosen so as to be consistent with an exponential variogram model. For an initial analysis, $\beta$ was estimated by an ordinary least squares regression analysis, and the residuals from that regression were analyzed as a spatial model. In this case there was no evidence of anisotropy, and Fig. 2.11 shows the MoM and robust variogram estimators, together with the WLS fit (separately for each variogram) and the ML and REML fits. For the ML and REML fits, the original model (2.30) was taken, i.e. we did not rely on residuals from an OLS fit of $\beta$. The ML and REML fits are both reasonably close to the WLS fit; in fact, based on a visual inspection, the ML fit appears closer both to the WLS fit and to the individual variogram points.

Fig. 2.12 continues this analysis by showing both the Matérn and wave models fitted by ML and REML. The Matérn fit is based on $\theta_2 = 0.29$ (ML fit) or 0.27 (REML fit); in this model there appears to be no need for a nugget parameter, which was in fact estimated as 0. The wave model was suggested by the apparent oscillatory shape of the variogram points at large $t$. In fact the fitted variogram does not seem to follow this shape too well, but it still gives the best model judged by maximum likelihood over all models. However, there is no significant difference among the leading models shown in Table 2.6.

x y z | x y z
427827 1276228 1464 | 1032663 203424 1591
273969 907873 2553 | 143107 312654 2540
11629 848960 2158 | 181345 301812 2352
186182 764520 2455 | 181215 295324 2528
964655 645806 1756 | 98880 381448 2575
1085624 829232 1702 | 121634 391108 2468
564535 1805 | 116575 187335 2646
900421 392582 1797 | 616912 329491 1739
931727 330585 1714 | 695790 338084 1674
976110 562789 1466 | 667221 339326 1868
906295 350817 1729 | 366545 1509146 1865
925526 417524 1638 | 195510 1377840 1777
994900 591578 1736 | 212979 1318254 1579
240674 1847664 1476 | 223617 1371368 1771
260629 1140748 2200 | 211472 1392620 1408
562784 268483 1999 | 76846 1268375 1527
730388 188814 1680 | 83323 1077769 2003
802668 126159 1806 | 567072 1712644 1386
802301 146180 1682 | 590005 1645486 1089
1077742 1306 | 689689 1772482 1384
763992 959938 1722 | 709023 1613814 1030
644615 1103964 1437 | 730024 1629896 1092
433966 536150 1828 | 596624 1701054 1161
390777 619981 2118 | 618725 1743018 1415
1128045 455477 1725 | 637081 1739145 1231
542590 1478199 1606 | 56271 790873 2300
61320 483277 2648 | 182474 773919 2238
38047 404045 2560 | 856882 1398170 1038
22305 299111 2544 | 1050765 1320318 1332
23618 338200 2386 | 1016428 106511 3510
21889 336821 2400 | 1452365 280233 3490
632243 794992 1757 | 739931 879727 2594
107786 1751135 1402 | 944818 866261 2650
189889 1719169 1364 | 888498 767099 2533
385788 1585274 1735 | 1202590 807648 3571
831450 1591156 1376 | 860245 2811
218025 150255 2729 | 727910 430922 2728
235646 94144 2766 | 1001737 428988 3136
201130 220927 2736 | 788354 408214 2553
166265 172562 2432 | 836906 465048 2798
299075 1751288 1024 | 956166 358218 2691
1009157 229781 1611 | 875548 293927 2946
1012954 229639 1548 |

Table 2.5. Wolfcamp aquifer data.

Model         Order   ML     REML
Exponential   1       1480   1371
Gaussian      1       1475   1366
Matérn        1       1483   1376
Wave          1       1494   1377
Spherical     1       1484   1375
Matérn        2       1523
Matérn        3       1561

Table 2.6. ML and REML fits to various models. The tabulated value is log maximum ML or REML.

The last two rows in Table 2.6 show, for ML estimation in the Matérn model only, the results of extending the analysis to include a quadratic or cubic trend; in both cases there is some improvement in the fit, but not significantly, judged by the usual $\chi^2$ test.

[Figure: four panels (SW-NE MoM, SE-NW MoM, SW-NE Robust, SE-NW Robust); horizontal axis $t$.]
Fig. 2.10. Fitted variograms from Texan data. Raw data with power law models.

[Figure: two panels (MoM variogram, robust variogram), each with WLS, ML and REML fits; horizontal axis from 0 to 25.]
Fig. 2.11. Fitted variograms from detrended Texan data. Exponential variograms fitted by WLS, ML and REML methods, with MoM and robust variograms.
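The ML and REML fits reported in Table 2.6 and Fig. 2.11 minimize the profile criteria (2.37) and (2.40). The following is a minimal sketch of how both can be evaluated for one value of $\theta$ via the Cholesky steps described earlier, not the author's implementation: NumPy and SciPy routines stand in for the Healy and SVDFIT algorithms, triangular solves replace the explicit computation of $L^{-1}$, and the names `exp_corr` and `profile_criteria`, together with the synthetic data, are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def exp_corr(coords, nugget_ratio, range_):
    """V(theta) for the exponential model: diagonal entries 1 + theta_1,
    off-diagonal entries exp(-d_ij / theta_2)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    V = np.exp(-d / range_)
    V[np.diag_indices_from(V)] = 1.0 + nugget_ratio
    return V


def profile_criteria(Z, X, V):
    """Return (ML criterion (2.37), REML criterion (2.40), beta_hat, G2)."""
    n, q = X.shape
    L = cholesky(V, lower=True)                      # V = L L^T
    Zs = solve_triangular(L, Z, lower=True)          # Z* = L^{-1} Z
    Xs = solve_triangular(L, X, lower=True)          # X* = L^{-1} X
    beta_hat, *_ = np.linalg.lstsq(Xs, Zs, rcond=None)   # OLS on (Z*, X*)
    resid = Zs - Xs @ beta_hat
    G2 = float(resid @ resid)                        # generalized RSS
    logdet_V = 2.0 * np.sum(np.log(np.diag(L)))      # |V| = |L|^2

    ml = (0.5 * n * np.log(2 * np.pi) + 0.5 * n * np.log(G2 / n)
          + 0.5 * logdet_V + 0.5 * n)

    r = n - q
    logdet_XtX = np.linalg.slogdet(X.T @ X)[1]
    logdet_XtVinvX = np.linalg.slogdet(Xs.T @ Xs)[1]  # X^T V^{-1} X = X*^T X*
    reml = (0.5 * r * np.log(2 * np.pi) + 0.5 * r * np.log(G2 / r)
            - 0.5 * logdet_XtX + 0.5 * logdet_XtVinvX
            + 0.5 * logdet_V + 0.5 * r)
    return ml, reml, beta_hat, G2


# Synthetic example: 30 sites, linear trend in the two coordinates.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
X = np.column_stack([np.ones(30), coords])
V = exp_corr(coords, nugget_ratio=0.4, range_=2.0)
Z = X @ np.array([1.0, 0.5, -0.2]) + cholesky(V, lower=True) @ rng.normal(size=30)

ml, reml, beta_hat, G2 = profile_criteria(Z, X, V)
print(ml, reml)
```

Wrapping `profile_criteria` in a function of $\theta$ and handing it to a general-purpose minimizer gives the fitting loop; rerunning from several starting values, as advised in the discussion of multimodality, is then a matter of repeating that call.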
[Figure 2.12: two panels (MoM variogram, robust variogram), each with ML Matérn, REML Matérn, ML wave and REML wave fits; horizontal axis t.]
Fig. 2.12. Matérn and wave models fitted to detrended Toan data.

Our second example is based on the data in Table 2.7, which were originally given by Davis (1973) and have been re-analyzed by, amongst others, Ripley (1981, 1988) and Handcock and Stein (1993). The data are 51 measurements of the height of the earth's surface within a 310-foot square. The x and y coordinates have been expressed in units of 50 feet. For the following analysis the variable z will be replaced throughout by z/100, for reasons of numerical stability.

x y z x y z
3 61 870 52 32 805 14 62 793 840 24 61 755 3 24 890 62 690 20 27 820 57 62 800 38 2 3 873 16 52 800 22 875 29 51 730 6 17 873 53 728 15 18 865 57 710 21 18 841 48 56 780 21 11 862 53 50 804 31 11 908 62 52 855 45 18 855 2 830 55 17 850 9 42 813 57 10 882 23 48 762 62 10 910 25 45 765 4 5 940 30 45 740 14 6 915 35 45 765 14 1 890 41 760 21 7 880 49 42 790 23 3 870 820 31 0 880 9 32 855 41 8 960 17 38 812 54 4 890 24 38 773 60 1 860 37 35 812 57 30 830 45 32 827 60 705

Table 2.7. Davis data.

Ripley (1981) showed contour plots of fitted surfaces from linear up to quintic, which demonstrate that there is no simple dominant trend, as there appears to be in the previous example. Fig. 2.13 shows MoM and robust variograms with fitted power law curves, computed separately for the SW-NE and SE-NW quadrants as in Fig. 2.10. In this case there does not seem to be an argument about the validity of an intrinsically stationary assumption (all four fitted values of λ are well below 2), but the strong anisotropy is disturbing. The same plots fitted to the residuals from a linear trend are better (Fig. 2.14) but still not satisfactory: it appears that there is much more persistence in the SE-NW direction than in the SW-NE. In this case an
exponential model has been fitted.

Warnes and Ripley (1987) made the claim, repeated by Ripley (1988), that this was an example of a multimodal likelihood. They fitted an isotropic exponential variogram with no nugget to the raw data (i.e. no trend) and produced an apparently irregular profile likelihood for the range parameter R. However, the same model was fitted to the same data by Mardia and Watkins (1989), who found no trace of multimodality. The present author's calculation, shown in Fig. 2.15(a), supports the conclusion of Mardia and Watkins. For this, the profile likelihood was evaluated for values of R in multiples of 0.001 from 5.5 to 6.5. It increased monotonically to a maximum at R = 6.12 and then decreased monotonically, exactly as claimed by Mardia and Watkins. However, they did show that multimodality can be a problem when the log likelihood is not everywhere twice differentiable, as happens for the spherical model, for example. In any case, as is clear from Fig. 2.13, fitting an exponential variogram with no trend is not a sensible analysis for this data set.

Handcock and Stein (1993) analyzed the same data set by a Bayesian analysis based on the Matérn covariance function. They used a linear trend in x and y, together with one additional covariate, the horizontal distance from the survey point to the closest stream. For the present analysis a linear trend in x and y has been used, though we know from Fig. 2.14 that this is not fully satisfactory either. Fig. 2.15(b) shows a profile likelihood plot in $\theta_2$, which shows that the maximum is attained at about $\theta_2 = 1.19$. In contrast, the posterior density shown in Fig. 32 of Handcock and Stein (1993) has a mode attained slightly below $\theta_2 = 1$ and falls off much more sharply on either side of the mode. For example, from their plot it appears that the posterior density at $\theta_2 = 1.5$ is only about 10% of its maximum value, whereas in Fig. 2.15(b) the value of L at $\theta_2 = 2$ is still
half its maximum value. This shows that the two methods are not in practice equivalent.

Further fits based on the Matérn covariance function with higher order trend produced log L 7784 in the quadratic case ($\theta_2 = 1.37$) and log L 8584 in the cubic case ($\theta_2 = 1.61$), compared with log L 7274 in the linear trend model. In each case the improvement is highly significant, judged by a $\chi^2$ test with respectively 3 and 4 degrees of freedom, which reinforces the unsatisfactory nature of a simple trend model for this example.

[Figure 2.13: four panels (SW-NE MoM, SW-NE Robust, and the two corresponding SE-NW panels); horizontal axis t.]
Fig. 2.13. Fitted variograms to Davis data. Raw data with power law models.

[Figure 2.14: four panels laid out as in Fig. 2.13; horizontal axis t.]
Fig. 2.14. Fitted variograms to detrended Davis data, with exponential variogram models.

[Figure 2.15: panel (a) 1000 log L; panel (b) log L against the shape parameter.]
Fig. 2.15. Profile log likelihoods for Davis data. (a) Exponential model fitted to raw data (no trend). (b) Matérn model fitted to data with linear trend.

Course on Environmental Statistics, July 14-15, 03

SPECTRAL METHODS FOR SPATIAL PROCESSES

Montserrat Fuentes
Statistics Department, NCSU
fuentesstatncsuedu
httpwwwstatncsuedufuentes

Spectral methods are a powerful tool for studying the spatial-temporal structure of random fields and generally offer significant computational benefits.

Fourier Analysis

Discrete Fourier Analysis. A discrete Fourier analysis of a spatial process, also called a harmonic analysis, is a decomposition of the process into a sum of sinusoidal components (sine and cosine waves). The coefficients of these sinusoidal components are the discrete Fourier transform of the process.

[Figure 1: vertical axis in Dobson units.]
Figure 1. Monthly average total ozone levels (TOMS), 65°S to 65°N.

Why Fourier basis?

Sinusoids have some characteristic properties which give them a distinguished role in representing a spatial process Z.

o A sinusoid of frequency $\omega$
cycles/time unit (or period $1/\omega$ in time domain units) may be written as $Z(t) = R\cos(\omega t + \phi)$, where R is the amplitude and $\phi$ is the phase. If the location is changed to $u = (t - a)/b$, which incorporates a change of both origin and scale, $Z(t)$ becomes
$Z'(u) = Z(a + bu) = R\cos(\omega b u + \phi + \omega a) = R'\cos(\omega' u + \phi')$,
where $R' = R$, $\omega' = \omega b$ and $\phi' = \phi + \omega a$. Thus the amplitude is unchanged, the frequency is multiplied by b (the reciprocal of the change in the space domain), and the phase is altered by an amount involving the change of space origin and the frequency of the sinusoid. Since the space origin associated with a set of data is often arbitrary, the simplicity of these relationships is useful.

o The sum of sinusoids with a common frequency is another sinusoid with the same frequency. In fact, since
$R\cos(\omega t + \phi) = R\cos(\omega t)\cos\phi - R\sin(\omega t)\sin\phi$,
any sinusoid with frequency $\omega$ is a linear combination of the two basis functions $\cos(\omega t)$ and $\sin(\omega t)$, and the converse is also true.

o A further useful feature of the sinusoids is their behavior under sampling: when we observe a process that is defined on a continuous space at an equally spaced set of values in a lattice, there is a loss of information. If the distance between neighboring observations in the lattice is $\Delta$, the sinusoids $R\cos(\omega s + \phi)$ and $R\cos(\omega' s + \phi)$ are indistinguishable if $\omega - \omega'$ is an integer multiple of $2\pi/\Delta$. This phenomenon is known as aliasing.

Aliasing is a relatively simple phenomenon: when one takes a discrete set of observations on a continuous function, information is lost.

Figure 2. Undersampling in space.

Ozone example

The dominant part of the annual cycle may be expected to be of the form
$S(t) = \mu + R\cos(2\pi\omega t + \phi) = \mu + A_1\cos(2\pi\omega t) + B_1\sin(2\pi\omega t)$,
where the frequency is $\omega = 1/12$ cycles per month. The seasonal behavior in the ozone data could not be described by a single sinusoid with the annual frequency of 1/12 cycle per month. If a semi-annual wave is added to the model, the resulting five-parameter model is
$S(t) = \mu + A_1\cos(2\pi\omega t) + B_1\sin(2\pi\omega t) + A_2\cos(4\pi\omega t) + B_2\sin(4\pi\omega t)$.

Frequencies that are integer multiples of $1/n$ are said to be harmonic with
respect to the span of the data, and are known as the Fourier frequencies. A sinusoid with the jth Fourier frequency, i.e. frequency $j/n$, executes j complete cycles in the span of the data, thus providing a useful interpretation of the index j.

Fourier series for periodic functions

We could model a time series of length N as
$X_t = \mu + \sum_{j=1}^{N/2} \{A_j\cos(2\pi\omega_j t) + B_j\sin(2\pi\omega_j t)\}$, for $\omega_j = j/N$,
where the $A_j$ and $B_j$ are uncorrelated random variables with zero mean and variance $\sigma_j^2$, such that $\mathrm{var}(X_t) = \sum_{j=1}^{N/2} \sigma_j^2$. Thus we decompose the process variance into N/2 components, each associated with the expected squared amplitude of sinusoids of a particular frequency.

For mathematical convenience we could use complex exponentials instead of sinusoids directly: $e^{ix} = \cos x + i\sin x$. Thus
$\cos x = (e^{ix} + e^{-ix})/2$ and $\sin x = (e^{ix} - e^{-ix})/(2i)$.

Fourier transforms

Periodic functions are represented by a discrete set of frequency components; non-periodic functions involve a continuous range of frequencies. Suppose that g is a real or complex valued function. We define
$f(\omega) = \int_{R^d} g(s)\exp(i\omega^T s)\,ds$. (1)
The function f in (1) is said to be the Fourier transform of g. Then g has the representation
$g(s) = (2\pi)^{-d}\int_{R^d} f(\omega)\exp(-i\omega^T s)\,d\omega$.
The functions g and f are said to be a Fourier transform pair.

It is often useful to think of functions and their transforms as occupying two domains. These domains are referred to as the time (or space) and frequency domains, respectively.

Spectral representation

Can any process Z(s) be represented using a Fourier basis? Yes, if it is stationary. Z is a stationary process when the mean is constant and
$\mathrm{cov}\{Z(x + y), Z(y)\} = C(x)$,
where C is the covariance function, which provides a measure of spatial correlation by describing how sample data are related with distance and direction.

Figure 3. Covariance models: exponential and Gaussian.

[Figure 4: two panels, "Generic Variogram" (variogram value against distance, with sill, nugget and range marked) and "Generic Covariance" (covariance value against distance).]
Figure 4. Variogram and covariance. The variogram is $\gamma(x) = C(0) - C(x)$, where C is the covariance function.
[Figure 5: simulated data plotted against Coord X and Coord Y.]
Figure 5. Simulated spatial process using an exponential covariance $C(x) = \sigma e^{-|x|/\rho} + c_0 I(x = 0)$, with range $\rho$ 23 (as printed), sill $\sigma = 1$, nugget $c_0 = 0$. Circles: 1st quantile; triangles: 2nd; plus signs: 3rd; crosses: 4th.

[Figure 6: semivariance against distance.]
Figure 6. Plot of the theoretical and empirical variogram function. The variogram is $\gamma(x) = C(0) - C(x)$, where C is the covariance function. Using an exponential covariance with range 3, sill 1, nugget 0.

[Figure 7: simulated data plotted against Coord X and Coord Y.]
Figure 7. Exponential with range 3, sill 1, nugget 0.

[Figure 8: semivariance against distance.]
Figure 8. Exponential with range 3, sill 1, nugget 0.

Matérn class of covariances

C(x) is a Matérn stationary covariance at a distance x:
$C(x) = \frac{\sigma}{2^{\nu-1}\Gamma(\nu + d/2)} \left(2\nu^{1/2}|x|/\rho\right)^{\nu} K_\nu\!\left(2\nu^{1/2}|x|/\rho\right)$,
where $K_\nu$ is a modified Bessel function and d is the dimensionality.

Parameters: smoothing parameter $\nu > 0$, range $\rho > 0$ and sill $\sigma$. For $\nu = 1/2$ we get $C(x) = \sigma e^{-|x|/\rho}$.

Critical parameter: $\nu$. The larger $\nu$ is, the smoother the process. Z will be n times mean square differentiable if and only if $\nu > n$.

[Figure 9: "Matern Covariances", covariance against distance; legend: exponential, sill 1, range 1, nu 1/2.]
Figure 9. Covariances: Matérn class for $\nu = 1/2$ (exponential covariance) and $\nu = 3/2$.

[Figure 10: simulated data plotted against Coord X and Coord Y.]
Figure 10. Matérn, $\nu = 3$, range 3, sill 1, nugget 0.

[Figure 11: semivariance against distance.]
Figure 11. Matérn, $\nu = 3$, range 3, sill 1, nugget 0.
[Figure 12: simulated data plotted against Coord X and Coord Y.]
Figure 12. Matérn, $\nu = 3$, range 3, sill 2, nugget 2.

[Figure 13: semivariance against distance.]
Figure 13. Matérn, $\nu = 3$, range 3, sill 2, nugget 2.

Consider a weakly stationary process Z with mean 0 and covariance C. Before we can apply the ideas of Fourier series and Fourier integrals we must first ask:

o Can we represent a typical realization as a Fourier series? The answer to this question is clearly no.

o The next question is: can we represent a typical realization as a Fourier integral? No.

Maybe we cannot distribute the power over a continuous range of frequencies, but over a set of frequencies with discontinuities. This will lead to a Fourier-type integral called Fourier-Stieltjes,
$Z(s) = \int e^{i s^T \omega}\,dY(\omega)$,
where Y measures the average contributions from all components with frequencies less than or equal to $\omega$.

The spectral representation theorem

To every stationary Z(s) there can be assigned a process $Y(\omega)$ with orthogonal increments, such that we have, for each fixed s, the spectral representation
$Z(s) = \int_{R^2} e^{i s^T \omega}\,dY(\omega)$. (2)
The Y process is called the spectral process associated with a stationary process Z. The process Y has orthogonal increments:
$E[\{Y(\omega_3) - Y(\omega_2)\}\overline{\{Y(\omega_1) - Y(\omega_0)\}}] = 0$
when $(\omega_3, \omega_2)$ and $(\omega_1, \omega_0)$ are disjoint intervals. If we define F by
$E|dY(\omega)|^2 = dF(\omega)$,
then F is a positive measure.

Bochner's theorem

We derive the spectral representation of C:
$C(s) = \int_{R^2} e^{i s^T \omega}\,dF(\omega)$.
Thus C is nonnegative definite if and only if it can be represented in the form above, where F is real, never decreasing and bounded.

If we compare the spectral representations of C(s) and Z(s),
$C(s) = \int_{R^2} e^{i s^T \omega}\,dF(\omega)$, $Z(s) = \int_{R^2} e^{i s^T \omega}\,dY(\omega)$,
it will be seen that the elementary harmonic oscillations are respectively $e^{i s^T \omega}\,dF(\omega)$ and $e^{i s^T \omega}\,dY(\omega)$.

We have
$\mathrm{var}\{Z(s)\} = E|Z(s)|^2 = C(0) = F(R^2)$.
Thus F determines the power spectrum of the Z process. We may think of this as a distribution of a spectral mass of total amount C(0) over the $\omega$ axis. F only differs by a multiplicative constant from an ordinary distribution function.

If F has a density with respect to
Lebesgue measure, this density is the spectral density $f = F'$, defined as the Fourier transform of the autocovariance function; then
$\mathrm{var}\{Z(s)\} = E|Z(s)|^2 = F(R^2) = \int_{R^2} f(\omega)\,d\omega$.

[Figure 14: "Matern Covariances", covariance against distance; legend: exponential, sill 1, range 1, nu 1/2.]
Figure 14. Covariances: Matérn class for $\nu = 1/2$ (exponential covariance) and $\nu = 3/2$.

[Figure 15: "Matern Spectral Densities", spectral density against frequencies.]
Figure 15. Spectral densities: Matérn class for $\nu = 1/2$ (exponential covariance) and $\nu = 3/2$.

Spectral densities

Matérn class. The Matérn spectral density is
$f(\omega) = \phi(\alpha^2 + |\omega|^2)^{-\nu - d/2}$, (3)
with parameters $\nu > 0$, $\alpha > 0$ and $\phi > 0$ (the value d is the dimension of the spatial process Z). Here the vector of covariance parameters is $\theta = (\phi, \nu, \alpha)$. The parameter $\alpha^{-1}$ can be interpreted as the autocorrelation range. The parameter $\nu$ measures the degree of smoothness of the process Z, and $\phi$ is a scale parameter.

For high frequencies, $f(\omega) \sim |\omega|^{-2\nu - d}$. (4)

Gaussian model. The spectral density of a spatial process with an isotropic Gaussian covariance
$C(x) = \sigma e^{-\alpha|x|^2}$
is
$f(\omega) = \frac{1}{2}\sigma(\pi\alpha)^{-1/2} e^{-\omega^2/(4\alpha)}$.
Note that C and f are both the same type of exponential function. The parameter $\sigma$ is the variance of the process, and $\alpha^{-1}$ is a parameter that explains how fast the correlation decays.

For this process the covariance is infinitely differentiable, so the corresponding Z process has mean square derivatives of all orders; then
$\sum_{j=0}^{m} Z^{(j)}(0)\,x^j/j! \to Z(x)$ in $L^2$
as m increases. This means that it is possible to predict Z perfectly for all x (any location of interest) based on observing Z(s) in a neighborhood of 0, where x = 0 is an arbitrary reference point in space. This type of behavior would usually be considered unrealistic for a physical process.

[Figure 16: "Gaussian vs Matern", covariance against distance.]
Figure 16. Plot of Gaussian covariance $e^{-x^2/2}$ (dashed line) and Matérn covariance $e^{-|x|}(1 + |x|)$ (solid line, $\nu = 3/2$). Both are of the form $1 - \frac{1}{2}x^2 + O(|x|^3)$ for small x.
Figure 17. Plot of Gaussian spectral density (dashed line) and Matérn spectral density (solid line, $\nu = 3/2$); spectral density against frequencies.

Empirical Covariance

Empirical covariances are likely to be a poor way to distinguish between possible models.

BEHAVIOR AT ORIGIN. For the Gaussian and the Matérn with $\nu = 3/2$,
$C(x) = 1 - \frac{1}{2}x^2 + O(|x|^3)$ for small x.
However, the Matérn with $\nu = 3/2$ has only two derivatives at the origin, whereas the Gaussian is analytic.

It is more accurate to focus on the high frequency behavior of the spectrum; the low frequency behavior of the spectrum has little effect on interpolation.

[Figure 18: map titled "Nitric Acid concentrations".]
Figure 18. This figure shows the output of Models-3 for the week starting July 11, 1995. We divide the domain into 9 subregions.

[Figure 19: nine panels (Regions 1 to 9), spectral density against frequency, each labelled with estimated sill, range and smoothness.]
Figure 19. This figure shows the estimated parameters for the Matérn spectral densities of the local stationary processes $Z_i$ for $i = 1, \ldots, 9$.

[Nine covariance panels by region, horizontal axis distance (km), with range and sill as printed: Region 1: range 566, sill 3; Region 2: range 314, sill 2; Region 3: range 222, sill 18; Region 4: range 481, sill 11; Region 5: range 152, sill 12; Region 6: range 466, sill 12; Region 7: range 496, sill 16; Region 8: range 973, sill 15; Region 9: range 254, sill 2.]
Figure 20. Matérn estimated covariances using a likelihood approach.

ESTIMATING THE SPECTRUM

o The periodogram $I_n$ estimates the spectral density f of a process Z observed on an $n \times n$ grid:
$I_n(\omega) = (2\pi n)^{-2} \left|\sum_{j \in J} Z(j)\exp(-i\omega^T j)\right|^2$.
We consider only the periodogram at the Fourier frequencies $2\pi j/n$ for $j \in J$.

o Periodogram values are approximately independent.

o The periodogram is asymptotically unbiased.

The periodogram is simply the discrete Fourier transform of the sample covariance.

Spectral weighted nonlinear least squares technique

Consider modeling the spatial structure of Z by fitting a spectral density f to the periodogram values. We could use a weighted non-linear least squares (WNLS) procedure that gives more weight to higher frequency values, because high frequencies are important for interpolation.

We propose using as weights $f(\omega)^{-1}$, to give higher weight to higher frequencies. This is reasonable since, for large N, the approximate standard deviation of the periodogram $I_N(\omega)$ is $f(\omega)$.

This is similar to the weighted least squares method used in the space domain to fit a variogram model (Cressie 1985), although periodogram values are approximately independent while variogram values are not.

Likelihood function

In general, environmental datasets are very large, and calculating the determinants that appear in the likelihood function is often infeasible. Spectral methods could be used to approximate the likelihood and obtain the maximum likelihood estimate of the covariance parameters. Whittle (1954) proposed the following approximation to the Gaussian negative log likelihood:
$\sum_{j \in J_N} \left\{\log f(2\pi j/n) + I_N(2\pi j/n)\,[f(2\pi j/n)]^{-1}\right\}$, (5)
where the sum is taken over the Fourier frequencies $2\pi j/n$. We recommend leaving the small frequencies out of the sum, at least the frequency 0, to avoid the problem of estimating the unknown mean of Z.

NONSTATIONARY MODELS

The most extensively studied method for nonstationarity is the
deformation approach due to Sampson and Guttorp (1992). Haas (1995) proposed an approach to nonstationary spatial kriging based on moving windows. Higdon, Swall and Kern (1999) use a moving average specification of a Gaussian process. Nychka and Saltzman (1998) present an extension of the empirical orthogonal functions (EOF) approach that is popular among atmospheric scientists. Wikle et al. (2001) use a wavelet-based representation for the covariance. Fuentes (2001, 2002) and Fuentes and Smith (2001) introduced a nonstationary covariance model with parameters varying with location.

New Model for nonstationarity

SPATIAL SPECTRA (Fuentes 2002). We present an approach for the spectral analysis of nonstationary spatial processes Z(x), which is based on the concept of spatial spectra; this means spectral functions which are space dependent, $f_x(\omega)$.

The spectral representation of Z(x) is always interpreted as its representation in the form of a superposition of sine and cosine waves of different frequencies $\omega$:
$Z(x) = \int_{R^2} \exp(i x^T \omega)\,\phi_x(\omega)\,dY(\omega)$.

The functions $\phi_x(\omega)$ are slowly varying functions of x for which $\int_{R^2} |\phi_x(\omega)|^2\,d\omega$ is finite.

The covariance function C of Z(x) is
$\mathrm{cov}\{Z(x_1), Z(x_2)\} = C(x_1, x_2) = \int_{R^2} \exp\{i(x_1 - x_2)^T \omega\}\,\phi_{x_1}(\omega)\,\overline{\phi_{x_2}(\omega)}\,d\omega$.
In particular,
$\mathrm{var}\{Z(x)\} = C(x, x) = \int_{R^2} |\phi_x(\omega)|^2\,d\omega$.
Then $f_x(\omega) = |\phi_x(\omega)|^2$ is the spatial spectral density of Z.

Nonparametric spatial spectral estimate

We propose a nonparametric estimate of the spectral density. We first define $J_x(\omega_0)$:
$J_x(\omega_0) = \Delta \sum_{u_1 = x_1 - n_1}^{x_1} \sum_{u_2 = x_2 - n_2}^{x_2} g(\Delta u)\,Z(\Delta(x - u))\,\exp\{-i\Delta(x - u)^T \omega_0\}$,
where $x = (x_1, x_2)$, $u = (u_1, u_2)$ and g(u) is a filter.

We refer to $|J_x(\omega)|^2$ as the spatial periodogram at a location x for a frequency $\omega$. $|J_x(\omega)|^2$ is an approximately unbiased estimate of $f_x(\omega)$ (Fuentes 2002), but as its variance may be shown to be independent of N, it will not be a very useful estimate in practice. We therefore estimate $f_x(\omega)$ by smoothing the values of $|J_x(\omega)|^2$ over neighboring values of x.

More precisely, let $W_\rho$ be a weight function or window, depending on the parameter $\rho$. Then we estimate $f_x(\omega_0)$ by
$\hat f_x(\omega_0) = \sum_{u_1 = x_1 - n_1}^{x_1} \sum_{u_2 = x_2 - n_2}^{x_2} W_\rho(u)\,|J_{x-u}(\omega_0)|^2$.
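The smoothing estimate above can be illustrated numerically. What follows is a minimal NumPy sketch under stated assumptions, not the implementation of Fuentes (2002): it assumes unit lattice spacing ($\Delta = 1$), a Gaussian filter g with bandwidth h normalised so that the squared filter weights sum to 1, and a uniform window $W_\rho$ over a $(2\rho + 1) \times (2\rho + 1)$ block of neighbours. The function names local_dft and spatial_spectrum, and the parameters h and rho, are hypothetical.

```python
import numpy as np

def local_dft(Z, x, omega, h):
    """J_x(omega): filtered Fourier sum of the lattice field Z centred at x.

    Assumptions (not from the source): unit spacing, Gaussian filter g with
    bandwidth h, normalised so that sum(g**2) == 1.
    """
    n1, n2 = Z.shape
    # s runs over all lattice sites; u = x - s is the offset from x.
    s1, s2 = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    u1, u2 = x[0] - s1, x[1] - s2
    g = np.exp(-(u1**2 + u2**2) / (2.0 * h**2))
    g /= np.sqrt(np.sum(g**2))
    phase = np.exp(-1j * (s1 * omega[0] + s2 * omega[1]))
    return np.sum(g * Z * phase)

def spatial_spectrum(Z, x, omega, h=3.0, rho=2):
    """Average the spatial periodogram |J_y(omega)|^2 over neighbours y of x.

    Uses a uniform window over a (2*rho+1) x (2*rho+1) block, dropping
    neighbours that fall outside the lattice.
    """
    n1, n2 = Z.shape
    total, count = 0.0, 0
    for v1 in range(-rho, rho + 1):
        for v2 in range(-rho, rho + 1):
            y = (x[0] - v1, x[1] - v2)
            if y[0] in range(n1) and y[1] in range(n2):
                total += abs(local_dft(Z, y, omega, h)) ** 2
                count += 1
    return total / count

# Usage: a field that oscillates at frequency 0.8 along the first axis.
Z = np.cos(0.8 * np.arange(32))[:, None] * np.ones((1, 32))
peak = spatial_spectrum(Z, (16, 16), (0.8, 0.0))   # at the true frequency
off = spatial_spectrum(Z, (16, 16), (2.0, 0.0))    # far from it
```

In this sketch the bandwidth h controls how local the periodogram is (a smaller h gives better spatial resolution and worse frequency resolution), while rho controls how much spatial smoothing is applied to reduce the variance of the raw spatial periodogram.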
Thus $\hat f_x(\omega_0)$ can be interpreted as an average of the total energy of the process contained within a band of frequencies in the region of $\omega_0$ and a region in space in the neighborhood of x.

Parametric estimate

We assume a parametric model for the spectral density, a Matérn with parameter $\theta$ changing with location:
$f_x(\omega) = \phi_x(\alpha_x^2 + |\omega|^2)^{-\nu_x - d/2}$.
The parameters $\alpha_x$, $\nu_x$ and $\phi_x$ vary with location.

New Spatial-temporal models

A spatial-temporal field Z(s, t), where s represents space and t time, is separable if
$\mathrm{Cov}\{Z(s, t), Z(s', t')\} = C_1(s, s')\,C_2(t, t')$
for some spatial covariance $C_1$ and temporal covariance $C_2$.

A class of nonseparable spatial-temporal models was proposed by Cressie and Huang (JASA 99), Gneiting (JASA 02) and Stein (2003).

Here we define a new class of generalized spectral representations for nonstationary and nonseparable spatial-temporal processes. For this new class of spatial-temporal models, the spectral representation itself and the corresponding spectral distribution function (or spectral density) can change slowly in space and time.

[Two figures: covariance plots (axis label: covariance).]

Let Z be a nonseparable process. We propose the following representation:
$Z(x, t) = \int_{R^3} \exp\{i(x^T \omega + t\tau)\}\,\phi_{x,t}(\omega, \tau)\,dY(\omega, \tau)$. (6)
It is easy to see that the covariance function C of the nonseparable process Z(x, t) is then given by the formula
$\mathrm{cov}\{Z(x_1, t_1), Z(x_2, t_2)\} = C(x_1, t_1; x_2, t_2) = \int_{R^3} \exp\{i(x_1 - x_2)^T \omega\}\exp\{i(t_1 - t_2)\tau\}\,\phi_{x_1,t_1}(\omega, \tau)\,\overline{\phi_{x_2,t_2}(\omega, \tau)}\,d\omega\,d\tau$. (7)
In particular,
$\mathrm{var}\{Z(x, t)\} = C(x, t; x, t) = \int_{R^3} |\phi_{x,t}(\omega, \tau)|^2\,d\omega\,d\tau$. (8)

Locally, in a neighborhood of $s_i = (x_i, t_i)$, we propose a parametric model for the spectral density:
[Equation (9): a Matérn-type spectral density in $(\omega, \tau)$; the printed expression is not recoverable.]
All parameters are positive, and d is the dimension (d = 3 for space and time). One parameter explains the rate of decay of the spatial dependency, another explains the rate of decay for the temporal component, and the remaining one is a scale
parameter.

Conclusions

o A stationary spatial process can always be represented in terms of sines and cosines (Fourier basis). This is called a spectral representation.

o Spectral methods offer enormous computational benefits (Whittle likelihood, FFT).

o A nonstationary process can also be represented using a Fourier basis with varying amplitude (Fuentes 02). Wavelets are an alternative approach, used by Wikle and Nychka among others.

o Nonseparable valid spatial-temporal covariance models can be easily obtained using spectral methods (Cressie and Huang 99, Stein 03).

o Nonstationary and nonseparable valid spatial-temporal covariance models can also be obtained using spectral methods (Fuentes 03).

Books about spectral methods for spatial data

o Christakos, G. (1992)
o Stein (1999)
o Yaglom (1987)

References

Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable spatio-temporal stationary covariance functions. Journal of the American Statistical Association 94, 1330-1340.

Christakos, G. (1992). Random Field Models in Earth Sciences. San Diego: Academic Press.

Fuentes, M. (2001). A new high frequency kriging approach for nonstationary environmental processes. Environmetrics 12, 469-483.

Fuentes, M. (2002). Spectral methods for nonstationary spatial processes. Biometrika 89, 197-210.

Fuentes, M. (2003). Testing for separability of spatial temporal covariance functions. Tech. report 2545 at NCSU Statistics Department, under review by JASA.

Fuentes, M. (2002b). Modeling and testing for non-stationarity of spatial processes. Tech. report 2533 at NCSU Statistics Department, under review by JMA.

Fuentes, M. and Smith, R. (2001). A new class of nonstationary models. Tech. report at North Carolina State University, Institute of Statistics Mimeo Series 2534.

Gneiting, T. (2002). Nonseparable stationary covariance functions for space-time data. JASA 97, 590-600.

Haas, T. C. (1995). Local prediction of a spatio-temporal process with an application to wet sulfate deposition. J. Amer. Statist. Assoc. 90, 1189-1199.

Higdon, D., Swall, J. and Kern, J. (1999). Non-stationary spatial modeling. In
Bayesian Statistics 6, eds. J. M. Bernardo et al., Oxford University Press, pp. 761-768.

Matérn, B. (1986). Spatial Variation. Lecture Notes in Statistics, Number 36. Springer-Verlag, New York. (Second edition; originally published in 1960.)

Nychka, D., Wikle, C. and Royle, A. (2002). Multiresolution models for nonstationary spatial covariance functions. Statistical Modeling 2, 299-314.

Sampson, P. D. and Guttorp, P. (1992). Nonparametric estimation of nonstationary spatial covariance structure. J. Amer. Statist. Assoc. 87, 108-119.

Stein, M. L. (1999). Interpolation of Spatial Data. Springer, New York.

Stein, M. L. (2003). Space-time covariance functions. Tech. Report, University of Chicago.

Wikle, C. K., Milliff, R. F., Nychka, D. and Berliner, M. (2001). Spatiotemporal hierarchical Bayesian modeling: tropical ocean surface winds. Journal of the American Statistical Association 95, 1076-1987.

Yaglom, A. M. (1987). Correlation Theory of Stationary and Related Random Functions I. Springer-Verlag, New York.

ST 790 M, Fall 2004

SPATIAL STATISTICS AND DATA ASSIMILATION

Fuentes, Foley
Statistics Department, NCSU
fuentes st atncsuedu
httpWWWstatncsuedufuentes

Data Assimilation

Data assimilation is the process of combining observations with a numerical (deterministic) model that is used to simulate the evolution of the state process of interest.

More generally, we consider the problem of data assimilation as: given the data, we wish to describe the current state of, e.g., the atmosphere, the ocean, or air quality.

Data | State of Nature → State of Nature | Data

Data Assimilation

We can represent the two sources of information that are merged in data assimilation in the form of an observation equation and a state (or system) equation, both functions of the state vector $X_i^t$ of the underlying unobservable system.

Notation:

$X_i^t$ ($n \times 1$): true unknown state vector at time $t_i$.
$Y_i^o$ ($p \times 1$): new vector of observations.
$X_i^a$ ($n \times 1$): analysis state vector, the best estimate of the unknown state at time $t_i$ based on data assimilation methods.
$X_i^f$ ($n \times 1$): output from the numerical model.

Data
Assimilation

OBSERVATION EQUATION
$Y_i^o = H_i(X_i^t) + \epsilon_i$ (1)

STATE EQUATION
$X_i^t = M_i(X_{i-1}^t) + \eta_i$ (2)

where $\epsilon_i \sim (0, R_i)$, $\eta_i \sim (0, Q_i)$ and $\mathrm{cov}(\epsilon_i, \eta_i) = 0$.

o $H_i$ is an observation operator, which interpolates the model variables to the observation locations and transforms the model variables to the observed variables.

o $M_i$ represents the forward integration of the current state by the numerical model.

DA Example

o The State Vector: air pollution (e.g. ozone) concentration values at n grid locations,
$X^t = (O^t(s_1), \ldots, O^t(s_n))^T$.

o The Data: air pollution concentrations measured at m monitoring stations,
$Y^o = (O^o(x_1), \ldots, O^o(x_m))^T$.

o The Model: air quality numerical models can be used to estimate the air pollution concentrations at the n grid locations,
$X^f = (O^f(s_1), \ldots, O^f(s_n))^T$.

Standard Kalman Filter

OBSERVATION EQUATION
$Y_i^o = H_i X_i^t + \epsilon_i$ (3)

STATE EQUATION
$X_i^t = M_i X_{i-1}^t + \eta_i$ (4)

where $\epsilon_i \sim N(0, R_i)$, $\eta_i \sim N(0, Q_i)$ and $\mathrm{cov}(\epsilon_i, \eta_i) = 0$.

o Standard Kalman filtering assumes that observation and forecasting errors are Normal.

o KF also assumes that the observation and state equations are linear systems of the state variables, i.e. $H_i$ is a $p \times n$ matrix and $M_i$ is an $n \times n$ matrix.

Kalman Filter: Best Linear Unbiased Estimator

Forecasting Step

o Begin with estimates for the state vector and forecast error covariance: $X_{i-1}^a$, $P_{i-1}^a$.

o Use the numerical model, represented in the state equation $X_i^t = M_i X_{i-1}^t + \eta_i$, $\eta_i \sim N(0, Q_i)$, to evolve these estimates from time $t_{i-1}$ to $t_i$:
$X_i^f = M_i X_{i-1}^a$ (5)
$P_i^f = M_i P_{i-1}^a M_i^T + Q_i$ (6)

Kalman Filter: Best Linear Unbiased Estimator

Analysis Step

o Combine the forecast (background) and the observations to produce the best estimate for the state vector at time $t_i$ (the analysis vector):
$Y_i^o = H_i X_i^t + \epsilon_i$ (7)
$X_i^f = X_i^t + e_i^f$ (8)
where $\epsilon_i \sim N(0, R_i)$, $e_i^f \sim N(0, P_i^f)$ and $\mathrm{cov}(\epsilon_i, e_i^f) = 0$.

Kalman Filter: Best Linear Unbiased Estimator

o We want a linear estimator:
$X_i^a = W_{1i} X_i^f + W_{2i} Y_i^o$.

o We want an unbiased estimator, which implies
$W_{1i} = I - W_{2i} H_i$ (9)
$X_i^a = X_i^f + W_i(Y_i^o - H_i X_i^f)$ (10)

o We want the best estimator: find the $W_i$ that minimizes the mean square estimation (analysis) error $E(e_i^a e_i^{aT})$, where $e_i^a = X_i^a - X_i^t$.

Kalman Filter Update

o The optimal weight matrix (the Kalman gain matrix) is
$K_i = P_i^f H_i^T (R_i + H_i P_i^f H_i^T)^{-1}$ (11)

o The BLUE for $X_i^t$ is
$X_i^a = X_i^f + K_i(Y_i^o - H_i X_i^f)$ (12)

o Notice that the Kalman
gain matrix will be larger when the forecast error covariance $P_i^f$ is large compared to the observation error covariance $R_i$. Also, a larger $K_i$ matrix in (12) will mean a larger correction to the initial forecast $X_i^f$.

Kalman Filter Update

o Since $X_i^a$ is an unbiased estimator, the analysis error covariance can be found by
$P_i^a = E(e_i^a e_i^{aT}) = (I - K_i H_i) P_i^f$ (13)

o $P_i^a$ is equal to the forecast (background) error covariance reduced by the identity matrix minus the optimal weight matrix.

o The Kalman filter is very similar to other data assimilation methods such as Optimal Interpolation (OI) and 3D-Var methods, except that in these methods the forecast (or background) error covariance is estimated once and then remains constant ($P^f = B$), rather than being advanced by the numerical model.

Standard Kalman Filter

o For normally distributed forecast and observation errors, the least squares estimator $X_i^a$ is the Uniformly Minimum Variance Unbiased Estimator for the true state $X_i^t$. For non-normal errors, $X_i^a$ is still the LS solution, but it may no longer be optimal.

o For normally distributed forecast and observation errors, the estimators $X_i^a$, $P_i^a$ are equivalent to the maximum likelihood estimators.

Adapting the Kalman Filter

o In atmospheric science applications of DA, the state vector of interest is of very high dimension, and the dynamic systems being modeled are unstable and possibly chaotic. This implies:
  - The numerical model is nonlinear.
  - The KF update step for the error covariance is extremely computationally expensive.
  - The normality assumption for the error terms may not be realistic.

Adapting the Kalman Filter

o Also, simplifying assumptions mean that not all errors are accounted for:
  - Estimation error from estimating $R_i$, $Q_i$ is ignored.
  - Often it is assumed that $\eta_i \equiv 0$, which implies the system dynamics are known perfectly: $X_i^t = M_i(X_{i-1}^t)$.

Extended Kalman Filter

The Extended Kalman Filter uses a linear tangent model $L_i$ to approximate the non-linear system, and then the standard KF is used.

Forecast Step
$X_i^f = M_i(X_{i-1}^a)$ (14)
$P_i^f$
$= L_i P_{i-1}^a L_i^T + Q_i$ (15)

Analysis Step
$X_i^a = X_i^f + K_i(Y_i^o - H_i X_i^f)$
$P_i^a = (I - K_i H_i) P_i^f$

Extended Kalman Filter

o EKF has been widely used by operational weather forecasting centers.

o However, for strongly nonlinear systems where observations are not very frequent, the linearization can be very inaccurate.

o Also, this method remains very computationally expensive, so simplifying assumptions and approximations are used to replace the updating step (15), $P_i^f = L_i P_{i-1}^a L_i^T + Q_i$.

Ensemble Kalman Filter (EnsKF)

o The Ensemble Kalman Filter uses a Monte Carlo approach to generate samples of the state vector and carry out an ensemble of data assimilation cycles. This ensemble is then used to estimate the forecast error covariance.

Forecast Step

  - Sample $X_j^a(t_{i-1}) \sim P(X_{i-1}^a, P_{i-1}^a)$, $j = 1, \ldots, K$.
  - Compute $X_j^f(t_i) = M_i(X_j^a(t_{i-1}))$, $j = 1, \ldots, K$.
  - Calculate the sample mean and covariance:
    $\bar X^f = \frac{1}{K}\sum_{j=1}^{K} X_j^f$,
    $\hat P^f = \frac{1}{K-1}\sum_{j=1}^{K} (X_j^f - \bar X^f)(X_j^f - \bar X^f)^T$.

Ensemble Kalman Filter (EnsKF)

o Analysis Step

  - Perturb the observations at time $t_i$ by adding random noise (typically Gaussian):
    $Y_j^o(t_i) = Y_i^o + \eta_j$, $\eta_j \sim N(0, R_{p \times p})$, $j = 1, \ldots, K$.
  - Update each member of the forecast sample with the KF:
    $X_j^a(t_i) = X_j^f + \hat K (Y_j^o - H X_j^f)$, $j = 1, \ldots, K$,
    $\hat K = \hat P^f H^T (R + H \hat P^f H^T)^{-1}$.
  - This now serves as a sample for time $t_i$, and the process repeats. The ensemble sample mean and covariance can be used as estimates of the updated state vector and error covariance.

Ensemble Kalman Filter (EnsKF)

o Typically only 10 to 100 ensemble members are used to approximate the first two moments of the update distribution.
  - The EnsKF requires considerably less computational cost than the EKF, and it avoids problems associated with the linearization of the forecast model.
  - However, it is still expensive to run multiple DA cycles, and these ensembles may not be representative of the true state-space probability distribution. Also, empirical estimates tend to underestimate the forecast error covariance.

o EnsKF can be used with non-Gaussian error distributions. In this case, estimates for only the first two moments of the update distribution may not be sufficient.

Bayesian Framework

Data | State of
Nature $\Rightarrow$ State of Nature | Data
o Prior distribution: $P(X_i^t \mid Y_{i-1}^o)$
o Likelihood: $L(X_i^t \mid Y_i^o) = P(Y_i^o \mid X_i^t, Y_{i-1}^o)$
o Posterior distribution (Bayes' Theorem):
    $P(X_i^t \mid Y_i^o, Y_{i-1}^o) = \dfrac{P(Y_i^o \mid X_i^t, Y_{i-1}^o)\, P(X_i^t \mid Y_{i-1}^o)}{\int P(Y_i^o \mid X_i^t, Y_{i-1}^o)\, P(X_i^t \mid Y_{i-1}^o)\, dX_i^t}$

Bayesian Framework: Normal Example
Under the assumptions of linearity and normality of the standard KF:
    PRIOR: $P(X_i^t \mid Y_{i-1}^o) \sim N(X_i^f, P_i^f)$
    LIKELIHOOD: $P(Y_i^o \mid X_i^t, Y_{i-1}^o) \sim N(H_i X_i^t, R_i)$
    POSTERIOR: $P(X_i^t \mid Y_i^o, Y_{i-1}^o) \sim N(X_i^a, P_i^a)$
The posterior mean is equal to the KF update, $X_i^a = X_i^f + K_i (Y_i^o - H_i X_i^f)$. The posterior covariance is equal to the KF analysis error covariance, $P_i^a = (I - K_i H_i) P_i^f$.

Bayesian Framework
Typically the model error term in the state equation is ignored. Also, uncertainty in the estimation of the observation and model covariances, $R_i$ and $Q_i$, is not accounted for. As a result, the analysis error covariance $P_i^a$ is often underestimated. A Bayesian framework allows us to relax the assumptions of linearity and normality used in the Kalman filter and to keep track of the many sources of error in DA. We can account for uncertainty in parameterizing $R_i$ and $Q_i$, as well as observation and forecasting bias.

Bayesian Framework
    $P(X_i^t \mid Y_i^o, Y_{i-1}^o) \propto P(X_i^t \mid Y_{i-1}^o) \times P(Y_i^o \mid X_i^t, Y_{i-1}^o)$    (18)
o The posterior distribution is typically not a known distribution. Usually we cannot find a closed-form expression for the mean and covariance, or information on the skewness of the distribution.
o Instead, we sample from this distribution and use sample estimates to describe the distribution of the updated state vector.

Particle Filter
o For high-dimensional problems it can be very difficult to sample from the posterior distribution $P(X_i^t \mid Y_i^o, Y_{i-1}^o)$. One possibility is a particle filter method.
o Particle filters are sequential sampling methods based on point-mass (particle) representations of the probability density.
o We can use MCMC sampling methods or a sequential importance sampling (SIS) approach to approximate the posterior density.

More Detail about Particle Filters
We present here a methodology to relax the assumption of linearity and normality and furthermore take into account
the error in the numerical model, by using Bayesian statistical methods to approximate the posterior distribution. To simplify the notation, let us denote by $x_t$ the true state at time $t$ and by $y_{1:t}$ all the data up to time $t$. By using Bayes' theorem we can update the prior and obtain the posterior distribution
    $P(x_t \mid y_{1:t}) \propto P(x_t \mid y_{1:t-1})\, P(y_t \mid x_t)$,
which is written in terms of the prior and the likelihood of the observations.

We could use sequential importance sampling (SIS) to approximate the posterior. Importance sampling is an attractive method because it focuses not on the plausible realizations of the posterior but on the important ones.

The posterior density at time $t$ is approximated by
    $P(x_t \mid y_{1:t}) \approx \sum_i w_t^i\, \delta(x_t - x_t^i)$,
where $\delta$ is the Dirac delta measure and $x_t^i$ are a set of support points. The weights are normalized to add to 1 and are chosen using the principle of importance sampling (e.g. Doucet et al. 2001), described next. Suppose $p(x) \propto \pi(x)$ is a probability density from which it is difficult to draw samples, but for which $\pi$ can be evaluated. Let $x^i \sim q(x)$, for $i = 1, \dots, N$, be samples that are easily generated from $q$, called a proposal or importance density. A weighted approximation to the density is given by
    $p(x) \approx \sum_i w^i\, \delta(x - x^i)$,
where $w^i \propto \pi(x^i) / q(x^i)$ is the normalized weight of the $i$-th particle. The SIS algorithm consists of recursive propagation of the weights and support points as each measurement is received sequentially.

Algorithm for SIS
o Draw $x_k^i \sim q(x_k \mid x_{k-1}^i, y_k)$.
o Assign the weight $w_k^i \propto w_{k-1}^i\, \dfrac{p(y_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, y_k)}$.
o The posterior filtered density $p(x_k \mid y_{1:k})$ can be approximated by $\sum_{i=1}^{N} w_k^i\, \delta(x_k - x_k^i)$.

A common problem with SIS is the degeneracy phenomenon: after a few iterations, only one particle will have non-negligible weight. This is because the variance of the importance weights increases over time. There are modified versions of SIS that overcome this problem: the sampling importance resampling (SIR) filters (Gordon et al. 1993), the auxiliary sampling importance resampling (ASIR) filters (Pitt and Shephard 1999), and the regularised particle
filters (RPF) (Musso et al. 2001).

A common approach is SIR. The basic idea is to eliminate particles which have small weights and to concentrate on particles with large weights. The resampling step involves generating a new set of particles by resampling (with replacement) $N$ times from the previous support points $x^i$, taking into account their weights. The resulting sample is an i.i.d. sample from the discretized posterior, and the new weights are reset to $1/N$.

Forecasting step
For forecasting purposes, predictive distributions for $x_{t+k}$ are available in standard forms. The updated density $P(x_t \mid y_{1:t})$ becomes the input for the forecasting step. The DA forecasting density $P(x_{t+1} \mid y_{1:t})$ is written in terms of the updated density (the posterior of $x_t$):
    $P(x_{t+1} \mid y_{1:t}) \propto \int P(x_{t+1} \mid x_t)\, P(x_t \mid y_{1:t})\, dx_t$    (19)
In the standard Kalman filter and other basic DA approaches, this forecasting step is purely deterministic, since the error term $V_t$ is usually ignored. However, in expression (19) the statistical modeling of the error process $V_t$ plays a very important role. Thus it is important to use a flexible class of space-time models.

Particle Filter
o Particle filters are more computationally expensive than the other filtering methods.
o However, the result is an approximation to the complete posterior distribution rather than just the first two moments.

Model Diagnostics / Verification
So why is it so important to keep track of and quantify all sources of uncertainty? Good estimates of the analysis error covariance (the covariance of the posterior distribution) can be used to produce calibration maps. Rather than directly comparing how close a predicted value is to an observed value, we can compute a 90% prediction interval for each observed value. This allows us to decide whether large differences between predicted and observed values are due to serious modeling errors or to large variability in the dynamic system of interest.

Summary
For normal errors, KF provides the least squares solution and is the best unbiased estimator of the unknown state. The KF
solution is dependent on several unrealistic assumptions about the dynamics of the numerical model. Many adaptations of the standard KF exist, each with its own set of advantages and disadvantages. Further research will show which methods are most effective for different applications in data assimilation. A proper statistical framework can provide a set of methods for keeping track of different sources of error and bias, in an effort to evaluate and improve forecasting model performance.

References
Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. New York: John Wiley.
Arulampalam, S., Maskell, S., Gordon, N. and Clapp, T. (2002). A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Processing 50(2), 174-188.
Bengtsson, T., Snyder, C. and Nychka, D. (2003). A nonlinear filter that extends to high dimensional systems. Unpublished.
Dee, D. P. and Da Silva, A. M. (1998). Data assimilation in the presence of forecast bias. Quarterly Journal of the Royal Meteorological Society 124, 269-295.
Gordon, N., Salmond, D. and Smith, A. F. M. (1993). Novel approach to non-linear and non-Gaussian Bayesian state estimation. IEE Proceedings-F 140, 107-113.
Gordon, A. E., Giulivi, C. F., Takahashi, T., Sutherland, S., Morrison, J. and Olson, D. (2003).
Holland, G. (1980). An analytic model of the wind and pressure profiles in hurricanes. Monthly Weather Review 108, 1212-1218.
Houtekamer, P. L. and Mitchell, H. L. (1998). Data assimilation using an ensemble Kalman filter technique. Monthly Weather Review 126, 796-811.
Ide, K., Courtier, P., Ghil, M. and Lorenc, A. C. (1997). Unified notation for data assimilation: operational, sequential and variational. Journal of the Meteorological Society of Japan 75(1B), 181-189.
Kalnay, E. (2003). Atmospheric Modeling, Data Assimilation and Predictability. Cambridge: Cambridge University Press.
Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing.
New York: Springer.
Meinhold, R. J. and Singpurwalla, N. D. (1983). Understanding the Kalman filter. The American Statistician 37(2), 123-127.
Musso, C., Oudjane, N. and LeGland, F. (2001). Improving regularised particle filters. In Sequential Monte Carlo Methods in Practice, eds. A. Doucet, J. F. G. de Freitas and N. J. Gordon. New York: Springer-Verlag.
Peng, M., Xie, L. and Pietrafesa, J. (2004). A numerical study of storm surge and inundation in the Croatan-Albemarle-Pamlico Estuary System. Estuarine, Coastal and Shelf Science 59, 121-137.
Pitt, M. and Shephard, N. (1999). Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association 94(446), 590-599.

for fixed $h$. However, according to Cressie (1993, page 77), it appears that $\bar{\gamma}$ is a more efficient estimator than $\tilde{\gamma}$.

Examples
In chapter 1 we have already seen variograms computed for four meteorological variables and four regions of the USA, by both the MoM and robust methods. In each of Figs 1.11-1.14, the MoM estimate is on the left-hand side and the robust estimate $\bar{\gamma}$ is on the right-hand side. The two estimates seem to be rather similar, except in the case of annual maximum precipitations (Fig. 1.13), for which the MoM estimate is generally larger than the robust estimate. This is to be expected, because in this case we saw that the distribution is indeed affected by outliers, so one would expect the two estimates to behave in different ways.

Another way to compute these plots is a variogram cloud. This method of computing the variogram is available when there are multiple replications of the spatial field. This assumption is satisfied for the data in chapter 1: we had many years of data, which (at least for the present analysis) we are treating as independent from year to year. In the variogram cloud, one point is plotted for each pair of stations: the distance between stations $s_i$ and $s_j$, $d_{ij}$ say, is plotted along the $x$ axis, and an estimate of $\mathrm{Var}\{Z(s_i) - Z(s_j)\}$ is plotted along the $y$ axis. For the latter we may use either
the MoM or the robust method. Recall that the $Z$ values we are using for this comparison are not the raw data but are standardized residuals from a linear regression in time, so the sample means and standard deviations at each station have already been adjusted to be 0 and 1 respectively.

Figs 2.4 and 2.5 show the variogram clouds for the data of chapter 1, computed for the winter mean daily minimum temperatures (analogous to Fig. 1.11) and the annual maximum daily precipitations (Fig. 1.13). As can be seen, the scatter in the plots is very great, calling into question whether these are homogeneous spatial processes. Our present focus, however, is on the comparison of the MoM and robust estimates, and one way to do this directly is by plotting one against the other. Fig. 2.6 shows a plot of robust against MoM estimates for each of the four subdivisions of the USA, corresponding to the variogram clouds in Fig. 2.4. Fig. 2.7 shows the same things computed for the variogram clouds in Fig. 2.5. The 45° line through the origin is shown to provide a comparison between the two estimates.

[Fig. 2.4: Variogram cloud plots for the data based on winter mean daily minimum temperatures. Panels: NW, NE, SW and SE stations, MoM and robust; horizontal axis: distance.]

[Fig. 2.5: Variogram cloud plots for the data based on annual maximum daily precipitation. Panels: NW, NE, SW and SE stations, MoM and robust; horizontal axis: distance.]

[Fig. 2.6: Plot of robust vs MoM estimators for the variogram cloud in Fig. 2.4. Panels: (a) NW, (b) NE, (c) SW, (d) SE; horizontal axis: MoM estimate; vertical axis: robust estimate.]

[Fig. 2.7: Plot of robust vs MoM estimators for the variogram cloud in Fig. 2.5. Panels: (a) NW, (b) NE, (c) SW, (d) SE; horizontal axis: MoM estimate; vertical axis: robust estimate.]

From Fig. 2.6 it can be seen that the two estimates are in good agreement. The scatterplot is tightly clustered around the 45° line, and the correlation between the MoM and robust estimates is good.

Fig. 2.7 tells a completely different story. In this case the bulk of the estimates lie below the 45° line, indicating that the robust estimate is smaller than the MoM estimate. We also observe that there is much greater variability in the scatterplot; in this case the two estimates seem almost uncorrelated.

Our conclusion is that for a process for which the marginal values are close to normally distributed, as appears to be the case for the data in Fig. 2.4, it makes little difference which estimator is computed. However, for a highly skewed marginal distribution, as in Fig. 2.5, it does make a big difference. It remains open to discussion which is the correct estimator for an example like this one, given that ultimately our real interest is in the behavior of the most extreme rainfalls. However, given the capacity of the MoM estimator to be greatly affected by even a small number of outlying values, it is probably more reasonable to use the robust estimator as an indicator of what is going on in the bulk of the data, while acknowledging the need for alternative measures to characterize spatial dependence in the extreme values of the process.
Inspecting the variogram cloud for homogeneity
An alternative issue, briefly touched on above, concerns the homogeneity of the process, i.e. the assumption that the spatial process is both stationary and isotropic. This is illustrated in Fig. 2.8 for the NW stations in the winter daily minimum temperature plot; in other words, Fig. 2.8 superimposes the top left-hand plots of Figs 1.11 and 2.4. The dots represent the points of the variogram cloud, while the circles represent averaged values of the variogram cloud over subintervals on the distance scale, which is exactly how Fig. 1.11 was computed. The boundaries of the distance subintervals are also indicated, represented by vertical lines on the plot. Each point of the variogram cloud (i.e. each dot in Fig. 2.8) is the variance between two stations computed from 32 years' data, while the corresponding binned values (the circles in the figure) are averages of all the variogram cloud points within each distance subinterval.

One can ask whether the data would support a hypothesis that all the variances within a single bin are equal. It might be possible to test this using, for example, Bartlett's test for the equality of variances. The assumptions of Bartlett's test are not strictly satisfied: for example, different pairs of stations, leading to different dots of the variogram cloud, are not actually independent. Nevertheless, one might expect that within a single narrow bin Bartlett's test would give reasonably reliable answers. In this case it quickly becomes apparent that even crude tests of homogeneity within a distance bin lead to decisive rejection, except at the very largest distances. The process we are sampling from is not spatially homogeneous. The same conclusion applies to the other three subdivisions of the continental USA and the other meteorological variables.

[Fig. 2.8: Two forms of variogram plot superimposed. NW stations, MoM; horizontal axis: distance.]

Given the
conclusion that these processes are not spatially homogeneous, we must be cautious in our interpretation of Figs 1.11-1.14 and 2.4-2.5. The estimated variogram is not necessarily a valid measure of the variance between any two individual stations, but instead represents an averaged value over the region. Given this alternative interpretation, however, comparisons between the variograms still seem to be justified: for example, temperature averages are correlated across very wide scales, but the range of spatial dependence for precipitation maxima is much smaller. Meanwhile, we defer a detailed discussion to chapter 3.

More details of the calculations
Consider the variogram cloud points corresponding to distances between 590 and 610 nautical miles. We can assume that the $i$-th pair corresponds to locations $s_{i1}$ and $s_{i2}$, and that we have observations $Y_{ij} = Z(s_{i1}, t_{ij}) - Z(s_{i2}, t_{ij})$ for time points $t_{ij}$ at which both observations $Z(s_{i1}, t_{ij})$ and $Z(s_{i2}, t_{ij})$ exist. Because of missing values in the original data set, not all $Z(s, t)$ points are well defined. Table 2.1 shows values $n_i$, where $n_i$ is the number of observations $Y_{ij}$, and $S_i/(n_i - 1)$, where $S_i = \sum_j (Y_{ij} - \bar{Y}_i)^2$ and $\bar{Y}_i$ is the mean of the values of $Y_{ij}$ as $j$ varies.

 i  n_i  S_i/(n_i-1)    i  n_i  S_i/(n_i-1)    i  n_i  S_i/(n_i-1)    i  n_i  S_i/(n_i-1)
 1   29    1.14082      2   30    0.8010       3   29    0.91791      4   29    0.49816
 5   31    0.76427      6   31    --           7   31    1.46821      8   32    1.22803
 9   31    0.37191     10   31    0.65614     11   31    1.32307     12   29    0.56116
13   25    --          14   32    1.26053     15   31    0.42792     16   24    0.67272
17   32    0.38503     18   30    0.48921     19   32    0.84835     20   31    0.76461
21   31    0.45127     22   30    0.48219     23   --    0.57729     24   32    1.30333
25   32    0.80796     26   32    0.60602     27   29    0.75231     28   29    0.59545
29   32    0.77198     30   28    1.34613     31   29    1.15666

Table 2.1: 31 variogram cloud estimates for pairs of stations corresponding to distances between 590 and 600 NM. The $i$-th variogram cloud estimate is based on $n_i$ pairs of observations.

Standard tests for equality of variances are the likelihood ratio test and Bartlett's modification (see e.g. Kendall and Stuart 1979, section 24.9). For the likelihood ratio test we work not with $S_i/(n_i - 1)$ but with $S_i/n_i$; we then define an overall
variance
    $s^2 = \dfrac{\sum_i S_i}{N}$, where $N = \sum_i n_i$,
and
    $T = \sum_{i=1}^{r} n_i \log \dfrac{s^2}{S_i/n_i}$,
where $r$ is the number of groups (here 31). According to asymptotic theory, under the null hypothesis that the true variances are all equal, the distribution of $T$ is approximately $\chi^2_{r-1}$. Bartlett's modification uses $S_i/(n_i - 1)$ in place of $S_i/n_i$ and
    $s'^2 = \dfrac{\sum_i S_i}{N - r}$
in place of $s^2$, recomputing $T$ with these values (that is, with $n_i - 1$ in place of $n_i$ throughout), and then defines
    $T' = T \Big/ \left\{ 1 + \dfrac{1}{3(r-1)} \left( \sum_i \dfrac{1}{n_i - 1} - \dfrac{1}{N - r} \right) \right\}$.
The distributional approximation is again $\chi^2_{r-1}$, but the distribution of $T'$ is believed to be closer to this than that of $T$.

In the present example we find $s^2 = 0.778$, $T = 71.8$, $s'^2 = 0.805$, $T' = 65.1$. The value of $s^2$ is the one used for the superimposed plot (with the circles) in Fig. 2.8. Based on $T \sim \chi^2_{30}$, we reject the null hypothesis of homogeneity with a p-value of 0.00003. Based on $T' \sim \chi^2_{30}$, we reject the null hypothesis of homogeneity with a p-value of 0.00021. Either way, the result points to overwhelming rejection of the null hypothesis. Similar results are obtained for the vast majority of the vertical bins in Fig. 2.8.

[Fig. 2.9: Subdivision of stations (inside dashed rectangle) used for calculations of Table 2.2.]

As a second example of these calculations, consider the subset of stations enclosed by the dashed lines in Fig. 2.9. These are 17 stations with latitudes between 40 and 45°N and longitudes between 90 and 100°W. This is a region for which one might anticipate the process would be reasonably homogeneous. The calculations of $T$ and the associated p-values are shown for a variety of distances in Table 2.2. The results suggest that the homogeneity assumption is not good at short distances (less than 140 NM) but is reasonable at longer distances.

Distance (NM)     T     r - 1    p-value
      35        27.6      5      0.00002
      70        30.2     10      0.0004
     105        15.4      8      0.03
     140        17.0     14      0.20
     175        10.8     11      0.37
     210        13.9     14      0.38
     245        13.3     15      0.51
     280        26.2     18      0.07
     315        23.1     14      0.04
     350         5.6      6      0.35

Table 2.2: Table of $T$ values for homogeneity test: 17 stations in latitudes 40-45°N and longitudes 90-100°W.

2.3 Fitting parametric
models to the sample variogram
In this section we again assume we are sampling from a homogeneous spatial process in which the variogram has been estimated for a sequence of distances $h$ by one of the methods of section 2.2.1. Although the properties of the semivariogram estimators $\hat{\gamma}(h)$, $\bar{\gamma}(h)$ and $\tilde{\gamma}(h)$ have been extensively investigated for a single value of $h$, as a function over all $h$ they all lack a very important property: they fail the conditional non-positive-definiteness condition mentioned at the end of section 2.1. Thus it is possible that spatial predictions derived from such estimators will appear to have negative variances. The most common way of avoiding this difficulty is to replace the empirical $\gamma(h)$ by some parametric form which is known to be conditionally non-positive definite, such as one of the families listed in section 2.1. It may well be desirable on general statistical modeling grounds to seek a parametric family which adequately models the observed data, but this provides an additional and specific motivation to do that. Note that in general there is no need to restrict ourselves to isotropic models, though it is usually convenient to consider isotropic models first.

Three methods will be considered:
o Least squares estimation
o Maximum likelihood (ML) and restricted maximum likelihood (REML)

Classes of Nonseparable, Spatio-Temporal Stationary Covariance Functions
Noel Cressie and Hsin-Cheng Huang (1999)

1 Introduction
Let $\{Z(s;t) : s \in D \subset R^d,\ t \in [0, \infty)\}$ denote a spatio-temporal random process observed at $N$ space-time coordinates $(s_1;t_1), \dots, (s_N;t_N)$.

1.1 Motivations
o To achieve the optimal prediction of $Z(s_0;t_0)$, a model is needed for how various parts of the process co-vary in space and time:
    $\hat{Z}(s_0;t_0) = \mu(s_0;t_0) + c(s_0;t_0)' \Sigma^{-1} (Z - \mu)$,
  where $\Sigma \equiv \mathrm{cov}(Z)$, $c(s_0;t_0) \equiv \mathrm{cov}(Z(s_0;t_0), Z)$, and $\mu \equiv E(Z)$.

1.2 Previous Studies
o Rodriguez-Iturbe and
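Mejia (1974): $C(h;u \mid \theta) = C^1(h \mid \theta_1)\, C^2(u \mid \theta_2)$
    - example: $C(h;u \mid \theta) = \exp(-\theta_1 \|h\| - \theta_2 |u|)$
    - often chosen for convenience rather than for their ability to fit the data well
    - at least they satisfy positive-definiteness
    - do not model space-time interaction
o Myers and Journel (1990), Rouhani and Myers (1990): $C(h;u \mid \theta) = C^1(h \mid \theta_1) + C^2(u \mid \theta_2)$
    - certain configurations often make $\Sigma$ singular
o Jones and Zhang (1997)
    - developed a four-parameter family of spectral densities that yield such functions
    - not expressed in closed form

1.3 Objective
o Nonseparable stationary covariance functions that consider space-time interactions are in great demand.
o To introduce new parametric families
    $C(h;u \mid \theta) \equiv \mathrm{cov}(Z(s;t), Z(s+h;t+u))$, $\theta \in \Theta \subset R^p$,
  that will substantially increase the choices a modeler has for valid spatio-temporal stationary covariances.
o A new and simple methodology for developing whole classes of nonseparable spatio-temporal stationary covariance functions in closed form.

2 Theoretical Results on Positive-Definiteness
Assume that $C$ is continuous and its spectral distribution function possesses a spectral density $g(\omega;\tau) \ge 0$.
o Bochner's Theorem (1955):
    $C(h;u) = \int\!\!\int \exp(i h'\omega + i u \tau)\, g(\omega;\tau)\, d\omega\, d\tau$
o If $C$ is integrable, then
    $g(\omega;\tau) = (2\pi)^{-d-1} \int\!\!\int \exp(-i h'\omega - i u \tau)\, C(h;u)\, dh\, du = (2\pi)^{-1} \int \exp(-i u \tau)\, h(\omega;u)\, du$,
  where
    $h(\omega;u) \equiv (2\pi)^{-d} \int \exp(-i h'\omega)\, C(h;u)\, dh = \int \exp(i u \tau)\, g(\omega;\tau)\, d\tau$.
o For the construction of $C$, or equivalently of $g$, assume that
    $h(\omega;u) = \rho(\omega;u)\, k(\omega)$    (1)
  satisfying the following two conditions:
    (C1) For each $\omega \in R^d$, $\rho(\omega;\cdot)$ is a continuous autocorrelation function, $\int \rho(\omega;u)\, du < \infty$, and $k(\omega) > 0$.
    (C2) $\int k(\omega)\, d\omega < \infty$.
o Under the two conditions,
    $C(h;u) \equiv \int \exp(i h'\omega)\, \rho(\omega;u)\, k(\omega)\, d\omega$    (2)
  is a valid continuous spatio-temporal stationary covariance function.
o Simply define $k(\omega) \equiv \int g(\omega;\tau)\, d\tau$ and $\rho(\omega;u) \equiv h(\omega;u) \big/ \int g(\omega;\tau)\, d\tau$.
o The main goal is to find functions $h(\omega;u)$ given by (1), satisfying (C1) and (C2), and for which the integral in (2) can be evaluated.

3 Classes of Continuous Spatio-Temporal Stationary Covariance Models

Ex 1. Let $\rho(\omega;u) = \exp(-\|\omega\|^2 u^2/4)\, \exp(-\delta u^2)$, $\delta > 0$, and $k(\omega) = \exp(-c_0 \|\omega\|^2/4)$, $c_0 > 0$. Therefore
    $C(h;u) \propto (u^2 + c_0)^{-d/2} \exp\{-\|h\|^2/(u^2 + c_0)\}\, \exp(-\delta u^2)$, $\delta > 0$.
As $\delta \to 0$,
    $C(h;u \mid \theta) = \dfrac{\sigma^2}{(a^2 u^2 + 1)^{d/2}} \exp\left\{-\dfrac{b^2 \|h\|^2}{a^2 u^2 + 1}\right\}$,
where $\theta = (a, b, \sigma^2)'$, $a \ge 0$, $b \ge 0$, $\sigma^2 > 0$. The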
semivariogram model is de ned by h 6 if u 0 7 W E 1 b2 h 2 a a21imexp a2ull2 l11r2a1llhll 2 0therW1se ex 2 Let pwu exp7HwH2u4 exp76u2 6 gt 0 and kw exp7conH24 co gt 0 Therefore 7 thl2 W 00 Chgu olt cord2 exp exp76u2 6 gt 0 A 5 0 S C 7 a2 WW 2 Chul9Wexp 7W a20b20a gt0 The semivariogram model is de ned by h 67 if uHhHO Y aul a2 17 1 exp 7 szhH2 alul1d2 a 1 u 72a1HhHa2 otherwise ex 5 Let pwu exp7HwHu2eXp76u2 6 gt 0 and Mu exp7conH co gt 0 Therefore HhHZ Ed12 Ch u olt 112 co d 1 W exp76u2 6 gt 0 As 6 7gt 0 02a2u21 2 Chu10W a20b20a gt0 h 6 if u HhH 0 v u E 2 2 2 7 a u 1 2 a2 4 a 1 a2u2 D2 b2h2d12 T a1HhH 0therW1se Let mu exp7wauexp76u2 6 gt 0 and W exp7conH co gt 0 Therefore HhHQ Ed12 Chiu Olt 5070 1 W efo 6u2iv 6 gt 0 As 6 7gt 0 02011111 1 Chu9 a20b2002gt0 a1u112 VHhH2d12 The sernivariograrn model is de ned by if u HhH 0 7hu19 E 2 7 1711 1 2 a2 a 1 au12b2h2d12T a1HhH 0therW1se Let 0V2 7 00 1101112 1101112 POInu 7 u2 Cod2 exp 4u260 460 and 1101112 Mu exp74 C0 co gt 0 Therefore Ch u olt exp7u2 coHhH2 7 aouQ a0 gt 0 co gt 0 So a f y y t39 t y stationary covariance family is given by Ch 7219 72 exp7aZu2 7 2HhH2 7 cuZHhHZ Where 9 a bc 72 a 2 01 2 O c 2 0 72 gt O The sernivariograrn model is de ned by if u O 7hu19 E a217 ep712u2 7 2HhH2 7 cu2HhHZ 7392 olehHe 2 otherwise ex 6 Let U2 50 HWHQ HWHQ 2 7 76 39 6 0 pwu GOV2 exp CO 460 eXp u 7 gt 7 and w 2 Mu exp 7 H4C 0 co gt 0 Therefore7 Ch u 0lt eXPu CoHhH2 aOW eXP5u2 do gt 0 As 6 7gt 07 Ch u H a2 eXp7a u 7 22 hH2 7 c u HhH2 a 2 07 b 2 07 c 2 07 a2 gt 0 The sernivariograrn model is de ned by if u HhH 0 7hu9 E a217 exp 7a u 7 b2HhH2 7 7392 othHe 2 otherwise ex 7 Let MW u 13 1 112 CHwH2 V d21 CHWH2Vd2 C gt 07 V gt 0 and Ma 1 CHWWrHZ c gt 0 V gt 0 Therefore7 V 1 u21 12 u21 12 2 V 2 d2 2 KV 2 1f gt 0 emu u 1u 6 uc u 5 7 1 u2 1Vu2cd2 1f 0 So a i y t A y t39 t y stationary covariance family is given by 0220d2 b azuz 1 12 p azuz 1 12 came 7 a m M K 1 aw 0 h f h gt 0 7 ed2 azuz 1ua2u2 0d2 1f 0 Where a 2 07 b 2 07 c gt 07 1 gt 07 a2 gt 0 
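For $\nu = 1/2$ the Bessel factor in the Ex 7 family simplifies, because $K_{1/2}(x) = \sqrt{\pi/(2x)}\, e^{-x}$ and $\Gamma(1/2) = \sqrt{\pi}$, so that for all $\|h\| \ge 0$
    $C(h;u \mid \theta) = \dfrac{\sigma^2 c^{d/2}}{(a^2 u^2 + 1)^{1/2} (a^2 u^2 + c)^{d/2}} \exp\left\{-b \left(\dfrac{a^2 u^2 + 1}{a^2 u^2 + c}\right)^{1/2} \|h\|\right\}$.
The following is a minimal sketch of this special case only, not code from the paper. The function names, the choice $d = 2$ and the parameter values are illustrative assumptions. It evaluates the covariance and measures how far $C(h;u)\,C(0;0)$ is from $C(h;0)\,C(0;u)$; the difference is zero when the model is separable, which for this formula happens at $c = 1$.

```python
import math


def cressie_huang_ex7_half(h, u, a=1.0, b=1.0, c=4.0, sigma2=1.0, d=2):
    """Ex 7 covariance in the assumed special case nu = 1/2.

    h is the spatial lag norm ||h|| >= 0, u is the temporal lag.
    Parameter defaults are illustrative, not fitted values.
    """
    s = a * a * u * u                      # a^2 u^2
    ratio = math.sqrt((s + 1.0) / (s + c))  # ((a^2u^2+1)/(a^2u^2+c))^(1/2)
    scale = sigma2 * c ** (d / 2.0) / (math.sqrt(s + 1.0) * (s + c) ** (d / 2.0))
    return scale * math.exp(-b * ratio * h)


def separability_gap(h, u, **params):
    """C(h;u) C(0;0) - C(h;0) C(0;u); zero for a separable covariance."""
    cov = lambda hh, uu: cressie_huang_ex7_half(hh, uu, **params)
    return cov(h, u) * cov(0.0, 0.0) - cov(h, 0.0) * cov(0.0, u)


if __name__ == "__main__":
    for c in (1.0, 4.0):
        print("c =", c, "gap =", separability_gap(1.0, 1.0, c=c))
```

With $c = 1$ the ratio inside the exponential equals 1 for every $u$, so the covariance factors into a function of $u$ times a function of $\|h\|$; any other $c$ couples the two lags.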
The semivariogram model is defined by
    $\gamma(h;u \mid \theta) \equiv 0$ if $|u| = \|h\| = 0$;
    $\gamma(h;u \mid \theta) \equiv \sigma^2 - C(h;u \mid \theta) + \tau^2 + \alpha_1 \|h\|^{\alpha_2}$ otherwise,
with $C(h;u \mid \theta)$ the Ex 7 covariance.

4 Discussion
o A discontinuity at the origin is allowed by adding a nugget effect, $\tau^2 I(h = 0, u = 0)$, to $C(h;u \mid \theta)$.
o The spatial isotropy can be relaxed by replacing $\|h\|$ with $\|Ah\|$ for any nonsingular matrix $A$.
o The results allow the construction of valid covariance models in $R^{d+1}$ based on spatial covariance models in $R^d$ and $R^1$.

[Figures: contour plots of $C(h;u)$ for Ex 1, Ex 2, Ex 3, Ex 4, Ex 5 (four values of $c$), Ex 6 (four values of $c$) and Ex 7 (one choice of $c$ and $\nu$). Horizontal axis: the temporal lag; vertical axis: the modulus of the spatial lag.]

ST790, NCSU, Fall 2004
NONSTATIONARY SPATIAL PROCESSES
Montserrat
Fuentes
Statistics Department, NCSU
fuentes@stat.ncsu.edu
http://www.stat.ncsu.edu/~fuentes

Nonstationary models
Consider a stochastic process $\{Z(s), s \in D\}$, where $D$ is a subset of $R^d$ ($d$-dimensional Euclidean space). For example, $Z(s)$ may represent the concentration of SO2 at a specific location $s$. Let $\mu(s) = E\{Z(s)\}$, $s \in D$, denote the mean value at location $s$. We also assume that the variance of $Z(s)$ exists for all $s \in D$. $Z$ is second-order stationary if $\mu(s) \equiv \mu$ and $\mathrm{cov}(Z(s_1), Z(s_2)) = C(s_1 - s_2)$, where $C$ is the covariance function.

The need for nonstationary models
[Figure 1: Nitric acid concentrations.] This figure shows the output of Models-3 for the week starting July 11, 1995. We divide the domain into 9 subregions.

[Figure 2.] This figure shows the Matern variograms $\gamma_i$ corresponding to the processes $Z_i$. Note $\gamma_i(x) = C_i(0) - C_i(x)$, where $C_i$ is the corresponding covariance.

In this chapter we consider a number of different approaches to processes which are not spatially stationary:
a. Deformation methods, in which it is assumed that the process is stationary and isotropic only after some nonlinear deformation of the sampling space.
b. Moving-window methods, in which the predictor or interpolator at a particular location is based on a window of observations centered at that location.
c. EOF and methods based on an eigenfunction expansion of the covariance function.
d. Kernel-based methods.

Deformation methods
We consider a spatial process $\{Z(s), s \in D\}$, where $D \subseteq R^d$ is a domain of spatial locations. Usually $d = 2$ or 3; here $d = 2$. Spatial dependence is usually characterized in terms of either the covariance function
    $C(s_1, s_2) = \mathrm{Cov}\{Z(s_1), Z(s_2)\}$, $s_1, s_2 \in D$,
or the dispersion
    $D(s_1, s_2) = \mathrm{Var}\{Z(s_1) - Z(s_2)\}$, $s_1, s_2 \in D$.

Much of the literature is concerned with processes which satisfy some or all of:
(i) intrinsic stationarity: $D(s_1, s_2)$ depends on $s_1$ and $s_2$ only through the vector difference $s_1 - s_2$;
(ii) stationarity: $C(s_1, s_2)$ depends on $s_1$ and
$s_2$ only through $s_1 - s_2$ (this implies intrinsic stationarity, but not conversely);
(iii) isotropy: $D(s_1, s_2)$ or $C(s_1, s_2)$ depends only on $\|s_1 - s_2\|$, the Euclidean norm of $s_1 - s_2$, or equivalently the Euclidean distance between the locations $s_1$ and $s_2$. In this case we often write $D(s_1, s_2) = 2\gamma_0(\|s_1 - s_2\|)$, where $\gamma_0$ is an isotropic semivariogram function.
When all of (i)-(iii) hold, we shall call the process homogeneous.

Classical geostatistics is concerned primarily with homogeneous processes, for which by now a very extensive literature exists (e.g. Cressie 1993). Until recently, however, not much was known about the modeling of inhomogeneous processes. One old approach (Journel and Huijbregts 1978) for stationary non-isotropic processes is to write the dispersion in the form
    $D(s_1, s_2) = 2\gamma_0(\|A_0 (s_1 - s_2)\|)$
or, by extension,
    $D(s_1, s_2) = 2 \sum_i \gamma_i(\|A_i (s_1 - s_2)\|)$.
Here $A_0, A_1, \dots$ are arbitrary matrices and $\gamma_0, \gamma_1, \dots$ isotropic semivariogram functions. However, this is still quite a restrictive class of models.

A much more radical extension has been proposed in a series of papers by Sampson and Guttorp (see in particular Sampson and Guttorp 1992). They considered models of the form
    $D(s_1, s_2) = 2\gamma_0(\|f(s_1) - f(s_2)\|)$
with $\gamma_0$ again an isotropic semivariogram and $f$ a smooth nonlinear map from $R^d$ to $R^{d'}$. In principle one may permit $d' \ne d$, though in most of the Sampson-Guttorp work it is assumed that $d' = d$, and we shall continue to assume that here. The idea is that the map $f$ takes the coordinates from the real geographical or G space into an alternative dispersion or D space in which the process is homogeneous. This approach may not be universally applicable to inhomogeneous processes.

The precise original methodology used by Sampson and Guttorp contained a number of rather ad hoc features. Briefly, it consists of three stages:
a. A mapping of the $n$ sampling points from the G space into the D space is found to minimize a stress criterion
    $\min_{\delta} \dfrac{\sum_{i<j} \{\delta(d_{ij}) - h_{ij}\}^2}{\sum_{i<j} h_{ij}^2}$,
where $d_{ij}$ is the observed dispersion between sites $i$ and $j$, $h_{ij}$ is the distance between sites $i$ and $j$ in D space, and the
minimization is taken over all monotonically increasing functions $\delta$. This formulation of the problem permits it to be solved by a multidimensional scaling (MDS) algorithm.
b. The mapping of the $n$ sampling points is then extended to a smooth function from the entire G space into the D space, using a representation based on thin-plate splines.
c. The function $\delta$ is replaced by a smooth function $g$, so $d_{ij} \approx g(h_{ij})$, which satisfies the positive definiteness condition required for $g$ to be the variogram of a homogeneous process. For this purpose Sampson and Guttorp used a very general representation of $g$ as a mixture of Gaussian-type variograms.

The Sampson-Guttorp approach implies some restrictions on the models considered. In particular, by using MDS to model the locations so that increasing distances correspond to increasing dispersions, the possibility that $g$ may be non-monotone is excluded. Maximum likelihood versions of the method were developed by Mardia and Goodall (1993) and Smith (1996). There are also two recent Bayesian approaches, due to Damian et al. (2001) and Schmidt and O'Hagan (2000).
Examples presented in class.

Moving-Window Approaches
The idea of a moving-window approach is that, to fit a spatial model and to perform kriging at a sampling location $s$, we should restrict ourselves to a window of sampling stations close to $s$, within which it is reasonable to assume a homogeneous model. Thus the method retains all the mathematical techniques of homogeneous processes, while not assuming that homogeneity applies across the whole sampling region. For the present description we will follow Haas (1995), who develops the method in the context of spatio-temporal processes. Suppose we have spatio-temporal data $Z(t, s)$, where $t$ denotes time and $s$ denotes space. Specifically, we have a sample $Z(t_i, s_i)$ at $n$ time-space points $(t_i, s_i)$, $1 \le i \le n$. In most environmental applications this will consist of a fixed time series of observations at each measuring station $s_i$, but this format is not
required for the methodology.

The method requires the specification of two parameters: the time window $m_T$ and the sampling fraction $f_c$.

Once these parameters are specified, the window is defined as follows. Suppose we want to predict or interpolate at a specific time $t_0$ and location $s_0$. Restrict the observations to those which lie within the time window about $t_0$ defined by $m_T$. Within that window, pick out observations in order of space, i.e. first select all the observations at the spatial location closest to $s_0$, then those at the location second closest to $s_0$, and so on until a fixed number $m = n f_c$ of observations has been selected. Prediction at $(t_0, s_0)$ will be based entirely on this group of $m$ observations.

The next step is to consider the form of regression model suitable for both the mean and standard deviation of $Z$. Haas considered a general model of the form

$$Z(t, s) = \mu(t, s) + \psi(t, s) R(t, s)$$

in terms of additional functions $\mu$ and $\psi$, where $\mu$ is typically a regression function of covariates such as meteorology, in terms of additional parameters $\beta$ which are also estimated separately within each window.

For model fitting and kriging it is necessary to specify a suitable spatio-temporal correlation structure for the residual process $R(t, s)$ restricted to the given window. The basic covariance model assumed by Haas is

$$C\{R(t_1, s_1), R(t_2, s_2)\} = C_T(t_1 - t_2)\, C_S(s_1 - s_2),$$

where $C_T$ denotes the temporal covariance function and $C_S$ the spatial covariance function, each of which has been assumed stationary within the window. For the functions $C_T$ and $C_S$ he assumed the spherical form of isotropic covariance structure. The same form of covariance is assumed for both the spatial and temporal scale, though of course the parameters may be quite different for the two functions.

The product form, in which the spatio-temporal covariance function is written as a product of a function of space and a function of time, is known as the separability assumption and is widely discussed in the context of time-space processes. It is an
assumption which is very widely used because of its convenience, though it is often criticized as unrealistic when applied to actual time-space data.

Once the model functions for $\mu$, $\psi$, $C_T$ and $C_S$ are parametrically specified, under an assumption of joint normality we could estimate the model by maximum likelihood. Haas avoided this, and instead described an algorithm including first OLS and later GLS regression to estimate the parameters of $\mu$ and $\psi$, along with the approximate WLS procedure to estimate the parameters of $C_T$ and $C_S$.

Finally, once the model is fitted, kriging is used to calculate an optimal predictor at $(t_0, s_0)$, say $\hat{Z}(t_0, s_0)$, and its prediction standard error.

Reconstructing the full covariance matrix

One disadvantage of the moving-window approach is that it does not lead to a single model to describe the whole data set. For example, different covariance functions are fitted to different portions of the data set, and if we simply combine these together to form an overall estimated covariance matrix, the result may not be positive definite.

Haas (1996) suggested one way to get a positive definite estimate of the full spatial covariance, by finding the nearest (in some metric) positive definite matrix to the spatial covariance matrix estimated from the moving-window approach. However, this is not very intuitive and does not correspond to a model for the continuous underlying field.

Examples presented in class.

The EOF method

EOF: Empirical orthogonal functions.

A very classical approach to spatial statistics, dating back at least as far as Cohen and Jones (1969), is to represent a spatial field in terms of the Karhunen-Loève expansion of its covariance function. This leads to representations of the form

$$Z(s) = \sum_{\nu} a_\nu \lambda_\nu^{1/2} \psi_\nu(s),$$

where $\psi_\nu$ are a fixed basis of orthogonal functions, $\lambda_\nu$ are coefficients to be estimated and $a_\nu$ are independent standard normal RVs. Models of this form have become very widely used in geophysical sciences; see e.g. Creutin and Obled (1982).

The covariance of $Z$
would be of the form

$$C(s, y) = \sum_{\nu} \lambda_\nu \psi_\nu(s) \psi_\nu(y).$$

The corresponding approximation to the covariance $C$ of $Z$ corresponds to an empirical orthogonal function (EOF) decomposition (principal component analysis) of the covariance matrix of the observations; we truncate the sum after finitely many terms. This covariance (or variogram) is calculated with observations assumed to be realizations of the spatial process replicated over time.

The estimated coefficient $a_\nu$ is also called the $\nu$-th spatial principal component; it is the orthogonal projection of the original process at location $s$ onto the $\nu$-th spatial EOF vector $\psi_\nu$. We need replications over time to use this EOF method.

Nychka et al. (2002) have recently proposed models of the EOF form in which the $\psi_\nu$ are replaced by wavelet basis functions. The wavelet representation is motivated by nonstationarity, and they also emphasize the computational applicability of the approach in very large systems.

There is also the possibility of a mixture of the two kinds of models (Nychka and Saltzman 1998, Holland et al. 1999), based on representations of the form

$$Z(s) = \sigma(s)\left\{\rho^{1/2} Z_0(s) + \sum_{\nu=1}^{M} a_\nu \lambda_\nu^{1/2} \psi_\nu(s)\right\},$$

in which $Z_0(s)$ is a stationary isotropic process, $\rho$ is a positive constant and $\sigma(s)$ is a scaling function.

Models defined by kernel smoothing

A broad class of stationary Gaussian processes may be represented in the form

$$Z(s) = \int K(s - u) X(u)\, du$$

with some kernel function $K$ and a constant-variance Gaussian white noise process $X$. Then the covariance of the process is

$$C(h) = \int K(u - h) K(u)\, du.$$

For a Gaussian kernel, $K(u) \propto \exp(-c\|u\|^2)$ with a constant $c > 0$, the covariance of the process is again of Gaussian form, $C(h) \propto \exp(-c\|h\|^2/2)$.

Extension to nonstationary processes

Higdon, Swall and Kern (1999) considered extensions of the form

$$Z(s) = \int K_s(u) X(u)\, du,$$

where the kernel $K_s$ depends on position $s$. The idea is to model $K_s(u)$ as an unknown function in terms of specific parameters, which can then be estimated in a Bayes framework.

Figure 3: Convolution approach (Higdon), showing the kernel functions.

Then the covariance of the process is

$$C(s, s') = \int K_s(u) K_{s'}(u)\, du.$$

In the case where $K_s$ is a
Gaussian kernel for each $s$, this leads to tractable expressions for the covariance function and hence the likelihood function for the process.

Gaussian kernel: $K_s(u) \propto \exp(-u^T \Sigma_s^{-1} u)$.

A new model for nonstationarity

Consider a Gaussian spatial process $Z$. We represent $Z$ as a convolution of local stationary processes (Fuentes 2001, Fuentes 2002, and Fuentes and Smith 2001):

$$Z(x) = \int_D K(x - s) Z_{\theta(s)}(x)\, ds,$$

where $K$ is a kernel function and $Z_\theta(x)$, $x \in D$, is a family of independent stationary Gaussian processes indexed by $\theta$.

The covariance $C(s_1, s_2; \theta)$ of $Z$ is a convolution of the local covariances $C_{\theta(s)}(s_1 - s_2)$:

$$C(s_1, s_2; \theta) = \int_D K(s_1 - s) K(s_2 - s) C_{\theta(s)}(s_1 - s_2)\, ds.$$

If $K$ is a sharply peaked kernel function and $\theta(s)$ varies slowly with $s$, this has the property that for $x$ near $s$ the process "looks like" a stationary process with parameter $\theta(s)$. On the other hand, since $\theta(s)$ may vary substantially over the whole space, it also allows significant nonstationarity.

The method has features in common with Haas's approach, but there is no problem about it being a well-defined process with a positive definite covariance function.

DISCRETE VERSION

Suppose $\theta(s)$ is constant within subregions of stationarity $S_i$. Then the nonstationary process $Z$ observed on a region $D$ is a MIXTURE of orthogonal local stationary processes:

$$Z(x) = \sum_{i=1}^{k} Z_i(x) w_i(x),$$

where $S_1, \ldots, S_k$ are well-defined subregions that cover $D$, $Z_i$ is a local stationary process in the subregion $S_i$, and $w_i(x)$ is a positive kernel function centered at the centroid of $S_i$. The value of $k$ (the number of subregions) is chosen using a BIC or AIC criterion.

The nonstationary covariance of $Z$ is defined in terms of the local stationary covariances of the processes $Z_i$, for $i = 1, \ldots, k$:

$$\mathrm{cov}\{Z(x), Z(y)\} = \sum_{i=1}^{k} w_i(x) w_i(y)\, \mathrm{cov}\{Z_i(x), Z_i(y)\};$$

this is a valid nonstationary covariance.

The covariance parameters can be estimated with the MLEs or using a Bayesian approach. We use the estimated covariance for prediction (kriging).

Example of nonstationary kriging (Fuentes 2002b)

Figure 4: The graph shows the locations of the 513 sites where the ozone
ambient concentrations are measured hourly.

Figure 5: Each graph shows, for each subregion, the empirical semivariogram (circles) and the likelihood estimate (solid line), based on the covariance model

$$\mathrm{cov}\{Z(x), Z(y)\} = \sum_{i=1}^{k} K(x - s_i) K(y - s_i) C_{\theta(s_i)}(x - y).$$

Figure 6 (Ozone standard, years 95-97): The graph shows the interpolated (kriging) values of the ozone air quality design values (ppb). The dots show the location of the monitoring sites. The design values are calculated as the 3-year average (95-97) of the annual fourth-highest daily maximum 8-hour average ozone concentration.

Standard Error in the Prediction

Figure 7: Standard error for the posterior predictive distribution of the ozone ambient air quality design values (ppb), using data from 1995 to 1997. The dots show the locations of the monitoring sites for the ozone.

REFERENCES

Cohen, A. and Jones, R.H. (1969), Regression on a random field. J. Amer. Statist. Assoc. 64, 1172-1182.

Creutin, J.D. and Obled, C. (1982), Objective analysis and mapping techniques for rainfall fields: an objective comparison. Water Resources Research 18, 413-431.

Cressie, N. (1993), Statistics for Spatial Data. Second edition, John Wiley, New York.

Damian, D., Sampson, P.D. and Guttorp, P. (2000), Bayesian estimation of semi-parametric non-stationary spatial covariance structure. Environmetrics 12, 161-176.

Fuentes, M. (2001), A new high frequency kriging approach for nonstationary environmental processes. Environmetrics 12, 469-483.

Fuentes, M. (2002), Spectral methods for nonstationary spatial processes. Biometrika 89, 197-210.

Fuentes, M. (2002b), Interpolation of nonstationary air pollution processes: a spatial spectral approach. Statistical Modelling 2, 281-298.

Fuentes, M. (2003), Statistical assessment of geographic areas of compliance with air quality standards. Journal of Geophysical Research - Atmospheres, to appear.

Fuentes, M. and Smith, R. (2001), A new class of nonstationary models. Tech. report at North Carolina State University, Institute of Statistics Mimeo Series 2534.

Haas, T.C. (1995), Local prediction of
a spatio-temporal process with an application to wet sulfate deposition. J. Amer. Statist. Assoc. 90, 1189-1199.

Haas, T.C. (1996), A method for statistically assessing spatio-temporal pollutant trends and meteorological transport models. Preprint, University of Wisconsin-Milwaukee.

Higdon, D., Swall, J. and Kern, J. (1999), Non-stationary spatial modeling. In Bayesian Statistics 6, eds. J.M. Bernardo et al., Oxford University Press, pp. 761-768.

Holland, D., Saltzman, N., Cox, L.H. and Nychka, D. (1999), Spatial prediction of sulfur dioxide in the eastern United States. In geoENV II - Geostatistics for Environmental Applications, eds. Gomez-Hernandez, J., Soares, A. and Froidevaux, R., Kluwer, Dordrecht, 65-76.

Journel, A.G. and Huijbregts, C.J. (1978), Mining Geostatistics. Academic Press, London.

Mardia, K.V. and Goodall, C.R. (1993), Spatial-temporal analysis of multivariate environmental monitoring data. In Multivariate Environmental Statistics, eds. G.P. Patil and C.R. Rao, Elsevier Science Publishers, pp. 347-386.

Matérn, B. (1986), Spatial Variation. Lecture Notes in Statistics, Number 36, Springer Verlag, New York. (Second edition; originally published in 1960.)

Nychka, D. and Saltzman, N. (1998), Design of air quality networks. In Case Studies in Environmental Statistics, eds. D. Nychka, W. Piegorsch and L.H. Cox, Lecture Notes in Statistics, Number 132, Springer Verlag, New York, pp. 51-76.

Nychka, D., Wikle, C. and Royle, A. (2002), Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling 2, 299-314.

Sampson, P.D. and Guttorp, P. (1992), Nonparametric estimation of nonstationary spatial covariance structure. J. Amer. Statist. Assoc. 87, 108-119.

Schmidt, A.M. and O'Hagan, A. (2003), Bayesian inference for nonstationary spatial covariance structure via spatial deformations. Journal of the Royal Statistical Society, Series B, to appear. (Research Report No. 498/00, Department of Probability and Statistics, University of Sheffield.)

Smith, R.L. (1996), Estimating nonstationary spatial correlations. Preprint, University of North Carolina.

LIKELIHOOD, REML AND BAYESIAN ESTIMATION

Montserrat
Fuentes, Statistics Department, NCSU
fuentes@stat.ncsu.edu
http://www.stat.ncsu.edu/~fuentes

Estimation of the parameters

Maximum likelihood estimation

If we assume that we are sampling from a Gaussian process, then it is straightforward in principle to write down the exact likelihood function and hence to maximize it numerically with respect to the unknown parameters. Kitanidis (1983) and Mardia and Marshall (1984) were the first to advocate estimating spatial processes in this way.

The spatial model we consider is

$$Z \sim N(X\beta, \Sigma) \qquad (1)$$

with $Z$ an $n$-dimensional vector of observations, $X$ an $n \times q$ matrix of known regressors ($q < n$, $X$ of full rank), $\beta$ a $q$-vector of unknown regression parameters and $\Sigma$ the covariance matrix of the observations. In many applications we may assume

$$\Sigma = \alpha V(\theta) \qquad (2)$$

where $\alpha$ is an unknown scale parameter and $V(\theta)$ is a matrix of standardized covariances determined by the unknown parameter vector $\theta$.

Parametric models for the covariance

For example, the exponential variogram structure is equivalent to a covariance function

$$\mathrm{cov}\{Z(s_1), Z(s_2)\} = \begin{cases} c_0 + c_1 & \text{if } s_1 = s_2, \\ c_1 \exp(-|s_1 - s_2|/R) & \text{if } s_1 \neq s_2, \end{cases} \qquad (3)$$

so we may define $\alpha = c_1$, $\phi = c_0/c_1$ (the nugget:sill ratio), $\theta = (\phi, R)$, and let $V(\theta)$ denote the matrix whose diagonal entries are all $1 + \phi$ and whose off-diagonal entries are of the form $v_{ij} = \exp(-d_{ij}/R)$, where $d_{ij}$ is the distance between the $i$-th and $j$-th sampling points. We assume $V(\theta)$ is nonsingular.

The Matérn class of covariances

This was originally given by Matérn (1960) but largely neglected in favor of simpler analytic forms. More recently, however, Handcock and Stein (1993) and Handcock and Wallis (1994) demonstrated its flexibility in handling a variety of spatial data sets. The class is best defined in terms of its isotropic covariance: we have $C(h) = C_0(\|h\|)$, where $C_0(0) = 1$ and

$$C_0(t) = \frac{1}{2^{\theta_2 - 1}\Gamma(\theta_2)} \left(\frac{2\sqrt{\theta_2}\, t}{\theta_1}\right)^{\theta_2} K_{\theta_2}\!\left(\frac{2\sqrt{\theta_2}\, t}{\theta_1}\right).$$

Here $\theta_1 > 0$ is the spatial scale parameter and $\theta_2 > 0$ is a smoothness parameter. The function $\Gamma$ is the usual gamma function, while $K_{\theta_2}$ is the modified Bessel function of the third kind of order $\theta_2$ (Abramowitz and Stegun 1964). As special cases,
$\theta_2 = 1/2$ corresponds to the exponential form of semivariogram, and the limit $\theta_2 \to \infty$ results in the Gaussian form.

With $Z$ defined by (1), its density is

$$(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(Z - X\beta)^T \Sigma^{-1} (Z - X\beta)\right\}. \qquad (4)$$

Consequently the negative log likelihood is given by

$$\ell(\beta, \alpha, \theta) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\alpha + \frac{1}{2}\log|V(\theta)| + \frac{1}{2\alpha}(Z - X\beta)^T V(\theta)^{-1} (Z - X\beta). \qquad (5)$$

Although maximum likelihood estimation appears to be computationally feasible, opinion is still divided concerning its desirability when compared with simpler methods such as the approximate WLS (weighted least squares) method due to Cressie et al. (1980).

Asymptotic properties of maximum likelihood estimators were considered by Mardia and Marshall (1984), who showed that the usual asymptotic properties of consistency and asymptotic normality are satisfied under a form of increasing-domain asymptotics. However, the conditions given by Mardia and Marshall are not particularly easy to check, especially in the case of an irregular lattice of sampling points, and there is no indication of how large the samples need to be for asymptotic results to be reliable indicators of sampling properties.

Another issue concerns possible multimodality of the likelihood surface. An example given by Warnes and Ripley (1987), and repeated by Ripley (1988), suggests that this can be a problem even with the simplest spatial models.

The theoretical benefit of maximum likelihood is that we can expect the estimates to be more efficient than the alternative methods in large samples. However, it is not clear how big a benefit this is. A simulation study by Zimmerman and Zimmerman (1991) compared the MLE, approximate WLS and a number of alternative estimators, concluding that the MLE is only slightly superior to the approximate WLS procedure from this point of view.

Restricted maximum likelihood

The idea of restricted maximum likelihood or REML estimation was originally proposed by Patterson and Thompson (1971) in connection with variance components in linear models. The motivation behind REML estimation is perhaps best expressed in a very simple
case.

EXAMPLE. Suppose $Y_1, \ldots, Y_n$ are iid $N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma^2$. The MLEs of $\mu$ and $\sigma^2$ are

$$\hat{\mu} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

However, $\hat{\sigma}^2$ is a biased estimator, whereas the more usual unbiased estimator of $\sigma^2$ is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

It then appears that the maximum likelihood estimator is not the best one to use in this case.

Suppose that instead of basing the maximum likelihood estimator on the full joint density of $Y_1, \ldots, Y_n$, we base it on the joint density of the vector of contrasts $(Y_1 - \bar{Y}, Y_2 - \bar{Y}, \ldots, Y_{n-1} - \bar{Y})$, whose distribution does not depend on $\mu$. The maximum likelihood estimator of $\sigma^2$ under this formulation turns out to be the unbiased estimator $s^2$. Thus, by constructing an estimate of $\sigma^2$ based on an $(n-1)$-dimensional vector of contrasts, we appear to have done better than the usual MLE based on the full $n$-dimensional data vector.

This idea can be extended to the general model defined by (1) and (2). If we let $W = A^T Z$ be a vector of $n - q$ linearly independent contrasts, i.e. the $n - q$ columns of $A$ are linearly independent and $A^T X = 0$, then we find that $W \sim N(0, A^T \Sigma A)$, and the joint negative log likelihood function based on $W$ is of the form

$$\ell_W(\alpha, \theta) = \frac{n-q}{2}\log(2\pi) + \frac{n-q}{2}\log\alpha + \frac{1}{2}\log|A^T V(\theta) A| + \frac{1}{2\alpha} W^T (A^T V(\theta) A)^{-1} W.$$

It is possible to choose $A$ to satisfy $AA^T = I - X(X^T X)^{-1}X^T$, $A^T A = I$ (Patterson and Thompson 1971). In this case a further calculation shows that

$$\ell_W(\alpha, \theta) = \frac{n-q}{2}\log(2\pi) + \frac{n-q}{2}\log\alpha - \frac{1}{2}\log|X^T X| + \frac{1}{2}\log|X^T V(\theta)^{-1} X| + \frac{1}{2}\log|V(\theta)| + \frac{1}{2\alpha} G^2(\theta), \qquad (6)$$

where $G^2(\theta)$ is the sum of squares of generalized residuals

$$G^2 = (Z - X\hat{\beta})^T V^{-1} (Z - X\hat{\beta}) \qquad (7)$$

and $\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Z$ is the GLS estimator of $\beta$ based on covariance matrix $V$.

Figure 1: Example: Fish data (axes: Coord X, Coord Y).

Empirical and estimated
variograms

Figure 2: Semivariogram estimation with likelihood (ML) and REML vs. OLS and WNLS (x-axis: distance).

Figure 3: Universal kriging (kriging estimates) and standard errors in prediction (kriging std. errors), using a geostatistical approach.

Bayesian procedures

Bayesian approaches to spatial statistics have been considered by a number of authors, in particular Le and Zidek (1992) and Handcock and Stein (1993). The latter authors considered the model defined by (1) and (2) with the improper prior density

$$\pi(\beta, \alpha, \theta) \propto \frac{\pi(\theta)}{\alpha} \qquad (8)$$

for some prior $\pi(\theta)$. The posterior density takes the form

$$\pi(\beta, \alpha, \theta \mid Z) \propto \pi(\theta)\, \alpha^{-1}\, \alpha^{-n/2} |V(\theta)|^{-1/2} \exp\left\{-\frac{1}{2\alpha}(Z - X\beta)^T V(\theta)^{-1} (Z - X\beta)\right\}.$$

Defining $\hat{\beta} = (X^T V(\theta)^{-1} X)^{-1} X^T V(\theta)^{-1} Z$ and ignoring constants, the previous equation leads to

$$\pi(\beta, \alpha, \theta \mid Z) \propto \pi(\theta)\, \alpha^{-n/2-1} |V(\theta)|^{-1/2} \exp\left\{-\frac{G^2(\theta)}{2\alpha}\right\} \exp\left\{-\frac{1}{2\alpha}(\beta - \hat{\beta})^T X^T V(\theta)^{-1} X (\beta - \hat{\beta})\right\}. \qquad (9)$$

Integrating out with respect to $\beta$, we obtain

$$\pi(\alpha, \theta \mid Z) \propto \pi(\theta)\, \alpha^{-(n-q)/2-1} |V(\theta)|^{-1/2} |X^T V(\theta)^{-1} X|^{-1/2} \exp\left\{-\frac{G^2(\theta)}{2\alpha}\right\}, \qquad (10)$$

and a further integration with respect to $\alpha$ leads to

$$\pi(\theta \mid Z) \propto \pi(\theta)\, |V(\theta)|^{-1/2} |X^T V(\theta)^{-1} X|^{-1/2}\, G^2(\theta)^{-(n-q)/2}. \qquad (11)$$

Using the prior $\pi(\beta, \alpha, \theta) \propto \pi(\theta)/\alpha$ for some prior $\pi(\theta)$:

The posterior distribution of $\beta$ given $Z$, $\alpha$ and $\theta$ is multivariate normal with mean $\hat{\beta} = \hat{\beta}(\theta)$, i.e. the GLS estimator of $\beta$ given the covariance matrix $V(\theta)$, and covariance matrix $\alpha (X^T V(\theta)^{-1} X)^{-1}$.

The posterior distribution of $\alpha$ given $Z$ and $\theta$ is

$$\alpha \mid Z, \theta \sim \frac{G^2(\theta)}{\chi^2_{n-q}}.$$

PRIORS

The choice of $\pi(\theta)$ is rather arbitrary. Improper priors (i.e. functions that integrate to $\infty$) often yield proper posteriors, but this is not the case here. An improper flat prior (not integrable at $\infty$) for the range and smoothness parameters of the covariance gives improper posteriors (e.g. Stein 1999). We recommend using vague informative priors for these two parameters.
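As a rough numerical illustration of the integrated posterior (11), the sketch below evaluates $\log \pi(\theta \mid Z)$, up to an additive constant, for the exponential covariance model (3) with $\theta = (\phi, R)$. It is a minimal sketch and not the authors' implementation: the function names `exp_corr_matrix` and `log_marginal_posterior`, the `log_prior` argument, the flat default prior, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def exp_corr_matrix(coords, phi, R):
    """V(theta) for model (3): 1 + phi on the diagonal, exp(-d_ij / R) off it."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    V = np.exp(-d / R)
    V[np.diag_indices_from(V)] = 1.0 + phi
    return V

def log_marginal_posterior(theta, Z, X, coords, log_prior=lambda th: 0.0):
    """log pi(theta | Z) from (11), up to an additive constant.

    log_prior is a hypothetical user-supplied function. The flat default is
    only sensible on a bounded grid, since a flat prior on the range
    parameter can give an improper posterior.
    """
    phi, R = theta
    n, q = X.shape
    V = exp_corr_matrix(coords, phi, R)
    L = np.linalg.cholesky(V)                      # V = L L^T
    A = np.linalg.solve(L, X)                      # L^{-1} X
    b = np.linalg.solve(L, Z)                      # L^{-1} Z
    XtVinvX = A.T @ A                              # X^T V^{-1} X
    beta_hat = np.linalg.solve(XtVinvX, A.T @ b)   # GLS estimator of beta
    resid = b - A @ beta_hat
    G2 = resid @ resid                             # G^2(theta), equation (7)
    logdet_V = 2.0 * np.sum(np.log(np.diag(L)))
    _, logdet_XtVinvX = np.linalg.slogdet(XtVinvX)
    return (log_prior(theta) - 0.5 * logdet_V - 0.5 * logdet_XtVinvX
            - 0.5 * (n - q) * np.log(G2))

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(40, 2))
X = np.column_stack([np.ones(40), coords[:, 0]])
V_true = exp_corr_matrix(coords, phi=0.2, R=20.0)
Z = X @ np.array([1.0, 0.05]) + np.linalg.cholesky(2.0 * V_true) @ rng.standard_normal(40)

# Evaluate the log posterior over a small grid of range values, phi held fixed.
grid = [(0.2, R) for R in (5.0, 10.0, 20.0, 40.0, 80.0)]
log_post = [log_marginal_posterior(th, Z, X, coords) for th in grid]
```

Working through the Cholesky factor avoids forming $V(\theta)^{-1}$ explicitly; the same factor gives both $\log|V(\theta)|$ and the generalized residuals. Replacing the flat default by a proper `log_prior` follows the recommendation above to use vague informative priors for the range and smoothness parameters.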
Figure 4: Posterior for sill and range for fish data.

Figure 5: Semivariogram estimation using a Bayesian approach, compared to REML and WNLS (x-axis: distance; variogram parameters summarized by posterior mean, posterior median and posterior mode).

UNC Biostatistics, October 26, 2005

SPATIAL ASSOCIATION BETWEEN FINE PARTICLES AND HUMAN HEALTH EFFECTS

M. Fuentes, H.R. Song, S. Ghosh, B. Reich (Statistics Department, NCSU), D. Holland (US EPA), and J. Davis (Marine, Earth and Atmospheric Sciences, NCSU)

What is PM?

Particulate matter (PM) is a collective name for fine solid or liquid particles added to the atmosphere by processes at the earth's surface. Particulate matter includes dust, smoke, soot, pollen and soil particles.

PM has been linked to a range of serious cardiovascular and respiratory health problems. Some of the recent epidemiologic studies suggest that exposures to PM may result in tens of thousands of excess deaths per year, and many more cases of illness, among the US population.

Fine particulate matter (PM2.5) consists of particles with aerodynamic diameter less than or equal to 2.5 micrometers. In 1997 EPA established standards for PM2.5, since these particles caused the greatest health concern because of their ability to penetrate into the respiratory tract.

Main constituents of the PM mixture: sulfate, ammonium, nitrate, total carbon, and crustal material that contains calcium, aluminum, silicon, magnesium and iron.

Figure 1: Annual average PM2.5 concentrations (μg/m³) and particle type (sulfate, ammonium, nitrate, total carbon, crustal) in 2002.

Objectives

Our main objective is to quantify uncertainties about the impacts of fine PM exposure on mortality. We use the best available spatial PM2.5 information from monitoring networks, output of numerical models and MODIS satellite data to better estimate the
hypothesized increased rate of mortality with increased PM levels. We take into account ozone, weather, socio-economic factors and other confounders.

In addition, a spatial-temporal analysis is used to identify the constituents of the PM mixture that are the most significant in causing mortality.

DATA

There are two main monitoring networks for ambient PM2.5 and its constituents in the US:

o The Federal Reference Method (FRM) monitoring network. Following the guidance on monitoring published by EPA, a national network of about 1000 PM2.5 monitoring sites has been established.

o The Interagency Monitoring of Protected Visual Environments (IMPROVE) network program includes measurement of the composition and concentration of the fine particles at 156 national parks and wilderness areas, as required by the Clean Air Act.

Figure 2: Annual PM2.5 values (μg/m³), with legend classes including 15 < mean < 20 and 20 < mean. The National Ambient Air Quality Standard is 15 μg/m³; locations reporting a value above this threshold are out of compliance.

Satellite data for air pollution, for the first time: Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data. The MODIS Aerosol Product monitors the ambient aerosol optical thickness. Prior to MODIS, satellite measurements were limited to reflectance measurements in one (GOES, METEOSAT) or two (AVHRR) channels. There has been no real attempt to retrieve aerosol content over land on a global scale.

We use MODIS daily satellite data as an important covariate in our statistical model for fine PM and its components. The spatial resolution is 10 km × 10 km.

[Figure: regional air quality model output map.]

An important source of information on pollution levels, in particular of fine particles over large areas, can be obtained from the regional scale air quality models.

Our approach adjusts for meteorology, ozone and other confounding influences. Weather variables: Average Daily Wind Speed, Average Daily Dew Point Temperature, Max and Min Daily
Temperature, and Average Daily Station Pressure, based on eight 3-hourly observations per day.

Figure 3: Weather stations (363).

The National Center for Health Statistics has given access to mortality data (natural deaths and cardiovascular mortality). Monthly mortality counts have been obtained from the National Center for Health Statistics, with counts of mortality by county for all counties in the US for 1999 and 2000.

Figure 4: Mortality counts (natural deaths) per county for June 2000.

[Diagram of the two-stage framework. Stage 1, the spatiotemporal model, combines numerical model output (CMAQ), satellite data (MODIS), monitoring data (IMPROVE) and weather data to give ambient pollution concentrations. Stage 2, the environmental health model, combines these concentrations with socio-economic and demographic data, health data and weather data to give the relative risk of mortality and morbidity.]

Generalized Poisson Regression

Dominici, Samet and Zeger (2000) analyzed data for the 20 largest US cities, and then Daniels et al. (2000) extended this analysis to estimate the PM mortality dose-response curve. This type of spatial analysis has been done only for PM10, because until recently daily PM2.5 data were available at only a very small number of stations.

We plan to make a substantial contribution in addressing major limitations of previous studies by studying the association between mortality and fine particles (PM2.5), not at a few locations or just for the largest US cities, but for the entire coterminous US, taking into account the spatial variability of the PM field. We use a generalized Poisson regression (GPR) model.

The GPR model subsumes the standard Poisson model and allows for over- as well as under-dispersion. Let a response variable $Y_j(t)$, which is a count, be a generalized Poisson random variable (Consul 1989). To model mortality data, $Y_j(t)$ is defined as the number of non-accidental deaths in county $j$ and month $t$. Following Famoye (1993), the probability function of a generalized Poisson random variable is given by

$$\Pr[Y = y] = \left(\frac{\mu}{1 + \alpha\mu}\right)^{y} \frac{(1 + \alpha y)^{y-1}}{y!} \exp\left\{-\frac{\mu(1 + \alpha y)}{1 + \alpha\mu}\right\}, \qquad y = 0, 1, \ldots,$$

where
mina0 u 2 0 and 04 are the parameters of the distribution 16 Notice that for 04 lt 0 the GPoi distribution is truncated at We denote Y N GPd u GPd 0u P0239u Where P0239u denotes a regular Poisson distribution With mean u 17 The following result provides statistical interpretation of the parameters u and oz Lemma IfY N GPozu then EY u and VarY LL1 04102 From the above result it follows that When 04 gt 0 VarY gt E Y and for 04 lt 0 VarY lt E Y thus representing the over and under dispersion respectively 18 A GPHB model for count data Gamma forj1J t 1T Xi an vfflttgtajlttgt vat v1jlttgtv1jlttgt EN 039 mm aw N Z Bjjt jlta 2 t 7 j j 7 jtT 1jta v 1jt 19 X j vector of county speci c covariates including weather and ozone The complex relations involving the weather data are modeled using a basis expansion with priors on the coe icients Vjt vector of exposure covariate in county j qut captures the regional clustering at time t We adopt a dynamic MCAR prior Zst is the the total PM mass at location 8 EU is the proportion of total PM mass explained by the ith PM component in county j 2O The MCAR prior Gelfand and Vounatsou 2002 used here is allowed to change over time by using dynamic coe icients 1 1 qgtjlttgt jlttgt N W Z Bjjtht em 7 m 7 7 We impose Z qbij 0 for 7L 1 I Banerjee and Carlin 2003 and Jin and Carlin 2003 have used MCAR for cancer studies We extend their models to include dynamic parameters eg E t that change with time Bjjt are the adjacency coe icients Bjjt 1 ifj is adjacent to j Bjjt 0 otherwise 23 t is a positive de nite matrix that explains the conditional variability and cross covariance relationships at time 73 between the different exposure variables sulfate ammonium nitrate total carbon and crustal material that contain calcium aluminum silicon magnesium and iron given the neighboring sites 21 For computational convenience in this study we model the precision matrix of the MIAR model as Wt D 32500 1 where I t is a symmetric pd JI gtlt JI matrix D Diagm is 
a $J \times J$ matrix, and the remaining factor is a p.d. and symmetric $I \times I$ matrix that explains the variation across the $I$ PM components.

An alternative approach to the MIAR is to use a geostatistical framework to explain the cross dependency, working with the covariance matrix instead of with the precision matrix.

LONG TERM TREND

The complex relations involving the weather data are usually modeled using Generalized Additive Models (GAM). Instead of putting a prior on the number of terms and then using RJMCMC, we propose a different approach.

We write the long term trend $f(t)$ as a countable (or a large finite) Lagrange polynomial, wavelet or Fourier basis expansion with some chosen functions $B_i(t)$, whose coefficients have to be determined:

$$f(t) = \sum_{i} c_i B_i(t).$$

Instead of putting a prior on the number of terms (RJMCMC), we put independent priors on the coefficients $c_i$, such that the prior for $c_i$ is a mixture of a measure degenerate at 0 (Dirac delta) and a normal distribution $N_i$:

$$c_i \sim w_i \delta_0 + (1 - w_i) N_i.$$

The weight $w_i$ attached to 0 increases with increasing $i$. The number of non-zero terms then gets a prior implicitly induced from those on the $c_i$'s.

Modeling $Z(s, t)$ (fine PM)

[Diagram relating the TRUE VALUES to the MODELS OUTPUT (bias parameter, noise parameter), to the GROUND DATA (measurement error parameter), and to large-scale structure (MET VARIABLES) and a short-scale structure parameter.]

FRM and IMPROVE measurements are not ground truth: there is measurement error. Thus we assume there is an underlying (unobserved) field $Z(s, t)$ which measures the true values of PM2.5 at location $s$.

We denote the FRM observation by $\hat{Z}_F(s, t)$:

$$\hat{Z}_F(s, t) = Z(s, t) + e_F(s, t),$$

where $e_F(s, t) \sim N(0, \sigma_F^2)$ represents measurement error at location $s$. We use a similar representation for the observed IMPROVE measurements $\hat{Z}_I(s, t)$:

$$\hat{Z}_I(s, t) = Z(s, t) + e_I(s, t),$$

where $e_I(s, t) \sim N(0, \sigma_I^2)$ represents the measurement error at location $s$.

We model the output $\tilde{Z}$ of the EPA deterministic numerical models as follows:

$$\tilde{Z}(s, t) = a(s, t) + b(s, t) Z(s, t) + \delta(s, t). \qquad (1)$$

Here $a(s, t)$ measures the additive bias of the air quality models at location $s$ and $b(s, t)$ the multiplicative bias. The process $\delta(s, t) \sim N(0, \sigma_\delta^2)$ explains the random deviation at location $s$ and
time $t$ with respect to the underlying true process $Z(s, t)$.

Since the outputs of CMAQ are not point measurements but areal estimations in subregions $R_1, \ldots, R_L$ that cover the domain $D$, we have

$$\int_{R_l} \tilde{Z}(s, t)\, ds = \int_{R_l} a(s, t)\, ds + b \int_{R_l} Z(s, t)\, ds + \int_{R_l} \delta(s, t)\, ds \qquad (2)$$

for $l = 1, \ldots, L$.

$Z$ is unobservable and is modeled as

$$Z(s, t) = \mu(s, t) + \epsilon(s, t).$$

The spatial-temporal trend $\mu(s, t)$ is a function of MODIS satellite data and some meteorological covariates $f_1, \ldots, f_p$ with unknown coefficients $\beta$:

$$\mu(s, t) = \sum_{i=1}^{p} \beta_i(s, t) f_i(s, t).$$

The coefficients vary in space and time. We assume $Z(s, t)$ has zero-mean uncorrelated errors $\epsilon(s, t)$.

The coefficients can be modeled using a simple dynamic spatiotemporal model with a purely temporal component $\gamma_t$ and a spatiotemporal component $\eta(s, t)$, where $\eta$ is a spatial-temporal correlated error term.

For spatial prediction, we simulate values of $Z$ from the posterior predictive distribution $P(Z \mid \hat{Z}, \tilde{Z})$, where $\hat{Z}$ are all available data from the IMPROVE and FRM monitoring stations, i.e. $\hat{Z} = (\hat{Z}_F, \hat{Z}_I)$. The output of the numerical models for the entire country is denoted as $\tilde{Z}$.

For measuring the association between PM and mortality, we use values of the parameters $\gamma$ and $\beta$ estimated from their posterior distribution given the data.

Prior for $\beta$: We treat $\beta(s)$ as a spatial process with a multivariate normal prior.

Application

In the following application we present only spatial analyses.

Mortality: natural deaths per county.

Ozone: Apart from the weather variables, we also include ozone as a confounder in our model. We study the changes in the association between PM and mortality due to ozone.

Linear covariates: ozone, max temp, min temp, dew point, wind speed and elevation.

Age and ethnic group are intercept-based covariates. We also allow interaction between these two latter covariates and the exposure variable.

The age covariate is treated as a categorical variable with 3 groups: 0-14, 15-64, and 65 or older. We study 3 ethnic groups: Caucasian, African American and Hispanic.

PRIORS

The log of the relative risk parameter has a multivariate spatial prior. The MIAR prior for the speciated components assumes a separable
covariance. The dispersion parameter in the GP was not significantly different from zero (Poisson model).

Figure 6: Weather variables: max temperature, June 2000. We also use Min Temp, Wind Speed, Dew Point and Pressure as weather covariates.

EXPOSURE

Figure 7: PM2.5 total mass (μg/m³). This is an average of the daily PM2.5 values for the month of June 2000.

Figure 8: Bayesian estimates (mean of posterior distribution) of log relative rates of mortality (percent increase in mortality per increase of 10 μg/m³ of PM2.5 concentrations) in June 2000.

Figure 9: Bayesian estimates (mean of posterior distribution) of log relative rates of mortality (percent increase in mortality per increase of 10 μg/m³ of PM10 concentrations) in June 2000.

Figure 10: Posterior distribution for the log of relative rate of mortality due to NO3, by geographic regions as defined by the US Census. Region 1: New England; 2: Middle Atlantic; 3: Midwest, East North Central; 4: Midwest, West North Central; 5: South, South Atlantic; 6: South, South Central; 7: West, Mountain; 8: West, Pacific; 9: West, Southern California.

Figure 11: Posterior distribution for the log of relative rate of mortality due to SO4, by geographic regions as defined by the US Census. Region 1: New England; 2: Middle Atlantic; 3: Midwest, East North Central; 4: Midwest, West North Central; 5: South, South Atlantic; 6: South, South Central; 7: West, Mountain; 8: West, Pacific; 9: West, Southern California.

Figure 12: (a) Mean of the posterior distribution for the log of relative risk of death due to PM2.5 for the Caucasian population in California. (b) African American
population; (c) Hispanic population.

Figure 13: Expected mortality, June 2000.

DIAGNOSTICS

Figure 14: Selected locations for cross-validation and model diagnostics (96 locations).

Figure 15: Model diagnostics for mortality (original vs. predicted). 95% credible intervals at location i, eliminating observation i. Good calibration: only 4% of the time the truth does not lie in the interval.

Figure 16: Model diagnostics for PM (measured vs. predicted). 95% credible intervals at location i, eliminating observation i. Total number of observations is 909. Good calibration: only 4% of the time the truth does not lie in the interval.

Model comparison by using the root mean squared prediction error (RMSPE); all models have the same covariates:

• GLM, Poisson family: RMSPE 8.57
• Negative Binomial with dispersion parameter: RMSPE 2.44
• Poisson framework with spatially varying risk parameter: RMSPE 0.99

Test for overdispersion: No evidence of overdispersion (dispersion parameter for the negative binomial not significantly different from 0).

Test for under/overdispersion: No evidence of under/overdispersion (dispersion parameter for the GP not significantly different from 0).

DIC is smaller for the Poisson model.

Conclusions

• On average, the log relative rate of monthly deaths associated with a unit change in the average total PM2.5 mass (percent increase in mortality per increase of 10 $\mu g/m^3$ of pollutant) is 7%.
• The RR changes significantly with location, from 2% (e.g. WY) to 11% (e.g. NY).
• The RR due to PM10 is about half the RR due to fine PM.
• In the Western US the NO3 and crustal components seem to have more impact on human health than the other PM components. In the Eastern US the SO4 and NH4 components explain most of the PM effect. All this might suggest a need for different regulations (air quality standards) in the Eastern vs. the Western states.
• Influence of ozone: Ozone does not seem to have a significant impact on estimated mortality rates and RR parameters.

Our current work

Adverse health effects are associated with inter-individual variability in
exposures that are not a simple function of ambient PM levels.

We have access to an exposure simulator model called the Stochastic Human Exposure and Dose Simulation for Particulate Matter (SHEDS-PM), which is a population exposure and dose model for particulate matter developed by the US EPA's National Exposure Research Laboratory. The SHEDS models are multi-pathway, probabilistic and physically based human exposure models that simulate variability and uncertainty in short-term and cumulative human exposure and dose.

In our statistical framework SHEDS-PM is used to provide a more accurate estimate of human exposures than simply assuming that exposures are directly proportional to ambient PM concentrations.

Diagram of the three-stage framework:

• Stage 1, spatiotemporal model. Inputs: numerical model (CMAQ); monitoring data (FRM, IMPROVE, STN); satellite data (MODIS); weather data. Output: ambient pollution concentrations.
• Stage 2, exposure model. Inputs: ambient pollution concentrations; exposure factors; human activity distributions; PM panel studies data; census data. Output: population exposure.
• Stage 3, environmental health model. Inputs: population exposure; health data; socioeconomic and demographic data; weather data. Output: relative risk of mortality and morbidity.

SPATIAL STATISTICS
Montserrat Fuentes
Statistics Department, NCSU
fuentes@stat.ncsu.edu
http://www.stat.ncsu.edu/~fuentes

Intro to spatial data

We classify spatial data into one of three basic types:

• Point-referenced data. The variable of interest varies continuously over a domain D and we observe it at some point locations in D.
• Areal (lattice) data. The domain D is partitioned into a finite number of areal units. We observe the average or total of some variable of interest for each partition.
• Point pattern data. Now D is random; its index set gives the locations of random events that are the spatial point pattern. The variable of interest indicates where the event occurs, or possibly gives some additional covariate information, producing a marked point pattern process.

Point-referenced data

Often called geocoded or geostatistical data.
Example: pollution concentrations at some monitoring stations.

Areal data

Often called lattice data. Example: mean income values for each county in Minnesota.

Some spatial data sets feature both point- and areal-level data. Example of a dataset: ozone values from monitoring stations in Atlanta and number of asthma cases in each zip code in Atlanta.

Point pattern data

The response is often fixed (occurrence of the event) and only the locations where it occurs are thought of as random. Example: residences of persons suffering from a particular disease. In some cases this information might be supplemented by age or other covariate information, producing a marked point pattern.

Point-level models

The location index $s$ = (longitude, latitude) varies continuously over $D$; $Y(s)$ is the process of interest. Suppose we are given observations $Y \equiv \{Y(s_i)\}$ at some known locations $s_i$, $i = 1, \dots, n$. We assume

$$Y \mid \mu, \theta \sim N_n(\mu 1, \Sigma(\theta)),$$

where $N_n$ is an n-dimensional normal, $\mu$ is the mean and $(\Sigma(\theta))_{ij}$ gives the covariance between $Y(s_i)$ and $Y(s_j)$. The covariance matrix $\Sigma$ measures the strength of the spatial dependency between any pair $Y(s_i)$ and $Y(s_j)$.

Areal models

The geographic regions or blocks (zip codes, counties) are denoted by $B_i$, and the data are typically sums or averages of variables over these blocks. To explain spatial dependency we define a neighborhood structure. Once this is defined, models resembling autoregressive time series models are considered. Two very popular models: the SAR (simultaneously autoregressive) model and the CAR (conditionally autoregressive) model.

Writing $Y_i \equiv Y(B_i)$, we assume independent errors distributed $N(0, \sigma^2)$ and impose the CAR model

$$Y_i \mid y_{(-i)} \sim N\Big(\mu_i + \sum_j a_{ij}(y_j - \mu_j),\; \tau_i^2\Big),$$

where $y_{(-i)} = \{y_j : j \neq i\}$, $\tau_i^2$ is the conditional variance, and the $a_{ij}$ are constants such that $a_{ii} = 0$ for $i = 1, \dots, n$.

The CAR model is used to incorporate spatial correlation through a vector of spatially varying random effects $\phi = (\phi_1, \dots, \phi_n)$.

Point process models

In a point process the spatial domain D is itself random; the elements of the index set D are the locations of random events that constitute
the spatial point pattern. $Y(s)$ is 1 for all $s \in D$, indicating occurrence of the event, but it may also provide additional covariate information, in which case the data constitute a marked point process. The question of interest is whether the data are clustered or not.

We use a Poisson process with mean $\lambda$ to count the number of observations in D. The expected number of occurrences in a region $A$ is $\lambda |A|$.

Cartography: Map projections

We study the geometry of, and determine distances on, the surface of the earth. A map projection is a systematic representation of all or part of the surface of the earth on a plane. This comprises lines called meridians (longitudes) and parallels (latitudes). Since the sphere cannot be flattened onto a plane without distortion, the strategy is to use an intermediate surface (a developable surface) that can be flattened. The sphere is first projected onto this surface, which is then laid out as a plane. Commonly used surfaces: the cylinder, the cone and the plane itself.

The basic idea for map projection: consider a sphere with coordinates $(\lambda, \phi)$ for longitude and latitude, and construct a coordinate system $(x, y)$ so that

$$x = f(\lambda, \phi), \qquad y = g(\lambda, \phi),$$

where $f$ and $g$ are appropriate functions to be determined, depending on the properties we want our map to possess.

Equal-area maps: Used to display areal-referenced data. An example is the sinusoidal projection.

Sinusoidal projection: Obtained by specifying $\partial g / \partial \phi = R$ ($R$ = radius of the earth). This imposes equally spaced straight lines for parallels and results in the following analytical expressions for $f$ and $g$:

$$f(\lambda, \phi) = R \lambda \cos\phi, \qquad g(\lambda, \phi) = R\phi.$$

Both the Equator and the central meridian are standard lines; thus the whole map is twice as wide as tall.

Another popular equal-area projection, with equally spaced straight lines for the meridians, is the Lambert cylindrical projection, given by

$$f(\lambda, \phi) = R\lambda, \qquad g(\lambda, \phi) = R \sin\phi.$$

This projection's perspective is easily visualized by rolling a flexible sheet around the globe and projecting each point horizontally onto the
tube so formed. In other words, light rays shoot from the cylinder's axis towards its surface, which is afterwards cut along a meridian and unrolled. Like most cylindrical projections, it is quite acceptable for the tropics but practically useless at polar regions, which are rather compressed, resulting in a map much broader than tall. Again, like in other cylindrical projections, deformation is uniform along the same parallel.

Properties

• Meridians are equally spaced.
• Parallels get closer near the poles.
• Parallels are placed at heights proportional to the sine of the latitude.
• True scale at the equator.
• History:
  - Invented in 1772 by Johann Heinrich Lambert, along with 6 other projections.
  - Prototype for Behrmann and other modified cylindrical equal-area projections.

Figure 1: Schematic development of Lambert's equal-area cylindrical projection with a tangent cylinder.

Mercator Projection

$$f(\lambda, \phi) = R\lambda, \qquad g(\lambda, \phi) = R \ln \tan\Big(\frac{\pi}{4} + \frac{\phi}{2}\Big).$$

The great Flemish cartographer Gerhard Kremer became famous with the Latinized name Gerardus Mercator. His projection was revolutionary for navigation: taking a journey direction and keeping a constant bearing is enough to get to one's destination.

Figure: Mercator map, conventional equatorial aspect (graticule spacing 10 degrees); map arbitrarily clipped at parallels 85 deg N and 85 deg S.

Properties

• Conformal: it is a projection for which local (infinitesimal) angles on a sphere are mapped to the same angles in the projection.
• Parallels unequally spaced; distance increases away from the equator, directly proportional to increasing scale.
• Loxodromes or rhumb lines are straight (rhumbs are curves that intersect the meridians at a constant angle).
• Used for navigation and for regions near the equator.
• History:
  - Invented graphically in 1569 by Gerardus Mercator (Flanders).
  - Standard for maritime mapping in the 17th and 18th centuries.
  - Used for mapping the world, oceans and equatorial regions in the 19th century.
  - Used for mapping the world, by the US Coast and Geodetic Survey, and for other planets in the 20th century.
  - Much criticism recently.

Northings and Eastings

Map projections lead to
complex equations relating longitude and latitude to the positions of points on a given map. Thus rectangular grids have been developed in which each point is designated merely by its distance from two perpendicular axes on a flat map. The y-axis is the chosen central meridian, with y increasing north, and the x-axis is perpendicular to the y-axis at a latitude of origin on the central meridian, with x increasing east.

The x coordinates are called eastings and the y coordinates northings. The grid lines do not coincide with any meridians and parallels except for the central meridian and the equator.

Universal Transverse Mercator Projection (UTM)

The world is divided into 60 north-south zones, each covering a strip six degrees wide in longitude. These zones are numbered consecutively, beginning with Zone 1 between 180 degrees and 174 degrees west longitude, going eastward to Zone 60 between 174 and 180 degrees east longitude.

The northing values are measured continuously from zero at the Equator in a northerly direction. To avoid negative numbers for locations south of the Equator, we assign the Equator an arbitrary false northing value of 10,000,000 meters. A central meridian through the middle of each 6 degree zone is assigned an easting value of 500,000.

The northing of a point is the value of the nearest UTM grid line south of it plus its distance north of that line; its easting is the value of the nearest UTM grid line west of it plus its distance east of that line.

The UTM system was introduced in the 1940s by the US Army. It is widely used in topographic and military mapping.

Spatial modeling of point-level data often requires computing distances between points on the earth's surface. Thus we can wonder about a planar map projection which would preserve distances between points. The existence of such a map is precluded by Gauss' Theorema Egregium in differential geometry. Projections can preserve area or shape, but distances are always distorted.

Calculating distance on
the earth's surface

We must account for the curvature of the earth when computing distances. We find the shortest distance (geodesic) between two points $P_1 = (\theta_1, \lambda_1)$ and $P_2 = (\theta_2, \lambda_2)$ (latitude, longitude). The solution is

$$D = R\phi,$$

where $R$ is the radius of the earth and $\phi$ is the angle (measured in radians) satisfying

$$\cos\phi = \sin\theta_1 \sin\theta_2 + \cos\theta_1 \cos\theta_2 \cos(\lambda_1 - \lambda_2).$$

The geodesic is the arc of the great circle (a circle with radius equal to the radius of the earth) joining the two points.

2.4 Kriging: Prediction and Interpolation

We now turn to the central topic of this subject: the use of spatial covariance models for prediction and interpolation. The name frequently used for this process is kriging, though as commonly used that term refers only to the construction of a spatial predictor in terms of known model parameters, whereas our approach will ultimately take the model parameters into account as well, and in that sense is more general than traditional kriging.

The problem may be stated in the following form: given observations $z(s_1), z(s_2), \dots, z(s_n)$ of a random field, predict the value $z(s_0)$ for some new site $s_0$. A generalization is to predict the joint value at several points, or an integral such as $Z(A) = \int_A z(s)\,ds$ for some set $A$; if $z$ measures the density of an ore then $Z(A)$ measures the total quantity of ore over a region. However, we shall see that these problems are generally dealt with as a direct generalization of the methodology for a single point, so we concentrate on that in our initial discussion.

We shall take three approaches to this: an approach based on Lagrange multipliers, an approach based on conditional inference, and a Bayesian approach. The Lagrange multiplier approach is the most direct derivation of the kriging estimator and is the one most commonly given in textbooks, but it does not give so much insight into what is going on. The conditional inference approach extends the ideas involved in REML estimation (section 2.2.5) and shows how the kriging predictor may be
derived as a conditional mean in an appropriate space of predictors. Finally the Bayesian approach is given, which leads to the same answers as the standard kriging predictor when the model parameters ($\theta$ in the notation of sections 2.2.4-2.2.6) are known, but it also extends to the case where these parameters are unknown. The reader new to the subject is recommended to pick one of the three approaches and work through the formulae in detail; once the method is fully understood from one approach, it is relatively straightforward to check that the other approaches lead to the same answers.

2.4.1 Lagrange multiplier approach

Let us write the vector $Z = (z(s_1), \dots, z(s_n))^T$ and $z_0 = z(s_0)$. We need to know the joint covariance matrix of $Z$ and $z_0$; let us suppose

$$\operatorname{Cov}\begin{pmatrix} Z \\ z_0 \end{pmatrix} = \begin{pmatrix} \Sigma & \tau \\ \tau^T & \sigma_0^2 \end{pmatrix} \qquad (2.51)$$

where $\Sigma$ is the covariance matrix of $Z$, $\sigma_0^2$ is the variance of $z_0$ and $\tau$ is the vector of cross-covariances between $Z$ and $z_0$. For some of our calculations, following (2.31), we shall write

$$\Sigma = \alpha V(\theta), \qquad \tau = \alpha w(\theta), \qquad \sigma_0^2 = \alpha v_0(\theta) \qquad (2.52)$$

in terms of the scale parameter $\alpha$ and functions $V$, $w$ and $v_0$ of a finite-dimensional parameter $\theta$.

The basic model will be assumed to be as in (2.30), with $Z = X\beta + \eta$ for some matrix of covariates $X$, and we also assume $z_0 = x_0^T\beta + \eta_0$ for some given vector $x_0$ (the vector of covariates at $z_0$, or at $s_0$ if we are thinking in terms of the original stations). Both $\eta$ and $\eta_0$ represent random errors with mean 0. This is the universal kriging problem; the special case

$$E\{Z\} = \mu 1, \qquad E\{z_0\} = \mu \qquad (2.53)$$

in which $1$ denotes the n-vector of ones and $\mu$ is some overall constant, is the ordinary kriging problem, in which there is an unknown common mean but no other regression coefficient.

We consider predictors of the form

$$\hat z_0 = \lambda^T Z \qquad (2.54)$$

subject to the constraint

$$\lambda^T X = x_0^T. \qquad (2.55)$$

The reason for the constraint (2.55) will appear momentarily.

Let us consider the prediction error in (2.54). We may write

$$z_0 - \hat z_0 = x_0^T\beta + \eta_0 - \lambda^T(X\beta + \eta) = \eta_0 - \lambda^T\eta \qquad (2.56)$$

where we have used the constraint (2.55); in other words, the reason
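for this constraint is to make the procedure work without assuming $\beta$ is known. The reader might at this point be wondering why we make such a big fuss about $\beta$ being unknown when we are implicitly assuming that $\theta$ (or the covariances $\Sigma$ and $\tau$) are known; this is a valid point to raise, but we return to it later.

If we assume (2.55) and hence (2.56), the mean squared prediction error becomes

$$E\{(z_0 - \hat z_0)^2\} = \sigma_0^2 - 2\lambda^T\tau + \lambda^T\Sigma\lambda. \qquad (2.57)$$

We are therefore led to the following constrained optimization problem: minimize (2.57) subject to (2.55).

Solution to the constrained optimization problem

Consider the Lagrangian

$$L = \sigma_0^2 - 2\lambda^T\tau + \lambda^T\Sigma\lambda - 2(\lambda^T X - x_0^T)\nu \qquad (2.58)$$

where $2\nu$ is a vector of Lagrange multipliers. The optimal $\lambda$ will be attained at some stationary point of $L$. Differentiating with respect to the components of $\lambda$ in (2.58), this is achieved when

$$0 = -\tau + \Sigma\lambda - X\nu,$$

or in other words $\lambda = \Sigma^{-1}(\tau + X\nu)$. To find $\nu$, substitute back in (2.55) to get

$$\nu = (X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau).$$

The final result is

$$\lambda = \Sigma^{-1}\tau + \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) \qquad (2.59)$$

or the predictor

$$\hat z_0 = \lambda^T Z = (x_0 - X^T\Sigma^{-1}\tau)^T\hat\beta + \tau^T\Sigma^{-1}Z, \qquad (2.60)$$

where $\hat\beta = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Z$ is the generalized least squares estimator of $\beta$.

The resulting prediction error variance (2.57) becomes

$$\begin{aligned}
\sigma_0^2 - 2\lambda^T\tau + \lambda^T\Sigma\lambda
&= \sigma_0^2 - 2\tau^T\Sigma^{-1}\tau - 2\tau^T\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) \\
&\quad + \tau^T\Sigma^{-1}\tau + 2\tau^T\Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) \\
&\quad + (x_0 - X^T\Sigma^{-1}\tau)^T(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) \\
&= \sigma_0^2 - \tau^T\Sigma^{-1}\tau + (x_0 - X^T\Sigma^{-1}\tau)^T(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau). \qquad (2.61)
\end{aligned}$$

Since it will come in useful later, we also give the extension of this calculation to handle the prediction covariance between two stations. Suppose instead of a single unobserved $z_0$ we have to predict two variables

$$z_a = x_a^T\beta + \eta_a, \qquad z_b = x_b^T\beta + \eta_b,$$

corresponding to two unobserved stations $s_a$ and $s_b$. Suppose the joint covariance matrix is given by

$$\operatorname{Cov}\begin{pmatrix} Z \\ z_a \\ z_b \end{pmatrix} = \begin{pmatrix} \Sigma & \tau_a & \tau_b \\ \tau_a^T & \sigma_{aa} & \sigma_{ab} \\ \tau_b^T & \sigma_{ab} & \sigma_{bb} \end{pmatrix} \qquad (2.62)$$

and the optimal predictors are $\hat z_a = \lambda_a^T Z$, $\hat z_b = \lambda_b^T Z$, where

$$\begin{aligned}
\lambda_a &= \Sigma^{-1}\tau_a + \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}(x_a - X^T\Sigma^{-1}\tau_a), \\
\lambda_b &= \Sigma^{-1}\tau_b + \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}(x_b - X^T\Sigma^{-1}\tau_b). \qquad (2.63)
\end{aligned}$$

The mean squared prediction error is then

$$\begin{aligned}
E\{(z_a - \lambda_a^T Z)(z_b - \lambda_b^T Z)\}
&= \sigma_{ab} - \lambda_a^T\tau_b - \tau_a^T\lambda_b + \lambda_a^T\Sigma\lambda_b \\
&= \sigma_{ab} - \tau_a^T\Sigma^{-1}\tau_b + (x_a - X^T\Sigma^{-1}\tau_a)^T(X^T\Sigma^{-1}X)^{-1}(x_b - X^T\Sigma^{-1}\tau_b).
\end{aligned}$$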
(2.64)

Essentially the same calculations are given in a number of other books on spatial statistics, e.g. pp. 48-49 of Ripley (1981) or pp. 154-155 of Cressie (1993). Note that the formulae are sometimes expressed in terms of the variogram rather than the covariance matrices, but at least for stationary processes it is straightforward to pass from one to the other. Stationarity itself plays no role in the prediction formulae we have derived, though it is usual, for the reasons explained in earlier sections, to work with either stationary or intrinsically stationary processes.

2.4.2 Conditional inference approach

This is conditional in the sense that it exploits the decomposition, implicit in the REML approach, of $Z$ into a component that is essentially $\hat\beta$ and a component which is orthogonal to that; by conditioning on the latter, we can derive the kriging predictor from this point of view.

As a first step, we assume $\beta$ is known. We use the following classical result of multivariate analysis (see e.g. Mardia, Kent and Bibby (1979), p. 63): if we consider a partitioned multivariate normal vector

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left[\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right]$$

then the conditional distribution of $X_1$ given $X_2$ is normal with mean

$$\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2) \qquad (2.65)$$

and variance

$$\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. \qquad (2.66)$$

Applying this result with $\Sigma_{11} = \sigma_0^2$, $\Sigma_{12} = \tau^T$, $\Sigma_{22} = \Sigma$ in our previous notation, we deduce that the conditional distribution of $z_0$ given $Z$ is normal with mean

$$x_0^T\beta + \tau^T\Sigma^{-1}(Z - X\beta) \qquad (2.67)$$

and variance

$$\sigma_0^2 - \tau^T\Sigma^{-1}\tau. \qquad (2.68)$$

When $\beta$ is unknown, the obvious solution is to substitute $\hat\beta$ for $\beta$ in (2.67). This leads to the proposed predictor

$$\hat z_0 = x_0^T\hat\beta + \tau^T\Sigma^{-1}(Z - X\hat\beta) = \lambda^T Z \qquad (2.69)$$

say, where, as the reader may easily check, $\lambda$ is given by (2.59). Thus this argument leads very quickly to the correct formula for $\hat z_0$, though it does not so far prove that it has any properties which might make it desirable as a predictor.

Note that the equation (2.55), which formed a key part of our earlier derivation, follows directly from (2.69).
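The statement that the plug-in predictor (2.69) coincides with $\lambda^T Z$ for the weights of (2.59) can be checked numerically. The following is only a sketch under stated assumptions: the exponential covariance function `expcov`, the random site layout and the choice of covariates (intercept plus coordinates) are illustrative and are not taken from the notes.

```python
import numpy as np

def kriging_weights(Sigma, tau, X, x0):
    """Universal kriging weights lambda, following (2.59)."""
    Si_tau = np.linalg.solve(Sigma, tau)        # Sigma^{-1} tau
    Si_X = np.linalg.solve(Sigma, X)            # Sigma^{-1} X
    M = X.T @ Si_X                              # X^T Sigma^{-1} X
    d = x0 - X.T @ Si_tau                       # x0 - X^T Sigma^{-1} tau
    return Si_tau + Si_X @ np.linalg.solve(M, d)

def plug_in_predictor(Sigma, tau, X, x0, Z):
    """Predictor (2.69): conditional mean (2.67) with the GLS estimate plugged in."""
    Si_X = np.linalg.solve(Sigma, X)
    beta_hat = np.linalg.solve(X.T @ Si_X, Si_X.T @ Z)   # GLS estimator
    return x0 @ beta_hat + tau @ np.linalg.solve(Sigma, Z - X @ beta_hat)

def expcov(a, b, sill=2.0, range_par=3.0):
    """Assumed exponential covariance between two sets of 2-d sites."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-dist / range_par)

rng = np.random.default_rng(0)
n = 8
sites = rng.uniform(0, 10, size=(n, 2))         # hypothetical observed sites
s0 = np.array([5.0, 5.0])                       # hypothetical prediction site

Sigma = expcov(sites, sites)
tau = expcov(sites, s0[None, :])[:, 0]
X = np.column_stack([np.ones(n), sites])        # intercept + coordinates
x0 = np.array([1.0, *s0])
Z = rng.normal(size=n)                          # any data vector will do

lam = kriging_weights(Sigma, tau, X, x0)
z_hat = plug_in_predictor(Sigma, tau, X, x0, Z)
print(np.isclose(lam @ Z, z_hat))               # (2.69) equals lambda^T Z
print(np.allclose(lam @ X, x0))                 # constraint (2.55)
```

Up to floating-point tolerance both checks should come out true; the second one is exactly the unbiasedness constraint $\lambda^T X = x_0^T$ of (2.55).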
Because of this, the prediction error $z_0 - \lambda^T Z$ has mean $x_0^T\beta - \lambda^T X\beta = 0$ and variance given by (2.61), as before.

The key step of this proof is to note that $z_0 - \lambda^T Z$ is independent of $Z - X\hat\beta$. Since $Z - X\hat\beta$ is in one-to-one correspondence with the vector $W = A^T Z$ which is used to define the REML estimator (section 2.2.5), this establishes that the conditional distribution of $z_0$ given $W$ is normal with mean $\lambda^T Z$ and variance given by (2.61).

To establish the independence just referred to, write $Z - X\hat\beta = RZ$, where

$$R = I - X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1},$$

and note that the covariance of $RZ$ and $z_0 - \lambda^T Z$ is

$$E\{R(Z - X\beta)(z_0 - \lambda^T Z)\} = R(\tau - \Sigma\lambda),$$

and because everything is jointly normal, it will suffice to prove that the latter quantity is 0. However,

$$\tau - \Sigma\lambda = -X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau)$$

and hence

$$\begin{aligned}
R(\tau - \Sigma\lambda) &= -\{I - X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}\}X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) \\
&= -X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) + X(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) = 0,
\end{aligned}$$

which establishes the desired result.

2.4.3 Bayesian approach

The fact that the preceding argument is equivalent to a Bayesian approach has been noted in other contexts; it lies at the heart of Meinhold and Singpurwalla's (1983) derivation of the Kalman filtering equations from purely Bayesian considerations. The Bayesian approach generalizes automatically to the case in which the variogram parameters are unknown, whereas the classical approach essentially makes the assumption that these are known and only deals with the question of uncertainty of model parameters in a very peripheral way. This is one major reason for viewing the problem in Bayesian terms, and the close parallels between this and the more traditional approaches of sections 2.4.1 and 2.4.2 add to its justification.

The model throughout this discussion is the same as in section 2.4.1, writing the covariances in the form of (2.52) and taking (2.43) as the prior. The choice of $\pi$ is largely arbitrary, but the equivalence of the Bayesian and least squares approaches works only for the classical noninformative prior on $\beta$.

Simplest case: $\beta$, $\alpha$, $\theta$ all known

This follows as in section 2.4.2:

$$z_0 \mid Z, \beta, \alpha, \theta \sim N\left[(x_0 - X^T\Sigma^{-1}\tau)^T\beta + \tau^T\Sigma^{-1}Z,\; \sigma_0^2 - \tau^T\Sigma^{-1}\tau\right]. \qquad (2.70)$$

Note that $\Sigma$, $\tau$ and $\sigma_0^2$ may all be written in terms of $\alpha$ and $\theta$ using (2.52).

We shall now improve upon (2.70) by successively removing the conditioning on $\beta$, $\alpha$ and $\theta$. We write $\pi(x \mid y)$ for the generic density of one variable $x$ conditioned on another variable $y$, where the variables $x$ and $y$ will be different from one usage to the next.

To remove the conditioning on $\beta$, we write

$$\pi(z_0 \mid Z, \alpha, \theta) = \int \pi(z_0 \mid Z, \beta, \alpha, \theta)\,\pi(\beta \mid Z, \alpha, \theta)\,d\beta \qquad (2.71)$$

where the first factor inside the integral is given by (2.70) and the second is derived from (2.44). Note that it follows from (2.44) that the posterior distribution of $\beta$ given $Z$, $\alpha$ and $\theta$ is multivariate normal with mean $\hat\beta$ (i.e. the GLS estimator of $\beta$ given the covariance matrix $V$) and covariance matrix $\alpha(X^T V(\theta)^{-1}X)^{-1}$. Combining this with (2.70), we find that the conditional distribution of $z_0$ given $Z$, $\alpha$ and $\theta$ is normal with mean

$$\hat z_0 = (x_0 - X^T\Sigma^{-1}\tau)^T\hat\beta + \tau^T\Sigma^{-1}Z = (x_0 - X^T V^{-1}w)^T\hat\beta + w^T V^{-1}Z \qquad (2.72)$$

and variance

$$\begin{aligned}
&(x_0 - X^T\Sigma^{-1}\tau)^T(X^T\Sigma^{-1}X)^{-1}(x_0 - X^T\Sigma^{-1}\tau) + \sigma_0^2 - \tau^T\Sigma^{-1}\tau \\
&\quad = \alpha\left[\{x_0 - X^T V(\theta)^{-1}w(\theta)\}^T\{X^T V(\theta)^{-1}X\}^{-1}\{x_0 - X^T V(\theta)^{-1}w(\theta)\} + v_0(\theta) - w(\theta)^T V(\theta)^{-1}w(\theta)\right] \\
&\quad = \alpha V_0(\theta). \qquad (2.73)
\end{aligned}$$

The next step is to remove the conditioning on $\alpha$. Similarly to (2.71), we have

$$\pi(z_0 \mid Z, \theta) = \int \pi(z_0 \mid Z, \alpha, \theta)\,\pi(\alpha \mid Z, \theta)\,d\alpha. \qquad (2.74)$$

The posterior distribution of $\alpha$ given $Z$ and $\theta$ may be obtained from (2.45); the result is that $G^2(\theta)/\alpha$ has a $\chi^2_{n-q}$ distribution, where $G^2(\theta)$ is the generalized residual sum of squares. Define

$$\hat\alpha(\theta) = \frac{G^2(\theta)}{n - q}.$$

Then, with slight abuse of notation, we have

$$\frac{\hat\alpha(\theta)}{\alpha} \sim \frac{\chi^2_{n-q}}{n-q}.$$

Conditionally on $Z$ and $\theta$, we then have

$$\frac{z_0 - \hat z_0}{\sqrt{\hat\alpha(\theta)V_0(\theta)}} = \frac{(z_0 - \hat z_0)/\sqrt{\alpha V_0(\theta)}}{\sqrt{\hat\alpha(\theta)/\alpha}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-q}/(n-q)}} \sim t_{n-q}, \qquad (2.75)$$

since the numerator and denominator in (2.75) are conditionally independent given $Z$ and $\theta$. The final result agrees with equation (3.1) of Handcock and Stein (1993), except that they have a factor $n/(n-q)$ multiplying $\hat\alpha(\theta)$, which results from a slightly different definition of the latter quantity.

Finally, following Handcock and Stein, we integrate over $\theta$ to obtain

$$\pi(z_0 \mid Z) = \int \pi(z_0 \mid Z, \theta)\,\pi(\theta \mid Z)\,d\theta \qquad (2.76)$$

where the first factor in the integrand is determined by (2.48) and the second by (2.46). This part has to be carried out numerically but should be
straightforward, since for most models of interest the dimension of $\theta$ is 2 or at most 3. Handcock and Stein give several examples.

An aside: Besag's candidate's formula

An interesting alternative version of (2.76) is the formula

$$\pi(z_0 \mid Z) = \frac{\pi(z_0 \mid Z, \theta)\,\pi(\theta \mid Z)}{\pi(\theta \mid z_0, Z)} \qquad (2.77)$$

given by Besag (1989), allegedly based on a student's answer to an examination question. Note that in (2.77) $\theta$ is fixed and arbitrary; there is no integration in this formula, at least not explicitly. The derivation of (2.77) is an immediate consequence of the fact that $\pi(z_0 \mid Z)\,\pi(\theta \mid z_0, Z)$ and $\pi(z_0 \mid Z, \theta)\,\pi(\theta \mid Z)$ are each equal to the joint density of $z_0$ and $\theta$ conditional on $Z$. The formula is unlikely to bring any practical benefits for kriging, but it seems worth knowing about.

2.4.4 Prediction at multiple sites

The formulae (2.63) and (2.64) are very easily extended to multiple sites. For example, if $s_a, s_b, \dots, s_q$ are several sites for prediction, with associated covariate vectors $x_a, x_b, \dots, x_q$ and covariances $\tau_a, \dots, \tau_q, \sigma_{aa}, \sigma_{ab}, \dots$ defined by analogy with (2.62), then the optimal predictor of $c_a z_a + c_b z_b + \dots + c_q z_q$, where $z_a, \dots, z_q$ are the unobserved values of the process at $s_a, \dots, s_q$ and $c_a, \dots, c_q$ are arbitrary scalars, is

$$(c_a\lambda_a + c_b\lambda_b + \dots + c_q\lambda_q)^T Z$$

with $\lambda_a, \dots, \lambda_q$ given by an obvious extension of (2.63). The prediction variance is obtained as the sum of terms of the form $c_a c_b \operatorname{cov}(z_a - \lambda_a^T Z,\; z_b - \lambda_b^T Z)$ given by (2.64), with $(a, b)$ ranging over all possible pairs of indices.

The extension to predicting quantities of the form

$$Z(A) = \int_A z(s)\,ds,$$

where $A$ is some subset of the observation space, should now be clear. The point predictor is

$$\hat Z(A) = \int_A \hat z(s)\,ds \qquad (2.78)$$

where $\hat z(s)$ is the predictor at the site $s$, and the prediction error variance is

$$E\{(Z(A) - \hat Z(A))^2\} = \int_A\int_A E\{(z(s_1) - \hat z(s_1))(z(s_2) - \hat z(s_2))\}\,ds_1\,ds_2 \qquad (2.79)$$

where the individual covariance terms are derived from (2.64).

These calculations have been presented for the most commonly analyzed scenario, in which $\beta$ is unknown and $\alpha$ and $\theta$ are known. The case where all three parameters are unknown is at the present time probably best handled by Bayesian techniques.
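The multiple-site formulae of section 2.4.4 can be sketched numerically as follows. This is an illustration under assumptions, not the notes' own implementation: the exponential covariance, the site coordinates and the function name `multi_site_kriging` are all hypothetical. The function stacks the weight vectors of (2.63) as columns, evaluates the prediction-error covariances (2.64) for all pairs of sites at once, and the script then forms the variance of a linear combination $\sum_a c_a z_a$.

```python
import numpy as np

def multi_site_kriging(Sigma, T, S00, X, X0):
    """
    Sigma : (n, n) covariance matrix of the observations Z
    T     : (n, q) cross-covariances; column a is tau_a
    S00   : (q, q) covariances sigma_ab among the unobserved values
    X     : (n, p) covariates at the observed sites
    X0    : (q, p) covariates at the prediction sites; row a is x_a^T
    Returns Lambda (n, q), whose columns are the lambda_a of (2.63), and the
    (q, q) matrix of prediction-error covariances of (2.64).
    """
    Si_T = np.linalg.solve(Sigma, T)            # Sigma^{-1} tau_a, columnwise
    Si_X = np.linalg.solve(Sigma, X)            # Sigma^{-1} X
    M = X.T @ Si_X                              # X^T Sigma^{-1} X
    D = X0.T - X.T @ Si_T                       # column a: x_a - X^T Sigma^{-1} tau_a
    Minv_D = np.linalg.solve(M, D)
    Lambda = Si_T + Si_X @ Minv_D
    err_cov = S00 - T.T @ Si_T + D.T @ Minv_D
    return Lambda, err_cov

rng = np.random.default_rng(1)
obs = rng.uniform(0, 10, size=(10, 2))                   # hypothetical observed sites
new = np.array([[2.0, 3.0], [7.0, 6.0], [5.0, 5.0]])     # hypothetical prediction sites

# Assumed exponential covariance, evaluated jointly so the blocks are consistent.
pts = np.vstack([obs, new])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = 1.5 * np.exp(-dist / 4.0)
n = len(obs)
Sigma, T, S00 = C[:n, :n], C[:n, n:], C[n:, n:]

X = np.column_stack([np.ones(n), obs])                   # intercept + coordinates
X0 = np.column_stack([np.ones(len(new)), new])

Lambda, err_cov = multi_site_kriging(Sigma, T, S00, X, X0)

c = np.array([1 / 3, 1 / 3, 1 / 3])                      # e.g. average over the new sites
var_c = c @ err_cov @ c                                  # sum of c_a c_b cov terms

# Direct evaluation of Cov(z_a - lambda_a^T Z, z_b - lambda_b^T Z) for comparison.
direct = S00 - Lambda.T @ T - T.T @ Lambda + Lambda.T @ Sigma @ Lambda
print(np.allclose(err_cov, direct), var_c)
```

With equal weights `c`, the combination is a crude discretization of the block average in (2.78) and (2.79); a finer grid of prediction sites over $A$ would approximate the integrals more closely.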
SPATIAL STATISTICS AND DATA ASSIMILATION
Montserrat Fuentes
Statistics Department, NCSU
fuentes@stat.ncsu.edu
http://www.stat.ncsu.edu/~fuentes

Areal data models

• Spatial smoothers
• Brook's Lemma and Gibbs distribution
• CAR models
  - Gaussian case
  - Non-Gaussian case
• SAR models
  - Gaussian case
  - Non-Gaussian case
• CAR vs SAR
• STAR models

Inference for areal data

Note: This chapter is very important for hierarchical spatial modeling of any type of data using MCMC methods.

For areal units the inferential issues are:

• Is there a spatial pattern? How strong is it? Spatial pattern suggests that observations close to each other have more similar values than those far from each other.
• Do we want to smooth the data? How much?
• If we modify the areal units to new units (from zip codes to county values), what can we say about the new counts we expect for the latter given those for the former? This is the modifiable areal unit problem (MAUP).

Exploratory tools

• Proximity matrix $W$. The entries in $W$ connect different values of the process $Y_1, \dots, Y_n$ in some fashion. Generally $w_{ii}$ is set to zero.

Examples:

- (symmetric $W$) $w_{ij} = 1$ if $i$ and $j$ share a common boundary;
- $w_{ij}$ could be based on the distance between the centroids of regions $i$ and $j$;
- $w_{ij} = 1$ if $j$ is one of the $K$ nearest neighbors of $i$.

$W$ does not need to be symmetric. The $w_{ij}$ might be standardized by $\sum_j w_{ij} = w_{i+}$.

We can define distance intervals $(0, d_1]$, $(d_1, d_2]$, and so on. Then we call:

• First-order neighbors of unit $i$: all units within distance $d_1$ of $i$.
• Second-order neighbors: all units within distance $d_2$ of $i$ but separated by more than $d_1$.

Analogous to $W$, we can define $W^{(1)}$ as the proximity matrix for the first-order neighbors. This means $w^{(1)}_{ij} = 1$ if $i$ and $j$ are first-order neighbors. And so on.

Measures of spatial association

The standard statistics are Moran's I and Geary's C. They are analogues for areal data of the empirical correlation function and the variogram.

Moran's I:

$$I = \frac{n \sum_i \sum_j w_{ij}(Y_i - \bar Y)(Y_j - \bar Y)}{\big(\sum_{i \neq j} w_{ij}\big) \sum_i (Y_i - \bar Y)^2}.$$

$I$ is not supported on $[-1, 1]$.
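As a minimal sketch of Moran's I, assuming a binary first-order adjacency matrix on a small regular grid (the grid, the helper `rook_adjacency` and the two toy data sets are illustrative assumptions, not from the slides):

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for values y (length n) and proximity matrix W (n x n, zero diagonal)."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    dev = y - y.mean()
    num = y.size * (dev @ W @ dev)      # n * sum_ij w_ij (y_i - ybar)(y_j - ybar)
    den = W.sum() * (dev @ dev)         # (sum_{i != j} w_ij) * sum_i (y_i - ybar)^2
    return num / den

def rook_adjacency(nrow, ncol):
    """Binary W for a regular grid: w_ij = 1 if cells i and j share an edge."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:            # neighbor to the east
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrow:            # neighbor to the south
                W[i, i + ncol] = W[i + ncol, i] = 1
    return W

W = rook_adjacency(4, 4)
trend = np.repeat(np.arange(4.0), 4)                      # smooth gradient across rows
checker = np.indices((4, 4)).sum(axis=0).ravel() % 2      # alternating 0/1 pattern

print(morans_i(trend, W))     # positive: neighboring cells are alike
print(morans_i(checker, W))   # negative: neighboring cells differ
```

A smooth gradient gives a positive value and a checkerboard a negative one, which matches the reading of $I$ as an areal analogue of a correlation.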
Under the hypothesis of independence, $I$ is asymptotically normal with mean $-1/(n-1)$.

Geary's C:

$$C = \frac{(n-1) \sum_i \sum_j w_{ij}(Y_i - Y_j)^2}{2\big(\sum_{i \neq j} w_{ij}\big) \sum_i (Y_i - \bar Y)^2}.$$

$C$ is never negative, and has mean 1 for the null model. Low values (between 0 and 1) indicate positive spatial association. Under the null hypothesis we have asymptotic normality. However, for testing it is preferable to use Monte Carlo, by permuting the values of the $Y_i$'s.

The correlogram is a useful tool to study spatial association with areal data. Working with $I$, we can replace $w_{ij}$ with the previously defined $w^{(1)}_{ij}$ and obtain, say, $I^{(1)}$. Then we replace it with $w^{(2)}_{ij}$ and obtain $I^{(2)}$. A plot of $I^{(r)}$ versus $r$ is called a correlogram. If there is spatial pattern, we expect $I^{(r)}$ to decline in $r$ initially and then vary about 0.

Spatial smoothers

$W$ provides a spatial smoother. We can replace $Y_i$ by

$$\hat Y_i = \sum_j w_{ij} Y_j / w_{i+}.$$

This ensures that the value for an areal unit $i$ looks more like its neighbors. Alternatively, we can take into account the actual value of $Y_i$:

$$\hat Y_i^* = (1 - \alpha) Y_i + \alpha \hat Y_i$$

for $\alpha \in (0, 1)$. This can be viewed as a filter. We will revisit this topic in the hierarchical modeling chapter.

Brook's Lemma

Given $p(y_1, \dots, y_n)$, the full conditional distributions $p(y_i \mid y_j, j \neq i)$, for $i = 1, \dots, n$, are uniquely determined. Brook's lemma proves the converse, and it enables us to retrieve the unique joint distribution determined by the conditionals.

We cannot write down an arbitrary set of conditionals and assert that they determine the joint distribution. Example:

$$Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2), \qquad Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2).$$

Thus $E(Y_1 \mid Y_2) = \alpha_0 + \alpha_1 Y_2$ is linear in $Y_2$; then $E(Y_1) = \alpha_0 + \alpha_1 E(Y_2)$ is linear in $E(Y_2)$. However, it must also be the case that

$$E(Y_2) = \beta_0 + \beta_1 E(Y_1^3).$$

This cannot hold in general. Therefore, there is no joint distribution $p(y_1, y_2)$.

Also, $p(y_1, \dots, y_n)$ might be improper even if the conditionals are proper. Example:

$$p(y_1, y_2) \propto \exp\{-\tfrac{1}{2}(y_1 - y_2)^2\}.$$

Here $p(y_1 \mid y_2)$ is $N(y_2, 1)$ and $p(y_2 \mid y_1)$ is $N(y_1, 1)$, but $p(y_1, y_2)$ is improper.

Brook's Lemma:

$$p(y_1, \dots, y_n) = \frac{p(y_1 \mid y_2, \dots, y_n)}{p(y_{10} \mid y_2, \dots, y_n)} \cdot \frac{p(y_2 \mid y_{10}, y_3, \dots, y_n)}{p(y_{20} \mid y_{10}, y_3, \dots, y_n)} \cdots \frac{p(y_n \mid y_{10}, \dots, y_{n-1,0})}{p(y_{n0} \mid y_{10}, \dots, y_{n-1,0})} \; p(y_{10}, \dots, y_{n0}),$$

here $y_0$
$= (y_{10}, \dots, y_{n0})$ is any fixed point in the support of $p$. The joint distribution is determined up to a proportionality constant by the conditionals.

Definitions

Markov Random Field (MRF): We specify a set of full conditional distributions for the $Y_i$ such that

$$p(y_i \mid y_j, j \neq i) = p(y_i \mid y_j, j \in \partial_i),$$

where $\partial_i$ denotes the set of neighbors of $i$. The notion of using local specification to determine a joint distribution is referred to as a MRF.

Clique: A clique is a set of cells such that each element is a neighbor of every other element. We use the notation $i \sim j$ if $i$ is a neighbor of $j$ and $j$ is a neighbor of $i$.

Potential: A potential of order $k$ is a function of $k$ arguments that is exchangeable in these arguments. The arguments of the potential would be the values taken by variables associated with the cells for a clique of size $k$.

Example: for $k = 2$ we have $Y_i Y_j$ if $i$ and $j$ are a clique of size 2. This is a potential of order 2.

Gibbs distribution: $p(y_1, \dots, y_n)$ is a Gibbs distribution if it is a function of the $Y_i$ only through potentials on cliques:

$$p(y_1, \dots, y_n) \propto \exp\Big\{\gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \dots, y_{\alpha_k})\Big\}.$$

$\phi^{(k)}$ is a potential of order $k$, $M_k$ is the collection of all subsets of size $k$, $\alpha$ indexes this set, and $\gamma > 0$ is a scale parameter.

Hammersley-Clifford Theorem: If we have a MRF, i.e. if the conditionals define a unique joint distribution, then this joint distribution is a Gibbs distribution.

For continuous data on $\mathbb{R}$, a common choice for the joint distribution is

$$p(y_1, \dots, y_n) \propto \exp\Big\{-\frac{1}{2\tau^2} \sum_{i,j} (y_i - y_j)^2 I(i \sim j)\Big\}.$$

We will study next this type of distribution, which are Gibbs distributions on potentials of order 1 and 2, and then

$$p(y_i \mid y_j, j \neq i) = N\Big(\sum_{j \in \partial_i} y_j / m_i,\; \tau^2 / m_i\Big),$$

where $m_i$ is the number of neighbors of $i$.

CAR models

Conditionally autoregressive models (CAR). They are widely used in MCMC methods for fitting certain classes of hierarchical spatial models, as we will see later.

The Gaussian (autonormal) case:

$$Y_i \mid y_j, j \neq i \sim N\Big(\sum_j b_{ij} y_j,\; \tau_i^2\Big), \qquad i = 1, \dots, n.$$

These full conditionals are compatible; through Brook's lemma we obtain

$$p(y_1, \dots, y_n) \propto \exp\{-\tfrac{1}{2} y^T D^{-1}(I - B) y\}$$

where $B = (b_{ij})$ and $D$
is diagonal with $D_{ii} = \tau_i^2$.

For $Y$ to be normal, we need first to prove the symmetry of $\Sigma_y = (I - B)^{-1}D$. The simple resulting conditions are

$$\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \qquad \text{for all } i, j.$$

Thus $B$ is not symmetric in this setting. Suppose we set $b_{ij} = w_{ij}/w_{i+}$ and $\tau_i^2 = \tau^2/w_{i+}$. Then the condition is satisfied, and we have that

$$p(y_1, \dots, y_n) \propto \exp\Big\{-\frac{1}{2\tau^2} y^T(D_w - W)y\Big\} \qquad (1)$$

where $D_w$ is diagonal with $(D_w)_{ii} = w_{i+}$.

A second problem is that $(D_w - W)1 = 0$; then $\Sigma_y^{-1}$ is singular and $\Sigma_y$ does not exist. Thus, this distribution is improper. We can rewrite (1) as follows:

$$p(y_1, \dots, y_n) \propto \exp\Big\{-\frac{1}{2\tau^2} \sum_{i \neq j} w_{ij}(y_i - y_j)^2\Big\}.$$

The impropriety of $p$ is clear, since we can add any constant to all the $Y_i$ and the distribution is unaffected. The $Y_i$ are not centered. A constraint such as $\sum_i Y_i = 0$ would solve the problem. This is the IAR (intrinsically autoregressive) model: a joint distribution that is improper but has all full conditionals proper.

The impropriety can be remedied in an obvious way. Redefine $\Sigma_y^{-1} = D_w - \rho W$ and choose $\rho$ to make $\Sigma_y^{-1}$ nonsingular. This is guaranteed if $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where $\lambda_{(1)} < \dots < \lambda_{(n)}$ are the ordered eigenvalues of $D_w^{-1/2} W D_w^{-1/2}$.

The bounds can be simplified by replacing $W$ by $\tilde W = \operatorname{Diag}(1/w_{i+})W$; then

$$\Sigma_y^{-1} = D_w(I - \alpha \tilde W)$$

where $D_w$ is diagonal. If $|\alpha| < 1$, then $I - \alpha \tilde W$ is nonsingular.

Interpretation of the $\rho$ parameter

When the additional parameter $\rho$ is zero, the $Y_i$ become independent. $\rho$ should not be interpreted as a parameter that explains the spatial dependency. For instance, in a simulation study, when $\rho = 0.8$, $I = 0.1$; when $\rho = 0.9$, $I = 0.5$. But an improper choice $\rho = 1$ may enable wider scope for posterior spatial patterns, and might be preferable.

We may write the CAR model as

$$Y = BY + \epsilon, \qquad (I - B)Y = \epsilon.$$

If $p(y)$ is proper, then $Y \sim N(0, (I - B)^{-1}D)$, and then $\epsilon \sim N(0, D(I - B)^T)$, i.e. the components of $\epsilon$ are not independent. Also $\operatorname{Cov}(\epsilon, Y) = D$.

The non-Gaussian CAR

In many cases (e.g. binary data) the normality assumption might not be appropriate. We can start with any exponential family model

$$p(y_i \mid y_j, j \neq i) \propto \exp\big(\psi\{\theta_i y_i - \chi(\theta_i)\}\big).$$

$\theta_i$ is a canonical link, e.g. $\theta_i = \sum_{j \neq i} b_{ij} y_j$, $\chi$ is some specific function,
and $\psi$ is a non-negative dispersion parameter. If you write $\theta_i = x_i^T \beta + \sum_{j \neq i} b_{ij} y_j$ for some covariates $x_i$, then we have
$$p(y_i \mid y_j, j \neq i) \propto \exp\{ \psi (\theta_i y_i - \chi(\theta_i)) \}, \qquad \theta_i = x_i^T \beta + \sum_{j \neq i} b_{ij} y_j.$$

Autologistic model: When the $Y_i$ are binary, the previous model gives us the autologistic model and
$$\log \frac{P(Y_i = 1)}{P(Y_i = 0)} = x_i^T \gamma + \psi \sum_j w_{ij} y_j,$$
where $w_{ij} = 1$ if $i \sim j$, and zero otherwise. The joint distribution, by Brook's lemma, is
$$p(y_1, \ldots, y_n) \propto \exp\Big\{ \gamma^T \sum_i y_i x_i + \psi \sum_{i,j} w_{ij} y_i y_j \Big\}.$$

SAR models

Simultaneous autoregressive (SAR) models. Remember that we may write the CAR model as
$$Y = BY + \epsilon, \qquad (I - B) Y = \epsilon.$$
Suppose that instead of letting $Y$ induce the distribution of $\epsilon$, we let $\epsilon$ induce a distribution for $Y$. Suppose $\epsilon \sim N(0, D)$, where $D$ is diagonal with $D_{ii} = \sigma_i^2$. Now
$$Y_i = \sum_j b_{ij} Y_j + \epsilon_i, \quad i = 1, \ldots, n.$$
Therefore, if $I - B$ is full rank,
$$Y \sim N\big( 0, \; (I - B)^{-1} D \, ((I - B)^{-1})^T \big).$$
Also $\mathrm{Cov}(\epsilon, Y) = D \, ((I - B)^{-1})^T$. If $D = \sigma^2 I$, then $Y \sim N\big( 0, \; \sigma^2 [(I - B)^T (I - B)]^{-1} \big)$.

Common choices for $B$

- $B = \rho W$, where $W$ is called the contiguity matrix. $W$ has entries 1 or 0 according to whether or not $i$ and $j$ are neighbors, with $w_{ii} = 0$. Here $\rho$ is called a spatial autoregression parameter. To get $I - \rho W$ nonsingular we need to impose $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where the $\lambda_{(i)}$ are the ordered eigenvalues of $W$. Alternatively, $W$ can be replaced by the row-normalized $\tilde{W}$ and $B = \alpha \tilde{W}$; then we need $|\alpha| < 1$.
- With point-referenced data, $B$ is taken to be $\rho W$, where $W$ is the matrix of inter-point distances.

A SAR model is usually used in a regression context, i.e., the residuals $U = Y - X\beta$ are assumed to follow a SAR model rather than $Y$ itself. Then
$$Y = BY + (I - B) X \beta + \epsilon.$$
SAR models are well suited to maximum likelihood estimation but not at all to MCMC fitting of Bayesian models, because it is difficult to introduce SAR random effects. In the CAR framework this is easy because of the hierarchical conditional representation.

CAR versus SAR

The CAR and SAR models are equivalent only if
$$(I - B)^{-1} D = (I - \tilde{B})^{-1} \tilde{D} \, ((I - \tilde{B})^{-1})^T,$$
where the tilde indicates the SAR matrices. Any SAR model can be represented as a CAR model (because $D$ is diagonal), but the converse is NOT TRUE.

Also, correlation among pairs can switch in nonintuitive ways by varying the $\rho$ parameter. For example,
working with the adjacency relationships generated by the lower 48 contiguous U.S. states, Wall (2004) finds that when $\rho = 0.49$ in a proper CAR model, corr(Alabama, Florida) = 0.27 and corr(Alabama, Georgia) = 0.16. But when $\rho = 0.975$, we instead get corr(Alabama, Florida) = 0.65 and corr(Alabama, Georgia) = 0.67.

STAR models

SAR models have been extended to handle spatiotemporal data. The measurements $Y_{it}$ are spatially associated at each fixed $t$, but we might also want to associate, say, $Y_{i2}$ with $Y_{i1}$ and $Y_{i3}$. Define $W_S$ to provide a spatial contiguity matrix for the $Y$'s, and let $W_T$ define a temporal contiguity matrix for the $Y$'s. We can define in our SAR model
$$B = \rho_S W_S + \rho_T W_T.$$
We can also introduce $W_S W_T$ to incorporate interaction between space and time. These models are referred to as spatiotemporal autoregressive (STAR) models.

ST 790 M, Fall 2004
SPATIAL STATISTICS AND DATA ASSIMILATION
Montserrat Fuentes, Statistics Department, NCSU
fuentes@stat.ncsu.edu
http://www.stat.ncsu.edu/~fuentes

Spatial-temporal modeling

- General modeling
- Point-level modeling with continuous time
- Separable and nonseparable spatial-temporal models
- Dynamic spatio-temporal models

Due to the proliferation of data sets that are both spatially and temporally indexed, spatial-temporal modeling has received dramatically increased attention in the last few years. We will consider the case of point-level spatial data and areal data.

General modeling formulation

Consider the case of point-referenced data where time is discretized. We can look at space-time indexed data $Y(s, t)$ in two ways:

- writing $Y(s, t) = Y_s(t)$, as a spatially varying time series model;
- writing $Y(s, t) = Y_t(s)$, as a temporally varying spatial model.

We can create the singular value decomposition (EOF) already discussed. Assume $T < n$, where $T$ is the number of observations over time and $n$ the number of location sites, so that the data matrix $Y$ is $n \times T$. We can write
$$Y = U D V^T = \sum_{l=1}^{T} d_l U_l V_l^T,$$
where $U$ is an $n \times n$ orthogonal matrix with columns $U_l$, each $U_l = (u_l(s_1), \ldots, u_l(s_n))^T$, and $V$ is a $T \times T$
orthogonal matrix with columns $V_l$, each $V_l = (v_l(1), \ldots, v_l(T))^T$. $D$ is an $n \times T$ matrix of the form $(\Delta, 0)^T$, where $\Delta$ is $T \times T$ diagonal with diagonal entries $d_l$. Assume the $d_l$'s are arranged in decreasing order of their absolute values. $U_l V_l^T$ is referred to as the $l$th empirical orthogonal function (EOF). We can write
$$Y(s_i, t) = \sum_{l=1}^{T} d_l \, u_l(s_i) \, v_l(t).$$
Note that
$$Y Y^T = \sum_{l=1}^{T} d_l^2 \, U_l U_l^T.$$
The first and second EOFs (maybe we need more than that) provide most of the information about the spatial structure. EOFs are a useful exploratory tool to learn about the spatial structure of the data, but for full inference we need a full spatiotemporal model.

Model formulation

Denote by $Y(s, t)$ the measurement at location $s$ and time $t$:
$$Y(s, t) = \mu(s, t) + e(s, t),$$
where $\mu$ is the mean structure and $e$ the residual. We write
$$\mu(s, t) = x(s, t)^T \beta(s, t),$$
where $x$ are known covariates and $\beta$ are spatiotemporally varying coefficients. The error term $e$ can be rewritten as $w(s, t) + \epsilon(s, t)$, where $\epsilon$ is a Gaussian white noise process and $w$ is a mean-zero spatiotemporal process. Then this would be a hierarchical model with a conditionally independent first stage given $\mu$ and $w$:

Stage 1: $Y \mid \mu, w \sim \mathrm{Normal}(\mu + w, \sigma_\epsilon^2)$.

Instead of a normal, the first stage could be a member of the exponential family with a link function $g$, that is, $g(\eta(s, t)) = \mu(s, t) + w(s, t)$.

Spatiotemporal richness is captured by extending the model for $e(s, t)$. Assuming that time is discretized, we could use the following three models, which avoid specification of space-time interactions:

- model 1: $e(s, t) = \alpha(t) + w(s) + \epsilon(s, t)$
- model 2: $e(s, t) = \alpha_s(t) + \epsilon(s, t)$
- model 3: $e(s, t) = w_t(s) + \epsilon(s, t)$

where the $\epsilon(s, t)$ are iid normal. The first model provides an additive form in temporal and spatial effects. The second provides temporal evolution at each spatial location. The third provides a spatial structure at each time.

If $t$ is continuous, we could model $\alpha(t)$ in the previous models as a one-dimensional stationary Gaussian process:
$$(\alpha(t_1), \ldots, \alpha(t_m))^T \sim N(0, \sigma_\alpha^2 \Sigma(\phi)), \quad \text{with } (\Sigma(\phi))_{rs} = \rho(|t_r - t_s|; \phi).$$
Alternatively, in a discrete time setting we could model
$$\alpha(t) = \rho \, \alpha(t - 1) + \eta(t),$$
where the $\eta(t)$ are iid. If $|\rho| < 1$ we have a stationary AR model, but $\rho$ is also allowed to be
1, which leads to an improper prior. Regarding the $\alpha_s(t)$ components, we could model them as
$$\alpha_s(t) = \alpha_s(t - 1) + \eta_s(t),$$
where the $\eta_s(t)$ are iid. The $w_t(s)$ are modeled as independent spatial processes.

For areal data we write
$$Y_{it} = \mu_{it} + e_{it}, \quad \text{with } \mu_{it} = x_{it}^T \beta_t \text{ and } e_{it} = w_{it} + \epsilon_{it},$$
where the $w_{it}$ are spatiotemporal random effects with a CAR specification. Areal unit data are often non-Gaussian (e.g., sparse counts); we could view this model as a hierarchical model and replace the first-stage Gaussian specification with, e.g., a Poisson.

Assuming $Y(s, t) = \mu(s, t) + e(s, t)$, with $e$ modeled using models 1, 2 and/or 3, we obtain the likelihood function $f(Y \mid \text{parameters})$, which is normal. We can also obtain the predictive posterior distribution at some location $(s_0, t_0)$ given the data $Y$:
$$f(Y(s_0, t_0) \mid Y) = \int f(Y(s_0, t_0) \mid Y, \text{parameters}) \; p(\text{parameters} \mid Y) \; d(\text{parameters}).$$

Point-level modeling with continuous time

Suppose that $s \in \mathbb{R}^2$ and $t \in \mathbb{R}$; we seek to define a spatiotemporal process $Y(s, t)$. We need to specify a valid spatiotemporal covariance. It is not reasonable to use one of the known models (exponential, Matérn, ...) in $\mathbb{R}^3$, because distance in space has nothing to do with "distance" in time. A possibility is to change the temporal scale by multiplying time by a factor $a$, so that distances in space and time can be used together in a 3-dimensional version of some of the known covariance models, e.g., $C_0$. We define the covariance
$$\mathrm{Cov}(Z(s, t), Z(s', t')) = C_0\big( (s, a t) - (s', a t') \big).$$

Separable models

A spatial-temporal field $Z(s, t)$, where $s$ represents space and $t$ time, is separable if
$$\mathrm{Cov}(Z(s, t), Z(s', t')) = C_1(s, s') \, C_2(t, t')$$
for some spatial covariance $C_1$ and temporal covariance $C_2$. Classes of nonseparable spatial-temporal models were proposed by Cressie and Huang (JASA, 1999), Gneiting (JASA, 2002), Stein (2003) and Chen, Fuentes and Davis (2004).

[Figure 1: A separable covariance.]
[Figure 2: A nonseparable covariance.]

The spectral density $f$ is the Fourier transform of the spatial-temporal covariance function,
$$f(\omega, \tau) = (2\pi)^{-(d+1)} \int_{\mathbb{R}^d} \int_{\mathbb{R}} \exp(-i \omega^T x - i \tau t) \, C(x, t) \, dx \, dt, \quad (1)$$
and the corresponding covariance function is given by
$$C(x, t) = \int_{\mathbb{R}^d} \int_{\mathbb{R}} \exp(i \omega^T x + i \tau t) \, f(\omega, \tau) \, d\omega \, d\tau. \quad (2)$$
For example, the Matérn spatial spectral density is given by
$$f(\omega) = \gamma (\alpha^2 + |\omega|^2)^{-\nu - d/2},$$
and its corresponding Matérn spatial covariance function is
$$C(x) = \frac{\pi^{d/2} \gamma}{2^{\nu - 1} \Gamma(\nu + d/2) \alpha^{2\nu}} (\alpha |x|)^{\nu} K_\nu(\alpha |x|),$$
where $K_\nu$ is a modified Bessel function.

We propose the following spatial-temporal spectral density (Chen, Fuentes and Davis, 2004), which has a separable model as a particular case. The spectral density $f$ of $Z$ changes with space and time to explain how the spatial-temporal dependency varies over the domain of interest. Locally, in a neighborhood of a space-time location $s_i$, to allow for lack of stationarity we propose the following parametric model for $f$:
$$f_{s_i}(\omega, \tau) = \gamma_i \big( \alpha_i^2 \beta_i^2 + \beta_i^2 |\omega|^2 + \alpha_i^2 \tau^2 + \epsilon |\omega|^2 \tau^2 \big)^{-\nu_i}. \quad (3)$$
If we have stationarity, the parameters of the model do not change with location (e.g., it is simply $\alpha$ rather than $\alpha_i$). Here $\gamma_i$, $\alpha_i$ and $\beta_i$ are positive, $\nu_i > (d + 1)/2$ and $\epsilon \in [0, 1]$. The parameter $\alpha_i^{-1}$ explains the rate of decay of the spatial correlation. For the temporal correlation, the rate of decay is explained by the parameter $\beta_i^{-1}$; $\gamma_i$ is a scale parameter.

When $\epsilon = 1$, the previous equation can be written as
$$f_{s_i}(\omega, \tau) = \gamma_i (\alpha_i^2 + |\omega|^2)^{-\nu_i} (\beta_i^2 + \tau^2)^{-\nu_i}.$$
Therefore the corresponding spatial-temporal covariance is separable; both the spatial component and the temporal component are Matérn-type covariances. When $\epsilon = 1$, $\gamma_i = \alpha_i = \beta_i = d = 1$ and $\nu_i = 3/2$, a contour plot of the corresponding separable spatial-temporal covariance is given here. There are sharp ridges.

[Figure 3: contour plot of the covariance for $\epsilon = 1$.]

When $\epsilon = 0$,
$$f_{s_i}(\omega, \tau) = \gamma_i \big( \alpha_i^2 \beta_i^2 + \beta_i^2 |\omega|^2 + \alpha_i^2 \tau^2 \big)^{-\nu_i}.$$
This is an extension of the traditional Matérn spectral density. The parameter $\alpha_i^{-1}$ explains the rate of decay of the spatial correlation. For the temporal correlation, the rate of decay is explained by the parameter $\beta_i^{-1}$; $\gamma_i$ is a scale parameter. The corresponding
spatial-temporal covariance is a 3-d Matérn-type covariance with an extra parameter, which can be considered a conversion factor between the units in the space and time domains.

When $\epsilon = 0$, the covariance has a closed form [equation not recoverable from the extraction] in which:

- $r$ is the range;
- $\sigma^2$ is the sill;
- $\nu > 0$ measures the smoothness of $Z$;
- $\rho$, which is new, is a scale factor that takes into account the change of units between the spatial and temporal domains.

This is a $(d + 1)$-dimensional Matérn-type covariance, but it takes into account the change of units between the space domain and the temporal domain.

When $\epsilon = 0$, $\gamma_i = \alpha_i = \beta_i = d = 1$ and $\nu_i = 3/2$, a contour plot of the corresponding nonseparable spatial-temporal covariance is given here. It does not have ridges.

[Figure 4: contour plot of the covariance for $\epsilon = 0$.]

Space-time covariances for different values of $\epsilon$

[Figure 5: contour plots of the covariance for several values of $\epsilon$.]

In summary, the new class of spectral densities in (3) is nonseparable for $0 \le \epsilon < 1$ and separable for $\epsilon = 1$. Therefore, the parameter $\epsilon$ plays a role for separability: it controls the interaction between the spatial component and the temporal component. When $\epsilon$ equals 0 or 1, there are exact forms for the corresponding spatial-temporal covariances. Otherwise, the corresponding spatial-temporal covariance has to be computed numerically.

Cressie and Huang propose a generic approach to developing parametric models for spatial-temporal processes. The method relies heavily on spectral representations for the theoretical space-time covariance structure, and generalizes the results of Matérn for pure spatial processes. In essence, Matérn constructs a number of parametric families for spatial processes by direct inversion of spectral densities. Cressie and Huang show that the same ideas can be used to construct families of spatial-temporal covariances.

First, Cressie and Huang represent the stationary spatial-temporal covariance $C(h, u)$ as
$$C(h, u) = \int \int e^{i (h^T \omega + u \tau)} \, g(\omega, \tau) \, d\omega \, d\tau, \quad (4)$$
where $C(h, u)$ is a stationary spatial-temporal covariance function in which $h$ represents a $d$-dimensional spatial vector and $u$ is a scalar
time component. The function $g(\omega, \tau)$, where $\omega$ is $d$-dimensional and $\tau$ is scalar, is the spectral density of the covariance function $C$. The function $g$ may be written as a scalar Fourier transform in $\tau$,
$$g(\omega, \tau) = \frac{1}{2\pi} \int e^{-i u \tau} \, h(\omega, u) \, du,$$
with inverse
$$h(\omega, u) = \int e^{i u \tau} \, g(\omega, \tau) \, d\tau. \quad (5)$$
Putting (4) and (5) together,
$$C(h, u) = \int e^{i h^T \omega} \, h(\omega, u) \, d\omega. \quad (6)$$
The next step is to write
$$h(\omega, u) = k(\omega) \, \rho(\omega, u), \quad (7)$$
where $k(\omega)$ is the spectral density of a pure spatial process and $\rho(\omega, u)$, for each $\omega$, is a valid temporal autocorrelation function in $u$. Cressie and Huang remark that any smooth space-time covariance function can be written in the form (6) and (7), and they also impose the conditions:

(C1) For each $\omega$, $\rho(\omega, \cdot)$ is a continuous temporal autocorrelation function, $\int \rho(\omega, u) \, du < \infty$ and $k(\omega) > 0$.
(C2) $\int k(\omega) \, d\omega < \infty$.

Under those conditions, the generic formula for $C(h, u)$ becomes
$$C(h, u) = \int e^{i h^T \omega} \, k(\omega) \, \rho(\omega, u) \, d\omega. \quad (8)$$
When $\rho(\omega, u)$ is independent of $\omega$, (8) reduces again to a separable model.

Cressie et al. developed seven special cases of (8). For example,
$$\rho(\omega, u) = \exp\Big( -\frac{\|\omega\|^2 u^2}{4} \Big) \exp(-\delta u^2), \quad \delta > 0,$$
$$k(\omega) = \exp\Big( -\frac{c_0 \|\omega\|^2}{4} \Big), \quad c_0 > 0,$$
which lead to
$$C(h, u) \propto \frac{1}{(u^2 + c_0)^{d/2}} \exp\Big( -\frac{\|h\|^2}{u^2 + c_0} \Big) \exp(-\delta u^2). \quad (9)$$
The condition $\delta > 0$ is needed to ensure that condition (C1) is satisfied at $\omega = 0$, but the limit of (9) as $\delta \to 0$ is also a valid spatial-temporal covariance function, leading to the three-parameter family
$$C(h, u) = \frac{\sigma^2}{(a^2 u^2 + 1)^{d/2}} \exp\Big( -\frac{b^2 \|h\|^2}{a^2 u^2 + 1} \Big).$$

Cressie and Huang's approach is novel and powerful but depends on Fourier transform pairs in $\mathbb{R}^d$. Gneiting takes the approach of Cressie et al. and provides a very general class of valid spatial-temporal covariance models. The key result can be formulated as follows. Let $\varphi(t)$, $t \ge 0$, be a completely monotone function (see the note below), and let $\psi(t)$, $t \ge 0$, be a positive function with a completely monotone derivative. Then
$$C(h, u) = \frac{\sigma^2}{\psi(|u|^2)^{d/2}} \, \varphi\Big( \frac{\|h\|^2}{\psi(|u|^2)} \Big) \quad (10)$$
is a space-time covariance function, where $h \in \mathbb{R}^d$ represents a $d$-dimensional spatial vector and $u \in \mathbb{R}$ is a scalar time component.

Note: A continuous function $\varphi(t)$ defined for $t \ge 0$ is said to be completely monotone if it possesses derivatives $\varphi^{(n)}$ of all orders and
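$(-1)^n \varphi^{(n)}(t) \ge 0$ for $t > 0$ and $n = 0, 1, 2, \ldots$.

For example, putting $\varphi(t) = \exp(-c t^{\gamma})$ and $\psi(t) = (a t^{\alpha} + 1)^{\beta}$ in (10) leads to
$$C(h, u) = \frac{\sigma^2}{(a |u|^{2\alpha} + 1)^{\beta d / 2}} \exp\Big( -\frac{c \|h\|^{2\gamma}}{(a |u|^{2\alpha} + 1)^{\beta \gamma}} \Big),$$
where $(h, u) \in \mathbb{R}^d \times \mathbb{R}$. The product with the purely temporal covariance function $(a |u|^{2\alpha} + 1)^{-\delta}$, $u \in \mathbb{R}$, then gives the class
$$C(h, u) = \frac{\sigma^2}{(a |u|^{2\alpha} + 1)^{\delta + \beta d / 2}} \exp\Big( -\frac{c \|h\|^{2\gamma}}{(a |u|^{2\alpha} + 1)^{\beta \gamma}} \Big),$$
where $(h, u) \in \mathbb{R}^d \times \mathbb{R}$; $a$ and $c$ are nonnegative scaling parameters of time and space, respectively; the smoothness parameters $\alpha$ and $\gamma$ take values in $(0, 1]$; $\beta \in [0, 1]$, $\delta \ge 0$ and $\sigma^2 > 0$. A separable covariance function is obtained when $\beta = 0$.

Dynamic spatiotemporal models

Dynamic linear models are often referred to as state-space models in the time series literature. Let $Y_t$ be an $m \times 1$ vector of observations at time $t$; $\theta_t$ is called the state vector. $Y_t$ is related to $\theta_t$ through the MEASUREMENT EQUATION. In general we do not observe $\theta_t$. We have the following framework:

MEASUREMENT EQUATION: $Y_t = F_t \theta_t + \epsilon_t$, where $\epsilon_t \sim N(0, \Sigma_t^{\epsilon})$.
TRANSITION EQUATION: $\theta_t = G_t \theta_{t-1} + \eta_t$, where $\eta_t \sim N(0, \Sigma_t^{\eta})$.

$F_t$ and $G_t$ are $m \times p$ and $p \times p$ matrices; they are called the system matrices and they might change over time.

We can compute the association across time:
$$\mathrm{Cov}(\theta_t, \theta_{t-1}) = G_t \, \mathrm{Var}(\theta_{t-1}) \quad \text{and} \quad \mathrm{Cov}(Y_t, Y_{t-1}) = F_t G_t \, \mathrm{Var}(\theta_{t-1}) \, F_{t-1}^T.$$

Formulation for spatiotemporal models:
$$Y(s, t) = \mu(s, t) + \epsilon(s, t), \quad \text{where } \epsilon(s, t) \sim N(0, \sigma_\epsilon^2),$$
$$\mu(s, t) = x(s, t)^T \tilde{\beta}(s, t),$$
$$\tilde{\beta}(s, t) = \beta_t + \beta(s, t),$$
$$\beta_t = \beta_{t-1} + \eta_t, \quad \text{where } \eta_t \sim N_p(0, \Sigma_\eta),$$
and
$$\beta(s, t) = \beta(s, t - 1) + w(s, t).$$

We introduce a model of linear coregionalization for the $p$-variate process $w(s, t)$, i.e., $w(s, t) = A v(s, t)$, with $v(s, t) = (v_1(s, t), \ldots, v_p(s, t))^T$. The $v_l$ are serially independent Gaussian processes with correlation function $\rho_l$. Thus, the cross-covariance of $w$ is a linear combination of the $\rho_l$'s. The $\rho_l$'s are usually assumed to be separable space-time covariances. Thus,
$$C_w\big( w(s, t), w(s', t') \big) = \sum_{l=1}^{p} \rho_l(s - s', t - t') \, T_l,$$
where $T_l = a_l a_l^T$ and $a_l$ is the $l$th column of $A$.

A Bayesian hierarchical model may be completed by prior specification. For example, $\beta_0$ has a normal prior and $\beta(s, 0) \equiv 0$; $\Sigma_\eta$ and $\Sigma_w$ have IW (inverse Wishart) priors; $\sigma_\epsilon^2$ has an inverse Gamma prior. Instead of $\Sigma_w$ having an IW prior, we could use the coregionalization model, with $T = \sum_l T_l = A A^T$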
having an IW prior.

Basics of Bayesian inference

- Bayes theorem
- Bayesian inference: point estimation; interval estimation; hypothesis testing and model choice
- Bayes computation: revisit Gibbs and Metropolis-Hastings; slice sampling; convergence diagnosis; variance estimation

Bayes Theorem

We model the observed data and unknown parameters as random variables. Bayes' theorem allows us to combine external knowledge and complex data models in estimating the unknowns. We specify the distributional model (LIKELIHOOD) $f(y \mid \theta)$, where $y$ is the observed data and $\theta$ are the unknown parameters. We assume a PRIOR distribution for $\theta$, $\pi(\theta \mid \lambda)$, where $\lambda$ is a hyperparameter. Inference on $\theta$ is based on its POSTERIOR distribution
$$p(\theta \mid y, \lambda) = \frac{p(y, \theta \mid \lambda)}{p(y \mid \lambda)} = \frac{p(y, \theta \mid \lambda)}{\int p(y, \theta \mid \lambda) \, d\theta} = \frac{f(y \mid \theta) \, \pi(\theta \mid \lambda)}{\int f(y \mid \theta) \, \pi(\theta \mid \lambda) \, d\theta}.$$
Since $\lambda$ might be unknown, with hyperprior $h(\lambda)$, we need an additional step:
$$p(\theta \mid y) = \frac{p(y, \theta)}{p(y)} = \frac{\int f(y \mid \theta) \, \pi(\theta \mid \lambda) \, h(\lambda) \, d\lambda}{\int \int f(y \mid \theta) \, \pi(\theta \mid \lambda) \, h(\lambda) \, d\theta \, d\lambda}.$$
Alternatively, we can replace $\lambda$ by an estimated value $\hat{\lambda}$, which could be the maximizer of $p(y \mid \lambda)$. Inference based on this estimated posterior $p(\theta \mid y, \hat{\lambda})$ is referred to as EMPIRICAL BAYES analysis.

Forms of $\pi$ and $h$ known as conjugate priors enable analytic evaluation of these integrals. But in the presence of nuisance parameters (unknown variables), some intractable integrations still remain. Hence the need for the recently developed Markov chain Monte Carlo (MCMC) integration methods.

Bayesian inference

Point estimation. Estimates of $\theta$:

- the mean of the posterior;
- the median of the posterior;
- the mode of the posterior, $\hat{\theta}$ such that $p(\hat{\theta} \mid y) = \sup_\theta p(\theta \mid y)$.

The last one is the easiest to compute since it does not require any integration. If the posterior exists under a flat prior $p(\theta) = 1$, then the posterior mode is just the MLE of $\theta$. The posterior median is often the best point estimate because it better represents the center of a non-symmetric distribution.

Interval estimation. The posterior allows inference about not only
the median but any QUANTILE. We can obtain a Bayesian confidence interval, which we generally call a CREDIBLE INTERVAL. The probability that $\theta$ lies in $(q_L, q_U)$ is $1 - \alpha$, where $q_L$ and $q_U$ satisfy
$$\int_{-\infty}^{q_L} p(\theta \mid y) \, d\theta = \alpha / 2 \quad \text{and} \quad \int_{q_U}^{\infty} p(\theta \mid y) \, d\theta = \alpha / 2.$$
Thus $P(q_L < \theta < q_U \mid y) = 1 - \alpha$.

The frequentist CI does not satisfy that condition. Instead it gives an interval such that the probability that the random interval covers the TRUE parameter is $1 - \alpha$, i.e., $P(\theta \in (a, b) \mid \theta) = 1 - \alpha$.

The interval with $P(q_L < \theta < q_U \mid y) = 1 - \alpha$ is the equal-tail credible set. For symmetric unimodal posteriors this interval will be symmetric about the mode. It will also be optimal in the sense that it will have the shortest length among sets $C$ satisfying
$$1 - \alpha \le P(C \mid y) = \int_C p(\theta \mid y) \, d\theta.$$
For posteriors that are not symmetric and unimodal, a better (shorter) interval can be obtained by taking only values of $\theta$ that have posterior density greater than some cutoff. The cutoff is as large as possible while $C$ satisfies the previous condition. This is called the HIGHEST POSTERIOR DENSITY (HPD) confidence set. It is more difficult to compute but always of optimal length.

Hypothesis testing and model choice

Hypothesis testing is not very straightforward. There is no agreement among Bayesians about how to approach the problem. The Deviance Information Criterion (DIC) has gained popularity (it is available in WinBUGS). We will discuss next BF and BIC for model choice.

Bayes factor for model choice

We replace the two hypotheses $H_0$ and $H_1$ by two candidate parametric models $M_1$ and $M_2$, with parameters $\theta_1$ and $\theta_2$ respectively. The priors are $\pi_i(\theta_i)$, $i = 1, 2$. Thus the marginals of $Y$ are
$$p(y \mid M_i) = \int f(y \mid \theta_i, M_i) \, \pi_i(\theta_i) \, d\theta_i.$$
Bayes' theorem can be applied to obtain the posterior probabilities $P(M_1 \mid y)$ and $P(M_2 \mid y) = 1 - P(M_1 \mid y)$. The quantity used to summarize these results is the BAYES FACTOR (BF), the ratio of the posterior odds of $M_1$ to the prior odds of $M_1$:
$$BF = \frac{P(M_1 \mid y) / P(M_2 \mid y)}{P(M_1) / P(M_2)} = \frac{p(y \mid M_1)}{p(y \mid M_2)}.$$
If both models have the same prior probability, then BF is the posterior odds of $M_1$.

In models such that $\theta_1 = \theta_2 = \theta$ and both hypotheses are simple, $M_1: \theta = \theta^{(1)}$ and $M_2: \theta = \theta^{(2)}$, then $\pi_i(\theta)$ is a point mass at $\theta^{(i)}$, and we have
$$BF = \frac{f(y \mid \theta^{(1)})}{f(y \mid \theta^{(2)})},$$
which is
the LIKELIHOOD RATIO between the two models.

Bayesian Information Criterion (BIC)

BIC is also known as the Schwarz Criterion. Schwarz showed that for nonhierarchical models and large sample sizes $n$, BIC approximates $-2 \log BF$. BIC is a penalized likelihood ratio model choice criterion. If we think of $M_2 $ as the full model and $M_1$ as the reduced model,
$$\Delta BIC = W - (p_2 - p_1) \log n,$$
where $p_i$ is the number of parameters in model $M_i$, and
$$W = -2 \log \left[ \frac{\sup_{M_1} f(y \mid \theta)}{\sup_{M_2} f(y \mid \theta)} \right]$$
is the usual likelihood ratio test statistic.

An alternative to BIC is the Akaike Information Criterion (AIC):
$$\Delta AIC = W - 2 (p_2 - p_1).$$
This is also a penalized likelihood ratio model choice criterion.

The more serious limitation in using BF or its approximations (BIC, AIC) is that they are not appropriate under noninformative priors. If $\pi_i(\theta_i)$ is improper, then $p(y \mid M_i)$ is as well. A solution is to use DIC.

DIC

Spiegelhalter et al. propose a generalization of AIC, because the asymptotic justification of AIC is not appropriate for hierarchical models. It is based on the posterior distribution of the deviance
$$D(\theta) = -2 \log f(y \mid \theta) + 2 \log h(y),$$
where $h$ is some function of the data alone. The fit is summarized with the posterior mean of $D$, $\bar{D} = E_{\theta \mid y}[D]$, and the effective number of parameters $p_D$. In Gaussian models $p_D$ is the expected deviance minus $D$ evaluated at the posterior expectations:
$$p_D = E_{\theta \mid y}[D] - D(E_{\theta \mid y}[\theta]) = \bar{D} - D(\bar{\theta}).$$
The DIC is then defined as
$$DIC = \bar{D} + p_D = 2 \bar{D} - D(\bar{\theta});$$
a smaller DIC indicates a better-fitting model.

DIC has no absolute scale; only DIFFERENCES in DIC across models are meaningful. Identifying what is a SIGNIFICANT difference in DIC is difficult; delta method approximations to the variance of DIC have not been very successful. In practice we recompute DIC a few times using different random number seeds.

It is up to the user to think carefully about which parameters ought to be in focus before using DIC, i.e., the likelihood function used could be marginal only for the parameters of interest.

Bayesian computation

The most popular computing tools in Bayesian practice today are Markov chain Monte Carlo (MCMC) methods, because of their ability to enable
inference from posterior distributions of large problems by reducing the problem to one of RECURSIVELY solving a series of lower dimensional problems.

Like traditional Monte Carlo, MCMC works by producing not a closed form for the posterior but a SAMPLE of values from this distribution. A histogram based on such a sample is typically sufficient for reliable inference. However, unlike traditional MC methods, MCMC algorithms produce CORRELATED samples from the posterior, since they are recursive draws from a particular Markov chain, the stationary distribution of which is the same as the posterior.

The two most popular are the Gibbs sampler and the Metropolis-Hastings algorithm.

Algorithm to simulate from the posterior distribution: the Gibbs sampler

We describe now another algorithm to efficiently generate simulations from the posterior of a vector parameter $\phi = (\phi_1, \ldots, \phi_q)$. The most convenient methods are of Markov chain Monte Carlo (MCMC) type, of which Gibbs sampling is one of the most widely used.

Gibbs sampling: Start with an arbitrary initial value $\phi^{(0)}$ of $\phi$. Generate a new value of $\phi_1$, denoted $\phi_1^{(1)}$, from the conditional distribution of $\phi_1$ given $\phi_2 = \phi_2^{(0)}, \ldots, \phi_q = \phi_q^{(0)}$. Then generate a new value of $\phi_2$, denoted $\phi_2^{(1)}$, from the conditional distribution of $\phi_2$ given $\phi_1 = \phi_1^{(1)}, \phi_3 = \phi_3^{(0)}, \ldots, \phi_q = \phi_q^{(0)}$. Continue up to the generation of $\phi_q^{(1)}$ from the conditional distribution of $\phi_q$ given $\phi_1 = \phi_1^{(1)}, \ldots, \phi_{q-1} = \phi_{q-1}^{(1)}$. This completes one iteration of the sampler. Then, starting from the new vector $\phi^{(1)}$, return to $\phi_1$ and repeat the whole process to generate $\phi^{(2)}$.

Thus Gibbs sampling consists purely of sampling from conditional distributions: instead of updating $\phi$ in one block, it is more computationally efficient to divide $\phi$ into components and then update these components one by one.

The Metropolis-Hastings algorithm

The Gibbs sampler is easy to implement but requires the ability to sample from each of the full conditional distributions. The M-H algorithm is a rejection algorithm that attacks the problem of not
being able to sample from a distribution.

We start with an arbitrary $x^{(0)}$ and generate a new trial value $x^*$ from some distribution $q(x^* \mid x^{(0)})$, which depends on $x^{(0)}$. Then we form the ratio
$$\alpha = \frac{l(x^*) \, q(x^{(0)} \mid x^*)}{l(x^{(0)}) \, q(x^* \mid x^{(0)})},$$
where $l(x) = f(y \mid x) \, \pi(x)$ is the likelihood times the prior for $x$. If $\alpha \ge 1$, then we accept $x^*$, in other words set $x^{(1)} = x^*$. If $\alpha < 1$, we perform an independent random drawing: with probability $\alpha$ accept $x^*$ and set $x^{(1)} = x^*$; otherwise reject $x^*$ and set $x^{(1)} = x^{(0)}$.

Theoretically we can choose any proposal $q$, but in practice only a good choice will work. The usual approach is
$$q(x^* \mid x^{(0)}) = N(x^* \mid x^{(0)}, \tilde{\Sigma}),$$
where $\tilde{\Sigma}$ might be an empirical estimate of the true posterior variance derived from a preliminary sampling run. Accepting all candidates is usually the result of an overly narrow proposal. The ideal is to choose the proposal variance such that around 50% of the candidates are accepted. We can also use $q(x^* \mid x^{(0)}) = q(x^*)$; this is not symmetric in its arguments.

Metropolis algorithm: If we replace the acceptance ratio with
$$\alpha = \frac{l(x^*)}{l(x^{(0)})},$$
where $q$ now has to be symmetric in its arguments, i.e., $q(x^* \mid x^{(0)}) = q(x^{(0)} \mid x^*)$, we have the Metropolis algorithm.

Once we have the simulated values from the posterior, we usually discard the ones in the burn-in period; we can then construct a histogram with the rest and also obtain the sample mean to estimate the expected value of the posterior. In practice we may actually run $m$ parallel chains instead of only one.

Slice sampling

This is an alternative to M-H. We seek to sample $\theta \sim f(\theta) \equiv h(\theta) / \int h(\theta) \, d\theta$, where $h$ is known. We add an AUXILIARY variable $U$ such that $U \mid \theta \sim \mathrm{Unif}(0, h(\theta))$. Then the joint distribution of $\theta$ and $U$ is
$$p(\theta, u) \propto 1 \cdot I(u < h(\theta)),$$
where $I$ is the indicator function. We run a Gibbs sampler, drawing from $U \mid \theta$ followed by $\theta \mid U$ at each iteration; we then obtain samples from $p(\theta, u)$ and hence from the marginal of $\theta$.

Sampling from $\theta \mid U$ requires a draw from a uniform distribution for $\theta$ over the set $S_U = \{\theta : U < h(\theta)\}$.

If $h(\theta) = h_1(\theta) h_2(\theta)$, where $h_1$ is a standard density and $h_2$ is nonstandard, then we introduce $U$ such that $U \mid \theta \sim \mathrm{Unif}(0, h_2(\theta))$. Now
$$p(\theta, u) \propto h_1(\theta) \, I(u < h_2(\theta)).$$
To sample from $U \mid \theta$ is routine; to sample from $\theta \mid U$
we now draw $\theta$ from $h_1$ and retain it only if $\theta$ is such that $U < h_2(\theta)$.
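
The product-form slice sampler just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not code from the course: the choices $h_1$ = standard normal density and $h_2(\theta) = 1/(1 + \theta^2)$, as well as the names `h2` and `slice_sampler`, are hypothetical and were picked so that the example is self-contained and the target is symmetric about zero.

```python
import random


def h2(theta):
    # Nonstandard factor of the target (illustrative assumption, bounded by 1).
    return 1.0 / (1.0 + theta * theta)


def slice_sampler(n_draws, theta0=0.0, burn_in=500, seed=0):
    """Sample theta with density proportional to h1(theta) * h2(theta),
    where h1 is the N(0, 1) density (an assumed 'standard' factor)."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for it in range(burn_in + n_draws):
        # Step 1: U | theta ~ Uniform(0, h2(theta)).
        u = rng.random() * h2(theta)
        # Step 2: theta | U. Draw candidates from h1 and retain the first
        # one that satisfies u < h2(candidate).
        while True:
            cand = rng.gauss(0.0, 1.0)
            if u < h2(cand):
                theta = cand
                break
        if it >= burn_in:
            draws.append(theta)
    return draws


draws = slice_sampler(5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print("posterior mean estimate:", round(mean, 3))
print("posterior variance estimate:", round(var, 3))
```

Because the assumed target is symmetric, the sample mean should be close to zero, and the extra factor $h_2$ pulls mass toward the origin, so the sample variance should fall below the variance of $h_1$ alone. The rejection loop in step 2 is the "draw from $h_1$ and retain" rule; it can be slow when $u$ is close to the maximum of $h_2$, which is one reason other ways of drawing $\theta \mid U$ may be preferred in practice.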

