Note 3 for GEOS 585A with Professor Meko at UA
This 8 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Arizona taught by a professor in Fall. Since its upload, it has received 28 views.
Date Created: 02/06/15
3 Autocorrelation

Autocorrelation refers to the correlation of a time series with its own past and future values. Autocorrelation is also sometimes called "lagged correlation" or "serial correlation", which refers to the correlation between members of a series of numbers arranged in time. Positive autocorrelation might be considered a specific form of "persistence", a tendency for a system to remain in the same state from one observation to the next. For example, the likelihood of tomorrow being rainy is greater if today is rainy than if today is dry.

Geophysical time series are frequently autocorrelated because of inertia or carryover processes in the physical system. For example, the slowly evolving and moving low pressure systems in the atmosphere might impart persistence to daily rainfall. Or the slow drainage of groundwater reserves might impart correlation to successive annual flows of a river. Or stored photosynthates might impart correlation to successive annual values of tree-ring indices.

Autocorrelation complicates the application of statistical tests by reducing the effective sample size. Autocorrelation can also complicate the identification of significant covariance or correlation between time series (e.g., precipitation with a tree-ring series). Autocorrelation implies that a time series is predictable, probabilistically, as future values are correlated with current and past values. Three tools for assessing the autocorrelation of a time series are (1) the time series plot, (2) the lagged scatterplot, and (3) the autocorrelation function.

3.1 Time series plot

Positively autocorrelated series are sometimes referred to as persistent because positive departures from the mean tend to be followed by positive departures from the mean, and negative departures from the mean tend to be followed by negative departures (Figure 3.1). In contrast, negative autocorrelation is characterized by a tendency for positive departures to follow negative departures, and vice versa. Positive autocorrelation might show up in a
time series plot as unusually long runs, or stretches, of several consecutive observations above or below the mean. Negative autocorrelation might show up as an unusually low incidence of such runs. Because the "departures" for computing autocorrelation are computed relative to the mean, a horizontal line plotted at the sample mean is useful in evaluating autocorrelation with the time series plot.

Visual assessment of autocorrelation from the time series plot is subjective and depends considerably on experience. Statistical tests based on the observed number of runs above and below the mean are available (e.g., Draper and Smith 1981), though none are covered in this course. It is a good idea, however, to look at the time series plot as a first step in analysis of persistence. If nothing else, this inspection might show that the persistence is much more prevalent in some parts of the series than in others.

3.2 Lagged scatterplot

The simplest graphical summary of autocorrelation in a time series is the lagged scatterplot, which is a scatterplot of the time series against itself offset in time by one to several time steps (Figure 3.2). Let the time series of length $N$ be $x_i$, $i = 1, \ldots, N$. The lagged scatterplot for lag $k$ is a scatterplot of the last $N - k$ observations against the first $N - k$ observations. For example, for lag 1, observations $x_2, x_3, \ldots, x_N$ are plotted against observations $x_1, x_2, \ldots, x_{N-1}$.

A random scattering of points in the lagged scatterplot indicates a lack of autocorrelation. Such a series is also sometimes called "random", meaning that the value at time $t$ is independent of the value at other times. Alignment from lower left to upper right in the lagged scatterplot indicates positive autocorrelation. Alignment from upper left to lower right indicates negative autocorrelation.

Notes_3, GEOS 585A, Spring 2009

[Figure 3.1 image: Tree-ring Index, MEAF; index plotted against year, 1900 to 2000]

Figure 3.1. Time series plot illustrating signatures of persistence.
Tendency for highs to follow highs or lows to follow lows (circled segments) characterizes series with persistence, or positive autocorrelation.

[Figure 3.2 image: four lagged scatterplots of the MEAF series, $x_t$ against $x_{t-1}$, $x_{t-2}$, $x_{t-3}$ and $x_{t-4}$]

Figure 3.2. Lagged scatterplots of tree-ring series MEAF. These are scatterplots of the series in Figure 3.1 with itself offset by 1, 2, 3 and 4 years. Annotated above a plot is the correlation coefficient, the sample size, and the threshold level of correlation needed to reject the null hypothesis of zero population correlation with 95 percent significance ($\alpha = 0.05$). The threshold is exceeded at lags 1, 2 and 4, but not at lag 3. At an offset of 3 years, the juxtaposition of high-growth 1999 with low-growth 2002 exerts high influence (point in red rectangle).

An attribute of the lagged scatterplot is that it can display autocorrelation regardless of the form of the dependence on past values. An assumption of linear dependence is not necessary. An organized curvature in the pattern of dots might suggest nonlinear dependence between time-separated values. Such nonlinear dependence might not be effectively summarized by other methods (e.g., the autocorrelation function, or acf, which is described later). Another attribute is that the lagged scatterplot can show if the autocorrelation is driven by one or more outliers in the data. This again would not be evident from the acf.

Fitted line. A straight line can be fit to the points in a lagged scatterplot to facilitate evaluation of the linearity and strength of the relationship of current with past values. A series of lagged scatterplots at increasing lags (e.g., $k = 1, 2, \ldots$) helps in assessing whether dependence is restricted to one or more lags.

Correlation coefficient and 95% significance level. The correlation coefficient for the scatterplot summarizes the strength of the linear relationship between present and past
values. It is helpful to compare the computed correlation coefficient with the critical level of correlation required to reject the null hypothesis that the sample comes from a population with zero correlation at the indicated lag. If a time series is completely random, and the sample size is large, the lagged-correlation coefficient is approximately normally distributed with mean 0 and variance $1/N$ (Chatfield 2004). It follows that the critical level of correlation for 95% significance (0.05 $\alpha$-level) is

$$ r_{.95} = 0 \pm \frac{2}{\sqrt{N}} $$

where $N$ is the sample size. Accordingly, the required level of correlation for "significance" becomes very small at large sample size (Figure 3.3).

[Figure 3.3 image: critical level of correlation coefficient versus sample size (observations)]

Figure 3.3. Critical level of correlation coefficient (95 percent significance) as a function of sample size. The critical level drops from r = 0.20 for a sample size of 100 to r = 0.02 for a sample size of 10000.

3.3 Autocorrelation function (correlogram)

An important guide to the persistence in a time series is given by the series of quantities called the sample autocorrelation coefficients, which measure the correlation between observations at different times. The set of autocorrelation coefficients arranged as a function of separation in time is the sample autocorrelation function, or the acf. An analogy can be drawn between the autocorrelation coefficient and the product-moment correlation coefficient. Assume $N$ pairs of observations on two variables $x$ and $y$. The correlation coefficient between $x$ and $y$ is given by

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\left[ \sum (x_i - \bar{x})^2 \right]^{1/2} \left[ \sum (y_i - \bar{y})^2 \right]^{1/2}} \tag{1} $$

where the summations are over the $N$ observations.

A similar idea can be applied to time series for which successive observations are correlated. Instead of two different time series, the correlation is computed between one time series and the same series lagged by one or more time units. For the first-order autocorrelation, the lag is one time unit. The first-order autocorrelation coefficient is the simple correlation coefficient of the first $N - 1$ observations, $x_t$, $t = 1, 2, \ldots, N - 1$, and the next
$N - 1$ observations, $x_t$, $t = 2, 3, \ldots, N$. The correlation between $x_t$ and $x_{t+1}$ is given by

$$ r_1 = \frac{\sum_{t=1}^{N-1} (x_t - \bar{x}_{(1)})(x_{t+1} - \bar{x}_{(2)})}{\left[ \sum_{t=1}^{N-1} (x_t - \bar{x}_{(1)})^2 \right]^{1/2} \left[ \sum_{t=1}^{N-1} (x_{t+1} - \bar{x}_{(2)})^2 \right]^{1/2}} \tag{2} $$

where $\bar{x}_{(1)}$ is the mean of the first $N - 1$ observations and $\bar{x}_{(2)}$ is the mean of the last $N - 1$ observations. As the correlation coefficient given by (2) measures correlation between successive observations, it is called the autocorrelation coefficient or serial correlation coefficient.

For $N$ reasonably large, the difference between the sub-period means $\bar{x}_{(1)}$ and $\bar{x}_{(2)}$ can be ignored and $r_1$ can be approximated by

$$ r_1 = \frac{\sum_{t=1}^{N-1} (x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} \tag{3} $$

where $\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t$ is the overall mean.

Equation 3 can be generalized to give the correlation between observations separated by $k$ time steps:

$$ r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} \tag{4} $$

The quantity $r_k$ is called the autocorrelation coefficient at lag $k$. The plot of the autocorrelation function as a function of lag is also called the correlogram.

Link between acf and lagged scatterplot. The correlation coefficients for the lagged scatterplots at lags $k = 1, 2, \ldots, 8$ are equivalent to the acf values at lags 1, ..., 8.

Link between acf and autocovariance function (acvf). Recall that the variance is the average squared departure from the mean. By analogy, the autocovariance of a time series is defined as the average product of departures at times $t$ and $t + k$:

$$ c_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \tag{5} $$

where $c_k$ is the autocovariance coefficient at lag $k$. The autocovariance at lag zero, $c_0$, is the variance. By combining equations 4 and 5, the autocorrelation at lag $k$ can be written in terms of the autocovariance:

$$ r_k = c_k / c_0 \tag{6} $$

Alternative equation for autocovariance function. Equation 5 is a biased (though asymptotically unbiased) estimator of the population covariance. The acvf is sometimes computed with the alternative equation

$$ c_k = \frac{1}{N - k} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \tag{7} $$

The acvf by (7) has a lower bias than the acvf by (5), but is conjectured to have a higher mean square error (Jenkins and Watts 1968, chapter 5).

3.4 Testing for randomness with the correlogram

The first question that can be answered with the
correlogram is whether the series is random or not. For a random series, lagged values of the series are uncorrelated and we expect $r_k \cong 0$. It can be shown that if $x_1, \ldots, x_N$ are independent and identically distributed random variables with arbitrary mean, the expected value of $r_k$ is

$$ E(r_k) \cong -1/N \tag{8} $$

the variance of $r_k$ is

$$ \mathrm{Var}(r_k) \cong 1/N \tag{9} $$

and $r_k$ is asymptotically normally distributed under the assumption of weak stationarity. The 95% confidence limits for the correlogram can therefore be plotted at $-1/N \pm 2/\sqrt{N}$, and are often further approximated to $0 \pm 2/\sqrt{N}$. Thus, for example, if a series has length 100, the approximate 95% confidence band is $\pm 2/\sqrt{100} = \pm 0.20$.

Any given $r_k$ has a 5% chance of being outside the 95% confidence limits, so that one value outside the limits might be expected in a correlogram plotted out to lag 20 even if the time series is drawn from a random (not autocorrelated) population. Factors that must be considered in judging whether a sample autocorrelation outside the confidence limits indicates an autocorrelated process or population are (1) how many lags are being examined, (2) the magnitude of $r_k$, and (3) at what lag $k$ the large coefficient occurs. A very large $r_k$ is less likely to occur by chance than an $r_k$ barely outside the confidence bands. And a large $r_k$ at a low lag (e.g., $k = 1$) is far more likely to represent persistence in most physical systems than an isolated large $r_k$ at some higher lag.

3.5 Large-lag standard error

While the confidence bands described above are horizontal lines above and below zero on the correlogram, the confidence bands you will see on the correlograms in the assignment script may appear to be narrowest at lag 1 and to widen slightly at higher lags. That is because the confidence bands produced by the script are the so-called large-lag standard errors of $r_k$ (Anderson 1976, p. 8). Successive values of $r_k$ can be highly correlated, so that an individual $r_k$ might be large simply because the value at the next lower lag, $r_{k-1}$, is large. This interdependence makes it difficult to assess just at
how many lags the correlogram is significant.

The large-lag standard error adjusts for the interdependence. The variance of $r_k$, with the adjustment, is given by

$$ \mathrm{Var}(r_k) \cong \frac{1}{N} \left( 1 + 2 \sum_{i=1}^{K} r_i^2 \right) \tag{10} $$

where $K < k$. The square root of the variance quantity given by (10) is called the large-lag standard error of $r_k$ (Anderson 1976, p. 8). Comparison of (10) with (9) shows that the adjustment is due to the summation term, and that the variance of the autocorrelation coefficient at any given lag depends on the sample size as well as on the estimated autocorrelation coefficients at shorter lags. For example, the variance of the lag-3 autocorrelation coefficient, $\mathrm{Var}(r_3)$, is greater than $1/N$ by an amount that depends on the autocorrelation coefficients at lags 1 and 2. Likewise, the variance of the lag-10 autocorrelation coefficient, $\mathrm{Var}(r_{10})$, depends on the autocorrelation coefficients at lags 1 to 9.

Assessment of the significance of lag-$k$ autocorrelation by the large-lag standard error essentially assumes that the theoretical autocorrelation has "died out" by lag $k$, but does not assume that the lower-lag theoretical autocorrelations are zero (Box and Jenkins 1976, p. 35). Thus the null hypothesis is NOT that the series is random, as lower-lag autocorrelations in the generating process may be non-zero.

An example for a tree-ring index time series illustrates the slight difference between the confidence interval computed from the large-lag standard error and computed by the rough approximation $\pm 2/\sqrt{N}$, where $N$ is the sample size (Figure 3.4). The alternative confidence intervals differ because the null hypotheses differ. Thus, the autocorrelation at lag 5, say, is judged significant under the null hypothesis that the series is random, but is not judged significant if the theoretical autocorrelation function is considered to not have died out until lag 5.

[Figure 3.4 image: sample autocorrelation versus lag, with 95% confidence intervals]

Figure 3.4. Sample autocorrelation with 95% confidence intervals for MEAF tree-ring index, 1900-2007. Dotted line is simple approximate confidence interval at $\pm 2/\sqrt{N}$, where
$N$ is the sample size. Dashed line is large-lag standard error.

3.6 Hypothesis test on r1

The first-order autocorrelation coefficient is especially important because for physical systems dependence on past values is likely to be strongest for the most recent past. The first-order autocorrelation coefficient, $r_1$, can be tested against the null hypothesis that the corresponding population value $\rho_1 = 0$. The critical value of $r_1$ for a given significance level (e.g., 95%) depends on whether the test is one-tailed or two-tailed. For the one-tailed hypothesis, the alternative hypothesis is usually that the true first-order autocorrelation is greater than zero:

$$ H_1: \rho_1 > 0 \tag{11} $$

For the two-tailed test, the alternative hypothesis is that the true first-order autocorrelation is different from zero, with no specification of whether it is positive or negative:

$$ H_1: \rho_1 \neq 0 \tag{12} $$

Which alternative hypothesis to use depends on the problem. If there is some reason to expect positive autocorrelation (e.g., with tree rings, from carryover food storage in trees), the one-sided test is best. Otherwise, the two-sided test is best.

For the one-sided test, the World Meteorological Organization recommends that the 95% significance level for $r_1$ be computed by

$$ r_{1,.95} = \frac{-1 + 1.645 \sqrt{N - 2}}{N - 1} \tag{13} $$

where $N$ is the sample size.

More generally, following Salas et al. (1980), who refer to Anderson (1941), the probability limits on the correlogram of an independent series are

$$ \begin{aligned} r_k(95\%) &= \frac{-1 + 1.645 \sqrt{N - k - 1}}{N - k} && \text{one sided} \\ r_k(95\%) &= \frac{-1 \pm 1.96 \sqrt{N - k - 1}}{N - k} && \text{two sided} \end{aligned} \tag{14} $$

where $N$ is the sample size and $k$ is the lag. Equation 13 comes from substitution of $k = 1$ into equation 14.

3.7 Effective Sample Size

If a time series of length $N$ is autocorrelated, the number of independent observations is fewer than $N$. Essentially, the series is not random in time, and the information in each observation is not totally separate from the information in other observations. The reduction in number of independent observations has implications for hypothesis testing. Some standard statistical tests that depend
on the assumption of random samples can still be applied to a time series despite the autocorrelation in the series. The way of circumventing the problem of autocorrelation is to adjust the sample size for autocorrelation. The number of independent samples after adjustment is fewer than the number of observations of the series.

Below is an equation for computing so-called "effective" sample size, or sample size adjusted for autocorrelation. More on the adjustment can be found elsewhere (WMO 1966; Dawdy and Matalas 1964). The equation was derived based on the assumption that the autocorrelation in the series represents first-order autocorrelation (dependence on lag 1 only). In other words, the governing process is first-order autoregressive, or Markov. Computation of the effective sample size requires only the sample size and first-order sample autocorrelation coefficient. The "effective" sample size is given by

$$ N' = N \frac{1 - r_1}{1 + r_1} \tag{15} $$

where $N$ is the sample size, $N'$ is the effective sample size, and $r_1$ is the first-order autocorrelation coefficient. The ratio $(1 - r_1)/(1 + r_1)$ is a scaling factor multiplied by the original sample size to compute the effective sample size. For example, an annual series with a sample size of 100 years and a first-order autocorrelation of 0.50 has an adjusted sample size of

$$ N' = 100 \, \frac{1 - 0.5}{1 + 0.5} = 100 \, \frac{0.5}{1.5} \approx 33 \text{ years} $$

The adjustment to effective sample size becomes less important the lower the autocorrelation, but a first-order autocorrelation coefficient as small as $r_1 = 0.10$ results in a scaling to about 80 percent of the original sample size (Figure 3.5).

[Figure 3.5 image: scaling factor versus first-order autocorrelation]

Figure 3.5. Scaling factor for computing effective sample size from original sample size for autocorrelated time series.

References

Anderson, R.L., 1941, Distribution of the serial correlation coefficients: Annals of Math. Statistics, v. 8, no. 1, p. 1-13.

Anderson, O., 1976, Time series analysis and forecasting: the Box-Jenkins approach: London, Butterworths, 182 pp.

Box, G.E.P., and Jenkins, G.M., 1976, Time series analysis: forecasting and control: San Francisco, Holden Day, 575 pp.

Chatfield, C., 2004, The analysis of time series, an introduction, sixth edition: New York, Chapman & Hall/CRC.

Dawdy, D.R., and Matalas, N.C., 1964, Statistical and probability analysis of hydrologic data, part III: Analysis of variance, covariance and time series, in Ven Te Chow, ed., Handbook of applied hydrology, a compendium of water-resources technology: New York, McGraw-Hill Book Company, p. 8.68-8.90.

Jenkins, G.M., and Watts, D.G., 1968, Spectral analysis and its applications: Holden-Day, 525 p.

Salas, J.D., Delleur, J.W., Yevjevich, V.M., and Lane, W.L., 1980, Applied modeling of hydrologic time series: Littleton, Colorado, Water Resources Publications, 484 pp.

World Meteorological Organization, 1966, Technical Note No. 79: Climatic Change, WMO-No. 195.TP.100, Geneva, 80 pp.
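Worked sketch. As an illustration of equations 4 to 6, 10, 14 and 15, the following minimal Python sketch computes the sample acf, the large-lag standard error, the critical level of correlation for an independent series, and the effective sample size. It is not the course's assignment script. NumPy is assumed to be available, all function names are hypothetical, and the choice K = k - 1 in the large-lag standard error is an assumption of this sketch (the notes state only that K < k). The test series is a synthetic first-order autoregressive series, not the MEAF data.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (equations 4 to 6)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()                          # departures from the overall mean
    c0 = np.sum(d * d) / n                    # lag-0 autocovariance, eq. 5
    r = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        ck = np.sum(d[: n - k] * d[k:]) / n   # autocovariance at lag k, eq. 5
        r[k] = ck / c0                        # eq. 6
    return r


def large_lag_se(r, n):
    """Large-lag standard error of r_k for k = 1..len(r)-1 (equation 10).

    Assumption of this sketch: K = k - 1, so all lower lags enter the sum.
    """
    se = np.empty(r.size - 1)
    for k in range(1, r.size):
        se[k - 1] = np.sqrt((1.0 + 2.0 * np.sum(r[1:k] ** 2)) / n)
    return se


def critical_r(n, k=1, one_sided=True):
    """95% probability limit(s) on r_k for an independent series (equation 14)."""
    root = np.sqrt(n - k - 1)
    if one_sided:
        return (-1.0 + 1.645 * root) / (n - k)
    return ((-1.0 - 1.96 * root) / (n - k), (-1.0 + 1.96 * root) / (n - k))


def effective_sample_size(n, r1):
    """Effective sample size N' = N (1 - r1) / (1 + r1) (equation 15)."""
    return n * (1.0 - r1) / (1.0 + r1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)            # arbitrary seed, for repeatability only
    n, phi = 100, 0.5                         # hypothetical AR(1) test series
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]

    r = acf(x, max_lag=10)
    print("r1..r3:", np.round(r[1:4], 2))
    print("simple 95% band: +/-", 2.0 / np.sqrt(n))   # 0.2 for N = 100
    print("large-lag 2*SE, lags 1..3:", np.round(2.0 * large_lag_se(r, n)[:3], 2))
    print("one-sided critical r1:", round(critical_r(n), 3))
    print("effective N for r1 = 0.5:", round(effective_sample_size(100, 0.5)))  # 33, as in the worked example
```

With K = k - 1 the band at lag 1 reduces to the simple $2/\sqrt{N}$ approximation and can only widen at higher lags, which matches the behavior described in section 3.5.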