Class Note for STAT 635 at OSU 16
These 13-page class notes were uploaded on Friday, February 6, 2015. They belong to a course at Ohio State University.
Date Created: 02/06/15
STATISTICS 635, SUMMER 2005: STATISTICAL ANALYSIS OF TIME SERIES
STAT635 LECTURE OUTLINE 2

"Discovery consists of seeing what everybody has seen and thinking what nobody else has thought." (Jonathan Swift)

COVARIANCES, AUTOCOVARIANCES AND AUTOCORRELATIONS

REMINDER: If $E(X^2) < \infty$, $E(Y^2) < \infty$, $E(Z^2) < \infty$ and $a$, $b$, $c$ are real constants, then
$$\mathrm{cov}(aX + bY + c, Z) = a\,\mathrm{cov}(X, Z) + b\,\mathrm{cov}(Y, Z).$$

Definition 1. Let $\{X_t\}$ be a stationary time series with mean function $\mu_X$. The autocovariance function (ACVF) of $\{X_t\}$ at lag $h$ is
$$\gamma_X(h) = \mathrm{cov}(X_{t+h}, X_t) = E[(X_{t+h} - \mu_X)(X_t - \mu_X)].$$
The autocorrelation function (ACF) of $\{X_t\}$ at lag $h$ is
$$\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \mathrm{corr}(X_{t+h}, X_t).$$
Note: they measure the amount of dependence between $X_t$ and $X_{t+h}$.

SOME PROPERTIES
- $\gamma_X(0) = V(X_t)$
- $\gamma_X(h) = \gamma_X(-h)$ for each $h$
- When $\{X_t\}$ is Gaussian, the distribution is completely specified by $\mu_X$ and $\gamma_X(\cdot)$
- $\rho_X(0) = 1$
- $\rho_X(h) = \rho_X(-h)$ for each $h$
- $-1 \le \rho_X(h) \le 1$ for each $h$

MORE EXAMPLES OF TIME SERIES MODELS

THE RANDOM WALK PROCESS

Definition 2. A random walk is a series defined by the equation
$$X_t = Z_1 + Z_2 + \cdots + Z_t = \sum_{j=1}^{t} Z_j, \quad t = 1, 2, \ldots,$$
where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$. A random walk is thus obtained by cumulatively summing iid random variables.

[Figure: a simulated random walk $X_t = Z_1 + \cdots + Z_t$ plotted against time.]

Find:
- $E(X_t)$ and $V(X_t)$
- If $s > t$, then $\mathrm{cov}(Z_s, X_t) = \;?$
- $\gamma_X(t + h, t) = \;?$
- Is the series $\{X_t\}$ stationary?
- Is there any transformation that can change that?

FIRST-ORDER MOVING AVERAGE OR MA(1) PROCESS

Definition 3. A series $\{X_t\}$ is a first-order moving average or MA(1) process if
$$X_t = Z_t + \theta Z_{t-1}, \quad t = 0, \pm 1, \pm 2, \ldots,$$
where
- $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$
- $\theta$ is a real-valued constant.

It is easy to show that the autocovariance function (ACVF) is
$$\gamma_X(t+h, t) = \begin{cases} \sigma^2(1+\theta^2), & h = 0,\\ \sigma^2\theta, & h = \pm 1,\\ 0, & |h| > 1. \end{cases}$$
- Clearly, an MA(1) process is a stationary process.

Also, the autocorrelation function (ACF) is
$$\rho_X(t+h, t) = \rho_X(h) = \begin{cases} 1, & h = 0,\\ \theta/(1+\theta^2), & h = \pm 1,\\ 0, & |h| > 1. \end{cases}$$

EXAMPLE OF AN MA(1) PROCESS WITH $\theta = 0.9$
[Figure: a simulated MA(1) path and its sample ACF.]

FIRST-ORDER AUTOREGRESSIVE OR AR(1) PROCESS

Definition 4. A series $\{X_t\}$ is a
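first-order autoregressive or AR(1) process if
$$X_t = \phi X_{t-1} + Z_t, \quad t = 0, \pm 1, \pm 2, \ldots,$$
where
- $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$
- $|\phi| < 1$
- $Z_t$ is uncorrelated with $X_s$ for each $s < t$.

It is easy to show that the autocovariance function (ACVF) is
$$\gamma_X(h) = \frac{\sigma^2 \phi^{|h|}}{1 - \phi^2}, \quad h = 0, \pm 1, \pm 2, \ldots$$
- An AR(1) process is a stationary process.

The autocorrelation function (ACF) of an AR(1) is given by
$$\rho_X(h) = \phi^{|h|}, \quad h = 0, \pm 1, \pm 2, \ldots$$

EXAMPLE OF AN AR(1) PROCESS WITH $\phi = 0.72$
[Figure: a simulated AR(1) path and its sample ACF.]

ESTIMATION OF THE MEAN OF A TIME SERIES
- Let $\{X_t\}$ be a stationary process with mean $\mu$ and ACVF $\gamma(\cdot)$, and let $X_1, X_2, \ldots, X_n$ be a sample from the process.
- A natural estimator of $\mu$ is the sample mean
  $$\bar{X}_n = \frac{1}{n}\sum_{t=1}^{n} X_t.$$
- The sample mean $\bar{X}_n$ is an unbiased estimator of $\mu$: $E(\bar{X}_n) = \mu$.
- $\bar{X}_n$ is not always a consistent estimator of $\mu$, because $\bar{X}_n$ need not be close to $\mu$ in probability as $n \to \infty$. Its variance, $V(\bar{X}_n) = n^{-1}\sum_{|h|<n}(1 - |h|/n)\,\gamma(h)$, tends to 0 when $\gamma(h) \to 0$ as $h \to \infty$, but not in general (e.g., if $X_t = W$ for every $t$, then $\bar{X}_n = W$ for every $n$).
- A more efficient estimator of $\mu$ can be constructed, for instance, if one knows something about the ACVF.

ESTIMATION OF THE AUTOCOVARIANCE FUNCTION
- A natural estimator of the ACVF $\gamma(h) = E[(X_t - \mu)(X_{t+h} - \mu)]$ is
  $$\tilde{\gamma}(h) = \frac{1}{n - |h|}\sum_{t=1}^{n-|h|}(X_{t+|h|} - \bar{X})(X_t - \bar{X}), \quad -n < h < n.$$
  - $\tilde{\gamma}(h)$ is an unbiased estimator of $\gamma(h)$ when the mean is known and $\bar{X}$ is replaced by $\mu$.
  - Unfortunately, the matrix $\tilde{\Gamma}_n = [\tilde{\gamma}(i-j)]_{i,j=1}^{n}$ is not always nonnegative definite.
- The most common estimator of $\gamma(h)$ is the sample autocovariance function
  $$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}(X_{t+|h|} - \bar{X})(X_t - \bar{X}), \quad -n < h < n.$$
  - $\hat{\gamma}(h)$ is a biased estimator of $\gamma(h)$.
  - The matrix $\hat{\Gamma}_n = [\hat{\gamma}(i-j)]_{i,j=1}^{n}$ is nonnegative definite.
  - $\mathrm{MSE}(\hat{\gamma}(h))$ is typically smaller than $\mathrm{MSE}(\tilde{\gamma}(h))$.
  - Normally, $\gamma(h) \to 0$ as $h \to \infty$.

ESTIMATION OF THE AUTOCORRELATION FUNCTION (ACF)
- The sample autocorrelation function is
  $$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad -n < h < n.$$
- Heuristic: use $\hat{\rho}(h)$ only if $|h| \le n/4$ and $n \ge 50$.
- Use the fact that $\hat{\gamma}(h) = \frac{n - |h|}{n}\,\tilde{\gamma}(h)$ to argue why large values of $|h|$ lead to unstable estimates of $\rho(h)$.

A TEST FOR IID NOISE USING THE SAMPLE ACF
- Note: for iid noise with finite variance we have, for $h \ne 0$, that $\hat{\rho}(h)$ is approximately $N(0, 1/n)$ for large $n$.
- Steps of the diagnostic for iid noise:
  - Plot the lag $h$ versus $\hat{\rho}(h)$.
  - Draw two horizontal lines at $\pm 1.96/\sqrt{n}$.
    - These two lines are drawn automatically in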
      R by acf().
  - You should have about 95% of the values $\hat{\rho}(h)$, $h = 1, 2, \ldots$, within the lines if the noise is indeed iid.
- Which of the following depicts an iid noise?

[Figure: several candidate plots to compare.]

EXAMPLE OF R CODE FOR SIMULATING IID NOISE

```r
# Simulate 144 iid N(0, 1) random variables
X <- rnorm(144, 0, 1)
# Let's have a plot of 2 panels high by 1 wide; also make the text 0.7 times smaller
par(mfrow = c(2, 1), cex = 0.7)
# Plot the resulting time series
plot(X, xlab = "time", ylab = "IID noise", type = "l")
# Calculate and plot the ACF
acf(X)
```

[Figure: time plot of the simulated IID noise (top) and its sample ACF, panel "Series X" (bottom).]

COMMENTS:

EXAMPLE OF R CODE FOR SIMULATING AN MA(1) PROCESS

```r
# Simulate an MA(1) process of length 144 with theta = 0.9
X <- arima.sim(list(ma = c(0.9)), n = 144)
# Let's have a plot of 2 panels high by 1 wide; also make the text 0.7 times smaller
par(mfrow = c(2, 1), cex = 0.7)
# Plot the resulting time series
plot(X, xlab = "time", ylab = "MA(1) process", type = "l")
# Calculate and plot the ACF
acf(X)
```

[Figure: time plot of the simulated MA(1) process (top) and its sample ACF, panel "Series X" (bottom).]

COMMENTS:

EXAMPLE OF R CODE FOR SIMULATING AN AR(1) PROCESS

```r
# Simulate an AR(1) process of length 144 with phi = 0.72
X <- arima.sim(list(ar = c(0.72)), n = 144)
# Let's have a plot of 2 panels high by 1 wide; also make the text 0.7 times smaller
par(mfrow = c(2, 1), cex = 0.7)
# Plot the resulting time series
plot(X, xlab = "time", ylab = "AR(1) process", type = "l")
# Calculate and plot the ACF
acf(X)
```

[Figure: time plot of the simulated AR(1) process (top) and its sample ACF, panel "Series X" (bottom).]

COMMENTS:

MODELS WITH TREND AND SEASONAL COMPONENTS

A GOOD PRACTICE IN TIME SERIES ANALYSIS
- Step 1: Plot the data.
- Step 2: Investigate the existence of
  - apparent discontinuities, such as a sudden change in level or behavior;
    - Could it be that the series needs to be broken
      into homogeneous segments?
    - Is the variance increasing with time?
  - outliers.
- Step 3: Perform any suitable transformation (e.g., a logarithm) to stabilize the variance if the graph suggests that this is needed.
- Step 4: Inspect the graph to find out if it suggests the possibility of representing the data as a realization of the classical decomposition model
  $$X_t = m_t + s_t + Y_t, \tag{1}$$
  where
  - $m_t$ is the trend component;
  - $s_t$ is a seasonal component with period $d$ (i.e., $s_{t+d} = s_t$) and $\sum_{j=1}^{d} s_j = 0$;
  - $Y_t$ is a random noise component with $E(Y_t) = 0$.

THE STANDARD PROCEDURE HAS TWO STEPS
- Estimate and extract the deterministic components $m_t$ and $s_t$.
  - A popular model for $s_t$ is the harmonic regression model
    $$s_t = \sum_{j=1}^{k}\left[\alpha_j \cos(2\pi f_j t) + \beta_j \sin(2\pi f_j t)\right].$$
    - The $\alpha_j, \beta_j$, $j = 1, \ldots, k$, are the parameters to be estimated.
    - The $f_j$ control the frequency of the periodicity.
- Analyze the residual noise component $Y_t$.
  - The aim is to model the residual $Y_t$ as a stationary time series.

A NONSEASONAL MODEL WITH TREND

If our plot reveals no seasonal effect but does suggest the existence of a trend, then we may use the simpler decomposition
$$X_t = m_t + Y_t, \quad \text{with } E(Y_t) = 0.$$
To construct a model for the data, we consider two methods.
- METHOD 1: Trend estimation
  - Fit a polynomial trend by least squares regression.
  - Subtract the fitted trend from the data.
  - Find an appropriate stationary time series model for the residuals.
- METHOD 2: Differencing
  - Eliminate the trend directly by differencing.
  - Find an appropriate stationary time series model for the differenced series.

Method 2 has the advantage that it typically uses fewer parameters and does not rest on the assumption that the trend remains the same throughout the observation period.

METHOD 1: TREND ESTIMATION BY LEAST SQUARES
- A polynomial of degree $k$ in $t$ is posited as a model for $m_t$, i.e.,
  $$m_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_k t^k = \sum_{j=0}^{k}\beta_j t^j.$$
- The coefficients (parameters) are estimated by the least squares method, i.e., by finding those $\beta_j$ that minimize the residual sum of squares
  $$\mathrm{RSS} = \sum_{t=1}^{n}(X_t - m_t)^2 = \sum_{t=1}^{n}\Big(X_t - \sum_{j=0}^{k}\beta_j t^j\Big)^2.$$
- Computing: in R this is done with the function lm(). Type help(lm).

MORE ON LEAST SQUARES ESTIMATION OF TREND

Letting $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$, $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)^T$, the model can be written in matrix form as
$$\mathbf{X} = A\boldsymbol{\beta} + \mathbf{Y},$$
where
$$A = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^k \\ 1 & t_2 & t_2^2 & \cdots & t_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^k \end{pmatrix}.$$
The ordinary least squares estimator of $\boldsymbol{\beta}$ is therefore
$$\hat{\boldsymbol{\beta}} = (A^T A)^{-1} A^T \mathbf{X}.$$
Remark: since the process is typically not an iid noise process, the statistical properties of $\hat{\boldsymbol{\beta}}$ will differ from the results encountered in basic regression courses.

METHOD 1 APPLIED TO THE LAKE HURON DATA
- The plot of the Lake Huron data set seemed to suggest a linear downward trend, so one could posit the simple linear regression model
  $$m_t = \beta_0 + \beta_1 t.$$
- One could use the scaled index $t = 1, 2, \ldots, 98$ or the more meaningful original index $t = 1875, 1876, \ldots, 1972$.
- Some of the R commands used:

```r
data(LakeHuron)                   # Load the Lake Huron time series
years <- 1875:1972                # Create a variable named years
lmhuron <- lm(LakeHuron ~ years)  # Fit the linear regression model
plot(years, LakeHuron, type = "l", xlab = "year", ylab = "Level")
abline(lmhuron, lty = 2)          # Add the fitted line to the Huron plot
acf(resid(lmhuron))               # Plot the ACF of the residuals
```

- Some of the R commands used:

```r
summary(lmhuron)                  # Summarize the model obtained
```

```
Call:
lm(formula = LakeHuron ~ years)

Residuals:
      Min        1Q    Median        3Q       Max 
-2.509970 -0.727260  0.000829  0.744024  2.535650 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 625.554918   7.764293  80.568  < 2e-16 ***
years        -0.024201   0.004036  -5.996 3.55e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.13 on 96 degrees of freedom
Multiple R-Squared: 0.2725,  Adjusted R-squared: 0.2649 
F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08 
```

- Plot of trend and plot of sample ACF:

[Figure: Lake Huron level against year with the fitted trend line, and the sample ACF of the residuals (panel "Series resid(lmhuron)").]

- Judging from the ACF plot, is an iid noise
  process realistic (adequate) for $Y_t$ in the Lake Huron data?
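The matrix formula $\hat{\boldsymbol{\beta}} = (A^T A)^{-1} A^T \mathbf{X}$ can be checked numerically against lm(). The sketch below is a minimal base-R illustration, not the course's own code; it assumes the scaled index $t = 1, \ldots, 98$, because with the original years the matrix $A^T A$ is far worse conditioned and solving the normal equations directly loses accuracy. It solves $(A^T A)\boldsymbol{\beta} = A^T\mathbf{X}$ with solve() rather than forming the inverse. The names tt, A, beta_hat, m_hat and Y_hat are illustrative only.

```r
# Sketch: least squares polynomial trend via the normal equations,
# beta_hat = (A^T A)^{-1} A^T X, checked against lm() (base R only).
data(LakeHuron)
X  <- as.numeric(LakeHuron)          # X = (X_1, ..., X_n)^T
n  <- length(X)
tt <- seq_len(n)                     # assumed scaled time index t = 1, 2, ..., n
k  <- 1                              # polynomial degree (linear trend b0 + b1 t)

# Design matrix A: row t is (1, t, t^2, ..., t^k)
A <- outer(tt, 0:k, `^`)

# Solve (A^T A) beta = A^T X instead of forming the inverse explicitly
beta_hat <- drop(solve(crossprod(A), crossprod(A, X)))

# The same model fitted by lm(), which uses a QR decomposition internally
fit <- lm(X ~ tt)
print(rbind(normal_equations = beta_hat, lm = unname(coef(fit))))

# Fitted trend m_hat_t and estimated noise Y_hat_t = X_t - m_hat_t
m_hat <- drop(A %*% beta_hat)
Y_hat <- X - m_hat
```

With the scaled index only the intercept differs from the summary(lmhuron) output: since $t = \text{year} - 1874$, the slope estimate is unchanged, and the scaled intercept equals the years intercept plus $1874\,\hat{\beta}_1$.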
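The iid-noise diagnostic can also be computed rather than read off the plot. This base-R sketch computes the sample ACVF $\hat{\gamma}(h)$ with divisor $n$ exactly as defined in the notes, forms $\hat{\rho}(h)$ for $1 \le h \le n/4$ (the heuristic above), checks that it agrees with acf(), and reports the fraction of lags outside $\pm 1.96/\sqrt{n}$ for the Lake Huron trend residuals. The helper sample_acvf() is a hypothetical name introduced for illustration, and the script prints the fraction without presuming what it will be.

```r
# Sketch: sample ACVF/ACF with divisor n and the +/- 1.96/sqrt(n) iid check,
# applied to the residuals of the linear trend fit to LakeHuron (base R only).

# Hypothetical helper: gamma_hat(h) = (1/n) sum_{t=1}^{n-|h|} (x_{t+|h|} - xbar)(x_t - xbar)
sample_acvf <- function(x, h) {
  n <- length(x)
  h <- abs(h)
  stopifnot(h < n)
  xbar <- mean(x)
  sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
}

data(LakeHuron)
years <- 1875:1972
lmhuron <- lm(LakeHuron ~ years)      # same fit as in the notes
Y <- as.numeric(resid(lmhuron))       # estimated noise Y_t
n <- length(Y)

max_lag <- floor(n / 4)               # use rho_hat(h) only for h <= n/4
rho_hat <- sapply(seq_len(max_lag),
                  function(h) sample_acvf(Y, h) / sample_acvf(Y, 0))

# acf() uses the same divisor-n estimator; drop its lag-0 value (always 1)
rho_acf <- as.numeric(acf(Y, lag.max = max_lag, plot = FALSE)$acf)[-1]

bound <- 1.96 / sqrt(n)               # approximate 95% bounds under iid noise
frac_outside <- mean(abs(rho_hat) > bound)
cat("n =", n, " bound =", round(bound, 3),
    " fraction of lags outside:", round(frac_outside, 3), "\n")
```

Under iid noise roughly 5% of the lags would be expected outside the bounds; the dashed lines drawn by acf(resid(lmhuron)) sit at essentially the same $\pm 1.96/\sqrt{n}$.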