# Stat Methods and Models STA 251

This 165-page set of class notes was uploaded by Carmen Mayer on Tuesday, September 8, 2015. The notes belong to STA 251 at the University of California, Davis, taught by Peter Hall in Fall. Since upload, they have received 69 views. For similar materials see /class/191913/sta-251-university-of-california-davis in Statistics at University of California - Davis.

Date Created: 09/08/15

## METHODOLOGY AND THEORY FOR THE BOOTSTRAP: Fourth set of two lectures

Main topic of these lectures: theoretical properties of bootstrap confidence intervals.

### Studentised and non-Studentised estimators

Let $\hat\theta = \theta(\hat F)$ denote the bootstrap estimator of a statistic $\theta = \theta(F)$, computed from a dataset $\mathcal X = \{X_1,\dots,X_n\}$. Here $F$ denotes the distribution function of the data $X_i$. Write $\sigma^2 = \sigma^2(F)$ for the asymptotic variance of $S = n^{1/2}(\hat\theta - \theta)$, which we assume has a limiting Normal $N(0,\sigma^2)$ distribution.

Let $\hat\sigma^2 = \sigma^2(\hat F)$ denote the bootstrap estimator of $\sigma^2$. The Studentised form of $S$ is
$$T = n^{1/2}(\hat\theta-\theta)/\hat\sigma,$$
which has a limiting Normal $N(0,1)$ distribution. Therefore
$$P(S \le \sigma x) \to \Phi(x), \qquad P(T \le x) \to \Phi(x).$$
We say that $T$ is asymptotically pivotal, because its limiting distribution does not depend on unknowns.

### Edgeworth expansions of distributions of S and T

We know from previous lectures that, in a wide range of settings studied by Bhattacharya and Ghosh, the distributions of $S$ and $T$ admit Edgeworth expansions:
$$P(S \le \sigma x) = \Phi(x) + n^{-1/2}P_1(x)\,\phi(x) + n^{-1}P_2(x)\,\phi(x) + \cdots,$$
$$P(T \le x) = \Phi(x) + n^{-1/2}Q_1(x)\,\phi(x) + n^{-1}Q_2(x)\,\phi(x) + \cdots,$$
where $P_j$ and $Q_j$ are polynomials of degree $3j-1$, of opposite parity to their indices.

### Bootstrap Edgeworth expansions

Let $\mathcal X^* = \{X_1^*,\dots,X_n^*\}$ denote a resample drawn by sampling randomly, with replacement, from $\mathcal X$, and let $\hat\theta^*$ and $\hat\sigma^*$ be the same functions of the bootstrap data $\mathcal X^*$ as $\hat\theta$ and $\hat\sigma$ were of the real data $\mathcal X$. Put
$$S^* = n^{1/2}(\hat\theta^* - \hat\theta), \qquad T^* = n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^*,$$
denoting the bootstrap versions of $S$ and $T$. The bootstrap distributions of these quantities are their distributions conditional on $\mathcal X$, and they admit analogous Edgeworth expansions:
$$P(S^* \le \hat\sigma x \mid \mathcal X) = \Phi(x) + n^{-1/2}\hat P_1(x)\,\phi(x) + n^{-1}\hat P_2(x)\,\phi(x) + \cdots,$$
$$P(T^* \le x \mid \mathcal X) = \Phi(x) + n^{-1/2}\hat Q_1(x)\,\phi(x) + n^{-1}\hat Q_2(x)\,\phi(x) + \cdots.$$
In these formulae, $\hat P_j$ and $\hat Q_j$ are the versions of $P_j$ and $Q_j$ in which unknown quantities are replaced by their bootstrap estimators.

### Bootstrap Edgeworth expansions, continued

For example, recall that when $\theta$ and $\sigma^2$ denote the population mean and variance,
$$P_1(x) = -\tfrac16\,\kappa_3\,(x^2-1), \qquad Q_1(x) = \tfrac16\,\kappa_3\,(2x^2+1),$$
$$P_2(x) = -x\big\{\tfrac1{24}\,\kappa_4\,(x^2-3) + \tfrac1{72}\,\kappa_3^2\,(x^4-10x^2+15)\big\},$$
$$Q_2(x) = x\big\{\tfrac1{12}\,\kappa_4\,(x^2-3) - \tfrac1{18}\,\kappa_3^2\,(x^4+2x^2-3) - \tfrac14\,(x^2+3)\big\},$$
where $\kappa_3 = \sigma^{-3}E(X-\mu)^3$ and $\kappa_4 = \sigma^{-4}E(X-\mu)^4 - 3$.
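As a concrete illustration (not part of the original notes), here is a minimal sketch of the resampling scheme that defines $S^*$ and $T^*$ when $\hat\theta$ is the sample mean; the exponential data, sample size and resample count are arbitrary assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=50)       # observed sample, X = {X_1, ..., X_n}
n = len(x)
theta_hat = x.mean()               # theta(F_hat): here, the sample mean
sigma_hat = x.std()                # bootstrap estimator of sigma

B = 2000                           # number of Monte Carlo resamples
S_star = np.empty(B)
T_star = np.empty(B)
for b in range(B):
    xs = rng.choice(x, size=n, replace=True)          # resample X* from X
    S_star[b] = np.sqrt(n) * (xs.mean() - theta_hat)  # S* = n^{1/2}(theta_hat* - theta_hat)
    T_star[b] = S_star[b] / xs.std()                  # T* = S* / sigma_hat*
```

The empirical distributions of `S_star` and `T_star` are then Monte Carlo approximations to the bootstrap (conditional) distributions discussed above.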
To obtain $\hat P_1, \hat P_2, \hat Q_1, \hat Q_2$, replace $\sigma^2 = E(X-\mu)^2$, $\kappa_3 = \sigma^{-3}E(X-\mu)^3$ and $\kappa_4 = \sigma^{-4}E(X-\mu)^4 - 3$ by their bootstrap estimators,
$$\hat\sigma^2 = \frac1n\sum_{i=1}^n (X_i-\bar X)^2, \qquad \hat\kappa_3 = \hat\sigma^{-3}\,\frac1n\sum_{i=1}^n (X_i-\bar X)^3, \qquad \hat\kappa_4 = \hat\sigma^{-4}\,\frac1n\sum_{i=1}^n (X_i-\bar X)^4 - 3.$$

### Skewness and kurtosis

Note that we call $\kappa_3$ skewness and $\kappa_4$ kurtosis. Therefore the adjustment of order $n^{-1/2}$ that an Edgeworth expansion applies to the standard Normal approximation is a correction arising from skewness, i.e. from asymmetry. If skewness were zero, i.e. if $\kappa_3 = 0$, and in particular if the sampled distribution were symmetric, then the term of order $n^{-1/2}$ would vanish, and the first nonzero term appearing in the Edgeworth expansion would be of size $n^{-1}$.

Likewise, the term of size $n^{-1}$ in the expansion is a second-order correction for skewness and a first-order correction for kurtosis, or tail weight. Kurtosis describes the difference between the weight of the tails of the sampled distribution and that of the Normal distribution with the same mean and variance. If $\kappa_4 > 0$ then the tails of the sampled distribution tend to be heavier, and if $\kappa_4 < 0$ they tend to be lighter.

### Skewness and kurtosis, continued

We can make the same interpretation for Edgeworth expansions in general cases, not just the case of the sample mean. In general, the term of size $n^{-1/2}$ in an Edgeworth expansion provides a first-order correction for asymmetry of the sampled distribution. The term of size $n^{-1}$ provides a first-order correction for tail weight, relative to that of the Normal distribution, and a second-order correction for asymmetry.

Note particularly that, to first order, the bootstrap correctly captures the effects of asymmetry. In particular, since $\hat\kappa_3 = \kappa_3 + O_p(n^{-1/2})$, then
$$\hat Q_1(x) = \tfrac16\,\hat\kappa_3\,(2x^2+1) = \tfrac16\,\kappa_3\,(2x^2+1) + O_p(n^{-1/2}) = Q_1(x) + O_p(n^{-1/2}).$$
Similarly, $\hat P_j = P_j + O_p(n^{-1/2})$ and $\hat Q_j = Q_j + O_p(n^{-1/2})$ for each $j$.

### Accuracy of bootstrap approximations

It follows that
$$P(S^* \le \hat\sigma x \mid \mathcal X) = \Phi(x) + n^{-1/2}\hat P_1(x)\,\phi(x) + O_p(n^{-1}) = \Phi(x) + n^{-1/2}P_1(x)\,\phi(x) + O_p(n^{-1}) = P(S \le \sigma x) + O_p(n^{-1}),$$
$$P(T^* \le x \mid \mathcal X) = \Phi(x) + n^{-1/2}\hat Q_1(x)\,\phi(x) + O_p(n^{-1}) = \Phi(x) + n^{-1/2}Q_1(x)\,\phi(x) + O_p(n^{-1}) = P(T \le x) + O_p(n^{-1}).$$
That is, the bootstrap distributions of $S^*/\hat\sigma$ and $T^*$ approximate the true distributions of $S/\sigma$ and $T$, respectively, to order $n^{-1}$:
$$P(S^* \le \hat\sigma x \mid \mathcal X) - P(S \le \sigma x) = O_p(n^{-1}), \qquad P(T^* \le x \mid \mathcal X) - P(T \le x) = O_p(n^{-1}).$$
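The plug-in estimators $\hat\kappa_3$ and $\hat\kappa_4$ that enter $\hat P_j$ and $\hat Q_j$ can be sketched as follows (the function name is our own):

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness kappa3_hat and kurtosis kappa4_hat.

    kappa3 = sigma^{-3} E(X - mu)^3 and kappa4 = sigma^{-4} E(X - mu)^4 - 3,
    with population moments replaced by sample moments.
    """
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    s2 = np.mean(c ** 2)               # sigma_hat^2
    k3 = np.mean(c ** 3) / s2 ** 1.5   # kappa3_hat
    k4 = np.mean(c ** 4) / s2 ** 2 - 3.0   # kappa4_hat
    return k3, k4

# A perfectly symmetric sample has kappa3_hat = 0:
k3, k4 = skew_kurt([-1.0, 0.0, 1.0])
```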
Compare this with the Normal approximation, where accuracy is generally only $O(n^{-1/2})$. These orders of approximation are valid uniformly in $x$.

### Accuracy of bootstrap approximations, continued

These results underpin the performance of bootstrap methods in distribution approximation, and show its advantages over conventional Normal approximations.

Note, however, that the bootstrap distribution of $S^*/\hat\sigma$ approximates the true distribution of $S/\sigma$, not the true distribution of $S$ itself. Therefore, in order to use approximations based on $S$ effectively we usually have to know the value of $\sigma$, but generally we do not. Therefore we do not necessarily get such good performance when using the standard percentile bootstrap, which refers to methods based on $S$, rather than the percentile-t bootstrap, referring to methods based on $T$.

### Cornish-Fisher expansions

Let $\xi_\alpha, \eta_\alpha, \hat\xi_\alpha, \hat\eta_\alpha$ denote $\alpha$-level quantiles of the distributions of $S/\sigma$ and $T$, and of the bootstrap distributions of $S^*/\hat\sigma$ and $T^*$, respectively:
$$P(S \le \sigma\xi_\alpha) = \alpha, \quad P(T \le \eta_\alpha) = \alpha, \quad P(S^* \le \hat\sigma\hat\xi_\alpha \mid \mathcal X) = \alpha, \quad P(T^* \le \hat\eta_\alpha \mid \mathcal X) = \alpha.$$
Cornish-Fisher expansions of the quantiles, in both their conventional and bootstrap forms, are
$$\xi_\alpha = z_\alpha + n^{-1/2}P_1^{\rm cf}(z_\alpha) + n^{-1}P_2^{\rm cf}(z_\alpha) + \cdots, \qquad \eta_\alpha = z_\alpha + n^{-1/2}Q_1^{\rm cf}(z_\alpha) + n^{-1}Q_2^{\rm cf}(z_\alpha) + \cdots,$$
$$\hat\xi_\alpha = z_\alpha + n^{-1/2}\hat P_1^{\rm cf}(z_\alpha) + n^{-1}\hat P_2^{\rm cf}(z_\alpha) + \cdots, \qquad \hat\eta_\alpha = z_\alpha + n^{-1/2}\hat Q_1^{\rm cf}(z_\alpha) + n^{-1}\hat Q_2^{\rm cf}(z_\alpha) + \cdots,$$
where $z_\alpha = \Phi^{-1}(\alpha)$ is the standard Normal $\alpha$-level critical point.

### Cornish-Fisher expansions, continued

Recall that
$$P_1^{\rm cf} = -P_1, \qquad Q_1^{\rm cf} = -Q_1,$$
$$P_2^{\rm cf}(x) = P_1(x)P_1'(x) - \tfrac12\,x\,P_1(x)^2 - P_2(x), \qquad Q_2^{\rm cf}(x) = Q_1(x)Q_1'(x) - \tfrac12\,x\,Q_1(x)^2 - Q_2(x),$$
etc. Of course, the bootstrap analogues of these formulae hold too:
$$\hat P_1^{\rm cf} = -\hat P_1, \qquad \hat Q_1^{\rm cf} = -\hat Q_1, \qquad \hat P_2^{\rm cf}(x) = \hat P_1(x)\hat P_1'(x) - \tfrac12\,x\,\hat P_1(x)^2 - \hat P_2(x),$$
etc.

### Cornish-Fisher expansions, continued

Therefore, since $\hat P_j = P_j + O_p(n^{-1/2})$ and $\hat Q_j = Q_j + O_p(n^{-1/2})$, it is generally true that
$$\hat P_1^{\rm cf}(x) = P_1^{\rm cf}(x) + O_p(n^{-1/2}), \qquad \hat Q_1^{\rm cf}(x) = Q_1^{\rm cf}(x) + O_p(n^{-1/2}),$$
and hence that
$$\hat\xi_\alpha = z_\alpha + n^{-1/2}P_1^{\rm cf}(z_\alpha) + O_p(n^{-1}) = \xi_\alpha + O_p(n^{-1}), \qquad \hat\eta_\alpha = z_\alpha + n^{-1/2}Q_1^{\rm cf}(z_\alpha) + O_p(n^{-1}) = \eta_\alpha + O_p(n^{-1}).$$
These orders of approximation are valid uniformly in $\alpha$ on compact subintervals of $(0,1)$. Again the order of accuracy of the bootstrap approximation is $n^{-1}$, bettering the order $n^{-1/2}$ of the conventional Normal approximation.
But the same caveat applies: in order for approximations based on the percentile bootstrap, i.e. involving $S$, to be effective, we need to know $\sigma$.

### Bootstrap confidence intervals

We shall work initially only with one-sided confidence intervals, putting them together later to get two-sided intervals. The intervals
$$I_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\sigma\,\xi_{1-\alpha}\big], \qquad J_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,\eta_{1-\alpha}\big]$$
cover $\theta$ with probability exactly $\alpha$, but are in general not computable, since we do not know $\sigma$, $\xi_{1-\alpha}$ or $\eta_{1-\alpha}$. On the other hand, their bootstrap counterparts,
$$\hat I_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\sigma\,\hat\xi_{1-\alpha}\big], \qquad \hat I_2 = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,\hat\xi_{1-\alpha}\big], \qquad \hat J_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,\hat\eta_{1-\alpha}\big],$$
are readily computed from data (apart from $\hat I_1$, which still involves $\sigma$), but their coverage probabilities are not known exactly. They are respectively called percentile ($\hat I_1$, $\hat I_2$) and percentile-t ($\hat J_1$) bootstrap confidence intervals for $\theta$.

### Bootstrap confidence intervals, continued

The other percentile confidence intervals for $\theta$ are $\hat K_1$, constructed by the "backwards" argument with the quantile $\hat\xi_\alpha$ in place of $\hat\xi_{1-\alpha}$, and $\hat R$, based directly on the bootstrap distribution of $\hat\theta^*$. Neither in general has exact coverage. The interval $\hat R$ is the type of bootstrap confidence interval we introduced early in this series of lectures.

We expect the coverage probabilities of $\hat I_2$, $\hat J_1$, $\hat K_1$ and $\hat R$ to converge to $\alpha$ as $n\to\infty$. However, the convergence rate is generally only $n^{-1/2}$. The exceptions are $\hat I_1$ and $\hat J_1$, for which coverage equals $\alpha + O(n^{-1})$. The interval $\hat I_1$ is generally not particularly useful, since we need to know $\sigma$ in order to use it effectively. Therefore, of the intervals we have considered, only $\hat J_1$ is both useful and has good coverage accuracy.
### Advantages of using a pivotal statistic

In summary: unless the asymptotic variance $\sigma^2$ is known, one-sided bootstrap confidence intervals based on the pivotal statistic $T$ generally have a higher order of coverage accuracy than intervals based on the non-pivotal statistic $S$.

Intuitively, this is because, in the case of intervals based on $S$, the bootstrap spends the majority of its effort implicitly computing a correction for scale. It does not provide an effective correction for skewness and, as we have seen, the main term describing the departure of the distribution of a statistic from Normality is due to skewness. On the other hand, when the bootstrap is applied to a pivotal statistic such as $T$, which is already corrected for scale, it devotes itself to correcting for skewness, and therefore adjusts for the major part of the error in a Normal approximation.

### Derivation of these properties

We begin with the case of the confidence interval
$$\hat J_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,\hat\eta_{1-\alpha}\big],$$
our aim being to show that $P(\theta\in\hat J_1) = \alpha + O(n^{-1})$. Recall that $\hat\eta_{1-\alpha}$ is defined by $P(T^* \le \hat\eta_{1-\alpha}\mid\mathcal X) = 1-\alpha$, and that, by a Cornish-Fisher expansion,
$$\hat\eta_{1-\alpha} = z_{1-\alpha} + n^{-1/2}\hat Q_1^{\rm cf}(z_{1-\alpha}) + O_p(n^{-1}) = z_{1-\alpha} + n^{-1/2}Q_1^{\rm cf}(z_{1-\alpha}) + O_p(n^{-1}) = \eta_{1-\alpha} + O_p(n^{-1}),$$
where $\eta_{1-\alpha}$ is defined by $P(T \le \eta_{1-\alpha}) = 1-\alpha$. Therefore
$$P(\theta\in\hat J_1) = P\big\{n^{1/2}(\hat\theta-\theta)/\hat\sigma \ge \hat\eta_{1-\alpha}\big\} = P\big\{T \ge \eta_{1-\alpha} + O_p(n^{-1})\big\}.$$

### Derivation, continued

If the $O_p(n^{-1})$ term on the right-hand side were a constant, rather than a random variable, it would be straightforward to show, using the property
$$P(T \le x) = \Phi(x) + n^{-1/2}Q_1(x)\,\phi(x) + O(n^{-1}),$$
on taking $x = \eta_{1-\alpha} + O(n^{-1})$, that
$$P\big\{T \le \eta_{1-\alpha} + O(n^{-1})\big\} = \Phi\big\{\eta_{1-\alpha} + O(n^{-1})\big\} + n^{-1/2}\,Q_1\big\{\eta_{1-\alpha}+O(n^{-1})\big\}\,\phi\big\{\eta_{1-\alpha}+O(n^{-1})\big\} + O(n^{-1})$$
$$= \Phi(\eta_{1-\alpha}) + n^{-1/2}\,Q_1(\eta_{1-\alpha})\,\phi(\eta_{1-\alpha}) + O(n^{-1}) = P(T \le \eta_{1-\alpha}) + O(n^{-1}).$$
These steps need only Taylor expansion. The corresponding step for the random $O_p(n^{-1})$ term can be justified using a longer argument, which we shall not give here. Therefore
$$P(\theta\in\hat J_1) = P(T \ge \eta_{1-\alpha}) + O(n^{-1}) = 1 - (1-\alpha) + O(n^{-1}) = \alpha + O(n^{-1}),$$
the last line following from the definition of $\eta_{1-\alpha}$. This proves that the coverage error of the confidence interval $\hat J_1$ equals $O(n^{-1})$.

### Comparison with Normal-approximation interval

A similar argument shows that the coverage error of the corresponding confidence interval based on a Normal approximation, i.e.
$$\hat N_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,z_{1-\alpha}\big],$$
equals only $O(n^{-1/2})$. Exercise: prove that
$$P(\theta\in\hat N_1) = \alpha - n^{-1/2}\,Q_1(z_\alpha)\,\phi(z_\alpha) + O(n^{-1}).$$
Therefore, unless the effect of skewness is vanishingly small, i.e. the polynomial $Q_1$ vanishes, the coverage error of the classical Normal-approximation interval $\hat N_1$ is an order of magnitude greater than for the percentile-t interval $\hat J_1$.
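The coverage claims above can be probed empirically. In this Monte Carlo sketch (sample size, replication counts and the Exponential(1) population are arbitrary choices of ours) we estimate the coverage of $\hat J_1$ and $\hat N_1$ for the mean:

```python
import numpy as np

def coverage(n=20, alpha=0.90, reps=300, B=299, seed=2):
    """Monte Carlo coverage of the percentile-t interval J_hat_1 and the
    normal-approximation interval N_hat_1 for the mean of Exponential(1)."""
    rng = np.random.default_rng(seed)
    theta = 1.0                       # true mean of Exponential(1)
    z = 1.2815515655446004            # z_{0.90} = Phi^{-1}(0.90)
    hit_J1 = hit_N1 = 0
    for _ in range(reps):
        x = rng.exponential(size=n)
        th, sd = x.mean(), x.std()
        T = np.empty(B)
        for b in range(B):
            xs = rng.choice(x, size=n, replace=True)
            T[b] = np.sqrt(n) * (xs.mean() - th) / xs.std()
        eta = np.quantile(T, 1 - alpha)                  # eta_hat_{1-alpha}
        hit_J1 += theta <= th - sd * eta / np.sqrt(n)    # theta in J_hat_1?
        hit_N1 += theta <= th - sd * (-z) / np.sqrt(n)   # theta in N_hat_1?
    return hit_J1 / reps, hit_N1 / reps
```

With settings of this kind one typically finds the percentile-t coverage closer to the nominal level than the normal-approximation coverage, although Monte Carlo error is material at this scale.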
### Derivation, continued

Similarly, it can be proved that the coverage error of the interval $\hat I_1 = \big(-\infty,\ \hat\theta - n^{-1/2}\sigma\,\hat\xi_{1-\alpha}\big]$ is also $O(n^{-1})$:
$$P(\theta\in\hat I_1) = \alpha + O(n^{-1}).$$
However, this result fails for the more practical interval $\hat I_2 = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,\hat\xi_{1-\alpha}\big]$, as we now show. The argument will highlight differences between Edgeworth expansions in Studentised and non-Studentised, i.e. pivotal and non-pivotal, cases.

Recall that $\hat\xi_{1-\alpha}$ is defined by $P(S^* \le \hat\sigma\,\hat\xi_{1-\alpha} \mid \mathcal X) = 1-\alpha$, and that, by a Cornish-Fisher expansion,
$$\hat\xi_{1-\alpha} = z_{1-\alpha} + n^{-1/2}\hat P_1^{\rm cf}(z_{1-\alpha}) + O_p(n^{-1}) = z_{1-\alpha} + n^{-1/2}P_1^{\rm cf}(z_{1-\alpha}) + O_p(n^{-1}) = \xi_{1-\alpha} + O_p(n^{-1}),$$
where $\xi_{1-\alpha}$ is defined by $P(S \le \sigma\,\xi_{1-\alpha}) = 1-\alpha$. Therefore
$$P(\theta\in\hat I_2) = P\big\{n^{1/2}(\hat\theta-\theta)/\hat\sigma \ge \hat\xi_{1-\alpha}\big\} = P\big\{T \ge \xi_{1-\alpha} + O_p(n^{-1})\big\}.$$

### Derivation, continued

By Taylor expansion, as before,
$$P\big\{T \le \xi_{1-\alpha} + O_p(n^{-1})\big\} = \Phi(\xi_{1-\alpha}) + n^{-1/2}\,Q_1(\xi_{1-\alpha})\,\phi(\xi_{1-\alpha}) + O(n^{-1}) = P(T \le \eta_{1-\alpha}) + (\xi_{1-\alpha}-\eta_{1-\alpha})\,\phi(z_{1-\alpha}) + O(n^{-1}).$$
Note too that
$$\xi_{1-\alpha} - \eta_{1-\alpha} = n^{-1/2}\big\{P_1^{\rm cf}(z_{1-\alpha}) - Q_1^{\rm cf}(z_{1-\alpha})\big\} + O(n^{-1}) = n^{-1/2}\big\{Q_1(z_{1-\alpha}) - P_1(z_{1-\alpha})\big\} + O(n^{-1}).$$
Recall that $P_1^{\rm cf} = -P_1$, $Q_1^{\rm cf} = -Q_1$, and that $P_1$ and $Q_1$ are both even polynomials.

### Derivation, continued

Hence, using evenness of $P_1$, $Q_1$ and $\phi$,
$$P\big\{T \le \xi_{1-\alpha} + O_p(n^{-1})\big\} = (1-\alpha) + n^{-1/2}\big\{Q_1(z_\alpha) - P_1(z_\alpha)\big\}\,\phi(z_\alpha) + O(n^{-1}).$$
Therefore
$$P(\theta\in\hat I_2) = P\big\{T \ge \xi_{1-\alpha} + O_p(n^{-1})\big\} = \alpha + n^{-1/2}\big\{P_1(z_\alpha) - Q_1(z_\alpha)\big\}\,\phi(z_\alpha) + O(n^{-1}).$$
It follows that the confidence interval $\hat I_2$ has coverage error of order $n^{-1}$ if and only if $P_1(z_\alpha) = Q_1(z_\alpha)$; that is, if and only if the skewness terms in Edgeworth expansions of the distributions of the Studentised and non-Studentised forms of the statistic are identical when evaluated at $z_\alpha$.

### Two-sided confidence intervals

Equal-tailed two-sided confidence intervals are usually obtained by combining two one-sided intervals. For example, if we are using the interval $\hat J_1(\alpha) = \big(-\infty,\ \hat\theta - n^{-1/2}\hat\sigma\,\hat\eta_{1-\alpha}\big]$, for which the nominal coverage is $\alpha$, we would generally construct from it the two-sided interval
$$\hat J_2 = \hat J_1\big\{\tfrac12(1+\alpha)\big\} \setminus \hat J_1\big\{\tfrac12(1-\alpha)\big\} = \big(\hat\theta - n^{-1/2}\hat\sigma\,\hat\eta_{(1+\alpha)/2},\ \ \hat\theta - n^{-1/2}\hat\sigma\,\hat\eta_{(1-\alpha)/2}\big].$$
The actual coverage of $\hat J_2$ equals $\alpha + O(n^{-1})$, and can be derived by arguments of the kind given above.

## METHODOLOGY AND THEORY FOR THE BOOTSTRAP: Seventh set of two lectures

Main topic of these lectures: bootstrap methods for nonparametric curve estimation.

### Pointwise versus simultaneous confidence regions

We shall dispose of this topic first, so that we can focus subsequently on other issues.
Suppose we have an estimator $\hat g$ of a function $g$ on an interval $\mathcal I$ and, for a given level $1-\alpha$ of probability, have constructed a confidence region, or tube, for $g$, consisting of a boundary above and a boundary below the curve represented by the formula $y = \hat g(x)$, for $x\in\mathcal I$.

### Pointwise versus simultaneous confidence regions, continued

The region can be interpreted as the union of intervals $(\hat g_1(x), \hat g_2(x))$, for $x\in\mathcal I$. Of course, $\hat g_1$ and $\hat g_2$ are constructed from data, and satisfy $\hat g_1 \le \hat g \le \hat g_2$. Such a region is commonly referred to as a $(1-\alpha)$-level confidence region for $g$ on the interval $\mathcal I$.

We can interpret the statement in two ways. Either (i) the interval $(\hat g_1(x), \hat g_2(x))$ covers $g(x)$ with probability approximately $1-\alpha$, for each $x\in\mathcal I$; or (ii) the probability that the graph represented by the equation $y = g(x)$ lies within the tube converges to $1-\alpha$ as $n$ increases.

### Pointwise versus simultaneous confidence regions, continued

Interpretations (i) and (ii) are generally referred to as "pointwise" and "simultaneous", respectively. In conventional parametric problems the pointwise interpretation seems generally to be favoured. For example, in regression we often wish to predict the value of $E(Y \mid X = x_0)$, denoting the regression mean, for only a small number of values of $x_0$.

However, taking the simultaneous interpretation causes no difficulty in the parametric case. Bootstrap methods are just as easily pressed into use there as in the pointwise context. In particular, in parametric problems both pointwise and simultaneous confidence regions are of width $n^{-1/2}$.

### Pointwise versus simultaneous confidence regions, continued

This close relationship vanishes in nonparametric cases, however. There, simultaneous confidence regions are generally an order of magnitude wider than their pointwise counterparts. Although the factor by which the width increases is proportional only to $(\log n)^{1/2}$ in asymptotic terms, the increase is generally substantial, and this alone causes simultaneous bands to be unpopular.
When coupled with the relative lack of interest in predicting the value of $E(Y\mid X=x_0)$ simultaneously for many values of $x_0$, this means that the pointwise interpretation is the obvious choice in at least the setting of nonparametric regression. We shall adopt it in the density estimation context too.

Our treatment of confidence regions in the setting of nonparametric curve estimation will address only the case of nonparametric density estimation. Nonparametric regression is broadly similar.

### Local estimation

Nonparametric curve estimators, usually estimators of densities or regression means, work without imposing structural assumptions. Typically the only conditions required are that the function in question have sufficiently many bounded derivatives; that is, it should be sufficiently smooth.

Since only smoothness is assumed, the value of the estimator at a particular point $x$, say, is based largely, if not wholly, on data values close to $x$. In density estimation, where we have a sample $X_1,\dots,X_n$ from the distribution with density $f$, this means that to estimate $f$ at $x$ we use only those $X_i$'s that are close to $x$. In regression, where estimators are based on data $(x_1,Y_1),\dots,(x_n,Y_n)$ and we wish to estimate $g(x) = E(Y\mid X = x)$, we use only those pairs $(x_i, Y_i)$ for which $x_i$ is close to $x$.

### Bandwidth, bias and variance

We shall treat estimators based on kernel methods. Here the extent of the local neighbourhood of $x$, within which we work, is defined by a bandwidth, which plays the role of a smoothing parameter. As the bandwidth increases, the number of data on which our estimator is based also grows, and so estimator variance decreases. However, as bandwidth increases, the size of the neighbourhood expands, and consequently the extent to which it accurately reflects the function at $x$ decreases. Therefore bias increases as bandwidth increases.

### Example: nonparametric density estimation

Suppose we sample independent and identically distributed data $\mathcal X = \{X_1,\dots,X_n\}$ from a distribution with density $f$. We wish to estimate this function.
Let $K$ be a bounded, compactly supported, symmetric probability density, let $h>0$ denote a bandwidth, and put
$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n K\!\Big(\frac{x-X_i}{h}\Big).$$
This is our estimator of $f(x)$. The estimator $\hat f$ is itself a density: it is non-negative, and it integrates to 1, since $K$ has both those properties.

### Reliance of f-hat on bandwidth

Note that
$$\psi_i(x) = h^{-1}K\!\Big(\frac{x-X_i}{h}\Big)$$
is itself a density, for each fixed $X_i$: $\psi_i \ge 0$ and $\int\psi_i = 1$. The density $\psi_i$ gets narrower and taller as $h$ decreases. Our estimator $\hat f$ is obtained by simply averaging the values of the $\psi_i$'s. Clearly, adjusting the bandwidth affects the shape, and hence the properties, of $\hat f$.

### Mean and variance of f-hat

Recall that we can write $\hat f = n^{-1}\sum_i \psi_i$. For fixed $x$, the random variables $\psi_i(x)$ are independent and identically distributed. Therefore
$$E\hat f(x) = E\psi(x) = \int h^{-1}K\!\Big(\frac{x-y}{h}\Big)\,f(y)\,dy = \int K(u)\,f(x-hu)\,du,$$
$$\mathrm{var}\,\hat f(x) = n^{-1}\,\mathrm{var}\,\psi(x) = n^{-1}\big\{E\psi(x)^2 - (E\psi(x))^2\big\}.$$
Exercise: prove from these results, and elementary calculus, that if $h = h(n) \to 0$ as $n\to\infty$, in such a manner that $nh\to\infty$, and if $f$ has two continuous derivatives in a neighbourhood of $x$, then
$$E\hat f(x) - f(x) = \tfrac12\,\kappa_2\,h^2 f''(x) + o(h^2), \qquad \mathrm{var}\,\hat f(x) = (nh)^{-1}\kappa\,f(x) + o\{(nh)^{-1}\},$$
where $\kappa = \int K^2$ and $\kappa_2 = \int u^2 K(u)\,du$.

The first result here implies that $\hat f(x)$ is asymptotically unbiased for $f(x)$: as $n$ increases, the difference between $E\hat f(x)$ and the quantity $f(x)$ that $\hat f(x)$ is estimating converges to zero. The second result implies that the variance of $\hat f(x)$ converges to zero as $n$ increases.

### Mean squared error

Therefore the mean squared error of $\hat f(x)$ is given by
$$E\{\hat f(x)-f(x)\}^2 = \mathrm{var}\,\hat f(x) + \{E\hat f(x)-f(x)\}^2 = C_1\,(nh)^{-1} + C_2\,h^4 + o\{(nh)^{-1} + h^4\},$$
where the constants $C_1 = \kappa f(x)$ and $C_2 = \tfrac14\,\kappa_2^2\,f''(x)^2$ depend on $x$. (Proof: use the results of the Exercise.) It follows that the optimal choice of $h$, for the purpose of minimising mean squared error, is of size $n^{-1/5}$. This order of magnitude of bandwidth brings the variance and squared-bias terms, of respective orders $(nh)^{-1}$ and $h^4$, into balance.

### Effect of bandwidth choice on bias

Note particularly that when we take $h = c\,n^{-1/5}$, where $c>0$ is a constant, the bias of $\hat f(x)$ is rendered of order $n^{-2/5}$:
$$E\hat f(x) - f(x) = \tfrac12\,\kappa_2\,h^2 f''(x) + o(h^2) = C(x)\,n^{-2/5} + o(n^{-2/5}),$$
where $C(x) = \tfrac12\,\kappa_2\,c^2 f''(x)$. This is an especially large order of magnitude for bias.
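The kernel estimator just defined can be coded compactly. A sketch, assuming the biweight kernel $K(u) = \tfrac{15}{16}(1-u^2)^2$ on $[-1,1]$ (function names are ours):

```python
import numpy as np

def biweight(u):
    # K(u) = (15/16)(1 - u^2)^2 on [-1, 1]: a bounded, compactly supported,
    # symmetric probability density.
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def kde(x_grid, data, h):
    # f_hat(x) = (nh)^{-1} sum_i K((x - X_i)/h), evaluated on a grid of x values
    x_grid = np.asarray(x_grid, dtype=float)[:, None]
    data = np.asarray(data, dtype=float)[None, :]
    return biweight((x_grid - data) / h).mean(axis=1) / h
```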
### Choice of kernel

Common choices of $K$ are the standard normal density,
$$K(u) = (2\pi)^{-1/2}\exp\big(-\tfrac12 u^2\big),$$
and the "$k$-weight" kernels,
$$K(u) = c_k\,(1-u^2)^k \quad\text{for } |u|\le 1,$$
where the integer $k \ge 1$, and $c_k$ is chosen so that $K$ integrates to 1. The case $k=2$ is popular; then $K$ is called the biweight kernel.

### High-order kernels

More generally, we can take $K$ to satisfy
$$\int_{-\infty}^{\infty} u^j K(u)\,du = \begin{cases} 1 & \text{if } j=0,\\ 0 & \text{if } 1\le j\le r-1,\\ \ne 0 & \text{if } j=r, \end{cases} \tag{1}$$
for a given integer $r \ge 1$. When using a kernel of this type, the variance of $\hat f$ remains of size $(nh)^{-1}$, but the order of bias changes from $h^2$ to $h^r$. Therefore, by choosing $r>2$ we can improve, at least in theory, the mean square performance of $\hat f$. In particular, the order of mean squared error can be reduced to $n^{-2r/(2r+1)}$, by choosing $h$ of size $n^{-1/(2r+1)}$.

Note, however, that if $r>2$ then, if (1) is to hold, $K$ must take negative values, and as a result $\hat f$ is no longer guaranteed to be nonnegative. We shall take $r=2$ in all the arguments below.

### Nonparametric and semiparametric problems

For the sake of definiteness, consider the case where $\hat f$ is constructed using a kernel that is a probability density, and employing a bandwidth that is optimal for that setting, i.e. of size $n^{-1/5}$. Then, as we have seen, the rate of mean square convergence of $\hat f$ to $f$ is $n^{-4/5}$:
$$E\{\hat f(x)-f(x)\}^2 = O(n^{-4/5}).$$
Moreover, this rate of convergence is optimal in a minimax sense for densities with two derivatives; it cannot be improved. This relatively slow rate of convergence, of smaller order than the rate $n^{-1}$ at which a sample mean converges to a population mean, characterises nonparametric density estimation as a nonparametric problem.

### Nonparametric and semiparametric problems, continued

The effective number of parameters being fitted, when constructing the density estimator $\hat f$ in a given interval, equals the number of bandwidths that can be fitted into the interval. Therefore the number of fitted parameters diverges at rate $n^{1/5}$ as sample size grows. In comparison, estimation of "global" characteristics, such as mean, variance and other moments, is a semiparametric problem: although in such cases estimation involves a potentially infinite number of unknowns, conventional convergence rates can be attained.

### Implications for the bootstrap

We have not so far encountered cases where the effective number of parameters grew unboundedly as sample size increased. The implications for the bootstrap are manifested in at least two ways: difficulties with bias, and a worsening of overall convergence rate, including the order of magnitude of coverage error.

Both these difficulties are manifested, to some extent, in more conventional parametric problems where the number of parameters is large, although fixed, as sample size increases. In such instances, difficulties with bias and accuracy arise frequently, although in a theoretical treatment they do not result in an actual deterioration of convergence rate.

### Bias in semiparametric problems

Recall that in semiparametric estimation problems the biases of estimators are of order $n^{-1}$. For example, the sample mean $\bar X$ is unbiased; the sample variance
$$\hat\sigma^2 = \frac1n\sum_{i=1}^n (X_i-\bar X)^2$$
has bias of order $n^{-1}$, i.e. $E\hat\sigma^2 = (1-n^{-1})\,\sigma^2$; and, more generally,
$$\mathrm{bias} = E(\hat\theta) - \theta = \frac{C}{n} + O(n^{-2}),$$
where $C$ denotes a constant.

### Bias in semiparametric problems, continued

Moreover, the bootstrap estimator of bias,
$$\widehat{\mathrm{bias}} = E(\hat\theta^* \mid \mathcal X) - \hat\theta,$$
accurately approximates bias. Indeed,
$$\widehat{\mathrm{bias}} = \mathrm{bias} + O_p(n^{-3/2}),$$
and this high degree of precision led us to suggest $\hat\theta - \widehat{\mathrm{bias}}$ as a bias-corrected estimator of $\theta$:
$$\hat\theta_{\rm bc} = \hat\theta - \widehat{\mathrm{bias}} = 2\hat\theta - E(\hat\theta^* \mid \mathcal X).$$

### Bias in nonparametric problems

Reflecting the infinite-parameter nature of nonparametric density estimation, the bootstrap fails rather spectacularly to approximate bias. To appreciate this point, let $\mathcal X^* = \{X_1^*,\dots,X_n^*\}$ denote a resample drawn by sampling randomly, with replacement, from $\mathcal X$. Then the standard bootstrap form of $\hat f$ is $\hat f^*$, defined by
$$\hat f^*(x) = \frac{1}{nh}\sum_{i=1}^n K\!\Big(\frac{x-X_i^*}{h}\Big).$$
Therefore $E\{\hat f^*(x)\mid\mathcal X\} = \hat f(x)$, implying that the bootstrap estimator of bias is
$$\widehat{\mathrm{bias}}(x) = E\{\hat f^*(x)\mid\mathcal X\} - \hat f(x) = 0.$$

### Bias in nonparametric problems, continued

The actual bias of $\hat f$ is of size $n^{-2/5}$, which of course is substantially larger than the order $n^{-1}$ of bias in semiparametric problems. Yet the bootstrap estimates bias as zero.
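The identity $E\{\hat f^*(x)\mid\mathcal X\} = \hat f(x)$ can be checked directly: conditional on the data, each $X_i^*$ takes each value $X_j$ with probability $1/n$, so the conditional expectation of $\hat f^*(x)$ is $\hat f(x)$ itself. A sketch, assuming the biweight kernel of the earlier example:

```python
import numpy as np

def kernel(u):
    # biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1]
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def f_hat(x, data, h):
    data = np.asarray(data, dtype=float)
    return kernel((x - data) / h).mean() / h

rng = np.random.default_rng(3)
data = rng.normal(size=30)
x0, h = 0.2, 0.5

# Conditional on the data, E{ h^{-1} K((x0 - X_i*)/h) | X }
#   = n^{-1} sum_j h^{-1} K((x0 - X_j)/h) = f_hat(x0),
# so the bootstrap bias estimate is exactly zero:
cond_mean = kernel((x0 - data) / h).mean() / h
assert abs(cond_mean - f_hat(x0, data, h)) < 1e-15
```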
One consequence of this property is that, if we use conventional bootstrap methods to construct a confidence interval for $f(x)$, we shall instead get a confidence region for $E\hat f(x)$.

Sometimes it is suggested that it is satisfactory to compute a confidence region for $E\hat f(x)$, rather than $f(x)$. I tend to disagree with this, believing that a working statistician is unlikely to be enthusiastic unless he or she has a good idea about the extent of bias, which is seldom the case. If we want a confidence region for $f(x)$, rather than for $E\hat f(x)$, we shall have to compensate for bias in some way, for example by subtracting an estimator of bias.

### Bias in nonparametric problems, continued

The bias problem that we observed for kernel estimators arises for any linear estimator of $f$. Suppose, for example, that
$$\hat f(x) = \sum_{i=1}^n a(x, X_i),$$
where $a$ is a function. In particular, in the case of kernel density estimators,
$$a(x, u) = \frac{1}{nh}\,K\!\Big(\frac{x-u}{h}\Big).$$
Then the argument given earlier implies, once more, that $E\{\hat f^*(x)\mid\mathcal X\} = \hat f(x)$, and hence that
$$\widehat{\mathrm{bias}}(x) = E\{\hat f^*(x)\mid\mathcal X\} - \hat f(x) = 0.$$

### Bias in nonparametric problems, continued

Many nonparametric density estimators are exactly linear; the rest, including spline-based estimators, are approximately linear, and so the difficulties with bias are similar, to first order, to those discussed above. For example, an orthogonal series density estimator is given by
$$\hat f(x) = \sum_{j=1}^{m} \hat\theta_j\,\psi_j(x), \tag{2}$$
where $m \ge 1$ is a smoothing parameter (the analogue of $h^{-1}$, in a sense), $\{\psi_j\}$ is a complete orthonormal sequence of functions, and
$$\hat\theta_j = \frac1n\sum_{i=1}^n \psi_j(X_i).$$
Here the estimator at (2) has the linear form $\hat f(x) = \sum_i a(x, X_i)$, provided we take
$$a(x, u) = \frac1n\sum_{j=1}^{m}\psi_j(u)\,\psi_j(x).$$

### Correcting for bias

It follows from these properties that, when using bootstrap methods in nonparametric problems, we must take care to compensate for bias. There are at least two ways of doing this: we can explicitly estimate bias and subtract it out, or we can undersmooth when computing $\hat f$, and reduce in that way the impact of bias. Because in nonparametric problems the bootstrap is not effective in accommodating bias, we cannot use the bootstrap to estimate bias; we must develop an alternative approach.
### Correcting for bias, continued

To appreciate how explicit bias correction might be implemented, return to the formula given earlier for $E\hat f(x)$, and use it to obtain an expression for bias:
$$\mathrm{bias}(x) = E\hat f(x) - f(x) = \tfrac12\,\kappa_2\,h^2 f''(x) + o(h^2). \tag{3}$$
Therefore, to first order, bias equals $\tfrac12\,\kappa_2\,h^2 f''(x)$. In this expression we know $\kappa_2$ and $h^2$; it remains only to estimate $f''(x)$. This we can do by twice differentiating the estimator $\hat f$:
$$\hat f''(x) = \frac{1}{n h_1^3}\sum_{i=1}^n K''\!\Big(\frac{x-X_i}{h_1}\Big). \tag{4}$$
We shall often wish to use a different bandwidth when estimating $f''$ compared to that used when estimating $f$; our use of $h_1$, rather than $h$, in (4) indicates this. Note, however, that this means that $\hat f''$ is not necessarily the second derivative of $\hat f$.

Observe, too, that the mean squared error of $\hat f''$, as an estimator of $f''$, is optimised by taking $h_1$ of size $n^{-1/9}$, rather than $h$ of size $n^{-1/5}$ as is the case when estimating $f$. (These results are for second-order kernels.) Now $n^{-1/9}$ is an order of magnitude larger than $n^{-1/5}$, and so the bandwidth $h_1$ employed for estimating $f''$ will generally have to be of larger order than the bandwidth $h$ that we use for estimating $f$.

### Correcting for bias, continued

Once we have computed an estimator $\hat f''$ of $f''$, we can construct an estimator of bias by substituting $\hat f''$ for $f''$ in formula (3) given earlier. In particular, our bias estimator is
$$\widehat{\mathrm{bias}}(x) = \tfrac12\,\kappa_2\,h^2\,\hat f''(x).$$
Given a confidence region for $E\hat f(x)$, for example a region $(\hat y_1(x),\,\hat y_2(x))$, where $\hat y_1(x)$ and $\hat y_2(x)$ are found using conventional bootstrap methods and $\alpha$ denotes the nominal coverage probability, we can recentre it by subtracting our estimator of bias, obtaining a confidence region for $f(x)$:
$$\big(\hat y_1(x) - \widehat{\mathrm{bias}}(x),\ \ \hat y_2(x) - \widehat{\mathrm{bias}}(x)\big).$$

### Bias correction by undersmoothing

We undersmooth if we use a bandwidth that is small relative to the one that would be employed to optimise the mean square performance of $\hat f(x)$ as an estimator of $f(x)$. To see how this might be done, return to the formulae given earlier, in an exercise, for the mean and variance of $\hat f(x)$.
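Before turning to undersmoothing, the explicit correction (3)-(4) can be sketched in code. We assume the biweight kernel, for which $K''(u) = \tfrac{15}{16}(12u^2-4)$ on $[-1,1]$ and $\kappa_2 = \int u^2K(u)\,du = 1/7$; all function names are ours:

```python
import numpy as np

KAPPA2 = 1.0 / 7.0                    # int u^2 K(u) du for the biweight kernel

def kpp(u):
    # K''(u) = (15/16)(12u^2 - 4) on [-1, 1], for the biweight kernel
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (12.0 * u ** 2 - 4.0), 0.0)

def fpp_hat(x, data, h1):
    # f''_hat(x) = (n h1^3)^{-1} sum_i K''((x - X_i)/h1), formula (4)
    data = np.asarray(data, dtype=float)
    return kpp((x - data) / h1).mean() / h1 ** 3

def bias_hat(x, data, h, h1):
    # bias_hat(x) = (1/2) kappa2 h^2 f''_hat(x), the plug-in version of (3)
    return 0.5 * KAPPA2 * h ** 2 * fpp_hat(x, data, h1)

def recentre(y1, y2, x, data, h, h1):
    # Shift a bootstrap region (y1, y2) for E f_hat(x) to one for f(x)
    b = bias_hat(x, data, h, h1)
    return y1 - b, y2 - b
```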
Those formulae were:
$$E\hat f(x) - f(x) = \tfrac12\,\kappa_2\,h^2 f''(x) + o(h^2), \qquad \mathrm{var}\,\hat f(x) = (nh)^{-1}\kappa f(x) + o\{(nh)^{-1}\}.$$

### Bias correction by undersmoothing, continued

The estimator $\hat f(x)$ is asymptotically normally distributed, and so these formulae imply that
$$\hat f(x) - f(x) = (nh)^{-1/2}\,a(x)\,N(x) + h^2\,b(x) + o_p\{(nh)^{-1/2} + h^2\},$$
where $N(x)$ is an asymptotically standard normal random variable, and
$$a(x) = \{\kappa\,f(x)\}^{1/2}, \qquad b(x) = \tfrac12\,\kappa_2\,f''(x).$$
By taking $h$ to be of strictly smaller order than $n^{-1/5}$, we ensure that the term $h^2\,b(x)$, which represents the major part of the bias of $\hat f(x)$, is negligible relative to the term $(nh)^{-1/2}\,a(x)\,N(x)$, which contributes the stochastic error.
### Bias correction by undersmoothing, continued

A drawback to undersmoothing is that it makes the confidence region wider. To appreciate why, note that the width of the region is determined primarily by the standard deviation of the estimator $\hat f(x)$. This is true in the more familiar semiparametric problems, and it holds true too in the nonparametric case. This means that the width of the region is asymptotically proportional to $(nh)^{-1/2}$.

Indeed, if we are constructing a two-sided, $\alpha$-level bootstrap confidence region for $f(x)$ then, bearing in mind that
$$\hat f(x) - E\hat f(x) = (nh)^{-1/2}\,a(x)\,N(x) + o_p\{(nh)^{-1/2}\},$$
where $N(x)$ is asymptotically normal $N(0,1)$ and $a(x) = \{\kappa f(x)\}^{1/2}$, the asymptotic width of the interval will be that of the corresponding normal-approximation interval based on the distribution of $(nh)^{-1/2}\,a(x)\,N(x)$. The width of the latter interval equals
$$2\,(nh)^{-1/2}\,a(x)\,z_{(1+\alpha)/2},$$
where $z_\beta$ denotes the solution of the equation $\Phi(z_\beta) = \beta$.

### Bias correction by undersmoothing, continued

Therefore the width of the bootstrap confidence region is asymptotically proportional to $(nh)^{-1/2}$. If we undersmooth then we decrease $h$, and so the width of the confidence region constructed by undersmoothing will increase: it will be wider than if we did not undersmooth.

Nevertheless, arguments based on high-order asymptotics can be used to prove that confidence regions constructed by undersmoothing can be made more accurate, in terms of coverage, than regions constructed by explicit bias correction. In these calculations it is assumed that the bandwidths, for explicit bias-corrected confidence regions and for the undersmoothed confidence regions, are chosen to minimise coverage error.

### Constructing basic bootstrap confidence regions for f(x)

The preceding arguments address bias adjustment, but do not give the basic construction of confidence regions. We shall discuss that next. There is a wide variety of options, especially in the setting of nonparametric density estimation. We shall restrict attention to a relatively conventional percentile-t approach, and to a non-standard percentile method which exploits special properties of density estimators.

### Percentile-t method

It can be proved that
$$\mathrm{var}\,\hat f(x) = n^{-1}\big\{\mu_2(x) - \mu_1(x)^2\big\}, \tag{5}$$
where
$$\mu_j(x) = E\Big[\Big\{h^{-1}K\Big(\frac{x-X}{h}\Big)\Big\}^j\Big] = h^{1-j}\int K(u)^j\,f(x-hu)\,du.$$
Analogously to (5),
$$\mathrm{var}\{\hat f^*(x)\mid\mathcal X\} = n^{-1}\big\{\hat f_2(x) - \hat f(x)^2\big\}, \tag{6}$$
where we put
$$\hat f_j(x) = \frac1n\sum_{i=1}^n \Big\{h^{-1} K\Big(\frac{x-X_i}{h}\Big)\Big\}^j,$$
so that $\hat f_1 = \hat f$. Exercise: derive (5) and (6).

### Percentile-t method, continued

Define, too,
$$\hat f_j^*(x) = \frac1n\sum_{i=1}^n \Big\{h^{-1} K\Big(\frac{x-X_i^*}{h}\Big)\Big\}^j.$$
The variance property (6) suggests that we should take as our Studentised statistic the ratio
$$T = \frac{\hat f(x) - E\hat f(x)}{\big[n^{-1}\{\hat f_2(x) - \hat f(x)^2\}\big]^{1/2}},$$
and, as its bootstrap form,
$$T^* = \frac{\hat f^*(x) - \hat f(x)}{\big[n^{-1}\{\hat f_2^*(x) - \hat f^*(x)^2\}\big]^{1/2}}.$$
Note that our definitions of $T$ and $T^*$ are motivated by the exact formula for variance at (6), rather than by the common asymptotic approximation $\mathrm{var}\,\hat f(x) \approx (nh)^{-1}\kappa f(x)$.

### Percentile-t method, continued

We take as our estimator of the distribution of $T$ the distribution of $T^*$, conditional on the data. In particular, using standard methods based on Monte Carlo simulation, we compute a numerical approximation to the solution $z = \hat z_\beta$, say, of the equation
$$P(T^* \le \hat z_\beta \mid \mathcal X) = \beta,$$
and take a two-sided percentile-t bootstrap confidence interval for $\mu(x) = E\hat f(x)$ to be the interval
$$\big(\hat f(x) - \hat z_{(1+\alpha)/2}\,\hat s(x),\ \ \hat f(x) - \hat z_{(1-\alpha)/2}\,\hat s(x)\big),$$
where $\hat s(x)^2 = n^{-1}\{\hat f_2(x) - \hat f(x)^2\}$. The latter interval can be explicitly corrected for bias, or the bandwidth $h$ can be chosen small, so as to minimise the impact of bias.
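A sketch of the percentile-t construction for $\mu(x) = E\hat f(x)$, using the exact conditional variance formula (6); the biweight kernel is assumed, and all names are ours:

```python
import numpy as np

def kernel(u):
    # biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1]
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def density_interval(x0, data, h, alpha=0.90, B=999, seed=4):
    """Two-sided percentile-t interval for E f_hat(x0), following (5)-(6)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)

    def fj(sample, j):
        # f_hat_j(x0) = n^{-1} sum_i { h^{-1} K((x0 - X_i)/h) }^j
        return np.mean((kernel((x0 - sample) / h) / h) ** j)

    f1 = fj(data, 1)                                   # f_hat(x0)
    s = np.sqrt((fj(data, 2) - f1 ** 2) / n)           # s_hat(x0), from (6)
    T_star = np.empty(B)
    for b in range(B):
        xs = rng.choice(data, size=n, replace=True)
        g1, g2 = fj(xs, 1), fj(xs, 2)
        T_star[b] = (g1 - f1) / np.sqrt((g2 - g1 ** 2) / n)
    lo_q, hi_q = np.quantile(T_star, [(1 - alpha) / 2, (1 + alpha) / 2])
    return f1 - hi_q * s, f1 - lo_q * s
```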
### Percentile method

Since $\mathrm{var}\,\hat f(x)$ is approximately proportional to $f(x)$, the square-root transformation is approximately variance-stabilising. Therefore the asymptotic distribution of
$$S = (nh)^{1/2}\big\{\hat f(x)^{1/2} - (E\hat f(x))^{1/2}\big\}$$
does not depend on unknowns. Exercise: prove this result, and find the asymptotic distribution.

### Percentile method, continued

We take the distribution of
$$S^* = (nh)^{1/2}\big\{\hat f^*(x)^{1/2} - \hat f(x)^{1/2}\big\},$$
conditional on the data, to be our estimator of the distribution of $S$. Therefore we compute a numerical approximation to the solution $y = \hat y_\beta$, say, of the equation
$$P(S^* \le \hat y_\beta \mid \mathcal X) = \beta,$$
and take a two-sided percentile-method bootstrap confidence interval for $\mu(x) = E\hat f(x)$ to be the interval
$$\Big(\big\{\hat f(x)^{1/2} - (nh)^{-1/2}\,\hat y_{(1+\alpha)/2}\big\}^2,\ \ \big\{\hat f(x)^{1/2} - (nh)^{-1/2}\,\hat y_{(1-\alpha)/2}\big\}^2\Big).$$
Again, explicit bias correction or undersmoothing can be used to convert this into a confidence region for $f(x)$.

## METHODOLOGY AND THEORY FOR THE BOOTSTRAP: Sixth set of two lectures

Main topic of these lectures: bootstrap methods in linear regression.

### Regression model

Assume we observe pairs $(x_1,Y_1),\dots,(x_n,Y_n)$ generated by the model
$$Y_i = g(x_i) + \epsilon_i, \qquad 1\le i\le n,$$
where $g$ is a function that might be determined either parametrically or nonparametrically, and the errors $\epsilon_i$ have zero mean.

### Regression model, continued

In the study of regression we take the explanatory variables $x_i$ to be fixed, either because they are pre-determined (e.g. were regularly spaced) or because they are conditioned upon. In this case the only source of randomness in the model is the errors $\epsilon_i$, and so it is those that we resample, in the form of residuals, when implementing the bootstrap. Our choice of lower-case notation for the explanatory variables reflects this view.

### Correlation model

Alternatively, we might take the view that the explanatory variables are genuinely random, and must be treated as such. For example, the data pairs $(X_i,Y_i)$, for $1\le i\le n$, might be drawn by sampling randomly from a bivariate distribution, and in our analysis of those data we might wish to preserve all the implications of this randomness, rather than condition some of it away by regarding the $X_i$'s as fixed. This approach to analysis might be termed the study of correlation, rather than regression. It would be addressed in bootstrap terms by resampling the pairs $(X_i,Y_i)$, rather than resampling the residuals.
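A sketch of residual resampling in the simple linear regression model $Y_i = c + d\,x_i + \epsilon_i$ (our own minimal illustration of the scheme just described; the design points stay fixed):

```python
import numpy as np

def fit_slope(x, y):
    # Least-squares intercept c and slope d for y = c + d*x + error.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    c = y.mean() - d * x.mean()
    return c, d

def residual_bootstrap(x, y, B=1000, seed=5):
    """Resample centred residuals, keeping the design points x fixed."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    c, d = fit_slope(x, y)
    res = y - (c + d * x)
    res = res - res.mean()            # centre the residuals
    d_star = np.empty(B)
    for b in range(B):
        y_star = c + d * x + rng.choice(res, size=len(x), replace=True)
        d_star[b] = fit_slope(x, y_star)[1]
    return d, d_star
```

The empirical distribution of `d_star - d` then estimates the distribution of $\hat d - d$, from which percentile or percentile-t intervals for the slope can be read off.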
resampling the residuals.

It is important to appreciate that these two different approaches to resampling (sampling the residuals, or sampling the pairs $(X_i, Y_i)$, respectively) are appropriate in different settings, for different models. They are not alternative ways of doing the same thing, and they can lead to different conclusions.

Parametric regression

The good properties of percentile-t methods carry over to regression problems. However, in the setting of slope estimation those properties are significantly enhanced, and even the standard percentile method can perform unusually well.

For example, one-sided percentile-t confidence regions for slope have coverage error $O(n^{-3/2})$, not $O(n^{-1})$, and the error is only $O(n^{-2})$ in the case of two-sided intervals. One-sided standard percentile-method confidence intervals for slope, based on approximating the distribution of $\hat d - d$ by the conditional distribution of $\hat d^* - \hat d$, have coverage error $O(n^{-1})$, rather than the usual $O(n^{-1/2})$.

General definition of slope

Although these exceptional coverage properties apply only to estimates of slope, not to estimates of intercept parameters or means, slope may be interpreted very generally. For example, in the polynomial regression model where we observe
$$Y_i = c + x_i d_1 + \dots + x_i^m d_m + \epsilon_i , \qquad 1 \le i \le n ,$$
we regard each $d_j$ as a slope parameter. A one-sided percentile-t interval for $d_j$ has coverage error $O(n^{-3/2})$, although a one-sided percentile-t interval for $c$, or for
$$E(Y \mid x = x_0) = c + x_0 d_1 + \dots + x_0^m d_m ,$$
has coverage error of size $n^{-1}$.

Why is slope favoured especially?

The reason for good performance in the case of slope parameters is the extra symmetry conferred by design points. Note that in the polynomial regression case we may write the model equivalently as
$$Y_i = c' + (x_i - \bar x_1)\,d_1 + \dots + (x_i^m - \bar x_m)\,d_m + \epsilon_i ,$$
where $\bar x_j = n^{-1}\sum_i x_i^j$ and $c' = c + \bar x_1 d_1 + \dots + \bar x_m d_m$. The extra symmetry arises from the fact that
$$\sum_{i=1}^n (x_i^j - \bar x_j) = 0$$
for $1 \le j \le m$.

Correlation model

The results implying good coverage accuracy hold under the regression model, but not necessarily for the correlation model. For example, in the case of the linear correlation model, the symmetry discussed above will persist provided that
$$E\{(X_i - EX_i)^k\,\epsilon_i\} = 0 \qquad(1)$$
for sufficiently large $k$. Now $\epsilon = Y - g(X)$, and as a result (1) will generally not hold for $k \ge 1$. This means that under the correlation model the conventional properties of bootstrap confidence intervals hold; the special properties noted for regression, when estimating slope parameters, are not valid.

However, if the errors $\epsilon_i$ are independent of the explanatory variables $X_i$, then (1) will hold for each $k$, and in such cases the enhanced features of the regression problem persist under the correlation model.

Simple linear regression

Consider the regression model
$$Y_i = c + x_i d + \epsilon_i ,$$
where the $\epsilon_i$'s are independent and identically distributed with zero mean and finite variance $\sigma^2$. Define $\bar x = n^{-1}\sum_i x_i$,
$$\sigma_x^2 = n^{-1}\sum_{i=1}^n (x_i - \bar x)^2 , \qquad \hat d = (n\sigma_x^2)^{-1}\sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y),$$
and $\hat c = \bar Y - \bar x\,\hat d$. Estimate $y_0 = E(Y\mid x = x_0) = c + x_0 d$ by $\hat y_0 = \hat c + x_0\hat d$, and put
$$a_y^2 = 1 + \sigma_x^{-2}(x_0 - \bar x)^2, \qquad \hat\epsilon_i = Y_i - \hat c - x_i\hat d, \qquad \hat\sigma^2 = n^{-1}\sum_i \hat\epsilon_i^2 ,$$
the latter estimating $\sigma^2$. The asymptotic variances of $\hat d$ and $\hat y_0$ equal $\sigma^2(n\sigma_x^2)^{-1}$ and $\sigma^2 a_y^2\,n^{-1}$, respectively, and so $n^{1/2}(\hat d - d)\sigma_x/\sigma$ and $n^{1/2}(\hat y_0 - y_0)/(\sigma a_y)$ are asymptotically pivotal.

Bootstrapping the simple linear regression model

The residuals $\hat\epsilon_i$ are centred, in that $\sum_i \hat\epsilon_i = 0$. Therefore we may resample randomly, with replacement, from $\hat\epsilon_1, \dots, \hat\epsilon_n$, obtaining $\epsilon_1^*, \dots, \epsilon_n^*$, say, and take as our bootstrap resample the pairs $(x_1, Y_1^*), \dots, (x_n, Y_n^*)$, where $Y_i^* = \hat c + x_i\hat d + \epsilon_i^*$. Note particularly that, reflecting the fact that we condition upon the explanatory variables, those quantities are the same as in the original dataset.

In regression problems where the residuals are not centred, for example in nonparametric regression, we generally centre them, for example by subtracting their average value, before resampling.

Estimating quantiles of the distributions of S and T

Let $\hat c^*$, $\hat d^*$ and $\hat\sigma^*$ have the same formulae as $\hat c$, $\hat d$ and $\hat\sigma$, respectively, except that we replace $Y_i$ by $Y_i^*$ throughout. The bootstrap versions of
$$S = n^{1/2}(\hat d - d)\,\sigma_x/\sigma , \qquad T = n^{1/2}(\hat d - d)\,\sigma_x/\hat\sigma$$
are
$$S^* = n^{1/2}(\hat d^* - \hat d)\,\sigma_x/\hat\sigma , \qquad T^* = n^{1/2}(\hat d^* - \hat d)\,\sigma_x/\hat\sigma^* ,$$
respectively. We estimate the quantiles $\xi_\alpha$ and $\eta_\alpha$ of the distributions of $S$ and $T$ by $\hat\xi_\alpha$ and $\hat\eta_\alpha$, respectively, where
$$P(S^* \le \hat\xi_\alpha \mid X) = \alpha , \qquad P(T^* \le \hat\eta_\alpha \mid X) = \alpha ,$$
and $X = \{(x_1, Y_1), \dots, (x_n, Y_n)\}$ denotes the
dataset.

Bootstrap confidence intervals for d

One-sided bootstrap confidence intervals for $d$, with nominal coverage $\alpha$, are given by
$$\tilde I_1 = \big({-\infty},\ \hat d - n^{-1/2}\sigma\sigma_x^{-1}\hat\xi_{1-\alpha}\big), \quad \hat J_1 = \big({-\infty},\ \hat d - n^{-1/2}\hat\sigma\sigma_x^{-1}\hat\eta_{1-\alpha}\big), \quad \hat I_1 = \big({-\infty},\ \hat d - n^{-1/2}\hat\sigma\sigma_x^{-1}\hat\xi_{1-\alpha}\big).$$
These are direct analogues of the intervals $\tilde I_1$, $\hat I_1$ and $\hat J_1$ introduced earlier in non-regression problems. In particular, $\hat I_1$ and $\hat J_1$ are standard percentile-method and percentile-t bootstrap confidence regions.

Following the line of argument given earlier, we would expect $\tilde I_1$ and $\hat J_1$ to have coverage error $O(n^{-1})$. In fact they both have coverage error equal to $O(n^{-3/2})$. However, $\tilde I_1$ is not of practical use, since it depends on the unknown $\sigma$, so we shall not treat it any further. Likewise, we would expect $\hat I_1$ to have coverage error of size $n^{-1/2}$. However, we shall show that the error is actually of order $n^{-1}$.

Bootstrap confidence intervals for d (cont.)

Note that although $\hat I_1$ involves the variance estimator $\hat\sigma$, it can be constructed numerically without resorting to computing $\hat\sigma$. Indeed, $\hat I_1$ is identical to the interval
$$\big({-\infty},\ \hat d - \hat w_{1-\alpha}\big),$$
where $\hat w_{1-\alpha}$ is the standard percentile-method estimator of $w_{1-\alpha}$, the latter defined by
$$P(\hat d - d \le w_{1-\alpha}) = 1 - \alpha .$$
In particular, $\hat w_{1-\alpha}$ is defined by
$$P(\hat d^* - \hat d \le \hat w_{1-\alpha} \mid X) = 1 - \alpha .$$
The interval $\hat J_1$ is a standard percentile-t bootstrap confidence interval.

Polynomials in Edgeworth expansions

Edgeworth expansions for the non-Studentised and Studentised statistics $S$ and $T$, respectively, are given by
$$P(S \le u) = \Phi(u) + n^{-1/2}P_1(u)\phi(u) + n^{-1}P_2(u)\phi(u) + \dots ,$$
$$P(T \le u) = \Phi(u) + n^{-1/2}Q_1(u)\phi(u) + n^{-1}Q_2(u)\phi(u) + \dots ,$$
where
$$P_1(u) = Q_1(u) = \tfrac16\,\gamma\gamma_x\,(1 - u^2), \qquad \gamma = E(\epsilon/\sigma)^3 , \quad \gamma_x = n^{-1}\sum_{i=1}^n \{(x_i - \bar x)/\sigma_x\}^3 ,$$
and $P_2$ and $Q_2$ are odd quintic polynomials, whose coefficients involve $\gamma$, $\gamma_x$ and the standardised kurtosis $\kappa = E(\epsilon/\sigma)^4 - 3$. Note particularly that $P_1 = Q_1$.

Why does P1 equal Q1?

To understand why, it is helpful to treat $S$ as an approximation to $T$. Indeed, note that by definition of $\hat\sigma^2$,
$$\hat\sigma^2 = n^{-1}\sum_{i=1}^n \hat\epsilon_i^2 = n^{-1}\sum_{i=1}^n \big\{\epsilon_i - (\hat c - c) - x_i(\hat d - d)\big\}^2 = n^{-1}\sum_{i=1}^n \epsilon_i^2 + O_p(n^{-1}) = \sigma^2(1 + 2\Delta) + O_p(n^{-1}),$$
where
$$\Delta = (2n\sigma^2)^{-1}\sum_{i=1}^n (\epsilon_i^2 - \sigma^2).$$
Recalling that $S = n^{1/2}(\hat d - d)\sigma_x/\sigma$ and $T = n^{1/2}(\hat d - d)\sigma_x/\hat\sigma$, we deduce that
$$T = S(1 - \Delta) + O_p(n^{-1}). \qquad(1)$$

Why does P1 equal Q1? (cont. 1)

Making use of the approximation (1), the symmetry property
$$\sum_{i=1}^n (x_i - \bar x) = 0 , \qquad(2)$$
and the representation
$$S = n^{-1/2}(\sigma\sigma_x)^{-1}\sum_{i=1}^n (x_i - \bar x)\,\epsilon_i ,$$
it is readily proved that
$$E\{S(1-\Delta)\}^j = E(S^j) + O(n^{-1}) \qquad(3)$$
for $j = 1, 2, 3$.

Exercise: Derive (3) for $j = 1, 2, 3$.

Why does P1 equal Q1? (cont. 2)

Therefore the first three cumulants of $S$ and $S(1-\Delta)$ agree, up to and including terms of order $n^{-1/2}$. It follows that Edgeworth expansions of the distributions of $S$ and $S(1-\Delta)$ differ only in terms of order $n^{-1}$. In view of (1), the same is true of the distributions of $S$ and $T$:
$$P(S \le u) = P(T \le u) + O(n^{-1}).$$
Therefore the $n^{-1/2}$ terms in the expansions must be identical; that is, $P_1 = Q_1$. The chief ingredient in this argument is the symmetry property (2). It, and its analogues, guarantee that in the problem of slope estimation for general regression problems, $P_1 = Q_1$.

Consequences of the property P1 = Q1

The identity $P_1 = Q_1$ implies that, to first order (i.e. up to and including terms of order $n^{-1/2}$), estimating the distribution of $S$ is the same as estimating the distribution of $T$. As we saw earlier, in non-regression problems the percentile method estimates the distribution of $S$, whereas the percentile-t method estimates the distribution of $T$. The fact that, in the setting of estimating slope in regression, $P_1 = Q_1$, means that these two techniques give the same results up to and including terms of order $n^{-1/2}$. They differ only in terms of order $n^{-1}$, and terms of higher order. Therefore, since one-sided confidence intervals based on the percentile-t method have coverage error equal to $O(n^{-1})$, the same must be true for confidence intervals based on the percentile method.

Properties of percentile-t confidence regions

The coverage error of a one-sided percentile-t confidence interval for $d$ is of order $n^{-3/2}$, rather than the usual $n^{-1}$. That is, with $\hat J_1$ defined by
$$\hat J_1 = \big({-\infty},\ \hat d - n^{-1/2}\hat\sigma\sigma_x^{-1}\hat\eta_{1-\alpha}\big),$$
it can be shown that
$$P(d \in \hat J_1) = \alpha + O(n^{-3/2}). \qquad(4)$$
Since terms of odd order in $n^{-1/2}$ cancel from formulae for coverage error of two-sided confidence intervals, the two-sided percentile-t bootstrap confidence interval for $d$ has coverage error of order $n^{-2}$, rather than the usual $n^{-1}$.

Derivation of (4)

A proof of (4) can be given as follows. Recall that
$$P(T \le u) = \Phi(u) + n^{-1/2}Q_1(u)\phi(u) + n^{-1}Q_2(u)\phi(u) + \dots ,$$
where $Q_1(u) = \tfrac16\gamma\gamma_x(1 - u^2)$, $\gamma = E(\epsilon/\sigma)^3$ and $\gamma_x = n^{-1}\sum_i \{(x_i - \bar x)/\sigma_x\}^3$.

Derivation of (4) (cont. 1)

The bootstrap version of the expansion is
$$P(T^* \le u \mid X) = \Phi(u) + n^{-1/2}\hat Q_1(u)\phi(u) + n^{-1}\hat Q_2(u)\phi(u) + \dots ,$$
where $\hat Q_1(u) = \tfrac16\hat\gamma\gamma_x(1 - u^2)$ and $\hat\gamma = n^{-1}\sum_i (\hat\epsilon_i/\hat\sigma)^3$. Now the solutions $\eta_\alpha$ and $\hat\eta_\alpha$ of the respective equations admit the
by the conditional distribution of 3 d The other percentile method is based on us ing the conditional distribution of a to approx imate the distribution of 3 It leads to the in terval R1 ooci nil2 aax 7 ooia where Q is an approximation to Ca these two quantities being defined by PltJEaixa Macaw However R has coverage error of size 71 12 not 71 In particularly R does not enjoy the accuracy of the percentile method interval 12 This is a consequence of it addressing the wrong tail of the distribution of J 25 Properties of confidence intervals for the conditional mean yo EY13 x0 and the intercept 6 Recall that yo c I xo d which in turn equals 0 when x0 0 Therefore we can treat confi dence intervals for c as a special case of those for yo Recalling that go 2 5 x0 cl redefine s n go gigMy T 7112 Q0 yoa 0y and taking g 5 x0 3 redefine n12g5 Q05 ay 7112 g5 goaay 3 T 26 Properties of confidence intervals for yo and 0 cont Percentile method and percentile t confidence intervals for yo and c are given respectively by oo go 71 12 waggt51 oo go 71 12 waggt771 112 fl vvhere a 1 02 130 i2 and we define 1 a and 771 77 by PSS alxa7 PTS alxa7 for the new versions of 3 and T The intervals in and J1 have coverage errors 0n12 and 001 1 respectively These re sults unlike those for slope are conventional 27 METHODOLOGY AND THEORY FOR THE BOOTSTRAP Third set of two lectures Main tODiC Of these lectures Edgeworth expansions Moments and cumulants Let X be a random variable Write Xt EeitX for the associated characteristic func tion and let Hj denote the jth cumulant of X ie the coefficient of NVj in an expansion of log X03 X03 2 expH1it 2it2 I j 1sjitj I The jth moment W EXj of X is the coefficient of it j in an expansion of X03 X00 2 1mt u2it 71mjz39t3 Expressing cumulants in terms of mo ments and vice versa Comparing these expansions we deduce that H1 M1 H2M2 MV3FX m3 u3 3u2m2iamp EX EX3 H4M4 4M3M1 3M BrentGM EX EX4 3varX2 In particular Hj is a homogeneous polynomial in moments of degree j 
Likewise, $\mu_j$ is a homogeneous polynomial in cumulants, of degree $j$. The third and fourth cumulants, $\kappa_3$ and $\kappa_4$, are referred to as skewness and kurtosis, respectively.

Exercise: Express $\mu_j$ in terms of $\kappa_1, \dots, \kappa_j$, for $j = 1, \dots, 4$. Prove that, for $j \ge 2$, $\kappa_j$ is invariant under translations of $X$.

Sums of independent random variables

Let us assume $\mu_1 = 0$ and $\kappa_2 = 1$. This is equivalent to working with the normalised random variable $Y = (X - \mu_1)/\kappa_2^{1/2}$ instead of $X$, although we shall continue to use the notation $X$ rather than $Y$. Let $X_1, X_2, \dots$ be independent and identically distributed as $X$, and put
$$S_n = n^{-1/2}\sum_{i=1}^n X_i .$$
The characteristic function of $S_n$ is
$$\chi_n(t) = E\exp(itS_n) = E\exp(itX_1 n^{-1/2}) \times \dots \times E\exp(itX_n n^{-1/2}) = \big\{\chi(tn^{-1/2})\big\}^n .$$

Sums of independent random variables (continued)

Therefore, since $\kappa_1 = 0$ and $\kappa_2 = 1$,
$$\chi_n(t) = \big\{\chi(tn^{-1/2})\big\}^n = \exp\Big\{{-\tfrac12 t^2} + n^{-1/2}\tfrac1{3!}\kappa_3(it)^3 + \dots + n^{-(j-2)/2}\tfrac1{j!}\kappa_j(it)^j + \dots\Big\}.$$
Now expand the exponent:
$$\chi_n(t) = e^{-t^2/2}\big\{1 + n^{-1/2}r_1(it) + \dots + n^{-j/2}r_j(it) + \dots\big\},$$
where $r_j$ denotes a polynomial with real coefficients, of degree $3j$, having the same parity as its index, its coefficients depending on $\kappa_3, \dots, \kappa_{j+2}$ but not on $n$. In particular,
$$r_1(u) = \tfrac16\kappa_3 u^3 , \qquad r_2(u) = \tfrac1{24}\kappa_4 u^4 + \tfrac1{72}\kappa_3^2 u^6 .$$
Exercise: Prove this result, and the parity property of $r_j$.

Expansion of distribution function

Rewrite the expansion as
$$\chi_n(t) = e^{-t^2/2} + n^{-1/2}r_1(it)\,e^{-t^2/2} + \dots + n^{-j/2}r_j(it)\,e^{-t^2/2} + \dots .$$
Note that
$$\chi_n(t) = \int_{-\infty}^{\infty} e^{itx}\,dP(S_n \le x), \qquad e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\,d\Phi(x),$$
where $\Phi$ denotes the standard Normal distribution function. Therefore the expansion of $\chi_n(t)$ strongly suggests an inverse expansion
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + \dots ,$$
where
$$\int_{-\infty}^{\infty} e^{itx}\,dR_j(x) = r_j(it)\,e^{-t^2/2} .$$

Finding a formula for R_j

Integration by parts gives, for each $j \ge 1$,
$$(it)^j\,e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\,d\big\{(-D)^j\Phi(x)\big\},$$
where $D$ denotes the differential operator $d/dx$. Interpreting $r_j(-D)$ as the obvious polynomial in $D$, we deduce that
$$r_j(it)\,e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\,d\big\{r_j(-D)\,\Phi(x)\big\}.$$
Therefore, by the uniqueness of Fourier transforms,
$$R_j(x) = r_j(-D)\,\Phi(x).$$

Hermite polynomials

The Hermite polynomials
$$He_0(x) = 1, \quad He_1(x) = x, \quad He_2(x) = x^2 - 1, \quad He_3(x) = x^3 - 3x,$$
$$He_4(x) = x^4 - 6x^2 + 3, \quad He_5(x) = x^5 - 10x^3 + 15x, \quad \dots$$
are orthogonal with respect to the standard Normal density $\phi = \Phi'$, are normalised so that the coefficient of the term of highest degree is 1, and have the same parity as their index. Note too that $He_j$ is of precise degree $j$. Most importantly, from our viewpoint,
$$(-D)^j\,\phi(x) = He_j(x)\,\phi(x).$$

Formula for R_j

Therefore, if $r_j(u) = c_1 u + \dots + c_{3j}u^{3j}$, then
$$R_j(x) = r_j(-D)\,\Phi(x) = -\big\{c_1 He_0(x) + \dots + c_{3j}He_{3j-1}(x)\big\}\,\phi(x).$$
It follows that we may write $R_j(x) = P_j(x)\,\phi(x)$, where $P_j$ is a polynomial. Since $r_j$ is of degree $3j$ and has the same parity as its index, and $He_j$ is of degree $j$ and has the same parity as its index, then $P_j$ is of degree $3j - 1$ and has opposite parity to its index. Its coefficients depend on moments of $X$ up to those of order $j + 2$.

Formula for R_j (continued)

Examples:
$$R_1(x) = -\tfrac16\kappa_3(x^2 - 1)\,\phi(x),$$
$$R_2(x) = -x\big\{\tfrac1{24}\kappa_4(x^2 - 3) + \tfrac1{72}\kappa_3^2(x^4 - 10x^2 + 15)\big\}\,\phi(x).$$
Exercise: Derive these formulae. (This is straightforward, given what we have proved already.)

Asymptotic expansions

We have given an heuristic derivation of an expansion of the distribution function of $S_n$:
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + \dots .$$
In order to describe its rigorous form, we must first consider how to interpret the expansion. The expansion seldom converges as an infinite series. A sufficient condition, due to Cramér, is that $E e^{X^2/4} < \infty$, which is rarely true for distributions that are not very closely connected to the Normal distribution.

Asymptotic expansions (continued 1)

Nevertheless, the expansion does make sense when interpreted as an asymptotic series, where the remainder, after stopping the expansion at a finite number of terms, is of smaller order than the last included term:
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + o(n^{-j/2}).$$
A sufficient regularity condition for this result is
$$E|X|^{j+2} < \infty , \qquad \limsup_{|t|\to\infty}|\chi(t)| < 1 .$$
Rigorous derivation of the expansion under these restrictions was first achieved by Cramér. When these conditions hold, the expansion is valid uniformly in $x$.

Asymptotic expansions (continued 2)

Since moments of order $j + 2$ appear among the coefficients of the polynomial $P_j$, and since $R_j = P_j\phi$, the condition $E|X|^{j+2} < \infty$ is hard to weaken. It can be relaxed when $j$ is odd, however. The second condition, $\limsup_{|t|\to\infty}|\chi(t)| < 1$, is called Cramér's continuity condition. It holds if the distribution function $F$ of $X$ can be written as $F = \pi G + (1 - \pi)H$, where $G$ is the distribution function of a random variable with an absolutely continuous distribution, $H$ is another distribution function, and $0 < \pi \le 1$.

Exercise: Prove that if the distribution $F$ of $X$ is absolutely continuous, i.e. if, for a density function $f$,
$$F(x) = \int_{-\infty}^x f(u)\,du ,$$
then Cramér's continuity condition holds in the strong form $\limsup_{|t|\to\infty}|\chi(t)| = 0$. Hence verify the claim made above.

Asymptotic expansions (continued 3)

Therefore Cramér's continuity condition is an assumption about the smoothness of the distribution of $X$. It fails if the distribution is of lattice type, i.e. if all points $x$ in the support of the distribution of $X$ have the form $x = jh + a$, where $h > 0$ and $-\infty < a < \infty$ are fixed, and $j$ is an integer. If $h$ is as large as possible such that these constraints hold, it is called the span of the distribution of $X$.

When $X$ has a lattice distribution with sufficiently many finite moments, an Edgeworth expansion of the distribution of $S_n$ still holds, in the form
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + o(n^{-j/2}),$$
but the functions $R_j$ have a more complex form. In particular, they are no longer continuous.

Asymptotic expansions (continued 4)

The gap between cases where Cramér's continuity condition holds, and the case where $X$ has a lattice distribution, is well understood only for $j = 1$. There it was shown by Esseen that the expansion
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + o(n^{-1/2})$$
is valid under the sole conditions that the distribution of $X$ is nonlattice and $E|X|^3 < \infty$.

Asymptotic expansions of densities

Cramér's continuity condition holds in many cases where the distribution of $S_n$ does not have a well-defined density. Therefore it is unrealistic to expect that an expansion of the distribution of $S_n$ will automatically imply an expansion of its density. However, such an expansion is valid provided $S_n$ has a well-defined density for some $n$. There, writing $f_n(x) = (d/dx)\,P(S_n \le x)$, we have
$$f_n(x) = \phi(x) + n^{-1/2}R_1'(x) + \dots + n^{-j/2}R_j'(x) + o(n^{-j/2}),$$
provided $E|X|^{j+2} < \infty$. The expansion holds uniformly in $x$.

Asymptotic expansions of densities (continued)

A version of this "local" expansion, as it is called, also holds for lattice distributions, in the form
$$n^{1/2}h^{-1}\,P(S_n = x) = \phi(x) + n^{-1/2}R_1'(x) + \dots + n^{-j/2}R_j'(x) + o(n^{-j/2}),$$
uniformly in points $x$ in the support of the distribution of $S_n$. Curiously, the functions $R_j'$ appearing in this expansion are the same ones that appear in the usual non-lattice expansion.

Exercise: Derive the version of this local lattice expansion in the case where the unstandardised form of $X$ has the Binomial distribution with parameter $p$. Treating the case $j = 1$ is adequate; larger $j$ is similar, but more algebraically complex. Hint: Use an expansion related to Stirling's formula to approximate the binomial probabilities.

Expansions in more general cases

The year 2003 was the 75th anniversary of the publication of Cramér's paper "On the composition of elementary errors", in which he gave the first general, rigorous expansion of the distribution of a sum of independent and identically distributed random variables. The cases of other statistics have been discussed for many years, but it was not until relatively recently, in a pathbreaking paper in 1978 by Bhattacharya and Ghosh, that rigour was provided in a wide range of cases.

Expansions in more general cases (continued 1)

Bhattacharya and Ghosh dealt with statistics which can be represented as a smooth function $A$ of a vector mean $\bar X$, that is, with
$$S_n = n^{1/2}A(\bar X),$$
where
$$A(x) = \frac{g(x) - g(\mu)}{h(\mu)} \qquad\text{or}\qquad A(x) = \frac{g(x) - g(\mu)}{h(x)},$$
$g$ and $h$ are smooth functions from $\mathbb{R}^d$ to $\mathbb{R}$, $h(\mu) > 0$, and $\bar X = n^{-1}\sum_{i=1}^n X_i$ is the mean of the first $n$ of the independent and identically distributed random $d$-vectors $X_1, X_2, \dots$, with mean $\mu$. We make these assumptions below. Let $t = (t_1, \dots, t_d)^T$ denote a $d$-vector, and let $\chi(t) = E\exp(it^T X)$ be the characteristic function of a $d$-vector $X$ distributed as the $X_j$'s. The two different versions of $A(x)$ above allow us to treat non-Studentised and Studentised cases, respectively.

Expansions in more general cases (continued 2)

Let $\sigma^2 > 0$ denote the asymptotic variance of $S_n = n^{1/2}A(\bar X)$.

Theorem (essentially Bhattacharya & Ghosh, 1978). Assume the function $A$ has $j + 2$ continuous derivatives in a neighbourhood of $\mu$, and that
$$E\|X\|^{j+2} < \infty , \qquad \limsup_{\|t\|\to\infty}|\chi(t)| < 1 .$$
Then
$$P(S_n \le \sigma x) = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + o(n^{-j/2}),$$
uniformly in $x$, where $R_k(x) = P_k(x)\,\phi(x)$ and $P_k$ is a polynomial of degree $3k - 1$, with opposite parity to its index, and with coefficients depending on moments of $X$ up to order $k + 2$, and on derivatives of $A$, evaluated at $\mu$, up to the $(k+2)$nd.

Expansions in more general cases (continued 3)

Note that $x$ here is a scalar, not a $d$-vector, and $\phi$ is the univariate standard Normal density. For a proof, see: Bhattacharya, R.N. & Ghosh, J.K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-451.

The polynomials $R_j$ are identified by developing a Taylor approximation to $S_n$, of the form
$$S_n = Q_n(\bar X - \mu) + O_p\big(n^{-(j+1)/2}\big),$$
where $Q_n$ is a polynomial of degree $j + 1$. Here we use the fact that
$$A(\bar X) = A(\mu) + (\bar X - \mu)^T\dot A(\mu) + \tfrac12(\bar X - \mu)^T\ddot A(\mu)(\bar X - \mu) + \dots ,$$
where $\dot A$ and $\ddot A$ denote the gradient vector and Hessian matrix of $A$.

Expansions in more general cases (continued 4)

Since $Q_n$ is a polynomial and $\bar X$ is a sample mean, the cumulants of the distribution of $Q_n(\bar X - \mu)$ can be written down fairly easily, and hence a formal expansion of the distribution of $Q_n(\bar X - \mu)$ can be developed, up to $j$ terms:
$$P\{Q_n(\bar X - \mu) \le x\} = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + o(n^{-j/2}).$$
The functions $R_j$ appearing here are exactly those appearing in the expansion of the distribution of $S_n$.

Studentised and non-Studentised cases

Expansions in Studentised and non-Studentised cases have different polynomials. For example, in the case of the Studentised mean,
$$P_1(x) = \tfrac16\kappa_3(2x^2 + 1),$$
$$P_2(x) = x\big\{\tfrac1{12}\kappa_4(x^2 - 3) - \tfrac1{18}\kappa_3^2(x^4 + 2x^2 - 3) - \tfrac14(x^2 + 3)\big\}.$$
We know already that in the non-Studentised case,
$$P_1(x) = -\tfrac16\kappa_3(x^2 - 1),$$
$$P_2(x) = -x\big\{\tfrac1{24}\kappa_4(x^2 - 3) + \tfrac1{72}\kappa_3^2(x^4 - 10x^2 + 15)\big\}.$$

Cornish–Fisher expansions

We have shown how to develop Edgeworth expansions of the distribution of a statistic $S_n$:
$$P(S_n \le x) = \Phi(x) + n^{-1/2}R_1(x) + \dots + n^{-j/2}R_j(x) + o(n^{-j/2}).$$
This is an expansion of a probability, for a given value of the quantile $x$. Defining $\xi_\alpha$ to be the solution of
$$P(S_n \le \xi_\alpha) = \alpha ,$$
for a given fixed value of $\alpha \in (0, 1)$, we may invert the expansion to express $\xi_\alpha$ as a series expansion:
$$\xi_\alpha = z_\alpha + n^{-1/2}P_1^{cf}(z_\alpha) + \dots + n^{-j/2}P_j^{cf}(z_\alpha) + o(n^{-j/2}),$$
where $z_\alpha = \Phi^{-1}(\alpha)$ denotes the
a level quan tile of the standard Normal distribution and Pf PQCf etc are polynomials Cornish Fisher expansions continued Noting that Rj PjCD for polynomials P it may be proved that Pff P1 Pym P1a Pies az Plow P2x etc Exercise Prove these formulae METHODOLOGY AND THEORY FOR THE BOOTSTRAP Sixth set of two lectures Main topic of these lectures Bootstrap methods in linear regression Regression model Assume we observe pairs x1Y1xnYn generated by the model YEZQWDI Ez 1 where g is a function that might be determined either parametrically or nonparametrically and the errors ei have zero mean Regression model cont In the study of regression we take the explana tory variables x to be fixed either because they are pre determined eg were regularly spaced or are conditioned upon In this case the only source of randomness in the model is the errors 6 and so it is those that we resample in form of residuals when implementing the bootstrap Our choice of lower case notation for the explanatory vari ables reflects this view Correlation model Alternatively we might take the view that the explanatory variables are genuinely random and must be treated us such For example the data pairs XYZ for 1 g 139 g n might be drawn by sampling randomly from a bivariate distribution and in our analysis of those data we might wish to preserve all the implications of this randomness rather than condition some of it away by regarding the Xi s as fixed This approach to analysis might be termed the study of correlation rather than regression It would be addressed in bootstrap terms by re sampling the pairs XYZ rather than resam pling the residuals It is important to appreciate that these two dif ferent approaches to resampling sampling the residuals or sampling the pairs XYZ re spectively are appropriate in different set tings for different models They are not alter native ways of doing the same thing and can lead to different conclusions Parametric regression The good properties of percentile t 
methods carry over to regression problems However in the setting of slope estimation those properties are significantly enhanced and even the stan dard percentile method can perform unusually well For example one sided percentile t confidence regions for slope have coverage error 0n32 not 001 1 and the error is only 001 2 in the case of two sided intervals One sided standard percentile method confi dence intervals for slope based on approxi mating the distribution of 0 by the con ditional distribution of have coverage error 001 1 rather than the usual 0n12 General definition Of slope Although these exceptional coverage proper ties apply only to estimates of slope not to estimates of intercept parameters or means slope may be interpreted very generally For example in the polynomial regression model where we observe mid1 for 1 g 139 g n we re gard each dj as a slope parameter A one sided percentile t interval for dj has coverage error 0n32 although a one sided percentile t in terval for c or for EYxx0c I xod1 I x6ndm has coverage error of size 71 Why is slope favoured especially The reason for good performance in the case of slope parameters is the extra symmetry con ferred by design points Note that in the polynomial regression case we may write the model equivalently as YC M dlx n mdm where gjznlzixg39 and c cg1d1 mdm The extra symmetry arises from the fact that n Z 33 7 0 i1 forlgjgm Correlation model The results implying good coverage accuracy hold under the regression model but not nec essarily for the correlation model For example in the case of the linear correlation model the symmetry discussed above will persist provided that EZX eefo 1 i1 for sufficiently large k Now 6 Y gXgt and as a result 1 will generally not hold for k 2 1 This means that under the cor relation model the conventional properties of bootstrap confidence intervals hold the special properties noted for regression when estimat ing slope parameters are not valid However if the errors 6 are 
independent of the explanatory variables X then 1 will hold for each k and in such cases the enhanced fea tures of the regression problem persist under the correlation model Simple linear regression Consider the regression model Y C 331 d 61 where the ei39S are independent and identically distributed with zero mean and finite variance 0 2 Define a n 1 2 5131 3 A 1 n dzax2 Zx xei e i1 and E Y icl Estimate yo EY13 5130 2 0 mod by o5xod7 and put as 102xo i2 at Yg Y xi i and 62 2 71 12135 the latter estimating 02 The asymptotic variances of c and g0 equal 020109 and 020571 respectively and so n12 3 00336 and n12g0 yOaay are asymptotically pivotal Bootstrapping the simple linear regression model The residuals YE Y i3 are centred in that Z a 0 Therefore we may resample randomly with replacement from 1 n obtaining e ie say and take as our bootstrap resample the pairs x1Y1 mmef where Yquot Exicie Note particularly that reflecting the fact that we condition upon the explanatory variables those quantities are the same as in the original dataset In regression problems where the residuals are not centred for example in nonparametric re gression we generally centre them for exam ple by subtracting their average value before resampling Estimating quantiles Of distribution of 6 Let 5 a and 8 have the same formulae as E d and a respectively except that we replace Y by Y throughout The bootstrap versions of S n12ci daxa T n12J dax6 are 3 n12ci ciax8 Tgtllt n12d 0xa respectively We estimate the quantiles 5a and 77a of the dis tributions of S and T by Ea and 77a respectively where and X xlY1xnYn denotes the dataset 10 Bootstrap confidence intervals fOI d One sided bootstrap confidence intervals for d with nominal coverage or are given by in 0073 71 12 UUx 1 agt 7 112 007 J 71 12 1 05 7 71 0073 71 12 aUx 1 ozgt These are direct analogues of the intervals ill in and J1 introduced earlier in non regression problems In particular 12 and J1 are standard percentile method and percentile t 
bootstrap confidence regions Following the line of argument given earlier we would expect T11 and f1 to have coverage error 001 1 In fact they both have coverage error equal to 0n32 However T11 is not of practical use since it depends on the unknown 0 so we shall not treat it any further Likewise we would expect in to have cov erage error of size 71 12 However we shall show that the error is actually of order 71 11 Bootstrap confidence intervals for d cont Note that although 12 involves the variance estimator 6 it can be constructed numerically without resorting to computing 6 Indeed 12 is identical to the interval 112 003 731a 7 where aka is the standard percentile method estimator of w1a the latter defined by PJ d w1a 1 04 In particular who is defined by PJ J u71aXl oz The interval fl is a standard percentile t boot strap confidence interval 12 Polynomials in Edgeworth expansions Edgeworth expansions for the non Studentised and Studentised statistics 8 and T respec tively are given by P8 u ltDu n 12P1u gtu n 1P2u gtu PTu un 12Q1u u n 1Q2u u where P1U Q1U 77x 1 U2 7 Eea3 vac 71 12er iax3y and P2 and Q2 are odd quintic polynomials with P2u Q2uw23242 H gtltw2 3 and H Eea4 3 Note particularly that P1 Q1 13 Why does P1 equal Q1 To understand why it is helpful to treat 8 as an approximation to T Indeed note that by definition of 82 l A 32 612 II S H H 6 g as 4 C d2 M3 S H H 1 n 2 02 263 02opn 1 n i1 Therefore defining n A n la2 Z 62 02 2 21 and recalling that S n12J daxa T n12J dax6 we deduce that T 31 A 0pn 1 14 Why does P1 equal Q1 cont 1 Making use of the approximation TS1 AOpn 1 1 the symmetry property n Zxi i0 2 1 21 and the representation n S n12a101 Z i 61 2 21 it is readily proved that E8 1 AW E09 001 1 3 for j 123 Exercise Derive 3 for j 123 15 Why does P1 equal Q1 cont 2 Therefore the first three cumulants of S and 81 A agree up to and including terms of order 71 12 It follows that Edgeworth expansions of the distributions of S and 81 A differ only in terms 
of order n l In view of 1 the same is true of the distributions of S and T PS g u PT g 111 0n 1 Therefore the 71 12 terms in the expansions must be identical that is P1 Q1 The chief ingredient in this argument is the symmetry property 2 It and its analogues guarantee that in the problem of slope estima tion for general regression problems P1 Q1 16 Consequences of the property P1 Q1 The identity P1 Q1 implies that to first or der ie up to and including terms of order 71 12 estimating the distribution of S is the same as estimating the distribution of T As we saw earlier in non regression problems the percentile method estimates the distribu tion of 8 whereas the percentile t method es timates the distribution of T The fact that in the setting of estimating slope in regression P1 Q1 means that these two techniques give the same results up to and including terms of order 71 12 They differ only in terms of order 71 1 and terms of higher order Therefore since one sided confidence intervals based on the percentile t method have cover age error equal to 001 1 the same must be true for confidence intervals based on the per centile method 17 Properties of percentile t confidence re gions The coverage error of a one sided percentile t confidence interval for d is of order 71 32 rather than the usual n l That is with J1 defined by j Z 007 J 71 12 771 05 7 it can be shown that Pd 6 J1 a 0n 32 4 Since terms of odd order in 71 32 cancel from formulae for coverage error of two sided confi dence intervals then the two sided percentile t bootstrap confidence interval for d has cov erage error of order 71 2 rather than the usual 71 1 18 Derivation of 4 A proof of 4 can be given as follows Recall that PT u lt1gtu 71 12 Q1u u n 1Q2u u where Q1um1 u2 7 Eea3 and m n 1 El 2 iax3 19 Derivation of 4 cont 1 The bootstrap version of the Taylor expansion is PT UIX lt1gtu n 121u Mu n 1Q2u u where 100 AY YU 1 u2 and 2 E amp3 Now the solutions 77a and 77a of the respective equations admit the 
Cornish Fisher expansions 77a Zoz 71 12 QEfCZa 71 1 62510305 7 7 za 71 12 mm n 1 9920 20 Derivation of 4 cont 2 Cornish Fisher expansions 77a Zoz 71 12 QEfCZa 71 1 62510305 7 a za 71 12 QEfCZa n 1 29920 On subtracting these expansions we deduce that w n12Ef2a QEfZa n 15fltza ngm n 12Q1ltza Q12a 02901 3 where we have used the fact that Q Q1 Q 1 and C2 Q 0pn12 21 Derivation of 4 cont 3 From previous pages 1u Q1u W 7 7x 1 1L2 5 ag Wax 71 12 Q1Zoz Q1Zoz Opn32 6 It may be prOVed by Taylor expansion that 3 n12i iE 3i i J d3 71 1 61 E i J d232 7 12U0pn1 7 where n U 27712 Z 523 v 76 1 35i 1 21 and 6i 610 Combining results 5 7 we deduce that a na 2 art 1va 1 zi Open 3 22 Derivation of 4 cont 4 From the previous page 771 05 771 05 n 1 CU OPn 32 where c 7x 1 za Therefore Pd 6 fl P d lt J 71 12 6ax 1a PCT gt 771 05 PT I nlcU gt 771a 0pltn 32 P T 71 1 CU gt 771agt On 32 assuming we can treat the Opn 32 inside the probability as though it were deterministic and take it outside 23 Derivation of 4 cont 5 From previous page Pd 6 fl PT I n1 CU gt 771a On32 8 It can be proved that for any choice of the constant c the first four moments and hence also the first four cumulants of T I n1cU are identical to those of T up to and including terms of order 71 32 Hence recalling the way in which moments influence Edgeworth expan Sions PT n 1 cU gt mm PT gt 7710 001 3 2 or 001 32 Therefore by 8 Pd 6 fl or 0n 32 as had to be proved 24 The other percentile method interval Recall that the percentile method interval in is based on bootstrapping 3 d that is it is based on approximating the distribution of this quantity by the conditional distribution of 3 d The other percentile method is based on us ing the conditional distribution of a to approx imate the distribution of 3 It leads to the in terval R1 ooci nil2 aax 7 ooia where Q is an approximation to Ca these two quantities being defined by PltJEaixa Macaw However R has coverage error of size 71 12 not 71 In particularly R does not enjoy the 
accuracy of the percentile method interval $\hat I_1$. This is a consequence of its addressing the wrong tail of the distribution of $\hat\theta$.

Properties of confidence intervals for the conditional mean $y_0 = E(Y \mid x = x_0)$ and the intercept $c$

Recall that $y_0 = c + x_0 d$, which in turn equals $c$ when $x_0 = 0$. Therefore we can treat confidence intervals for $c$ as a special case of those for $y_0$. Recalling that $\hat y_0 = \hat c + x_0 \hat d$, redefine

$$S = n^{1/2}\,(\hat y_0 - y_0)/\sigma_y, \qquad T = n^{1/2}\,(\hat y_0 - y_0)/\hat\sigma_y,$$

and, taking $\hat y_0^* = \hat c^* + x_0 \hat d^*$, redefine

$$S^* = n^{1/2}\,(\hat y_0^* - \hat y_0)/\hat\sigma_y, \qquad T^* = n^{1/2}\,(\hat y_0^* - \hat y_0)/\hat\sigma_y^*.$$

METHODOLOGY AND THEORY FOR THE BOOTSTRAP

Fifth set of two lectures

Main topic of these lectures: completion of work on confidence intervals, and a survey of miscellaneous topics.

Revision of confidence intervals

Recall that $\hat u_\alpha$ is the $\alpha$-level quantile of the bootstrap distribution of $T^* = n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^*$:

$$P(T^* \le \hat u_\alpha \mid \mathcal X) = \alpha.$$

A one-sided percentile-$t$ confidence interval for an unknown parameter $\theta$, based on the bootstrap estimator $\hat\theta$ and having nominal coverage $\alpha$, is therefore

$$\hat J_1 = \hat J_1(\alpha) = \bigl(-\infty,\ \hat\theta - n^{-1/2}\,\hat\sigma\,\hat u_{1-\alpha}\bigr).$$

It has coverage error $O(n^{-1})$:

$$P\{\theta \in \hat J_1(\alpha)\} = \alpha + O(n^{-1}).$$

A conventional two-sided interval, for which the nominal coverage is also $\alpha$, is obtained from two one-sided intervals:

$$\hat J_2(\alpha) = \hat J_1\{(1+\alpha)/2\} \setminus \hat J_1\{(1-\alpha)/2\} = \bigl(\hat\theta - n^{-1/2}\,\hat\sigma\,\hat u_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\,\hat\sigma\,\hat u_{(1-\alpha)/2}\bigr].$$

Unsurprisingly, the actual coverage of $\hat J_2$ also equals $\alpha + O(n^{-1})$:

$$P\{\theta \in \hat J_2(\alpha)\} = \alpha + O(n^{-1}).$$

Percentile method intervals

Interestingly, however, this result extends to the case of two-sided percentile intervals, the one-sided versions of which have coverage accuracy of only $O(n^{-1/2})$. Recall that one form of percentile method confidence interval for $\theta$ is

$$\hat I_1(\alpha) = \bigl(-\infty,\ \hat\theta - n^{-1/2}\,\hat\sigma\,\hat x_{1-\alpha}\bigr),$$

where $\hat x_\alpha$ is the $\alpha$-level critical point of the bootstrap distribution of $S^*$:

$$P(S^* \le \hat x_\alpha \mid \mathcal X) = \alpha.$$

The corresponding two-sided interval is

$$\hat I_2(\alpha) = \hat I_1\{(1+\alpha)/2\} \setminus \hat I_1\{(1-\alpha)/2\} = \bigl(\hat\theta - n^{-1/2}\,\hat\sigma\,\hat x_{(1+\alpha)/2},\ \hat\theta - n^{-1/2}\,\hat\sigma\,\hat x_{(1-\alpha)/2}\bigr].$$

Coverage of two-sided percentile intervals

To calculate the coverage of $\hat I_2(\alpha)$, recall that

$$P\{\theta \in \hat I_1(\alpha)\} = \alpha + n^{-1/2}\,\{P_1(z_\alpha) - Q_1(z_\alpha)\}\,\phi(z_\alpha) + O(n^{-1}).$$

Since $P_1$ and $Q_1$ are even polynomials, and $z_{(1-\alpha)/2} = -z_{(1+\alpha)/2}$, then

$$P_1(z_{(1+\alpha)/2}) - Q_1(z_{(1+\alpha)/2}) = P_1(z_{(1-\alpha)/2}) - Q_1(z_{(1-\alpha)/2}).$$

Therefore the $n^{-1/2}$ terms cancel, and

$$P\{\theta \in \hat I_2(\alpha)\} = P\bigl[\theta \in \hat I_1\{(1+\alpha)/2\}\bigr] - P\bigl[\theta \in \hat I_1\{(1-\alpha)/2\}\bigr] = \tfrac{1+\alpha}{2} - \tfrac{1-\alpha}{2} + O(n^{-1}) = \alpha + O(n^{-1}).$$
Coverage of two-sided percentile intervals, continued

Therefore, owing to the parity properties of polynomials in Edgeworth expansions, this two-sided percentile confidence interval has coverage error $O(n^{-1})$. The same result holds true for the other type of percentile confidence interval, of which the one-sided form is

$$\hat R_1(\alpha) = \bigl(-\infty,\ \hat\theta + n^{-1/2}\,\hat\sigma\,\hat y_\alpha\bigr).$$

Its one- and two-sided forms have coverage

$$P\{\theta \in \hat R_1(\alpha)\} = \alpha + O(n^{-1/2}), \qquad P\{\theta \in \hat R_2(\alpha)\} = \alpha + O(n^{-1}).$$

Exercise. (1) Derive the latter property. (2) Show that when computing percentile confidence intervals (as distinct from percentile-$t$ intervals) we do not actually need the value of $\hat\sigma$. It has been included for didactic reasons, to clarify our presentation of the theory, but it cancels in numerical calculations.

Discussion

Therefore the arguments in favour of percentile-$t$ methods are less powerful when applied to two-sided confidence intervals. However, the asymmetry of percentile intervals will usually not accurately reflect that of the statistic, and in this sense they are less appropriate. This is especially true in the case of the intervals $\hat R$, the "other" percentile method. There, when $\hat\theta$ has a markedly asymmetric distribution, the lengths of the two sides of a two-sided interval based on $\hat R$ will reflect the exact opposite of the tailweights.

Other bootstrap confidence intervals

It is possible to correct bootstrap confidence intervals for skewness without Studentising. The best-known examples of this type are the accelerated bias-corrected intervals proposed by Bradley Efron, based on explicit corrections for skewness. It is also possible to construct bootstrap confidence intervals that are optimised for length at a given level of coverage.

The coverage error of bootstrap confidence intervals can be reduced by using the iterated bootstrap to estimate the coverage error and then adjust for it. Each application generally reduces coverage error by a factor of $n^{-1/2}$ in the one-sided case and $n^{-1}$ in the two-sided case. Usually, however, only one application is computationally feasible.
Other bootstrap confidence intervals, continued

Although the percentile-$t$ approach has obvious advantages, these may not be realised in practice in the case of small samples. This is because bootstrapping the Studentised ratio involves simulating the ratio of two random variables, and unless the sample size is sufficiently large to ensure reasonably low variability of the denominator of this ratio, poor coverage accuracy can result.

Note too that percentile-$t$ confidence intervals are not transformation-invariant, whereas intervals based on the percentile method are.

From some viewpoints, particularly that of good coverage performance in a very wide range of settings (an analogue of robustness), the most satisfactory approach is the coverage-corrected form, using the iterated bootstrap, of the first type of percentile method interval, i.e. of $\hat I_1$ and $\hat I_2$ in the one- and two-sided cases respectively.

Bootstrap methods for time series

There are two basic approaches in the time series case, applicable with or without a structural model, respectively. We shall say that we have a structural model for a time series $X_1,\dots,X_n$ if there is a continuous, deterministic method for generating the series from a sequence of independent and identically distributed disturbances $\epsilon_1, \epsilon_2, \dots$. The method should depend on a finite number of unknown but estimable parameters. Moreover, it should be possible to estimate all but a bounded number of the disturbances from $n$ consecutive observations of the time series.

Bootstrap for time series with structural model

We call the model "structural" because the parameters describe only the structure of the way in which the disturbances drive the process. In particular, no assumptions are made about the disturbances apart from standard moment conditions; in this sense the setting is nonparametric rather than parametric.

The best-known examples of structural models are those related to linear time series, for example the moving average

$$X_j = \mu + \sum_{i=0}^{q} \theta_i\,\epsilon_{j-i},$$

or an autoregression such as

$$X_j - \mu = \sum_{i=1}^{p} w_i\,(X_{j-i} - \mu) + \epsilon_j,$$
where $\mu$, $\theta_0,\dots,\theta_q$, $w_1,\dots,w_p$, and perhaps also $p$ and $q$, are parameters that have to be estimated.

In this setting the usual bootstrap approach to inference is as follows.

1. Estimate the parameters of the structural model (e.g. $\mu$ and $w_1,\dots,w_p$ in the autoregression example), and compute the residuals $\hat\epsilon_j$, i.e. estimates of the $\epsilon_j$'s, using standard methods for time series.

2. Generate the estimated time series, in which the true parameter values are replaced by their estimates and the disturbances are resampled from among the estimated ones, obtaining a bootstrapped time series $X_1^*,\dots,X_n^*$; for example, in the autoregressive case,

$$X_j^* - \hat\mu = \sum_{i=1}^{p} \hat w_i\,(X_{j-i}^* - \hat\mu) + \epsilon_j^*.$$

3. Conduct inference in the standard way, using the resample $X_1^*,\dots,X_n^*$ thus obtained. For example, to construct a percentile-$t$ confidence interval for $\mu$ in the autoregressive example, let $\hat\sigma^2$ be a conventional time series estimator of the variance of $n^{1/2}\hat\mu$, computed from the data $X_1,\dots,X_n$; let $\hat\mu^*$ and $\hat\sigma^{*2}$ denote the versions of $\hat\mu$ and $\hat\sigma^2$ computed from the resampled data $X_1^*,\dots,X_n^*$; and construct the percentile-$t$ interval based on using the bootstrap distribution of $T^* = n^{1/2}(\hat\mu^* - \hat\mu)/\hat\sigma^*$ as an approximation to the distribution of $T = n^{1/2}(\hat\mu - \mu)/\hat\sigma$.

All the standard properties we have already noted, founded on Edgeworth expansions, apply without change, provided the time series is sufficiently short-range dependent. Early work on theory in the structural time series case includes that of:

Bose, A. (1988). Edgeworth correction by bootstrap in autoregressions. Ann. Statist. 16, 1709–1722.

It is common in this setting not to be able to estimate $n$ disturbances $\epsilon_j$ from a time series of length $n$. For example, in the context of autoregressions we can generally estimate no more than $n - p$ of the disturbances. But this does not hinder application of the method: we merely resample from a set of $n - p$, rather than $n$, values of $\hat\epsilon_j$.
Usually it is assumed that the disturbances have zero mean. We reflect this property empirically by centring the $\hat\epsilon_j$'s at their sample mean before resampling.

Bootstrap for time series without structural model

In some cases, for example where highly nonlinear filters have been applied during the process of recording the data, it is not possible, or not convenient, to work with a structural model. There is a variety of bootstrap methods for conducting inference in this setting, based on "block" or "sampling window" methods. We shall discuss only the block bootstrap approach.

Block bootstrap for time series

Just as in the case of a structural time series, the block bootstrap aims to construct simulated versions of the time series, which can then be used for inference in a conventional way. The method involves sampling blocks of $b$ consecutive values of the time series, say $(X_{I+1},\dots,X_{I+b})$ where $0 \le I \le n - b$ is chosen in some random way, and placing them one after the other in an attempt to reproduce the series. Here $b$ denotes block length.

Assume we can generate blocks $(X_{I_j+1},\dots,X_{I_j+b})$ for $j \ge 1$ in this way, ad infinitum. Create a new time series identical to

$$X_{I_1+1},\dots,X_{I_1+b},\ X_{I_2+1},\dots,X_{I_2+b},\ \dots$$

The resample $X_1^*,\dots,X_n^*$ is just the first $n$ values in this sequence.

There is a range of methods for choosing the blocks. One, the "fixed block" approach, involves dividing the series $X_1,\dots,X_n$ into $m$ blocks of $b$ consecutive data (assuming $n = bm$) and choosing the resampled blocks at random from among these. In this case the $I_j$'s are independent and uniformly distributed on the values $0, b, 2b, \dots, (m-1)b$. The blocks in the fixed-block bootstrap do not overlap.

Another, the "moving blocks" technique, allows block overlap to occur. Here the $I_j$'s are independent and uniformly distributed on the values $0, 1, \dots, n - b$.

In this way the block bootstrap attempts to preserve exactly, within each block, the dependence structure of the original time series $X_1,\dots,X_n$. However, dependence is corrupted at the places where blocks join. Therefore we expect optimal block length to increase with the strength of dependence of the time series. Techniques have been suggested for matching blocks more effectively at their ends, for example by using a Markovian model for the time series; this is sometimes referred to as the "matched block" bootstrap.

Difficulties with the block bootstrap

The main problem with the block bootstrap is that the block length $b$, which is a form of smoothing parameter, needs to be chosen. Using too small a value of $b$ will corrupt the dependence structure, increasing the bias of the bootstrap method, while choosing $b$ too large will give a method with relatively high variance and consequent inaccuracy.

Another difficulty is that the percentile-$t$ approach cannot be applied in the usual way with the block bootstrap, if it is to enjoy high levels of accuracy. This is because the corruption of dependence at the places where adjacent blocks join significantly affects the relationship between the numerator and the denominator of the Studentised ratio, with the result that the block bootstrap does not effectively capture skewness. However, there are ways of removing this problem.

Successes of the block bootstrap

Nevertheless, the block bootstrap and related methods give good performance in a range of problems where no other techniques work effectively, for example inference for certain sorts of nonlinear time series. The block bootstrap has also been shown to work effectively with spatial data; there the blocks are sometimes referred to as "tiles", and either the fixed-block or the moving-block method can be used.

References for the block bootstrap

Carlstein, E. (1986). The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Statist. 14, 1171–1179.

Hall, P. (1985). Resampling a coverage pattern. Stochastic Process. Appl. 20, 231–246.

Künsch, H.R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 1217–1241.
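The fixed-block and moving-block schemes described above are straightforward to sketch in code. The following Python fragment is our own illustration, not code from the lectures: the function names and the AR(1) example series are ours. It resamples a weakly dependent series and approximates the sampling distribution of its mean.

```python
import numpy as np

def moving_block_resample(x, b, rng):
    """One moving-blocks bootstrap resample of the same length as x.

    Start indices I_j are independent and uniform on {0, 1, ..., n - b},
    so blocks may overlap; concatenated blocks are truncated to n values.
    """
    n = len(x)
    n_blocks = -(-n // b)  # ceil(n / b)
    starts = rng.integers(0, n - b + 1, size=n_blocks)
    return np.concatenate([x[s:s + b] for s in starts])[:n]

def fixed_block_resample(x, b, rng):
    """One fixed-blocks resample: non-overlapping blocks chosen at random.

    Assumes n = b * m; start indices are uniform on {0, b, ..., (m - 1) b}.
    """
    n = len(x)
    m = n // b
    starts = b * rng.integers(0, m, size=m)
    return np.concatenate([x[s:s + b] for s in starts])

# Illustration: approximate the sampling distribution of the mean
# of a weakly dependent (AR(1)) series.
rng = np.random.default_rng(1)
n, b = 240, 12
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for j in range(1, n):
    x[j] = 0.6 * x[j - 1] + eps[j]

boot_means = np.array(
    [moving_block_resample(x, b, rng).mean() for _ in range(1000)]
)
# np.quantile(boot_means, [0.025, 0.975]) then yields percentile-type
# critical points for the mean.
```

The choice of `b` here reflects the trade-off discussed above: too small a block corrupts the dependence structure (bias), too large a block leaves too few distinct blocks (variance).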
Politis, D.N., Romano, J.P. and Wolf, M. (1999). Subsampling. Springer, New York.

Bootstrap in non-regular cases

There is a "meta-theorem" which states that the standard bootstrap, which involves constructing a resample of approximately the same size as the original sample, "works", in the sense of consistently estimating the limiting distribution of a statistic, if and only if that statistic's distribution is asymptotically Normal. It does not seem possible to formulate this as a general, rigorously provable result, but it nevertheless appears to be true.

This result underpins our discussion of bootstrap confidence regions, which has focused on the case where the statistic is asymptotically Normal. Therefore, rather than take up the issue of whether the bootstrap estimate of the statistic's distribution is asymptotically Normal, we have addressed the problem of the size of coverage error.

Example of non-regular cases

Perhaps the simplest example where this approach fails is that of approximating the distributions of extreme values. To appreciate why there is difficulty, consider the problem of approximating the joint distribution of the two largest values of a sample $X_1,\dots,X_n$ from a continuous distribution. The probability that the two largest values in a resample $X_1^*,\dots,X_n^*$, drawn by sampling with replacement from the sample, both equal $\max_i X_i$ is

$$1 - \Bigl(1 - \frac1n\Bigr)^{n} - \Bigl(1 - \frac1n\Bigr)^{n-1} \to 1 - 2e^{-1}$$

as $n \to \infty$. The fact that this probability does not converge to zero makes it clear that the joint distribution of the two largest values in the bootstrap resample cannot consistently estimate the joint distribution of the two largest data values.

The m-out-of-n bootstrap

The most commonly used approach to overcoming this difficulty, in the extreme-value example and many other cases, is the "$m$ out of $n$" bootstrap. Here, rather than draw a resample of size $n$, we draw a resample of size $m < n$, and compute the distribution approximation in that case. Provided $m \to \infty$ and $m/n \to 0$, the $m$ out of $n$ bootstrap gives consistent estimation in most, probably all, settings.
For example, this approach can be used to consistently approximate the distribution of the mean of a sample drawn from a very heavy-tailed distribution, for example one in the domain of attraction of a non-Normal stable law.

The m-out-of-n bootstrap, continued

The main difficulty with the $m$ out of $n$ bootstrap is choosing the value of $m$. Like block length in the case of the block bootstrap, $m$ is a smoothing parameter: large $m$ gives low variance but high bias, and small $m$ has the opposite effect. In most problems where we would wish to apply the $m$ out of $n$ bootstrap, it proves to be quite sensitive to the selection of $m$.

A secondary difficulty is that the accuracy of $m$ out of $n$ bootstrap approximations is not always good, even if $m$ is chosen optimally. For example, when the $m$ out of $n$ bootstrap is applied to distribution-approximation problems, the error is often of order $m^{-1/2}$, which, since $m/n \to 0$, is an order of magnitude worse than $n^{-1/2}$.

Conclusion

Nevertheless, there is very substantial theoretical evidence that the bootstrap works quite well in a particularly wide range of statistical problems, and theoretical and empirical evidence that it performs very well indeed in some settings. It is currently the only viable method for solving some problems where asymptotic approximations are either not available or are poor; certain extreme-value problems are of this type. For these reasons the bootstrap is a vital component of contemporary statistical methodology.
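As a closing illustration of the percentile-$t$ construction that is central to these lectures, here is a minimal sketch for a population mean in Python. This is our own illustrative code, with hypothetical names and defaults, not code from the lectures: the Studentised ratio $T^* = n^{1/2}(\bar X^* - \bar X)/\hat\sigma^*$ is simulated, and its quantiles $\hat u_{(1-\alpha)/2}$ and $\hat u_{(1+\alpha)/2}$ are used to form the equal-tailed two-sided interval.

```python
import numpy as np

def percentile_t_interval(x, alpha=0.95, n_boot=2000, seed=0):
    """Two-sided equal-tailed percentile-t interval for the mean.

    Simulates T* = sqrt(n) (mean* - mean) / shat* and uses its
    quantiles u_lo, u_hi in
    [mean - shat u_hi / sqrt(n), mean - shat u_lo / sqrt(n)].
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    mean, shat = x.mean(), x.std(ddof=1)
    t_star = np.empty(n_boot)
    for i in range(n_boot):
        xs = rng.choice(x, size=n, replace=True)
        # Studentise each resample with its own standard deviation,
        # so skewness is captured in the bootstrap distribution.
        t_star[i] = np.sqrt(n) * (xs.mean() - mean) / xs.std(ddof=1)
    u_lo, u_hi = np.quantile(t_star, [(1 - alpha) / 2, (1 + alpha) / 2])
    return (mean - shat * u_hi / np.sqrt(n),
            mean - shat * u_lo / np.sqrt(n))

# Example on a skewed sample, where the Studentised bootstrap's
# skewness correction matters most.
rng = np.random.default_rng(42)
x = rng.exponential(size=50)
lo, hi = percentile_t_interval(x)
```

On skewed data such as this, the interval is typically asymmetric about the sample mean, reflecting the skewness correction carried by the $n^{-1/2}$ term of the Edgeworth expansion.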
