# CALC ANYL GEOM III MATH 126

UW

This 34 page Class Notes was uploaded by Addison Beer on Wednesday September 9, 2015. The Class Notes belongs to MATH 126 at University of Washington taught by Staff in Fall. Since its upload, it has received 35 views. For similar materials see /class/192096/math-126-university-of-washington in Mathematics (M) at University of Washington.

Date Created: 09/09/15

## Empirical Processes in Statistics: Methods, Examples, Further Problems

Lectures at Champéry, 3ème cycle, March 3-6, 2002, by Jon A. Wellner

### Outline

- L1. Examples and Empirical Process Basics
  - 1.1 Basic Notation and Some History
  - 1.2 Some Examples
  - 1.3 Glivenko-Cantelli and Donsker Theorems
  - 1.4 Preservation theorems: Glivenko-Cantelli and Donsker
  - 1.5 Bounds on Covering Numbers and Bracketing Numbers
  - 1.6 Convex Hulls and VC-hull classes
  - 1.7 Some useful inequalities
- L2. Empirical Process Methods for Statistics
  - 2.1 The argmax (or argmin) continuous mapping theorem: M-estimators
  - 2.2 M-estimates: rates of convergence
  - 2.3 M-estimates: convergence in distribution
  - 2.4 Z-estimators
  - 2.5 Back to the examples
- L3. Extensions and Further Problems

### 1 Examples and Empirical Process Basics

#### 1.1 Basic Notation and History

Empirical process theory began in the 1930's and 1940's with the study of the empirical distribution function $\mathbb{F}_n$ and the corresponding empirical process. If $X_1, \ldots, X_n$ are i.i.d. real-valued random variables with distribution function $F$ (and corresponding probability measure $P$ on $\mathbb{R}$), then the empirical distribution function is

$$\mathbb{F}_n(x) = \frac{1}{n} \sum_{i=1}^n 1_{(-\infty, x]}(X_i), \qquad x \in \mathbb{R},$$

and the corresponding empirical process is

$$Z_n(x) = \sqrt{n}\,\bigl(\mathbb{F}_n(x) - F(x)\bigr).$$

Two of the basic results concerning $\mathbb{F}_n$ and $Z_n$ are the Glivenko-Cantelli theorem and the Donsker theorem.

**Theorem 1** (Glivenko-Cantelli, 1933).

$$\|\mathbb{F}_n - F\|_\infty = \sup_{-\infty < x < \infty} |\mathbb{F}_n(x) - F(x)| \to_{a.s.} 0 .$$

**Theorem 2** (Donsker, 1952).

$$Z_n \Rightarrow Z \equiv U(F) \quad \text{in } D(\mathbb{R}, \|\cdot\|_\infty),$$

where $U$ is a standard Brownian bridge process on $[0,1]$. Thus $U$ is a zero-mean Gaussian process with covariance function

$$E[U(s)U(t)] = s \wedge t - st, \qquad s, t \in [0,1].$$

This means that we have $E g(Z_n) \to E g(Z)$ for any bounded, continuous function $g : D(\mathbb{R}, \|\cdot\|_\infty) \to \mathbb{R}$, and $g(Z_n) \to_d g(Z)$ for any continuous function $g : D(\mathbb{R}, \|\cdot\|_\infty) \to \mathbb{R}$.

**Remark.** In the statement of Donsker's theorem I have ignored measurability difficulties related to the fact that $D(\mathbb{R}, \|\cdot\|_\infty)$ is a nonseparable Banach space. I will continue to ignore these difficulties throughout these lecture notes. For a complete treatment of the necessary weak convergence theory, see van der Vaart and Wellner (1996), Part 1, Stochastic
Convergence. The stars (as superscripts on $P$'s and functions) refer to outer measures in the first case and minimal measurable envelopes in the second case. I recommend ignoring the $*$'s on a first reading.

The need for generalizations of Theorems 1 and 2 became apparent in the 1950's and 1960's. In particular, it became apparent that when the observations are in a more general sample space $\mathcal{X}$ (such as $\mathbb{R}^d$, or a Riemannian manifold, or some space of functions, or ...), then the empirical distribution function is not as natural. It becomes much more natural to consider the empirical measure $\mathbb{P}_n$ indexed by some class of subsets $\mathcal{C}$ of the sample space $\mathcal{X}$, or, more generally yet, $\mathbb{P}_n$ indexed by some class of real-valued functions $\mathcal{F}$ defined on $\mathcal{X}$.

Suppose now that $X_1, \ldots, X_n$ are i.i.d. $P$ on $\mathcal{X}$. Then the empirical measure $\mathbb{P}_n$ is defined by

$$\mathbb{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i};$$

thus for any Borel set $A \subset \mathcal{X}$,

$$\mathbb{P}_n(A) = \frac{1}{n} \sum_{i=1}^n 1_A(X_i) = \frac{\#\{i \le n : X_i \in A\}}{n}.$$

For a real-valued function $f$ on $\mathcal{X}$ we write

$$\mathbb{P}_n(f) = \int f \, d\mathbb{P}_n = \frac{1}{n} \sum_{i=1}^n f(X_i).$$

If $\mathcal{C}$ is a collection of subsets of $\mathcal{X}$, then $\{\mathbb{P}_n(C) : C \in \mathcal{C}\}$ is the empirical measure indexed by $\mathcal{C}$. If $\mathcal{F}$ is a collection of real-valued functions defined on $\mathcal{X}$, then $\{\mathbb{P}_n(f) : f \in \mathcal{F}\}$ is the empirical measure indexed by $\mathcal{F}$. The empirical process $\mathbb{G}_n$ is defined by

$$\mathbb{G}_n = \sqrt{n}\,(\mathbb{P}_n - P);$$

thus $\{\mathbb{G}_n(C) : C \in \mathcal{C}\}$ is the empirical process indexed by $\mathcal{C}$, while $\{\mathbb{G}_n(f) : f \in \mathcal{F}\}$ is the empirical process indexed by $\mathcal{F}$. (Of course the case of sets is a special case of indexing by functions, by taking $\mathcal{F} = \{1_C : C \in \mathcal{C}\}$.)

Note that the classical empirical distribution function for real-valued random variables can be viewed as the special case of the general theory for which $\mathcal{X} = \mathbb{R}$, $\mathcal{C} = \{(-\infty, x] : x \in \mathbb{R}\}$, or $\mathcal{F} = \{1_{(-\infty, x]} : x \in \mathbb{R}\}$.

Two central questions for the general theory are:

(i) For what classes of sets $\mathcal{C}$ or functions $\mathcal{F}$ does a natural generalization of the Glivenko-Cantelli Theorem 1 hold?

(ii) For what classes of sets $\mathcal{C}$ or functions $\mathcal{F}$ does a natural generalization of the Donsker Theorem 2 hold?

If $\mathcal{F}$ is a class of functions for which

$$\|\mathbb{P}_n - P\|_{\mathcal{F}}^* = \Bigl(\sup_{f \in \mathcal{F}} |\mathbb{P}_n(f) - P(f)|\Bigr)^* \to_{a.s.} 0,$$

then we say that $\mathcal{F}$ is a $P$-Glivenko-Cantelli class of functions. If $\mathcal{F}$ is a class of functions for
which $\mathbb{G}_n = \sqrt{n}\,(\mathbb{P}_n - P) \Rightarrow \mathbb{G}$ in $\ell^\infty(\mathcal{F})$, where $\mathbb{G}$ is a mean-zero $P$-Brownian bridge process with (uniformly) continuous sample paths with respect to the semi-metric $\rho_P(f,g)$ defined by

$$\rho_P^2(f,g) = \mathrm{Var}_P\bigl(f(X) - g(X)\bigr),$$

then we say that $\mathcal{F}$ is a $P$-Donsker class of functions. Here

$$\ell^\infty(\mathcal{F}) = \Bigl\{ x : \mathcal{F} \to \mathbb{R} \;\Big|\; \|x\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |x(f)| < \infty \Bigr\},$$

and $\mathbb{G}$ is a $P$-Brownian bridge process on $\mathcal{F}$ if it is a mean-zero Gaussian process with covariance function

$$E\{\mathbb{G}(f)\,\mathbb{G}(g)\} = P(fg) - P(f)P(g).$$

Answers to these questions began to emerge during the 1970's, especially in the work of Vapnik and Chervonenkis (1971) and Dudley (1978), with notable contributions by many others in the late 1970's and early 1980's, including Pollard, Giné and Zinn, and Gaenssler. We will give statements of some of our favorite generalizations of Theorems 1 and 2 later in this lecture. Our main focus in these lectures will be on applications of these results to problems in statistics. Thus our first goal is to briefly present several examples in which the usefulness of the generality of the modern set-up becomes apparent.

#### 1.2 Some Examples

**Example 1** ($L_p$ deviations about the sample mean). Let $X, X_1, X_2, \ldots, X_n$ be i.i.d. $P$ on $\mathbb{R}$ and let $\mathbb{P}_n$ denote the empirical measure of the $X_i$'s. Let $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$ and, for $p \ge 1$, consider the $L_p$ deviations about $\bar{X}_n$:

$$A_n(p) = \frac{1}{n} \sum_{i=1}^n |X_i - \bar{X}_n|^p = \mathbb{P}_n |X - \bar{X}_n|^p .$$

Questions:

(i) Does $A_n(p) \to_p E|X - E(X)|^p \equiv a(p)$?

(ii) Does $\sqrt{n}\,(A_n(p) - a(p)) \to_d N(0, V^2(p))$? And what is $V^2(p)$?

As will become clear, to answer question (i) we will proceed by showing that the class of functions $\mathcal{G}_\delta \equiv \{x \mapsto |x - t|^p : |t - \mu| \le \delta\}$ is a $P$-Glivenko-Cantelli class, and to answer question (ii) we will show that $\mathcal{G}_\delta$ is a $P$-Donsker class.

**Example 1p** ($L_p$ deviations about the sample mean considered as a process in $p$). Suppose we want to study $A_n(p)$ as a stochastic process indexed by $p \in [a,b]$ for some $0 < a \le 1 \le b < \infty$. Can we prove that

$$\sup_{a \le p \le b} |A_n(p) - a(p)| \to_{a.s.} 0 ?$$

Can we prove that

$$\sqrt{n}\,(A_n - a) \Rightarrow \mathbb{A} \quad \text{in } D[a,b]$$

as a process in $p \in [a,b]$? This will require study of the empirical measure $\mathbb{P}_n$ and empirical process $\mathbb{G}_n$ indexed by the class of functions $\mathcal{F}_\delta = \{f_{t,p} : |t - \mu| \le \delta,\ a \le p \le b\}$, where
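$f_{t,p}(x) = |x - t|^p$ for $x \in \mathbb{R}$, $t \in \mathbb{R}$, $p > 0$.

**Example 1d** ($p$-th power of $L_q$ deviations about the sample mean). Let $X, X_1, X_2, \ldots, X_n$ be i.i.d. $P$ on $\mathbb{R}^d$ and let $\mathbb{P}_n$ denote the empirical measure of the $X_i$'s. Let $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$ and, for $p, q \ge 1$, consider the deviations about $\bar{X}_n$ measured in the $L_q$ metric on $\mathbb{R}^d$:

$$A_n(p,q) = \frac{1}{n} \sum_{i=1}^n \|X_i - \bar{X}_n\|_q^p = \mathbb{P}_n \|X - \bar{X}_n\|_q^p, \qquad \text{where } \|x\|_q = \Bigl(\sum_{j=1}^d |x_j|^q\Bigr)^{1/q}.$$

Questions:

(i) Does $A_n(p,q) \to_p E\|X - E(X)\|_q^p \equiv a(p,q)$?

(ii) Does $\sqrt{n}\,(A_n(p,q) - a(p,q)) \to_d N(0, V^2(p,q))$? And what is $V^2(p,q)$?

**Example 2** (Least $L_p$ estimates of location). Now suppose that we want to consider the measure of location corresponding to minimum $L_p$ deviation:

$$\hat{\mu}_n(p) \equiv \arg\min_t \mathbb{P}_n |X - t|^p$$

for $1 \le p < \infty$. Of course $\hat{\mu}_n(2) = \bar{X}_n$, while $\hat{\mu}_n(1)$ is any median of $X_1, \ldots, X_n$. The asymptotic behavior of $\hat{\mu}_n(p)$ is well known for $p = 1$ or $p = 2$, but for $p \ne 1, 2$ it is perhaps not so well known. Consistency and asymptotic normality for any fixed $p$ can be treated as a special case of the argmax (or argmin) continuous mapping theorem, which we will introduce as an important tool in chapter/lecture 2. The analysis in this case will again depend on various (Glivenko-Cantelli, Donsker) properties of the class of functions $\mathcal{F} = \{f_t(x) : t \in \mathbb{R}\}$ with $f_t(x) = |x - t|^p$.

**Example 2p** (Least $L_p$ estimates of location as a process in $p$). What can be said about the estimators $\hat{\mu}_n(p)$ considered as a process in $p$, say for $1 \le p \le b$ for some finite $b$? (Probably $b = 2$ would usually give the range of interest.)

**Example 2d** (Least $p$-th power of $L_q$ deviation estimates of location in $\mathbb{R}^d$). Now suppose that $X_1, \ldots, X_n$ are i.i.d. $P$ in $\mathbb{R}^d$. Suppose that we want to consider the measure of location corresponding to minimum $L_q$ deviation raised to the $p$-th power:

$$\hat{\mu}_n(p,q) \equiv \arg\min_t \mathbb{P}_n \|X - t\|_q^p$$

for $1 \le p, q < \infty$.

**Example 3** (Projection pursuit). Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. $P$ on $\mathbb{R}^d$. For $t \in \mathbb{R}$ and $\gamma \in S^{d-1}$ let

$$\mathbb{F}_n(t, \gamma) = \mathbb{P}_n 1_{(-\infty, t]}(\gamma \cdot X) = \mathbb{P}_n(\gamma \cdot X \le t),$$

the empirical distribution of $\gamma \cdot X_1, \ldots, \gamma \cdot X_n$. Let

$$F(t, \gamma) = P 1_{(-\infty, t]}(\gamma \cdot X) = P(\gamma \cdot X \le t).$$

Question: Under what condition on $d = d_n \to \infty$ as $n \to \infty$ do we have

$$D_n \equiv \sup_{t \in \mathbb{R}} \sup_{\gamma \in S^{d-1}} |\mathbb{F}_n(t,\gamma) - F(t,\gamma)| \to_p 0 ? \tag{1.1}$$

According to Diaconis and Freedman (1984), pages 794 and 812, this shows that under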
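the condition for which (1.1) holds, the "least normal" projection is close to normal. Theorem 1.1 of Diaconis and Freedman (1984) shows that for non-random vectors $x_1, \ldots, x_n$ in $\mathbb{R}^d$ and $\Gamma \sim \mathrm{Uniform}(S^{d-1})$, the empirical distribution of $\Gamma \cdot x_1, \ldots, \Gamma \cdot x_n$ converges weakly to $N(0, \sigma^2)$ if

$$\frac{1}{n} \#\bigl\{ i \le n : \bigl|\,\|x_i\|^2 - \sigma^2 d\,\bigr| > \epsilon d \bigr\} \to 0 \qquad \text{and} \qquad \frac{1}{n^2} \#\bigl\{ i, j \le n : |x_i \cdot x_j| > \epsilon d \bigr\} \to 0$$

for every $\epsilon > 0$.

**Example 4** (Kernel density estimators as a process indexed by bandwidth). Let $X_1, X_2, \ldots, X_n$ be i.i.d. $P$ on $\mathbb{R}^d$. Suppose $P$ has density $p$ with respect to Lebesgue measure $\lambda$ on $\mathbb{R}^d$, and $\|p\|_\infty < \infty$. Let $k$ be a non-negative kernel which integrates to one:

$$\int_{\mathbb{R}^d} k(y)\, d\lambda(y) = \int k(y)\, dy = 1 .$$

Then a kernel density estimator of $p$ is given by

$$\hat{p}_n(y, h) = h^{-d}\, \mathbb{P}_n k\Bigl(\frac{y - X}{h}\Bigr) = \frac{1}{n h^d} \sum_{i=1}^n k\Bigl(\frac{y - X_i}{h}\Bigr).$$

This estimator is naturally indexed by the bandwidth $h$, and it is natural to consider $\hat{p}_n$ as a process indexed by both $x \in \mathbb{R}^d$ and $h > 0$.

Questions:

(i) Does $\hat{p}_n(x, h_n)$ converge to $p(x)$ pointwise or in $L_r$ for some choice of $h_n \to 0$?

(ii) How should we choose $h_n \to 0$? Can we let $h_n$ depend on $x$ and (or) $X_1, \ldots, X_n$?

Here the class of functions $\mathcal{F}$ involved is

$$\mathcal{F} = \Bigl\{ x \mapsto k\Bigl(\frac{y - x}{h}\Bigr) : y \in \mathbb{R}^d,\ h > 0 \Bigr\}. \tag{1.2}$$

**Example 5** (Interval censoring in $\mathbb{R}$ and $\mathbb{R}^2$). Suppose that $X_1, \ldots, X_n$ are i.i.d. with distribution function $F$ on $\mathbb{R}^+ = [0, \infty)$, and $Y_1, \ldots, Y_n$ are i.i.d. with distribution function $G$ and independent of the $X_i$'s. Unfortunately we are only able to observe $(1\{X_i \le Y_i\}, Y_i) \equiv (\Delta_i, Y_i)$, $i = 1, \ldots, n$, but our goal is to estimate the distribution function $F$. In this model the conditional distribution of $\Delta$ given $Y$ is Bernoulli$(F(Y))$, and hence the density of $(\Delta, Y)$ with respect to the dominating measure given by the product of counting measure on $\{0,1\}$ and $G$ is

$$p_F(\delta, y) = F(y)^{\delta} (1 - F(y))^{1 - \delta}, \qquad \delta \in \{0,1\},\ y \in \mathbb{R}^+ .$$

It turns out that the maximum likelihood estimator

$$\hat{F}_n = \arg\max_F \sum_{i=1}^n \bigl\{ \Delta_i \log F(Y_i) + (1 - \Delta_i) \log(1 - F(Y_i)) \bigr\}$$

is well defined and is given by the left derivative of the greatest convex minorant of the cumulative sum diagram

$$\bigl\{ \bigl( \mathbb{P}_n 1_{[Y \le Y_{(j)}]},\ \mathbb{P}_n \Delta 1_{[Y \le Y_{(j)}]} \bigr) : j = 1, \ldots, n \bigr\},$$

where $Y_{(1)} \le \cdots \le Y_{(n)}$ are the order statistics of the $Y_i$'s.

Questions:

(i) Can we show that $\hat{F}_n$ is a consistent estimator of $F$?

(ii) What are the global and local rates of convergence of $\hat{F}_n$ to $F$?

**Example 6** (Machine learning; Koltchinskii and Smale). See Koltchinskii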
and Panchenko (2002).

**Example 7** (Profile likelihood and semiparametric models; two-phase sampling).

#### 1.3 Glivenko-Cantelli and Donsker Theorems

Our statements of Glivenko-Cantelli theorems will be phrased in terms of bracketing numbers and covering numbers for a class $\mathcal{F}$ of functions $f$ from $\mathcal{X}$ to $\mathbb{R}$. The covering number $N(\epsilon, \mathcal{F}, \|\cdot\|)$ is the minimal number of balls $\{g : \|g - f\| < \epsilon\}$ of radius $\epsilon$ needed to cover $\mathcal{F}$. The centers of the balls need not belong to $\mathcal{F}$, but they should have finite norms.

Given two functions $l$ and $u$, the bracket $[l, u]$ is the set of all functions $f$ satisfying $l \le f \le u$. An $\epsilon$-bracket is a bracket $[l, u]$ with $\|u - l\| < \epsilon$. The bracketing number $N_{[\,]}(\epsilon, \mathcal{F}, \|\cdot\|)$ is the minimum number of $\epsilon$-brackets needed to cover $\mathcal{F}$. The entropy with bracketing is the logarithm of the bracketing number. Again, the upper and lower bounds $u$ and $l$ of the brackets need not belong to $\mathcal{F}$ themselves but are assumed to have finite norms.

A related notion is that of packing numbers. Call a collection of points $\epsilon$-separated if the distance between each pair of points is strictly larger than $\epsilon$. The packing number $D(\epsilon, d)$ is the maximum number of $\epsilon$-separated points. It is easily shown that

$$N(\epsilon, d) \le D(\epsilon, d) \le N(\epsilon/2, d).$$

Here is another bit of notation we will use frequently: if $\mathcal{F}$ is a class of functions from $\mathcal{X}$ to $\mathbb{R}$, then the envelope function $F$ of the class is

$$F(x) = \sup_{f \in \mathcal{F}} |f(x)| = \|f(x)\|_{\mathcal{F}} .$$

With this preparation we are ready to state several useful Glivenko-Cantelli and Donsker theorems.

**Theorem 1.3.1** (Blum-DeHardt). If $N_{[\,]}(\epsilon, \mathcal{F}, L_1(P)) < \infty$ for every $\epsilon > 0$, then $\mathcal{F}$ is $P$-Glivenko-Cantelli.

**Theorem 1.3.2** (Vapnik-Chervonenkis; Pollard). Let $\mathcal{F}$ be a suitably measurable class of real-valued functions on $\mathcal{X}$ satisfying $\sup_Q N(\epsilon \|F\|_{Q,1}, \mathcal{F}, L_1(Q)) < \infty$ for every $\epsilon > 0$. If $P^* F < \infty$, then $\mathcal{F}$ is $P$-Glivenko-Cantelli.

An important weakening of the main condition in Theorem 1.3.2 is given in the following version of the Glivenko-Cantelli theorem. For a class of functions $\mathcal{F}$ with envelope function $F$ and a positive number $M$, let the truncated class be $\mathcal{F}_M = \{ f 1_{[F \le M]} : f \in \mathcal{F} \}$.

**Theorem 1.3.3** (Vapnik-Chervonenkis; Giné and Zinn). Suppose that $\mathcal{F}$
is $L_1(P)$-bounded and nearly linearly supremum measurable for $P$; in particular this holds if $\mathcal{F}$ is image admissible Suslin. Then the following are equivalent:

(a) $\mathcal{F}$ is a $P$-Glivenko-Cantelli class.

(b) $\mathcal{F}$ has an envelope function $F \in L_1(P)$ and the truncated classes $\mathcal{F}_M$ satisfy

$$\frac{1}{n} E^* \log N(\epsilon, \mathcal{F}_M, L_r(\mathbb{P}_n)) \to 0 \qquad \text{for all } \epsilon > 0 \text{ and for all } M \in (0, \infty),$$

for some (all) $r \in (0, \infty]$, where $\|f\|_{L_r(P)} = \|f\|_{P,r} = \{P(|f|^r)\}^{r^{-1} \wedge 1}$.

Now we turn to Donsker theorems. The first order of business is the following theorem characterizing the Donsker property.

**Theorem 1.3.3.** Let $\mathcal{F}$ be a class of measurable functions. Then the following are equivalent:

(i) $\mathcal{F}$ is $P$-Donsker.

(ii) $(\mathcal{F}, \rho_P)$ is totally bounded and $\mathbb{G}_n$ is asymptotically equicontinuous in probability with respect to $\rho_P$: for every $\epsilon > 0$,

$$\lim_{\delta \downarrow 0} \limsup_{n \to \infty} P^*\Bigl( \sup_{\rho_P(f,g) < \delta} |\mathbb{G}_n(f) - \mathbb{G}_n(g)| > \epsilon \Bigr) = 0 .$$

(iii) $(\mathcal{F}, \rho_P)$ is totally bounded and $\mathbb{G}_n$ is asymptotically equicontinuous in mean with respect to $\rho_P$:

$$\lim_{n \to \infty} E^* \sup_{\rho_P(f,g) < \delta_n} |\mathbb{G}_n(f) - \mathbb{G}_n(g)| = 0$$

for every sequence $\delta_n \to 0$.

*Proof.* See van der Vaart and Wellner (1996), pages 113-115. $\Box$

Typically the Donsker property is verified by showing that either (ii) or (iii) holds. But it is important for many applications to remember that the Donsker property always implies that (ii) and (iii) hold. Note that (iii) implies (ii) via Markov's inequality, but the fact that (ii) implies (iii) involves the use of symmetrization and the Hoffmann-Jørgensen inequality, together with the fact that (i) implies that the class $\mathcal{F}$ has a (centered) envelope function satisfying the weak $L_2$ condition: any $P$-Donsker class $\mathcal{F}$ satisfies

$$P^*\bigl( \|f - Pf\|_{\mathcal{F}} > x \bigr) = o(x^{-2}) \qquad \text{as } x \to \infty .$$

**Theorem 1.3.4** (Ossiander). Suppose that $\mathcal{F}$ is a class of measurable functions satisfying

$$\int_0^1 \sqrt{\log N_{[\,]}(\epsilon, \mathcal{F}, L_2(P))}\; d\epsilon < \infty .$$

Then $\mathcal{F}$ is $P$-Donsker.

**Theorem 1.3.5** (Pollard). Suppose that $\mathcal{F}$ is a suitably measurable class of real-valued functions on $\mathcal{X}$ satisfying

$$\int_0^1 \sqrt{\log \sup_Q N(\epsilon \|F\|_{Q,2}, \mathcal{F}, L_2(Q))}\; d\epsilon < \infty .$$

If $P^* F^2 < \infty$, then $\mathcal{F}$ is $P$-Donsker.

The following theorem is a very useful consequence of Theorem 1.3.5.

**Theorem 1.3.6** (Jain-Marcus). Let $(T, d)$ be a compact metric space, and let $C(T)$ be the space of continuous real functions on $T$ with supremum norm.
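Let $X_1, \ldots, X_n$ be i.i.d. random variables in $C(T)$. Suppose that $E X_1(t) = 0$ and $E X_1^2(t) < \infty$ for all $t \in T$. Furthermore, suppose that for a random variable $M$ with $E M^2 < \infty$,

$$|X_1(t) - X_1(s)| \le M\, d(t, s) \quad \text{a.s. for all } s, t \in T .$$

Suppose that

$$\int_0^1 \sqrt{\log N(\epsilon, T, d)}\; d\epsilon < \infty .$$

Then the CLT holds in $C(T)$.

#### 1.4 Preservation theorems: Glivenko-Cantelli and Donsker

As we will see in treating the examples, it is very useful to have results which show how the Glivenko-Cantelli property or the Donsker property of a class of functions is preserved. Here we give statements of several useful preservation theorems, beginning with a Glivenko-Cantelli preservation theorem proved by van der Vaart and Wellner (2000). Given classes $\mathcal{F}_1, \ldots, \mathcal{F}_k$ of functions $f_i : \mathcal{X} \to \mathbb{R}$ and a function $\varphi : \mathbb{R}^k \to \mathbb{R}$, let $\varphi(\mathcal{F}_1, \ldots, \mathcal{F}_k)$ be the class of functions $x \mapsto \varphi(f_1(x), \ldots, f_k(x))$, where $f = (f_1, \ldots, f_k)$ ranges over $\mathcal{F}_1 \times \cdots \times \mathcal{F}_k$.

**Theorem 1.6.1** (van der Vaart and Wellner). Suppose that $\mathcal{F}_1, \ldots, \mathcal{F}_k$ are $P$-Glivenko-Cantelli classes of functions, and that $\varphi : \mathbb{R}^k \to \mathbb{R}$ is continuous. Then $\mathcal{H} \equiv \varphi(\mathcal{F}_1, \ldots, \mathcal{F}_k)$ is $P$-Glivenko-Cantelli provided that it has an integrable envelope function.

*Proof.* See van der Vaart and Wellner (2000), pages 117-120. $\Box$

Now we state a corresponding preservation theorem for Donsker classes.

**Theorem 1.6.2** (van der Vaart and Wellner). Suppose that $\mathcal{F}_1, \ldots, \mathcal{F}_k$ are Donsker classes with $\|P\|_{\mathcal{F}_i} < \infty$ for each $i$. Suppose that $\varphi : \mathbb{R}^k \to \mathbb{R}$ satisfies

$$|\varphi \circ f(x) - \varphi \circ g(x)|^2 \le \sum_{l=1}^k \bigl(f_l(x) - g_l(x)\bigr)^2$$

for every $f, g \in \mathcal{F}_1 \times \cdots \times \mathcal{F}_k$ and $x$. Then the class $\varphi \circ (\mathcal{F}_1, \ldots, \mathcal{F}_k)$ is Donsker provided that $\varphi \circ (f_1, \ldots, f_k)$ is square integrable for at least one $(f_1, \ldots, f_k)$.

*Proof.* See van der Vaart and Wellner (1996), pages 192-198. $\Box$

#### 1.5 Bounds on Covering Numbers and Bracketing Numbers

For a collection of subsets $\mathcal{C}$ of a set $\mathcal{X}$ and points $x_1, \ldots, x_n \in \mathcal{X}$, let

$$\Delta_n^{\mathcal{C}}(x_1, \ldots, x_n) \equiv \#\bigl\{ C \cap \{x_1, \ldots, x_n\} : C \in \mathcal{C} \bigr\},$$

so that $\Delta_n^{\mathcal{C}}(x_1, \ldots, x_n)$ is the number of subsets of $\{x_1, \ldots, x_n\}$ picked out by the collection $\mathcal{C}$. Also we define

$$m^{\mathcal{C}}(n) \equiv \max_{x_1, \ldots, x_n} \Delta_n^{\mathcal{C}}(x_1, \ldots, x_n).$$

Let

$$V(\mathcal{C}) \equiv \inf\{ n : m^{\mathcal{C}}(n) < 2^n \},$$

where the infimum over the empty set is taken to be infinity. Thus $V(\mathcal{C}) = \infty$ if and only if $\mathcal{C}$ shatters sets of arbitrarily large size. A collection $\mathcal{C}$ is called a VC class if $V(\mathcal{C}) <$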
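$\infty$.

**Lemma 1.5.1** (VC; Sauer; Shelah). For a VC class of sets with VC index $V(\mathcal{C})$, set $S \equiv S(\mathcal{C}) \equiv V(\mathcal{C}) - 1$. Then for $n \ge S$,

$$m^{\mathcal{C}}(n) \le \sum_{j=0}^{S} \binom{n}{j} \le \Bigl(\frac{ne}{S}\Bigr)^{S}. \tag{1.3}$$

*Proof.* For the first inequality, see van der Vaart and Wellner (1996), pages 135-136. To see the second inequality, note that with $Y \sim \mathrm{Binomial}(n, 1/2)$,

$$\begin{aligned}
\sum_{j=0}^{S} \binom{n}{j} &= 2^n \sum_{j=0}^{S} \binom{n}{j} (1/2)^n = 2^n P(Y \le S) \\
&\le 2^n E\, r^{Y - S} \quad \text{for any } r \le 1 \\
&= 2^n r^{-S} \Bigl(\frac{1}{2} + \frac{r}{2}\Bigr)^n = r^{-S} (1 + r)^n \\
&= \Bigl(\frac{n}{S}\Bigr)^S \Bigl(1 + \frac{S}{n}\Bigr)^n \quad \text{by choosing } r = S/n \\
&\le \Bigl(\frac{n}{S}\Bigr)^S e^S,
\end{aligned}$$

and hence (1.3) holds. $\Box$

**Theorem 1.5.2.** There is a universal constant $K$ such that for any probability measure $Q$, any VC class of sets $\mathcal{C}$, $r \ge 1$, and $0 < \epsilon \le 1$,

$$N(\epsilon, \mathcal{C}, L_r(Q)) \le \Bigl( \frac{K \log(3e/\epsilon^r)}{\epsilon^r} \Bigr)^{V(\mathcal{C}) - 1}; \tag{1.4}$$

here $K = 3e^2/(e - 1) \approx 12.9008$ works. Moreover,

$$N(\epsilon, \mathcal{C}, L_r(Q)) \le K\, V(\mathcal{C})\, (4e)^{V(\mathcal{C})} \Bigl(\frac{1}{\epsilon}\Bigr)^{r (V(\mathcal{C}) - 1)}, \tag{1.5}$$

where $K$ is universal.

The inequality (1.4) is due to Dudley (1978); the inequality (1.5) is due to Haussler (1995). Here we will (re-)prove (1.4), but not (1.5). For the proof of (1.5), see Haussler (1995) or van der Vaart and Wellner (1996), pages 136-140.

*Proof.* Fix $0 < \epsilon \le 1$. Let $m = D(\epsilon, \mathcal{C}, L_1(Q))$, the $L_1(Q)$ packing number for the collection $\mathcal{C}$. Thus there exist sets $C_1, \ldots, C_m \in \mathcal{C}$ which satisfy

$$Q(C_i \triangle C_j) = E_Q |1_{C_i} - 1_{C_j}| > \epsilon \qquad \text{for } i \ne j .$$

Let $X_1, \ldots, X_n$ be i.i.d. $Q$. Now $C_i$ and $C_j$ pick out the same subset of $\{X_1, \ldots, X_n\}$ if and only if no $X_k \in C_i \triangle C_j$. If every $C_i \triangle C_j$ contains some $X_k$, then all the $C_i$'s pick out different subsets, and $\mathcal{C}$ picks out at least $m$ subsets from $\{X_1, \ldots, X_n\}$. Thus we compute

$$\begin{aligned}
Q\bigl( [X_k \in C_i \triangle C_j \text{ for some } k \le n, \text{ for all } i \ne j]^c \bigr)
&= Q\bigl( [X_k \notin C_i \triangle C_j \text{ for all } k \le n, \text{ for some } i \ne j] \bigr) \\
&\le \sum_{i < j} Q\bigl( [X_k \notin C_i \triangle C_j \text{ for all } k \le n] \bigr) \\
&\le \binom{m}{2} \max_{i < j}\, [1 - Q(C_i \triangle C_j)]^n \\
&\le \binom{m}{2} (1 - \epsilon)^n < 1 \quad \text{for } n \text{ large enough.}
\end{aligned} \tag{1.6}$$

In particular this holds if

$$n > \frac{\log \binom{m}{2}}{-\log(1 - \epsilon)} = \frac{\log(m(m-1)/2)}{-\log(1 - \epsilon)} .$$

Since $-\log(1 - \epsilon) > \epsilon$, (1.6) holds if $n = \lfloor 3 \log m / \epsilon \rfloor$: for this $n$,

$$Q\bigl( [X_k \in C_i \triangle C_j \text{ for some } k \le n, \text{ for all } i \ne j] \bigr) > 0 .$$

Hence there exist points $X_1(\omega), \ldots, X_n(\omega)$ such that

$$m \le \Delta_n^{\mathcal{C}}(X_1(\omega), \ldots, X_n(\omega)) \le \max_{x_1, \ldots, x_n} \Delta_n^{\mathcal{C}}(x_1, \ldots, x_n) \le \Bigl(\frac{en}{S}\Bigr)^S, \tag{1.7}$$

where $S \equiv S(\mathcal{C}) \equiv V(\mathcal{C}) - 1$, by the VC-Sauer-Shelah lemma. With $n = \lfloor 3 \log m / \epsilon \rfloor$, (1.7) implies that

$$m \le \Bigl( \frac{3e \log m}{S \epsilon} \Bigr)^S .$$

Equivalently,

$$\frac{m^{1/S}}{\log m} \le \frac{3e}{S\epsilon},$$

or, with $g(x) \equiv x / \log x$,

$$g(m^{1/S}) \le \frac{3e}{\epsilon} . \tag{1.8}$$

This implies that

$$m^{1/S} \le \frac{e}{e - 1} \cdot \frac{3e}{\epsilon} \log\Bigl(\frac{3e}{\epsilon}\Bigr), \tag{1.9}$$

or

$$m \le \Bigl\{ \frac{e}{e - 1} \cdot \frac{3e}{\epsilon} \log\Bigl(\frac{3e}{\epsilon}\Bigr) \Bigr\}^S . \tag{1.10}$$

Since $N(\epsilon, \mathcal{C}, L_1(Q)) \le D(\epsilon, \mathcal{C}, L_1(Q))$, (1.4) holds for $r = 1$ with $K = 3e^2/(e - 1)$.

Here is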
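the argument that (1.8) implies (1.9): note that the inequality

$$g(x) = \frac{x}{\log x} \le y \qquad \text{implies} \qquad x \le \frac{e}{e - 1}\, y \log y .$$

To see this, note that $g(x) = x / \log x$ is minimized by $x = e$ and is increasing for $x \ge e$. Furthermore, $y \ge g(x)$ for $x \ge e$ implies that

$$\log y \ge \log x - \log\log x = \log x \Bigl( 1 - \frac{\log\log x}{\log x} \Bigr) \ge \log x \Bigl( 1 - \frac{1}{e} \Bigr),$$

so that

$$x \le y \log x \le y \log y \,(1 - 1/e)^{-1} .$$

For $L_r(Q)$ with $r > 1$, note that

$$\|1_C - 1_D\|_{L_1(Q)} = Q(C \triangle D) = \|1_C - 1_D\|_{L_r(Q)}^r,$$

so that

$$N(\epsilon, \mathcal{C}, L_r(Q)) = N(\epsilon^r, \mathcal{C}, L_1(Q)) \le \Bigl( K \epsilon^{-r} \log\Bigl(\frac{3e}{\epsilon^r}\Bigr) \Bigr)^S .$$

This completes the proof. $\Box$

**Definition 1.5.3.** The subgraph of $f : \mathcal{X} \to \mathbb{R}$ is the subset of $\mathcal{X} \times \mathbb{R}$ given by $\{(x, t) \in \mathcal{X} \times \mathbb{R} : t < f(x)\}$. A collection of functions $\mathcal{F}$ from $\mathcal{X}$ to $\mathbb{R}$ is called a VC-subgraph class if the collection of subgraphs in $\mathcal{X} \times \mathbb{R}$ is a VC class of sets. For a VC-subgraph class, let $V(\mathcal{F}) \equiv V(\mathrm{subgraph}(\mathcal{F}))$.

**Theorem 1.5.4.** For a VC-subgraph class with envelope function $F$ and $r \ge 1$, and for any probability measure $Q$ with $\|F\|_{L_r(Q)} > 0$,

$$N(2\epsilon \|F\|_{Q,r}, \mathcal{F}, L_r(Q)) \le K\, V(\mathcal{F})\, (16e)^{V(\mathcal{F})} \Bigl(\frac{1}{\epsilon}\Bigr)^{r (V(\mathcal{F}) - 1)}$$

for a universal constant $K$ and $0 < \epsilon \le 1$.

*Proof.* Let $\mathcal{C}$ be the set of all subgraphs $C_f$ of functions $f \in \mathcal{F}$. By Fubini's theorem,

$$Q|f - g| = (Q \times \lambda)(C_f \triangle C_g),$$

where $\lambda$ is Lebesgue measure on $\mathbb{R}$. Renormalize $Q \times \lambda$ to be a probability measure on $\{(x, t) : |t| \le F(x)\}$ by defining $P = (Q \times \lambda) / (2 Q(F))$. Then, by the result for sets,

$$N(\epsilon\, 2Q(F), \mathcal{F}, L_1(Q)) = N(\epsilon, \mathcal{C}, L_1(P)) \le K\, V(\mathcal{F}) \Bigl(\frac{4e}{\epsilon}\Bigr)^{V(\mathcal{F}) - 1} .$$

For $r > 1$, note that

$$Q|f - g|^r \le Q\bigl( |f - g| (2F)^{r-1} \bigr) = 2^{r-1} R|f - g|\; Q(F^{r-1})$$

for the probability measure $R$ with density $F^{r-1} / Q(F^{r-1})$ with respect to $Q$. Thus the $L_r(Q)$ distance is bounded by the distance $2\, (Q(F^{r-1}))^{1/r}\, \|f - g\|_{R,1}^{1/r}$. Elementary manipulations yield

$$N(\epsilon\, 2\|F\|_{Q,r}, \mathcal{F}, L_r(Q)) \le N(\epsilon^r R(F), \mathcal{F}, L_1(R)) \le K\, V(\mathcal{F}) \Bigl(\frac{8e}{\epsilon^r}\Bigr)^{V(\mathcal{F}) - 1}$$

by the inequality (1.5). $\Box$

#### 1.6 Convex Hulls and VC-hull classes

**Definition 1.6.1.** The convex hull $\mathrm{conv}(\mathcal{F})$ of a class of functions $\mathcal{F}$ is defined as the set of functions $\sum_{i=1}^m \alpha_i f_i$ with $\sum_{i=1}^m \alpha_i \le 1$, $\alpha_i \ge 0$, and each $f_i \in \mathcal{F}$. The symmetric convex hull, denoted by $\mathrm{sconv}(\mathcal{F})$, of a class of functions $\mathcal{F}$ is defined as the set of functions $\sum_{i=1}^m \alpha_i f_i$ with $\sum_{i=1}^m |\alpha_i| \le 1$ and each $f_i \in \mathcal{F}$. A set of measurable functions $\mathcal{F}$ is a VC-hull class if it is contained in the pointwise sequential closure of the symmetric convex hull of a VC class of functions: $\mathcal{F} \subset \overline{\mathrm{sconv}}(\mathcal{G})$ for a VC class $\mathcal{G}$.

**Theorem 1.6.2** (Dudley; Ball and Pajor). Let $Q$ be a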
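probability measure on $(\mathcal{X}, \mathcal{A})$, and let $\mathcal{F}$ be a class of measurable functions with measurable square integrable envelope $F$ such that $Q F^2 < \infty$ and

$$N(\epsilon \|F\|_{Q,2}, \mathcal{F}, L_2(Q)) \le C \Bigl(\frac{1}{\epsilon}\Bigr)^V, \qquad 0 < \epsilon \le 1 .$$

Then there is a $K$ depending on $C$ and $V$ only such that

$$\log N(\epsilon \|F\|_{Q,2}, \overline{\mathrm{conv}}(\mathcal{F}), L_2(Q)) \le K \Bigl(\frac{1}{\epsilon}\Bigr)^{2V/(V+2)} .$$

Note that $2V/(V+2) < 2$ for $V < \infty$. Dudley (1987) proved that for any $\delta > 0$,

$$\log N(\epsilon \|F\|_{Q,2}, \mathrm{conv}(\mathcal{F}), L_2(Q)) \le K \Bigl(\frac{1}{\epsilon}\Bigr)^{2V/(V+2) + \delta} .$$

*Proof.* See Ball and Pajor (1990) or van der Vaart and Wellner (1996), pages 142-145. See also Carl (1997). $\Box$

**Example 1.6.3** (Monotone functions on $\mathbb{R}$). For $\mathcal{F} = \{1_{[t, \infty)}(x) : t \in \mathbb{R}\}$, $\mathcal{F}$ is VC, so by Theorem 1.5.2, with $F \equiv 1$ and $V = 2$,

$$N(\epsilon, \mathcal{F}, L_2(Q)) \le K \epsilon^{-2}, \qquad 0 < \epsilon \le 1 .$$

Now

$$\mathcal{G} \equiv \{ g : \mathbb{R} \to [0,1] \mid g \text{ nondecreasing} \} \subset \overline{\mathrm{conv}}(\mathcal{F}).$$

Hence by Theorem 1.6.2,

$$\log N(\epsilon, \mathcal{G}, L_2(Q)) \le \frac{K}{\epsilon}, \qquad 0 < \epsilon \le 1 .$$

In this case there is a similar bound on the bracketing numbers:

$$\log N_{[\,]}(\epsilon, \mathcal{G}, L_r(Q)) \le \frac{K}{\epsilon}, \qquad 0 < \epsilon \le 1, \tag{1.11}$$

for every probability measure $Q$ and every $r \ge 1$, where the constant $K$ depends only on $r$; see van der Vaart and Wellner (1996), Theorem 2.7.5, page 159.

**Example 1.6.4** (Distribution functions on $\mathbb{R}^d$). For $\mathcal{F} = \{1_{[t, \infty)}(x) : t \in \mathbb{R}^d\}$, $\mathcal{F}$ is VC with $V(\mathcal{F}) = d + 1$. By Theorem 1.5.2 with $F \equiv 1$,

$$N(\epsilon, \mathcal{F}, L_2(Q)) \le K \epsilon^{-2d}, \qquad 0 < \epsilon \le 1 .$$

Now

$$\mathcal{G} \equiv \{ g : \mathbb{R}^d \to [0,1] \mid g \text{ is a d.f. on } \mathbb{R}^d \} \subset \overline{\mathrm{conv}}(\mathcal{F}).$$

Hence by Theorem 1.6.2 (with $V = 2d$),

$$\log N(\epsilon, \mathcal{G}, L_2(Q)) \le K \epsilon^{-2d/(d+1)}, \qquad 0 < \epsilon \le 1 .$$

In particular, for $d = 2$,

$$\log N(\epsilon, \mathcal{G}, L_2(Q)) \le K \epsilon^{-4/3}, \qquad 0 < \epsilon \le 1 .$$

#### 1.7 Some Useful Inequalities

**Bounds on expectations: general classes $\mathcal{F}$.**

**Exponential bounds for bounded classes $\mathcal{F}$.**

One of the classical types of results for empirical processes is an exponential bound for the supremum distance between the empirical distribution and the true distribution function.

**A. Empirical d.f., $\mathcal{X} = \mathbb{R}$.** Suppose that we consider the classical empirical d.f. of real-valued random variables. Thus $\mathcal{F} = \{1_{(-\infty, t]} : t \in \mathbb{R}\}$. Then Dvoretzky, Kiefer, and Wolfowitz (1956) showed that

$$P\bigl( \sqrt{n}\, \|\mathbb{F}_n - F\|_\infty \ge \lambda \bigr) \le C \exp(-2\lambda^2)$$

for all $n \ge 1$, $\lambda \ge 0$, where $C$ is an absolute constant. Massart (1990) shows that $C = 2$ works, confirming a long-standing conjecture of Z. W. Birnbaum. Method: reduce to the uniform empirical process $\mathbb{U}_n$, and start with the exact distribution of $\|\mathbb{U}_n^+\|_\infty$.

**B. Empirical d.f., $\mathcal{X} = \mathbb{R}^d$.** Now consider the classical empirical d.f. of i.i.d. random vectors. Thus $\mathcal{F} = \{1_{(-\infty, t]} : t \in \mathbb{R}^d\}$. Then Kiefer (1961) showed that for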
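every $\epsilon > 0$ there exists a $C_\epsilon$ such that

$$P_F\bigl( \sqrt{n}\, \|\mathbb{F}_n - F\|_\infty \ge \lambda \bigr) \le C_\epsilon \exp\bigl( -(2 - \epsilon) \lambda^2 \bigr)$$

for all $n \ge 1$ and $\lambda > 0$.

**C. Empirical measure, $\mathcal{X}$ general.** Let $\mathcal{F} = \{1_C : C \in \mathcal{C}\}$ satisfy

$$\sup_Q N(\epsilon, \mathcal{F}, L_1(Q)) \le \Bigl(\frac{K}{\epsilon}\Bigr)^V,$$

e.g. when $\mathcal{C}$ is a VC class and $V = V(\mathcal{C}) - 1$. Then Talagrand (1994) proved that

$$P\bigl( \|\sqrt{n}\,(\mathbb{P}_n - P)\|_{\mathcal{C}} \ge \lambda \bigr) \le \frac{D}{\lambda} \Bigl( \frac{D K \lambda^2}{V} \Bigr)^V \exp(-2\lambda^2)$$

for all $n \ge 1$ and $\lambda > 0$.

**D. Empirical measure, $\mathcal{X}$ general.** Let $\mathcal{F} = \{f : \mathcal{X} \to [0,1]\}$ satisfy

$$\sup_Q N(\epsilon, \mathcal{F}, L_2(Q)) \le \Bigl(\frac{K}{\epsilon}\Bigr)^V,$$

e.g. when $\mathcal{F}$ is a VC class and $V = 2(V(\mathcal{F}) - 1)$. Then Talagrand (1994) showed that

$$P\bigl( \|\sqrt{n}\,(\mathbb{P}_n - P)\|_{\mathcal{F}} \ge \lambda \bigr) \le \Bigl( \frac{D \lambda}{\sqrt{V}} \Bigr)^V \exp(-2\lambda^2)$$

for all $n \ge 1$ and $\lambda > 0$.

**Kiefer's tool to prove B.** If $Y_1, \ldots, Y_n$ are i.i.d. Bernoulli$(p)$ and $p < e^{-1}$, then

$$P\bigl( \sqrt{n}\, |\bar{Y}_n - p| \ge \lambda \bigr) \le 2 \exp\bigl( -[\log(1/p) - 1] \lambda^2 \bigr) \le 2 \exp(-11 \lambda^2) \quad \text{if } p < e^{-12} .$$

**Talagrand's tool to prove C and D.** If $\mathcal{F}$ is as in D (all the $f$'s have range in $[0,1]$), if $\sigma_{\mathcal{F}}^2 \equiv \sup_{f \in \mathcal{F}} (P f^2 - (Pf)^2) = \sup_{f \in \mathcal{F}} \mathrm{Var}_P(f(X)) \le \sigma_0^2$, and if $K_0 \mu_n \le \sqrt{n}$, then

$$\Pr\bigl( \|\sqrt{n}\,(\mathbb{P}_n - P)\|_{\mathcal{F}} \ge \lambda \bigr) \le D \exp(-11 \lambda^2)$$

for every $\lambda \ge K_0 \mu_n$, where $\mu_n \equiv E^* \|\mathbb{G}_n\|_{\mathcal{F}} \vee \sqrt{n}\, \sigma_0^2 \vee n^{-1/2}$.

### 2 Empirical Process Methods for Statistics

#### 2.1 The argmax (or argmin) continuous mapping theorem: M-estimators

Suppose that $\theta$ is a parameter with values in a metric space $(\Theta, d)$. Frequently we define estimators in statistical applications in terms of optimization problems: given observations $X_1, \ldots, X_n$, our estimator $\hat{\theta}_n$ of a parameter $\theta \in \Theta$ is that value of $\theta$ maximizing (or minimizing)

$$\mathbb{M}_n(\theta) = \frac{1}{n} \sum_{i=1}^n m_\theta(X_i) = \mathbb{P}_n m_\theta(X).$$

We say that such an estimator $\hat{\theta}_n$ is an M-estimator. The estimators in Examples 1.2.2 and 1.2.6 were of this type. Of course, maximum likelihood estimators are simply M-estimators with $m_\theta(x) = \log p_\theta(x)$.

Here is a typical theorem giving consistency of a sequence of M-estimators.

**Theorem 2.1.1.** Let $\mathbb{M}_n$ be random functions of $\theta \in \Theta$, and let $M$ be a fixed function of $\theta$ such that

$$\sup_{\theta \in \Theta} |\mathbb{M}_n(\theta) - M(\theta)| \to_p 0$$

and, for every $\epsilon > 0$,

$$\sup_{\theta : d(\theta, \theta_0) \ge \epsilon} M(\theta) < M(\theta_0).$$

Then for any sequence of estimators $\hat{\theta}_n$ satisfying $\mathbb{M}_n(\hat{\theta}_n) \ge \mathbb{M}_n(\theta_0) - o_p(1)$, it follows that $\hat{\theta}_n \to_p \theta_0$.

Note that for i.i.d. $X_i$'s the first hypothesis in the previous theorem boils down to a Glivenko-Cantelli theorem for the class of functions $\mathcal{F} = \{m_\theta : \theta \in \Theta\}$, while the second hypothesis involves no randomness but simply the properties of the limit function $M$ at its point of maximum $\theta_0$. The difficulty in applying this theorem often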
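resides in the fact that the supremum is taken over all $\theta \in \Theta$.

#### 2.2 M-estimates: rates of convergence

Once consistency of an estimator sequence $\hat{\theta}_n$ has been established, interest turns to the rate at which $\hat{\theta}_n$ converges to the true value: for what sequences $r_n \nearrow \infty$ does it hold that

$$r_n\, d(\hat{\theta}_n, \theta_0) = O_p(1)?$$

The following development is aimed at answering this question. If $\theta_0$ is a maximizing point of a differentiable function $M(\theta)$, then the first derivative $\dot{M}(\theta)$ must vanish at $\theta_0$ and the second derivative should be negative definite. Hence it is natural to assume that, for $\theta$ in a neighborhood of $\theta_0$,

$$M(\theta) - M(\theta_0) \le -C\, d^2(\theta, \theta_0) \tag{2.1}$$

for some positive constant $C$.

The main point of the following theorem is that an upper bound for the rate of convergence of $\hat{\theta}_n$ can be obtained from the continuity modulus of the process $\mathbb{M}_n(\theta) - M(\theta)$ for estimators $\hat{\theta}_n$ that maximize (or nearly maximize) the functions $\mathbb{M}_n(\theta)$.

**Theorem 2.2.1** (Rate of convergence). Let $\mathbb{M}_n$ be stochastic processes indexed by a semi-metric space $\Theta$ and let $M : \Theta \to \mathbb{R}$ be a deterministic function such that (2.1) holds for every $\theta$ in a neighborhood of $\theta_0$. Suppose that for every $n$ and sufficiently small $\delta$, the centered process $\mathbb{M}_n - M$ satisfies

$$E^* \sup_{d(\theta, \theta_0) < \delta} \bigl| (\mathbb{M}_n - M)(\theta) - (\mathbb{M}_n - M)(\theta_0) \bigr| \le K \frac{\phi_n(\delta)}{\sqrt{n}} \tag{2.2}$$

for a constant $K$ and functions $\phi_n$ such that $\phi_n(\delta)/\delta^\alpha$ is a decreasing function of $\delta$ for some $\alpha < 2$ (not depending on $n$). Let $r_n$ satisfy

$$r_n^2\, \phi_n\Bigl(\frac{1}{r_n}\Bigr) \le \sqrt{n} \qquad \text{for every } n .$$

If the sequence $\hat{\theta}_n$ satisfies $\mathbb{M}_n(\hat{\theta}_n) \ge \mathbb{M}_n(\theta_0) - O_p(r_n^{-2})$ and converges in outer probability to $\theta_0$, then $r_n\, d(\hat{\theta}_n, \theta_0) = O_p^*(1)$.

*Proof.* See van der Vaart and Wellner (1996), pages 290-291. $\Box$

The following corollary concerning the i.i.d. case is especially useful.

**Corollary 2.2.2.** In the i.i.d. case, suppose that for every $\theta$ in a neighborhood of $\theta_0$,

$$P(m_\theta - m_{\theta_0}) \le -C\, d^2(\theta, \theta_0).$$

Also assume that there exists a function $\phi$ such that $\delta \mapsto \phi(\delta)/\delta^\alpha$ is decreasing for some $\alpha < 2$ and, for every $n$,

$$E^* \|\mathbb{G}_n\|_{\mathcal{M}_\delta} \le K \phi(\delta)$$

for some constant $K$, where $\mathcal{M}_\delta = \{ m_\theta - m_{\theta_0} : d(\theta, \theta_0) < \delta \}$. If the sequence $\hat{\theta}_n$ satisfies $\mathbb{P}_n m_{\hat{\theta}_n} \ge \mathbb{P}_n m_{\theta_0} - O_p(r_n^{-2})$ and converges in outer probability to $\theta_0$, then $r_n\, d(\hat{\theta}_n, \theta_0) = O_p^*(1)$ for every sequence $r_n$ such that $r_n^2 \phi(1/r_n) \le \sqrt{n}$ for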
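every $n$.

In dealing with nonparametric maximum likelihood estimators over convex classes of densities, it is often useful to re-express the defining inequalities in terms of functions other than $\log p_\theta$. Suppose that $\mathcal{P}$ is a convex family. In the following we will take the density $p$ itself to be the parameter. The following development is a special case of Section 3.4.1 of van der Vaart and Wellner (1996).

If $\hat{p}_n$ maximizes the log-likelihood over $p \in \mathcal{P}$, then

$$\mathbb{P}_n \log \hat{p}_n \ge \mathbb{P}_n \log p_0$$

for any fixed $p_0 \in \mathcal{P}$. Thus we have

$$\mathbb{P}_n \log \frac{\hat{p}_n}{p_0} \ge 0,$$

and hence, by concavity of $\log$,

$$\mathbb{P}_n \log\Bigl( \frac{\hat{p}_n + p_0}{2 p_0} \Bigr) \ge \mathbb{P}_n \Bigl( \frac{1}{2} \log \frac{\hat{p}_n}{p_0} + \frac{1}{2} \log 1 \Bigr) = \frac{1}{2} \mathbb{P}_n \log \frac{\hat{p}_n}{p_0} \ge 0 = \mathbb{P}_n \log\Bigl( \frac{p_0 + p_0}{2 p_0} \Bigr)$$

for all $p_0 \in \mathcal{P}$. Thus we can take

$$m_p(x) = \log\Bigl( \frac{p(x) + p_0(x)}{2 p_0(x)} \Bigr) \tag{2.3}$$

for any fixed $p_0 \in \mathcal{P}$.

Here is a useful theorem connecting maximum likelihood with the Hellinger distance metric between densities.

**Theorem 2.2.3.** Let $h$ denote the Hellinger distance, and let $m_p$ be given by (2.3) with $p_0$ corresponding to $P_0$. Then

$$P_0(m_p - m_{p_0}) \lesssim -h^2(p, p_0)$$

for every $p$ (here $a \lesssim b$ means $a \le K b$ for some finite constant $K$). Furthermore, for $\mathcal{M}_\delta = \{ m_p - m_{p_0} : h(p, p_0) < \delta \}$, it follows that

$$E_{P_0}^* \|\mathbb{G}_n\|_{\mathcal{M}_\delta} \lesssim \tilde{J}_{[\,]}(\delta, \mathcal{P}, h) \Bigl( 1 + \frac{\tilde{J}_{[\,]}(\delta, \mathcal{P}, h)}{\delta^2 \sqrt{n}} \Bigr), \tag{2.4}$$

where

$$\tilde{J}_{[\,]}(\delta, \mathcal{P}, h) = \int_0^\delta \sqrt{1 + \log N_{[\,]}(\epsilon, \mathcal{P}, h)}\; d\epsilon .$$

Theorem 2.2.3 follows from Theorem 3.4.4, page 327, of van der Vaart and Wellner (1996) by taking the sieve $\mathcal{P}_n = \mathcal{P}$ and $p_n = p_0$ throughout.

#### 2.3 M-estimates: convergence in distribution

Here is a result which follows from the general argmax continuous mapping theorem; it is from van der Vaart (1998), Theorem 5.23, page 53.

**Theorem 2.3.1.** For each $\theta$ in an open subset of $\mathbb{R}^d$, suppose that $x \mapsto m_\theta(x)$ is a measurable function such that $\theta \mapsto m_\theta(x)$ is differentiable at $\theta_0$ for $P$-almost every $x$ with derivative $\dot{m}_{\theta_0}(x)$, and such that, for every $\theta_1, \theta_2$ in a neighborhood of $\theta_0$ and a measurable function $\dot{m}$ with $P \dot{m}^2 < \infty$,

$$|m_{\theta_1}(x) - m_{\theta_2}(x)| \le \dot{m}(x)\, |\theta_1 - \theta_2| .$$

Moreover, suppose that $\theta \mapsto P m_\theta$ admits a second-order Taylor expansion at a point of maximum $\theta_0$ with nonsingular symmetric derivative matrix $V_{\theta_0}$. If $\mathbb{P}_n m_{\hat{\theta}_n} \ge \sup_\theta \mathbb{P}_n m_\theta - o_p(n^{-1})$ and $\hat{\theta}_n \to_p \theta_0$, then

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) = -V_{\theta_0}^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \dot{m}_{\theta_0}(X_i) + o_p(1).$$

#### 2.4 Z-estimators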
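
As a small numerical preview of the estimating-equation viewpoint taken in this section, the sketch below revisits the least $L_p$ location problem of Example 2. It takes $\psi_\theta(x) = p|x-\theta|^{p-1}\{1[x \le \theta] - 1[x > \theta]\}$, the derivative in $\theta$ of $|x-\theta|^p$, and looks for the sign change of $\theta \mapsto \mathbb{P}_n \psi_\theta$. The function names (`psi_n`, `solve_estimating_equation`, `lp_objective`), the simulated Gaussian sample, and the use of bisection are illustrative assumptions and are not part of the lecture notes.

```python
import random


def psi_n(theta, xs, p):
    """Empirical estimating function P_n psi_theta for the L_p location problem.

    psi_theta(x) = p * |x - theta|**(p - 1) * (1[x <= theta] - 1[x > theta]),
    i.e. the derivative in theta of |x - theta|**p.
    """
    total = 0.0
    for x in xs:
        sign = 1.0 if x <= theta else -1.0
        total += p * abs(x - theta) ** (p - 1.0) * sign
    return total / len(xs)


def solve_estimating_equation(xs, p, iterations=200):
    """Locate the sign change of theta -> psi_n(theta, xs, p) by bisection.

    For p >= 1 this map is nondecreasing in theta (it is the derivative of a
    convex criterion), so bisection on [min(xs), max(xs)] is sufficient.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if psi_n(mid, xs, p) < 0.0:
            lo = mid  # estimating function still negative: move right
        else:
            hi = mid  # nonnegative: the sign change is at or left of mid
    return 0.5 * (lo + hi)


def lp_objective(theta, xs, p):
    """Criterion P_n |X - theta|**p minimized by the corresponding M-estimator."""
    return sum(abs(x - theta) ** p for x in xs) / len(xs)


# Hypothetical data: 101 simulated observations (odd n, so the median is unique).
rng = random.Random(0)
sample = [rng.gauss(1.0, 2.0) for _ in range(101)]
for p in (1.0, 1.5, 2.0):
    print(p, solve_estimating_equation(sample, p))
```

For $p = 2$ the sign change sits at the sample mean, and for $p = 1$ with an odd number of distinct observations it sits at the sample median, in line with the remarks in Example 2; for $p = 1$ the estimating function is a step function, so one looks for a sign change rather than an exact zero.
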
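When $\Theta \subset \mathbb{R}^d$, the maximizing value $\hat{\theta}_n$ is often found by differentiating the function $\mathbb{M}_n(\theta)$ with respect to the coordinates of $\theta$ and setting the resulting vector of derivatives equal to zero. This results in the equations

$$\dot{\mathbb{M}}_n(\theta) = \mathbb{P}_n \dot{m}_\theta(X) = 0,$$

where $\dot{m}_\theta(x) = \nabla_\theta m_\theta(x)$ for each fixed $x \in \mathcal{X}$. Since this way of defining estimators often makes sense even when the functions $\dot{m}_\theta$ are replaced by a function $\psi_\theta$ which is not necessarily the gradient of a function $m_\theta$, we will actually consider estimators $\hat{\theta}_n$ defined simply as the solution of

$$\Psi_n(\theta) = \frac{1}{n} \sum_{i=1}^n \psi_\theta(X_i) = \mathbb{P}_n \psi_\theta(X) = 0 . \tag{2.5}$$

Here is one possible result concerning the consistency of estimators satisfying (2.5).

**Theorem 2.4.1.** Suppose that $\Psi_n$ are random vector-valued functions, and let $\Psi$ be a fixed vector-valued function of $\theta$ such that

$$\sup_{\theta \in \Theta} \|\Psi_n(\theta) - \Psi(\theta)\| \to_p 0$$

and, for every $\epsilon > 0$,

$$\inf_{\theta : d(\theta, \theta_0) \ge \epsilon} \|\Psi(\theta)\| > 0 = \|\Psi(\theta_0)\| .$$

Then any sequence of estimators $\hat{\theta}_n$ satisfying $\Psi_n(\hat{\theta}_n) = o_p(1)$ converges in probability to $\theta_0$.

*Proof.* This follows from Theorem 2.1.1 by taking $\mathbb{M}_n(\theta) = -\|\Psi_n(\theta)\|$ and $M(\theta) = -\|\Psi(\theta)\|$. $\Box$

We now give a statement of the infinite-dimensional Z-theorem of van der Vaart (1995). See also van der Vaart and Wellner (1996), Section 3.3, pages 309-320. It is a natural extension of the classical Z-theorem due to Huber (1967) and Pollard (1985). In the infinite-dimensional setting, the parameter space $\Theta$ is taken to be a Banach space. A sufficiently general Banach space is the space

$$\ell^\infty(H) \equiv \Bigl\{ z : H \to \mathbb{R} \;\Big|\; \|z\| = \sup_{h \in H} |z(h)| < \infty \Bigr\},$$

where $H$ is a collection of functions. We suppose that

$$\Psi_n : \Theta \to \mathbb{L} \equiv \ell^\infty(H'), \qquad n = 1, 2, \ldots,$$

are random, and that

$$\Psi : \Theta \to \mathbb{L} \equiv \ell^\infty(H')$$

is deterministic. Suppose that either $\Psi_n(\hat{\theta}_n) = 0$ in $\mathbb{L}$ (i.e. $\Psi_n(\hat{\theta}_n)(h') = 0$ for all $h' \in H'$), or $\Psi_n(\hat{\theta}_n) = o_p(n^{-1/2})$ in $\mathbb{L}$ (i.e. $\|\Psi_n(\hat{\theta}_n)\|_{H'} = o_p(n^{-1/2})$).

Here are the four basic conditions needed for the infinite-dimensional version of Huber's theorem:

**B1.** $\sqrt{n}\,(\Psi_n - \Psi)(\theta_0) \Rightarrow \mathbb{Z}_0$ in $\ell^\infty(H')$.

**B2.** For every sequence $\delta_n \to 0$,

$$\sup_{\|\theta - \theta_0\| \le \delta_n} \frac{ \bigl\| \sqrt{n}\,(\Psi_n - \Psi)(\theta) - \sqrt{n}\,(\Psi_n - \Psi)(\theta_0) \bigr\| }{ 1 + \sqrt{n}\, \|\theta - \theta_0\| } = o_p^*(1).$$

**B3.** The function $\Psi$ is Fréchet-differentiable at $\theta_0$ with derivative $\dot{\Psi}(\theta_0) \equiv \dot{\Psi}_0$ having a bounded (continuous) inverse:

$$\|\Psi(\theta) - \Psi(\theta_0) - \dot{\Psi}_0(\theta - \theta_0)\| = o(\|\theta - \theta_0\|).$$

**B4.** $\Psi_n(\hat{\theta}_n) = o_p^*(n^{-1/2})$ in $\ell^\infty(H')$ and $\Psi(\theta_0) = 0$ in $\ell^\infty(H')$.

**Theorem 2.4.2** (van der Vaart, 1995). Suppose that B1-B4 hold. Let $\hat{\theta}_n$ be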
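random maps into $\Theta \subset \ell^\infty(H)$ satisfying $\hat{\theta}_n \to_p \theta_0$. Then

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \Rightarrow -\dot{\Psi}_0^{-1}(\mathbb{Z}_0) \quad \text{in } \ell^\infty(H).$$

*Proof.* See van der Vaart (1995) or van der Vaart and Wellner (1996), page 310. $\Box$

#### 2.5 Back to the Examples

**Example 1.2.1, continued.** To answer the first question, we will assume that $E|X|^p < \infty$. We need to show that the class of functions

$$\mathcal{G}_\delta = \{ x \mapsto |x - t|^p : |t - \mu| \le \delta \}$$

is a Glivenko-Cantelli class for $P$. We can view this class as follows:

$$\mathcal{G}_\delta = \phi(\mathcal{F}_\delta) = \{ \phi(f_t) : f_t \in \mathcal{F}_\delta \},$$

where $\phi(y) = |y|^p$ is a continuous function from $\mathbb{R}$ to $\mathbb{R}$ and

$$\mathcal{F}_\delta = \{ x \mapsto x - t : |t - \mu| \le \delta \}.$$

Now $\mathcal{F}_\delta$ is a VC-subgraph collection of functions with VC index 2 (since the subgraphs are linearly ordered by inclusion) and $P$-integrable envelope function $F_\delta(x) = |x - \mu| + \delta$. It follows by the VC-Pollard-Giné-Zinn Theorem 1.3.2 that $\mathcal{F}_\delta$ is a $P$-Glivenko-Cantelli class of functions. Since $\phi$ is a continuous function and $\mathcal{G}_\delta$ has $P$-integrable envelope function

$$G_\delta(x) = |x - (\mu - \delta)|^p \vee |x - (\mu + \delta)|^p,$$

$\mathcal{G}_\delta$ is a $P$-Glivenko-Cantelli class by the Glivenko-Cantelli preservation theorem of van der Vaart and Wellner (2000). Thus, with $\mathbb{H}_n(t) \equiv \mathbb{P}_n |X - t|^p$ and $H(t) \equiv P|X - t|^p$, it follows that

$$\sup_{|t - \mu| \le \delta} |\mathbb{H}_n(t) - H(t)| = \|\mathbb{P}_n - P\|_{\mathcal{G}_\delta}^* \to_{a.s.} 0 . \tag{2.6}$$

Now write

$$A_n(p) - a(p) = \mathbb{H}_n(\bar{X}_n) - H(\mu) = \{ \mathbb{H}_n(\bar{X}_n) - H(\bar{X}_n) \} + \{ H(\bar{X}_n) - H(\mu) \} \equiv I_n + II_n .$$

By the strong law of large numbers we know that $|\bar{X}_n - \mu| \le \delta$ for all $n \ge N(\delta, \omega)$, for all $\omega$ in a set with probability one. Hence it follows that for $n$ large we have

$$|I_n| = |\mathbb{H}_n(\bar{X}_n) - H(\bar{X}_n)| \le \sup_{|t - \mu| \le \delta} |\mathbb{H}_n(t) - H(t)| \to_{a.s.} 0$$

by (2.6). Furthermore,

$$|II_n| = |H(\bar{X}_n) - H(\mu)| \le P\bigl|\, |X - \bar{X}_n|^p - |X - \mu|^p \,\bigr| \to_{a.s.} 0$$

by the dominated convergence theorem, since $|\bar{X}_n - \mu| \to_{a.s.} 0$ and $E|X|^p < \infty$. Thus the answer to our first question is positive: if $E|X|^p < \infty$, then $A_n(p) \to_{a.s.} a(p)$.

To answer the second question, we first note that $\mathcal{G}_\delta$ is a $P$-Donsker class of functions for each $\delta > 0$ if we now assume in addition that $E|X|^{2p} < \infty$. This follows from the fact that $\mathcal{F}_\delta$ is a VC-subgraph class of functions with $P$-square integrable envelope function $F_\delta$, and then applying the $P$-Donsker preservation theorem (Theorem 2.10.6, van der Vaart and Wellner (1996), page 192, and Corollary 2.10.13, page 193), upon noting that $\phi(y) = |y|^p$ satisfies

$$|\phi(x - t) - \phi(x - s)|^2 = \bigl|\, |x - t|^p - |x - s|^p \,\bigr|^2 \le L^2(x)\, |t - s|^2$$

for all $s, t \in [\mu - \delta, \mu + \delta]$ and all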
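$x \in \mathbb{R}$, where
$$L(x) = \sup_{t : |t - \mu| \le \delta} p|x - t|^{p-1} = p|x - (\mu - \delta)|^{p-1} \vee p|x - (\mu + \delta)|^{p-1}$$
satisfies $PL^2(X) = \int L^2(x)\,dP(x) < \infty$. Note that the $P$-Donsker property of the class $\mathcal{G}_\delta$ also follows from the Jain-Marcus CLT 1.3.6. Hence, writing $g_t(x) = |x - t|^p$ for the element of $\mathcal{G}_\delta$ indexed by $t$, it follows that
$$\begin{aligned}
\sqrt{n}(A_n(p) - a(p)) &= \sqrt{n}(\mathbb{H}_n(\bar X_n) - H(\mu)) = \sqrt{n}(\mathbb{P}_n g_{\bar X_n} - P g_\mu) \\
&= \sqrt{n}(\mathbb{P}_n - P)(g_{\bar X_n}) + \sqrt{n}(P g_{\bar X_n} - P g_\mu) \\
&= \mathbb{G}_n(g_{\bar X_n}) + \sqrt{n}(H(\bar X_n) - H(\mu)) \\
&= \mathbb{G}_n(g_\mu) + \{\mathbb{G}_n(g_{\bar X_n}) - \mathbb{G}_n(g_\mu)\} + \sqrt{n}(H(\bar X_n) - H(\mu)) \\
&= \mathbb{G}_n(g_\mu) + H'(\mu)\,\mathbb{G}_n(X - \mu) + o_p(1) \\
&= \mathbb{G}_n\big(g_\mu + H'(\mu)(X - \mu)\big) + o_p(1)
\end{aligned}$$
if $H$ is differentiable at $\mu$. The step that drops the term $\mathbb{G}_n(g_{\bar X_n}) - \mathbb{G}_n(g_\mu)$ follows since the class $\mathcal{G}_\delta$ is $P$-Donsker, and hence for large $n$ with high probability we have, for some sequence $\delta_n \to 0$,
$$|\mathbb{G}_n(g_{\bar X_n}) - \mathbb{G}_n(g_\mu)| \le \sup_{|t - \mu| \le \delta_n} |\mathbb{G}_n(g_t) - \mathbb{G}_n(g_\mu)| \to_p 0.$$
Thus it follows that
$$\sqrt{n}(A_n(p) - a(p)) = \mathbb{G}_n\big(g_\mu + H'(\mu)(X - \mu)\big) + o_p(1) \to_d \mathbb{G}\big(g_\mu + H'(\mu)(X - \mu)\big) \sim N(0, V^2(p)),$$
where
$$V^2(p) = \mathrm{Var}_P\big(|X - \mu|^p + H'(\mu)(X - \mu)\big) = E\big\{|X - \mu|^p - E|X - \mu|^p + H'(\mu)(X - \mu)\big\}^2.$$
When $P$ is symmetric about $\mu$, then $H'(\mu) = 0$ and the expression for the variance simplifies to
$$V^2(p) = E|X - \mu|^{2p} - (E|X - \mu|^p)^2.$$
It is easily seen that $H$ is indeed differentiable at $\mu$ if $P(\{\mu\}) = 0$, and
$$H'(\mu) = p\,P\big\{|X - \mu|^{p-1}\big(1_{[X \le \mu]} - 1_{[X > \mu]}\big)\big\}.$$

Example 1.2.1d, continued. One difference now is that the class of functions $\mathcal{F}_\delta = \{f_t(x) = x - t : \|t - \mu\|_q \le \delta\}$ is no longer real valued. There are several ways to proceed here, but one way is as follows: consider the classes of functions
$$\mathcal{F}_{i,\delta} = \{f_{t,i}(x) = x_i - t_i : \|t - \mu\|_q \le \delta\}, \qquad i = 1, \ldots, d.$$
These are clearly again VC-subgraph classes of functions since their subgraphs are again ordered by inclusion. Moreover, each of these classes has an integrable envelope function
$$F_{i,\delta}(x) = |x_i - (\mu_i - \delta)| \vee |x_i - (\mu_i + \delta)|.$$
Thus each of these classes $\mathcal{F}_{i,\delta}$, $i = 1, \ldots, d$, is $P$-Glivenko-Cantelli. The map $\phi$ from $\mathbb{R}^d$ to $\mathbb{R}$ defined by $\phi(y_1, \ldots, y_d) = (|y_1|^q + \cdots + |y_d|^q)^{p/q}$ is continuous, and the resulting class $\phi(\mathcal{F}_{1,\delta}, \ldots, \mathcal{F}_{d,\delta})$ has an integrable envelope $F$, assuming that $P\|X\|_q^p < \infty$. Thus it follows from the Glivenko-Cantelli preservation Theorem 1.6.1 that $\mathcal{G}_\delta$ is a $P$-Glivenko-Cantelli class.

Example 1.2.2, continued. Our treatment of this example will use the argmax continuous mapping theorem in the form of VAN DER VAART (1998), Theorem 5.23, page 53. In that theorem we will take $m_\theta(x) = |x - \theta|^p$. Then
$$\dot m_\theta(x) = p|x - \theta|^{p-1}\big(1_{[x \le \theta]} - 1_{[x > \theta]}\big).$$
Thus
$$\mu(p) = \operatorname{argmin}_\theta P|X - \theta|^p, \qquad \hat\mu_n(p) = \operatorname{argmin}_\theta \mathbb{P}_n|X - \theta|^p,$$
and, with $f$ denoting the density of $P$,
$$V_{\mu(p)} = \begin{cases} p(p-1)\,P|X - \mu(p)|^{p-2}, & p > 1, \\ 2f(\mu(1)), & p = 1. \end{cases}$$
Since the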
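function $m_\theta(x)$ satisfies $|m_\theta(x) - m_s(x)| \le \dot m(x)\,|\theta - s|$, where $P\dot m^2(X) < \infty$ as we saw in Example 1.2.1, it follows from Theorem 2.3.1 that
$$\sqrt{n}(\hat\mu_n(p) - \mu(p)) = -V_{\mu(p)}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot m_{\mu(p)}(X_i) + o_p(1) \to_d N\big(0,\; P(\dot m_{\mu(p)}^2)/V_{\mu(p)}^2\big).$$
Note that when $p = 2$, $\hat\mu_n(2) = \mathbb{P}_n X$, the usual sample mean, and $\dot m_\theta(x) = 2|x - \theta|\big(1_{[x \le \theta]} - 1_{[x > \theta]}\big) = -2(x - \theta)$, so $P\dot m_{\mu(2)}^2(X) = 4\,\mathrm{Var}_P(X)$, $V_{\mu(2)} = 2$, and we recover the usual asymptotic normality result for the sample mean.

Example 1.2.3, continued. First note that the sets in question in this example are half-spaces
$$H_{t,\gamma} = \{x \in \mathbb{R}^d : \gamma' x \le t\}.$$
Note that
$$D_n = \sup_{t \in \mathbb{R}}\ \sup_{\gamma \in S^{d-1}} |\mathbb{P}_n(H_{t,\gamma}) - P(H_{t,\gamma})| = \|\mathbb{P}_n - P\|_{\mathcal{H}}.$$
The key to answering the question raised in this example is one of the exponential bounds from Section 1.6 applied to the collection $\mathcal{H} = \{H_{t,\gamma} : t \in \mathbb{R},\ \gamma \in S^{d-1}\}$, the half-spaces in $\mathbb{R}^d$. The collection $\mathcal{H}$ is a VC collection of sets with $V(\mathcal{H}) = d + 2$. By Talagrand's exponential bound,
$$\Pr\big(\sqrt{n}\,\|\mathbb{P}_n - P\|_{\mathcal{H}} \ge \lambda\big) \le \frac{D}{\lambda}\left(\frac{DK\lambda^2}{d+1}\right)^{d+1} \exp(-2\lambda^2)$$
for all $n \ge 1$ and $\lambda > 0$. Taking $\lambda = \epsilon\sqrt{n}$ yields
$$\begin{aligned}
\Pr\big(\|\mathbb{P}_n - P\|_{\mathcal{H}} \ge \epsilon\big) &\le \frac{D}{\epsilon\sqrt{n}}\left(\frac{DK\epsilon^2 n}{d+1}\right)^{d+1} \exp(-2\epsilon^2 n) \\
&= \frac{D}{\epsilon\sqrt{n}} \exp\left((d+1)\log\left(\frac{DK\epsilon^2 n}{d+1}\right)\right) \exp(-2\epsilon^2 n) \to 0
\end{aligned}$$
as $n \to \infty$ if $d/n \to 0$. This is exactly the result obtained by DIACONIS AND FREEDMAN (1984) by using an inequality of Vapnik and Chervonenkis.

Question: What happens if $d/n \to c > 0$? Good values for the constants $D$ and $K$ start mattering!

Example 1.2.4, continued. It is fairly easy to give conditions on the kernel $k$ so that the class $\mathcal{F}$ defined in (1.2) satisfies
$$N(\epsilon, \mathcal{F}, L_1(Q)) \le \left(\frac{K}{\epsilon}\right)^V \qquad (2.7)$$
or
$$N_{[\,]}(\epsilon, \mathcal{F}, L_1(Q)) \le \left(\frac{K}{\epsilon}\right)^V \qquad (2.8)$$
for some constants $K$ and $V$; see e.g. Lemma 22, page 797, NOLAN AND POLLARD (1987). For example, if $k(t) = \rho(|t|)$ for a function $\rho : \mathbb{R}^+ \to \mathbb{R}^+$ of bounded variation, then (2.7) holds.

As usual, it is natural to write the difference $\hat p_n(y, h) - p(y)$ as the sum of a random term and a deterministic term:
$$\hat p_n(y, h) - p(y) = \{\hat p_n(y, h) - \bar p(y, h)\} + \{\bar p(y, h) - p(y)\},$$
where
$$\bar p(y, h) = h^{-d}\,P\,k\!\left(\frac{y - X}{h}\right) = h^{-d}\int k\!\left(\frac{y - x}{h}\right) p(x)\,dx$$
is a smoothed version of $p$. Convergence to zero of the second term can be argued based on smoothness assumptions on $p$: if $p$ is uniformly continuous, then it is easily seen that
$$\sup_{h \le b_n}\ \sup_{y \in \mathbb{R}^d} |\bar p(y, h) - p(y)| \to 0$$
for any sequence $b_n \to 0$. On the other hand, the first term is just
$$h^{-d}(\mathbb{P}_n - P)\,k\!\left(\frac{y - X}{h}\right). \qquad (2.9)$$
While it follows immediately from (2.7) and Theorem 1.3.2 or (2.8) and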
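Theorem 1.3.1 that
$$\sup_{h > 0,\ y \in \mathbb{R}^d} \left|(\mathbb{P}_n - P)\,k\!\left(\frac{y - X}{h}\right)\right| \to_{a.s.} 0,$$
this does not suffice in view of the factor of $h^{-d}$ in (2.9). In fact we need a rate of convergence for
$$\sup_{h \ge b_n,\ y \in \mathbb{R}^d} \left|(\mathbb{P}_n - P)\,k\!\left(\frac{y - X}{h}\right)\right| \to_{a.s.} 0.$$
The following theorem is due to NOLAN AND MARRON (1989), with preparatory work in POLLARD (1987); see also POLLARD (1995).

Proposition 2.5.1 (Marron and Nolan, Pollard). Suppose that:
(i) $n a_n^d / \log n \to \infty$;
(ii) $\sup_{h > 0,\ y \in \mathbb{R}^d} h^{-d}\,P\,k\big((y - X)/h\big) \equiv K_1 < \infty$;
(iii) the kernel $k$ is bounded;
(iv) either (2.7) or (2.8) holds.
Then
$$\sup_{a_n \le h \le b_n,\ y \in \mathbb{R}^d} |\hat p_n(y, h) - \bar p(y, h)| \to_{a.s.} 0. \qquad (2.10)$$
If we relax (i) to $n a_n^d \to \infty$, then (2.10) continues to hold with $\to_{a.s.}$ replaced by $\to_p$.

The following corollary of Proposition 2.5.1 allows the bandwidth parameter $h$ to depend on $n$, $y$, and the data $X_1, \ldots, X_n$.

Corollary. Suppose that $p$ is a uniformly continuous bounded density on $\mathbb{R}^d$. Suppose that $\hat h_n = \hat h_n(y)$ is a random bandwidth parameter satisfying $a_n \le \hat h_n(y) \le b_n$ eventually a.s. for all $y$, where $b_n \to 0$. Suppose that the conditions of Proposition 2.5.1 hold. Then
$$\sup_{y \in \mathbb{R}^d} |\hat p_n(y, \hat h_n(y)) - p(y)| \to_{a.s.} 0.$$

Proof of the Proposition. Set $f_{y,h}(x) = k\big((y - x)/h\big)$ for $x, y \in \mathbb{R}^d$ and $h > 0$, so that $\mathcal{F} = \{f_{y,h} : y \in \mathbb{R}^d,\ h > 0\}$, and let $\mathcal{F}_n = \{f_{y,h} \in \mathcal{F} : h \ge a_n\}$. Suppose we can show that
$$\Pr\left(\sup_{f \in \mathcal{F}_n} \frac{|\mathbb{P}_n f - Pf|}{\mathbb{P}_n f + Pf + \gamma} > A\epsilon\right) \le B\,N(\epsilon\gamma)\exp(-C n \epsilon^2 \gamma) \qquad (2.11)$$
for every $\epsilon > 0$, $\gamma > 0$, and $n \ge 1$, for constants $A$, $B$, and $C$, and where $N(\epsilon)$ is either $\sup_Q N(\epsilon, \mathcal{F}, L_1(Q))$ or $N_{[\,]}(\epsilon, \mathcal{F}, L_1(P))$. Then by taking $\gamma = a_n^d$ it would follow that the probability of the event $A_n(\epsilon)$ on the left side of (2.11) is arbitrarily small for $n$ sufficiently large if we assume that $n a_n^d \to \infty$. Then we have, on $A_n(\epsilon)^c$,
$$|\mathbb{P}_n f_{y,h} - P f_{y,h}| \le A\epsilon\,\big(\mathbb{P}_n f_{y,h} + P f_{y,h} + a_n^d\big)$$
for all $h \ge a_n$ and all $y \in \mathbb{R}^d$, and, after dividing by $h^d \ge a_n^d$, this implies that
$$|\hat p_n(y, h) - \bar p(y, h)| \le A\epsilon\,\big(\hat p_n(y, h) + \bar p(y, h) + 1\big)$$
for all $h \ge a_n$ and all $y \in \mathbb{R}^d$. This in turn yields, for $A\epsilon < 1$,
$$\hat p_n(y, h) \le \frac{1 + A\epsilon}{1 - A\epsilon}\,\bar p(y, h) + \frac{A\epsilon}{1 - A\epsilon}$$
for all $h \ge a_n$ and all $y \in \mathbb{R}^d$. In view of the hypothesis (ii) we find that
$$|\hat p_n(y, h) - \bar p(y, h)| \le \frac{A\epsilon}{1 - A\epsilon}\,(1 + 2K_1),$$
and this yields the convergence in probability conclusion. The almost sure part of the Proposition follows similarly by taking $\gamma = a_n^d$, using hypothesis (i), and applying the Borel-Cantelli lemma. Thus it remains only to prove (2.11).

These results are connected to the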
nice results for convergence in $L_1$ of DEVROYE (1983), DEVROYE (1987), and GINÉ, MASON AND ZAITSEV (2001). The latter paper treats the $L_1$ distance between $\hat p_n(\cdot, h_n, k)$ and $p$ as a process indexed by the kernel function $k$. The results for this example also have many connections in the current literature on nonparametric estimation via multi-scale analysis; see e.g. DUEMBGEN AND SPOKOINY (2001), CHAUDHURI AND MARRON (2000), and WALTHER (2001).

Example 1.2.5, continued. The Hellinger distance $h(P, Q)$ between two probability measures $P$ and $Q$ on a measurable space $(\mathcal{X}, \mathcal{A})$ is given by
$$h^2(P, Q) = \frac{1}{2}\int (\sqrt{p} - \sqrt{q})^2\,d\mu,$$
where $p = dP/d\mu$ and $q = dQ/d\mu$ for any common dominating measure $\mu$ (e.g. $P + Q$). The following inequalities are key tools in dealing with consistency and rates of convergence of the MLE $\hat F_n$ in this problem. The first inequality is valid generally for maximum likelihood estimation:
$$h^2(P_{\hat F_n}, P_{F_0}) \le (\mathbb{P}_n - P_0)\left(\sqrt{\frac{p_{\hat F_n}}{p_{F_0}}} - 1\right) 1_{[p_{F_0} > 0]}. \qquad (2.12)$$
The second inequality is valid for the MLE in an arbitrary convex family $\mathcal{P}$:
$$h^2(P_{\hat F_n}, P_{F_0}) \le (\mathbb{P}_n - P_0)\,\varphi\!\left(\frac{p_{\hat F_n}}{p_{F_0}}\right), \qquad (2.13)$$
where $\varphi(t) = (t - 1)/(t + 1)$. For proofs of these inequalities see VAN DE GEER (1993), VAN DE GEER (1996), or VAN DER VAART AND WELLNER (2000).

Now the right side of (2.13) is bounded by $\|\mathbb{P}_n - P_0\|_{\mathcal{H}}$, where
$$\mathcal{H} = \{\varphi(p_F/p_{F_0}) : F \text{ a distribution function on } \mathbb{R}^+\}.$$
Thus if $\mathcal{H}$ is a $P_0$-Glivenko-Cantelli class, Hellinger consistency of $p_{\hat F_n}$ follows. To show that $\mathcal{H}$ is indeed a $P_0$-Glivenko-Cantelli class, we appeal first to the convex hull result, and then to the Glivenko-Cantelli preservation theorem (twice), as follows. First, the collection of functions $\{p_F : F \in \mathcal{F}\}$ is a Glivenko-Cantelli class of functions, since the classes of functions $F$ and $1 - F$ are both universal Glivenko-Cantelli classes in view of the bound on uniform entropy for convex hulls given by Theorem 1.6.2 and the corollary given by Example 1.6.3. The one fixed function $p_{F_0}$ is trivially a Glivenko-Cantelli class since it is uniformly bounded, and $1/p_{F_0}$ is also a $P_0$-Glivenko-Cantelli class since $P_0(1/p_{F_0}) < \infty$. Thus by the Glivenko-Cantelli preservation Theorem 1.4.1 with the function $\varphi(u, v) = uv$, $\mathcal{F}_1 = \{1/p_{F_0}\}$, and $\mathcal{F}_2 = \{p_F : F \in \mathcal{F}\}$, it follows that the collection $\mathcal{G}$
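$= \{p_F/p_{F_0} : F \in \mathcal{F}\}$ is $P_0$-Glivenko-Cantelli with the $P_0$-integrable envelope function $1/p_{F_0}$. Finally, yet another application of the Glivenko-Cantelli preservation Theorem 1.4.1 with the function $\varphi(t) = (t - 1)/(t + 1)$ (which is continuous and uniformly bounded in absolute value by 1 on $t \ge 0$) and the class $\mathcal{G}$ shows that the class $\mathcal{H}$ is indeed $P_0$-Glivenko-Cantelli. Thus it follows that $h^2(P_{\hat F_n}, P_{F_0}) \to_{a.s.} 0$. Since $d_{TV}(P_{\hat F_n}, P_{F_0}) \le \sqrt{2}\,h(P_{\hat F_n}, P_{F_0})$, we find that
$$d_{TV}(P_{\hat F_n}, P_{F_0}) \to_{a.s.} 0.$$
But it is easy to compute that
$$d_{TV}(P_{\hat F_n}, P_{F_0}) = 2\int |\hat F_n - F_0|\,dG,$$
and hence the upshot is that $\hat F_n$ is consistent for $F_0$ in $L_1(G)$. For generalizations of this argument to "mixed case interval censoring" and to higher dimensions, see VAN DER VAART AND WELLNER (2000), Section 4.

To answer the question about the rate of global convergence in this problem, we will use Theorems 2.2.1 and 2.2.3. We take the metric $d$ in Theorem 2.2.1 to be the Hellinger metric $h$ on $\mathcal{P} = \{p_F : F \in \mathcal{F}\}$. Now the functions $p_F(1, y) = F(y)$ and $p_F(0, y) = 1 - F(y)$ are both monotone and bounded by 1, and so are $p_F^{1/2}(1, y) = F(y)^{1/2}$ and $p_F^{1/2}(0, y) = (1 - F(y))^{1/2}$, and hence it follows from (1.11) that the class of functions $\mathcal{P}^{1/2} = \{p_F^{1/2} : F \in \mathcal{F}\}$ satisfies, for $\mu = G \times \#$ where $\#$ is counting measure on $\{0, 1\}$,
$$\log N_{[\,]}(\epsilon, \mathcal{P}^{1/2}, L_2(\mu)) \le \frac{K}{\epsilon},$$
or, equivalently,
$$\log N_{[\,]}(\epsilon, \mathcal{P}, h) \le \frac{K}{\epsilon}.$$
This yields
$$\tilde J_{[\,]}(\delta, \mathcal{P}, h) \equiv \int_0^\delta \sqrt{1 + \log N_{[\,]}(\epsilon, \mathcal{P}, h)}\,d\epsilon \le \int_0^\delta \sqrt{1 + K/\epsilon}\,d\epsilon \lesssim \delta^{1/2}.$$
Hence the right side of (2.4) is bounded by a constant times
$$\phi_n(\delta) \equiv \delta^{1/2}\left(1 + \frac{\delta^{1/2}}{\delta^2\sqrt{n}}\right) = \delta^{1/2} + \frac{1}{\delta\sqrt{n}}.$$
Now by Theorem 2.3.1 (or its Corollary 2.3.2) the rate of convergence $r_n$ satisfies $r_n^2\,\phi_n(1/r_n) \lesssim \sqrt{n}$, but with $r_n = n^{1/3}$ we have
$$r_n^2\,\phi_n\!\left(\frac{1}{r_n}\right) = n^{2/3}\,n^{-1/6} + n^{2/3}\,\frac{n^{1/3}}{n^{1/2}} = 2n^{1/2} \lesssim \sqrt{n}.$$
Hence it follows from Theorem 2.2.1 that
$$n^{1/3}\,h(p_{\hat F_n}, p_{F_0}) = O_p(1).$$

3 Extensions and Further Problems

3.1 Extensions

The basic theory presented in Lecture 1 has already been extended and improved in several directions, including:

A. Results for random entropies. See GINÉ AND ZINN (1984), GINÉ AND ZINN (1986), and LEDOUX AND TALAGRAND (1989).

B. Dependent data. For some of the many results in this direction, see ANDREWS AND POLLARD (1994) and DOUKHAN, MASSART AND RIO (1995).

C. U-processes. See NOLAN AND POLLARD (1987), NOLAN AND POLLARD (1988), and DE LA PEÑA, V. H. AND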
GINÉ, E. (1999).

D. Better inequalities via isoperimetric methods. See TALAGRAND (1996), MASSART (2000a), and MASSART (2000b).

3.2 Further Problems

Problem 1. Calculate VC dimension for classes $\mathcal{A} \sqcup \mathcal{B}$ (see DUDLEY (1999), section 4.5). VC dimensions for the VC classes of STENGLE AND YUKICH (1989) and LASKOWSKI (1992).

Problem 2. Bracketing number bounds for distribution functions on $\mathbb{R}^d$.

Problem 2M. Bracketing number bounds for Gaussian mixtures on $\mathbb{R}^d$ (generalizing the results of Ghosal and van der Vaart (2001) for $d = 1$).

Problem 3. Preservation theorems for a class of transforming functions $\{\phi_t : t \in T\}$ (Glivenko-Cantelli, Donsker). Preservation theorems for $\mathcal{F} \circ \mathcal{G} = \{f(g) : f \in \mathcal{F},\ g \in \mathcal{G}\}$.

Problem 4. Better bounds for convex hulls in particular cases. Lower bounds for entropies of convex hulls. Preservation of bracketing numbers for convex hulls.

Problem 5. Better methods for convergence rates.

Problem 6. Better bounds and convergence theorems for ratios, perhaps improving on the bound in the proof of Proposition 2.5.1.

Acknowledgements

Much of the material presented here has resulted from discussions and joint work with A. W. van der Vaart. I also wish to thank Evarist Giné for many helpful discussions.

References

Andrews, D. W. K. and Pollard, D. (1994). An introduction to functional central limit theorems for dependent stochastic processes. International Statistical Review 62, 119.

Ball, K. and Pajor, A. (1990). The entropy of convex bodies with few extreme points. Geometry of Banach Spaces, Proceedings of the conference held in Strobl, Austria, 1989 (eds. P. F. X. Müller and W. Schachermayer). London Mathematical Society Lecture Note Series 158, 25-32.

Carl, B. (1997). Metric entropy of convex hulls in Hilbert space. Bull. London Math. Soc. 29, 452-458.

Chaudhuri, P. and Marron, J. S. (2000). Scale space view of curve estimation. Ann. Statist. 28, 408-428.

de la Peña, V. H. and Giné, E. (1999). Decoupling: From Dependence to Independence. Springer-Verlag, New York.

Devroye, L. (1983). The equivalence of weak, strong, and complete convergence in $L_1$ for kernel density estimates. Ann. Statist. 11, 896-904.

Devroye, L. (1987). A Course in Density Estimation.
Birkhäuser, Boston.

Diaconis, P. and Freedman, D. (1984). Asymptotics of graphical projection pursuit. Ann. Statist. 12, 793-815.

Doukhan, P., Massart, P., and Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Ann. Inst. H. Poincaré Probab. Statist. 31, 393-427.

Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab. 6.

Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII, 1982. Lecture Notes in Mathematics 1097, 2-141 (P. L. Hennequin, ed.). Springer-Verlag, New York.

Dudley, R. M. (1987). Universal Donsker classes and metric entropy. Ann. Probability 15, 1306-1326.

Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press, Cambridge.

Duembgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29, 124-152.

Ghosal, S. and van der Vaart, A. W. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29, 1233-1263.

Giné, E., Mason, D. M., and Zaitsev, A. Yu. (2001). The $L_1$-norm density estimator process. Preprint.

Giné, E. and Zinn, J. (1984). Some limit theorems for empirical processes. Ann. Probab. 12, 929-989.

Giné, E. and Zinn, J. (1986). Lectures on the central limit theorem for empirical processes. Lecture Notes in Mathematics 1221, 50-113. Springer-Verlag, Berlin.

Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. J. Comb. Theory A 69, 217-232.

Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, 221-233. Univ. California Press.

Koltchinskii, V. and Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30, to appear.

Laskowski, M. C. (1992). Vapnik-Chervonenkis classes of definable sets. J. London Math. Soc. 45, 377-384.

Ledoux, M. and Talagrand, M. (1989). Comparison theorems, random geometry and some limit theorems for empirical processes. Ann. Probab. 17, 596-631.

Massart,
P. (2000a). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. 9, 245-303.

Massart, P. (2000b). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28, 863-884.

Nolan, D. and Marron, J. S. (1989). Uniform consistency of automatic and location-adaptive delta-sequence estimators. Probab. Theory and Related Fields 80, 619-632.

Nolan, D. and Pollard, D. (1987). U-processes: rates of convergence. Ann. Statist. 15, 780-799.

Nolan, D. and Pollard, D. (1988). Functional limit theorems for U-processes. Ann. Probab. 16, 1291-1298.

Pollard, D. (1985). New ways to prove central limit theorems. Econometric Theory 1, 295-314.

Pollard, D. (1987). Rates of uniform almost sure convergence for empirical processes indexed by unbounded classes of functions. Preprint.

Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics 2. Institute of Mathematical Statistics.

Pollard, D. (1995). Uniform ratio limit theorems for empirical processes. Scand. J. Statist. 22, 271-278.

Stengle, G. and Yukich, J. E. (1989). Some new Vapnik-Chervonenkis classes. Ann. Statist. 17, 1441-1446.

Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126, 505-563.

Van de Geer, S. (1993). Hellinger consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 21, 14-44.

Van de Geer, S. (1996). Rates of convergence for the maximum likelihood estimator in mixture models. Nonparametric Statistics 6, 293-310.

Van der Vaart, A. W. (1995). Efficiency of infinite dimensional M-estimators. Statistica Neerl. 49, 9-30.

Van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.

Van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer-Verlag, New York.

Van der Vaart, A. W. and Wellner, J. A. (2000). Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli classes, pp. 115-134. In High Dimensional Probability II, Evarist Giné, David Mason, and Jon A. Wellner,
editors. Birkhäuser, Boston.

Van der Vaart, A. W. (2000). Semiparametric Statistics. Lectures on Probability Theory, École d'Été de Probabilités de St. Flour XX, 1999 (P. Bernard, ed.). Springer, Berlin. To appear.

Vapnik, V. N. and Chervonenkis, A. Ya. (1968). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications 16, 264-280.

Walther, G. (2001). Multiscale maximum likelihood analysis of a semiparametric model with applications. Ann. Statist. 29, 1297-1319.
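
As a rough numerical companion to Example 1.2.1, continued (Section 2.5), the following Python sketch simulates the absolute deviation statistic about the sample mean and compares it with the limits derived there. It is only an illustration under assumptions that are not in the notes: the data are taken to be standard normal (so the law is symmetric about its mean, the derivative of H at the mean vanishes, and the limiting variance reduces to the 2p-th absolute moment minus the square of the p-th absolute moment), and the function names `abs_deviation`, `normal_abs_moment`, `limit_variance_symmetric`, and `simulate` are hypothetical helpers introduced here.

```python
import math
import random
import statistics


def abs_deviation(xs, p):
    """Return A_n(p) = (1/n) * sum_i |x_i - xbar|^p for the sample xs."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(abs(x - xbar) ** p for x in xs) / n


def normal_abs_moment(p):
    """Return E|Z|^p for Z ~ N(0, 1): 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def limit_variance_symmetric(p):
    """Limit variance E|Z|^(2p) - (E|Z|^p)^2.

    Assumption: the law is symmetric about its mean (true for N(0, 1)),
    so the H'(mu) term of the general variance formula drops out.
    """
    return normal_abs_moment(2.0 * p) - normal_abs_moment(p) ** 2


def simulate(n, reps, p, seed=0):
    """Draw `reps` independent N(0, 1) samples of size n; return their A_n(p) values."""
    rng = random.Random(seed)
    return [
        abs_deviation([rng.gauss(0.0, 1.0) for _ in range(n)], p)
        for _ in range(reps)
    ]


if __name__ == "__main__":
    n, reps, p = 2000, 300, 1.0
    values = simulate(n, reps, p, seed=1)
    print("average of A_n(p) over replications:", statistics.fmean(values))
    print("a(p) = E|Z|^p:                      ", normal_abs_moment(p))
    print("n * sample variance of A_n(p):      ", n * statistics.variance(values))
    print("limit variance:                     ", limit_variance_symmetric(p))
```

With a few hundred replications the two printed pairs should be close but not equal, since the comparison is subject to Monte Carlo error. For a non-symmetric law the derivative term would have to be added back into the variance, which this sketch does not attempt.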
