Mathematical Probabilty MATH 6040
Popular in Course
Popular in Mathematics (M)
This 52 page Class Notes was uploaded by Miss Noel Mertz on Monday October 26, 2015. The Class Notes belongs to MATH 6040 at University of Utah taught by Staff in Fall. Since its upload, it has received 25 views. For similar materials see /class/229931/math-6040-university-of-utah in Mathematics (M) at University of Utah.
Reviews for Mathematical Probabilty
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 10/26/15
Chapter 5 Independence 1 Introduction Our reviewdevelopment of measure theory is finally complete and we begin studying probability theory in earnest In this chapter we introduce the all important notion of independence and use it to prove a precise formulation of the so called law of large numbers In rough terms the latter states that the sample average of a large random sample is close to the population average As such the law of large numbers shows a statistical means by which to estimate various parameters of interest We will also see more subtle and yet equally f 39 39 of that are routinely used in other scientific disciplines Throughout let 9 3 33 denote a probability space 2 Random Variables and Distributions Given a random variable X 9 gt R we can define a set function on R R as follows 2P 0 X m PX e E W e R 5 1 The notation is ugly but motivated by the fact that X E E is another way to write X 1E so that 2P 0X 1E ZPX 1E Lemma 51 33 o X 1 is a probability measure on RR De nition 52 The measure 33 o X 1 is called the distribution of the random variable X C Proof of Lemma 51 The proof is straightforward 33 oX 1Qi 0 and 33 o X 1 is countably additive on RR since 33 is countably additive on 93 and since X is a function E The above lemma tells us that to each random variable on the abstract probability space 93233 we can associate a real probability space R R 33 o X The converse is also true Lemma 53 If 11 is a probability measure on R R then there erists a random variable X whose distribution is 11 41 Proof Define F to be the distribution function of 11 i e Fc 11 00C Vcc E R 5 2 Note that i F is nondecreasing right continuous and has left limits ii F 00 0 and iii F 00 1 While F need not have an inverse it would if and only if F is strictly increasing we can consider its right continuous inverse F 1cinfy Fy2c VxER 53 where inf Z gt0 The key feature of F 1 is that for any a E 01 and a E R F 1a g 5 if and only if a g 5 4 Consider the BoreIESteinhaus probability space 9 3 33 where Q 01 3 Q and 33 the Lebesgue measure on 93 For all u E 9 define Xw F 1w which is clearly a random variable on 03 Then by 5 4 33 X S 5 is the Lebesgue measure of all u E 01 such that u S Fc and this equals F 9 What this shows is that 33 X E 00c 11 occ Now proceed as one does in integration and prove whenever E is a finite disjoint unions of half infinite half closed intervals of the form 005 then 33 X E E 14E By the monotone class theorem Theorem 1 25 for all Borel sets E g R X E E 1E C Since we have introduced distribution functions let us mention a classification theorem for them too Theorem 54 A function F R gt 01 is the distribution function of a probability measure if and only if i F is nondecreasing and rightcontinuous and ii F oc 0 and Foc Proof The necessity of i and ii is easy and has already been mentioned without proof cf the proof of Lemma 5 3 Indeed suppose F x 11 occ for a probability measure 11 on RR Then F is nondecreasing 11 is a measure specifically since A g B gt MA 3 113 It is right continuous thanks to the outer continuity of the measure 11 and the assertions about F ice are obvious Conversely suppose F R gt 01 satisfies i and ii We can the de ne 11ab Fb Fa for all real numbers a lt b Extend the definition of 11 to finite disjoint unions of intervals of type 1505 by setting 11U1aibi 2 F 1 F a It is not too dif cult to check that a this is well defined and b 11 is countably additive on the algebra of all disjoint finite unions of intervals of the type a 1 Now we apply Carath odory s theorem Theorem 1 13 to extend 11 uniquely to a measure on all of R It remains to check that this extended 1 is a probability measure but this follows from 8131 limn1 ocn F 00 1 thanks to the inner continuity of measures C De nition 55 If p gt 0 then the pth moment of a random variable X is defined as X 7 provided that X20asorXEL C Lemma 56 If X 2 0 as or X E MW then X 7 Xpd 5 1dc where 11 is the distribution function of X More generally still if h R gt R is measurable then hX hX d like 11dcc provided that the integrals edist Proof For any random variable Y Y Yb 33dw provided that the integral is well defined Apply this to Y hX for the integral representation which involves the integrals on Q The second inte gral representations are the real message of this lemma since they state that in order for us to compute 42 a moment say we can compute a real integral as opposed to an abstract integral First one works with elementary random variable ie when X 01 where A E 3 and a E R Note that the dis tribution of X is 11 331406 1 33A6g where 639 denotes the point mass at 9 Thus in this case hX haAh0 1 A M93 11dx Next one checks this formula for a simple random vari able X ie one of the form X 2 Gulf where Al An E 3 are disjoint and 01 an E R Then u 2 6aAl6u1 ZPU1Antand am 2 hian Aih01 Al if he ux Then proceed as one does when constructing the abstract integral C Remark 57 At this point you should try and compute the pth moments if well de ned of the distributions of Examples 1 1571 17 C De nition 58 Pearson 1893 The variance and the standard deviation of the random variable X are respectively defined as VarX X X2 and SDX xVarX If XY is a random variable1 then the covariance and correlation between X and Y are respectively defined as CovXY X X A Y Y and X Y CovX Y SDX x SD C Lemma 59 Computational Formulas Provided that they edist VarX X2 lt X2 and CovX Y XY X A Y 5 5 Furthermore if X 2 0 a3 then for any 1 2 1 30 X 7 p AP IHX 2 A dA 5 6 0 In particular 2 33 X 2 n S X S E 33 X 2 n n 0 711 3 Independent Random Variables De nition 510 The events E1 En are said to be independent2 if E1 0 m0 En Hg 33Ej The random variables X1 Xn 9 gt W are said to be independent if for all Al An E Rd X 1Al er1AnE S are independent Equivalently X1 Xn are independent if TL om EAI XneAnH2PXeA 5 7 31 for all measurable Al An An arbitrary collection a E I of events is independent if for all i1 2 E I Eh Ein are independent An arbitrary collection Xig i E I is independent if for all il in E I X51 Xin are independent random variables Remark 511 A word of caution is in order here One can construct random variables X1X2X3 such whenever i j X and Xj are independent but X1X2X3 are not independent C 1Here this means that X Y 2 gt R2 is Bore measurable 2The definition of independence VVVV also known as statistical independence VVVV is due to de Moivre 1718 43 Lemma 512 An equivalent de nition of the independence of random variables is the following X1 Xn are independent if for all nonnegative measurable functions on bn W gt R TL TL 8 Hum Hammn 58 31 j1 Consequently if X1 Xn are independent then so are h1X1 hnXn for any Borel measurable functions hl hn R gt R Proof We will only prove the first assertion the second being a ready consequence of the first When the bfs are elementary functions this is the definition of independence By linearity in each of the bj s the above display remains to hold when the bj s are simple functions Take limits to obtain the full result C We will next see that assuming independence places severe restrictions on the restrictions on the random variables in question But first we need a definition De nition 513 The sigma algebra generated by Xig E i E I is the smallest sigma algebra with respect to which all of the Xi s are measurable it is often written as 0Xigi E I We say that a random variable Y is independent of 006 i E I when Y is independent of Xig i E I Equivalently we might say that 0Y and 0X i E I are independent The tail sigma algebra CT of random variables X1X2 is the sigma algebra CT fdan Xn1 C Lemma 514 Let X and Y denote two topological spaces Then i For all random variables X 9 gt X 0X X 1A A E X 5 9 ii If X1X2 are random variables all taking values in X then a Y valued random variable Y is inde pendent of 0X1X2 if and only if Y is independent of X1X2 iii If Y 1E is independent of X1X2 1F for all E E A and all F E giwhere A and 9 are subalgebras that generate Y and X Q respectivelyithen Y and X1X2 are independent The following tell us that our definitions of independence are compatible Moreover the last portion implies that in order to prove that two real valued random variables X and Y are independent it is necessary and suf cient to prove that for all 53 E R X S cY S y X S xY S y The proof of the above is relegated to the exercises Instead we turn to the following strange consequence of independence Theorem 515 The Kolmogorov ZeroOne Law If X1X2 are independent random variables then their tail sigmaalgebra CT is trivial in the sense that for all E E 7 E 0 or 1 Consequently any T measurable random variable is a constant as Proof Our strategy is to prove that any E E 27 is independent of itself so that E E D E EE 33 E2 But this is easy to prove Since E E T it follows that E is independent of 006 4 n1 Moreover because this is true for each n E is independent of the smallest sigma algebra 44 that contains UnaX1 1 n4 this is written as VnaX1 1 74 In other words E is independent of all of the Xi s and hence of itself To conclude this proof suppose Y is CT measurable we intend to prove that there exists a constant c such that Y c 1 Since Y Y Y we can assumeiwithout loss of generalityithat Y 2 0 a s in which case Y exists but could be infinite By replacing Y by Y n and letting n gt 00 we can assume without loss in generality that Y is also bounded Let c Y this is a positive and finite number and for all 5 gt 0 Y S c5 is 0 or 1 But c Y 2 YY gt c5 2 c533 Y gt c5 This shows that for all 5 gt 0 Y S c 5 1 Hence Y 3 c a s why Similarly one proves that Y 2 c a s and thus Y c a s C Example 516 A Very Weak Law of Large Numbers Suppose X1X2 are independent and define Sn X1 Xn Then the following are almost surely constants lim supn n lSn and liminfn n lSn Furthermore limn n 1 5 exists 0 or 1 and if this probability is 1 then the said limit is also a constant almost surely C Our next result states that independent random variables exist Theorem 517 Given probability measures 111112 fall on RdRdithere erist independent random variables X1X2 fall on a suitable probability spaceisuch that the distribution of X is m for each i 1 2 Proof Without loss of too much generality we will assume that d 1 This is for notational convenience only For every integer n 2 1 let 9quot Rquot 3 quot Rquot and quot m x x in Clearly 11quot is a consistent family of probability measures By the Kolmogorov extension theorem Theorem 3 17 and the accompanying Remark 3 19 there exists a probability measure 33 on RxRx that extends 111112 Finally define for all u E R and all integers i 2 1 Xiw a Since X51 E5 TPX E E 111E and X1 E E1 Xn E En 11quotX1 1E1 x x erlEn H1 jEj so that the Xi s are independent and have the asserted distributions The extension to the case of PM if obtained by letting instead 9quot midquot etc Corollary 518 Independent L2valued random variables in R are uncorrelated Proof Suppose X and Y are independent real valued random variables both in 1123 By the Cauchyi BunyakovskyiSchwarz inequality Corollary 2 25 XY g X2 Y2 lt 00 Moreover by Theo rem 5 17 XY X Y Thanks to Lemma 5 9 CovXY 0 which means that pXY 0 C We conclude this section with an important corollary of Corollary 5 18 Corollary 519 If X1 Xn are uncorrelated real random variables in 193 then TL Var X1 v v v J n ZVarXj 5 10 j1 In particular the above holds if the X s are independent 45 Proof We can use induction on n to reduce it to the case n 2 In this case we can use Lemma 5 9 to deduce that wax Xg a X X92 ax X22 5 11 VarX1 VarlX2 23le X2 The result follows from this D 4 The Weak Law of Large Numbers The following socalled weak law of large numbers is an important consequence of Corollary 5 9 3 While the weak law will be completely subsumed by the forthcoming strong law of large numbers it gives us a good opportunity to learn more about the Markov and Chebyshev inequalities as well as the truncation method quot Throughout this section X1 Xn are independent and identically distributed i i d real valued random variables and SnX1mXn V7121 512 Theorem 520 The Weak Law of Large Numbers Khintchine 1929 Suppose X1 X2 are lid realvalued random variables all in 133 Then as n gt gt0 S Iquot L X1 5 13 Proof We prove this in two instructive steps Step 1 The lgCase This is a particularly simple result when X1X2 are assumed to be in 1123 Indeed note that the evpectation of n lSn is X1 Thus Chebyshev s inequality Corollary 2 16 implies that for all 5 gt 0 i VarXj 5 14 j1 5X1 25 The last equality is a consequence of Corollary 5 19 Since the Xj s are identically distributed they have the same variance Therefore E E rquot X r 5X125gLig 515 and this goes to zero as n gt gt0 thus proving the corollary when the Xi s are in 193 Step 2 The General Case When the Xi s are assumed only to be in 1113 we use a truncation argument For i g n and given a fixed but small 6 E 01 define Xi Xil xi 67 For each fixed at X13 an are independent 3The rst weak law of large numbers was proved in Bernoulli 1713 for 1 random variables See Adams 1974 for a fascinating account of the classical history of this theorem 46 identically distributed random variables furthermore since Xm 3 6n is bounded they are all in 1123 Thus writing Sn X13 an we have ES 3 Var X 3 ELY X1ngza7quot sTyh a 16 thanks to Lemma 5 9 On the other hand in S X Xi S n6 3 n6X11 This yields ES 3 6 2P 3quot X1n2e 511x311 5 17 Now X n Xi X 2 n6 3 n6 1 X1X1 2 m5 ma 15736 thanks to the Markov inequality Theorem 2 15 Therefore by finite subadditivity of measures 6716 6 2F 3239 s n Xi 2P m s inm Xi lt5 18gt Since X11 lt 00 the dominated convergence theorem Theorem 2 22 guarantees that limn 8 0 In particular there exists N6 such that for all n 2 N6 8M 3 6 Subsequently for all n 2 N6 ZPBi n Xi 6 5 19 Finally by Jensen s inequality Theorem 2 27 Z S Z so that for all n 2 N6 15X1n X1l l X1 1X11 gt W S 8 1X1 1331 2 m5 5m S 6 5 20 Therefore for all 5 gt 6 gt 0 and n 2 N6 6X1 25 g 2 39u3j Sn Xmexj I S I 5 21 g 3quot eX1n 25 6 we 9 Xi Xj Consequently by 5 17 and 5 19 for all n 2 N5 ES E 6 Ef J 1E25SW 1HI6 a 22 Let n gt 00 and then 6 l 0 to prove the theorem C 5 The Kolmogorov Strong Law of Large Numbers We are ready to state and prove the strong law of large numbers so named because it is a stronger theorem than the weak law of large numbers Theorem 5 20 Throughout this section X1 Xn are i i d recall that this means independent and identically distributed random variables taking values in R We will write as before 5 X1 Xn for the partial sum process 47 Theorem 521 The Kolmogorov Strong Law of large Numbers If the Xj s are in 1113 then at most surety S n 73326 g X1 0 23 Conversely if lim supn n 115n lt 00 gt 0 then the Xj s are in 1113 and the strong low 523 holds Remark 522 In fact independence can be relayed to the weaker pairwise independence This extension was made by N Etemadi who also devised an alternative proof that is of independent interest cf Etemadi 1981 C There are many proofs of this fact I will discuss the one that I regard as most informative This proof relies on two key technical results either of which is interesting in its own right Theorem 523 The Borel Cantelli Lemma IfAl Ag are events and if 22 33141 lt 00 then 22 1A lt 00 as The converse holds if the 917 s ore pairwise independent Proof By the monotone convergence theorem Theorem 2 21 00 N N 00 Z M M Z No 15 Z 6 Z a 24 711 711 711 711 In particular whenever Zn 33141 is finite so is Zn 14 But any nonnegative random variable that is in L163 is a s finite why thus Zn 14 is finite almost surely The converse is more interesting Suppose that 22 33141 00 and that the Aj s are pairwise independent Let Z N 23 1A for simplicity and note that by Lemma 5 9 for any N 2 1 N N N VarZN Z VarClAn Z 2PM 1 2mm 3 2 2pm ZN 5 25 We use this together with Chebyshev s inequality Corollary 2 16 to see that VarZN 1 58 Z v lt lt f M gnaw 528m In particular as N gt 00 L 1 Why Since 22 33141 T 22 33141 and this is assumed to be infinite we have shown that as N gt 00 ZN converges in probability to 00 ie for any A gt 0 limNm ZN gt A 1 Consequently 22 14 00 almost surely as promised C r z 2N 5 26 We are now prepared to prove the second half of Theorem 5 21 Proof of Theorem 521 Necessity of The L13 Condition Suppose that the Xj s are not in 1113 ie that X1 00 We will show that this implies that a s lim supn n 1Sn 00 According to 5 6 of Lemma 5 9 for any X gt 0 00 90 71 00 K1 1X11 ZPX12 KAd 2 X12 who 0 711 quot 1 so 00 a 27 S ENDS 2 K1 1 91th 2 Kn 711 710 48 Consequently by the independence half of the Bore17Cantelli lemma Theorem 5 23 with probability one Xn 2 Kn for in nitely many n s Since 5 2 Xn Sn1 then either Sn1 2 in nitely often or else for all but a finite number of n s Sn 2 Xn In any event it follows that with probability one 5 2 for infinitely many n s In particular there exists a null set NK such that for all 4 Q NK lim supn n 1Snw 2 On the other hand N UIZV39K is a null set being a countable union of null sets why Furthermore for all a g N lim supn n 1Snw 00 almost surely as was to be shown C The second part of the proof of the strong law of large numbers is a mammal lginequality of A N Kol mogorov Although they may be new to you at this point maximal inequalities are some of the fundamental facts in probability and analysis Theorem 524 The Kolmogorov Maximal Inequality 1 5 X1 A A A Xn and if the Xj 3 are independent not necessarily Md and in 1123 then for any A gt 0 and all n 12 n 5 28 535 st ask 2 A 3 Remark 525 If makan Sk Sk were replaced by 15 Sn the resulting weaker inequality would follow from the Chebyshev inequality Corollary 2 16 C Proof Without loss of any generality we may assume that Xi 0 for otherwise we can consider Xi Xi in place of X Note that Sn n X1 0 in this case For any k 2 2 let Ag denote the event that Sk 2 A but that 511 lt A for all 6 lt k In symbols k l At 151A 2 A 0 15A lt A 5 29 131 Thinking of the index k as time At denotes the event that the first time the random process k e gt St leaves AA occurs at time k Since the Ak s are disjoint events and since 52 2 0 53 2 8 sink 8 Sn 5k taxman k2 131 n n 5 30 2 26 Susi Skleflk Easiest 131 131 For any 4 E Ah Elihu 2 A2 Hence n n 859 2 28 Susn so Ar A2 22m 131 131 5 31 n 2H a Susi mat A2lr gn1Xk12 A 4This result together with its history is wellexplained in Kohnogorov 1933 for instance 49 The event Ah and the random variable Sh both depend measurably on X Xk whereas 5 Sh Xk J n is independent of X Xk Consequently Sn Se is independent of Skit Lemma 5 12 from which we get Sn SkSkAk Sn 513 x SkgAk 0 since Sn Sk 0 This proves the result C Proof of Theorem 521 Throughout we may assume without loss of generality that X1 0 for otherwise consider Xi Xi in place of X The proof of the strong law simplifies greatly when we assume further that X1 E 1123 This will be the first step The second step uses a truncation argument to reduce to a problem about Lg random variables Although we do not need the first step to go through the second it is a simpler setting in which one can introduce a third important technique Blocking andor subsequencing Step 1 The lgCase If the Xj s are in 1123 then by the Kolmogorov maximal inequality Theorem 5 24 for all n 2 1 and 5 gt 0 5 32 jmax151e125nlt7 7 13kg 2 2 since S VarSn 22 VarXj n X cf Lemma 5 9 Now replace n by 2quot to see that x 11m 4 2 39 Zl gggnw 252quot S 1 2752 ltoc a 33 n n By the Bore15Cantelli lemma Theorem 5 23 with probability one for all but a finite number of n s mavlgkggn Sk S 52quot Now any integer m can be sandwiched between 2quot and 2quot for some n Thus Sm S max1ltklt2n1 15k which is with probability one eventually less than 52quot 3 25m We have shown that for any fixed 5 gt 0 there exists a null set N5 such that for all u g N5 limsupmm 1Smw S 5 Let N U5 QZV5 and note that thanks to countable subadditivity of 33 N is a null set Moreover for all a g N limm3Q m 1Sm 0 which is the desired result in the Lg case Aside At first glance it may seem strange that we applied the BorekCantelli lemma to the subsequence 2quot rather than n However n clearly would not work since the probabilities in question would not be summable The reason for this is that in order for the suf ciency part of the proof to match with the necessity part we need events that are nearly independent The point of the proof is that while mavlgkgn Sk and maxlgkgn Sk are not close to being independent since 5 and Sn are highly dependent the random variables mavlgkggn Sk and maXISkSQnsH Sk are close to being independent Step 2 The IllCase As in the proof of the weak law of large numbers Theorem 5 20 we truncate the Xi s However the truncation levels are chosen more carefully For all i 2 1 define X Xle Si and let S X XT L Since 5 is a sum of 239 independent though not i i d random variables by the Kolmogorov maximal inequality Theorem 5 24 for all n 2 1 and 5 gt 0 ZP max is a5 2 lt 534 VarSL 1937 n quot 50 Furthermore VarSL 22 VarXJ S 22 JY2 22 ng XI 3 j Thus Xize 1X1 S 139 131332 47152 90 i i so Z max Z 52quot g 1 n 121 n21 g2l On the other hand for any a gt 0 Z 4 quot g g4 Thus 712 36 i i 4 giXiz 1X1 S j 4 2 2 Z S 523252HSQZj 2 Q5 X1 J 036 n1 321 32 A But for any a gt 1 Zijj g S qr du cc 1 1 5 1 cc 1 l 3 25 1 On the other hand if a E 011 then 232x j 2 S 211 3 2 In other words DMZ X j 2 g 21 x1 gt1 21 x1 9 2 This yields 30 8 22 max gsg asg 25 g 32 X1lt 00 5 37 711 9 131332quot Now use the proof of Step 1 to deduce that limn3Q n46 S 0 almost surely But the fact that X1 is mean zero implies that SL Z X1X1 S j Zg X1X1 gt 3 g 2 X1X1 gt j In particular since limj X1X1 gt j 0 cf Theorem 222 we have limn3Q n 1 SL 0 It suf ces to show that I lim 75quot Squot 0 a s 5 38 n mo 71 To prove this note that 30 30 a Z 1w gt1 22Hle 21 1X11lt 00 5 39 j1 j1 cf Lemma 5 9 Consequently with probability one for all but a finite number of j s HQ 3 j and this means that supn Sn 5 lt 00 a s Why This proves 5 38 whence the strong law 6 Five Applications I will conclude this chapter by applying several of these ideas to five applied problems The first three of these topics are fundamental scientific discoveries and are at the core of the theories of approximation information and empirical processes respectively Topics four and five are younger and at this point in time less fundamental quot I have included them here since they are both elegant as well as natural starting points for learning more about some of the recent discoveries in both discrete mathematics and numerical integration 51 61 The Weierstrass Approximation Theorem The Weierstrass approximation theorem is one of the fundamental approximation theorems of analysis It states that every continuous function on 01 say can be uniformly approximated to within any given 5 gt 0 by a polynomial We now use the proof of the weak law Theorem 5 20 to prove the said Weierstrass theorem this proof is due to S N Bernstein Theorem 526 Bernstein 1913 For any continuous function f 01 gt R de ne the Bernstein poly nomial n f by gum xj1 xquot jf v e 01 5 40 Then nf is a polynomial of order n and limn wn p fp 0 uniformly in p E 01 Proof We start with a bit of undergraduate probability If X1X2 Xn are i i d with X1 0 1 X1 1 1 p where p E 01 is a fixed number then 5n k EMU pquotquot6 for k 0 n and 5n k 0 otherwise here as before 5 X1 X and this is the socalled binomial distribution It has the following statistical significance If Xj s are the results of successfailure iid trials with Xk 1 if and only if the kth trial resulted in a success then 5 denotes the total number of successes in this random experiment and its distribution is binomial Since f is uniformly continuous for each 5 gt 0 we can choose 6 gt 0 such that for all pq E 01 with p q S 6 we have fp fq S 5 Now write nfp fAn where An n lSn denotes the average number of successes We decompose this as nfp EULA An p S 5 f 4ni lAn p gt 6 T1 T By the Chebyshev inequality Corollary 2 16 T2 3 KAn p 2 6 S K6 2V39arAn where K supx fc But by Lemma 5 9 Vaan n gVarLSn n lVarX1 114141 11 S 470 1 Consequently we see that E 3 K 471 uniformly in p E 01 It is also clear that 7 mi 3 in fa 2min m s 6 Kl 4n 12 5 S 5 lt5 41 since Tl fp ZP llAn pl S 5 fl 4nl fpe An pl S 51S lfl 4nl fpl Anpl S 5 Since our bounds on both T1 and T hold uniformly in p E 01 this concludes the proof C 62 The Shannon Entropy Theorem In 1948 C Shannon proved that the thermodynamical notion of relative entropy plays a key role in communication theory see Shannon 1948 Shannon and Weaver 1949 Consider a finite alphabet A 01 am and the probability measure 11 2 15603 defined on the power set of A where 6 denotes the point mass at a E 9 Thus 11E 2 1Eojpj for all E g A Equivalently if X is a random variable on some probability space 03 with distribution 11 then X Oj pj for all j 1 m De nition 527 Any element 0 of A is a symbol or letter A word u uh un of length n is a vector of n symbols Let Wu denote the collection of all words of length n and define m counting functions 8 8 Wn gt N by 82111 22 lm wk 6 1 m u E W That is 87112 is the number of times the symbol cu appears in the word u 52 De nition 528 Fix a sequence A1g gt 0 such that limn n IAn 0 Then de ne the n letter word 11 E Wu to be A typ39ical if for all 6 1 m we have le w 31111 S An Otherwise w is said to be A atyp39ical C In other words a word 11 is typical if the proportion of times that a appears as a symbol in w is p give or take a negligible amount Theorem 529 Shannon 1948 For any red sequence An gt 0 such that t n IAn gt 0 and it lim supn 311 lt 4m Also de ne Tn to be the number of Atyp39ical words of length n Then 1 73326 510217 7113 Hp 0 42 where log2 is the base2 logarithm and Hp 27 mlog m is the relative entropy of the sequence pp1 pm Proof Let X1X2 denote i i d random variables with distribution 11 all de ned on some probability space 9 3 33 Write Wn X1 Xnithis is a randomly sampled word of length niand de ne pnw Wn w X1 111 Xn wn V112 E Wu 5 43 This is the probability of ever sampling the word 11 But then I I m I I 2 mm ag 1 m 70V 71mg An 3 22 70 m gt An 5 44 11 w 39Wn w is Aratyp xcai Moreover 87Wn 22 lgjwl is a sum of n iid random variables with mean p1 and variance 1111 pl 3 37 Thus by the Chebyshev inequality Corollary 2 16 mn lim sup E pnw 3 lim sup 6 lt 1 5 4o n mo mew n mo 4A w is Aratgpical 6710 m On the other hand for any 11 E Wu pnw H21 pl equivalently log2 pn 112 Z 1 821 log2 11 Thus whenever w E Wu is A typical then ln l log2pnw Hp g Kn lAn where K 221 log2 p1 and Hp 27 p log2 p13 is the relative entropy of p Equivalently still for all x typical w E W film KM Emmy S 2nHpKAn 5 46 To complete this proof note that 1 2 Pull Z Pu 5 47 w Artypical w Aratyp xcal Thus by the rst inequality in 5 46 1 2 Zwktypicmpn y Z 2quotHltP KnTn which implies that lim supn quot 1 loglenlAll S Hp This is half of the result For the other half we can combine 5 45 with the second inequality in 5 46 Namely for all 5 gt 0 there exists n such that for all n 2 n5 Z10 Aratypicalpn S 6 5 Hence by 5 47 for all n 2 a 21013typgcalpnW Z 1 lt5 5 If we Choose 5 such that 1 5 gt 0 then by 5 46 Tn2quotHlt Kquot 2 1 6 5 which has the desired effect C 53 63 The Glivenko Cantelli Theorem De nition 530 If X1X2 denotes a sequence of i i d random variables all with distribution function F one can form the empirical distribution function Fn by 1 quot Fm a Z 1mg V e n a 48 131 The following is a remarkable theorem of V I Glivenko and F P Cantelli in statistical terms it presents a uniform approximation to an unknown distribution function F based on a random sample from this distribution Theorem 531 Cantelli 1933 Glivenko 1933 With probability one 1 I lings 31xpltx Fnc 0 a 49 Proof Let us dispose of all measurability issues first Since Fn and F are right continuous supx k Fnc F 5 supx 01Fnc F 5 Being the supremum of denumerably many random variables it follows that supx Fnc F 5 is itself a random variable Now note that for each fixed 5 nFnc is a sum of n i i d random variables with mean nFc and variance nFc1 F 5 Thus by the Kolmogorov maximal inequality Theorem 5 24 for any 5 gt 0 n 2 1 and a E R 33 2 3122an EFAx gt 5 S 33 lsrlnsa122 1 EEFAx gt 52quot 2quot HF5 1 1 S 524n1 S 522n3 5 50 I have used the fact that for any number F E 01 F 1 F S 37 But F is nondecreasing right continuous Foc 1 and F oc 0 Therefore we can always find a sequence of values cm1 lt 5 lt 51 S 0 lt 51 lt 52 lt lt cam such that i Fcm S 5 ii Fxm 2 1 5 and iii for all j g m Fc Fcj1 S 5 Moreover supxj1Sxltxj 3Q 36 m 1 n2 32 WW WE gt 5 S lt 00 lt0 01gt By the BorelECantelli lemma with probability one max EFij g 5 for all but nitely many n s 5 52 max 1 sum raglst But if a E ccj1cj then for all 6 as above Fitz 5 3 F185 5 S Fcj S Fc S Fcj1 5 3 Flcj1 25 3 FM 25 In other words this shows that with probability one Fitz g 25 for all but nitely many n s 5 53 sup max xmltxSxm was 54 On the other hand if a gt cm then Fc 2 Fxm 2 1 5 and F185 2 Fxm 5 2 1 25 Therefore for such values of 5 F x Fc S 1 Fc 1 Flc S 35 Similarly if a lt 5 m Fc 17185 S F cam F1 cam S 35 Consequently with probability one s A El F lt5 f 11 39lr quot4 Pingcwl 93 5 3 or a but nite 3 man n s a a Let N5 denote the null set off of which the above property holds Then U5 QN5 is a null set off of which the theorem holds C 64 The Erd s Bound on Ramsey Numbers De nition 532 The complete graph Km on m vertices is a collection of m distinct vertices any two of which are connected by a unique edge The nth diagonal Ramsey number Rn is the smallest integer N such that any bichromatic coloring of the edges of KN yields a Kn g KN whose edges are all of the same color C In other words if Rn N then no matter how we color the edges of KN using only the colors red and blue then somewhere inside KN there exists a Kn all of whose edges are either blue or red and N is the smallest such value Ramsey 1930 introduced these and other Ramsey numbers to discuss ways of checking the consistency of a logical formula In particular he proved the nontrivial fact that for all n 2 1 Rn lt 00 It is intuitively clear that as n gt gt0 Rn gt 00 and one wants to know how fast this occurs This question was answered in 1948 by P Erdos using independent random variables as follows Theorem 533 Erdos 1948 For all n 2 3 Rn gt 2quot Remark 534 As far as I know the best known bounds are 11127 2 lt Rn lt 11 822quot for two constants A and B For instance the following is likely to be true but has no known proof 1 nl glg Rn E 332 exists a 00 This is a conjecture of Alon and Spencer 1991 equation 3 p 241 C Proof We will show that given any two integers N 2 n N 1 212k1 gt RngtN 5 56 If so we can then apply the above with N Zn2 and note that 3 27 22 nlg thus 321 13 3 2n2nl which is strictly less than 1 for all n 2 3 Hence it suf ces to verify the preceding display Consider a random coloring of the edges of KN i e if EN denotes the edges of KN then consider an i i d collection of random variables X e E EN where X i1 E g color a red if and only if X 1 The probability that any n given vertices form a monochromatic Kn is 21 13 where the extra 1 in the evponent is due to the fact that there are two possible colors Since there are many choices of these n vertices the probability that there erdst n vertices that form a monochromatic Kn is less than or equal to 21 13l This uses only finite additivity ie that 33H U B S 33151 333 1 We have shown that the 55 probability of having a monochromatic Kn g KN is strictly less than one That is to say that with positive probability there are no monochromatic Kn s in KN In other words there are bichromatic colorings of KN that yield no monochromatic Kn g KN ie Rn gt N C 65 Monte Carlo Integration Suppose we were to find or better still approximate the value of some integral L full ltgtI dx where d Rquot gt R is a Lebesgue integrable function that is so complicated that the integral L of interest is not explicitly computable One way to proceed then is to first pick i i d random variables X1 XN all chosen according to the uniform measure on 0 1quot i e each Xj is pick uniformly at random from the hypercube 0 1quot By de nition for any j 1 N ltgtXj L Since gtX1 ltgtXN are ii d random variables with expectation L the Kolmogorov strong law of large numbers Theorem 5 21 insures that for n large N 1 2 gtXj is close to the desired integral More precisely that with probability one N 1 113 w 33 M Ir a on This so called MonteCarlo integration works well in comparison to other numerical integration methods when the dimension n is large 7 Exercises Exercise 51 Prove Lemma 5 9 HINT For 5 6 use the FubiniiTonelli theorem in conjunction with the fact that for y 2 0 measurable MWWX Z d 8U 9A1ux1 d Exercise 52 Verify the claim of Remark 5 11 by constructing three random variables X1 X2 and X3 such that X1X2 X1X3 and X2X3 are independent but X1X2X3 are not independent HINT Consider X1 i2 with probability each an independent X2 i1 with probability each and X3 XI x Xg Exercise 53 Prove Lemma 5 14 Exercise 54 Prove the oneseries theorem of Kolmogorov 19305 If X1X2 are independent mean zero random variables taking values in R and if Zj Xf lt 00 then Zj Xj converges almost surely Use this to prove the following beautiful theorem of Steinhaus 1930 The random harmonic series converges with probability one That is if 0102 are i i d random variables taking the values i1 with probability each then Zj j laj converges almost surely HINT First show that 5 2 Xj is a Cauchy sequence in 1123 Then use the Kolmogorov maximal inequality Theorem 5 24 along a suitable subsequence 51h fact this is preceded by the stronger Muteseries themem which characterizes when Ej X3 converges a s Cf Khintchine and Koimogorov 1925 56 Exercise 55 In Lquot the strong law becomes easier to prove Suppose X1X2 are iid random variables with mean zero and variance one and let 5 X1 i i i Xn If in addition X11 lt 00 then show that there exists a constant A such that for all n 2 1 Sn1 S AV Conclude the strong law under the given L39l assumption from this the BorelCantelli lemma and Chebyshev s inequality alone You may not use the Kolmogorov maximal inequality in this exercise Exercise 56 Prove the Paleinygmund inequality Paley and Zygmund 19326 For any nonnegative ran dom variable Y E 1123 and for any 5 gt 0 Y gt 5 Y Z 1 52M am a as Now suppose that E1E2 are events such that 2 Ej 00 and that f L 2P E0E a lim inf Z 1 Z lt 00 5 59 0 23 HEA Then show that 33 Zj 15 00 2 7 1 gt 0 Verify that this improves the independence half of the BorelCantelli lemma Theorem 5 23 HINT For the first part start by writing Y Yg Y 3 a YgY gt a for a suitable number a Exercise 57 Suppose X is uniformly distributed on 01 i e its distribution is the Lebesgue measure on 0 1 Write in binary X Z ij and then show that X1 X2 are i i d Find their distribution and show that with probability one limn 11 15 13 where 5 22 1xj1 is the total number of 1 s in the first n digits of the binary expansion of X This is the normal number theorem of Borel 1909 Exercise 58 Hard The classical central limit theorem Theorem 6 26 below states the following If X1X2 are ii d random variables with EX1 0 and VarX1 1 and if 5 X1 i i i Xn then for any real 1 lt b e x22 do 5 60 27f nligchaV SSnbe b f Use this without proof to derive the following7 As n gt 00 r b x22 W3 9 e gt d 61 x27r x a We will prove this in successive steps 65ince its rst appearance this has been independently rediscovered by several others for example see Kochen and Stone 1964 7It has been shown that if in addition there exists a e gt 0 such that X1 6 finial then 5 60 holds almost surely The resulting convergence is called an almostsure central limit theorem it was discovered independently and at the same time in Brosamler 1988 Fisher 1987 and Schatte 1988 for a related remark see Levy 1937 p 270 Lacey and Philipp 1990 devised an ingenious proof of the almost sure central limit theorem their method is suf ciently robust that many of the conditions of this exercise including the independence of the increments can be weakened The best possible improvement of the condition X1 6 12 is that lnln X1 00 see Berk s et al 1998 A remarkably general almostsure central limit result appears in Berk s and Cs ki 2001 For further references and related results see the survey article of Berkes 39 1998 57 Chapter 1 A CrashCourse in Measure Theory 1 Introduction Modern probability is spoken in the language of measure theory and to me this is where the connections between the two theories begin as well as end It is for this reason that I have tried to minimize the y quot ur h r fit that are typically found in probability texts At the same time it is difficult to imagine how one can try to understand many of the advances of modern probability theory without rst learning the requisite language As such we begin these notes with a few brief primer chapters on measure and integration The first part of these notes is self contained and the motivated student can learn enough measure theory here to use the remainder of the notes successfully However the present treatment may be too rapid and perhaps even too sparse for some My intention is rather to recall some facts and describe ideas that are needed in developing probability theory To this I should add that many of the said facts are typically not sufficiently well stressed in a standard book on measure and integration So it is best not to omit this rst part in a first readin Perhaps the best way to appreciate our need for using measure theory is to ask What is a random variable After all no matter how they may be de ned random variables are one of the central objects in probability and many of its applications Classically one thinks of a random variable as the numerical outcome of a random experiment Moreover each time we perform our random experiment we should obtain a realization of this random variable that may or may not be the same as the previous realizations This is far from an exact description and leads to various inaccuracies not to mention some paradoxes The modern viewpoint in rough terms is the following We have an unknown possibly complicated function X Each time we perform our random experiment e the evaluation X w of X at some point a in the domain of de nition of X where the point or realization w is selected from some predescribed probability measure or distribution This general description is not as complicated as it may seem and has the appealing property that it can be rigorously introduced As an example consider the unit al 0 1 and let WE denote the length of any E 9 01 more precisely CD is the Lebesgue measure on 01 Now consider the function 3 a a 1 4 1 m e 0 7 X i 0 m e 1 11 3 Note that the Clmeasure of the set of all u 6 01 such that X w 1 is the length of 0 which is g This is often written as X 1 g Likewise CPX 0 Viewed as such X provides us with a mathematical model for the outcome of a fair coin t s For instance if we observe an w such that X w 1 this describes having tossed heads Moreover such an event i e X 1 can happen with probability i e for one half of the w s in the of measure To study more complicated random variables we need to have a much deeper understanding of measures and measure theory Having said this let us begin with a formal description of aspects of the theory of measure 2 Measure Spaces Throughout let 2 be a set that is sometimes referred to as the sample space De nition 11 A collection 5 of subsets of Q is a sigmaalgebra if i Q E E ii it is closed under comple mentation ie if A E K then A E E and iii it is closed under countable unions 1e if 141142 6 S then US147 E E It is an algebra if instead of being closed under countable union it is merely closed under nite unions Of course sigma algebras respectively algebras are also closed under countable respectively nite intersections and they also contain the empty set Furthermore any sigma algebra is obviously an algebra but the converse is false The collection of all nite unions of subintervals of 01 is an algebra but not a sigma algebra Example 12 E 913 is a sigma algebra that is aptly called the trivial sigma algebra The power set of Q is also a sigma algebra Recall that the power set of any set is the collection of all of its subsets These are the two extremal examples Lemma 13 Hausdorff 1927 p 85 If is any set denumerable or not and if E is a sigmaalgebra of subsets of for each i E I then rugs is also such a sigmaalgebra Consequently given any algebra A there exists a smallest sigmaalgebra containing A De nition 14 IfA is a collection of subsets of 2 we write 7A for the smallest sigma algebra that contains A this is the sigmaalgebra generated by A Note that this is a consistent de nition since 7A 05 where the intersection is taken over all sigma algebras 5 such that A Q 5 Note also that this is not a vacuous intersection since the power set of Q is at least one such sigma algebra An important class of sigma algebras are introduced in the following De nition 15 If 2 is a topological space the open subsets of Q generate a sigma algebra 15 Q that is called the Borel sigmaalgebra of 2 Elements of a sigma algebra E are said to be Emeasurable or measurable with respect to 5 When it is clear from the context that E is the sigma algebra under study its elements are referred to as measurable If E is a sigma algebra of subsets of Q a set function 11 E gt 1L is said to be a measure on 23 if a 1113 0 and b given any denumerable collection AIA2 of disjoint sets 1 U An EMA 12 4 We emphasize that by virtue of its de nition 11 E E and not R measures are nonnegative Of course real valued often called signed or complex measures can be de ned just as easin De nition 16 Let 5 denote a collection of subsets of Q and let 1 be a set function on 2 Then 11 is said to be countably additive on S if for all disjoint 141142 fall in Sias soon as we have UnAn E S then 1 2 holds It is said to be countably subadditive on S if for all 141142 6 S such that UnAn E S then 11UnAn 3 Zn 141171 D The following is simple but not entirely obvious if you read the de nitions carefully Lemma 17 Countably additive set functions are countably subadditive De nition 18 If E is a sigma algebra of subsets of Q and if 11 is a measure on 25 then 9511 is called a measure space E Listed below are some of the elementary properties of measures Lemma 19 If 9311 is a measure space then a Continuity from below If Al C 12 C are all measurable then as n gt so we have MA T Um15 m b Continuity from above If A Z 12 2 v are all measurable and if MA lt 00 for some n then as n gt so we have 114n n 114m De nition 110 A measure space 9511 is a probability space if 119 1 In this case 11 is a probability measure If instead 119 lt 00 then 11 is a nite measure Finally it is sigma nite if there exist measurable 21 Q 22 Q such that UnQn Q and 11mm lt 00 D We often denote probability measures as C Q rather than 1112 The following result characterizes probability spaces that are based on a nite set 2 Lemma 111 Suppose Q 1111 wn is a nite set Then we can nd p1 pH 6 01 such that a 101 v v 39 pn 1 and b for all A Q 2 WA UHeap Conversely any sequence p1 pH 6 01 that has the property a above de nes a probability measure CD on the power set of 2 via the assignment 2w i1 n 3 Lebesgue Measure Lebesgue measure on 01 gives us a way of measuring the length of a subset of 01 To construct the Lebesgue measure then we rst de ne a set function in on half closed nite intervals of the form a b 9 0 1 that evaluates the length of the said intervals mltltab1gtb a 13 U Let A denote the collection of all nite unions of half closed subintervals of 0 1 It is easy to that A is an algebra We extend the definition of m to A as follows For all disjoint half closed intervals 11 In g 011 m 1 im g 14 This mE for all E E A Of course we need to insure that this de nition is consistent Consistency follows from induction and the following obvious fact For all 0 lt K b lt c maic maib1mbic 15 More importantly we have Lemma 112 The set function in is countabe additive on the algebra A That is whenever 141142 are disjoint elements of A such that UilAn E A then mUff1An 231 mAn If there are only nitely many nonempty An s this is an obvious consequence of 1 4 Proof Since 0114 and Ufgllz ln are both in A so is UffNAn Moreover co 1V7 00 m U An 2 mm m U An 16 n1 711 n1V It suffices to show that liman mUffNAn 0 Hence our goal is this Given a sequence of Bn L 137 all in A we wish to show that mBn L 0 Suppose to the contrary that there exists 5 gt 0 so that for all n 2 1 mBn Z 5 We will derive a contradiction from this Write Bn as a nite union of half open intervals viz Bn 0 13 by where 0 g a lt b g 1 Also recall that the Bn s are decreasing Choose some E a b so close to a that g a 5 2n1 kn Then the BH and CH 9 BH are close in measure where Cn Ukl Here is a measure of how close they are n n by m Uamcj 52ai ais lt17 j1 j1il In particular mCn Z mBn 52 2 52 But the Cn s are closed bounded and nonempty If we knew that they were also decreasing this and the Heine Borel property of 0 1 would together imply that nCn 7 13 which cannot be since CH 9 BH and BH L E This would give us the desired contradiction The only problem is that the C need not be decreasing So instead consider Dn 01 Cj The Bus are closed bounded and decreasing It suffices to show that they are nonempty But this is e Note that n n anDnUBn DE DnU UBjnDE DnuUBjCj 18 j1 j1 since the Bus are decreasing Therefore 5 g mBn g mDn 52 thanks to 1 7 This shows that mDn 2 52 so that DH 75 la and this completes the proof of countable additivity The rest is smooth sailing D Lemma 1 12 and the following result immediately extends the domain of the de nition of m to 7A the latter is obviously equal to 15 0 1 Theorem 113 F J y t 39 J J 1948 Suppose Q is a set andA denotes an algebra of subsets of 2 Then given a countably additive set function 1 on A there exists a measure 17 on QaA such that for all E E A 11E 17E Furthermore if 119 lt 00 then the extension 17 of 1 is unique and 17 is a nite measure with 179 119 This result is proved in 5 at the end of this chapter Note that the method of this section also yields the Lebesgue measure on R 0 1 etc To construct the Lebesgue measure on 011d where d 2 1 we proceed as in the case 1 1 except start by de ning the measure of a hypercube H321 ajbj E 011d aj lt atj g bj as H311 bj 1 Since the collection of all nite unions of hypercubes is an algebra that generates 15 0 11d we then appealias in the one dimensional caseito the Caratheodory extension theorem to construct the Lebesgue measure on 15 0 11d Further extensions to 0 11d Rd etc are made similarly It is also possible to construct Lebesgue measure on 15 Rd as a product measure 1y tuned Once we have Theorem 1 13 we can easin construct many measures on the Borel measurable subsets of Rd as the following shows This example will be greatly generalized in the next chapter where we introduce the abstract integral Theorem 114 Suppose f Rd gt M is a continuous function such that the Riemann integral fa da equals 1 Given ab 6 Rd with aj g bj for all g 1 consider the hypercube CW a1b1 gtlt gtlt adbd and de ne cm C m da 1 9 Then 1 uniquely extends to a probability measure on Rd 15 Rd and f is called the probability density function of 1 Proof Sketch Let A denote the collection of all nite unions of disjoint hypercubes of the type mentioned Then it is easy to that A is an algebra of subsets of Rd If A E A then we can write A U lCahbi an de ne m m l Cami 211C6mh 1 10 i1 1 One can check that this is a consistent de nition This is an ugly task but it is not too hard to do Now we proceed as we did when constructing the Lebesgue measure in order to show that 1 is countably additive on A Finally we appeal to Theorem 1 13 to nish D Below are some examples of measures that are important in applications I refer to these measures as distributions Example 115 The Uniform Distribution Suppose X E 15 Rn has nite and positive n dimensional Lebesgue measure mX Then the uniform distribution 1 is the measure de ned by 1A mA 0 X mX for all A E 511131 D Example 116 The Exponential Distribution Given a number A gt 0 fa Aexp a de nes a probability density function on M7R7 and the corresponding measure is the exponential distribution with parameter A D Example 117 The Normal or Gaussian Distribution Given two constants 11 E R and a 6 11h the normal or Gaussian distribution with parameters 11 and 7 corresponds to the density function fa exp 21a 11V Va 6 R 1 11 1 v 27m2 Example 118 The Normal or Gaussian Distribution in R Given a column vector of constants 11 E R and an n x n symmetric invertible matrix Q the normal or Gaussian distribution with parameters 11 and Q corresponds to the following density function For all a E R 1 N3 exp 1 1139Q 1a 11 1 12 4 Completion In the previous section we constructed the Lebesgue measure on 15 0 11d say and this is good enough for most people s needs However one can extend the de nition of m slightly further by de ning mE for a slightly larger class of E De nition 119 Given a measure space 9511 a measurable set E is null if 11E 0 The sigma algebra E is said to be complete if all subsets of null are themselves measurable and null When 5 is complete we also say that 9511 is complete We can always insure completeness viz Theorem 120 Given a measure space 9311 there exists a complete sigmaalgebra 539 2 E and a measure 11 on 9539 such that on E 11 and 1139 agree Definition 121 The above measure space 95 11 is the completion of 9311 D Proof of Theorem 120 Sketch Let AAB A 391 BB U An 0 B denote the set difference of A and B and for any two A and B de ne 5 z A g Q EIBN E 5 so that 11N 0 and AAB g N 113 In other words we construct 539 by adding in all of the subsets of null of 5 Step 1 539 is a sigmaalgebra Since ABAB393 AAB 5 is closed under complementation If Al Ag 6 5 then we can nd B1B2 E E and null N1N2 such that AiAB Q for all i Z 1 But UiAiA U B UiAiABi g UiNi and the latter is null thanks to countable subadditivity Thus 5 is a sigma algebra Step 2 The measure 1139 For any A E 539 de ne 1139A 11B where B E E is a set such that for a null set N E E AAB g N It is not hard to that this is well de ned i e does not depend on the representation BN of A Clearly 11 11 on S we need to show that 11 is a measure on 95 The only interesting portion is countable 8 addit ivity Step 3 Countable Additivitg Suppose 41AQ E 5 are disjoint Find B N E E as before and note that whenever j g i then Bi1 n B A1 u M n Aj u Ni 1 14 where Ni U is a null set De ne C1 B1 and iteratively de ne CH1 i1C1 Uv v vUCi The as are disjoint and thanks to the previous display Bi 0 C Q 2 in particular 11 B1 11 Bi Ci 11 Ci1 Since B1 C1 this shows that for all i 11Bi 116 we have used the fact that 11 11 on 5 By the disjointness of the Cj s and the fact that UjCj Uij we get Zj 1Bj 11Uij In other words 11 UjAj Zj 11 Aj and we are done D If m denotes the Lebesgue measure on 011 1 then we can complete l1d 15 01dm to obtain the probability space 0 11d 30 11d where 30 11d denotes the completion of 15 1V and is the collection of all Lebesgue measurable sets in 0 11d Likewise we could de ne 0 11d EON etc We have now de ned the Lebesgue measure ME of E C Rd for a large cl s of E Exer 1 5 shows that one cannot de ne ME for all E C Rd and preserve the all important translation invariance of the Lebesgue measure 5 Proof of Carath odory Extension Theorem The proof of Theorem 1 13 is somewhat long and relies on a set of ingenious ideas that are also useful elsewhere Throughout Q is a set and A is an algebra of subsets of De nition 122 A collection of subsets of Q is a monotone class if it is closed under increasing unions and decreasing intersections Lemma 123 An arbitrary intersection of monotone classes is a monotone class In particular there exists a smallest monotone class containing A De nition 124 The smallest monotone class that contains A is written as mcA and is called the mono tone class generated by A The following result is of paramount use in measure theory Theorem 125 The Monotone Class Theorem Any monotone class that contains A also contains 7A In other words mcA 7A Before proving this let us use it to prove the uniqueness assertion of Carathedory s extension theorem Proof of Theorem 113 Uniqueness Suppose there were two extensions 17 and 1 Clearly the collection 8 z E E 7A 1E E is a monotone class that contains A Thus 8 7A which is another way of saying that 1 and 17 agree on a D Proof of Theorem 125 Since 7A is a monotone 7A D mcA and it suffic 7 to show that mcA 2 7A the proof is nonconstructive First note that the following are monotone cl 39 8 81 E e 7A E393 e mcA 82 E E 7A VF E 7A EUF E mcA 1 15 Since 81 is a monotone class that contains A we have 81 2 mcA and this means that mcA is closed under complementation If we also knew that A Q 82 then we could deduce that 82 2 mcA which would imply that mcA is closed under nite unions and is therefore a sigma algebra This would show that mcA 2 7A and complete our proof However proving that A Q 82 requires one more idea Consider 83 z E E 7A VF E A EU F E mcA 1 16 Since it is a monotone class that contains A 83 2 mcA and in particular 83 2 A By reversing the roles of E and F in the de nition of 82 we can that 82 2 A as well and this is what we needed to prove D Proof of Theorem 113 Existence Optional Reading Producing a completer rigorous proof takes too much effort so I will outline a proof instead The main idea is to try and prove more Namely we will de ne in a natural way 17E for all E Q 2 This de nes a set function 17 on the power set 43 or 2 which maybe too big a sigma algebra in the that 17 may fail to be countably additive on 43 However it will be countably additive on 7A Now we ll in more details For all E g 2 de ne d3 d3 17E infZyEn x021 EjeAedeg U En 117 nl nl In other words the in mum is taken over all sequences E1E2 in A that cover E This ought to be a natural and appealing extension of 11 The proof proceeds in three steps Step 1 Countable Subadditivity of First one shows that 17 is countably subadditive on 43 In the jargon of measure theory 17 is an outer measure on 9amp3 Indeed we wish to show that for any AIA2 all subsets of Q MUSf An g 22 MA To this end consider any collection AM of elements of A such that for all n An 9 UfilAm By the de nition of 17 uid 3 2 C 2 1113397 On the other hand once more using the de nition of 7 n1 we that for any 5 gt 0 we could choose the AM s such that MAN 3 52 n MA This yields 17Uff1An g 5 22 MA which is the desired countable subadditivity of 17 since 5 gt 0 is arbitrary Step 2 17 extends 11 Next one shows that 17 and 11 agree on A so that 17 is indeed an extension of 11 Since for all E E A 17E 3 ME we seek to prove the converse inequality Consider any collection E1E2 of elements of A that covers E For any 5 gt 0 we can arrange things so that 221 ME 3 17E 5 Since 11 is countably additive on A ME 3 11Uff1 n g 23 ME 3 17E 5 Since 5 gt 0 is arbitrary Step 2 is completed Step 3 Countable Additivity We now complete our proof by showing that the restriction of 17 to 7A is countably additive Thanks to 10 Step 1 it suffices to show that for all disjoint AlA2 all in 7A 231 17An 3 1103114 With this in mind consider ME QVFEA E E F EnFn 11s Clearly M contains A Thus thanks to the monotone class theorem if M were a monotone class then 7A 9 M This shows that 17 is nitely additive on 7A which in turn implies that for any N Z 1 17U ilAn Z 17U39f1An 251 17An since the An s were disjoint Step 3 whence the Carathedory extension theoremifollows from this upon letting N T 00 Owing to Step 1 it suffices to show that the following is a monotone c EgmVFeA Eg EnF EnF3 119 This is proved by appealing to similar covering arguments that we used in Steps 1 and 2 D 6 Exercises Exercise 11 Prove Lemma 1 3 Exercise 12 Prove Lemma 1 9 HINT Any countable union can be expressed as a disjoint countable union Exercise 13 Prove Lemma 1 11 Exercise 14 Prove Lemma 1 23 Exercise 15 Prove that Lebesgue measure is translation invariant i e the measure of a A is the same as the measure of A provided a A and A are Borel measurable Here a A z y y E A Furthermore if ma denotes the Lebesgue measure on 0a 15 0a for a given a gt 0 prove that for all measurable A Q 0a KIA is in 15 0 1 and maA am1a 1A In other words prove that Lebesgue measure is also scale invariant Exercise 16 Suppose 9511 is a measure space Let Q be a set and let 5 denote a sigma algebra of subsets of Q39 If f 2 gt 239 is measurable show that 11 0 j 1 is a measure on QCE where no f 1A 11w E Q fw E 120 Exercise 17 In this exercise we construct a set in the circle S1 that is not Borel measurable As usual we can think of S1 as a subset of C S1 69 9 E 027rj 121 a Given any z 6mm em 6 SI we write z 11 if a 6 is a rational number Show that this de nes an equivalence relation on S1 b Use the axiom of choice to construct a set A whose elements are one from each equivalence class of SI 11 o For any rational a E 027r de ne Ac emA denote the rotation of A by angle a and check that if 16 6 027r 0 Q are distinct then AC 391 A5 125 d Let 1 denote the Lebesgue on 5133151 and show that MA is not de ned Lebesgue measure on S1 is de ned as m 0 f4 where m is the Lebesgue measure on 0 2n and f 0 2n gt S1 is an isometry of 1 20 for the de nition of m o f 1 HINT Note that S1 Uae072 qua is a countable disjoint union e Conclude that A is not Borel measurable Chapter 2 A CrashCourse in Integration 1 Introductlon We are ready to de ne nearly household terms such as random variables expectation standard devi ation and correlation To give a brief preview of what we are about to let me mention a A random variable X is a measurable function a The expectation 8X is the integral fX d of the function X with respect to the underlying proba bility measure a The standard deviation and the correlation are the expectations of certain types of random variables Thus in this chapter we will describe measurable functions as well as the abstract integral X I and with many of its salient features 2 Measurable Functions Throughout 2 311 is a measure space De nition 21 A function f 2 gt Rquot is Borel measurable if for all E E 15 Rn fquot E E E Measurable functions on probability spaces are often referred to as random variables1 and written as X Y instead of g In the context of probability spaces measurable are often referred to as events D Since fquot E z w E Q ux 6 E f is measurable equivalently f is a random variable if and only if the preimages of all measurable under f are themselves measurable Example 22 An important example of a measurable function is the indicator function of a measurable set Indeed suppose A E E and de ne the indicator of A as the function 1 ifuv E A 1 2 1 AM 0 m 6 AC 1The correspondence between random variables and measurable functions as well as that of measure sp probability theory appears clearly in Frechet 1930 39 vs intuitive 13 We have already seen such functions in 1 1 Note that quite generally 14 2 gt R and consider I E for a measurable A Q R It is easy to that A ifleEbutOEEc AC ifOEEbutlEEc 2 if 01 6 E e if 01 6 Ec 12105 2 2 Since A E 5 it follows that 14 is measurable D Checking the measurability of a function can be a painful chore The following alleviates some of the pain most of the time Lemma 23 Suppose that A is an algebra of subsets of 15 Rn and that for all A E A j 1 A E S Then f 2 gt Rquot is measurable Proof Since A E A f 1 A E E is a monotone class the lemma follows from the monotone class theorem Theorem 1 25 D We can use the above to produce some measurable functions then Lemma 24 Suppose ff1f2 9 gt Rquot and 91Rquot gt R a Ifg Rquot gt R is continuous then it is measurable b Ifff1f2 are measurable then so are of for any a E R f1 f2 and f1 x f2 6 If f1f2 are measurable then so are supn fn infn f limsupn f and limian fn d Ifg and f are measurable then so is their composition gf Proof I will prove this to remind you of the flavor of the subject If g Rquot gt R is continuous then for all open G Q R gquot G is open and hence Borel measurable in Rquot Part a follows from Lemma 2 3 The functions 9a an and gay a y and gay my are all continuous on the appropriate Euclidean spaces So if we proved d then b would follow from a and d But d is an elementary consequence of the identity 9 o fquotA fquotgquotA It remains to prove c Let 5a supn fnw and note that for all a E R Squot ooa nfnquot ooa E 5 Consequently for all reals a lt y S 1 ay S 1 ooy S 1 ooa E E The collection of nite disjoint unions of of the form ya is an algebra that generates 15 R Therefore by Lemma 2 3 supn fn is measurable Apply part d to 9a a to that infn fn supn fn is also mea surable Now limsupn fn infk supngtk fn infk ht where hk supngtk fn Since we have seen that denumerably many supremum and in mum operations preserve measurability limsupn f and hence limian fn limsupn fn are also measurable D 3 The Abstract Integral Throughout this part 9511 denotes a nite measure space unless we explicitly specify that 11 is sigma nite We now wish to de ne the integral f 111 for measurable functions f 2 gt R Much of what we do here works for sigma nite measure spaces using the following localization method Find disjoint measurable K1K2 such that UnKn Q and MK lt 00 De ne in to be the restriction of 11 to Kn ie 11A MK 0 A for all A E E It is easy to that in is a nite measure on 23 Apply the integration theory of this module to 1m and combine the integrals fd fd n For us the details are not worth the effort After all probability measures are nite The abstract integral is derived in three steps 31 Elementary and Simple Functions When f is a nice function f d is easy to de ne Indeed suppose f CIA where A E E and c E R Such functions are called elementary functions Then we de ne f d 611A More generally suppose A1 An 6 E are disjoint m an E R and f z 2 1514 This is measurable by Lemma 2 4 and such functions are called simple functions For them we de ne f 111 221 ajiAj This is well de ned in other words writing a simple function f in two different ways does not yield two different integrals One proves this rst in the case where f is an elementary function Indeed suppose f 114 b1 5 clc where BC are disjoint It easily follows from this that a b c and A B U C Therefore by the nite additivity of 11 111A MAB cmC which is another way of saying that our integral is well de ned in this case The general case follows from this the next lemma and induction Lemma 25 Iff is a simple function so is f ff 2 0 pointwise then ffdu Z 0 Furthermore if fg are simple functions then for a b E R afbgdiafdibfdi 23 In other words for simple functions f gt gt f d is a nonnegative linear functional A consequence of this is that whenever f g g are simple functions then f d g yd In particular we also have the following important consequence f d g f 111 32 Bounded Measurable Functions Suppose f 2 gt R is bounded and measurable To de ne f d we use the following to approximate f by simple functions Lemma 26 Iff 2 gt R is bounded and measurable then we can nd simple functions n 12 such that in T f fn l f and fn g in 2 quot pointwise By combining this with Lemma 2 5 we can that o d g fn 111 g in d Tnym for all n 2 1 and o fde limn in d limn fn 1 exists and is nite 15 This produces an integral f 111 that inherits the properties of fn d and f d described by Lemma 2 5 That is Lemma 27 Iff is a bounded measurable function so is f ff 2 0 pointwise then ffdu Z 0 Further more if fg are bounded and measurable functions then for ab E R afbgdiafd bfd 24 33 The General Case We now define fd for any measurable f 2 gt IL note these functions are nonnegative To do so define fn minfn which is measurable thanks to Lemma 2 3 Of course 0 g fnw g fw and fnw T fw at every point a E Q In particular fn d is an increasing sequence thanks to Lemma 2 7 Its limit which could be 00 is denoted by f d and inherits the properties of the integrals for bounded integrands In order to de ne the most general integral of this type let us consider an arbitrary measurable function f 2 gt R and write f f f where fw maxfw0 and f w max fw0 Both fir are measurable Lemma 2 5 and if f 111 lt 00 then de ne ffdu jquotL d f d This integral has the following properties Proposition 28 Let f be a measurable function such that ffd1 lt 00 If f 2 0 pointwise then ffdu Z 0 Ifg is another measurable function with 9 d lt 00 then for ab E R afbgdiafd bgdy 2 5 Our arduous construction is over and gives us an indefinite integral We can get de nite integrals as follows For any A E 2 de Lfdy fflAdji 2 6 and this is well defined as long as If f d lt 00 In particular note that f d f d Remark 29 Sometime we write f w 11dw in place of f d D De nition 210 We say that f is integrable with respect to the measure 11 if f 111 lt 00 On occasion we will write fw 11dw for the integral f d This will be useful later when f will have other variables in its definition and the 11dw reminds us to integrate out the a variable D De nition 211 When 95 is a probability space and when X 2 gt R is a random variables we write 8X fX d and call this integral the expectation of X When A E E i e when A is an event we may write 8X A in place of the more cumbersome 8X14 or L X I D 16 4 Modes of Convergence There are many ways in which a function can converge We will be primarily concerned with the following Throughout 2 311 is a measure space and f f1 f2 2 gt R are measurable De nition 212 We say that fn converges to f 11 almost everywhere written 11 a e or a e if it is clear which measure is being referred to if 1LUEQ ligs pfnw fwgt0 0 2 7 In order to expedite the notation we will write f E A for w E Q ux 6 A and lf E A for 11f E A In this way fn gt f a e if and only if lfn 75 f 0 In the case 95 is a probability space and when X X1 X2 are random variables on this space then we say that X converges to X almost surely written a s in place of almost everywhere D One also has the following De nition 213 We say that fn gt f in Ll m if limnnm Hf fp 0 We say that fn gt f in measure if for all 5 gt 0 limnnwy fn f 2 5 0 When 95 is a probability spalt 39 when X X1X2 are random variables on this space we say that X converges to X in probability if limn Xn X Z 5 0 for all 5 gt 0 Sometimes we may write this as X 2 X D Here are how the notions of convergence are related Theorem 214 Either almosteverywhere a or LP a implies in measure We will soon that the converse is false The interesting portion of this result relies on Theorem 215 Markov s Inequality If f is a normeyative element of L1 11 then for all A gt 0 1 1 w 2 A s 7 My 3 inn 2 s A m A Proof Notice that f1 ffdu Z fUN fd Z Aidf Z A Divide by A gt 0 60 niSh D Corollary 216 Chebyshev s Inequality 2 For any 10 gt 0 f E Ll m and A gt 0 1 P 1 P mm 2 A s Em m 1113 Quint lt2 9 2Clielgtysliev was the rst to develop such concentration inequalities ee Chebyshev 1816 1867 Markov who was a student of Chebyshev at the time noted the full power of Chebyshev s inequa ty 17 Proof Since if Z A if1 Z a we can apply Markov s inequality to the function f to nish D Proof of Theorem 214 Thanks to Chebyshev s inequality LP lttonvergence implie39 convergence in measure since 1fn f 2 5 3 PM f 13 gt 0 To prove that a e convergence implies convergence in measure we need to understand a e convergence better Indeed note that fn gt f if and only if 00 00 00 1 1 lfn f2lt0 210 gnaw g lt Since 11 is continuous from above this is equivalent to the following For all 5 gt 0 lggouU lfn f125 0 lt2 11 nN But the above measure is clearly greater than or equal to lfN f 2 5 which implies convergence in measure Here are two examples to test the strength of the relations between the various modes of convergence The rst also introduces the Borel Steinhaus probability space which was the beginnings of modern probability theory Example 217 The Borel Steinhaus Probability Space Let Q 01 CD the Lebesgue measure on Q and E 2amp9 For all w 6 01 de ne n if 39 lt Xnw 0 m gt 2 12 where a gt 0 is xed Then Xn gt 0 almost surely in fact for all w E 2 but for p Z aquot XnHllj napquot is bounded away from 0 so a s convergence does not imply LP lttonvergence The trouble comes from the evident fact that supn Xn is not in L1 CP here Indeed if supn Xn1 we39re integrable then a s convergence would imply Lp lttonvergence thanks to the dominated convergence theorem cf Theorem 2 22 D Example 218 Let 95 be the Borel Steinhaus probability space of the previous example We will now construct a family of random variables Xn such that limn Xnw does not exist for any w 6 01 in particular Xn does not converge a s and yet limn Xan 0 for all p gt 0 This construction may look complicated on paper but is completely apparent once you draw a picture of the Xn s We de ne a triangular array of functions f Vi Z 1 j 3 2iquot as follows First let fmw 1 for all w 6 011 Then de ne 2 ifwe 05 0 ifwe 51 p i 2 13 f2quot 1 0 otherwise fuw 2 otherwise In general for all i Z 1 andj 1 2iquot we have f i1j12 z 1j2 z 11 Let us enumerate the f j s according to the dictionary ordering and call the resulting relabeling X k ie X1 f X2 f2 X3 f X4 f3 X5 f It is clear that for all w 6 011 limsupH00 Xkw 00 whereas lim infkaoo X km 0 In particular the random variables X1X2 do not possess limits at any w On the other hand note that fMHp i1 2 uquot Consequently Xk gt 0 in L1 i for all p gt 0 even though there are no a s limits D 18 5 Lebesgue s Convergence Theorems Proposition 2 8 expresses two of the most important properties of the abstract integral i Integration is a positive operation i e if f 2 0 then f d Z 0 and ii it is a linear operation i e equation 2 5 We now turn to some of the other important properties of the abstract Lebesgue integral that involve limiting operations You should recall that unless stated otherwise all measures are assumed to be nite Theorem 219 The Bounded Convergence Theorem Suppose f1 f2 are measurable functions such that supn fn is bounded by some constant K If f limnnm fn exists pointwise then 9 fndu Mu 2 14 Proof Since f s K pointwise and since 11 is a nite measure f is also integrable so that the integrals all exist Now x an 5 gt 0 and let En w E Q fw fnw gt 5 Then by Proposition 2 8 fdu indie s If fnldu lf fnldwr lf fnldu 2 1quot 133 t En S 5119 21910911 As H gt so 11En gt 0 cf Theorem 2 14 We can then let 5 i 0 to nish D Theorem 220 Fatou s Lemma If 1 is sigma nite then for any sequence of integrable nonnegative functions f1f2 liminf fn d g liminf fn d 2 16 naoo naoo Proof We rst prove this under the additional condition that 119 lt 00 Let gn inf fj and note that as n gt so we have 9 T f liminfk fk pointwise In particular for any constant K gt 0 f AK gnAK is a bounded measurable function that converges to 0 pointwise as n gt 00 Thus by the bounded convergence theorem Theorem 2 19 and since 9 g f limigffndjizligIgnAKd fKd1 217 11 It suffices to show that lim ffKd fd 218 KToo Now for any 5 gt 0 nd a simple function S such that i 0 g S g f pointwise ii there exists C gt 0 such that 5a g C and iii de11 Z ffdu 5 Now ff A K 111 Z fS A K 111 deji 2 ffdu 5 if K gt C This proves 2 18 and hence the result in case 11 is nite In the general sigma nite case for any 3Much of the material of this section was developed by Lebesgue 1910 Notable exceptions are the Levi monotone conver genoe theorem Theorem 221 of Levi 1906 and Fatou s lemma Theorem 220 of Fatou 1906 19 5 gt 0 we can nd a measurable set T such that 110quot lt so and fd Z f d 5 What we showed in the nite case shows that limian fn d Z f 111 Z f d 5 Since 5 gt 0 is arbitrary this completes our proof D Theorem 221 Levi s Monotone Convergence Theorem If 11 is sigma nite fn T f are all nonneg ative and measurable and iff is integrable then fn 111 T ffdji Proof We have fn d g f d and Fatou s lemma does the rest D Theorem 222 Lebesgue s Dominated Convergence Theorem Suppose 11 is sigma nite and f1 f2 is a sequence of measurable functions such that supm Um is integrable Then limn fn d limn fn 111 provided that limn fn exists Proof De ne 9 suan fj and note that as n gt so 9 i f limk fk Since 9 f is a sequence of nonnegative measurable functions that convergence down to 0 the monotone convergence theorem implies that g f d gt 0 Since 9 Z fn this shows that limsup fn d g fd 2 19 naoo For the converse de ne hn inf fj and note that f h i 0 Apply the monotone convergence theorem once more to obtain f d limn hn 111 g limian fn d Together with the preceding display this does the job D 6 LpSpaces Throughout this section 2 5 CD denotes a probability space We can de ne for all p E 0 so and all random variables X 2 gt R HXHp SHXV DW 2 20 provided that the integral exists ie that X1 is integrable De nition 223 The space UK is the collection of all random variables X 2 gt R that are p times Clintegrable More preci ly these are the random variables X such that X H1 lt 00 When we really need to know the probability space we may write LP 5 CD in place of UK D More generally if 9511 is a sigma nite measure space LP 11 will denote the collection of all measur able functions f 2 gt R such that Hpr lt 00 Next we list some of the elementary properties of Lp spaces It should be pointed out that the following properties do not rely on the niteness of 11 Theorem 224 If 11 is sigma nite then i For all a E R and f E Ll m afp a f and whenever fg E Ll m so is fg In particular Ll m is a linear space ii Holder s Inequality Suppose p gt 1 and de ne its socalled conjugate 1 by p 1 1 1 1 Then Hfglli S Hpr H911 Provided that f E 11 and E H 20 iii Minkowski s Inequality If fg E Ll m for some p Z 1 then Hf ng 5 Hpr Hng Proof It is clear that Hapr a jug On the other hand note that for ay E R ay1 g 21 a1 y1 Consequently Hf 9 g 21 f11j Hgng and this proves part i Holder s inequality holds trivially if f H1 or HgHI are equal to 0 Thus we can assume without loss of generality that fp HgHI gt 0 Let Mat p43 1quoty 1 my a Z 0 where y 2 0 is xed and observe that t is minimized at a y Il and the minimum value is 0 In other words my 3 p43 1quoty 1 Replace a and y by Fw fwfp and Gw gwgq respectively and integrate to obtain 11mm m d S w M Ma p 1 Minkowski s inequality follows from Holder s inequality as follows Note that 3 y g a 3 yquot y 3 yquot Thus by Holder s inequality for any a gt 1 1lzi 221 p q lf9lpdu If my 111 19 f9quotquotd 2 22 115 3 mm 11an If grltwgt 11 where 6 is the conjugate to a i e aquot 6quot 1 Choose a p and note that 6 q solves 6p 1 p This yields Hf 9113 S Hfllp 1191112 Hf 9113quot 2 23 If Hf 91 0 Minkowski s inequality holds trivially else we can solve the preceding display to nish D An important special case of Holder s inequality is the following While it is an immediate consequence set p z 2 it is sufficiently important that it deserves special mention Corollary 225 The can hi quot i 39 u 2 km i If m 6 L201 then 1Hng s Hsz 119112 De nition 226 A function y v R gt R is convex if for all A 6 01 and all ay E R 1 A3 1 Ay 3 WW 1 Andy D Theorem 227 Jensen s Inequality Suppose 11 is a probability measure Ify v R gt R is convex and if off and f are integrable then figi39fd Z 1 ffd Proof The idea is simple Since y v is convex there are affine4 functions 99EK such that Wat sup agar Va 6 IL 2 24 261 Therefore by Proposition 2 8 W db 2 gag 1M1 db sup Mu 2 25 4Retall that It is af ne if it is of the form Mat aa b Here is where we need 11 to be a probability measure The right hand side is equal to 12 fd which yields the result To complete this proof we need to verify 2 24 But this too is easy to prove draw a picture For any 5 gt 0 de ne 1W 5 12W DEW 5 2 26 Since A5 3 A3 5 1 A3 for all A 6 01 125 3 g A123 5 1 A123 Collect terms to deduce that for all A 6 01 0323 2 01323 In other words 5 gt gt 03123 is increasing and this means that D123 limgw 03123 exists For each fnced z E R and 5 gt 0 de ne the af ne function 123 3 zDL12z 12z V3 6 R 2 27 Then you should check that o 12 is af ne and hence so is 1253 limgw 12 3 o 12 z 121 and 12 z 5 12z 5 In particular 1251 121 0 For all 6 gt 0 12 3 g 12 53 whenever 3 2 z and 12 3 Z 12 53 whenever 3 g z The last two observations together imply that 12 3 s 123 for all 3 Q zz 5 Let 5 1 0 to that 1253 3 123 and 1251 12z This proves 2 24 D An important consequence of this is that in the case that 11 is nite the Lp sp1ltes are nested Proposition 228 Monotonicity of LPSpaces If 119 lt 00 and r gt p Z 1 then L i Q L1 11 In fact for all f 6 MW 1 Hpr 5 119 1 11er 2 28 Proof We will derive the displayed inequality the proposition follows readin from that Since this is a result that only involves the function f we can assume without loss of generality that f 2 0 Furthermore by replacing f by f A K proving 2 28 with f replaced by f A K and then letting K T so Theorem 2 21 justi es this part we can assume without loss of generality that f is bounded and hence in L 11 for all v gt 0 Note that for any 5 gt 1 the function 123 35 is convex Let s rp and apply Jensen s inequality Theorem 2 27 to deduce that when 11 is a probability measure mm 21111113 3 mm d Him 2 29gt which is the desired result If 119 gt 0 is nite but not equal to 1 de ne 174 11A11Q This is a probability measure and according to what we have shown thus far mm s 111111112 2 30 Solve for 11 integrals to nish Finally if 119 0 then the result holds trivially D Fix any 10 Z 1 and for all f g h 6 LP 11 de ne df g Hf 91 According to Minkowski s inequality Theorem 2 24 d has the proper a dff 0 b dfg g df hdhg and c dfg dgf 22 In other words if it were the case that df g 0 2 f y then d0 0 would metrize L1 i However this latter property generally does not hold since we could let 9 f 1A where A y s and 11A 0 to that g g f but 1039 140 lf 1 MW 0 why Nonetheless if we can identify the elements of L1 i that are equal to each other outside a null set then the resulting collection of equivalence classes endowed with the usual quotient topology and Borel sigma algebra is indeed a metric space It is also complete i e every Cauchy sequence converges Theorem 229 Let 9511 denote a sigma nite measure space For any fg E Ll i write f 9 if and only iff g jialmost everywhere ie 11w fw 7 gw 0 Then is an equivalence relation on Ll i Let f denote the orbit off ie f E f if and only iff 9 Let Ll i f f E L1 i and de ne fp fp Then Lquoti is a complete normed linear space Moreover L 11 is a complete Hilbert space Henceforth we will not distinguish between LP 11 and LP 11 This should not cause any confusions Also note that LP 11 is the quotient space LP 11 Proof The fact that LP 11 and hence LP 11 is a linear space has already been established cf Theo rem 2 24 As we argued a few paragraphs earlier df g Hf 91 is a norm now on LP 11 if we know that df g 0 gt f 9 but this is obvious To prove completen suppose fn is a Cauchy sequence in L1 11 It suffices to show that fn converges in L1 11 Translate thi a statement about fn1 s Recall that fn is Cauchy means that Hf mep gt 0 as n m gt 00 Thus we can nd a subsequence m H2 such that Hme fm Hp 5 2 In particular 2k Hme fm 12 lt 00 Thanks to Minkowski s inequality and the monotone convergence theorem to get Minkowski s inequality to work for an in nite sum O lt 00 2 31 33 Z ifmvz k1 P Remember that integrals of nonnegative functions are always de ned but could be in nite this is true even if the function is in places in nite In particular 2 k fm 1 fnh converges balHIOSt everywhere i e for all but a null set of w E Q Let f denote this sum By Fatou s lemma f 6 LP 11 and by the triangle inequality for LP nrn1s i e by Minkowski s inequality Hf fm p g 2 Wm f H1 gt 0 as h gt 00 Using Minkowski s inequality once more we that for any Nk Z 1 Hf fNHp g Hf fnh p Hfm fNHp Let N gt so and h gt so in this order to that fn gt f in L1 i To nish this proof we need to show that L2 11 has an inner product but this is f g z fg d which thanks to Holder s inequality is nite for all fig 6 L201 D 7 Exercises Exercise 21 Given any p1 pn gt 0 such that 231 pV 1 and an 3 gt 0 prove that 1121 313quot 3 231 pvatv Prove also that this is a strict inequality unless an a 7n HINT Use Jensen s inequality Theorem 2 27 using the convexity of the exponential function Exercise 22 Let Ma a 1 a and given any two random variables X and Y de ne d X Y 8 X Y Prove that d metrizes convergence in probability 23 Exercise 23 Suppose Xn and Yn are two sequences of random variables such that Xn 2 X and Yn 2 Y Show that whenever f is a continuous function of two variables then fXnYn 2 fXY This is called Slutsky s Lemma Exercise 24 Suppose X1X2 are random variables that converge in probability to a random variable X Prove that for any subsequence 7 there exists a further sub subsequenltxe m7 such that as j gt so thi gt X almost surer Exercise 25 Prove Chemo s inequality For any nonnegative random variable X and all A gt 0 CPXZA exp A ln8egx 232 HINT Apply Markov s inequality Theorem 2 15 to eEX Exercise 26 Suppose Q 011 5 36 and 11 is the Lebesgue measure on 23 Prove that con tinuous functions are dense in Ll i for every p Z 1 That is given 5 gt 0 and f E Ll m we can nd a continuous function g 0 11d gt R such that Hf 91 g 5 Exercise 27 Prove the generalized H lde39r inequality Given n random variables X1 Xn and In 7m gt 1 that satisfy 23211921 1 5 111221 X51 5 H221 HXsz HINT You can use Exercise 2 1 for instance Chapter 6 Weak Convergence 1 Introduction The central limit theorem of de Moivre 1733 1738 1756 and Laplace 1810 is without a doubt one of the fundamental result of classical mathematics To describe it consider performing a sequence of independent random trials each of which has two possible outcome Success or failur uppose also that the chances are p 6 01 that any given trial succeeds and 1 10 that it fails Let Tn total number of successes in n trials Then it is easy to that rm k 3M0 pquot Vic 01m 61 and 0 if k Q 1 n Then the de IvIoivre Laplace central limit theorem states that for any a lt b 1 y Tquot p h f7 fig2 dz 6 2 im a lt lt quot43 WIN 9 39 a WW In other wor s in some sense Tn is approximately normally distributed with parameters 11 up and a up 1 p The classical proof of the de Moivre Laplalte theorem is combinatorial and is a tedious application of the de Moivre Stirling formula2 A n 11m 1 63 have 2m1t ein that is itself a simple but tedious exercise In this chapter we discuss the modern approach to this and its various generalizations most other mathe maticians call this weak convergence weak convergence typically means something else 5 E 3 B 4 E 4 D n 3 D E 54 O a O H s D L w o a 4 D H U D E 39E A a a 6 v H 3 E 1 2 1For a detailed historical account of the classical central limit theorem see Adams 1974 2This is also known as Stirling s formula The original formulation of Stirling s formula is due to A de Moivre who showed that there exists a constant B such that inn lnB n lnn n 361 3 12610713 see de Moivre 1738 The displayed Stirling formula with x27r replaced by B follows readily from this The contribution of Stirling 1730 was in proving that B 7r 59 2 Weak Convergence De nition 61 Let X denote a topological space and suppose 11111112 are probability measures on X 15 X We say that 11 converyes weakly to jl and write 11 2 11 if for all bounded continuous functions f X gt R 39 fdim fdu 64 If X is an X valued random variable with distribution 11 and if X is an X valued random variable with distribution 11 then we also say that X converges weakly to X and write this as X1 X This is equivalent to saying that for all bounded continuous functions f X gt R limn 8fX 8 f X D The following important characterization of weak convergence on R is due to P Levy Theorem 62 Levy 1937 LetF be the distribution function of 1 0 probability measure on R 15 R and F that ohm also probability measures on R 15 R Then 11 Q 11 if and only if limnn30 Fw Fw for all a E R at which F is continuous Equivalently in terms of random variables X 2 X if and only if X g 113 gt X g for all a such that X 0 Remark 63 To get a glimpse of how the above may be useful to us consider the setting of the de Moivre Laplace central limit theorem of 62 let X n Tn np npl p X z a normal random variable with 11 0 and a 1 F1 the distribution function of X and F the distribution function of X Observe that i F is continuous everywhere and ii equation 62 assert hat F1 b F a gt F b F a Thanks to the preceding theorem then 62 is saying that X 2 X D In order to prove the preceding theorem we first need a simple lemma that is really a result about non decreasing right continuous functions that have left limits everywhere Note that any distribution function is of this type why Lemma 64 The set 113 E R X 113 gt 0 is denumerable Thus in this sense X1 converyes weakly to X if and only if Fnw gt Fw for most values ofw E R Proof The set of the 27 s in question is the same as the set of all a at which F jumps where F is the distribution function of X ie F at X S So we will show that F can only have a denumer able number of jumps Let 1 denote the collection of all a such that F fl F 27 Z nquot Clearly 1 Foo F oo Z ZiEJJFQr Fw Z nquotJ where denotes cardinality Therefore J is nite and hence U1 J is denumerable D Proof of Theorem 62 It should be clear that the statement about X1 2 X is equivalent to the statement about 11 gt 11 so we need only to prove the statement about the random variables Suppose first that X gt X For any fmed a E R and 5 gt 0 it is easy to find a bounded continuous function f R gt R such that for all y E R fy g 1307y g fy 5 For instance f could be the piecewise linear continuous function such that i for all z S w fz 0 ii for all z 2 w 5 fz 1 and iii for all z E 5 fz aquot z 3 Then 8fX g Fw g 8fX 45 Let n gt so to deduce from this that 8mm s Igggfme limsupFltw sz 5 65 60 Since y a wtkg y the left most term is greater than or equal to F at 5 and a similar reasoning shows that the rightmost term is greater than or equal to F a 5 This tells us that if F were continuous at at then 171M gt Fw as asserted For the converse suppose that for all continuity points a of F F at gt F III we wish to prove that X 2 X Equiva ently that for all bounded continuous function f R gt RH 8fX gt 8fX Note that f 2 0 here but this is not a restriction since otherwise we would consider f and f separately For any N gt 0 we can nd an increasing collection of points 0 270 lt 271 lt 272 lt E R and write 11L wi such that i max SN supyemtmim fy aw g 5 ii F is continuous at for all i E Z and iii limnwn 00 and liman 00 We need Lemma 64 part ii By i 3 N fXnE anl S wN 2 NM Ezwj1 Fn j E 7 66 N lt gt 8fXne1Xn1 m Z fw gtXneltww11gs a E 339le The same is true if we replace X1 and F by X and F respectively Since N is a large but fnced number lim 2 j ltN frrj Fxj1 Fxj 2 j ltN frrj Frrj1 Frrj Therefore the fact that a gt 0 is arbitrary shows us that for all N gt 0 13278fXn9 anl g 513N g ZEN 67 For the remainder terms let K sup K y to that limsupS fX an gt aw g Klinlsup Xn gt 2w 3 K lim 1 FnxN E ZN 1t4 w haw ax 68 K 1 mm Fem But for all 5 gt 0 there exists No gt 0 so large that the last term above is g g for N N0 Thus there exists no so that for all n 2 no 8fXnXn gt 27 3 K5 On the other hand limNn30 maxngm8fXXn gt 27 g KlimNmeaanmUnian gt 273 0 since the maxi mum is over a nite Thus we can nd N1 such thzt maanU 8fX anj gt 27M 3 5 Let N2 N0 V N to that for all n 2 1 8fX anj gt 27 3 K5 Finally there 7 73 gt 0 such that 8fX X gt 27 3 K5 Let N4 N3 V N2 and apply 67 with N N4 to deduce that 1i lSllp8fXn fX S K 69 have Since 5 gt 0 is arbitrary this concludes our proof D 3 Subsequential Weak Limits and Tightness Throughout this section let X1X2 denote a sequence of real valued random variables 111112 their distributions and F1 F2 their distribution functions respectively We begin with the following sequential compactn s result 61 Theorem 65 Helly s Selection Criterion If F1F2 are distribution functions then for any se quence of positive integers n1 lt n2 lt there exists a further subsequence n 1 lt n 2 lt and a nondecreasing rightcontinuous function F30 such that limxH30 Fnimw F30w for all continuity points a of F Remark 66 The function F need not be a distribution function since we need not have F3400 1 For instance suppose the d tr 1 is X z n 1 X 0 Then with probability X converges in probability to 00 and with probability 5 it converges in probability to 0 Equivalently F342 limnFnw 0 ifw lt 0 and it equals if a Z 0 E g o H N Proof The proof is an example of G Cantor s diagonalization method Let r1r2 denote an enu meration of the rationals and consider the sequence F1r1F2r1F3r1 These are numbers in 01 so by compactness we can find a positive integer subsequence n1 1 lt m 2 lt such that F3001 linuH30 me 7quot exists and is in 01 Find a further subsequence n21 lt n21 lt ie of the subsequence n1i s such that F3002 limko30 anmonz M t ince the n2 subsequence is a sub subsequence of the m sequence we also have F3001 limko30 me m Now proceed by induc tion to construct a sequence nj1 lt nj 2 lt such that for all m S j F30 rm limkn30 Fmkrm exists and is in 01 Therefore for all rationals r F300quot linuH30 me exis and is in 01 This defines a nondecreasing function F30 Q gt 01 We can extend it to a nondecreasing function on R by the assignment F342 limrw EQ F300quot By definition F30 is right continuous as well To conclude suppose a is a continuity point of F30 Then for any 5 gt 0 we can find rationals r lt a lt R such that F300 3 F30w 5 3 F300quot 25 Note that limk FMWW F300quot and limk FMWUR F3003 Furthermore since r lt a lt R Fr g Fw g FR for all I Z 1 This shows us that 391 A liminf me 2 lim me r 2 F300quot 2 25 610 lease taco Similarly the corresponding lim sup is than or equal to F3427 25 Since 5 gt 0 is arbitrary this completes the proof Now we address the problem that arose in Remark 66 Namely that some of the limiting distribution functions may be improper We will soon that the only trouble that one can run into as we did in Remark 66 is that the X s can run to in nity with positive probability The following condition insures that this does not happen De nition 67 We say that the Xn s respectively Fn s or jin s are tight if for every 5 gt 0 there A gt 0 such that for all n 2 1 X Z A g 5 Alternatively the Xn s are tight if and only if as A gt so 3Xn Z A gt 0 uniformly in n 2 1 D Theorem 68 If the X s are tight then all subsequential meah limits of the in s are probability measures Conversely if X X then X is tight Proof Suppose limHm F1 U fl F3427 exists for all continuity points a of a nondecreasing right continuous function F30 By tightness F3400 1 and F30 oo 0 Since nondecreasing right continuous functions F with these two properties are distribution functions of probability measures Theorem 54 this completes the first part For the second part choose 5 gt 0 and a continuity point Ag gt 0 for the distribution function of X so large that X 2 Ag 3 Clearly then lim Ag lt X lt Ag limnFnAg 1 Fn Ag Ag lt X 3 A0 2 1 Hence there exists no so that for all n 2 no 62 g lt X1 3 Ag 2 1 5 On the other hand since no is fnced we can choose A1 gt 0 so large that for all n 3 no X 3 A1 2 1 5 as well By letting A Ag V M we can then deduce that for all n X g A Z 1 5 which is another way to state tightness D 4 Harmonic Analysis on R1 De nition 69 The Fourier transform of a probability measure 11 on R BURD is w mi em 11012 w e R 611 730 where i Vfl This still makes if 11 is a nite measure and even if 11 is replaced by a Lebesgue integrable function f as follows ft en dw In this case we are identifying the Fourier transform of the function f with that of the measure 11 where f1lt If X is a real valued random variable whose distribution is some probability measure 11 then 17 is also called the characteristic function of X andor 11 and 17t is equal to Ski This is equal to 8costX i8sintX if you wish to deal only with real integrands D Unfortunat l in most other areas of mathematics a characteristic function is our indicator function and our characteristic function is the Fourier transform Here are some of the elementary properties of characteristic functions You should be sure to understand the extent to which the following properties depend on the measure 11 s being a probability measure Lemma 610 If 11 is a probability measure on R BR then 7 exists and satis es the following a 17 is uniformly continuous on R b supiex W 170 1 and 174 177 c 17 is a nonnegativede nite function ie for any z12 E C and for all t1 t 6 R n n Z 21 tom 2 0 612 i1 k1 Remark 611 The nal equation is sometimes written compactly as t svds 1dt Z 0 for any complex measure 1 whose total mass modulus is bounded D Proof Let X be a random variable whose distribution is 11 so that t 8expitX This is always de ned and bounded since WM 3 1 To prove uniform continui L e note that for any ab E R jei er 11 emu 1a b A 2 whys so that supine ma altsgt1 supine 8amp1 41H g 8 6X 2 By the dominated convergence theorem Theorem 222 as 6 i 0 this goes to 0 and this yields uniform continuity This proves part a Part b is completely elementary and we turn to proving part c But 5 n n n n m 2 22amtmzzserltwhrm8 v m i1 k1 i1 k1 j1 63 which is nonnegative D Example 612 The Uniform Distribution Example 115 Given two numbers a lt b a random variable X is uniformly distributed on ab if X E E z b a 1 dz for all E E 15 R Equivalently the uniform distribution on a b is the same as the Lebesgue measure on ab normalized to have total mass one Check this for intervals rst and then proceed Its characteristic function is then given by Skim b 1716 dz itb 1 1 6 em for all t E R D Example 613 The Exponential Distribution Example 116 Given some number A gt 0 a random variable X is said to have an exponential distribution with parameter A if X E E 15zAe quot39 dz for all E E 15 R Its characteristic function is Skim Afo30 6 11quotquot dz Ait Aquot for all t 6 R D Example 614 The Normal Distribution Example 117 Given two numbers 11 E R and a gt 0 a random variable X is said to have an normal distribution with parameters 11 and a if X E E 27r02quot2 exp 112202 dz for all E E 15 R Its characteristic function is 1 3 2 202 27m fix exp th W dz exp at 2 614 for all t E R Check that 8X 11 and VarX 02 so that we can refer to the distribution of X as normal with mean 11 and variance 2 D 8eitlt Let us mention a few discrete distributions as well Example 615 Discrete Random Variables Suppose that X zj with probability pj 12 where the zj s are real and m gt 0 and 2 m 1 Then Helm 21 eit ipj for all t E R D Here are two noteworthy consequences of this Example 616 Binomial Random Variables Given a number p 6 01 and an integer n 2 1 a random variable X is said to have the binomial distribution with parameters n and 10 if X k pk1 pquot for k 01n and X k 0 otherwise According to Example 615 for all t E R Helm 220 106 quot1 pquot k 106 1 10quot thanks to the binomial formula D Example 617 Poisson Random Variables Given a number A gt 0 X is said to have the Poisson distribution with parameter A if X k e AAkk if k 01 and otherwise X k 0 Its ch stic function is given by Example 615 and is equal to Helm 220 e Ae quotk exp A Ae for t E R D y 1 2 D 5 The Inversion Theorem In this section we state and prove a fundamental theorem of Levy 19373 that roughly speaking shows how to reconstruct a distribution from its characteristic function 3Levy s theorem is a re nement of the more classical Plancherel formula cf Plancherel 1910 1933 The latter is sometime also called the Parseval identity so named after MA Parseval des Chenes 64 Theorem 618 The Levy Inversion Theorem For any two real numbers a lt b and all probability measures it on R BR N it 1 it 1 f tdt Mam ma 14M 615 lim f N430 2712 N Remark 619 Let 1 denote the uniform distribution on ab and recall from Example 612 that 17t itb 1quotem 6 so that the complex conjugate of t is itb 1quote m 6 Thus the inversion theorem can be recast as 1 N 7 lim i t tdt N b 1 N430 27f a ma 2 gm gm 616 Here as in Theorem 618 a part of the assertion is the fact that the said limits exist This is nontrivial since there are examples where 17t v t dt 00 Corollary 620 The Uniqueness Theorem If it and 1 are two nite measures such that for Lebesgue almost every t 6 1R 17t 17t then it 1 Proof If 3 z z E R 11z 0 then according to Lemma 64 3 is denumerable Apply the inversion theorem to deduce that for all a lt b both 2 3 11ab vab Thus it and 1 agree on all disjoint finite unions of open intervals of the form a b where a b 2 3 The collection of all such intervals is 15 R and therefore it 1 on 15 D Now we prove the inversion theorem The proof of Theorem 618 rests on an estimate for the socalled Dirichlet integral ION sinwwquot dz Lemma 621 For any N gt 0 sinwwquot d2 g Proof For all a gt 0 atquot 6quot dr Thus by the Fubini Tonelli theorem Theorem 36 N 30 N ax anaemia dr 617 0 3 0 0 111AMquot Since sinx 11146222 N Nair A v v i 1 e 1 N rs1n1cosr VkNOquot 1111 6399 7 I 1 2 1 r2 6 r 39 In particular for all r gt 0 WWW 1 r2quot 3 MO r1 r2quot 3 2equotW why from which the result follows E Remark 622 In particular as N gt 00 ON sinwwquot d2 converges to boundedly Note that this occurs due to cancellations Indeed f0 gdx 2 d zgj Uin jsin dr 63919 8 j1 14 w 65 Since this last integral is equal to 2 for all j the Dirichlet integral does not converge absolutely D Proof of Theorem 618 Note that for any 16 E R and all N gt 0 N ito39 as N N A N A f e 6 dt costa cost6 dtif s1ntadti s1nt6dt 7N t 7N t 7N t 7N t N A N A 2i s1ntadt2i sint6 div 0 t 0 t since cosine is even and sine is odd To be sure we need to check that the three integrals absolutely converge but this follows from the ready estimat jcosw cosy 3 jar y and j sinw 3 jar both valid by Taylor expansion By Lemma 621 Fubini Tonelli Theorem 36 and the bounded convergence theorem Theorem 219 as N gt so 1 N eiitar 67m A 1 so N Emile ei xib 11tdt Vf do 00 N A N r r 1 dt s1ntx bdt Md 621 7r so 0 t 0 t T Sgnl f a Sgnw a max 620 3 where sgn is the signuni function sgnz 1mm z 1w70z But sgnw a sgnw b equals 1 if either a z a or a b it equals 2 if a 6 ab and otherwise it equals zero The inversion theorem follows from this There are several simpler variants of the inversion theorem that hold under various restrictions Here is a particularly useful one Theorem 623 Fourier Inversion Formula If 17 is Lebesgueintegrable then 11 is absolutely continuous with respect to the Lebesgue measure an A 1 3 A N m and 1 7 f were at 622 27r 30 for fw Finally f 2 0 and is bounded and continuous Proof Once we show the existence of f and the integral formula the nonnegativity continuity and bound edness of f follow readily We will concentrate on the integral representation and existence of f then Since 17 is integrable by the inversion theorem Theorem 618 for any a lt b b ff W dt 623 1 gu all uab S 2 We have used the already used fact that je m e m g b at in order to get the above inequality Let b l a and appeal to the continuity of measure from the outside to deduce that for all a E R 11a 0 Plug this back into the inversion theorem to obtain the following for all a lt b 1 f mm mm 624 27m 730 66 Note that 6 6 it 6 du Since 17 E L1 dz we can appeal to Fubini Tonelli Theorem 36 to exchange the order of the integrals and get 11ab b e i tdt du b mm 625 The rest is an exercise in applying the Radon Nikodym theorem Theorem 42 do it D 6 The GlivenkO L vy Continuity Theorem In this section we discuss a second important tool from harmonic analysis It is a re nement due to Levy 1937 of a slightly earlier result of Glivenko 19364 Theorem 624 The Glivenko L vy Continuity Theorem If 11111112 are probability measures on R BR such that 11 gt 1 then for all t E R limnil t 17t Conversely suppose that for all t E R t limn t exists where t is continuous in a neighborhood of 0 Then there exists a probability measure 1 on R 15 R such that i t 3 and ii 11 2 1 Proof of the First Assertion The rst portion of the proof is simple If 11 gt 11 then for all bounded continuous functions f R gt C limn f 111 f 1115 We can choose fw 6 where t E R is ved to deduce that limn 17quott 17t D For the converse half we need a tightness estimate Roughly speaking it tells us that the degree of smoothness of a Fourier transform near zero regulates the tails of the corresponding probability measure Lemma 625 IfX is a random variable with distribution 11 then for all A gt 0 A 2x 33le Z A S g 1 170 dt 626 720 Proof For any 5 gt 0 17tdt eit 11dwdt em dt1d1 by Fubini Tonelli Theorem 36 Since 6 dt 2wquot sin5w 1 tdt25 11dw225 Since jsinyj 3 lg the integrand is nonnegatiye This is the key observation for then 1 tdt225 22 1 Use siny g 1 and the fact that MM 2 25 WAX 2 25 to obtain the bound 5X 2 25 3 j1 t dt Let 5 2 to nish D 121 627 628 4Glivenko s preliminary version of this theorem states the following If an and u an all pmbabilily measuies on R 9306 then an 5 u if and only in gt i3 poinlwise The present version due to P Levy is an extremely useful adaptation 5The careful reader may notice that f is allowed to be cOmplexvalued here This is acceptable since we may consider Ref and 1m f separately 67 Proof of The Second Assertion of The Continuity Theorem Throughout let X denote a random variable whose distribution is 111 Thanks to the preceding lemma for all A gt 0 and all integers n 2 1 X Z A g g 7105 dt The bounded convergence theorem Theorem 219 and the assumptions on 17 together imply that limsupn X Z A g g 1 0 dt Since 4 is assumed to be continuous in a neighborhood of the origin for any 5 gt 0 we can nd no 2 1 and Ag gt 0 such that supnmo X 2 Ag 3 5 On the other hand since no is xed we can nd A1 such that maxnma X 2 A1 3 5 Let A Ag V A1 to that for all A 2 Al sup X Z A g 5 That is the sequence X is tight Theorem 68 insures us that all subsequential weak limits are probability measures That is whenever 11 limk yum exists weakly then 1 is a probability measure By the rst portion of this proof for all t E R lim t Mt which tells us that the characteristic function of any of the subsequential weak limits equals identically Thanks to the uniqueness theorem Theorem 620 all subsequential weak limits are one and the same ie 11 converges weakly to a probability measure 11 whose Fourier transform is t This completes the proof 7 The Central Limit Theorem in R1 We are ready to state and prove the main result of this chapter that is a cornerstone of classical probability theory Theorem 626 The Central Limit Theorem Suppose X1X2 are iid realvalued random vari ables in L2 and assume that VarX1 gt 0 so that the st are truly random random variables Then writing 5 X1 v v v X as before it follows that for all real a lt b A 5 8 A 5 n8X1 7 6quot 22 lt lt 1 a lt 3135 b 11203 a lt nVarX1 b a df 6 29 Proof Let Xj 8Xj xVarXj and S X Then the st have mean zero and variance one Moreover S Sn n8X1 VIX1 In other words we can assume without loss of generality that the X j s have mean zero and variance one This simpli es the assertion to the following 7125 gt Z where Z is a normal random variable with mean zero and variance one Thanks to the continuity theorem and Example 614 we need to prove that for all t E R Sn o2 2 n1318expit e 630 By Taylor s expansion for all a E R 2 6 1iw Ys R r 631 where R is a complex valued function and Rw g gjwjfg If jar g 1 this is a good estimate On the other hand the worst estimate of all works well enough when jar gt 1 viz 6 1 iw jets 1 m 132 S 1 jar 52 g Combine terms to obtain the bound mm s g m3 A we 632 68 Thus by independence Lemma 512 and by the identical distribution of the X j s S quot X 8 exp it n 8 exp 2173 X t2 In2 X1 quot z 1 7 is 7 a R t7 633 Zt8E 2n 2 t2 I X1 n 1 7 t7 1 MSW am On the other hand the nal expectation involving the function R is bounded as follows n1 2 By the dominated convergence theorem limn 6 0 and we obtain the following for a sequence 5 gt 0 A Sn A t2 39 n 7 2 2 V 13208 exp itV 131130 1 0 Cn e 63a since by Taylor expansion for a 2 0 we have ln1 It 2 w why This proves 630 whence the central limit theorem D S The Central Limit Theorem in Rd Optional Reading Now we turn to the case of random variables in Rd Throughout X X1 X2 are iid random variables that takes values in Rd and 51 X1 v v v X Our discussion is somewhat sketchy since we have already encountered most of the key ideas earlier on in this chapter De nition 627 The characteristic function ofX is the function ft 8expith t 6 Rd where v denotes the Euclidean inner product If 11 is the distribution of X then this is also written as 17 D The following is the simplest analogue of the inversion theorem it is an immediate consequence of Theorem 618 Theorem 628 Inversion Theorem for d 2 1 IfA a1 M x v x adbd C Rd is a nite hypercube and if 11 is a probability measure on Rd 15 R l such that A does not charye the boundary ofA then 1 A eiibar 672147 A MA W119 V W ya at 636 The proof follows the argument that we used to derive Theorem 618 all the time noting that the integrand inside the big brackets is the product of the one dimensional versions The continuity theorem Theorem 624 also has a d dimensional analogue 69 Theorem 629 Continuity Theorem forn 1 If11111112 are probability measures on Rd 15 R l such that 11 gt 11 then for all t 6 Rd li 17t 11t Conversely suppose that for all t 6 Rd t lim17t exists where t is continuous in a neighborhood of 0 Then there exists a probability measure 1 on Rd 15 R l such that i t z 3 and ii 11 gt 1 The only real issue to worry about here is tightness De nition 630 A sequence X1X2 of Rd xialiied random variables is tight if for every 5 gt 0 there exists A gt 0 such thet for all n 2 1 X Z A g 5 where x M12 v v v Note that the tightness of the d dimensional sequence X1X2 follows from the tightness of its 11 coordinates Combining these remarks with the proof of the central limit theorem in one dimension and a multidimensional Taylor expansion we are led to the following Theorem 631 The Multidimensional Central Limit Theorem Suppose X1X2 are iid ran dom variables that take values in Rds Suppose further that 8Xf 11 for some 11 6 Rd where X is the ith coordinate of X1 and CovXfXf QM for an invertible d x 11 matrix Q Then nquot2S n11 converges weakly to a multidimensional Gaussian distribution with mean vector 0 and covariance matrix Q ie for all measurable sets G 9 Rd 5 n11 HalQ lim 3 7 E G d 637 new x a 2mm auto y x 1 The following is an important perhaps the most 1 I of the p of this chapter Theorem 632 The Cram r Wald Device X gt X if and only iffor all t 6 W t v X gt t v X The point is that the weak convergence of the d dimensional random variable X is boiled down to that of the one dimensional t v X but this needs to be checked for all t 6 Rd Proof Suppose X gt X ie for all bounded continuous f Rd gt R 8fX gt 8fX Since gx t v x is continuous this implies also that 8fgX gt 8fgX which is half of the result The converse follows from the continuity theorem Let 11 and 11 denote the distributions of X and X respectively The condition th gt th is saying that for all t 6 Rd 17t gt 17t and the converse follows from Theorem 629 D 9 Exercises Exercise 61 If 11111112 11 are probabilitvimeasures on Rt 15 R l then show that the following are characteristic functions of probability measures 17 Re 17 172 17 and 22 p l where p1 p Z 0 and pj 1 Also prove that 17 17 Consequently if 11 is a symmetric measure ie 11 A 11A for all A 6 01K then 17 is a real function 70 Exercise 62 If X has the probability density function fw 1 w then compute the characteristic function of X Use this and the inversion theorem Theorem 618 to show that f itself is the characteristic function of a probability m 39 39 ire In particular conclude that there are probability measures that possess a real nonnegative characte ic function that vanishes outside a compact Exercise 63 Use the central limit theorem Theorem 626 to derive 62 Exercise 64 Prove the following variant of the inversion theorem Theorem 618 For any a lt b and all probability measures 11 on R 15 R 30 2 2 f 7i a fitb wet 392 an dt u at gm gm 63s Exercise 65 Prove the law of rare events If Xn is a binomial random variable Example 616 with parameters n and Anquot where A 6 0m is xed then as n gt so Xn converges weakly to a Poisson distribution Example 617 with parameter A Exercise 66 A probability measure 11 on R R is said to be in nitely divisible if for any n 2 1 there exists a probability measure 1 such that 17 11quot i Let X have distribution 11 Show that in nite divisibility is equivalent to the following For all n 2 1 there exist iid random variables X1 X such that X has the same distribution as X1 v v 39 Xn ii Prove that the normal distribution and the Poisson distribution are in nitely divisible So is the probability density fx 7F1 1 272quot known as the Cauchy distribution HINT For the Cauchy use the inversion theorem to nd f iii Prove that the uniform distribution on 01 is not in nitely divisible Exercise 67 Let X1X2 denote independent not nece 1rin identically distributed real valued L23 random variables for all j let a VarXj and write z 0 In addition suppose that for all 7 A 1 1i 2 13320 a ng ijj gt 551 0 639 If Sn X1 v v v X then prove the Lindebem theorem Lindeberg 1922 For all a lt b l7 7122 lim 3gta lt g b 6 dz 640 max 5 a m HINT Consider the truncated random variables X z X j1 X Sam Exercise 68 Let 631 ed denote the usual basis vectors of Rt ie e3 1 0 0 e 2 0100 etc Consider iid random variables X1X2 such that X1 zej 2d en the random rt proc n X1 Xn with So 0 is the simple walk on Z it starts at zero and moves to each of the neighboring sites in 2 1 with equal probability and the pro ontinues in this way ad in nitum Find vectors an and constants bn such that Sn anbn converges weakly to a nontrivial limit distribution Compute the latter distribution CI
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'