Stochastic Processes and Monte Carlo Methods
University of Massachusetts, Fall 2007
Luc Rey-Bellet

Contents

1 Random Variables
  1.1 Review of probability
  1.2 Some Common Distributions
  1.3 Simulating Random Variables
  1.4 Markov, Chebyshev, and Chernov
  1.5 Limit Theorems
  1.6 Monte Carlo methods
  1.7 Problems

Chapter 1: Random Variables

1.1 Review of probability

In this section we briefly review the basic terminology of probability and statistics; see any elementary probability book for reference.

Any real-valued random variable $X$ is described by its cumulative distribution function (abbreviated cdf), i.e., the function $F_X : \mathbb{R} \to [0,1]$ defined by
$$F_X(x) = P(X \le x).$$
If there exists a continuous function $f_X : \mathbb{R} \to [0,\infty)$ such that
$$F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy,$$
then $X$ is said to be continuous with probability density function (abbreviated pdf) $f_X$. By the fundamental theorem of calculus, the pdf of $X$ is obtained from the cdf of $X$ by differentiating, i.e., $f_X(x) = F_X'(x)$.

On the other hand, if $X$ takes values in the set of integers, or more generally in some countable or finite subset $S$ of the real numbers, then the random variable $X$ and its cdf are completely determined by its probability distribution function (pdf), i.e., by $p : S \to [0,1]$ where
$$p(i) = P(X = i), \qquad i \in S.$$
In this case $X$ is called a discrete random variable.

The pdf $f$ of a continuous random variable satisfies $\int_{-\infty}^{\infty} f(x)\, dx = 1$, and the pdf of a discrete random variable satisfies $\sum_{i \in S} p(i) = 1$. Either the cdf or the pdf describes the distribution of $X$, and we compute the probability of any event $A \subset \mathbb{R}$ by
$$P(X \in A) = \begin{cases} \int_A f(x)\, dx & \text{if } X \text{ is continuous,} \\ \sum_{i \in A} p(i) & \text{if } X \text{ is discrete.} \end{cases}$$

Let $X = (X_1, \ldots, X_d)$ be a random vector, i.e., $X_1, \ldots, X_d$ are real-valued random variables with some joint distribution. Often the joint distribution can be described by the multiparameter analogue of the pdf. For example, if there is a function $f_X : \mathbb{R}^d \to [0,\infty)$ such that
$$P(X \in A) = \int_A f_X(x_1, \ldots, x_d)\, dx_1 \cdots dx_d,$$
then $X$ is a continuous random vector with pdf $f_X$. Similarly, a discrete random vector $X$ taking values $i = (i_1, \ldots, i_d)$ is described by
$$p(i_1, \ldots, i_d) = P(X_1 = i_1, \ldots, X_d = i_d).$$
A
collection of random variables $X_1, \ldots, X_d$ is independent if
$$f_X(x) = f_{X_1}(x_1) \cdots f_{X_d}(x_d) \ \text{(continuous case)}, \qquad p_X(i) = p_{X_1}(i_1) \cdots p_{X_d}(i_d) \ \text{(discrete case)}. \tag{1.1}$$

If $X$ is a random vector and $g : \mathbb{R}^d \to \mathbb{R}$ is a function, then $Y = g(X)$ is a real random variable. The mean (or expectation) of a real random variable $X$ is defined by
$$E[X] = \begin{cases} \int_{-\infty}^{\infty} x f_X(x)\, dx & \text{if } X \text{ is continuous,} \\ \sum_{i \in S} i\, P(X = i) & \text{if } X \text{ is discrete.} \end{cases}$$
More generally, if $Y = g(X)$ then
$$E[Y] = \begin{cases} \int_{\mathbb{R}^d} g(x) f_X(x)\, dx & \text{if } X \text{ is continuous,} \\ \sum_i g(i)\, p_X(i) & \text{if } X \text{ is discrete.} \end{cases}$$

The variance of a random variable $X$, denoted by $\mathrm{var}(X)$, is given by
$$\mathrm{var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - E[X]^2.$$
The mean of a random variable $X$ measures the average value of $X$, while its variance is a measure of the spread of the distribution of $X$. Also commonly used is the standard deviation $\sigma = \sqrt{\mathrm{var}(X)}$.

Let $X$ and $Y$ be two random variables; then we have
$$E[X + Y] = E[X] + E[Y].$$
For the variance, a simple computation shows that
$$\mathrm{var}(X + Y) = \mathrm{var}(X) + 2\,\mathrm{cov}(X,Y) + \mathrm{var}(Y),$$
where $\mathrm{cov}(X,Y)$ is the covariance of $X$ and $Y$, defined by
$$\mathrm{cov}(X,Y) = E\left[(X - E[X])(Y - E[Y])\right].$$
In particular, if $X$ and $Y$ are independent then $E[XY] = E[X]E[Y]$, and so $\mathrm{cov}(X,Y) = 0$ and thus
$$\mathrm{var}(X_1 + X_2) = \mathrm{var}(X_1) + \mathrm{var}(X_2).$$

Another important and useful object is the moment generating function (mgf) of a random variable $X$, given by
$$M_X(t) = E[e^{tX}].$$
Whenever we use a mgf we will always assume that $M_X(t)$ is finite at least in an interval around $0$; note that this is not always the case. If the moment generating function of $X$ is known, then one can compute all moments of $X$, i.e., $E[X^n]$, by repeated differentiation of the function $M_X(t)$ with respect to $t$. The $n$th derivative of $M_X(t)$ is given by
$$M_X^{(n)}(t) = E\left[X^n e^{tX}\right],$$
and therefore $E[X^n] = M_X^{(n)}(0)$. In particular, $E[X] = M_X'(0)$ and $\mathrm{var}(X) = M_X''(0) - M_X'(0)^2$. It is often very convenient to compute the mean and variance of $X$ using these formulas; see the examples below.

An important fact is the following (its proof is not that easy).

Theorem 1.1.1 Let $X$ and $Y$ be two random variables and suppose that $M_X(t) = M_Y(t)$ for all $t \in (-\delta, \delta)$. Then $X$ and $Y$ have the same distribution.

Another important property of the mgf is

Proposition 1.1.2 If $X$ and $Y$ are independent random variables, then the mgf of $X + Y$ satisfies
$$M_{X+Y}(t) = M_X(t)\, M_Y(t),$$
i.e., the mgf of a sum of independent random variables is the product of the mgfs.

Proof: We have
$$E\left[e^{t(X+Y)}\right] = E\left[e^{tX} e^{tY}\right] = E\left[e^{tX}\right] E\left[e^{tY}\right],$$
since $e^{tX}$ and $e^{tY}$ are independent. $\blacksquare$

1.2 Some Common Distributions

We recall some important distributions together with their basic properties. The following facts are useful to remember.

Proposition 1.2.1 We have:

(i) Suppose $X$ is a continuous random variable with pdf $f(x)$. For any real number $a$, the pdf of $X + a$ is $f(x - a)$.

(ii) Suppose $X$ is a continuous random variable with pdf $f(x)$. For any non-zero real number $b$, the pdf of $bX$ is $\frac{1}{|b|} f\left(\frac{x}{b}\right)$.

(iii) Suppose $X$ is a random variable. For any real numbers $a$ and $b$ we have $M_{bX+a}(t) = e^{at} M_X(bt)$.

Proof: The cdf of $X + a$ is
$$F_{X+a}(x) = P(X + a \le x) = P(X \le x - a) = F_X(x - a).$$
Differentiating with respect to $x$ gives $f_{X+a}(x) = F_X'(x - a) = f_X(x - a)$. This shows (i). To prove (ii) one proceeds similarly: for $b > 0$,
$$F_{bX}(x) = P(bX \le x) = P(X \le x/b) = F_X(x/b),$$
and differentiating gives $f_{bX}(x) = \frac{1}{b} f\left(\frac{x}{b}\right)$. The case $b < 0$ is left to the reader. To prove (iii), note that
$$M_{bX+a}(t) = E\left[e^{t(bX + a)}\right] = e^{at} E\left[e^{tbX}\right] = e^{at} M_X(bt). \ \blacksquare$$

1. Uniform Random Variable. Consider real numbers $a < b$. The uniform random variable on $(a,b)$ is the continuous random variable with pdf
$$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a < x < b, \\ 0 & \text{otherwise.} \end{cases}$$
The moment generating function is
$$E\left[e^{tX}\right] = \int_a^b \frac{e^{tx}}{b-a}\, dx = \frac{e^{tb} - e^{ta}}{t(b-a)},$$
and the mean and variance are
$$E[X] = \frac{a+b}{2}, \qquad \mathrm{var}(X) = \frac{(b-a)^2}{12}.$$
We write $X \sim U(a,b)$ to denote this random variable.

2. Normal Random Variable. Let $\mu$ be a real number and $\sigma$ a positive number. The normal random variable with mean $\mu$ and variance $\sigma^2$ is the continuous random variable with pdf
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
The moment generating function is (see below for a proof)
$$M_X(t) = E\left[e^{tX}\right] = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{tx}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = e^{\mu t + \frac{\sigma^2 t^2}{2}}, \tag{1.2}$$
and the mean and variance are
$$E[X] = \mu, \qquad \mathrm{var}(X) = \sigma^2.$$
We write $X \sim N(\mu, \sigma^2)$ to denote this random variable. The standard normal random variable is the normal random variable with $\mu = 0$ and $\sigma = 1$, i.e., $N(0,1)$.

The normal random variable has the following property: $X \sim N(0,1)$ if and only if $\sigma X + \mu \sim N(\mu, \sigma^2)$. To see this, one applies Proposition 1.2.1 (i) and (ii), which tells us that the density of $\sigma X + \mu$ is $\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$. To
show the formula for the moment generating function, we consider first $X \sim N(0,1)$. Then by completing the square we have
$$M_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-\frac{x^2}{2}}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-t)^2}{2}} e^{\frac{t^2}{2}}\, dx = \frac{e^{\frac{t^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy = e^{\frac{t^2}{2}}. \tag{1.3}$$
This proves the formula for $N(0,1)$. Since $N(\mu, \sigma^2) = \sigma N(0,1) + \mu$, by Proposition 1.2.1 (iii) the moment generating function of $N(\mu, \sigma^2)$ is $e^{\mu t} e^{\frac{\sigma^2 t^2}{2}}$, as claimed.

3. Exponential Random Variable. Let $\lambda$ be a positive number. The exponential random variable with parameter $\lambda$ is the continuous random variable with pdf
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$
The moment generating function is
$$E\left[e^{tX}\right] = \int_0^{\infty} e^{tx}\, \lambda e^{-\lambda x}\, dx = \begin{cases} \frac{\lambda}{\lambda - t} & \text{if } t < \lambda, \\ \infty & \text{otherwise,} \end{cases}$$
and the mean and variance are
$$E[X] = \frac{1}{\lambda}, \qquad \mathrm{var}(X) = \frac{1}{\lambda^2}.$$
We write $X \sim \mathrm{Exp}(\lambda)$ to denote this random variable. This random variable will play an important role in the construction of continuous-time Markov chains. It often has the interpretation of a waiting time until the occurrence of an event.

4. Gamma Random Variable. Let $n$ be a positive integer and $\lambda$ a positive number. The gamma random variable with parameters $n$ and $\lambda$ is the continuous random variable with pdf
$$f(x) = \begin{cases} \lambda e^{-\lambda x}\, \frac{(\lambda x)^{n-1}}{(n-1)!} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$
The moment generating function is
$$E\left[e^{tX}\right] = \int_0^{\infty} e^{tx}\, \lambda e^{-\lambda x}\, \frac{(\lambda x)^{n-1}}{(n-1)!}\, dx = \begin{cases} \left(\frac{\lambda}{\lambda - t}\right)^n & \text{if } t < \lambda, \\ \infty & \text{otherwise,} \end{cases}$$
and the mean and variance are
$$E[X] = \frac{n}{\lambda}, \qquad \mathrm{var}(X) = \frac{n}{\lambda^2}.$$
We write $X \sim \mathrm{Gamma}(n, \lambda)$ to denote this random variable. To compute the mgf, note that for any $\alpha > 0$,
$$\int_0^{\infty} e^{-\alpha x}\, dx = \frac{1}{\alpha},$$
and differentiating repeatedly with respect to $\alpha$ gives the formula
$$\int_0^{\infty} e^{-\alpha x} x^{n-1}\, dx = \frac{(n-1)!}{\alpha^n}.$$
Note that $\mathrm{Gamma}(1, \lambda) = \mathrm{Exp}(\lambda)$. Also, the mgf of $\mathrm{Gamma}(n, \lambda)$ is the mgf of $\mathrm{Exp}(\lambda)$ raised to the $n$th power. Using Theorem 1.1.1 and Proposition 1.1.2 we conclude that if $X_1, \ldots, X_n$ are independent exponential random variables with parameter $\lambda$, then
$$X_1 + \cdots + X_n \sim \mathrm{Gamma}(n, \lambda).$$

5. Bernoulli Random Variable. A Bernoulli random variable models the toss of a (possibly unfair) coin, or more generally any random experiment with exactly two outcomes. Let $p$ be a number with $0 \le p \le 1$. The Bernoulli random variable with parameter $p$ is the discrete random variable taking values in $\{0, 1\}$ with
$$p(0) = 1 - p, \qquad p(1) = p.$$
The
moment generating function is
$$E\left[e^{tX}\right] = 1 - p + p e^t,$$
and the mean and variance are
$$E[X] = p, \qquad \mathrm{var}(X) = p(1-p).$$
A typical example where Bernoulli random variables occur is the following. Let $Y$ be any random variable and let $A$ be any event; the indicator random variable $1_A$ is defined by
$$1_A = \begin{cases} 1 & \text{if } Y \in A, \\ 0 & \text{if } Y \notin A. \end{cases}$$
Then $1_A$ is a Bernoulli random variable with $p = P(Y \in A)$.

6. Binomial Random Variable. Consider an experiment which has exactly two outcomes, 0 or 1, and is repeated $n$ times, each time independently of the others (i.e., $n$ independent trials). Let $p$ be a number with $0 \le p \le 1$ and let $n$ be a positive integer. The binomial random variable with parameters $n$ and $p$ is the random variable which counts the number of 1's occurring in the $n$ outcomes. The pdf is
$$p(i) = \binom{n}{i} p^i (1-p)^{n-i}, \qquad i = 0, 1, \ldots, n.$$
The moment generating function is
$$E\left[e^{tX}\right] = \left(1 - p + p e^t\right)^n,$$
and the mean and variance are
$$E[X] = np, \qquad \mathrm{var}(X) = np(1-p).$$
We write $X \sim B(n,p)$ to denote this random variable. The formula for the mgf can be obtained directly using the binomial theorem, or simply by noting that, by construction, $B(n,p)$ is a sum of $n$ independent Bernoulli random variables.

7. Geometric Random Variable. Consider an experiment which has exactly two outcomes, 0 or 1, and is repeated as many times as needed until a 1 occurs. The geometric random variable describes the probability that the first 1 occurs at exactly the $n$th trial. Let $p$ be a number with $0 < p \le 1$. The geometric random variable with parameter $p$ is the discrete random variable with pdf
$$p(n) = (1-p)^{n-1} p, \qquad n = 1, 2, 3, \ldots$$
The moment generating function is
$$E\left[e^{tX}\right] = \sum_{n=1}^{\infty} e^{tn} (1-p)^{n-1} p = \begin{cases} \frac{p e^t}{1 - (1-p)e^t} & \text{if } (1-p)e^t < 1, \\ \infty & \text{otherwise.} \end{cases}$$
The mean and variance are
$$E[X] = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1-p}{p^2}.$$
We write $X \sim \mathrm{Geom}(p)$ to denote this random variable.

8. Poisson Random Variable. Let $\lambda$ be a positive number. The Poisson random variable with parameter $\lambda$ is the discrete random variable which takes values in $\{0, 1, 2, \ldots\}$ with pdf
$$p(n) = e^{-\lambda}\, \frac{\lambda^n}{n!}, \qquad n = 0, 1, 2, \ldots$$
The moment generating function is
$$E\left[e^{tX}\right] = \sum_{n=0}^{\infty} e^{tn}\, e^{-\lambda}\, \frac{\lambda^n}{n!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$
The mean and variance are
$$E[X] = \lambda, \qquad \mathrm{var}(X) = \lambda.$$
We write $X \sim \mathrm{Poisson}(\lambda)$ to denote this random variable.

1.3 Simulating Random Variables

In this section we discuss a few techniques to simulate a given random variable on a computer. The first step, which is built into any computer, is the simulation of a "random number", i.e., the simulation of a uniform random variable $U(0,1)$ rounded off to the nearest $10^{-n}$. In principle this is not difficult: take ten slips of paper numbered $0, 1, \ldots, 9$, place them in a hat, and select successively $n$ slips, with replacement, from the hat. The sequence of digits obtained, with a decimal point in front, is the value of a uniform random variable rounded off to the nearest $10^{-n}$. In pre-computer times, tables of random numbers were produced in that way, and they can still be found.

This is of course not the way an actual computer generates a random number. A computer will usually generate a random number by using a deterministic algorithm which produces a "pseudo-random" number that "looks like" a random number. For example, choose positive integers $a$, $c$, and $m$ and set
$$X_{n+1} = (a X_n + c) \bmod m.$$
The number $X_n$ is one of $0, 1, \ldots, m-1$, and the quantity $X_n/m$ is taken to be an approximation of a uniform random variable. One can show that for suitable $a$, $c$, and $m$ this is a good approximation. This algorithm is just one of many possibilities used in practice. The issue of actually generating a good random number is a nice, interesting, and classical problem in computer science. For our purposes we will simply content ourselves with assuming that there is a "black box" in your computer which generates $U(0,1)$ in a satisfying manner.

We start with a very easy example, namely simulating a discrete random variable $X$.

Algorithm 1.3.1 (Discrete random variable) Let $X$ be a discrete random variable taking the values $x_1, x_2, \ldots$ with pdf $p_j = P(X = x_j)$. To simulate $X$:

- Generate a random number $U \sim U(0,1)$.
- Set
$$X = \begin{cases} x_1 & \text{if } U < p_1, \\ x_2 & \text{if } p_1 < U < p_1 + p_2, \\ \ \vdots & \\ x_n & \text{if } p_1 + \cdots + p_{n-1} < U < p_1 + \cdots + p_n, \\ \ \vdots & \end{cases}$$
Then $X$ has the desired distribution.
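Algorithm 1.3.1 can be sketched in a few lines of Python. The function name `sample_discrete` and the use of the standard library's `random.random()` as the $U(0,1)$ "black box" are illustrative choices, not part of the notes:

```python
import random

def sample_discrete(values, probs):
    """Algorithm 1.3.1: return values[j] with probability probs[j].
    Generate U ~ U(0,1) and find the first j with p_1 + ... + p_j > U."""
    u = random.random()  # U ~ U(0,1)
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return x
    return values[-1]  # guard against floating-point round-off in the sum

# Example: a loaded die that shows 6 with probability 1/2
samples = [sample_discrete([1, 2, 3, 4, 5, 6], [0.1] * 5 + [0.5])
           for _ in range(10000)]
```

The event that $U$ lands exactly on a partial sum has probability zero and is resolved arbitrarily here.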
We next discuss two general methods for simulating continuous random variables. The first is called the inverse transformation method and is based on the following.

Proposition 1.3.2 Let $U \sim U(0,1)$ and let $F = F_X$ be the cdf of the continuous random variable $X$. Then
$$X = F^{-1}(U), \quad \text{and also} \quad X = F^{-1}(1 - U).$$

Proof: The cdf of the random variable $X$ is a continuous increasing function, therefore the inverse function $F^{-1}$ is well defined, and we have
$$P\left(F^{-1}(U) \le a\right) = P(U \le F(a)) = F(a).$$
This shows that the cdf of $F^{-1}(U)$ is $F$, and thus $X = F^{-1}(U)$. To prove the second formula, simply note that $U$ and $1 - U$ have the same distribution. $\blacksquare$

So we obtain

Algorithm 1.3.3 (Inversion method for continuous random variables) Let $X$ be a random variable with cdf $F = F_X$. To simulate $X$:

- Step 1: Generate a random number $U \sim U(0,1)$.
- Step 2: Set $X = F^{-1}(U)$.

Example 1.3.4 (Simulating an exponential random variable) If $X \sim \mathrm{Exp}(\lambda)$, then its cdf is $F(x) = 1 - e^{-\lambda x}$. The inverse function $F^{-1}$ is given by
$$u = 1 - e^{-\lambda x} \iff x = -\frac{1}{\lambda} \log(1 - u),$$
therefore $F^{-1}(u) = -\frac{1}{\lambda} \log(1 - u)$. So if $U \sim U(0,1)$, then
$$\mathrm{Exp}(\lambda) = -\frac{1}{\lambda} \log(1 - U) = -\frac{1}{\lambda} \log U$$
(the last equality in distribution, since $U$ and $1-U$ have the same law).

The inversion method is most straightforward when there is an explicit formula for the inverse function $F^{-1}$. In many examples, however, such a nice formula is not available. A possible remedy in that situation is to solve $F(X) = U$ numerically, for example by Newton's method.

Another method for simulating a continuous random variable is the rejection method. Suppose we have a method to simulate a random variable with pdf $g(x)$ and that we want to simulate the random variable with pdf $f(x)$. The following algorithm is due to von Neumann.

Algorithm 1.3.5 (Rejection method for continuous random variables) Let $X$ be a random variable with pdf $f(x)$ and let $Y$ be a random variable with pdf $g(x)$. Furthermore, assume that there exists a constant $C$ such that
$$\frac{f(x)}{g(x)} \le C \quad \text{for all } x.$$
To simulate $X$:

- Step 1: Simulate $Y$ with density $g$.
- Step 2: Simulate a random number $U$.
- Step 3: If $U \le \frac{f(Y)}{C g(Y)}$, set $X = Y$. Otherwise return to Step 1.
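The inversion method of Example 1.3.4 can be sketched in Python (the function name is an illustrative choice; using $1-U$ instead of $U$ keeps the argument of the logarithm strictly positive):

```python
import math
import random

def exponential_inverse(lam):
    """Example 1.3.4: simulate X ~ Exp(lam) by inversion.
    F(x) = 1 - exp(-lam*x), so F^{-1}(u) = -log(1-u)/lam."""
    u = 1.0 - random.random()   # lies in (0, 1], so log(u) is always defined
    return -math.log(u) / lam   # same law as -log(U)/lam, since U and 1-U agree in law

samples = [exponential_inverse(2.0) for _ in range(100000)]
mean = sum(samples) / len(samples)   # law of large numbers: close to 1/lam = 0.5
```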
That the algorithm does the job is the object of the following proposition.

Proposition 1.3.6 The random variable $X$ generated by the rejection method has pdf $f(x)$.

Proof: To obtain a value of $X$ we will need, in general, to iterate the algorithm a random number of times. We generate random variables $Y_1, \ldots, Y_N$ until $Y_N$ is accepted, and then set $X = Y_N$. We need to verify that the pdf of $X$ is actually $f$. We have
$$P(X \le x) = P(Y_N \le x) = P\left(Y \le x \,\Big|\, U \le \tfrac{f(Y)}{C g(Y)}\right) = \frac{\int_{-\infty}^{x} \frac{f(y)}{C g(y)}\, g(y)\, dy}{P\left(U \le \frac{f(Y)}{C g(Y)}\right)}.$$
If we let $x \to \infty$, we obtain
$$C\, P\left(U \le \frac{f(Y)}{C g(Y)}\right) = 1,$$
and thus
$$P(X \le x) = \int_{-\infty}^{x} f(y)\, dy,$$
which shows that $X$ has pdf $f$. $\blacksquare$

In order to decide whether this method is efficient or not, we need to ensure that rejections occur with small probability. The above proof shows that at each iteration the probability that the result is accepted is $P\left(U \le \frac{f(Y)}{C g(Y)}\right) = \frac{1}{C}$, independently of the other iterations. Therefore the number of iterations needed is $\mathrm{Geom}\left(\frac{1}{C}\right)$, with mean $C$. Therefore the ability to choose a reasonably small $C$ will ensure that the method is efficient.

Example 1.3.7 Let $X$ be the random variable with pdf
$$f(x) = 20\, x (1-x)^3, \qquad 0 < x < 1.$$
Since the pdf is concentrated on $(0,1)$, let us take
$$g(x) = 1, \qquad 0 < x < 1.$$
To determine $C$ such that $\frac{f(x)}{g(x)} \le C$, we need to maximize the function $h(x) = 20 x (1-x)^3$. Differentiating gives
$$h'(x) = 20\left[(1-x)^3 - 3x(1-x)^2\right],$$
and thus the maximum is attained at $x = \frac{1}{4}$. Thus
$$\frac{f(x)}{g(x)} \le 20 \cdot \frac{1}{4} \cdot \left(\frac{3}{4}\right)^3 = \frac{135}{64} \equiv C.$$
We obtain
$$\frac{f(x)}{C g(x)} = \frac{256}{27}\, x (1-x)^3,$$
and the rejection method is:

- Step 1: Generate random numbers $U_1$ and $U_2$.
- Step 2: If $U_2 \le \frac{256}{27}\, U_1 (1 - U_1)^3$, stop and set $X = U_1$. Otherwise return to Step 1.

The average number of iterations until acceptance is $C = \frac{135}{64} \approx 2.11$.

Example 1.3.8 (Simulating a normal random variable) Note first that to simulate a normal random variable $X \sim N(\mu, \sigma^2)$ it is enough to simulate $N(0,1)$ and then set $X = \sigma N(0,1) + \mu$. Let us first consider the random variable $Z$ whose density is
$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \qquad 0 < x < \infty.$$
One can think of $Z$ as the absolute value of $N(0,1)$. We simulate $Z$ by using the rejection method with
$$g(x) = e^{-x}, \qquad 0 < x < \infty,$$
i.e., $Y \sim \mathrm{Exp}(1)$. To find $C$ we note that
$$\frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}}\, e^{x - \frac{x^2}{2}} \le \sqrt{\frac{2e}{\pi}} \equiv C,$$
the maximum being attained at $x = 1$. One generates $Z$ using the rejection method.
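Example 1.3.7 translates directly into code. A sketch in Python (the function name is an illustrative choice; it happens that $f$ is the Beta(2,4) density, which gives a known mean $\frac{1}{3}$ to check against):

```python
import random

C = 135 / 64   # bound on f/g from Example 1.3.7, attained at x = 1/4

def sample_rejection():
    """Example 1.3.7: simulate the pdf f(x) = 20 x (1-x)^3 on (0,1)
    by rejection from the uniform proposal g(x) = 1."""
    while True:
        y = random.random()    # Step 1: proposal Y ~ g = U(0,1)
        u = random.random()    # Step 2: uniform for the acceptance test
        # Step 3: accept if U <= f(Y) / (C g(Y)) = (256/27) Y (1-Y)^3
        if u <= (256 / 27) * y * (1 - y) ** 3:
            return y

samples = [sample_rejection() for _ in range(50000)]
mean = sum(samples) / len(samples)   # should be close to E[X] = 1/3
```

On average each returned sample costs $C = 135/64 \approx 2.11$ loop iterations, matching the efficiency discussion above.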
To generate $X \sim N(0,1)$ from $Z$, one generates a discrete random variable $S$ which takes the values $1$ and $-1$ with probability $\frac{1}{2}$ each, and then sets $X = SZ$. The random variable $S$ can be written as $S = 2B - 1$ with $B \sim B(1, \frac{1}{2})$. In summary:

- Step 1: Generate a random number $U$, an exponential random variable $Y \sim \mathrm{Exp}(1)$, and a Bernoulli random variable $B \sim B(1, \frac{1}{2})$.
- Step 2: If $U \le \exp\left(-\frac{(Y-1)^2}{2}\right)$, set $Z = Y$ and $X = (2B - 1)Z$. Otherwise return to Step 1.

For particular random variables many special techniques have been devised. We give here some examples.

Example 1.3.9 (Simulating a geometric random variable) The cdf of the geometric random variable $X \sim \mathrm{Geom}(p)$ is given by
$$F(n) = P(X \le n) = 1 - P(X > n) = 1 - \sum_{k=n+1}^{\infty} (1-p)^{k-1} p = 1 - (1-p)^n.$$
The exponential random variable $Y \sim \mathrm{Exp}(\lambda)$ has cdf $1 - e^{-\lambda x}$. For any positive real number $x$, let $\lceil x \rceil$ denote the smallest integer greater than or equal to $x$ (e.g., $\lceil 3.72 \rceil = 4$). Then we claim that if $Y \sim \mathrm{Exp}(\lambda)$, then $\lceil Y \rceil \sim \mathrm{Geom}(p)$ with $p = 1 - e^{-\lambda}$. Indeed, we have
$$P(\lceil Y \rceil \le n) = P(Y \le n) = 1 - e^{-\lambda n} = 1 - (1-p)^n.$$
Thus we obtain

Algorithm 1.3.10 (Geometric random variable)

- Step 1: Generate a random number $U$.
- Step 2: Set $X = \left\lceil \frac{\log U}{\log(1-p)} \right\rceil$.

Then $X \sim \mathrm{Geom}(p)$, since $Y = -\frac{1}{\lambda} \log U \sim \mathrm{Exp}(\lambda)$ and $\lambda = -\log(1-p)$.

Example 1.3.11 (Simulating the gamma random variable) Using the fact that $\mathrm{Gamma}(n, \lambda)$ is a sum of $n$ independent $\mathrm{Exp}(\lambda)$ random variables, one immediately obtains

Algorithm 1.3.12 (Gamma random variable)

- Step 1: Generate $n$ random numbers $U_1, \ldots, U_n$.
- Step 2: Set $X_i = -\frac{1}{\lambda} \log U_i$.
- Step 3: Set $X = X_1 + \cdots + X_n$.

Then $X \sim \mathrm{Gamma}(n, \lambda)$.

Finally we give an elegant algorithm which generates two independent normal random variables.

Example 1.3.13 (Simulating a normal random variable: Box-Muller) We show a simple way to generate two independent standard normal random variables $X$ and $Y$. The joint pdf of $X$ and $Y$ is given by
$$f(x,y) = \frac{1}{2\pi}\, e^{-\frac{x^2 + y^2}{2}}.$$
Let us change into polar coordinates $(r, \theta)$ with $r^2 = x^2 + y^2$ and $\tan\theta = y/x$. The change of variables formula gives
$$f(x,y)\, dx\, dy = \frac{1}{2\pi}\, r e^{-\frac{r^2}{2}}\, dr\, d\theta.$$
Consider further the change of variables $s = r^2$, so that
$$f(x,y)\, dx\, dy = \frac{1}{2}\, e^{-\frac{s}{2}}\, ds\; \frac{1}{2\pi}\, d\theta.$$
The right-hand side is easily seen to be the joint pdf of the two independent random variables $S \sim \mathrm{Exp}(\frac{1}{2})$ and $\Theta \sim U(0, 2\pi)$. Therefore we obtain

Algorithm 1.3.14 (Standard normal random variables)
- Step 1: Generate two random numbers $U_1$ and $U_2$.
- Step 2: Set
$$X = \sqrt{-2 \log U_1}\, \cos(2\pi U_2), \qquad Y = \sqrt{-2 \log U_1}\, \sin(2\pi U_2).$$

Then $X$ and $Y$ are two independent $N(0,1)$ random variables.

1.4 Markov, Chebyshev, and Chernov

We start by deriving simple techniques for bounding the tail distribution of a random variable, i.e., bounding the probability that the random variable takes values far from its mean. Our first inequality, called Markov's inequality, simply assumes that we know the mean of $X$.

Proposition 1.4.1 (Markov's inequality) Let $X$ be a random variable which assumes only nonnegative values, i.e., $P(X \ge 0) = 1$. Then for any $a > 0$ we have
$$P(X \ge a) \le \frac{E[X]}{a}.$$

Proof: For $a > 0$ let us define the random variable
$$I_a = \begin{cases} 1 & \text{if } X \ge a, \\ 0 & \text{otherwise.} \end{cases}$$
Note that since $X \ge 0$ we have
$$I_a \le \frac{X}{a}, \tag{1.6}$$
and that since $I_a$ is a Bernoulli random variable, $E[I_a] = P(X \ge a)$. Taking expectations in the inequality (1.6) gives
$$P(X \ge a) = E[I_a] \le \frac{E[X]}{a}. \ \blacksquare$$

Example 1.4.2 (Flipping coins) Let us flip a fair coin $n$ times and define the random variables $X_i$, $i = 1, 2, \ldots, n$, by
$$X_i = \begin{cases} 1 & \text{if the } i\text{th coin flip is heads,} \\ 0 & \text{otherwise.} \end{cases}$$
Then each $X_i$ is a Bernoulli random variable, and $S_n = X_1 + \cdots + X_n \sim B(n, \frac{1}{2})$ is a binomial random variable. Let us use Markov's inequality to estimate the probability that at least $75\%$ of the $n$ coin flips are heads. Since $E[S_n] = \frac{n}{2}$, Markov's inequality tells us that
$$P\left(S_n \ge \frac{3n}{4}\right) \le \frac{E[S_n]}{3n/4} = \frac{n/2}{3n/4} = \frac{2}{3}.$$
As we will see later this is a terrible, really terrible, bound; but note that we obtained it using only the value of the mean and nothing else.

Our next inequality, which we can derive from Markov's inequality, involves the variance of $X$. This is called Chebyshev's inequality.

Proposition 1.4.3 (Chebyshev's inequality) Let $X$ be a random variable with $E[X] = \mu$ and $\mathrm{var}(X) = \sigma^2$. Then for any $a > 0$ we have
$$P(|X - \mu| \ge a) \le \frac{\mathrm{var}(X)}{a^2}.$$

Proof: Observe first that
$$P(|X - \mu| \ge a) = P\left((X - \mu)^2 \ge a^2\right).$$
Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov's inequality and obtain
$$P(|X - \mu| \ge a) \le \frac{E\left[(X - \mu)^2\right]}{a^2} = \frac{\mathrm{var}(X)}{a^2}. \ \blacksquare$$

Let us apply this result to our coin flipping example.

Example 1.4.4 (Flipping coins, cont'd) Since $S_n$ has mean $\frac{n}{2}$ and variance $\frac{n}{4}$, Chebyshev's inequality tells us that
$$P\left(S_n \ge \frac{3n}{4}\right) \le P\left(\left|S_n - \frac{n}{2}\right| \ge \frac{n}{4}\right) \le \frac{n/4}{(n/4)^2} = \frac{4}{n}.$$
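Algorithm 1.3.14 (Box-Muller) can be sketched in Python (the function name is an illustrative choice):

```python
import math
import random

def box_muller():
    """Algorithm 1.3.14 (Box-Muller): two independent N(0,1) variables
    from two independent uniform random numbers."""
    u1 = 1.0 - random.random()   # lies in (0, 1], so log(u1) is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))   # R^2 = -2 log U1 ~ Exp(1/2)
    theta = 2.0 * math.pi * u2           # Theta ~ U(0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

xs = [box_muller()[0] for _ in range(50000)]
mean = sum(xs) / len(xs)                  # should be close to 0
var = sum(x * x for x in xs) / len(xs)    # should be close to 1
```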
This is significantly better than the bound provided by Markov's inequality. Note also that we can do a bit better by observing that the distribution of $S_n$ is symmetric around its mean, and thus we can replace $\frac{4}{n}$ by $\frac{2}{n}$.

We can do better still if we know all moments of the random variable $X$, for example if we know the moment generating function $M_X(t)$ of the random variable $X$. We have

Proposition 1.4.5 (Chernov's bounds) Let $X$ be a random variable with moment generating function $M_X(t) = E[e^{tX}]$.

- For any $a$ we have
$$P(X \ge a) \le \min_{t \ge 0} \frac{E[e^{tX}]}{e^{ta}}.$$
- For any $a$ we have
$$P(X \le a) \le \min_{t \le 0} \frac{E[e^{tX}]}{e^{ta}}.$$

Proof: This follows from Markov's inequality. For $t > 0$ we have
$$P(X \ge a) = P\left(e^{tX} \ge e^{ta}\right) \le \frac{E[e^{tX}]}{e^{ta}}.$$
Since $t > 0$ is arbitrary, we obtain
$$P(X \ge a) \le \min_{t \ge 0} \frac{E[e^{tX}]}{e^{ta}}.$$
Similarly, for $t < 0$ we have
$$P(X \le a) = P\left(e^{tX} \ge e^{ta}\right) \le \frac{E[e^{tX}]}{e^{ta}},$$
and thus
$$P(X \le a) \le \min_{t \le 0} \frac{E[e^{tX}]}{e^{ta}}. \ \blacksquare$$

Let us consider again our flipping coin example.

Example 1.4.6 (Flipping coins, cont'd) Since $S_n$ is a binomial $B(n, \frac{1}{2})$ random variable, its moment generating function is given by $M_{S_n}(t) = \left(\frac{1}{2} + \frac{1}{2} e^t\right)^n$. To estimate $P(S_n \ge \frac{3n}{4})$ we apply the Chernov bound with $t > 0$ and obtain
$$P\left(S_n \ge \frac{3n}{4}\right) \le e^{-\frac{3n}{4}t} \left(\frac{1}{2} + \frac{1}{2} e^t\right)^n = \left(e^{-\frac{3t}{4}}\, \frac{1 + e^t}{2}\right)^n.$$
To find the optimal bound we minimize the function $f(t) = e^{-\frac{3t}{4}}\, \frac{1 + e^t}{2}$. The minimum is at $t = \log 3$, and
$$f(\log 3) = 3^{-\frac{3}{4}} \cdot \frac{1 + 3}{2} = 2 \cdot 3^{-\frac{3}{4}} \approx 0.877,$$
and thus we obtain
$$P\left(S_n \ge \frac{3n}{4}\right) \le (0.877)^n.$$
This is of course much better than $\frac{2}{n}$. For $n = 100$, Chebyshev's inequality tells us that the probability of obtaining at least 75 heads is not bigger than $0.02$, while the Chernov bound tells us that it is actually not greater than $2.09 \times 10^{-6}$.

1.5 Limit Theorems

In this section we study the behavior, for large $n$, of a sum of independent identically distributed random variables. That is, let $X_1, X_2, \ldots$ be a sequence of independent random variables where all the $X_i$ have the same distribution. Then we denote by $S_n$ the sum
$$S_n = X_1 + \cdots + X_n.$$
Under suitable conditions, $S_n$ will exhibit a universal behavior which does not depend on all the details of the distribution of the $X_i$ but only on a few of its characteristics, like the mean or the variance.
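The three bounds of Examples 1.4.2, 1.4.4, and 1.4.6 can be compared numerically for $n = 100$; the exact binomial tail is added for reference (it is not computed in the notes):

```python
import math

# Bounds on P(S_n >= 3n/4) for S_n ~ B(n, 1/2), n = 100
# (Examples 1.4.2, 1.4.4, and 1.4.6).
n = 100
markov = (n / 2) / (3 * n / 4)       # E[S_n] / (3n/4) = 2/3, independent of n
chebyshev = (n / 4) / (n / 4) ** 2   # var(S_n) / (n/4)^2 = 4/n (2/n with symmetry)
t = math.log(3)                       # optimal t in the Chernov bound
chernov = (math.exp(-3 * t / 4) * 0.5 * (1 + math.exp(t))) ** n

# Exact tail probability from the binomial pdf, for comparison
exact = sum(math.comb(n, k) for k in range(75, n + 1)) / 2 ** n
```

Each bound uses strictly more information about $S_n$ than the previous one (mean, then variance, then the full mgf), and the bounds improve accordingly.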
The first result is the weak law of large numbers. Its proof is not very difficult, but it is a very important result.

Theorem 1.5.1 (The weak law of large numbers) Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then for any $\varepsilon > 0$,
$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right) = 0.$$

Proof: By the linearity of expectation we have
$$E\left[\frac{S_n}{n}\right] = \frac{1}{n}\, E[X_1 + \cdots + X_n] = \mu,$$
i.e., the mean of $\frac{S_n}{n}$ is $\mu$. Furthermore, by the independence of $X_1, \ldots, X_n$ we have
$$\mathrm{var}\left(\frac{S_n}{n}\right) = \frac{1}{n^2}\, \mathrm{var}(S_n) = \frac{1}{n^2}\left(\mathrm{var}(X_1) + \cdots + \mathrm{var}(X_n)\right) = \frac{\sigma^2}{n}.$$
Applying Chebyshev's inequality we obtain
$$P\left(\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n \varepsilon^2},$$
and for any $\varepsilon > 0$ the right-hand side goes to $0$ as $n$ goes to $\infty$. $\blacksquare$

Usually we refer to the quantity $\frac{S_n}{n}$ as the empirical average. The weak law of large numbers tells us that if we perform a large number of independent trials, then the average value of our trials is close to the mean with probability close to 1.

We now discuss several refinements of the law of large numbers. A look at the proof shows that the probability of observing a deviation from the mean behaves like $\frac{1}{n}$, and that we have used only the fact that the variance is finite. One would expect that if we know that higher moments $E[X^n]$ are finite, one should obtain better estimates. For this we need some preparation. Let $X$ be a random variable with mgf $\phi(t) = E[e^{tX}]$. It will be useful to consider the logarithm of $\phi(t)$, which we denote by
$$u(t) = \log \phi(t) = \log E\left[e^{tX}\right]$$
and refer to as the logarithmic moment generating function.

Recall that a function $f(t)$ is called convex if for any $0 \le \alpha \le 1$ we have
$$f(\alpha t_1 + (1-\alpha) t_2) \le \alpha f(t_1) + (1-\alpha) f(t_2).$$
Graphically, it means that the graph of $f(t)$ for $t_1 \le t \le t_2$ lies below the line passing through the points $(t_1, f(t_1))$ and $(t_2, f(t_2))$. From calculus, $f$ is convex iff $f'(t)$ is increasing iff $f''(t)$ is
nonnegative, provided the derivatives exist.

Lemma 1.5.2 The logarithmic moment generating function $u(t) = \log \phi(t)$ is a convex function which satisfies
$$u(0) = 0, \qquad u'(0) = \mu, \qquad u''(0) = \sigma^2.$$

Proof: We will prove the convexity using Hölder's inequality, which states that if $\frac{1}{p} + \frac{1}{q} = 1$ then $E[XY] \le E[|X|^p]^{1/p}\, E[|Y|^q]^{1/q}$. We choose $p = \frac{1}{\alpha}$ and $q = \frac{1}{1-\alpha}$ and obtain
$$\phi(\alpha t_1 + (1-\alpha) t_2) = E\left[e^{\alpha t_1 X}\, e^{(1-\alpha) t_2 X}\right] \le E\left[e^{t_1 X}\right]^{\alpha}\, E\left[e^{t_2 X}\right]^{1-\alpha} = \phi(t_1)^{\alpha}\, \phi(t_2)^{1-\alpha}.$$
Taking logarithms proves the convexity. Note further that
$$u'(t) = \frac{\phi'(t)}{\phi(t)}, \qquad u''(t) = \frac{\phi''(t)}{\phi(t)} - \left(\frac{\phi'(t)}{\phi(t)}\right)^2.$$
At $t = 0$ we find
$$u(0) = \log \phi(0) = 0, \qquad u'(0) = \frac{\phi'(0)}{\phi(0)} = \mu, \qquad u''(0) = \frac{\phi''(0)}{\phi(0)} - \left(\frac{\phi'(0)}{\phi(0)}\right)^2 = E[X^2] - \mu^2 = \sigma^2. \ \blacksquare$$

Definition 1.5.3 The Legendre transform of a function $f(t)$ is the function $f^*(z)$ defined by
$$f^*(z) = \sup_t \left(zt - f(t)\right). \tag{1.8}$$

Note that the supremum in Eq. (1.8) can be equal to $+\infty$. If the supremum is finite, and $f$ is differentiable, we can compute $f^*$ using calculus: the supremum is attained at the point $t$ such that the derivative of $zt - f(t)$ vanishes, i.e.,
$$z = f'(t).$$
Solving for $t$ and inserting into the right-hand side of (1.8) gives $f^*(z)$. For future use, let us compute the Legendre transform of some logarithmic moment generating functions.

Example 1.5.4 Let $\phi(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$ be the mgf of $N(\mu, \sigma^2)$ and let $u(t) = \log \phi(t) = \mu t + \frac{\sigma^2 t^2}{2}$. Given $z$, the maximum of $zt - \mu t - \frac{\sigma^2 t^2}{2}$ is attained if $t$ satisfies
$$z - \mu - \sigma^2 t = 0, \quad \text{i.e.,} \quad t = \frac{z - \mu}{\sigma^2},$$
and thus
$$u^*(z) = \frac{(z - \mu)^2}{2 \sigma^2}.$$
We see that $u^*(z)$ is a parabola centered around $\mu$.

Example 1.5.5 Let $\phi(t) = 1 - p + p e^t$ be the mgf of $B(1,p)$ and let $u(t) = \log \phi(t) = \log(1 - p + p e^t)$. We distinguish three cases:

- If $z > 1$, then the function $zt - \log(1 - p + p e^t)$ is increasing, since its derivative $z - \frac{p e^t}{1 - p + p e^t}$ is positive for all $t$. The supremum is attained as $t \to \infty$ and is equal to $+\infty$, since
$$\lim_{t \to \infty} \left[zt - \log(1 - p + p e^t)\right] = \lim_{t \to \infty} \left[(z-1)t - \log\left((1-p)e^{-t} + p\right)\right] = +\infty.$$
- If $z < 0$, then the function $zt - \log(1 - p + p e^t)$ is decreasing for all $t$, and thus the supremum is attained as $t \to -\infty$; the supremum is $+\infty$.
- For $0 \le z \le 1$, the maximum is attained if $t$ satisfies
$$\frac{p e^t}{1 - p + p e^t} = z, \quad \text{i.e.,} \quad e^t = \frac{z(1-p)}{p(1-z)},$$
which gives
$$u^*(z) = z \log\frac{z}{p} + (1-z) \log\frac{1-z}{1-p}.$$
A simple computation shows that $u^*(z)$ is strictly convex and has its minimum at $z = p$.

More generally, we obtain

Lemma 1.5.6 Let $u(t)$ be the logarithmic
moment generating function of the random variable $X$. Then the Legendre transform $u^*(z)$ of $u(t)$ is a convex function which satisfies $u^*(z) \ge 0$. If $\sigma^2 > 0$, then $u^*(z) = 0$ if and only if $z = \mu$, i.e., $u^*(z)$ is nonnegative and takes its unique minimum, which is equal to $0$, at the mean $\mu$ of $X$. Moreover, if $z > \mu$ then $u^*(z) = \sup_{t \ge 0}(tz - u(t))$, and if $z < \mu$ then $u^*(z) = \sup_{t \le 0}(tz - u(t))$.

Proof:

1. The convexity of $u^*(z)$ follows from
$$\alpha u^*(z_1) + (1-\alpha) u^*(z_2) = \sup_t \left(\alpha z_1 t - \alpha u(t)\right) + \sup_t \left((1-\alpha) z_2 t - (1-\alpha) u(t)\right) \ge \sup_t \left[\left(\alpha z_1 + (1-\alpha) z_2\right) t - u(t)\right] = u^*\!\left(\alpha z_1 + (1-\alpha) z_2\right). \tag{1.9}$$

2. Next, note that $u^*(z) \ge 0 \cdot z - u(0) = 0$, and thus $u^*(z)$ is nonnegative.

3. Suppose that $u^*(z_0) = 0$ for some $z_0$, i.e., $\sup_t (t z_0 - u(t)) = 0$. The supremum is attained at $t$ satisfying $z_0 = u'(t)$, and thus we must have
$$f(t) \equiv u(t) - t\, u'(t) = 0. \tag{1.10}$$
One solution is $t = 0$, since $u(0) = 0$; in that case $z_0 = u'(0) = \mu$. Let us show that this is the only solution. The function $f(t) = u(t) - t u'(t)$ is $0$ at $t = 0$ and its derivative is $f'(t) = -t\, u''(t)$. If $u''(0) = \sigma^2 > 0$, by continuity $u''(t) > 0$ for $t \in (-\delta, \delta)$. Thus $f'(t) < 0$ for $t \in (0, \delta)$ and $f'(t) > 0$ for $t \in (-\delta, 0)$, so $f(t) < 0$ for $t \ne 0$, and therefore $t = 0$ is the only solution of $f(t) = 0$.

4. If $z > \mu$, then for $t < 0$ we have
$$zt - u(t) \le \mu t - u(t) \le \sup_t \left(\mu t - u(t)\right) = u^*(\mu) = 0.$$
Since $u^*(z) > 0$, we conclude that the supremum is attained for some $t \ge 0$. One argues similarly for $z < \mu$. $\blacksquare$

Theorem 1.5.7 (One half of Cramér's theorem) Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables. Assume that the moment generating function $\phi(t)$ of the $X_i$ exists and is finite in a neighborhood of $0$. Let $u(t) = \log \phi(t)$ and $u^*(z) = \sup_t (zt - u(t))$. Then for any $a > \mu$ we have
$$P\left(\frac{S_n}{n} \ge a\right) \le e^{-n u^*(a)},$$
and for any $a < \mu$ we have
$$P\left(\frac{S_n}{n} \le a\right) \le e^{-n u^*(a)}.$$

Proof: We use the Chernov inequality. Let $a > \mu$; since $E[e^{t S_n}] = \phi(t)^n = e^{n u(t)}$, for $t \ge 0$ we have
$$P\left(\frac{S_n}{n} \ge a\right) = P(S_n \ge na) \le \min_{t \ge 0} e^{-nat}\, E\left[e^{t S_n}\right] = \min_{t \ge 0} e^{-n(at - u(t))} = e^{-n \sup_{t \ge 0}(at - u(t))} = e^{-n u^*(a)},$$
where in the last step we used that, by Lemma 1.5.6, the supremum over $t \ge 0$ equals the supremum over all $t$ since $a > \mu$. One proceeds similarly for $a < \mu$. $\blacksquare$

There is a strengthening of the weak law of large numbers, called the strong law of large numbers.

Theorem 1.5.8 (Strong law of large numbers) Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with mean $\mu$. Then
$\frac{S_n}{n}$ converges to $\mu$ with probability 1, i.e.,
$$P\left(\lim_{n \to \infty} \frac{S_n}{n} = \mu\right) = 1.$$

The strong law of large numbers is useful in many respects. Imagine, for example, that you are simulating a sequence of i.i.d. random variables and that you are trying to determine the mean $\mu$. The strong law of large numbers tells you that it is enough to do ONE simulation: run for a sufficiently long time, it will always reproduce the mean. The weak law of large numbers tells you something a little weaker: with very large probability you will obtain the mean. Based on the weak law of large numbers only, you might want to repeat your experiment a number of times, to make sure you were not unlucky and did not hit an event of small probability. The strong law of large numbers tells you not to worry.

Proof: The proof of the strong law of large numbers uses more advanced tools than we are willing to use here. $\blacksquare$

Finally we discuss the central limit theorem. The law of large numbers and Cramér's theorem deal with large fluctuations of $\frac{S_n}{n}$, that is, with the probability that $\frac{S_n}{n}$ is at a distance away from the mean which is of order 1. In particular, these fluctuations vanish as $n \to \infty$. We can then ask whether there are non-trivial fluctuations of order $n^{-\alpha}$ for some $\alpha > 0$. One can easily figure out which power $\alpha$ has to be chosen: since $E[S_n] = n\mu$ and $\mathrm{var}(S_n) = n\sigma^2$, the ratio
$$\frac{S_n - n\mu}{\sigma\sqrt{n}}$$
has mean $0$ and variance $1$ for all $n$. This means that fluctuations of order $\frac{1}{\sqrt{n}}$ may be non-trivial. The central limit theorem shows that the fluctuations of order $\frac{1}{\sqrt{n}}$ of $S_n$ are in fact universal: for large $n$ they behave like a normal random variable, that is,
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \approx N(0,1), \qquad \text{or} \qquad \frac{S_n}{n} \approx \mu + \frac{\sigma}{\sqrt{n}}\, N(0,1).$$
What we exactly mean by $\approx$ is given in

Theorem 1.5.9 (Central limit theorem) Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2 > 0$. Then for any $-\infty \le a \le b \le +\infty$ we have
$$\lim_{n \to \infty} P\left(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{x^2}{2}}\, dx.$$
converges to the moment generating of N07 1 as n a 00 Let by X then 0 and varXl 1 If S Sf X then Sn 7 my 7 S g 7 39 Therefore without loss of generality we can assume that M 0 and 039 1 If b be the moment generating function of Xi then MO 0 and 15 0 1 We denote the moment generating function of Sn by nt Using independence we have n axe29 L W l 7 4 Recall that the mgf of N07 1 is given by 6 22 so we need to show that nt a EtZZ as n a 00 To show this we set ut log gtt and un t log nt and show that un t a t22 as n a 00 We have Mt E 857 E um log nt nlog lt gt Notethat ultogt Iog ltogto 7 110 7 0 W M 0 W0 gtHltogt ltogtiltwltogtgt22721 450 By using L7Hospital rule twice we obtain lim un t lim naoo 94100 5 A WtWW 91330 25712 t2 t2 giggle WW 5 Therefore limyH00 nt 6 22 From this it seems at least plausible that the cdf of Sn converges to the cdf of N01 but proving this requires some serious work and some Fourier analysis I CHAPTER 1 RANDOM VARIABLES 27 16 MonteCarlo methods The basic Monte Carlo method uses sum of independent random variables and the law of large numbers to estimate a deterministic quantity In order to illustrate the method let us start by an example Example 161 Estimating the number 7139 We construct a random algorithm to generate the number 7139 Consider a circle of radius 1 that lies inside a 2 gtlt 2 square The square has area 4 and the circle has area 7139 Suppose we pick a point at random within the circle and de ne 1 if the point is inside the circle X i 0 otherw1se and PX 1 7r4 We repeat this experiment 71 times That is we we select 71 points inside the circle independently The number of points Sn within the circle can be written as Sn X1 Xn where the Xs are independent copies of X So Sn B016 and ELSE mr4 Suppose now that perform 71 10000 trials and observe Sn 79327 then our estimator for 7139 is 4 31728 We try to determine how accurate our estimator is By the Central Limit Theorem7 L i has for suf ciently large n a normal distri bution N01 
Therefore

P( S_n/n − 1.96 σ/√n ≤ μ ≤ S_n/n + 1.96 σ/√n ) ≈ 0.95

for sufficiently large n. The value x = 1.96 is such that P( |N(0,1)| ≤ x ) = 0.95. For this reason we call the interval

( S_n/n − 1.96 σ/√n , S_n/n + 1.96 σ/√n )

a 95% confidence interval. In our case a 95% confidence interval for π/4 is ( S_n/n − 1.96 σ/√n , S_n/n + 1.96 σ/√n ), where σ = √( (π/4)(1 − π/4) ), which we can't really evaluate since we do not know π. There are several ways to proceed.

1. Use the simple bound √(x(1 − x)) ≤ 1/2, so σ ≤ 1/2 and thus

1.96 σ/√n ≤ 1.96/(2√10000) = 0.0098.

This gives the interval (3.1336, 3.2120) as a conservative 95% confidence interval.

2. We can simply plug our estimate of π into the formula: σ = √( (π/4)(1 − π/4) ) ≈ √(0.7932 × 0.2068) ≈ 0.405. This gives the confidence interval (3.1410, 3.2046).

3. Another way to estimate the variance when σ² is unknown is to use the sample variance, given by

V_n² = 1/(n − 1) Σ_{i=1}^n ( X_i − S_n/n )².

The sample variance is an unbiased estimator, since E[V_n²] = σ². To see this, note that we can assume μ = 0, and then

E[ ( X_i − S_n/n )² ] = E[X_i²] − 2 E[X_i S_n]/n + E[S_n²]/n² = σ² − 2σ²/n + σ²/n = (1 − 1/n) σ²,

so summing over i and dividing by n − 1 gives E[V_n²] = σ².

This example is a particular case of the hit-or-miss method: suppose you want to estimate the volume of a set B in R^d and that you know the volume of a set A which contains B. The hit-or-miss method consists in choosing n points in A uniformly at random and using the fraction of the points that land in B as an estimate for the volume of B (relative to the volume of A).

Another class of examples where Monte Carlo methods can be applied is the computation of integrals. Suppose you want to compute the integral

I_1 = ∫_0^1 cos(x)/(3 + cos(x)) dx,

or more generally

I_2 = ∫_S h(x) dx,

where S is a subset of R^d and h is a given real-valued function on S. A special example is the function h ≡ 1 on S, in which case you are simply trying to compute the volume of S. Another example is

I_3 = ∫_{R^d} h(x) f(x) dx,

where h is a given real-valued function and f is the pdf of some random vector on R^d. All these examples can be written as expectations of a suitable random variable. Indeed we have I_3 = E[h(X)], where X has pdf f. We also have I_1 = E[h(U)], where U ~ U(0,1). To write I_2 as an
expectation, choose a random vector X whose pdf f satisfies f(x) > 0 for every x ∈ S. Extend h to R^d by setting h(x) = 0 if x ∉ S. Then

I_2 = ∫_{R^d} h(x) dx = ∫_{R^d} ( h(x)/f(x) ) f(x) dx = E[ h(X)/f(X) ].

Note that you have considerable freedom in choosing f, and this is what lies behind the idea of importance sampling (see Example 1.6.3 below). Many, many other problems can be put in the form (maybe after some considerable work)

I = E[ h(X) ],

and the random variable X could also be a discrete random variable.

Algorithm 1.6.2 (Simple sampling) In order to estimate I = E[h(X)], simple sampling consists in generating i.i.d. random variables X_1, X_2, ..., X_n and setting

I_n = (1/n) Σ_{i=1}^n h(X_i).

The quantity I_n gives an unbiased estimate of I, i.e., E[I_n] = I. By the strong law of large numbers, I_n converges to I with probability 1 as n → ∞. The variance of the simple sampling estimate is

var(I_n) = var(h(X))/n.

Note that var(I_n) can be used to determine the accuracy of our estimate, for example by determining a 95% confidence interval as in Example 1.6.1. If we denote σ² = var(h(X)), then the half-length of the 95% confidence interval is 1.96 σ/√n. If we wish our confidence interval to have half-length ε, we need to choose n such that

n ≥ 1.96² σ²/ε².

So, as a rule, the accuracy of the method is of order 1/√n.

We consider next another example which illustrates a technique through which one can reduce the variance considerably (variance reduction). The technique we will use goes under the name of importance sampling. Suppose we want to compute I = E[h(X)], where X has pdf f. We can use simple sampling by simulating i.i.d. random variables X_i with pdf f. Instead of using X we can choose another random variable Y with pdf g and write

I = ∫ h(x) f(x) dx = ∫ ( h(x) f(x)/g(x) ) g(x) dx = E[ h(Y) f(Y)/g(Y) ].

We then simulate i.i.d. random variables Y_i with pdf g, and this gives a new estimator

J_n = (1/n) Σ_{i=1}^n h(Y_i) f(Y_i)/g(Y_i).

[Figure 1.1: A graph with 22 edges e_1, ..., e_22.]

The variance is given by

var(J_n) = (1/n) var( h(Y) f(Y)/g(Y) ),

so if g is chosen so that var( h(Y) f(Y)/g(Y) ) < var( h(X) ), we reduce the variance and thus improve the efficiency of our
method. There are many other methods to reduce the variance, and some are touched upon in the exercises.

Example 1.6.3 (Network reliability) Let us consider an application of simple sampling to network reliability. Consider a connected graph as in Figure 1.1. Each edge has a probability q of failing, and all edges are independent. Think of q as a very small number; to fix the idea let q = 10^(−2). Fix two vertices s and t; we want to compute the disconnection probability

p_D = P( s is not connected to t by working edges ).

This can be computed by hand for very small graphs, but even for the graph shown in Figure 1.1 this is hardly doable. Our graph here has 22 edges; let S = {e_1, ..., e_22} denote the set of all edges. Let X denote the set of edges that fail, so X is a random subset of S. For every B ⊂ S we have

P(X = B) = q^|B| (1 − q)^(22 − |B|),

where |A| denotes the cardinality of A. If we denote by 𝒮 the set of all subsets of S, then X is a random variable which takes values in 𝒮. Let us define the function k : 𝒮 → R by

k(B) = 1 if s is not connected to t when the edges of B fail,
k(B) = 0 if s is connected to t when the edges of B fail.

Then we have

p_D = Σ_{B : k(B)=1} P(X = B) = Σ_B k(B) P(X = B) = E[ k(X) ].

The simple sampling estimator for p_D is

I_n = (1/n) Σ_{i=1}^n k(X_i),

where X_1, ..., X_n are i.i.d. copies of X. Each X_i can be generated by tossing an unfair coin 22 times. Then our estimator is simply the fraction of the simulated networks that fail to connect s and t.

In order to get an idea of the numbers involved, let us give a rough estimate of p_D. It is easy to see that at least 3 edges must fail for s not to be connected to t. So we have

p_D ≤ P( |X| ≥ 3 ) = 1 − Σ_{j=0}^2 C(22, j) q^j (1 − q)^(22 − j) ≈ 0.00136,

since |X| ~ B(22, q). On the other hand, we can get a lower bound for p_D by noting that

p_D ≥ P( e_1, e_2, e_3 fail ) = q³ = 10^(−6)

(for three edges e_1, e_2, e_3 whose failure disconnects s and t). Therefore p_D is between 10^(−2) and 10^(−6), which is very small. We will thus need very tight confidence intervals. To compute var(I_n), note that k(X) is a Bernoulli random variable with parameter p_D. Hence

var(I_n) = (1/n) p_D (1 − p_D) ≈ p_D/n

since p_D is small. To get a meaningful confidence interval
we need its half-length 2√(p_D/n) to be, at the very least, less than p_D/2. This implies however that we must choose n > 16/p_D, and thus we need millions of iterations for a network which is not particularly big.

Let us use importance sampling here. Note that p_D is very small, which means that typical X have k(X) = 0. The basic idea is to choose the sampling variable in such a way that we sample more often the X for which k(X) = 1, i.e., |X| large in our case. A natural try is to take the random variable Y to have the distribution

P(Y = B) = θ^|B| (1 − θ)^(22 − |B|)

with a well-chosen θ. Since k(Y) = 0 whenever |Y| < 3, we can for example choose θ such that E[|Y|] = 3. Since |Y| ~ B(22, θ), this gives 22θ = 3 and thus θ = 3/22. The estimator is now

J_n = (1/n) Σ_{i=1}^n k(Y_i) P(X = Y_i)/P(Y = Y_i),

where the Y_i are i.i.d. with the distribution of Y. Let us compute the variance of J_n. We have

var(J_n) ≤ (1/n) E[ ( k(Y) P(X = Y)/P(Y = Y) )² ] = (1/n) Σ_{B : k(B)=1} P(X = B)²/P(Y = B).

Note that

P(X = B)/P(Y = B) = ( q/θ )^|B| ( (1 − q)/(1 − θ) )^(22 − |B|) = ( (1 − q)/(1 − θ) )^22 ( q(1 − θ)/(θ(1 − q)) )^|B| ≈ 20.2 × 0.064^|B|.   (1.11)

In Eq. (1.11) all terms with k(B) = 1 have |B| ≥ 3. For those B we have

P(X = B)/P(Y = B) ≤ 20.2 × 0.064³ ≈ 0.0053.

So we get

var(J_n) ≤ (1/n) Σ_{B : k(B)=1} ( P(X = B)/P(Y = B) ) P(X = B) ≤ (0.0053/n) Σ_{B : k(B)=1} P(X = B) = 0.0053 p_D/n.

This means that we have reduced the variance by a factor of approximately 200. So, for the same n, the confidence interval is going to be about √200 ≈ 14 times smaller. Alternatively, a given confidence interval for I_n can be obtained with J_n using about 200 times fewer samples. This is good!

1.7 Problems

Exercise 1. 1. For positive numbers a and b, the Pareto(a, b) distribution has pdf f(x) = a b^a x^(−(a+1)) for x ≥ b and f(x) = 0 for x < b. Apply the inversion method to generate a Pareto(a, b) random variable.

2. The standardized logistic distribution has the pdf f(x) = e^(−x)/(1 + e^(−x))². Use the inversion method to generate a random variable having this distribution.

Exercise 2. Consider the technique of generating a Gamma(n, λ) random variable by using the rejection method with g the pdf of an exponential with parameter λ/n.

1. Show that the average number of iterations of the algorithm is n^n e^(1−n)/(n − 1)!.

2. Use Stirling's formula to show that for large n the answer in 1. is
approximately e √( n/(2π) ).

3. Show that the rejection method is equivalent to the following:

Step 1: Generate Y_1 and Y_2, independent exponentials with parameter 1.
Step 2: If Y_1 < (n − 1)( Y_2 − log(Y_2) − 1 ), return to Step 1.
Step 3: Set X = nY_2/λ.

Exercise 3 (Generating a uniform distribution on the permutations). In this problem we will use the following notation: if x is a positive real number, we denote by [x] the integer part of x, i.e., [x] is the greatest integer less than or equal to x. For example, [2.37] = 2. Consider the following algorithm to generate a random permutation of the elements 1, 2, 3, ..., n. We will denote by S(i) the element in position i; for example, for the permutation (2, 4, 3, 1, 5) of 5 elements we have S(1) = 2, S(2) = 4, and so on.

1. Set k = 1.
2. Set S(1) = 1.
3. If k = n, stop. Otherwise let k = k + 1.
4. Generate a random number U, and let S(k) = S([kU] + 1), S([kU] + 1) = k. Go to step 3.

Explain, in words, what the algorithm is doing. Show that at iteration k, i.e., when the value of S(k) is initially set, (S(1), S(2), ..., S(k)) is a random permutation of 1, 2, ..., k. Hint: Relate the probability P_k obtained at iteration k to the probability P_{k−1} obtained at iteration k − 1.

Exercise 4. Compute the Legendre transform u* of the logarithmic mgf u for the Poisson(λ) and exponential random variables. Discuss in detail where u* is finite or not.

Exercise 5. We have seen in class that if the X_i are independent B(1, p) Bernoulli random variables and S_n = X_1 + ... + X_n, then for a > p

P( S_n/n ≥ a ) ≤ e^(−n I(a)),

where

I(z) = z log(z/p) + (1 − z) log( (1 − z)/(1 − p) ) if 0 ≤ z ≤ 1, and I(z) = ∞ otherwise,

and a similar bound holds for a < p. In order to make that bound easier to use in practice:

1. Show that for 0 < p < 1 and 0 ≤ z ≤ 1 we have I(z) ≥ 2(z − p)². Hint: Differentiate twice.

2. Show that for any ε > 0,

P( |S_n/n − p| ≥ ε ) ≤ 2 e^(−2nε²).

Exercise 6. On a Friday night you enter a fast food restaurant which promises that every customer is served within a minute. Unfortunately there are 30 customers in line, and an appointment will force you to leave in 40 minutes. Being a probabilist, you assume that the waiting time of each customer is
exponentially distributed with mean 1. Estimate the probability that you will miss your appointment if you wait in line until you are served, using (a) the Chebyshev inequality, (b) the central limit theorem, (c) Cramér's theorem.

Exercise 7. Consider the problem of estimating π that we have considered in class. Estimate the number of trials you should perform to ensure that, with probability 1 − δ, your result is at distance no more than ε from the true value of π. Do this in two ways: (a) use the central limit theorem; (b) use the estimates of Exercise 5 and Cramér's theorem.

Exercise 8 (Hit-or-miss method). 1. Suppose that you wish to estimate the volume of a set B contained in the Euclidean space R^k. You know that B is a subset of A and you know the volume of A. The hit-or-miss method consists in choosing n independent points uniformly at random in A and using the fraction of points which land in B to get an estimate of the volume of B. We used this method to compute the number π in class. Write down the estimator I_n obtained with this method and compute var(I_n).

2. Suppose now that D is a subset of A and that we know the volume of D and the volume of D ∩ B. You decide to estimate the volume of B by choosing n points at random from A \ D and counting how many land in B. What is the corresponding estimator I'_n of the volume of B for this second method? Show that this second method is better than the first one, in the sense that var(I'_n) ≤ var(I_n).

3. How would you apply this method concretely to the estimation of the number π? Compute the corresponding variances.

Exercise 9. Suppose f is a function on the interval [0,1] with 0 < f < 1. Here are two ways to estimate I = ∫_0^1 f(x) dx:

1. Use the hit-or-miss method from the previous problem with A = [0,1] × [0,1] and B = { (x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ f(x) }.

2. Let U_1, U_2, ... be i.i.d. uniform random variables on [0,1] and use the estimator

I_n = (1/n) Σ_{i=1}^n f(U_i).

Show that the estimator in 2. has smaller variance than the estimator in 1.

Exercise 10 (Antithetic variables). In this problem we describe an example of a method to reduce the variance of the
simple sampling method.

1. Suppose that k and h are both nondecreasing or both nonincreasing functions; show that cov( k(X), h(X) ) ≥ 0. Hint: Let Y be a random variable which is independent of X and has the same distribution as X. Then by our assumption on h and k we have ( k(X) − k(Y) )( h(X) − h(Y) ) ≥ 0. Then take expectations.

2. Consider the integral I = ∫_0^1 k(x) dx and assume that k is nondecreasing or nonincreasing. The simple sampling estimator is

I_n = (1/n) Σ_{i=1}^n k(U_i),

where the U_i are independent U(0,1) random variables. Consider now the alternative estimator: for n even, set

Ĩ_n = (1/n) Σ_{i=1}^{n/2} ( k(U_i) + k(1 − U_i) ),

where the U_i are independent U(0,1) random variables. Show that Ĩ_n is an estimator for I and that var(Ĩ_n) ≤ var(I_n). Hint: Use part 1 to show var( k(U_1) + k(1 − U_1) ) ≤ 2 var( k(U_1) ).

3. Let k(x) = 4√(1 − x²). Then I = π. Compute var(Ĩ_n) and var(I_n).
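The variance reduction in Exercise 10 can also be checked numerically. The sketch below (the function names, replication count, and sample size are my own choices, not from the notes) compares the empirical variance of the simple estimator with that of the antithetic estimator for k(x) = 4√(1 − x²), using 500 independent replications of each estimator with n = 1000:

```python
import math
import random

def estimators(k, n, rng):
    """One realization of the simple estimator (1/n) sum k(U_i) and the
    antithetic estimator (1/n) sum_{i <= n/2} [k(U_i) + k(1 - U_i)]."""
    us = [rng.random() for _ in range(n)]
    i_simple = sum(k(u) for u in us) / n
    i_anti = sum(k(u) + k(1.0 - u) for u in us[: n // 2]) / n
    return i_simple, i_anti

def empirical_variance(samples):
    """Unbiased sample variance of a list of numbers."""
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / (len(samples) - 1)

rng = random.Random(1)
k = lambda x: 4.0 * math.sqrt(1.0 - x * x)  # integral over [0, 1] is pi

runs = [estimators(k, 1000, rng) for _ in range(500)]
var_simple = empirical_variance([r[0] for r in runs])
var_anti = empirical_variance([r[1] for r in runs])
print(var_simple, var_anti)  # the antithetic variance comes out smaller
```

Since k is monotone, k(U) and k(1 − U) are negatively correlated (this is exactly part 1 of the exercise), which is why the antithetic estimator has the smaller variance for the same number of evaluations of k.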