# Class Note for MATH 564 at UA

These 49 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at the University of Arizona taught in the Fall. Since their upload, they have received 15 views.


Notes for probability

John Kerl

February 3, 2008

**Abstract.** This is a crib sheet for probability. Content is taken from Dr. Tom Kennedy's splendid lectures for Math 564 (probability) at the University of Arizona in spring of 2007. That is, these are (except for my appendices) simply my handwritten class notes, with some examples omitted for brevity, made legible and searchable. When I take a course, my two fundamental questions for myself are: (1) What do I expect myself to know? and (2) How do I know I know it? A crib sheet such as this one addresses the first question; homework assignments address the second.

## Contents

1. Events
   - 1.1 Fundamental definitions
   - 1.2 Conditioning and independence
2. Discrete random variables
   - 2.1 Definitions
   - 2.2 Catalog of discrete random variables
   - 2.3 Expectations
3. Multiple discrete random variables
   - 3.1 Definitions
   - 3.2 Expectations
   - 3.3 Independence
   - 3.4 Sample mean
   - 3.5 Moment-generating functions and characteristic functions
   - 3.6 Sums of discrete random variables
4. Continuous random variables
   - 4.1 Definitions
   - 4.2 Catalog of continuous random variables
   - 4.3 The normal distribution
   - 4.4 The gamma distribution
   - 4.5 Functions of a single random variable
   - 4.6 Expectations
5. Multiple continuous random variables
   - 5.1 Definitions
   - 5.2 Independence
   - 5.3 Expectations
   - 5.4 The IID paradigm: $S_n$ and $\bar{X}_n$
   - 5.5 Functions of multiple random variables
   - 5.6 Moment-generating functions and characteristic functions
   - 5.7 Change of variables
   - 5.8 Conditional density and expectation
   - 5.9 The bivariate normal distribution
   - 5.10 Covariance and correlation
6. Laws of averages
   - 6.1 The weak law of large numbers
   - 6.2 The strong law of large numbers
   - 6.3 The central limit theorem
   - 6.4 Confidence intervals

Appendices:
- A. The coin-flipping experiments
   - A.1 Single coin flips
   - A.2 Batches of coin flips
- B. Bayes' theorem
   - B.1 Algebraic approach
   - B.2 Graphical/numerical approach
   - B.3 Asymptotics
   - B.4 Conclusions
- C. Probability and measure theory
- D. A proof of the inclusion-exclusion formula
- References
- Index

## 1 Events

### 1.1 Fundamental definitions

**Definitions 1.1.** When we do an experiment, we obtain an **outcome**. The set of all outcomes, conventionally written $\Omega$, is called the **sample space**. Mathematically, we only require $\Omega$ to be a set.

**Definition 1.2.** An **event** is, intuitively, any subset of the sample space. Technically, it is any *measurable* subset of the sample space.

**Example 1.3.** The experiment is rolling a 6-sided die once. The outcome is the number of pips on the top face after the roll. The sample space $\Omega$ is $\{1,2,3,4,5,6\}$. Example events are "the result of the roll is a 3" and "the result of the roll is odd."

**Definition 1.4.** Events $A$ and $B$ are **disjoint** if $A \cap B = \emptyset$.

**Definition 1.5.** A collection $\mathcal{F}$ of subsets of $\Omega$ is called a **$\sigma$-field** (or **event space**) if $\emptyset \in \mathcal{F}$, $\mathcal{F}$ is closed under countable unions, and $\mathcal{F}$ is closed under complements. Note in particular that this means $\Omega \in \mathcal{F}$ and $\mathcal{F}$ is closed under countable intersections.

**Remark 1.6.** There is a superficial resemblance with topological spaces: a topological space $X$ has a topology $\mathcal{T}$, which is a collection of "open" subsets that is closed under finite (rather than countable) intersection and arbitrary (rather than countable) union, and which need not be closed under complements.

**Examples 1.7.** The smallest (or coarsest) $\sigma$-field for any $\Omega$ is $\{\emptyset, \Omega\}$; the largest (or finest) is $2^\Omega$, the set of all subsets of $\Omega$. There may be many $\sigma$-fields in between. For example, if $A$ is any subset of $\Omega$ which isn't $\emptyset$ or $\Omega$, then one can check that the 4-element collection $\{\emptyset, A, A^c, \Omega\}$ is a $\sigma$-field.

**Definition 1.8.** A **probability measure** $P$ is a function from a $\sigma$-field $\mathcal{F}$ to $[0,1]$ such that:
- For all $A \in \mathcal{F}$, $P(A) \ge 0$;
- $P(\Omega) = 1$ and $P(\emptyset) = 0$;
- If $A_1, A_2, \dots$ is a finite or countable subset of $\mathcal{F}$ with the $A_i$'s all pairwise disjoint, then
$$P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i).$$
This is the **countable additivity** property of the probability measure $P$.

**Remark 1.9.** For uncountable $\Omega$, $2^\Omega$ is a $\sigma$-field, but it is impossible to define a probability measure on it; it is too big. Consult your favorite textbook on Lebesgue measure for the reason why. For finite or countable $\Omega$, on the other hand, $2^\Omega$ is in fact what we think of for $\mathcal{F}$.

**Definition 1.10.** A **probability space** is a triple $(\Omega, \mathcal{F}, P)$ of a sample space $\Omega$, a $\sigma$-field $\mathcal{F}$ on $\Omega$, and a probability measure $P$ on $\mathcal{F}$.

**Remark 1.11.** Technically, a probability space is nothing more than a measure space $(\Omega, \mathcal{F}, P)$ with the additional requirement that $P(\Omega) = 1$.

### 1.2 Conditioning and independence

**Definition 1.12.** Let $A, B$ be two events with $P(B) > 0$. Then we define the **conditional probability** of $A$ given $B$ to be
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Mnemonic: intersection over given.

**Notation 1.13.** We will often write $P(A, B)$ in place of $P(A \cap B)$.

**Definition 1.14.** Two events $A$ and $B$ are **independent** (or **pairwise independent**) if $P(A \cap B) = P(A)P(B)$. Mnemonic: write down $P(A) = P(A \mid B) = P(A \cap B)/P(B)$ and clear the denominators.

**Remark 1.15.** This is not the same as disjoint. If $A$ and $B$ are disjoint, then by countable additivity of $P$ we have $P(A \cup B) = P(A) + P(B)$.

**Definition 1.16.** Events $A_1, \dots, A_n$ are **independent** if for all $I \subseteq \{1, \dots, n\}$,
$$P\Big(\bigcap_{i \in I} A_i\Big) = \prod_{i \in I} P(A_i).$$
Mnemonic: we just look at all possible factorizations.

**Example 1.17.** Three events $A$, $B$, and $C$ are independent if $P(A \cap B) = P(A)P(B)$, $P(A \cap C) = P(A)P(C)$, $P(B \cap C) = P(B)P(C)$, and $P(A \cap B \cap C) = P(A)P(B)P(C)$.

**Theorem 1.18 (Partition theorem).** Let $\{B_i\}$ be a countable partition of $\Omega$, and let $A$ be an event. Then
$$P(A) = \sum_i P(A \mid B_i)\, P(B_i).$$

*Proof.* Note that $P(A) = \sum_i P(A \cap B_i)$, since the $B_i$'s partition $\Omega$. $\square$
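The partition theorem can be checked exactly on the die-roll sample space of Example 1.3. This is a minimal sketch of my own (the helper names `prob` and `cond` are not from the notes), using exact rational arithmetic:

```python
from fractions import Fraction

# Sample space for one roll of a fair 6-sided die, with P({w}) = 1/6.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}

def prob(event):
    # P(A) for an event given as a predicate on outcomes
    return sum(p[w] for w in omega if event(w))

def cond(a, b):
    # P(A | B) = P(A and B) / P(B), "intersection over given"
    return prob(lambda w: a(w) and b(w)) / prob(b)

# Partition of omega into {odd} and {even}; A is "roll is 5 or 6".
is_odd = lambda w: w % 2 == 1
is_even = lambda w: w % 2 == 0
a = lambda w: w >= 5

lhs = prob(a)
rhs = cond(a, is_odd) * prob(is_odd) + cond(a, is_even) * prob(is_even)
assert lhs == rhs == Fraction(1, 3)
```

Because everything is a `Fraction`, the identity $P(A) = \sum_i P(A \mid B_i) P(B_i)$ holds exactly rather than up to floating-point error.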
## 2 Discrete random variables

### 2.1 Definitions

**Definition 2.1.** A **random variable** $X$ is a function $X: \Omega \to \mathbb{R}$. We say $X$ is a **discrete** random variable if its range, $X(\Omega)$, is finite or countable.

**Example 2.2.** Roll two 6-sided dice and let $X$ be their sum.

**Definition 2.3.** Given a random variable $X$, the **probability mass function** (or **PMF**) of $X$, written $p_X(x)$ or $f_X(x)$, is
$$f_X(x) = P(X = x) = P(X^{-1}(x)).$$
This is the probability that $X$'s value is some specific real number $x$. Note that $X^{-1}(x)$, the preimage of $x$, is an event, and so we can compute its probability using the probability measure $P$.

**Definition 2.4.** Let $X_1$ and $X_2$ be two random variables on two probability spaces $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$, respectively. Then $X_1$ and $X_2$ are **identically distributed** if $P_1(X_1 = x) = P_2(X_2 = x)$ for all $x \in \mathbb{R}$.

**Remark 2.5.** Identically distributed discrete random variables $X$ and $Y$ have the same PMF.

**Remark 2.6.** Just as with events, which have a general measure-theoretic definition (definition 1.2), there is also a general measure-theoretic definition for random variables: they are simply measurable functions from a probability space to a measurable space.

### 2.2 Catalog of discrete random variables

Note: mean and variance are defined in section 2.3. They are included here for ready reference.

**Bernoulli DRV**
- Parameter: $p \in [0,1]$.
- Range: $X \in \{0, 1\}$.
- PMF: $P(X=0) = p$, $P(X=1) = 1-p$.
- Example: flip a $p$-weighted coin once.
- Mean: $1-p$.
- Variance: $p(1-p)$.

**Binomial DRV**
- Two parameters: $p \in [0,1]$ and $n \in \mathbb{Z}^+$.
- Range: $X \in \{0, 1, 2, \dots, n\}$.
- PMF: $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$.
- Example: flip a $p$-weighted coin $n$ times; $X$ is the number of heads.
- Mean: $np$.
- Variance: $np(1-p)$.

**Remark 2.7.** There is a trick to see that the sum of the probabilities is 1: recognize the sum of probabilities as the expansion of $(p + (1-p))^n$ using the binomial theorem:
$$1 = 1^n = (p + (1-p))^n = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k}.$$

**Poisson DRV**
- Parameter: $\lambda > 0$.
- Range: $X \in \{0, 1, 2, \dots\}$.
- PMF: $P(X=k) = \frac{\lambda^k}{k!} e^{-\lambda}$.
- Example: limiting case of a binomial random variable with large $n$, small $p$, and $\lambda = np$.
- Mean: $\lambda$.
- Variance: $\lambda$.

**Geometric DRV**
- Parameter: $p \in (0,1]$.
- Range: $X \in \{1, 2, 3, \dots\}$.
- PMF: $P(X=k) = p(1-p)^{k-1}$.
- Example: flip a $p$-weighted coin until you get heads; $X$ is the number of flips it takes. (Note: some authors count the number of tails before the head.)
- Mean: $1/p$.
- Variance: $(1-p)/p^2$.

**Negative binomial DRV**
- Two parameters: $p \in (0,1]$ and $n = 1, 2, 3, \dots$.
- Range: $X \in \{n, n+1, n+2, \dots\}$.
- PMF: $P(X=k) = \binom{k-1}{n-1} p^n (1-p)^{k-n}$.
- Example: flip a $p$-weighted coin until you get $n$ heads; $X$ is the number of flips it takes. (Note: deriving the $\binom{k-1}{n-1}$ factor is a nontrivial counting problem; it is deferred until later in the course.)
- Mean: $n/p$.
- Variance: $n(1-p)/p^2$.

### 2.3 Expectations

**Definition 2.8 (Functions of a discrete random variable).** If $X: \Omega \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$, then $g(X): \Omega \to \mathbb{R}$ is another random variable. Let $Y = g(X)$. To find the PMF of $g(X)$ given the PMF of $X$, write the latter as $f_X(x) = P(X = x)$. Then
$$P(g(X) = y) = \sum_{x \,:\, g(x) = y} P(X = x) = \sum_{x \,:\, g(x) = y} f_X(x).$$

**Definition 2.9.** Let $X$ be a discrete random variable with PMF $f_X$. If $\sum_x |x|\, f_X(x) < \infty$, i.e. if we have absolute convergence, then we define the **expected value** (also called **expectation** or **mean**) of $X$ to be
$$E[X] = \sum_x x\, f_X(x) = \sum_x x\, P(X = x).$$
Mnemonic: this is just the weighted sum of possible $X$ values, weighted by their probabilities.

**Theorem 2.10 (Law of the Unconscious Statistician).** Let $X$ be a discrete random variable and $g: \mathbb{R} \to \mathbb{R}$. Let $Y = g(X)$. If $\sum_x |g(x)|\, f_X(x) < \infty$, then
$$E[Y] = \sum_x g(x)\, f_X(x).$$

**Definition 2.11.** The **variance** of $X$, written $\sigma^2(X)$ or $\mathrm{Var}(X)$, is
$$\sigma^2(X) = E\big[(X - E[X])^2\big].$$

**Definition 2.12.** The square root of the variance is the **standard deviation**, written $\sigma(X)$ or $\sigma_X$.

**Proposition 2.13.** Let $\mu = E[X]$. Then $\sigma^2(X) = E[X^2] - \mu^2$.

*Proof.* FOIL, and use linearity of expectation. $\square$

**Proposition 2.14.** $\mathrm{Var}(cX) = c^2\, \mathrm{Var}(X)$.

*Proof.*
$$\mathrm{Var}(cX) = E[c^2 X^2] - E[cX]^2 = c^2 E[X^2] - c^2 E[X]^2 = c^2\big(E[X^2] - E[X]^2\big) = c^2\, \mathrm{Var}(X). \qquad\square$$

**Theorem 2.15.** Let $X$ be a discrete random variable and let $a, b \in \mathbb{R}$. Then:
1. $E[aX + b] = aE[X] + b$;
2. if $P(X = b) = 1$, then $E[X] = b$;
3. if $P(a \le X \le b) = 1$, then $a \le E[X] \le b$;
4. if $g, h: \mathbb{R} \to \mathbb{R}$ and $g(X), h(X)$ have finite means, then $E[g(X) + h(X)] = E[g(X)] + E[h(X)]$.

**Definition 2.16.** Let $X$ be a discrete random variable and let $B$ be an event. The **conditional PMF** of $X$ given $B$ is $f(x \mid B) = P(X = x \mid B)$. The **conditional expectation** of $X$ given $B$ is
$$E[X \mid B] = \sum_x x\, f(x \mid B),$$
provided, as usual, that the sum converges absolutely.

**Theorem 2.17 (Partition theorem).** Let $\{B_i\}$ be a countable partition of $\Omega$ and let $X$ be a random variable. Then
$$E[X] = \sum_i E[X \mid B_i]\, P(B_i).$$

## 3 Multiple discrete random variables

### 3.1 Definitions

**Definition 3.1.** We define the **joint PMF** (or **joint density**) of $X$ and $Y$ to be
$$f_{X,Y}(x, y) = P(X = x, Y = y).$$

**Proposition 3.2.** For $A \subseteq \mathbb{R}^2$ we have
$$P\big((X, Y) \in A\big) = \sum_{(x,y) \in A} f_{X,Y}(x, y).$$

**Corollary 3.3.** Let $g(x,y): \mathbb{R}^2 \to \mathbb{R}$. Let $X$ and $Y$ be discrete random variables and let $Z = g(X, Y)$. Then
$$P(Z = z) = \sum_{(x,y)\,:\,g(x,y)=z} f_{X,Y}(x, y).$$

We can use joint densities to recover marginal densities:

**Corollary 3.4.** Let $X$ and $Y$ be discrete random variables with joint density $f_{X,Y}$. Then
$$f_X(x) = \sum_y f_{X,Y}(x, y) \qquad\text{and}\qquad f_Y(y) = \sum_x f_{X,Y}(x, y).$$
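The mean and variance formulas in the catalog of section 2.2 can be sanity-checked numerically. This is a sketch of my own, checking the geometric distribution by summing its PMF directly (the truncation at $k = 2000$ is an assumption; the tail beyond it is negligibly small):

```python
# Check the geometric catalog entry with p = 0.4:
# mean should be 1/p = 2.5, variance (1-p)/p^2 = 3.75.
p = 0.4
mean = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 2000))
second_moment = sum(k * k * p * (1 - p) ** (k - 1) for k in range(1, 2000))
var = second_moment - mean ** 2  # proposition 2.13: Var = E[X^2] - E[X]^2

assert abs(mean - 1 / p) < 1e-9
assert abs(var - (1 - p) / p ** 2) < 1e-9
```

The same pattern (sum $k f_X(k)$ and $k^2 f_X(k)$ over the range) works for any entry in the catalog with a tractable PMF.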
### 3.2 Expectations

**Theorem 3.5 (Law of the Unconscious Statistician).** Let $X$ and $Y$ be discrete random variables and let $g(x,y): \mathbb{R}^2 \to \mathbb{R}$. Let $Z = g(X, Y)$. Then
$$E[Z] = \sum_{x,y} g(x, y)\, f_{X,Y}(x, y).$$

*Proof.* As in the single-variable case (theorem 2.10). $\square$

**Corollary 3.6.** Let $X$ and $Y$ be discrete random variables and let $a, b \in \mathbb{R}$. Then
$$E[aX + bY] = aE[X] + bE[Y].$$

*Proof.* Use $g(x, y) = ax + by$, and use the theorem twice. $\square$

### 3.3 Independence

Recall definition 1.14 of independent events: $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$. We use this to define independence of discrete random variables.

**Definition 3.7.** Two discrete random variables $X$ and $Y$ are **independent** if, for all $x$ and $y$,
$$P(X = x, Y = y) = P(X = x)\, P(Y = y).$$
Using PMF notation, we say $X$ and $Y$ are independent if $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for all $x$ and $y$, i.e. if the joint PMF factors.

**Notation 3.8.** We often abbreviate *independent and identically distributed* (definitions 3.7 and 2.4) as **IID**.

Question: given only the joint density of $X$ and $Y$, can we tell if $X$ and $Y$ are independent? From corollary 3.4 we can recover the PMFs of $X$ and $Y$:
$$f_X(x) = \sum_y f_{X,Y}(x, y) \qquad\text{and}\qquad f_Y(y) = \sum_x f_{X,Y}(x, y).$$
Then we can multiply them back together and see if we get the joint density back. Point:
- If $X$ and $Y$ are independent, then we can go from marginals to joint PMFs by multiplying.
- We can always go from joint PMFs to marginals by summing, as above.

**Theorem 3.9.** If $X$ and $Y$ are independent discrete random variables, then $E[XY] = E[X]E[Y]$.

**Theorem 3.10.** If $X$ and $Y$ are independent discrete random variables and $g, h: \mathbb{R} \to \mathbb{R}$, then $g(X)$ and $h(Y)$ are independent discrete random variables.

**Corollary 3.11.** In particular, $E[g(X)h(Y)] = E[g(X)]\, E[h(Y)]$, provided $g(X)$ and $h(Y)$ have finite mean.

**Theorem 3.12.** If $X$ and $Y$ are independent discrete random variables, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

*Proof.* Use definition 2.11 and theorem 3.9. $\square$

**Definition 3.13.** We say that $X$ and $Y$ are **uncorrelated** if $E[XY] = E[X]E[Y]$.

**Remark 3.14.** Independent implies uncorrelated, but not vice versa.

**Remark 3.15.** Theorem 3.12 holds for uncorrelated discrete random variables.
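The joint-PMF machinery of sections 3.1 and 3.3 can be exercised exactly on two fair dice. This sketch (the dict-based PMF representation is my own choice, not the notes') recovers marginals by summing, confirms the joint PMF factors, and checks theorem 3.9:

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two independent fair dice: f(x, y) = 1/36 for each pair.
f = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Marginals by summing away one variable (corollary 3.4).
fx = {x: sum(f[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
fy = {y: sum(f[(x, y)] for x in range(1, 7)) for y in range(1, 7)}

# The joint PMF factors, so X and Y are independent (definition 3.7).
assert all(f[(x, y)] == fx[x] * fy[y] for x, y in f)

# Theorem 3.9: E[XY] = E[X] E[Y] for independent X and Y.
exy = sum(x * y * f[(x, y)] for x, y in f)
ex = sum(x * fx[x] for x in fx)
ey = sum(y * fy[y] for y in fy)
assert exy == ex * ey == Fraction(49, 4)
```

Here $E[X] = E[Y] = 7/2$, so both sides of theorem 3.9 are exactly $49/4$.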
### 3.4 Sample mean

**Definition 3.16.** Let $X_1, \dots, X_n$ be independent and identically distributed. (Think of multiple trials of the same experiment.) Since each $X_i$ has the same expectation, we call their common mean the **population mean** and denote it by $\mu$. Likewise, we call their common variance the **population variance** and denote it by $\sigma^2$. Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.$$
This is a new random variable, called the **sample mean** of $X_1, \dots, X_n$.

By linearity of expectation,
$$E[\bar{X}_n] = E\Big[\frac{1}{n}\sum_{i=1}^n X_i\Big] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\, n\mu = \mu.$$
Recall from proposition 2.14 that the variance scales as $\mathrm{Var}(cX) = c^2\, \mathrm{Var}(X)$. Also, since the $X_i$'s are independent, their variances add. Thus
$$\mathrm{Var}(\bar{X}_n) = \mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\,\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$$

### 3.5 Moment-generating functions and characteristic functions

**Definitions 3.17.** Let $X$ be a discrete random variable. Then the **moment-generating function** (or **MGF**) of $X$ is
$$M_X(t) = E[e^{tX}].$$
By the Law of the Unconscious Statistician (theorem 2.10), this is
$$M_X(t) = \sum_x e^{tx} f_X(x).$$
Likewise, the **characteristic function** of $X$ is
$$E[e^{itX}] = \sum_x e^{itx} f_X(x).$$

**Remark 3.18.** These functions are just computational tricks; there is no intrinsic meaning in these functions.

**Proposition 3.19.** Let $M_X(t)$ be the moment-generating function for a discrete random variable $X$. Then
$$E[X^n] = M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t)\Big|_{t=0}.$$

**Proposition 3.20.** If $X$ and $Y$ are identically distributed, they have the same PMF (remark 2.5), and thus they also have the same MGF.

**Proposition 3.21.** Let $X$ and $Y$ be independent discrete random variables and let $Z = X + Y$. Then $M_Z(t) = M_X(t)\, M_Y(t)$.

### 3.6 Sums of discrete random variables

(Moment-generating functions are perhaps a better approach than the following.) Let $X$ and $Y$ be discrete random variables and let $Z = X + Y$. We can find the PMF of $Z$ by
$$f_Z(z) = P(Z = z) = P(X + Y = z) = \sum_{x + y = z} f_{X,Y}(x, y) = \sum_x f_{X,Y}(x, z - x).$$
Now further suppose that $X$ and $Y$ are independent. Then $f_{X,Y}(x, z-x) = f_X(x)\, f_Y(z-x)$, so
$$f_Z(z) = \sum_x f_X(x)\, f_Y(z - x).$$
This is the **convolution** of $f_X$ and $f_Y$. Note that, as always in convolutions, we are summing over all the ways in which $x$ and $y$ can add up to $z$.
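The convolution formula of section 3.6 can be sketched concretely for the sum of two fair dice (Example 2.2); the dict representation of the PMFs is my own, not the notes':

```python
from fractions import Fraction

# PMFs of two independent fair dice.
fx = {k: Fraction(1, 6) for k in range(1, 7)}
fy = dict(fx)

# Convolution: f_Z(z) = sum over x of f_X(x) * f_Y(z - x).
fz = {}
for x, px in fx.items():
    for y, py in fy.items():
        fz[x + y] = fz.get(x + y, 0) + px * py

assert fz[2] == Fraction(1, 36)
assert fz[7] == Fraction(6, 36)   # 7 is the most likely sum
assert sum(fz.values()) == 1      # f_Z is a genuine PMF
```

The triangular shape of `fz` (probabilities rising to $z = 7$ then falling) is exactly the "number of ways $x$ and $y$ can add up to $z$" count divided by 36.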
## 4 Continuous random variables

### 4.1 Definitions

A continuous random variable (mimicking definition 2.1) is a function from $\Omega$ to $\mathbb{R}$, with the property that for all $x \in \mathbb{R}$, $P(X = x) = 0$. Thus the PMF, which we used for discrete random variables, is not useful. Instead we first define another function, namely $P(X \le x)$.

**Definition 4.1.** The **cumulative distribution function** (or **CDF**) for a random variable, whether discrete or continuous, is
$$F_X(x) = P(X \le x).$$

**Theorem 4.2.** The CDF $F_X$ for a random variable satisfies the following properties:
- $0 \le F_X(x) \le 1$, and $F_X$ is nondecreasing;
- $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$;
- $F_X$ is right-continuous (in the sense from introductory calculus).

**Definition 4.3.** A random variable $X$ is a **continuous random variable** if there exists a function $f_X(x)$, called the **probability density function** (or **PDF**), such that the CDF $F_X$ is given by
$$F_X(x) = \int_{-\infty}^x f_X(u)\, du.$$

**Remark 4.4.** Note that the integral is done using Lebesgue measure, or (for the purposes of this course) Riemann integration. If we allow counting measure and require countable range, then we can subsume discrete random variables into this definition. However, that is outside the scope of this course.

**Remark 4.5.** Some random variables are neither discrete nor continuous; these appear, for example, in dynamical systems.

**Remark 4.6.** The PDF and CDF are related as follows (making use of the second fundamental theorem of calculus):
$$F_X(x) = \int_{-\infty}^x f_X(u)\, du, \qquad f_X(x) = \frac{d}{dx} F_X(x).$$

**Remark 4.7.** The integral of the PDF of $X$ over an interval is the probability that $X$ lies in that interval, e.g.
$$P(a \le X \le b) = \int_a^b f_X(x)\, dx.$$
Thus PDFs are nonnegative and have integral 1.

**Remark 4.8.** We can now neatly define some terminology from statistics. Let $X$ be a random variable with CDF $F_X$ and PDF $f_X$.
- The **mean** of $X$ is the expectation $E[X]$, as defined below.
- The **median** of $X$ is $F_X^{-1}(0.5)$.
- A **mode** of $X$ is a local maximum of $f_X$. If the PDF has two local maxima, we say that $X$ is **bimodal**; if the PDF has a single maximum, we call it *the* mode of $X$.

### 4.2 Catalog of continuous random variables

Note: mean and variance are defined in section 4.6. They are included here for ready reference. I thought about including graphs of the PDFs and CDFs, but instead I will refer you to the excellent Wikipedia article on *Probability distribution* and the pages linking from there.

**Uniform CRV**
- Parameters: $a < b$.
- Range: $a \le x \le b$.
- PDF:
$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{elsewhere.} \end{cases}$$
- CDF:
$$F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x < b \\ 1, & b \le x. \end{cases}$$
- Mean: $(a+b)/2$.
- Variance: $(b-a)^2/12$.

**Exponential CRV**
- Parameter: $\lambda > 0$.
- Range: $X \in [0, \infty)$.
- PDF:
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$
- CDF:
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$
- Mean: $1/\lambda$.
- Variance: $1/\lambda^2$.

**Cauchy CRV**
- No parameters.
- Range: $X \in (-\infty, \infty)$.
- PDF:
$$f_X(x) = \frac{1}{\pi(1 + x^2)}.$$
- CDF:
$$F_X(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x).$$
- Mean: does not exist (the defining integral fails to converge absolutely).
- Variance: infinite.

**Normal CRV** (see section 4.3)
- Parameters: $\mu \in \mathbb{R}$, $\sigma > 0$.
- Note: with $\mu = 0$ and $\sigma = 1$ we have the **standard normal distribution**. One says it has *zero mean and unit variance*.
- Range: $X \in (-\infty, \infty)$.
- PDF:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right).$$
- CDF: no closed-form expression. Related to $\mathrm{erf}(x)$, but not quite the same: the standard normal CDF is
$$\Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}\Big(\frac{x}{\sqrt{2}}\Big)\right).$$
Note that some computing systems may have $\mathrm{erf}$ but not a normal CDF, or vice versa; thus this conversion formula comes in handy.
- Mean: $\mu$.
- Variance: $\sigma^2$.

**Gamma CRV** (see section 4.4)
- Parameters: $\lambda > 0$, $w > 0$.
- Range: $x \ge 0$.
- PDF:
$$f_X(x) = \frac{\lambda^w}{\Gamma(w)}\, x^{w-1} e^{-\lambda x}, \qquad x \ge 0.$$
- CDF: TBD.
- Mean: $w/\lambda$.
- Variance: $w/\lambda^2$.

### 4.3 The normal distribution

**Definition 4.9.** The **normal distribution** has two parameters: real $\mu$ and positive $\sigma$. A random variable with the normal distribution has PDF
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right).$$

**Remark 4.10.** The normal distribution has mean $\mu$ and variance $\sigma^2$.

**Definition 4.11.** With $\mu = 0$ and $\sigma = 1$ we have the **standard normal distribution**.

### 4.4 The gamma distribution

**Definition 4.12.** The **gamma function** is defined by
$$\Gamma(w) = \int_0^\infty x^{w-1} e^{-x}\, dx.$$

**Remark 4.13.** Integration by parts shows that $\Gamma(n) = (n-1)!$. Thus, the gamma function is a generalized factorial function.

**Definition 4.14.** The **gamma distribution** has two parameters $\lambda, w > 0$. A random variable with the gamma distribution has PDF
$$f_X(x) = \begin{cases} \dfrac{\lambda^w}{\Gamma(w)}\, x^{w-1} e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$

**Remark 4.15.** With $w = 1$ we have an exponential distribution. Thus, the gamma distribution generalizes the exponential distribution.

**Proposition 4.16.** The gamma distribution has mean $w/\lambda$.
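The catalog's mean formulas can be checked by direct numerical integration of $\int x f_X(x)\, dx$. A rough sketch of my own for the exponential CRV, using a crude Riemann sum (the step size and truncation point are ad-hoc choices):

```python
import math

# Check that the exponential CRV with rate lam has mean 1/lam, by a
# Riemann sum of x * lam * exp(-lam * x) over [0, 20].
lam = 2.0
dx = 1e-4
mean = sum(x * lam * math.exp(-lam * x) * dx
           for x in (i * dx for i in range(1, 200_000)))
assert abs(mean - 1 / lam) < 1e-3   # exact answer is 0.5
```

The tail beyond $x = 20$ contributes on the order of $e^{-40}$ and is safely ignored.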
### 4.5 Functions of a single random variable

Given a continuous random variable $X$ and $g: \mathbb{R} \to \mathbb{R}$, we have a new continuous random variable $Y = g(X)$. Often one wants to find the PDF of $Y$ given the PDF of $X$. The method is to first find the CDF of $Y$, then differentiate to find the PDF.

**Example 4.17.** Let $X$ be uniform on $[0,1]$ and let $Y = X^2$. The CDF of $Y$ is
$$F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(X \le \sqrt{y}),$$
i.e.
$$F_Y(y) = \begin{cases} 0, & y < 0 \\ \sqrt{y}, & 0 \le y < 1 \\ 1, & 1 \le y. \end{cases}$$
Then the PDF is
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}, & 0 < y < 1 \\ 0, & \text{elsewhere.} \end{cases}$$
The critical step in the method is going from $P(X^2 \le y)$ to $P(X \le \sqrt{y})$.

### 4.6 Expectations

**Definition 4.18.** Let $X$ be a continuous random variable with PDF $f_X$. If
$$\int_{-\infty}^{\infty} |x|\, f_X(x)\, dx < \infty,$$
i.e. if we have absolute convergence, then we define the **expected value** (also called **expectation** or **mean**) of $X$ to be
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx.$$
Mnemonic: as in the discrete case, this is just the weighted sum of possible $X$ values, weighted by their probabilities.

**Definition 4.19.** Let $X$ be a continuous random variable. The **variance** of $X$ is
$$\mathrm{Var}(X) = E[(X - \mu)^2], \qquad\text{where } \mu = E[X].$$
By the corollary to the Law of the Unconscious Statistician below, $\mathrm{Var}(X) = E[X^2] - E[X]^2$.

**Theorem 4.20 (Law of the Unconscious Statistician).** Let $X$ be a continuous random variable and $g: \mathbb{R} \to \mathbb{R}$. Let $Y = g(X)$. If
$$\int_{-\infty}^{\infty} |g(x)|\, f_X(x)\, dx < \infty,$$
then
$$E[Y] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx.$$

**Corollary 4.21.** If $g, h: \mathbb{R} \to \mathbb{R}$ and $a, b \in \mathbb{R}$, then
$$E[a\, g(X) + b\, h(X)] = a\, E[g(X)] + b\, E[h(X)].$$

## 5 Multiple continuous random variables

### 5.1 Definitions

**Definition 5.1.** Given two continuous random variables $X$ and $Y$, we define their **joint CDF** to be
$$F_{X,Y}(x, y) = P(X \le x, Y \le y).$$

**Definition 5.2.** Two random variables $X$ and $Y$ are **jointly continuous** if there exists a function $f_{X,Y}(x, y)$, their **joint PDF**, such that
$$F_{X,Y}(x, y) = \int_{-\infty}^x \int_{-\infty}^y f_{X,Y}(u, v)\, dv\, du.$$
The joint PDF is thought of as
$$P(a \le X \le b,\ c \le Y \le d) = \int_{x=a}^b \int_{y=c}^d f_{X,Y}(x, y)\, dy\, dx,$$
and, more generally, for $A \subseteq \mathbb{R}^2$,
$$P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\, dy\, dx.$$
The joint CDF and joint PDF are related by
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y), \qquad F_{X,Y}(x, y) = \int_{-\infty}^x \int_{-\infty}^y f_{X,Y}(u, v)\, dv\, du.$$
Given a joint CDF $F_{X,Y}(x, y)$, we may recover the marginal CDFs $F_X(x)$ and $F_Y(y)$ by
$$F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y),$$
and similarly for $F_Y$. How do we recover the marginal PDFs given the joint PDF? To derive the formula, differentiate the CDF:
$$f_X(x) = \frac{d}{dx} F_X(x) = \frac{d}{dx} \int_{-\infty}^x \left[\int_{-\infty}^{\infty} f_{X,Y}(u, v)\, dv\right] du = \int_{-\infty}^{\infty} f_{X,Y}(x, v)\, dv,$$
i.e. the formula is
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy.$$
That is, we integrate away one variable.
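"Integrating away one variable" can be sketched numerically. This example of mine uses the standard textbook joint PDF $f(x,y) = x + y$ on the unit square (not an example from these notes), whose marginal is exactly $f_X(x) = x + \tfrac{1}{2}$:

```python
# Recover a marginal PDF from a joint PDF by numerically integrating
# away one variable (section 5.1).
def joint(x, y):
    # f(x, y) = x + y on the unit square, 0 elsewhere
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_x(x, dy=1e-4):
    # f_X(x) = integral over y of f(x, y) dy, as a Riemann sum
    return sum(joint(x, i * dy) * dy for i in range(10_000))

for x in (0.2, 0.5, 0.9):
    assert abs(marginal_x(x) - (x + 0.5)) < 1e-3
```

The same one-line Riemann sum, with the roles of $x$ and $y$ swapped, recovers $f_Y$.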
### 5.2 Independence

**Definition 5.3.** Two continuous random variables $X$ and $Y$ are **independent** iff their CDFs factor:
$$F_{X,Y}(x, y) = F_X(x)\, F_Y(y).$$

**Remark 5.4.** For discrete random variables, we defined independence (definition 3.7) in terms of the factoring of the PMFs; for continuous random variables, we define independence in terms of the factoring of the CDFs. Factorization of PDFs does hold, as shown in the following theorem, but it is a consequence rather than a definition.

**Theorem 5.5.** Two continuous random variables $X$ and $Y$ are independent iff their PDFs factor: $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$.

*Proof.*
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y}\big[F_X(x)\, F_Y(y)\big] = \frac{\partial F_X(x)}{\partial x} \cdot \frac{\partial F_Y(y)}{\partial y} = f_X(x)\, f_Y(y). \qquad\square$$

### 5.3 Expectations

**Theorem 5.6 (Law of the Unconscious Statistician).** Let $X$ and $Y$ be continuous random variables and $g(x,y): \mathbb{R}^2 \to \mathbb{R}$. Let $Z = g(X, Y)$. If
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |g(x, y)|\, f_{X,Y}(x, y)\, dx\, dy < \infty,$$
then
$$E[Z] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy.$$

**Corollary 5.7.** Let $X$ and $Y$ be continuous random variables and let $a, b \in \mathbb{R}$. Then $E[aX + bY] = aE[X] + bE[Y]$.

**Theorem 5.8.** Let $X$ and $Y$ be independent continuous random variables, and let $g, h: \mathbb{R} \to \mathbb{R}$. Then $E[g(X)h(Y)] = E[g(X)]\, E[h(Y)]$.

**Corollary 5.9.** Let $X$ and $Y$ be independent continuous random variables. Then $E[XY] = E[X]E[Y]$.

**Corollary 5.10.** Let $X$ and $Y$ be independent continuous random variables. Then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

*Proof.* Computation. $\square$

**Remark 5.11.** Expectations always add. They *multiply* only when $X$ and $Y$ are independent. Variances *add* only when $X$ and $Y$ are independent.

**Theorem 5.12.** Let $X$ and $Y$ be independent random variables and let $g, h: \mathbb{R} \to \mathbb{R}$. Then $g(X)$ and $h(Y)$ are independent.

### 5.4 The IID paradigm: $S_n$ and $\bar{X}_n$

**Notation.** Let $\{X_n\}$ be a sequence of IID random variables, with common mean $\mu$ and common variance $\sigma^2$. We write
$$S_n = \sum_{i=1}^n X_i \qquad\text{and}\qquad \bar{X}_n = \frac{1}{n} S_n.$$
The latter is called the **sample mean** of the $X_n$'s.

**Mean and variance of $S_n$.** We already know
$$E[S_n] = E\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n E[X_i] = n\mu,$$
and, using independence of the trials to split up the variance,
$$\mathrm{Var}(S_n) = \mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \mathrm{Var}(X_i) = n\sigma^2.$$

**Mean and variance of $\bar{X}_n$.** Likewise,
$$E[\bar{X}_n] = E\Big[\frac{1}{n}\sum_{i=1}^n X_i\Big] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\, n\mu = \mu,$$
and, using independence of the trials to split up the variance,
$$\mathrm{Var}(\bar{X}_n) = \mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\,\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$$
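The $\sigma^2/n$ scaling of $\mathrm{Var}(\bar{X}_n)$ is easy to see empirically. A simulation sketch of mine (the sample sizes, seed, and 20% tolerance are arbitrary choices):

```python
import random
import statistics

# For IID uniform(0,1) draws, Var(sample mean of n draws) should be close
# to sigma^2 / n = (1/12) / n (section 5.4).
random.seed(0)
n, trials = 50, 4000
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

expected = (1 / 12) / n               # sigma^2 / n
observed = statistics.variance(means) # empirical variance of the sample means
assert abs(observed - expected) / expected < 0.2
```

Doubling `n` should roughly halve `observed`, which is a quick way to watch the $1/n$ law in action.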
### 5.5 Functions of multiple random variables

Let $X, Y$ be continuous random variables and let $g(x,y): \mathbb{R}^2 \to \mathbb{R}$. Let $Z = g(X, Y)$. How do we find the PDF of $Z$? As in the univariate case (section 4.5), the method is to first find the CDF of $Z$, then differentiate. [Need to type up the example from 3/21. Need to type up the convolution notes for independent $X$, $Y$ from 3/23:
$$f_Z(z) = \int f_X(x)\, f_Y(z - x)\, dx.]$$

### 5.6 Moment-generating functions and characteristic functions

**Definition 5.13.** Much as in the discrete case (definitions 3.17),
$$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx$$
and the characteristic function is
$$E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\, dx.$$

**Proposition 5.14.** We have:
1. $M_X^{(n)}(0) = E[X^n]$;
2. if $Y = aX + b$, then $M_Y(t) = e^{bt} M_X(at)$;
3. if $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t)\, M_Y(t)$.

**Example 5.15.** The standard normal random variable $Z$ has (tedious computations omitted here) moment-generating function
$$M_Z(t) = \exp(t^2/2).$$
The general normal random variable $X = \mu + \sigma Z$ has moment-generating function
$$M_X(t) = \exp(\mu t + \sigma^2 t^2/2).$$

**Proposition 5.16.** Let $X_1, \dots, X_n$ be independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$. Then $Y = \sum X_i$ has normal distribution with mean $\sum \mu_i$ and variance $\sum \sigma_i^2$.

### 5.7 Change of variables

Let $X$ and $Y$ be continuous random variables. If $T: \mathbb{R}^2 \to \mathbb{R}^2$ is invertible, sending $(U, V) = T(X, Y)$, then
$$\iint_D f(x, y)\, dx\, dy = \iint_{T(D)} f\big(x(u,v),\, y(u,v)\big)\, |J(u, v)|\, du\, dv,$$
where $J(u, v)$ is the Jacobian determinant
$$J(u, v) = \det\begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix}.$$

**Theorem 5.17.** Let $X$ and $Y$ be jointly continuous random variables with PDF $f_{X,Y}$. Let $D = \{(x, y) : f_{X,Y}(x, y) > 0\}$ (i.e. the range of $(X, Y)$). Suppose $T: D \to S \subseteq \mathbb{R}^2$ is 1-1 and onto. Define new random variables $U$ and $V$ by $(U, V) = T(X, Y)$. Then
$$f_{U,V}(u, v) = \begin{cases} f_{X,Y}\big(x(u, v),\, y(u, v)\big)\, |J(u, v)|, & (u, v) \in S \\ 0, & \text{otherwise,} \end{cases}$$
where $J(u, v)$ is as above.

### 5.8 Conditional density and expectation

Given two continuous random variables $X$ and $Y$, we want to define $P(X = x \mid Y = y)$. Since $\{Y = y\}$ is a null event, the usual intersection-over-given notion of conditional probability (see section 1.2) will give us zero divided by zero. Somewhat as in l'Hôpital's rule in calculus, we can nonetheless make sense of it.

**Definition 5.18.** Let $X, Y$ be jointly continuous with PDF $f_{X,Y}(x, y)$. The **conditional density** of $X$ given $Y$ is
$$f_{X \mid Y}(x \mid y) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_Y(y)}, & f_Y(y) \ne 0 \\ 0, & f_Y(y) = 0. \end{cases}$$

**Remark 5.19.** Recall that
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx,$$
so we can think of the conditional density of $X$ given $Y$ as being
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx}$$
whenever the denominator is nonzero.

**Definition 5.20.** Let $X, Y$ be jointly continuous with PDF $f_{X,Y}$. The **conditional expectation** of $X$ given $Y$ is
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\, dx.$$

**Definition 5.21.** $E[X \mid Y = y]$ is a function of $y$ (it must depend only on $y$); call it $g(y)$. Then $g(Y)$ is a random variable, which we write as $E[X \mid Y]$. This new random variable has the following properties.

**Theorem 5.22.** Let $X, Y, Z$ be random variables, $a, b \in \mathbb{R}$, and $g: \mathbb{R} \to \mathbb{R}$. Then:
- $E[a \mid Y] = a$;
- $E[aX + bZ \mid Y] = a\,E[X \mid Y] + b\,E[Z \mid Y]$. (Note: this is linearity on the left; linearity on the right emphatically does *not* hold.)
- If $X \ge 0$, then $E[X \mid Y] \ge 0$.
- If $X, Y$ are independent, then $E[X \mid Y] = E[X]$. Mnemonic: $Y$ gives no information about $X$.
- $E[E[X \mid Y]] = E[X]$. (This is the partition theorem in disguise.)
- $E[X g(Y) \mid Y] = g(Y)\, E[X \mid Y]$. Mnemonic: given a specific $y$, $g(y)$ is constant. Special case: $E[g(Y) \mid Y] = g(Y)$.

### 5.9 The bivariate normal distribution

If $X$ and $Y$ are independent and standard normal, then their joint PDF is
$$f_{X,Y}(x, y) = \frac{1}{2\pi} \exp\left(-\frac{x^2 + y^2}{2}\right). \tag{$*$}$$
It is possible for $X$ and $Y$ to *not* be independent, while their marginals are still standard normal. In fact there is a 1-parameter family of such $(X, Y)$ pairs.

**Definition 5.23.** Let $-1 < \rho < 1$. The **bivariate normal distribution** with parameter $\rho$ has PDF
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2(1 - \rho^2)}\right).$$

**Remark 5.24.** Note the following:
- It is straightforward but tedious to verify that $\iint f_{X,Y}(x, y)\, dx\, dy = 1$.
- With $\rho = 0$ we obtain equation ($*$) as a special case.
- One may complete the square and use translation invariance of the integral to find that the marginals are in fact univariate standard normals.
- Again completing the square, one finds that $f_{Y \mid X}(y \mid x)$ is normal with $\mu = \rho x$ and $\sigma^2 = 1 - \rho^2$.

### 5.10 Covariance and correlation

**Definition 5.25.** Let $X$ and $Y$ be random variables with means $\mu_X$ and $\mu_Y$ respectively. The **covariance** of $X$ and $Y$ is
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y].$$

**Mnemonic 5.26.** With $Y = X$ we recover the familiar formula (definition 4.19) for the variance of $X$: $E[X^2] - E[X]^2$.

**Theorem 5.27.** Let $X$ and $Y$ be random variables. Then
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + 2\,\mathrm{Cov}(X, Y) + \mathrm{Var}(Y).$$

**Remark 5.28.** Bill Faris calls this "the most important theorem in probability."

**Corollary 5.29.** If $\mathrm{Cov}(X, Y) = 0$, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

**Theorem 5.30.** If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$. The converse does not hold.
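The bivariate normal of section 5.9 can be sampled with the standard construction $Y = \rho X + \sqrt{1-\rho^2}\, Z$ for independent standard normals $X, Z$; this is a sketch of my own, and the sample size, seed, and tolerance are arbitrary:

```python
import random
import math
import statistics

# Sample a bivariate normal pair with correlation rho and check that the
# empirical correlation coefficient (section 5.10) is close to rho.
random.seed(1)
rho = 0.6
xs, ys = [], []
for _ in range(20_000):
    x, z = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(x)
    ys.append(rho * x + math.sqrt(1 - rho ** 2) * z)

# Cov(X, Y) = E[XY] - E[X] E[Y], then divide by sigma_X sigma_Y.
cov = (statistics.fmean(x * y for x, y in zip(xs, ys))
       - statistics.fmean(xs) * statistics.fmean(ys))
corr = cov / (statistics.pstdev(xs) * statistics.pstdev(ys))
assert abs(corr - rho) < 0.05
```

Each of `xs` and `ys` separately looks standard normal; only their joint behavior reveals $\rho$, which is exactly the point of remark 5.24.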
**Definition 5.31.** Let $X$ and $Y$ be random variables with variances $\sigma_X^2$ and $\sigma_Y^2$ respectively. The **correlation coefficient** of $X$ and $Y$ is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$

**Remark 5.32.** The covariance is quadratic in $X$ and $Y$ (with respect to linear rescaling); the correlation coefficient is scale-invariant.

**Theorem 5.33.** The correlation coefficient satisfies $-1 \le \rho(X, Y) \le 1$.

*Proof.* It suffices to show $|\rho(X, Y)| \le 1$, which is equivalent to showing $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$. Faris claims that this follows from the Schwarz inequality $|\langle u, v \rangle| \le \|u\|\,\|v\|$, although I don't see the immediacy. He then sketches another route, which I complete here. Normalize $X$ and $Y$ as follows: let $\mu_X$, $\mu_Y$, $\sigma_X$, and $\sigma_Y$ be their means and standard deviations respectively. Then
$$\frac{X - \mu_X}{\sigma_X} \qquad\text{and}\qquad \frac{Y - \mu_Y}{\sigma_Y}$$
each have zero mean and unit standard deviation. In particular, this will mean (below) that their second moments are 1. We can create a new pair of random variables
$$\left(\frac{X - \mu_X}{\sigma_X} \pm \frac{Y - \mu_Y}{\sigma_Y}\right)^2.$$
Since each takes nonnegative values, the means are nonnegative as well [xxx: xref forward to where this is proved; it seems obvious but actually requires proof]:
$$E\left[\left(\frac{X - \mu_X}{\sigma_X} \pm \frac{Y - \mu_Y}{\sigma_Y}\right)^2\right] \ge 0.$$
FOILing out, we have
$$E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^2 \pm 2\,\frac{(X - \mu_X)(Y - \mu_Y)}{\sigma_X \sigma_Y} + \left(\frac{Y - \mu_Y}{\sigma_Y}\right)^2\right] \ge 0.$$
Using the linearity of expectation, and recalling that the normalized variables have second moments equal to 1, we have
$$2 \pm \frac{2\, E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \ge 0,$$
whence
$$-\sigma_X \sigma_Y \le E[(X - \mu_X)(Y - \mu_Y)] \le \sigma_X \sigma_Y,$$
i.e. $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$. $\square$

## 6 Laws of averages

Here is a statistics paradigm: run an experiment $n$ times, with IID random variables $X_1, \dots, X_n$, whether continuous or discrete. The $n$-tuple $(X_1, \dots, X_n)$ is called a **sample**. The average of $X_1$ through $X_n$ is called the **sample mean**, written $\bar{X}_n$; it is also a random variable. The big question is: what does $\bar{X}_n$ look like as $n$ gets large? For example, roll a 6-sided die (so $\mu = 3.5$) a million times. What is the probability of the event $|\bar{X}_n - 3.5| > 0.01$? One would hope this probability would be small, and would get smaller as $n$ increases. We have two main theorems here:
- The **law of large numbers** says that $\bar{X}_n \to \mu$, although we need to define the notion of convergence of a random variable to a real number. There are two flavors of convergence: weak and strong.
- The **central limit theorem** describes the PDF of $\bar{X}_n$.

### 6.1 The weak law of large numbers

**Definition 6.1.** Let $\{X_n\}$ and $X$ be random variables. We say $X_n \to X$ **in probability** if, for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P\big(\{\omega \in \Omega : |X_n(\omega) - X(\omega)| > \varepsilon\}\big) = 0.$$
More tersely, we may write $P(|X_n - X| > \varepsilon) \to 0$.

**Theorem 6.2 (Weak law of large numbers).** Let $\{X_n\}$ be an IID sequence with common mean $\mu$ and finite variance $\sigma^2$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \to \mu \quad\text{in probability.}$$

Here is another notion of convergence.

**Definition 6.3.** Let $\{X_n\}$ and $X$ be random variables. We say $X_n \to X$ **in mean square** if $E[(X_n - X)^2] \to 0$.

**Theorem 6.4 (Chebyshev's inequality).** Let $X$ be a random variable with finite variance. Let $a > 0$. Then
$$P(|X - \mu| \ge a) \le \frac{E[(X - \mu)^2]}{a^2}.$$

**Remark 6.5.** This means
$$P(|X - \mu| \ge a) \le \frac{\mathrm{Var}(X)}{a^2}.$$

*Proof.* Use the partition theorem, with a two-set partition, on some as-yet-unspecified event $A$:
$$E[(X - \mu)^2] = E[(X - \mu)^2 \mid A]\, P(A) + E[(X - \mu)^2 \mid A^c]\, P(A^c).$$
Regardless of what $A$ is, the last term is nonnegative. So we have
$$E[(X - \mu)^2] \ge E[(X - \mu)^2 \mid A]\, P(A).$$
Now let $A$ be the particular event that $|X - \mu| \ge a$. Then we have
$$E[(X - \mu)^2] \ge E\big[(X - \mu)^2 \,\big|\, |X - \mu| \ge a\big]\, P(|X - \mu| \ge a) \ge a^2\, P(|X - \mu| \ge a).$$
Dividing through by $a^2$, we have
$$P(|X - \mu| \ge a) \le \frac{E[(X - \mu)^2]}{a^2},$$
as desired. $\square$

**Theorem 6.6.** Convergence in mean square implies convergence in probability.

*Proof.* Let $\varepsilon > 0$. We need to show $P(|X_n - X| > \varepsilon) \to 0$. By Chebyshev's inequality,
$$P(|X_n - X| \ge \varepsilon) \le \frac{E[(X_n - X)^2]}{\varepsilon^2} \to 0. \qquad\square$$

**Remark 6.7.** Notes about Chebyshev's inequality:
- It is used in the proof of the weak law, which I am omitting.
- The bounds provided by Chebyshev's inequality are rather loose, but they are certain. The central limit theorem gives tighter bounds, but only probabilistically.

### 6.2 The strong law of large numbers

Here is a third notion of convergence.

**Definition 6.8.** We say $X_n \to X$ **with probability one** (w.p. 1, or **almost surely**, a.s.) if
$$P\big(\{\omega \in \Omega : X_n(\omega) \to X(\omega)\}\big) = 1.$$
More tersely, we may write $P(X_n \to X) = 1$.

**Theorem 6.9 (Strong law of large numbers).** Let $\{X_n\}$ be an IID sequence with common mean $\mu$. Then $\bar{X}_n \to \mu$ with probability 1.

**Theorem 6.10.** $X_n \to X$ w.p. 1 implies $X_n \to X$ in probability.

**Remark 6.11.** This means the strong law is stronger than the weak law.
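Chebyshev's inequality can be checked exactly, and its looseness seen, for a single fair die roll. A sketch of my own, using exact rationals (the choice $a = 2$ is arbitrary):

```python
from fractions import Fraction

# One fair die roll: mu = 7/2, Var = 35/12.
p = Fraction(1, 6)
mu = sum(k * p for k in range(1, 7))
var = sum((k - mu) ** 2 * p for k in range(1, 7))
assert mu == Fraction(7, 2) and var == Fraction(35, 12)

# With a = 2, the event |X - mu| >= 2 is exactly {1, 6}.
a = 2
exact = sum(p for k in range(1, 7) if abs(k - mu) >= a)
bound = var / a ** 2   # Chebyshev: P(|X - mu| >= a) <= Var(X) / a^2

assert exact == Fraction(1, 3)
assert exact <= bound  # bound holds, but is loose: 1/3 vs 35/48
```

The exact probability is $1/3$ while Chebyshev only guarantees $\le 35/48 \approx 0.73$, illustrating remark 6.7's point that the bound is certain but loose.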
6.3 The central limit theorem

Motivation: let (X_n) be an iid sequence, and let S_n = Σ_{i=1}^n X_i. We know from section 5.4 that E[S_n] = nμ and Var(S_n) = nσ². We expect the PDF of S_n to be centered at nμ, with width approximately √n σ. Likewise, if X̄_n = S_n/n, then we know that E[X̄_n] = μ and Var(X̄_n) = σ²/n. We expect the PDF of X̄_n to be centered at μ, with width approximately σ/√n. For various distributions one finds, empirically, that these PDFs look approximately normal if n is large.

Definition 6.12. Let X be a random variable with mean μ and variance σ². The standardization (or normalization) of X is

  (X − μ)/σ.

In particular, if we standardize S_n we get

  Z_n = (S_n − nμ)/(√n σ),

with mean 0 and variance 1:

  Var(Z_n) = (1/(nσ²)) Var(S_n) = nσ²/(nσ²) = 1.

Likewise, if we standardize X̄_n we get

  (X̄_n − μ)/(σ/√n).

The central limit theorem says that the standardizations of S_n and X̄_n both approach standard normal for large n.

Definition 6.13. Let Φ be the CDF of the standard normal:

  Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt.

Theorem 6.14. Let (X_n) be an iid sequence with finite mean μ and finite non-zero variance σ². Let Z_n be the standardization of S_n, as above. Then P(Z_n ≤ x) → Φ(x) for all x.

Remark 6.15. This convergence is called convergence in distribution.

xxx include some examples here.

6.4 Confidence intervals

Here is an application of the central limit theorem to statistics. The population is a random variable X; a sample of size n is n iid copies of X. The random variable X has a true population mean μ_X, but we do not know what it is. All we have is the sample mean X̄_n = Σ_{k=1}^n X_k / n, which is an estimate of the population mean. We would like to put some error bars on this estimate. We quantify this problem using the notion of confidence intervals. We look for ε > 0 such that

  P(|X̄_n − μ_X| ≥ ε) ≈ 0.05,  or, alternatively,  P(|X̄_n − μ_X| < ε) ≈ 0.95.

(Five percent is a conventional value in statistics.) Using the CLT, we treat X̄_n as being approximately normal. We standardize it — statisticians call this taking the z-score — in the usual way:

  Z = (X̄_n − μ_X)/(σ_X/√n).

Then

  P(|X̄_n − μ_X| ≥ ε) = P(|Z| ≥ ε√n/σ_X) ≈ 0.05.

It's worth memorizing — or you can compute it if you prefer — that the standard normal curve has area 0.95 for Z running from −1.96 to 1.96.
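Φ has no elementary closed form, but it can be evaluated through the error function via the standard identity Φ(x) = (1 + erf(x/√2))/2. A sketch, not from the notes, checking the ±1.96 rule of thumb:

```python
import math

def Phi(x):
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Central 95% of the standard normal lies between -1.96 and 1.96
print(Phi(1.96) - Phi(-1.96))  # ~0.95
```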
So ε√n/σ_X should be 1.96. Solving for ε in terms of n gives

  ε = 1.96 σ_X / √n.

Note that this requires the population standard deviation to be known. We can compute the sample standard deviation and use that as an estimate of the population standard deviation σ_X, but we've not developed any theory as to the error in that estimate.

Example 6.16. Let X have the Bernoulli distribution with parameter p — flip a coin with probability p of heads; assign the value 0 to tails and 1 to heads. Recall from section 2.2 that X has mean p. (In section 2.2 we took tails to be 1; I have changed the convention.) Suppose you flip the coin 1000 times (i.e. n = 1000) and obtain 520 heads. Then X̄_n = 0.520, and

  ε = 1.96 √(p(1−p)) / √1000 ≈ 0.0619 √(p(1−p)).

For p = 0.5, ε ≈ 0.031. Thus we are 95% certain that μ_X is within 0.031 on either side of 0.520, i.e. between 0.489 and 0.551. ⊲

A The coin-flipping experiments

This section is an extended worked example tying together various concepts. We apply the central limit theorem first to repeated tosses of a single coin, then to repeated collections of tosses.

A.1 Single coin flips

The first experiment is tossing a single coin which has probability p of heads. Then Ω = {T, H}. Let X be the random variable which takes value 0 for tails and 1 for heads. As discussed in section 2.2, X has the Bernoulli distribution with parameter p. I will allow p to vary throughout this section, although I will focus on p = 0.5 and p = 0.6. Recall that X has mean μ_X = p. (In section 2.2 we took 1 for tails, which is the opposite convention from the one here.) Its standard deviation is σ_X = √(p(1−p)), which is √0.25 = 0.5 and √0.24 ≈ 0.4899 for p = 0.5 and p = 0.6, respectively.

Now flip the coin a large number n of times — say n = 1000 — and count the number of heads. Using the notation of section 5.4, the number of heads is S_n. There are two ways to look at this:

• On one hand, from section 2.2 we know that S_n is binomial with parameters p and n — if we think of the 1000
tosses as a single experiment. (This is precisely what we will do in the next section.) The PMF of S_n is the one involving binomial coefficients; we would expect μ_S = np (i.e. 500 or 600), and σ_S = √(np(1−p)), which is √250 ≈ 15.81 or √240 ≈ 15.49 for p = 0.5 and p = 0.6, respectively.

• On the other hand, the central limit theorem (section 6.3) says that as n increases, the distribution of S_n begins to look normal. This PDF involves the exponential function, as shown in section 5.4. The mean of the sums S_n is μ_S = nμ_X = np — again, 500 or 600. The standard deviation of those sums about the means (500 or 600) is σ_S = √n σ_X = √(np(1−p)) — again, 15.81 and 15.49, respectively.

Note that the binomial PMF and the normal PDF are not the same, even though they produce the same means and standard deviations: the binomial random variable has an integer-valued PMF, so P(499.1 ≤ S_n ≤ 499.9) = 0, and likewise S_n can never be anything out of the range from 0 to 1000. The normal PDF, on the other hand, gives P(499.1 ≤ S_n ≤ 499.9) ≠ 0, since we are taking area under a curve where the function is non-negative. Since the output of the exponential function is never 0, the normal PDF gives non-zero (although admittedly very, very tiny) probability of S_n being less than 0 or greater than 1000. See also [MM] for some very nice plots.

Now we can ask about fairness of coins. The probabilistic point of view is to fix p and ask about the probabilities of various values of S_n. If the coin is fair, what are my chances of flipping anywhere between 470 and 530 heads? Using the binomial PMF is a mess — in fact, my calculator can't compute the factorials involved without overflowing. Using the normal approximation, though, is easy: I asked my TI-83 to integrate its normalpdf function with μ = 500 and σ = 15.81 from 470 to 530, and it told me 0.9422.

How surprised should I be if I toss 580 heads? The standardization — which statisticians call the z-score — of S_n is (definition 6.12)

  z = (S_n − μ_S)/σ_S.

This counts how many standard deviations away from the mean a given observation is. I have 80/15.81 ≈ 5.06, so this result is more than five standard deviations away from the mean.
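For comparison with the TI-83 normal integration above, the exact binomial sum is easy in software, since exact integer arithmetic avoids the factorial overflow. A sketch, not from the notes:

```python
from math import comb

# Exact P(470 <= S_n <= 530) for S_n binomial with n = 1000, p = 1/2,
# to compare with the normal-curve value ~0.9422 quoted above
p = 0.5
prob = sum(comb(1000, k) * p**k * (1 - p)**(1000 - k) for k in range(470, 531))
print(prob)  # close to, but not equal to, the normal approximation
```

The exact value comes out near 0.946; the small discrepancy against 0.9422 is the error of the plain normal approximation (no continuity correction).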
I would not think the coin is fair. If I redo that computation with p = 0.6 (so μ_S = 600 and σ_S = 15.49), I get a z-score of −1.29, which is not surprising if the coin has parameter p = 0.6.

The point of view used in statistics is to start with the data and, from that, try to estimate — with various levels of confidence — what the parameters are. Given a suspicious coin, what experiments would we have to run to be 95% sure that we've found out what the coin's parameter p is, to within, say, ±0.01? Continuing the example of section 6.4, we ask for ε = 0.01. We had ε = 1.96 σ_X / √n, so setting ε = 0.01 and solving for n, we have

  n = (1.96 σ_X / 0.01)² = 38416 p(1−p).

Now p(1−p) has a maximum at p = 0.5, for which n = 9604 — so that many flips would determine p to within ±0.01 with 95% confidence. Redoing the arithmetic with 0.001 in place of 0.01 gives n = 960,400. Generalizing, we see that each additional decimal place costs 100 times as many runs.

A.2 Batches of coin flips

The second experiment is 1000 tosses of a coin, where each coin has probability p of heads. (Or think of simultaneously tossing 1000 identical such coins.) Then Ω = {T, H}^1000, with |Ω| = 2^1000 ≈ 10^301. Let Y : Ω → R be the random variable which counts the number of heads. This is an example where the random variable is far easier to deal with than the entire sample space, which is huge. As discussed in section 2.2, Y has the binomial distribution with parameters p and n = 1000. Recall from the previous section that Y has mean μ_Y = 1000p, e.g. 500 or 600. Likewise, its standard deviation is σ_Y = √(1000 p(1−p)), which is √250 ≈ 15.81 or √240 ≈ 15.49 for p = 0.5 and p = 0.6, respectively.

Let Ȳ_N be the average of N runs of this experiment — that is, the sample mean. The central limit theorem (section 6.3) says that as N increases, the distribution of Ȳ_N begins to look normal. The mean of the sample means is μ_Ȳ = μ_Y = 1000p, which is again 500 or 600. The standard deviation of the sample means is

  σ_Ȳ = σ_Y / √N.

This is 15.81/√N or 15.49/√N, respectively.
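The two scaling facts just derived — n = (1.96 σ_X / ε)² flips to pin down p, and σ_Ȳ = σ_Y/√N for the sample mean — can be sketched as follows (function names are mine, not from the notes):

```python
import math

def flips_needed(eps, p):
    # n = (1.96 * sigma_X / eps)^2 with sigma_X = sqrt(p(1-p)), as in section A.1
    return (1.96 * math.sqrt(p * (1 - p)) / eps) ** 2

def sd_of_sample_mean(p, n_flips, N_runs):
    # sigma_Ybar = sigma_Y / sqrt(N), with sigma_Y = sqrt(n p (1-p)), as in section A.2
    return math.sqrt(n_flips * p * (1 - p)) / math.sqrt(N_runs)

print(flips_needed(0.01, 0.5))            # 9604 flips for +/-0.01 at 95% confidence
print(flips_needed(0.001, 0.5))           # 960400: each decimal place costs 100x
print(sd_of_sample_mean(0.5, 1000, 400))  # ~0.79 after 400 runs of flip-1000-coins
```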
Here is the crucial interpretation. Given the parameters p and n of Y, there is a true population mean and a true population standard deviation. If p = 0.5 then σ_Y ≈ 15.81, and even if we're on the three-millionth iteration of the flip-1000-coins experiment, it's still going to be just as likely as ever that we'll get a 514, or a 492, and so on. If we don't know the true p of the coins, then while the true population mean μ_Y and the true population standard deviation σ_Y exist, we don't know what they are. All we have is some suspicious-looking identical coins and our laboratory equipment. As we run and re-run the flip-1000-coins experiment, the following happens:

• The sample mean Ȳ_N, for increasingly larger N, will approach the population mean μ_Y.

• The variations of the sample mean will decrease. We might say the error in our estimate of the population mean is shrinking.

• The population standard deviation appears in the above formulas via its effects on the standard deviation of the sample mean.

• Nothing we have done so far has given us a reliable connection between the sample standard deviation and the population standard deviation. We can guess that the sample standard deviation approaches the population standard deviation, but in this course we have not developed any information about the error in that computation.

Here is a numerical example. I ran (simulated on a computer) the flip-1000-coins experiment 400 times. The first time I got 482, so the sample mean is 482. The second time I got 521, so the sample mean is (482 + 521)/2. The third time I got 494, so the sample mean is (482 + 521 + 494)/3, and so on. Here is p = 0.5:

  N   | Y   | Ȳ_N     | sample std. dev.
  1   | 482 | 482.000 | N/A
  2   | 521 | 501.500 | 27.577
  3   | 494 | 499.000 | 19.975
  4   | 512 | 502.250 | 17.557
  5   | 485 | 498.800 | 17.050
  6   | 507 | 500.167 | 15.613
  ... | ... | ...     | ...
  395 | 493 | 500.258 | 16.163
  396 | 505 | 500.270 | 16.144
  397 | 501 | 500.272 | 16.124
  398 | 494 | 500.256 | 16.107
  399 | 501 | 500.258 | 16.086
  400 | 474 | 500.192 | 16.120

Here is p = 0.6:

  N   | Y   | Ȳ_N     | sample std. dev.
  1   | 616 | 616.000 | N/A
  2   | 584 | 600.000 | 22.627
  3   | 620 | 606.667 | 19.732
  4   | 617 | 609.250 | 16.919
  5   | 583 | 604.000 | 18.775
  6   | 613 | 605.500 | 17.190
  ... | ... | ...     | ...
  395 | 608 | 599.484 | 17.031
  396 | 615 | 599.523 | 17.027
  397 | 609 | 599.547 | 17.013
  398 | 607 | 599.565 | 16.995
  399 | 604 | 599.576 | 16.975
  400 | 611 | 599.605 | 16.964

B Bayes' theorem

Bayes' theorem is so important that it merits multiple points of view: algebraic, graphical, and numerical.

B.1 Algebraic approach

Recall from definition 1.12 that if A is an event, and if B is another event with non-zero probability, the conditional probability of A given B is

  P(A|B) = P(A ∩ B) / P(B).

Bayes' theorem tells us how to invert this: how to compute the probability of B given A. First, the algebraic treatment.

Theorem B.1 (Bayes' theorem). Let A and B be events with non-zero probability. Then

  P(B|A) = P(A|B) P(B) / P(A).

Proof. Using the definition (intersection over given), we have

  P(B|A) = P(B ∩ A) / P(A).

Multiplying top and bottom by P(B) — which is OK since P(B) ≠ 0 — we get

  P(B|A) = P(B ∩ A) P(B) / (P(B) P(A)).

Now notice that B ∩ A is the same as A ∩ B, so in particular P(B ∩ A) is the same as P(A ∩ B). Transposing the terms in the denominator gives us

  P(B|A) = [P(A ∩ B) / P(B)] · [P(B) / P(A)] = P(A|B) P(B) / P(A),

as desired. ∎

B.2 Graphical/numerical approach

The following example is adapted from [KK]. Suppose that 0.8% of the general population has a certain disease, and suppose that we have a test for it. Specifically: if a person actually has the disease, the test says so 90% of the time; if a person does not have the disease, the test gives a false diagnosis 7% of the time. When a particular patient tests positive, what is the probability they have the disease?

We can write this symbolically as follows. Let D be the event that the person has the disease, and D̄ be its complement; let Y (for "yes") be the event that the test says the person has the disease, with complement
large as being split into those with and without the disease and those for whom the test is positive or negative Suppose in particular that we have a sample of 1000 people who are representative of the general population Here are some very rectangular Venn diagrams i D D ll 992 8 923 77 lt7 Bayes7 theorem has to do with how these two partitions intersect to make four groups lii Fl 7 Dllil lii Fl 7 Dllil PmD 992 7 130713 s 7 and PDlY 923 7 PDlY 923 7 Pm 992 7 130713 s 7 HEY 77 7 PDlY 77 7 ltlt ltlt We can use the theorem to nd the probability our patient has the disease given the positive test result PD PDlY PltY7Dgt 0008 0 90 39 0077 0094 That is there7s only a oneineleven chance the patient actually has the disease Kaplan and Kaplanls idea is to look at this surprising result in the context of the other 999 people also tested 1 will elaborate on this working out all the math We have two conditional probabilities given PYlD and PYlb We found PDlY What about PDl7 We can use the partition theorem theorem 118 again to solve for what we don7t know in terms of what 35 we do know PD PDlYPY PDV 137 PD PDlY PO P D 7 7 lt l gt Pm 7 0008 7 0094 0077 0923 00008 We now have all four conditional probabilities POW 007 PDl7 00008 PYlD 090 PDlY 0094 Now we can ll out the foursquare table 0 Since PYlb 0077 seven percent of the 992 diseasefree people 70 of them get false positives the rest 922 get a correct negative result 0 Since PYlD 090 ninety percent of the 8 people with the disease test positive ie all but one of them one of the 8 gets a false sense of security 0 Since PDl7 000087 008 of the 923 with negative test results one person does in fact have the disease the other 922 as we found just above get a correct negative result 0 Since PDlY 00947 only 94 of the 77 people with positive test results 7 people have the disease the other 70 get a scare and7 presumably7 a retest So7 the sample of 1000 people splits up as follows nul 922 1 70 7 Moreover7 we can rank events by 
Moreover, we can rank events by likelihood:

1. Healthy people, correctly diagnosed: 922.
2. False positives: 70.
3. People with the disease, correctly diagnosed: 7.
4. False negatives: 1.

Now it's no surprise our patient got a false positive: this happens 10 times as often as a correct positive diagnosis.

B.3 Asymptotics

The specific example provided some insight, but what happens when we vary the parameters? We had

  P(D|Y) = P(Y|D) P(D) / P(Y) = P(Y|D) P(D) / (P(Y|D) P(D) + P(Y|D̄) P(D̄)).

Let's turn this into a design problem. Let

  p = P(D),  a = P(Y|D),  b = P(Y|D̄),  1 − ε = P(D|Y),  ε = P(D̄|Y).

How would you choose the test-design parameters a and b — i.e., how good would your test have to be — to get ε small? What if the disease is rarer (p smaller)? Suppose we want a high likelihood of correct detection, i.e. ε small. Then

  P(D|Y) = a p / (a p + b (1−p)) ≥ 1 − ε.

Solving for a and b, we get

  ε a p ≥ (1 − ε) b (1−p),  i.e.  b/a ≤ ε p / ((1 − ε)(1 − p)).

There are two free parameters, a and b, so I'll just consider their ratio. Now the function x/(1−x) blows up near x = 1, but for small x it is approximately x. So for small ε and p we have

  b/a ≲ ε p.

If, say, we have p and ε both 0.001, then a and b need to differ by a factor of a million. Recall that a and b are both probabilities, and so range between zero and one. To test for a one-in-a-million event with 99.99% confidence (p = 10⁻⁶ and ε = 10⁻⁴), b must be less than 10⁻¹⁰.

That tells us how to choose P(Y|D̄) in order to get P(D̄|Y) — i.e., the probability of false positives — small. What about false negatives, P(D|Ȳ)? If you do similar algebra to the above, you should find that getting P(D|Ȳ) less than ε requires

  1 − a ≲ ε/p.

This is a less strict constraint: P(Y|D) needs to be very close to 1 only when ε < p.
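The design constraint just derived, b/a ≤ εp/((1−ε)(1−p)), can be evaluated at the two parameter choices in the text. A sketch with a hypothetical helper name:

```python
def max_b_over_a(p, eps):
    # Rearranging a*p / (a*p + b*(1-p)) >= 1 - eps gives
    # b/a <= eps * p / ((1 - eps) * (1 - p)), roughly eps * p for small eps, p
    return eps * p / ((1 - eps) * (1 - p))

print(max_b_over_a(0.001, 0.001))  # ~1e-6: a and b must differ by a factor of ~10^6
print(max_b_over_a(1e-6, 1e-4))    # ~1e-10, as in the one-in-a-million example
```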
B.4 Conclusions

Some points:

• P(Y|D) is information known to the person who creates the test — say, at a pharmaceutical company. P(D|Y) is information relevant to the people who give and receive the test — for example, at the doctor's office. This duality between design and implementation suggests that Bayes' theorem has important consequences in many practical situations.

• The results can be surprising — after all, in the example above, the test was 90% accurate, was it not? Bayes' theorem is important to know precisely because it is counterintuitive.

• We can see from the example and the asymptotics above that rare events are hard to test accurately for. If we want certain testing for rare events, the test might be impossible, or overly expensive, to design in practical terms.

C Probability and measure theory

  "Probability theory is measure theory with a soul." — Mark Kac

Modern probability is a special case of measure theory, but this course avoids the latter. Here we draw the connections for the reader with a measure-theoretic background. Full information may be found in [FG], but I like to have a brief, handy reference. See also [Fol], [Rud], or [Roy].

Measure theory (analysis): The sample space Ω is simply a set.
Probability: Same.

Measure theory: A σ-field F on Ω is a subset of 2^Ω satisfying the axioms of definition 1.5: F must contain at least ∅ and Ω, and it must be closed under complements, countable unions, and countable intersections. Note that even if Ω is uncountable (for example Ω = R), 2^Ω still satisfies the axioms for a σ-field. If Ω is a topological space (e.g. R^d), the standard σ-field is the Borel σ-field, which is the one generated by all the open sets of Ω.
Probability: Same.

Measure theory: The pair (Ω, F) is called a measurable space. This is an unfortunate misnomer, since it may not be possible to put a measure on it — in which case we would certainly think of it as unmeasurable.
Probability: Same.

Measure theory: Elements of F are called measurable sets. This is also a misnomer, because we haven't defined measures yet!
Probability: An event is nothing more than a measurable set.

Measure theory: A measure is a function μ : F → [0, ∞] with the following properties: μ(∅) = 0; for all A ∈ F, μ(A) ≥ 0; and if A₁, A₂, … is a finite or countable subset of F, with the A_i's all pairwise disjoint, then

  μ(∪_i A_i) = Σ_i μ(A_i).

This is the countable additivity property of the measure.
Probability: A probability measure is a measure with the additional
requirement that P(Ω) = 1. Thus we have P : F → [0, 1]. Note that if Ω is finite or countably infinite, it is possible to define a measure on the biggest σ-field on Ω, namely F = 2^Ω. If Ω = R, then it is not possible to extend Lebesgue measure to a measure on F = 2^Ω; see [Fol] for a proof.

Measure theory (analysis): A measure space is a triple (Ω, F, μ), where μ is a measure on F.
Probability: A probability space is a triple (Ω, F, P), where P is a probability measure on F.

Measure theory: A measurable function is a function f from one measurable space (Ω, F) to another measurable space (Ψ, G), such that the preimage under f of each measurable set in Ψ is a measurable set in Ω. That is, for all B ∈ G, f⁻¹(B) ∈ F.
Probability: A random variable is a measurable function X from a probability space (Ω, F, P) to a measurable space (Ψ, G). For this course, that measurable space has been Ψ = R, with G being the Borel sets in R.

Measure theory: The integral ∫_Ω X(ω) dP(ω).
Probability: The expectation E(X).

Remark C.1. In an undergraduate context we use the terms probability density function and cumulative distribution function. In a graduate context we use the following:

• We have a real-valued random variable X. That is, we have X : Ω → R, where (Ω, F, P) is a probability space and the measurable space is (R, B), namely the reals with the Borel σ-algebra.

• This gives us a probability measure μ_X on R by μ_X(B) = P(X⁻¹(B)). This is called, simply, the distribution of X.

• We can then measure the particular Borel sets (−∞, x]. This defines a function F_X(x) = μ_X((−∞, x]); this (the CDF) is called the distribution function of X.

• If dμ_X is absolutely continuous with respect to Lebesgue measure — i.e. dμ_X = f_X dx, where f_X is the familiar PDF — we say f_X is the density of X.

Measure theory: The Laplace and Fourier transforms, respectively, of f : R → R are

  ∫_R e^{tx} f(x) dx  and  ∫_R e^{iθx} f(x) dx.

xxx 8.29: continuity of finite measures. xxx three kinds of measures: discrete, continuous, singular. xxx 9.5 and 9.24: MCT, DCT, Fubini, Fatou.

Probability: The moment-generating and characteristic functions, respectively, of a real-valued random variable X are

  M_X(t) = E(e^{tX})  and  φ_X(θ) = E(e^{iθX}).
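As a concrete instance of M_X: for a Bernoulli(p) variable, E(e^{tX}) = (1−p) + p e^t, and the mean can be recovered as M′_X(0) — the moment-generating property from section 3.5. A numerical sketch of mine, not from the notes:

```python
import math

def M(t, p=0.3):
    # MGF of a Bernoulli(p) variable: M_X(t) = E[e^{tX}] = (1 - p) + p e^t
    return (1 - p) + p * math.exp(t)

# First moment is M'(0); approximate the derivative with a central difference
h = 1e-6
mean = (M(h) - M(-h)) / (2 * h)
print(mean)  # ~0.3 = p, the mean of a Bernoulli(0.3) variable
```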
Let μ be the distribution of X. We say that the moment-generating and characteristic functions are the Laplace transform and Fourier transform, respectively, of the measure μ:

  ∫_R e^{tx} dμ  and  ∫_R e^{iθx} dμ.

If the distribution of X has a density function f(x) — i.e. dμ = f(x) dx, where dμ is absolutely continuous with respect to Lebesgue measure dx — then these are

  M_X(t) = E(e^{tX}) = ∫_R e^{tx} f(x) dx  and  φ_X(θ) = E(e^{iθX}) = ∫_R e^{iθx} f(x) dx.

xxx convergence: a.s., in probability, in L^p, in distribution (10.29; flavors). xxx 9.14: Borel–Cantelli. xxx 9.26: σ-field independence. xxx 9.28: product measures. xxx 10.19: Kolmogorov 0–1 law and the tail σ-field.

D A proof of the inclusion–exclusion formula

Proposition (inclusion–exclusion formula). Let A₁, …, A_n be events. Then

  P(∪_{i=1}^n A_i) = Σ_{1≤i≤n} P(A_i) − Σ_{1≤i<j≤n} P(A_i ∩ A_j) + Σ_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k) − ⋯ + (−1)^{n+1} P(A₁ ∩ ⋯ ∩ A_n).

Proof. The proof is by strong induction. For the base case n = 1, the left-hand side is P(A₁), and the right-hand side is also P(A₁). A bonus case, n = 2, is not necessary but helps to illustrate what's going on. The formula is

  P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

This is easy to prove using Venn diagrams and finite additivity of P on the disjoint sets A∖B, A∩B, and B∖A. (I am not including a picture in this note.) The point, though, is that if A and B overlap, then A∩B is counted twice — overcounted — in P(A) + P(B), so we need to subtract off P(A∩B) to compensate.

Now for the induction step. Suppose the inclusion–exclusion formula is true for 1, 2, …, n−1 (we'll only need 2 and n−1), and show it's true for n. Notationally this is a mess, so I'll do the n = 3 case first, since it is easier to understand; this will illuminate how to proceed in the messier general case. For n = 3, we are asked to show that

  P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(A∩C) − P(B∩C) + P(A∩B∩C).

Since we want to use induction, we can try to isolate C from A and B. We can write

  P(A∪B∪C) = [P(A) + P(B) − P(A∩B)] + P(C) − P(A∩C) − P(B∩C) + P(A∩B∩C).

We have P(A) + P(B) − P(A∩B) = P(A∪B) by the induction hypothesis at n−1 = 2. By isolating terms with C, we have found terms involving one fewer set. For the moment, to make things a little clearer, write X = A∪B. Then we need to show that

  P(X∪C) = P(X) + P(C) − P(A∩C) − P(B∩C) + P(A∩B∩C).
Using the induction hypothesis for the two sets X and C, we know that

  P(X∪C) = P(X) + P(C) − P(X∩C).

Looking at these last two equations — the first of which we need to prove, and the second of which we already know is true — we see that we'll be done if only we can show that

  −P(A∩C) − P(B∩C) + P(A∩B∩C) = −P(X∩C).

Toggling the negative signs, this is

  P(X∩C) = P(A∩C) + P(B∩C) − P(A∩B∩C).

I put X = A∪B only for convenience; I'm done with it now. The statement I need to prove is

  P((A∪B)∩C) = P(A∩C) + P(B∩C) − P(A∩B∩C).

The trick is that

  A∩B∩C = (A∩C)∩(B∩C)  and  (A∪B)∩C = (A∩C)∪(B∩C).

That is, I distribute the C's. So the statement I need to prove is

  P((A∩C)∪(B∩C)) = P(A∩C) + P(B∩C) − P((A∩C)∩(B∩C)).

Now we again have one fewer set involved: this is the inclusion–exclusion formula for the two sets A∩C and B∩C. Thus this statement is true by the induction hypothesis. And that was the last thing we needed to prove.

Now, guided by the n = 3 case, we can confidently wade into the morass of subscripts which is the general induction step. We are asked to show that

  P(∪_{i=1}^n A_i) = Σ_{1≤i≤n} P(A_i) − Σ_{1≤i<j≤n} P(A_i∩A_j) + Σ_{1≤i<j<k≤n} P(A_i∩A_j∩A_k) − ⋯ + (−1)^{n+1} P(A₁∩⋯∩A_n).

Since we want to use induction, we can try to isolate A_n from the others. We can write the right-hand side as

  [Σ_{1≤i≤n−1} P(A_i) − Σ_{1≤i<j≤n−1} P(A_i∩A_j) + ⋯ + (−1)^n P(A₁∩⋯∩A_{n−1})]
  + P(A_n) − Σ_{1≤i≤n−1} P(A_i∩A_n) + Σ_{1≤i<j≤n−1} P(A_i∩A_j∩A_n) − ⋯
  + (−1)^n Σ_{1≤i≤n−1} P(A₁∩⋯∩Â_i∩⋯∩A_{n−1}∩A_n) + (−1)^{n+1} P(A₁∩⋯∩A_n),

where the hat notation Â_i means: omit A_i from the intersection. The bracketed part equals

  P(∪_{i=1}^{n−1} A_i)

by the induction hypothesis at n−1. By isolating terms with A_n, we have again found terms involving one fewer set. As above, for clarity, temporarily write X = ∪_{i=1}^{n−1} A_i. Then we need to show that

  P(X∪A_n) = P(X) + P(A_n) − Σ_{1≤i≤n−1} P(A_i∩A_n) + Σ_{1≤i<j≤n−1} P(A_i∩A_j∩A_n) − ⋯
  + (−1)^n Σ_{1≤i≤n−1} P(A₁∩⋯∩Â_i∩⋯∩A_{n−1}∩A_n) + (−1)^{n+1} P(A₁∩⋯∩A_n).

Using the induction hypothesis for the two sets X and A_n, we know that

  P(X∪A_n) = P(X) + P(A_n) − P(X∩A_n).

Looking at these last two equations — the first of which we need to prove, and the second of which we already know is true — we see that we'll be
done if only we can show that

  −Σ_{1≤i≤n−1} P(A_i∩A_n) + Σ_{1≤i<j≤n−1} P(A_i∩A_j∩A_n) − ⋯ + (−1)^{n+1} P(A₁∩⋯∩A_n) = −P(X∩A_n).

Toggling the negative signs — note that −(−1)^{n+1} is the same as (−1)^n — this is

  P(X∩A_n) = Σ_{1≤i≤n−1} P(A_i∩A_n) − Σ_{1≤i<j≤n−1} P(A_i∩A_j∩A_n) + ⋯ + (−1)^n P(A₁∩⋯∩A_n).

As before, I don't need to write X = ∪_{i=1}^{n−1} A_i anymore. The statement I need to prove is

  P((∪_{i=1}^{n−1} A_i) ∩ A_n) = Σ_{1≤i≤n−1} P(A_i∩A_n) − Σ_{1≤i<j≤n−1} P(A_i∩A_j∩A_n) + ⋯ + (−1)^n P(A₁∩⋯∩A_n).

The distribution tricks are

  A_i∩A_j∩A_n = (A_i∩A_n)∩(A_j∩A_n)  and  (A₁∪⋯∪A_{n−1})∩A_n = (A₁∩A_n)∪⋯∪(A_{n−1}∩A_n).

So the statement I need to prove is exactly the inclusion–exclusion formula for the n−1 sets A₁∩A_n, …, A_{n−1}∩A_n, and thus it is true by the induction hypothesis. Since that is all that remained to be shown, we are done. ∎

References

[Fol] Folland, G.B. Real Analysis: Modern Techniques and Their Applications, 2nd ed. Wiley-Interscience, 1999.
[FG] Fristedt, B. and Gray, L. A Modern Approach to Probability Theory. Birkhäuser, 1997.
[GS] Grimmett, G. and Stirzaker, D. Probability and Random Processes, 3rd ed. Oxford, 2001.
[Kennedy] Kennedy, T. Math 564 course at the University of Arizona, spring 2007.
[KK] Kaplan, M. and Kaplan, E. Chances Are: Adventures in Probability. Penguin, 2007.
[MM] Moore, D.S. and McCabe, G.P. Introduction to the Practice of Statistics. Freeman and Co., 2005.
[Roy] Royden, H.L. Real Analysis, 2nd ed. Macmillan, 1968.
[Rud] Rudin, W. Principles of Mathematical Analysis, 3rd ed. McGraw-Hill, 1976.
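As a sanity check on Appendix D, the inclusion–exclusion formula can be verified by brute force for small collections of events. A sketch — the helper is hypothetical, using uniform probability on a six-point sample space:

```python
from fractions import Fraction
from itertools import combinations

def incl_excl(events, prob):
    # Right-hand side of the inclusion-exclusion formula:
    # sum over nonempty subsets S of (-1)^(|S|+1) * P(intersection of S)
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for subset in combinations(events, r):
            total += (-1) ** (r + 1) * prob(set.intersection(*subset))
    return total

# Uniform probability on Omega = {1, ..., 6}
prob = lambda s: Fraction(len(s), 6)

A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
print(prob(A | B | C), incl_excl([A, B, C], prob))  # both equal 5/6
```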
Index

A: almost surely, 27.
B: Bayes' theorem, 34; Bernoulli distribution, 6; bimodal, 14; binomial distribution, 6; bivariate normal distribution, 24; Borel σ-field, 39.
C: Cauchy distribution, 15; CDF, 14; central limit theorem, 28; change of variables, 22; characteristic function, 12, 22, 41; Chebyshev's inequality, 26; coarsest, 4; conditional density, 23; conditional expectation, 9, 23; conditional PMF, 9; conditional probability, 5, 34; confidence intervals, 29; continuous random variable, 14; convergence in distribution, 28; convergence in mean square, 26; convergence in probability, 26; convolution, 13; correlation coefficient, 24; countable additivity, 4, 39; covariance, 24; cumulative distribution function, 14.
D: density, 40; discrete random variable, 6; disjoint, 4; distribution, 40; distribution function, 40.
E: event, 4, 39; event space, 4; expectation, 8, 18; expected value, 8, 18; experiment, 4; exponential distribution, 15.
F: factorial, 17; factors, 11; finest, 4; Fourier transform, 41.
G: gamma distribution, 16, 17; gamma function, 17; geometric distribution, 7; given, 5.
I: identically distributed, 6; iid, 11, 28; in distribution, 28; in mean square, 26; in probability, 26; inclusion–exclusion formula, 42; independent, 5, 10, 20; integrate away, 19.
J: Jacobian matrix, 22; joint CDF, 19; joint density, 10; joint PDF, 19; joint PMF, 10; jointly continuous, 19.
L: Laplace transform, 41; law of large numbers (strong), 27; law of large numbers (weak), 26; Law of the Unconscious Statistician, 8, 10, 18, 20.
M: marginal, 10, 19; mean, 6, 8, 14, 15, 18; measurable function, 40; measurable sets, 39; measurable space, 39; measurable subset, 4; measure, 39; measure space, 40; median, 14; method, 17, 21; MGF, 12; mode, 14; moment-generating function, 12, 22, 41.
N: negative binomial distribution, 7; normal distribution, 16; normalization, 28; normalize, 25.
O: outcome, 4.
P: pairwise independent, 5; partition theorem, 5, 9; PDF, 14, 40; PMF, 6; Poisson distribution, 7; population, 28; population mean, 11, 28; population variance, 11; preimage, 6; probability density function, 14, 40; probability mass function, 6; probability measure, 4, 39, 40; probability space, 4, 40.
R: random variable, 6, 40; range, 22.
S: sample, 26, 28; sample mean, 11, 21, 26, 28, 31; sample space, 4, 39; σ-field, 4, 39; standard deviation, 8; standard normal distribution, 16; standardization, 28, 31; strong law of large numbers, 27.
U: uncorrelated, 11; uniform distribution, 15.
V: variance, 6, 8, 15, 18.
W: weak law of large numbers, 26; with probability one, 27.
Z: z-score, 29, 31.