# Class Note for MATH 564 at UA

These 61 pages of class notes were uploaded by a notetaker on Friday, February 6, 2015. They belong to a course at the University of Arizona taught in Fall.


Notes for probability and statistics

John Kerl

February 4, 2009

Abstract. This is my primary reference for probability and statistics. I include what I feel to be the most important definitions and examples. Probability content is taken from Dr. Tom Kennedy's splendid lectures for Math 564 (probability) at the University of Arizona in spring of 2007. That is, these are (except for my appendices) simply my handwritten class notes, with some examples omitted for brevity, made legible and searchable. Statistics content is merged from Dr. Rabi Bhattacharya's 567A (theoretical statistics) course in spring 2008, Casella and Berger's Statistical Inference, and my own attempts to create a unified probability/statistics vocabulary and example base. The statistics content is, as of this writing, under construction.

Contents

1 What's the difference?
2 Probability
  2.1 Events
    2.1.1 Fundamental definitions
    2.1.2 Conditioning and independence
  2.2 Discrete random variables
    2.2.1 Definitions
    2.2.2 Catalog of discrete random variables
    2.2.3 Expectations
  2.3 Multiple discrete random variables
    2.3.1 Definitions
    2.3.2 Expectations
    2.3.3 Independence
    2.3.4 Sample mean
    2.3.5 Moment-generating functions and characteristic functions
    2.3.6 Sums of discrete random variables
  2.4 Continuous random variables
    2.4.1 Definitions
    2.4.2 Catalog of continuous random variables
    2.4.3 The normal distribution
    2.4.4 The gamma distribution
    2.4.5 Functions of a single random variable
    2.4.6 Expectations
  2.5 Multiple continuous random variables
    2.5.1 Definitions
    2.5.2 Independence
    2.5.3 Expectations
    2.5.4 The IID paradigm: S_n and X̄_n
    2.5.5 Functions of multiple random variables
    2.5.6 Moment-generating functions and characteristic functions
    2.5.7 Change of variables
    2.5.8 Conditional density and expectation
    2.5.9 The bivariate normal distribution
    2.5.10 Covariance and correlation
  2.6 Laws of averages
    2.6.1 The weak law of large numbers
    2.6.2 The strong law of large numbers
    2.6.3 The central limit theorem
    2.6.4 Confidence intervals
  2.7 Stochastic processes
    2.7.1 Die tips
    2.7.2 Coin flips
    2.7.3 Filtrations
    2.7.4 Markov processes
    2.7.5 Martingales
3 Statistics
  3.1 Sampling
    3.1.1 Finite-population example
    3.1.2 Infinite-population example
  3.2 Decision theory
  3.3 Parameter estimation
    3.3.1 Maximum-likelihood estimation
    3.3.2 Method of moments
    3.3.3 Bayes estimation
    3.3.4 Minimax estimation
A The coin-flipping experiments
  A.1 Single coin flips
  A.2 Batches of coin flips
B Bayes' theorem
  B.1 Algebraic approach
  B.2 Graphical/numerical approach
  B.3 Asymptotics
  B.4 Conclusions
C Probability and measure theory
  C.1 Dictionary
  C.2 Measurability
  C.3 Independence and measurability
D A proof of the inclusion-exclusion formula
References
Index

1 What's the difference?

In probability, we start with a model describing what events we think are going to occur, with what likelihoods. The events may be random in the sense that we don't know for sure what will happen next, but we do quantify our degree of surprise when various things happen. The standard example is flipping a fair coin. Fair means, technically, that the probability of heads on a given flip is 50% and the probability of tails on a given flip is 50%. This doesn't mean that every other flip will give a head: after all, three heads in a row is no surprise. Five heads in a row would be more surprising, and when you've seen twenty heads in a row, you're sure that something fishy is going on. What the 50% probability of heads does mean is
that as the number of flips increases, we expect the number of heads to approach half the number of flips. Seven heads on ten flips is no surprise; 700,000 heads on 1,000,000 tosses is highly unlikely. Another example would be flipping an unfair coin, where we know ahead of time that there's a 60% chance of heads on each toss and a 40% chance of tails. A third example would be rolling a loaded die where, for example, the chances of rolling 1, 2, 3, 4, 5, or 6 are 25%, 5%, 20%, 20%, 20%, and 10%, respectively. Given this setup, you'd expect rolling three 1's in a row to be much more likely than rolling three 2's in a row. As these examples illustrate, the probabilist starts with a probability model, something which assigns various percentage likelihoods of different things happening, then tells us which things are more and less likely to occur.

Key points about probability:
• Rules → data. Given the rules, describe the likelihoods of various events occurring.
• Probability is about prediction: looking forward.
• Probability is mathematics.

The statistician turns this around:
• Rules ← data. Given only the data, try to guess what the rules were. That is, some probability model controlled what data came out, and the best we can do is guess, or approximate, what that model was. We might guess wrong; we might refine our guess as we get more data.
• Statistics is about looking backward.
• Statistics is an art. It uses mathematical methods, but it is more than math.
• Once we make our best statistical guess about what the probability model is (what the rules are), based on looking backward, we can then use that probability model to predict the future. This is, in part, why I say that probability doesn't need statistics, but statistics uses probability.

Here's my favorite example to illustrate. Suppose I give you a list of heads and tails. You, as the statistician, are in the following situation:
• You do not know ahead of time that the coin is fair. Maybe you've been hired to decide whether the coin is fair, or, more generally, whether a gambling house is committing fraud.
• You may not even know ahead of time whether the data come from a coin-flipping experiment at all.

Suppose the data are three heads. Your first guess might be that a fair coin is being flipped, and these data don't contradict that hypothesis. Based on these data, you might hypothesize that the rules governing the experiment are that of a fair coin: your probability model for predicting the future is that heads and tails each occur with 50% likelihood. If there are ten heads in a row, though, or twenty, then you might start to reject that hypothesis and replace it with the hypothesis that the coin has heads on both sides. Then you'd predict that the next toss will certainly be heads: your new probability model for predicting the future is that heads occur with 100% likelihood and tails occur with 0% likelihood. If the data are heads, tails, heads, tails, heads, tails, then again your first fair-coin hypothesis seems plausible. If, on the other hand, you have heads alternating with tails, not three pairs but 50 pairs in a row, then you reject that model. It begins to sound like the coin is not being flipped in the air, but rather is being flipped with a spatula. Your new probability model is that if the previous result was tails or heads, then the next result is heads or tails, respectively, with 100% likelihood.

2 Probability

2.1 Events

2.1.1 Fundamental definitions

Definitions 2.1. When we do an experiment, we obtain an outcome. The set of all outcomes, conventionally written Ω, is called the sample space. Mathematically, we only require Ω to be a set.

Definition 2.2. An event is, intuitively, any subset of the sample space. Technically, it is any measurable subset of the sample space.

Example 2.3. The experiment is rolling a 6-sided die once. The outcome is the number of pips on the top face after the roll. The sample space Ω is {1, 2, 3, 4, 5, 6}. Example events are "the result of the roll is a 3" and "the result of the roll is odd".

Definition 2.4. Events A and B are disjoint if A ∩ B = ∅.

Definition 2.5. A collection F of
subsets of Ω is called a σ-field (or event space) if:
• ∅ ∈ F,
• F is closed under countable unions, and
• F is closed under complements.
Note in particular that this means Ω ∈ F, and that F is closed under countable intersections.

Remark 2.6. There is a superficial resemblance with topological spaces: a topological space X has a topology T, which is a collection of "open" subsets closed under finite (rather than countable) intersection and arbitrary (rather than countable) union, and not, in general, under complements.

Examples 2.7. The smallest (or coarsest) σ-field for any Ω is {∅, Ω}; the largest (or finest) is 2^Ω, the set of all subsets of Ω. There may be many σ-fields in between. For example, if A is any subset of Ω which isn't ∅ or Ω, then one can check that the 4-element collection {∅, A, Aᶜ, Ω} is a σ-field.

Definition 2.8. A probability measure P is a function from a σ-field F to [0, 1] such that:
• For all A ∈ F, P(A) ≥ 0.
• P(Ω) = 1 and P(∅) = 0.
• If A_1, A_2, … is a finite or countable subset of F, with the A_i's all pairwise disjoint, then P(∪_i A_i) = Σ_i P(A_i). This is the countable additivity property of the probability measure P.

Remark 2.9. For uncountable Ω, 2^Ω is a σ-field, but it is impossible to define a probability measure on it: it is too big. Consult your favorite textbook on Lebesgue measure for the reason why. For finite or countable Ω, on the other hand, 2^Ω is in fact what we think of for F.

Definition 2.10. A probability space is a triple (Ω, F, P) of a sample space Ω, a σ-field F on Ω, and a probability measure P on F.

Remark 2.11. Technically, a probability space is nothing more than a measure space (Ω, F, P) with the additional requirement that P(Ω) = 1.

2.1.2 Conditioning and independence

Definition 2.12. Let A, B be two events with P(B) > 0. Then we define the conditional probability of A given B to be
  P(A|B) = P(A ∩ B) / P(B).
Mnemonic: intersection over given.

Notation 2.13. We will often write P(AB) in place of P(A ∩ B).

Definition 2.14. Two events A and B are independent (or pairwise independent) if
  P(A ∩ B) = P(A) P(B).
Mnemonic: write down P(A) = P(A|B) = P(AB)/P(B) and clear the denominators.

Remark 2.15. This is not the same as disjoint. If A and B are disjoint, then by countable additivity of P, we have P(A ∪ B) = P(A) + P(B).

Definition 2.16. Events A_1, …, A_n are independent if, for all I ⊆ {1, 2, …, n},
  P(∩_{i∈I} A_i) = Π_{i∈I} P(A_i).
Mnemonic: we just look at all possible factorizations.

Example 2.17. Three events A, B, and C are independent if P(AB) = P(A)P(B), P(AC) = P(A)P(C), P(BC) = P(B)P(C), and P(ABC) = P(A)P(B)P(C).

Theorem 2.18 (Partition theorem, or law of total probability). Let {B_i} be a countable partition of Ω, and let A be an event. Then
  P(A) = Σ_i P(A|B_i) P(B_i).
Proof. Note that P(A) = Σ_i P(A ∩ B_i), since the B_i's partition Ω. ∎

2.2 Discrete random variables

2.2.1 Definitions

Definition 2.19. A random variable X is a function X: Ω → R. We say X is a discrete random variable if its range, X(Ω), is finite or countable.

Example 2.20. Roll two 6-sided dice and let X be their sum.

Definition 2.21. Given a random variable X, the probability mass function (or PMF) of X, written f_X(x), is
  f_X(x) = P(X = x).
This is the probability that X's value is some specific real number x. Note that X⁻¹(x), the preimage of x, is an event, and so we can compute its probability using the probability measure P.

Definition 2.22. Let X_1 and X_2 be two random variables on two probability spaces (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2), respectively. Then X_1 and X_2 are identically distributed if P(X_1 = x) = P(X_2 = x) for all x ∈ R.

Remark 2.23. Identically distributed discrete random variables X and Y have the same PMF.

Remark 2.24. Just as with events, which have a general measure-theoretic definition (definition 2.2), there is also a general measure-theoretic definition for random variables: they are simply measurable functions from a probability space to a measurable space.

2.2.2 Catalog of discrete random variables

Note: mean and variance are defined in section 2.2.3. They are included here for ready reference.

Bernoulli DRV:
• Parameter: p ∈ [0, 1].
• Range: X ∈ {0, 1}.
• PMF: P(X = 0) = p, P(X = 1) = 1 − p.
• Example: flip a p-weighted coin once.
• Mean: 1 − p.
• Variance: p(1 − p).

Binomial DRV:
• Two parameters: p ∈ [0, 1] and n ∈ Z⁺.
• Range: X ∈ {0, 1, 2, …, n}.
• PMF: P(X = k) = C(n, k) p^k (1 − p)^{n−k}.
• Example: flip a p-weighted coin n times; X is the number of heads.
• Mean: np.
• Variance: np(1 − p).
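As a quick sanity check on the binomial entry above (mean np, variance np(1 − p)), one can simulate repeated batches of coin flips and compare the empirical moments against the formulas. This is a minimal sketch using Python's standard library; the helper name binomial_sample is mine, not from the notes.

```python
import random
import statistics

def binomial_sample(n: int, p: float) -> int:
    """Count heads in n flips of a coin that lands heads with probability p.

    (binomial_sample is a hypothetical helper name, not from the notes.)
    """
    return sum(1 for _ in range(n) if random.random() < p)

random.seed(0)  # fixed seed so the run is reproducible
n, p = 10, 0.3
draws = [binomial_sample(n, p) for _ in range(100_000)]

# Empirical mean and variance should be close to np and np(1 - p).
print(statistics.mean(draws))       # near n*p = 3.0
print(statistics.pvariance(draws))  # near n*p*(1-p) = 2.1
```

With 100,000 draws the sampling error of the mean is about √(2.1/100000) ≈ 0.005, so the printed values land well within a few hundredths of the theoretical 3.0 and 2.1.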
Remark 2.25. There is a trick to see that the sum of probabilities is 1: recognize the sum of probabilities as the expansion of (p + (1 − p))^n using the binomial theorem:
  1 = (p + (1 − p))^n = Σ_{k=0}^{n} C(n, k) p^k (1 − p)^{n−k}.

Poisson DRV:
• Parameter: λ > 0.
• Range: X ∈ {0, 1, 2, …}.
• PMF: P(X = k) = (λ^k / k!) e^{−λ}.
• Example: limiting case of a binomial random variable, with large n, small p, and λ = np.
• Mean: λ.
• Variance: λ.

Geometric DRV:
• Parameter: p ∈ (0, 1].
• Range: X ∈ {1, 2, 3, …}.
• PMF: P(X = k) = p(1 − p)^{k−1}.
• Example: flip a p-weighted coin until you get heads; X is the number of flips it takes. Note: some authors count the number of tails before the head.
• Mean: 1/p.
• Variance: (1 − p)/p².

Negative binomial DRV:
• Two parameters: p ∈ (0, 1] and n = 1, 2, 3, ….
• Range: X ∈ {n, n+1, n+2, …}.
• PMF: P(X = k) = C(k−1, n−1) p^n (1 − p)^{k−n}.
• Example: flip a p-weighted coin until you get n heads; X is the number of flips it takes. Note: deriving the C(k−1, n−1) factor is a nontrivial counting problem; it is deferred until later in the course.
• Mean: n/p.
• Variance: n(1 − p)/p².

2.2.3 Expectations

Definition 2.26 (Functions of a discrete random variable). If X: Ω → R and g: R → R, then g(X): Ω → R is another random variable. Let Y = g(X). To find the PMF of g(X) given the PMF of X, write the latter as f_X(x) = P(X = x). Then
  P(Y = y) = Σ_{x ∈ g⁻¹(y)} P(X = x) = Σ_{x: g(x)=y} f_X(x).

Definition 2.27. Let X be a discrete random variable with PMF f_X. If
  Σ_x |x| f_X(x) < ∞,
i.e. if we have absolute convergence, then we define the expected value (also called expectation or mean) of X to be
  E[X] = Σ_x x f_X(x) = Σ_x x P(X = x).
Mnemonic: this is just the weighted sum of possible X values, weighted by their probabilities.

Theorem 2.28 (Law of the Unconscious Statistician). Let X be a discrete random variable and g: R → R. Let Y = g(X). If
  Σ_x |g(x)| f_X(x) < ∞,
then
  E[Y] = Σ_x g(x) f_X(x).

Definition 2.29. The variance of X, written σ²(X) or Var(X), is
  σ²(X) = E[(X − E[X])²].

Definition 2.30. The square root of the variance is the standard deviation, written σ(X) or σ_X.

Proposition 2.31. Let μ = E[X]. Then σ²(X) = E[X²] − E[X]².
Proof. FOIL, and use linearity of expectation. ∎

Proposition 2.32. Var(cX) = c² Var(X).
Proof. Var(cX) = E[c²X²] − E[cX]² = c²E[X²] − c²E[X]² = c²(E[X²] − E[X]²) = c² Var(X). ∎

Theorem 2.33. Let X be a discrete random variable; let a, b ∈ R. Then:
(i) E[aX + b] = aE[X] + b.
(ii) If P(X = b) = 1, then E[X] = b.
(iii) If a ≤ X ≤ b, then a ≤ E[X] ≤ b.
(iv) If g, h: R → R and g(X), h(X) have finite means, then E[g(X) + h(X)] = E[g(X)] + E[h(X)].

Definition 2.34. Let X be a discrete random variable and let B be an event. The conditional PMF of X given B is
  f(x|B) = P(X = x | B).
The conditional expectation of X given B is
  E[X|B] = Σ_x x f(x|B),
provided, as usual, that the sum converges absolutely.

Theorem 2.35 (Partition theorem, or law of total expectation). Let {B_i} be a countable partition of Ω, and let X be a random variable. Then
  E[X] = Σ_i E[X|B_i] P(B_i).

2.3 Multiple discrete random variables

2.3.1 Definitions

Definition 2.36. We define the joint PMF (or joint density) of X and Y to be
  f_{X,Y}(x, y) = P(X = x, Y = y).

Proposition 2.37. For A ⊆ R², we have
  P((X, Y) ∈ A) = Σ_{(x,y)∈A} f_{X,Y}(x, y).

Corollary 2.38. Let g(x, y): R² → R. Let X and Y be discrete random variables, and let Z = g(X, Y). Then
  f_Z(z) = P(Z = z) = Σ_{(x,y): g(x,y)=z} f_{X,Y}(x, y).

We can use joint densities to recover marginal densities:

Corollary 2.39. Let X and Y be discrete random variables with joint density f_{X,Y}. Then
  f_X(x) = Σ_y f_{X,Y}(x, y) and f_Y(y) = Σ_x f_{X,Y}(x, y).

2.3.2 Expectations

Theorem 2.40 (Law of the Unconscious Statistician). Let X and Y be discrete random variables, and let g(x, y): R² → R. Let Z = g(X, Y). Then
  E[Z] = Σ_{x,y} g(x, y) f_{X,Y}(x, y).
Proof. As in the single-variable case (theorem 2.28). ∎

Corollary 2.41. Let X and Y be discrete random variables, and let a, b ∈ R. Then
  E[aX + bY] = aE[X] + bE[Y].
Proof. Use g(x, y) = ax + by, and use the theorem twice. ∎

2.3.3 Independence

Recall definition 2.14 of independent events: A and B are independent if P(A ∩ B) = P(A)P(B). We use this to define independence of discrete random variables.

Definition 2.42. Two discrete random variables X and Y are independent if, for all x and y,
  P(X = x, Y = y) = P(X = x) P(Y = y).
Using PMF notation, we say X and Y are independent if
  f_{X,Y}(x, y) = f_X(x) f_Y(y)
for all x and y, i.e. if the joint PMF factors.

Notation 2.43. We often abbreviate "independent and identically distributed" (definitions 2.42 and 2.22) as IID.

Question: given only the joint density of X and Y, can we tell if X and Y are independent? From corollary 2.39, we can recover the PMFs of X and Y:
  f_X(x) = Σ_y f_{X,Y}(x, y) and f_Y(y) = Σ_x f_{X,Y}(x, y).
Then we can multiply them back together
and see if we get the joint density back.

Point:
• If X and Y are independent, then we can go from marginals to joint PMFs by multiplying.
• We can always go from joint PMFs to marginals by summing, as above.

Theorem 2.44. If X and Y are independent discrete random variables, then E[XY] = E[X]E[Y].

Theorem 2.45. If X and Y are independent discrete random variables, and g, h: R → R, then g(X) and h(Y) are independent discrete random variables.

Corollary 2.46. In particular,
  E[g(X)h(Y)] = E[g(X)]E[h(Y)],
provided g(X) and h(Y) have finite mean.

Theorem 2.47. If X and Y are independent discrete random variables, then Var(X + Y) = Var(X) + Var(Y).
Proof. Use definition 2.29 and theorem 2.44. ∎

Definition 2.48. We say that X and Y are uncorrelated if E[XY] = E[X]E[Y].

Remark 2.49. Independent implies uncorrelated, but not vice versa.

Remark 2.50. Theorem 2.47 holds for uncorrelated discrete random variables.

2.3.4 Sample mean

Definition 2.51. Let X_1, …, X_n be independent and identically distributed. (Think of multiple trials of the same experiment.) Since each X_i has the same expectation, we call their common mean the population mean and denote it by μ. Likewise, we call their common variance the population variance and denote it by σ². Let
  X̄_n = (1/n) Σ_{i=1}^{n} X_i.
This is a new random variable, called the sample mean of X_1, …, X_n.

By linearity of expectation,
  E[X̄_n] = (1/n) Σ_{i=1}^{n} E[X_i] = (1/n)(nμ) = μ.
Recall from proposition 2.32 that the variance scales as Var(cX) = c² Var(X). Also, since the X_i's are independent, their variances add. Thus
  Var(X̄_n) = Var((1/n) Σ_i X_i) = (1/n²) Var(Σ_i X_i) = (1/n²)(nσ²) = σ²/n.

2.3.5 Moment-generating functions and characteristic functions

Definitions 2.52. Let X be a discrete random variable. Then the moment-generating function (or MGF) of X is
  M_X(t) = E[e^{tX}].
By the Law of the Unconscious Statistician (theorem 2.28), this is
  M_X(t) = Σ_x e^{tx} f_X(x).
Likewise, the characteristic function of X is
  φ_X(t) = E[e^{itX}],
which is
  φ_X(t) = Σ_x e^{itx} f_X(x).

Remark 2.53. These functions are just computational tricks; there is no intrinsic meaning in these functions.

Proposition 2.54. Let M_X(t) be the moment-generating function for a discrete random variable X. Then
  (d/dt)^k M_X(t) |_{t=0} = E[X^k].

Proposition 2.55. If X and Y are identically distributed, they have the same PMF (remark 2.23), and thus they also have the same MGF.

Proposition 2.56. Let X and Y be independent discrete random variables, and let Z = X + Y. Then
  M_Z(t) = M_X(t) M_Y(t).

2.3.6 Sums of discrete random variables

(Moment-generating functions are perhaps a better approach than the following.) Let X and Y be discrete random variables, and let Z = X + Y. We can find the PMF of Z by
  f_Z(z) = P(Z = z) = P(X + Y = z) = Σ_{(x,y): x+y=z} f_{X,Y}(x, y) = Σ_x f_{X,Y}(x, z − x).
Now further suppose that X and Y are independent. Then f_{X,Y}(x, z − x) = f_X(x) f_Y(z − x), so
  f_Z(z) = Σ_x f_X(x) f_Y(z − x).
This is the convolution of f_X and f_Y. Note that, as always in convolutions, we are summing over all the ways in which x and y can add up to z.

2.4 Continuous random variables

2.4.1 Definitions

A continuous random variable, mimicking definition 2.19, is a function from Ω to R, with the property that for all x ∈ R, P(X = x) = 0. Thus, the PMF which we used for discrete random variables is not useful. Instead we first define another function, namely, P(X ≤ x).

Definition 2.57. The cumulative distribution function (or CDF) for a random variable (whether discrete or continuous) is
  F_X(x) = P(X ≤ x).

Theorem 2.58. The CDF F_X for a random variable satisfies the following properties:
• 0 ≤ F_X(x) ≤ 1, and F_X is non-decreasing.
• lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1.
• F_X is right-continuous (in the sense from introductory calculus).

Definition 2.59. A random variable X is a continuous random variable if there exists a function f_X(x), called the probability density function (or PDF), such that the CDF F_X is given by
  F_X(x) = ∫_{−∞}^{x} f_X(t) dt.

Remark 2.60. Note that the integral is done using Lebesgue measure, or, for the purposes of this course, Riemann integration. If we allow counting measure and require countable range, then we can subsume discrete random variables into this definition. However, that is outside the scope of this course.

Remark 2.61. Some random variables are neither discrete nor continuous; these appear, for example, in dynamical systems.

Remark 2.62. The PDF and CDF are related as follows, making use of the second fundamental
theorem of calculus:
  F_X(x) = ∫_{−∞}^{x} f_X(t) dt and f_X(x) = F_X′(x).

Remark 2.63. The integral of the PDF of X over an interval is the probability that X lies in that interval. E.g.
  P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx.
Thus PDFs are non-negative and have integral 1.

Remark 2.64. We can now neatly define some terminology from statistics. Namely:
• Let X be a continuous random variable with CDF F_X and PDF f_X.
• The mean of X is the expectation value E[X], as defined below.
• The median of X is F_X⁻¹(0.5).
• A mode of X is a local maximum of f_X. If the PDF has two local maxima, we say that X is bimodal. If the PDF has a single maximum, we call it the mode of X.

2.4.2 Catalog of continuous random variables

Note: mean and variance are defined in section 2.4.6. They are included here for ready reference. I thought about including graphs of the PDFs and CDFs, but instead I will refer you to the excellent Wikipedia article on Probability distribution and the pages linking from there.

Uniform CRV:
• Parameters: a < b.
• Range: x ∈ [a, b].
• PDF: f_X(x) = 1/(b − a) for a ≤ x ≤ b; 0 elsewhere.
• CDF: F_X(x) = 0 for x < a; (x − a)/(b − a) for a ≤ x < b; 1 for x ≥ b.
• Mean: (a + b)/2.
• Variance: (b − a)²/12.

Exponential CRV:
• Parameter: λ > 0.
• Range: X ∈ [0, ∞).
• PDF: f_X(x) = λe^{−λx} for x ≥ 0; 0 for x < 0.
• CDF: F_X(x) = 1 − e^{−λx} for x ≥ 0; 0 for x < 0.
• Mean: 1/λ.
• Variance: 1/λ².

Cauchy CRV:
• No parameters.
• Range: X ∈ (−∞, ∞).
• PDF: f_X(x) = 1/(π(1 + x²)).
• CDF: F_X(x) = (1/π) tan⁻¹(x) + 1/2.
• Mean: infinite (more precisely, undefined: the defining integral does not converge absolutely).
• Variance: infinite.

Normal CRV (see section 2.4.3):
• Parameters: μ ∈ R, σ > 0. Note: with μ = 0 and σ = 1 we have the standard normal distribution; one says it has zero mean and unit variance.
• Range: X ∈ (−∞, ∞).
• PDF:
  f_X(x) = (1/(σ√(2π))) exp(−(1/2)((x − μ)/σ)²).
• CDF: no closed-form expression. Related to erf(x), but not quite the same: the standard normal CDF is Φ(x) = (1/2)(1 + erf(x/√2)). Note that some computing systems may have erf(x) but not normalcdf, or vice versa; thus this conversion formula comes in handy.
• Mean: μ.
• Variance: σ².

Gamma CRV (see section 2.4.4):
• Parameters: λ > 0, w > 0.
• Range: x ≥ 0.
• PDF:
  f_X(x) = (λ^w/Γ(w)) x^{w−1} e^{−λx} for x ≥ 0; 0 for x < 0.
• CDF: TBD.
• Mean: w/λ.
• Variance: w/λ².

Beta CRV:
• Parameters: α > 0, β > 0.
• Range: x ∈ [0, 1].
• PDF:
  f_X(x) = x^{α−1}(1 − x)^{β−1} / B(α, β).
• CDF: TBD.
• Mean: α/(α + β).
• Variance: αβ/((α + β)²(α + β + 1)).

2.4.3 The normal distribution

Definition 2.65. The normal distribution has two parameters: real μ and positive σ. A random variable with the normal distribution has PDF
  f(x; μ, σ) = (1/(σ√(2π))) exp(−(1/2)((x − μ)/σ)²).

Remark 2.66. The normal distribution has mean μ and variance σ².

Definition 2.67. With μ = 0 and σ = 1, we have the standard normal distribution.

2.4.4 The gamma distribution

Definition 2.68. The gamma function is defined by
  Γ(w) = ∫_0^∞ x^{w−1} e^{−x} dx.

Remark 2.69. Integration by parts shows that Γ(n) = (n − 1)!. Thus the gamma function is a generalized factorial function.

Definition 2.70. The gamma distribution has two parameters λ, w > 0. A random variable with the gamma distribution has PDF
  f(x; λ, w) = (λ^w/Γ(w)) x^{w−1} e^{−λx} for x ≥ 0; 0 for x < 0.

Remark 2.71. With w = 1, we have an exponential distribution. Thus the gamma distribution generalizes the exponential distribution.

Proposition 2.72. The gamma distribution has mean w/λ.

2.4.5 Functions of a single random variable

Given a continuous random variable X and g: R → R, we have a new continuous random variable Y = g(X). Often one wants to find the PDF of Y given the PDF of X. The method is to first find the CDF of Y, then differentiate to find the PDF.

Example 2.73. Let X be uniform on [0, 1], and let Y = X². The CDF of Y is
  F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = √y for 0 ≤ y < 1,
i.e.
  F_Y(y) = 0 for y < 0; √y for 0 ≤ y < 1; 1 for 1 ≤ y.
Then the PDF is
  f_Y(y) = 1/(2√y) for 0 < y < 1; 0 elsewhere.
The critical step in the method is going from P(X² ≤ y) to P(X ≤ √y).

2.4.6 Expectations

Definition 2.74. Let X be a continuous random variable with PDF f_X. If
  ∫_{−∞}^{∞} |x| f_X(x) dx < ∞,
i.e. if we have absolute convergence, then we define the expected value (also called expectation or mean) of X to be
  E[X] = ∫_{−∞}^{∞} x f_X(x) dx.
Mnemonic: as in the discrete case, this is just the weighted sum of possible X values, weighted by their probabilities.

Definition 2.75. Let X be a continuous random variable. The variance of X is
  Var(X) = E[(X − μ)²], where μ = E[X].
By the corollary to the Law of the Unconscious Statistician below, Var(X) = E[X²] − E[X]².

Theorem 2.76 (Law of the Unconscious Statistician). Let X be a continuous random variable, and g: R → R. Let Y = g(X). If
  ∫_{−∞}^{∞} |g(x)| f_X(x) dx < ∞,
then
  E[Y] = ∫_{−∞}^{∞} g(x) f_X(x) dx.

Corollary 2.77. If g, h: R → R and a, b ∈ R, then
  E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)].

2.5 Multiple continuous random
variables

2.5.1 Definitions

Definition 2.78. Given two continuous random variables X and Y, we define their joint CDF to be
  F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

Definition 2.79. Two random variables X and Y are jointly continuous if there exists a function f_{X,Y}(x, y), their joint PDF, such that
  F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du.
The joint PDF is thought of as
  P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f_{X,Y}(x, y) dy dx,
and, more generally, for A ⊆ R²,
  P((X, Y) ∈ A) = ∬_A f_{X,Y}(x, y) dy dx.
The joint CDF and joint PDF are related by
  f_{X,Y}(x, y) = (∂²/∂x∂y) F_{X,Y}(x, y) and F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du.
Given a joint CDF F_{X,Y}(x, y), we may recover the marginal CDFs F_X(x) and F_Y(y) by
  F_X(x) = lim_{y→∞} F_{X,Y}(x, y),
and similarly for F_Y. How do we recover the marginal PDFs, given the joint PDF? To derive the formula, differentiate the CDF:
  f_X(x) = (d/dx) F_X(x) = (d/dx) ∫_{−∞}^{x} ∫_{−∞}^{∞} f_{X,Y}(u, v) dv du = ∫_{−∞}^{∞} f_{X,Y}(x, v) dv,
i.e. the formula is
  f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.
That is, we integrate away one variable.

2.5.2 Independence

Definition 2.80. Two continuous random variables X and Y are independent iff their CDFs factor:
  F_{X,Y}(x, y) = F_X(x) F_Y(y).

Remark 2.81. For discrete random variables, we defined independence (definition 2.42) in terms of the factoring of the PMFs; for continuous random variables, we define independence in terms of the factoring of the CDFs. Factorization of PDFs does hold, as shown in the following theorem, but it is a consequence rather than a definition.

Theorem 2.82. Two continuous random variables X and Y are independent iff their PDFs factor:
  f_{X,Y}(x, y) = f_X(x) f_Y(y).
Proof.
  f_{X,Y}(x, y) = (∂²/∂x∂y) F_{X,Y}(x, y) = (∂²/∂x∂y) F_X(x)F_Y(y) = ((∂/∂x) F_X(x)) ((∂/∂y) F_Y(y)) = f_X(x) f_Y(y). ∎

2.5.3 Expectations

Theorem 2.83 (Law of the Unconscious Statistician). Let X and Y be continuous random variables, and g(x, y): R² → R. Let Z = g(X, Y). If
  ∫_{−∞}^{∞} ∫_{−∞}^{∞} |g(x, y)| f_{X,Y}(x, y) dx dy < ∞,
then
  E[Z] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy.

Corollary 2.84. Let X and Y be continuous random variables, and let a, b ∈ R. Then
  E[aX + bY] = aE[X] + bE[Y].

Theorem 2.85. Let X and Y be independent continuous random variables, and let g, h: R → R. Then
  E[g(X)h(Y)] = E[g(X)]E[h(Y)].

Corollary 2.86. Let X and Y be independent continuous random variables. Then E[XY] = E[X]E[Y].

Corollary 2.87. Let X and Y be independent continuous random variables. Then Var(X + Y) = Var(X) + Var(Y).
Proof. Computation. ∎

Remark 2.88. Expectations always add. They multiply only when X and Y are independent. Variances add only when X and Y are independent.

Theorem 2.89. Let X and Y be independent random variables, and let g, h: R → R. Then g(X) and h(Y) are independent.

2.5.4 The IID paradigm: S_n and X̄_n

Notation: let {X_n} be a sequence of IID random variables, with common mean μ and common variance σ². We write
  S_n = Σ_{i=1}^{n} X_i
and
  X̄_n = (1/n) Σ_{i=1}^{n} X_i.
The latter is called the sample mean of the X_n's.

Mean and variance of S_n: we already know
  E[S_n] = E[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E[X_i] = nμ,
and, using independence of the trials to split up the variance,
  Var(S_n) = Var(Σ_i X_i) = Σ_i Var(X_i) = nσ².

Mean and variance of X̄_n: likewise,
  E[X̄_n] = E[(1/n) Σ_i X_i] = (1/n) Σ_i E[X_i] = (1/n)(nμ) = μ,
and, using independence of the trials to split up the variance,
  Var(X̄_n) = Var((1/n) Σ_i X_i) = (1/n²) Σ_i Var(X_i) = (1/n²)(nσ²) = σ²/n.

2.5.5 Functions of multiple random variables

Let X, Y be continuous random variables, and let g(x, y): R² → R. Let Z = g(X, Y). How do we find the PDF of Z? As in the univariate case (section 2.4.5), the method is to first find the CDF of Z, then differentiate.

[Need to type up the example from 3/21. Need to type up convolution notes for independent X, Y from 3/23: f_Z(z) = ∫ f_X(x) f_Y(z − x) dx.]

2.5.6 Moment-generating functions and characteristic functions

Definition 2.90. Much as in the discrete case (definitions 2.52),
  M_X(t) = E[e^{tX}] = ∫_{−∞}^{∞} e^{tx} f_X(x) dx
and
  φ_X(t) = E[e^{itX}] = ∫_{−∞}^{∞} e^{itx} f_X(x) dx.

Proposition 2.91. We have:
(i) M_X^{(k)}(0) = E[X^k].
(ii) If Y = aX + b, then M_Y(t) = e^{bt} M_X(at).
(iii) If X and Y are independent, then M_{X+Y}(t) = M_X(t) M_Y(t).

Example 2.92. The standard normal random variable Z has (tedious computations omitted here) moment-generating function
  M_Z(t) = exp(t²/2).
The general normal random variable X = μ + σZ has moment-generating function
  M_X(t) = exp(μt + σ²t²/2).

Proposition 2.93. Let X_1, …, X_n be independent normal random variables with means μ_i and variances σ_i². Then Y = ΣX_i has normal distribution with mean Σμ_i and variance Σσ_i².

2.5.7 Change of variables

Let X and Y be continuous random variables. If T: R² → R², sending (U, V) = T(X, Y), is invertible, then
  ∬_D f(x, y) dx dy = ∬_{T(D)} f(x(u, v), y(u, v)) |J(u, v)| du dv,
where J(u, v)
Theorem 2.94. Let X and Y be jointly continuous random variables with PDF f_{X,Y}. Let D = {(x,y) : f_{X,Y}(x,y) > 0}, i.e. the range of (X,Y). Suppose T: D → S ⊆ R² is 1-1 and onto. Define new random variables U and V by (U,V) = T(X,Y). Then

    f_{U,V}(u,v) = f_{X,Y}(x(u,v), y(u,v)) |J(u,v)|  if (u,v) ∈ S,  and 0 otherwise,

where J(u,v) is as above.

2.5.8 Conditional density and expectation

Given two continuous random variables X and Y, we want to define P(X | Y). Since Y = y is a null event, the usual intersection-over-given notion of conditional probability (see section 2.1.2) will give us zero divided by zero. Somewhat as with l'Hôpital's rule in calculus, we can nonetheless make sense of it.

Definition 2.95. Let X, Y be jointly continuous with PDF f_{X,Y}(x,y). The conditional density of X given Y is

    f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y)  if f_Y(y) ≠ 0,  and 0 otherwise.

Remark 2.96. Recall that

    f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x,y) dx,

so we can think of the conditional density of X given Y as being

    f_{X|Y}(x|y) = f_{X,Y}(x,y) / ∫_{-∞}^{∞} f_{X,Y}(x,y) dx

whenever the denominator is non-zero.

Definition 2.97. Let X, Y be jointly continuous with PDF f_{X,Y}. The conditional expectation of X given Y is

    E[X | Y = y] = ∫_{-∞}^{∞} x f_{X|Y}(x|y) dx.

Definition 2.98. E[X | Y = y] is a function of y: it must depend only on y. Call it g(y). Then g(Y) is a random variable, which we write as E[X | Y]. This new random variable has the following properties.

Theorem 2.99. Let X, Y, Z be random variables, a, b ∈ R, and g: R → R. Then:
- E[a | Y] = a.
- E[aX + bZ | Y] = a E[X | Y] + b E[Z | Y]. Note: this is linearity on the left; linearity on the right emphatically does not hold.
- If X ≥ 0, then E[X | Y] ≥ 0.
- If X, Y are independent, then E[X | Y] = E[X]. Mnemonic: Y gives no information about X.
- E[E[X | Y]] = E[X]. This is the partition theorem in disguise; see below.
- E[X g(Y) | Y] = g(Y) E[X | Y]. Mnemonic: given a specific y, g(y) is constant.
- Special case: E[g(Y) | Y] = g(Y).
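The tower property E[E[X|Y]] = E[X] from Theorem 2.99 can be checked mechanically in the discrete analogue (sums in place of integrals). A small sketch, with an arbitrary joint PMF of my own choosing:

```python
# Joint PMF of a discrete pair (X, Y) -- an arbitrary example, not from the notes.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def marginal_y(y):
    # P(Y = y), by summing the joint PMF over x
    return sum(p for (x, yy), p in pmf.items() if yy == y)

def cond_exp_x_given(y):
    # E[X | Y = y] = sum_x x P(X = x, Y = y) / P(Y = y)
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / marginal_y(y)

ex = sum(x * p for (x, y), p in pmf.items())                        # E[X]
tower = sum(cond_exp_x_given(y) * marginal_y(y) for y in (0, 1))    # E[E[X|Y]]

print(ex, tower)  # both 0.7, as the tower property predicts
```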
2.5.9 The bivariate normal distribution

If X and Y are independent and standard normal, then their joint PDF is

    f(x,y) = (1/(2π)) exp(-(x² + y²)/2).

It is possible for X and Y to not be independent, while their marginals are still standard normal. In fact, there is a 1-parameter family of such (X,Y) pairs.

Definition 2.100. Let -1 < ρ < 1. The bivariate normal distribution with parameter ρ has PDF

    f(x,y) = (1/(2π √(1-ρ²))) exp( -(x² - 2ρxy + y²) / (2(1-ρ²)) ).

Remark 2.101. Note the following:
- It is straightforward, but tedious, to verify that ∫∫ f(x,y) dx dy = 1.
- With ρ = 0, we obtain the independent case above as a special case.
- One may complete the square, and use translation invariance of the integral, to find that the marginals are in fact univariate standard normals.
- Again completing the square, one finds that f_{Y|X}(y|x) is normal, with μ = ρx and σ² = 1 - ρ².

2.5.10 Covariance and correlation

Definition 2.102. Let X and Y be random variables with means μ_X and μ_Y respectively. The covariance of X and Y is

    Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)] = E[XY] - μ_X μ_Y = E[XY] - E[X] E[Y].

Mnemonic 2.103. With Y = X, we recover the familiar formula (definition 2.75) for the variance of X: E[X²] - E[X]².

Theorem 2.104. Let X and Y be random variables. Then

    Var(X + Y) = Var(X) + 2 Cov(X,Y) + Var(Y).

Remark 2.105. Bill Faris calls this "the most important theorem in probability".

Corollary 2.106. If Cov(X,Y) = 0, then Var(X + Y) = Var(X) + Var(Y).

Theorem 2.107. If X and Y are independent, then Cov(X,Y) = 0. The converse does not hold.

Definition 2.108. Let X and Y be random variables with variances σ_X² and σ_Y² respectively. The correlation coefficient of X and Y is

    ρ(X,Y) = Cov(X,Y) / (σ_X σ_Y).

Remark 2.109. The covariance is quadratic in X and Y with respect to linear rescaling; the correlation coefficient is scale-invariant.

Theorem 2.110. The correlation coefficient satisfies -1 ≤ ρ(X,Y) ≤ 1.

Remark 2.111. Equivalently, Cov(X,Y)² ≤ Var(X) Var(Y).

Proof. It suffices to show |ρ(X,Y)| ≤ 1, which is equivalent to showing |Cov(X,Y)| ≤ σ_X σ_Y. This follows abstractly from the Cauchy-Schwarz inequality |⟨f,g⟩| ≤ ‖f‖ ‖g‖, with f = X - μ_X and g = Y - μ_Y. Namely,

    ⟨f,g⟩² ≤ ‖f‖² ‖g‖² = ⟨f,f⟩ ⟨g,g⟩
    (∫ (X - μ_X)(Y - μ_Y) dP)² ≤ ∫ (X - μ_X)² dP · ∫ (Y - μ_Y)² dP
    E[(X - μ_X)(Y - μ_Y)]² ≤ E[(X - μ_X)²] E[(Y - μ_Y)²]
    Cov(X,Y)² ≤ Var(X) Var(Y).

Faris sketches another route, which I complete here. Normalize X and Y as follows. Let μ_X, μ_Y, σ_X, and σ_Y be their means and standard deviations respectively. Then

    (X - μ_X)/σ_X    and    (Y - μ_Y)/σ_Y

each have zero mean and unit standard deviation. In particular, this will mean (below) that
their second moments are 1. We can create a new pair of random variables

    ( (X - μ_X)/σ_X ± (Y - μ_Y)/σ_Y )².

Since each takes non-negative values, the means are non-negative as well [xxx xref forward to where this is proved — it seems obvious, but actually requires proof]:

    E[ ( (X - μ_X)/σ_X ± (Y - μ_Y)/σ_Y )² ] ≥ 0.

FOILing out, we have

    E[ ((X - μ_X)/σ_X)² ± 2 (X - μ_X)(Y - μ_Y)/(σ_X σ_Y) + ((Y - μ_Y)/σ_Y)² ] ≥ 0.

Using the linearity of expectation, and recalling that the normalized variables have second moments equal to 1, we have

    2 ± 2 E[(X - μ_X)(Y - μ_Y)] / (σ_X σ_Y) ≥ 0
    -σ_X σ_Y ≤ E[(X - μ_X)(Y - μ_Y)] ≤ σ_X σ_Y
    -σ_X σ_Y ≤ Cov(X,Y) ≤ σ_X σ_Y
    |Cov(X,Y)| ≤ σ_X σ_Y.  □

2.6 Laws of averages

Here is a statistics paradigm: run an experiment n times, with IID random variables X_n, whether continuous or discrete. The n-tuple (X_1, ..., X_n) is called a sample. The average of X_1 through X_n is called the sample mean, written X̄_n; it is also a random variable. The big question is: what does X̄_n look like as n gets large? For example, roll a 6-sided die (so μ = 3.5) a million times. What is the probability of the event |X̄_n - 3.5| > 0.01? One would hope this probability would be small, and would get smaller as n increases.

We have two main theorems here:
- The law of large numbers says that X̄_n → μ, although we need to define the notion of convergence of a random variable to a real number. There are two flavors of convergence: weak and strong.
- The central limit theorem describes the PDF of X̄_n.

2.6.1 The weak law of large numbers

Definition 2.112. Let X_n and X be random variables. We say X_n → X in probability if, for all ε > 0,

    lim_{n→∞} P({ω ∈ Ω : |X_n(ω) - X(ω)| ≥ ε}) = 0.

More tersely, we may write P(|X_n - X| ≥ ε) → 0, or P(|X_n - X| < ε) → 1.

Theorem 2.113 (Weak law of large numbers). Let X_n be an IID sequence with common mean μ and finite variance σ². Then

    X̄_n = (1/n) Σ_{i=1}^n X_i → μ

in probability.

Here is another notion of convergence.

Definition 2.114. Let X_n and X be random variables. We say X_n → X in mean square if E[(X_n - X)²] → 0.

Theorem 2.115 (Chebyshev's inequality). Let X be a random variable with finite variance. Let a > 0. Then

    P(|X - μ| ≥ a) ≤ Var(X) / a².

Remark 2.116. This means Var(X) ≥ a² P(|X - μ| ≥ a).
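Both the covariance identity (Theorem 2.104) and Chebyshev's inequality (Theorem 2.115) can be verified exactly on small finite examples. A sketch, using a fair die and the correlated pair (first roll, max of two rolls) — both examples are mine, not from the notes:

```python
from itertools import product

# Chebyshev's inequality, exactly, for a fair 6-sided die.
faces = range(1, 7)
mu = sum(faces) / 6                                  # 3.5
var = sum((f - mu) ** 2 for f in faces) / 6          # 35/12

a = 2.0
p_exact = sum(1 for f in faces if abs(f - mu) >= a) / 6   # only faces 1 and 6: 2/6
chebyshev_bound = var / a ** 2                            # 35/48

# Var(X+Y) = Var(X) + 2 Cov(X,Y) + Var(Y), on (X, Y) = (first roll, max of two rolls).
outcomes = list(product(faces, repeat=2))
X = [i for i, j in outcomes]
Y = [max(i, j) for i, j in outcomes]
n = len(outcomes)
ex, ey = sum(X) / n, sum(Y) / n
vx = sum((x - ex) ** 2 for x in X) / n
vy = sum((y - ey) ** 2 for y in Y) / n
cov = sum((x - ex) * (y - ey) for x, y in zip(X, Y)) / n
S = [x + y for x, y in zip(X, Y)]
es = sum(S) / n
vs = sum((s - es) ** 2 for s in S) / n

print(p_exact, chebyshev_bound)   # 0.333... <= 0.729..., so the bound holds (loosely)
print(vs, vx + 2 * cov + vy)      # equal, up to float rounding
```

Note how loose the Chebyshev bound is here — consistent with Remark 2.118 below: certain, but far from tight.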
Proof. Use the partition theorem, with two partitions, on some as-yet-unspecified event A:

    E[(X - μ)²] = E[(X - μ)² | A] P(A) + E[(X - μ)² | A^c] P(A^c).

Regardless of what A is, the last term is non-negative. So we have

    E[(X - μ)²] ≥ E[(X - μ)² | A] P(A).

Now let A be the particular event that |X - μ| ≥ a. Then we have

    E[(X - μ)²] ≥ E[(X - μ)² | |X - μ| ≥ a] P(|X - μ| ≥ a) ≥ a² P(|X - μ| ≥ a).

Dividing through by a², we have

    P(|X - μ| ≥ a) ≤ E[(X - μ)²] / a²,

as desired. □

Theorem 2.117. Convergence in mean square implies convergence in probability.

Proof. Let ε > 0. We need to show P(|X_n - X| > ε) → 0. By Chebyshev's inequality,

    P(|X_n - X| > ε) ≤ E[(X_n - X)²] / ε² → 0.  □

Remark 2.118. Notes about Chebyshev's inequality:
- It is used in the proof of the weak law, which I am omitting.
- The bounds provided by Chebyshev's inequality are rather loose, but they are certain. The central limit theorem gives tighter bounds, but only probabilistically.

2.6.2 The strong law of large numbers

Here is a third notion of convergence.

Definition 2.119. We say X_n → X with probability one (w.p. 1), or almost surely (a.s.), if

    P({ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1.

More tersely, we may write P(X_n → X) = 1.

Theorem 2.120 (Strong law of large numbers). Let X_n be an IID sequence with common mean μ. Then X̄_n → μ with probability 1.

Theorem 2.121. X_n → X w.p. 1 implies X_n → X in probability.

Remark 2.122. This means the strong law is stronger than the weak law.

2.6.3 The central limit theorem

Motivation: let X_n be an IID sequence and let S_n = Σ X_i. We know from section 2.5.4 that E[S_n] = nμ and Var(S_n) = nσ². We expect the PDF of S_n to be centered at nμ, with width approximately σ√n. Likewise, if X̄_n = Σ X_i / n, then we know that E[X̄_n] = μ and Var(X̄_n) = σ²/n. We expect the PDF of X̄_n to be centered at μ, with width approximately σ/√n. For various distributions, one finds empirically that these PDFs look approximately normal if n is large.

Definition 2.123. Let X be a random variable with mean μ and variance σ². The standardization (or normalization) of X is

    (X - μ)/σ.

In particular, if we standardize S_n we get

    Z_n = (S_n - nμ)/(σ√n),

with mean 0 and variance 1:

    Var(Z_n) = (1/(σ²n)) Var(S_n - nμ) = (1/(σ²n))
Var(S_n) = 1.

Likewise, if we standardize X̄_n, we get

    Z_n = (X̄_n - μ)/(σ/√n).

The central limit theorem says that the standardizations of S_n and X̄_n both approach standard normal for large n.

Definition 2.124. Let Φ be the CDF of the standard normal:

    Φ(x) = (1/√(2π)) ∫_{-∞}^x e^{-z²/2} dz.

Theorem 2.125. Let X_n be an IID sequence with finite mean μ and finite non-zero variance σ². Let Z_n be the standardization of S_n, as above. Then

    P(Z_n ≤ x) → Φ(x)  for all x.

Remark 2.126. This convergence is called convergence in distribution. [xxx include some examples here.]

2.6.4 Confidence intervals

Here is an application of the central limit theorem to statistics. The population is a random variable X. A sample of size n is n IID copies of X. The random variable X has a true population mean μ_X, but we do not know what it is; all we have is the sample mean X̄_n = Σ_{k=1}^n X_k / n, which is an estimate of the population mean. We would like to put some error bars on this estimate.

We quantify this problem using the notion of confidence intervals. We look for ε > 0 such that

    P(|X̄_n - μ_X| ≥ ε) = 0.05,    or, alternatively,    P(|X̄_n - μ_X| < ε) = 0.95.

Five percent is a conventional value in statistics. Using the CLT, we treat X̄_n as being approximately normal. We standardize it (statisticians call this taking the z-score) in the usual way:

    Z_n = (X̄_n - μ_X)/(σ_X/√n).

Then

    P(|X̄_n - μ_X| ≥ ε) = P( |X̄_n - μ_X|/(σ_X/√n) ≥ ε√n/σ_X ) = P( |Z_n| ≥ ε√n/σ_X ) = 0.05.

It's worth memorizing (or you can compute it, if you prefer) that the standard normal curve has area 0.95 for z running from -1.96 to 1.96. So ε√n/σ_X should be 1.96. Solving for ε in terms of n gives

    ε = 1.96 σ_X / √n.

Note that this requires the population standard deviation to be known. We can compute the sample standard deviation and use that as an estimate of the population standard deviation σ_X, but we've not developed any theory as to the error in that estimate.
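The half-width formula ε = 1.96 σ_X / √n is one line of code. A minimal sketch (the function name is mine, not from the notes):

```python
import math

def ci_halfwidth(sigma, n, z=1.96):
    """95% confidence-interval half-width for the sample mean: eps = z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

# For a Bernoulli(p) population, sigma_X = sqrt(p (1 - p)).
# With n = 1000 flips and p near 0.5:
p = 0.5
eps = ci_halfwidth(math.sqrt(p * (1 - p)), 1000)
print(eps)  # about 0.031
```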
Example 2.127. Let X have the Bernoulli distribution with parameter p: flip a coin with probability p of heads, and assign the value 0 to tails and 1 to heads. Recall from section 2.2.2 that X has mean p. (In section 2.2.2 we took -1 for tails; I have changed the convention here.) Suppose you flip the coin 1000 times (i.e. n = 1000) and obtain 520 heads. Then X̄_n = 0.520, and

    ε = 1.96 √(p(1-p)) / √1000 ≈ 0.0619 √(p(1-p)).

For p = 0.5, ε ≈ 0.031. Thus we are 95% certain that μ_X is within 0.031 on either side of 0.520, i.e. between 0.489 and 0.551.

2.7 Stochastic processes

[xxx] Examples:
- Sequence of die rolls: independent.
- Sequence of die tips: non-independent, but Markov.
- Sequence of coin flips: independent.
- Sum of coin flips: martingale.

2.7.1 Die tips

2.7.2 Coin flips

With X_i = ±1, the sum S_n = Σ_{i=1}^n X_i takes values 2k - n for k = 0, 1, ..., n, with

    P(S_n = 2k - n) = C(n,k) p^k (1-p)^{n-k}.

2.7.3 Filtrations

Show the filtration tree (refinement of partitions of Ω) for the sum of coin flips. Have done σ-field of a finite generating set; show the size is 2^(2^n).

2.7.4 Markov processes

Homogeneous: die tips. Non-homogeneous: sum of coin flips. Xref back to the partition theorem. Key point: PMFs which evolve in time. Show some matrix products for the sum of coin flips.

2.7.5 Martingales

Sum of coin flips: sub- and super- and flat, depending on p.

3 Statistics

The key concept here is parameter estimation. My goals are:
- Present a few concepts from parametric statistics, with an example-heavy approach.
- Give unified notation for probability and statistics.
- Work out several parameter-estimation examples concretely.

3.1 Sampling

3.1.1 Finite-population example

Work out an example with finite population x_1, ..., x_N, and simple random sample with replacement X_1, ..., X_n. Define μ and σ². Define unbiased estimator. Show that X̄ and S² are unbiased estimators of μ and σ², respectively. Explain the factor of n-1 in S².

3.1.2 Infinite-population example

3.2 Decision theory

The presentation here follows [Bha]; I am simply tabulating and elaborating upon his definitions. One has the following:
- A population space Ω.
- An observation space, or sample space, X, containing observations X = (X_1, ..., X_n), which are n-tuples of IID random variables on Ω.
- A parameter space Θ, indexing for θ ∈ Θ a family of probability measures P_θ on Ω. Then for each P_θ one obtains a probability space (Ω, F, P_θ). One can think of these measures P_θ as conditional
probabilities P(x | θ). Nominally, Θ is R^d or some subset thereof.
- An action space A. Nominally, A is all or part of Θ. This is best explained by example: suppose θ = (μ, σ²), where each θ is a 2-tuple defining a normal probability distribution on the real line. One might want to use one's observation only to estimate μ. Then Θ = R × R, whereas A is merely R.
- The population space, sample space, parameter space, and action space are all measurable spaces, with their respective σ-algebras.
- A loss function L: Θ × A → R. This quantifies the loss incurred when θ (e.g. the population mean μ) is estimated by a (e.g. the sample mean). The most common loss function is the least-squared error

    L(θ, a) = ‖θ - a‖².

- A decision rule d: X → A. For example,

    a = d(X) = X̄ = (1/n) Σ_{i=1}^n X_i.

[xxx to do: define admissibility; sufficiency; Rabi thm 3.3 and cor; make some plots.]

- The risk function associated with a given decision rule is then

    R(θ, d) = E_θ[L(θ, d(X))].                                        (3.2.1)

The subscript on the E reminds us (for cases when we need reminding) which variables not to integrate out: E_θ of something will be potentially a function of θ, and so we won't integrate over θ. In particular,

    R(θ, d) = E_θ[L(θ, d(X))] = ∫_X L(θ, d(x)) dP_θ(x).               (3.2.2)

To summarize, the players in a decision problem are

    Ω, X, Θ, A, d, L, R(θ, d).

Example 3.1. Let the X_i's be IID with mean μ and variance σ². If the decision rule is a = X̄, with θ = μ and least-squared-error loss function, then

    R(μ, d) = E[(X̄ - μ)²] = E[X̄²] - 2μ E[X̄] + μ²,

where

    E[X̄²] = (1/n²) Σ_{i=1}^n Σ_{j=1}^n E[X_i X_j].

Now the X_i's are IID, so E[X_i] = μ is the same for all i, and E[X_i X_j] = E[X_i] E[X_j] for i ≠ j. Then

    E[X̄²] = (1/n²) ( n E[X_1²] + n(n-1) μ² ) = E[X_1²]/n + ((n-1)/n) μ².

Recall that the variance was

    σ² = E[(X_1 - μ)²] = E[X_1²] - 2μ E[X_1] + μ² = E[X_1²] - μ²,

so E[X_1²] = σ² + μ². Then

    R(μ, d) = (σ² + μ²)/n + ((n-1)/n) μ² - 2μ² + μ² = σ²/n.

This means that, if we use the sample mean X̄ to estimate the population mean μ, our risk increases with larger population variance, and decreases with larger sample size.
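The conclusion of Example 3.1 — that the risk of the sample-mean rule is σ²/n — can be checked by Monte Carlo. A sketch, with normal X_i and parameters of my own choosing:

```python
import random

random.seed(1)

# Example 3.1: with least-squared-error loss and d(X) = sample mean,
# the risk is R(mu, d) = E[(Xbar - mu)^2] = sigma^2 / n.
mu, sigma, n = 2.0, 3.0, 10
trials = 100_000

risk_mc = 0.0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    risk_mc += (xbar - mu) ** 2
risk_mc /= trials

print(risk_mc, sigma ** 2 / n)  # both near 0.9
```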
3.3 Parameter estimation

3.3.1 Maximum-likelihood estimation

[Log-likelihood example as well.]

3.3.2 Method of moments

3.3.3 Bayes estimation

Here, θ ∈ Θ is thought of as a random variable. Notation:

    random variable:  X    ϑ
    value:            x    θ
    space:            X    Θ

One somehow knows, or guesses, a probability distribution τ on Θ. This is called the prior distribution, or simply prior. This encodes what we know about θ values prior to making an observation X. (Below we'll have a posterior distribution, which is conditioned on the observation X.) If τ has a density f_ϑ(θ) with respect to Lebesgue measure, then we write

    dτ(θ) = f_ϑ(θ) dθ.

We have a loss function L(θ, a), as defined in section 3.2. This will always be least-squared error unless otherwise noted.

Definition 3.2. The Bayes risk of a decision rule d is

    r(τ, d) = ∫_Θ R(θ, d) dτ(θ).

By equation (3.2.1), this is

    r(τ, d) = ∫_Θ E_θ[L(θ, d(X))] dτ(θ),

which, by equation (3.2.2), is in turn

    r(τ, d) = ∫_Θ [ ∫_X L(θ, d(x)) dP_θ(x) ] dτ(θ).

Definition 3.3. A Bayes rule d_0 is a decision rule which minimizes Bayes risk:

    r(τ, d_0) = inf_d r(τ, d),

where the infimum is taken across all decision rules d. Note that a minimizer may not exist.

Recall from section 2.5.8 that if we have two random variables X and ϑ, then:
- Given the joint density f_{X,ϑ}(x, θ), we can integrate out x to obtain the marginal density

    f_ϑ(θ) = ∫_X f_{X,ϑ}(x, θ) dx.

  Likewise, we can integrate out θ to obtain the marginal density

    f_X(x) = ∫_Θ f_{X,ϑ}(x, θ) dθ.

- We have new random variables X | ϑ and ϑ | X.
- We can compute their conditional expectations E[X | ϑ] and E[ϑ | X].
- We have conditional density, which is joint over marginal:

    f_{X|ϑ}(x | θ) = f_{X,ϑ}(x, θ) / f_ϑ(θ),

  and likewise

    f_{ϑ|X}(θ | x) = f_{X,ϑ}(x, θ) / f_X(x).

- Given these two facts, we can solve, just as in Bayes' theorem (theorem B.1), for one conditional density in terms of the other:

    f_{ϑ|X}(θ | x) = f_ϑ(θ) f_{X|ϑ}(x | θ) / f_X(x).

These facts motivate the following definition.

Definition 3.4. The posterior distribution of ϑ given X is

    f_{ϑ|X}(θ | x) = f_ϑ(θ) f_{X|ϑ}(x | θ) / f_X(x).

Definition 3.5. The posterior mean is the expectation of the posterior distribution.

The theorem is that the posterior mean

    d_0(X) = E[ϑ | X]

is a Bayes estimator of θ, satisfying definition 3.3. Following [CB], I currently think this is because of the following (assuming the
distributions of X and ϑ both have densities, as above, and using Bayes' theorem):

    r(τ, d) = ∫_Θ R(θ, d) dτ(θ)
            = ∫_Θ E_θ[L(θ, d(X))] f_ϑ(θ) dθ
            = ∫_Θ [ ∫_X L(θ, d(x)) dP_θ(x) ] f_ϑ(θ) dθ
            = ∫_Θ [ ∫_X L(θ, d(x)) f_{X|ϑ}(x | θ) dx ] f_ϑ(θ) dθ
            = ∫_X [ ∫_Θ L(θ, d(x)) f_{ϑ|X}(θ | x) dθ ] f_X(x) dx.

The quantity in square brackets, which is a function of x, is called the posterior expected loss. The conditionals in the integrals are reminiscent of the partition theorem (theorem 2.35). When computing a Bayes rule, one selects d to minimize the posterior expected loss for each x.

To compute d_0(X), we need to find all three right-hand terms in

    f_{ϑ|X}(θ | x) = f_ϑ(θ) f_{X|ϑ}(x | θ) / f_X(x).

These are found as follows:
- f_ϑ(θ) is the given prior.
- f_{X|ϑ}(x | θ) is given as the distribution of X with parameter θ.
- f_X(x) is the marginal of f_{X,ϑ}(x, θ), found by integrating out θ. However, as usual in probability, one may find a trick to avoid doing the integral.

3.3.4 Minimax estimation

[def'n; sufficiency; Rabi thm 3.6 and 3.7]

A The coin-flipping experiments

This section is an extended worked example, tying together various concepts. We apply the central limit theorem first to repeated tosses of a single coin, then to repeated collections of tosses.

A.1 Single coin flips

The first experiment is tossing a single coin, which has probability p of heads. Then Ω = {T, H}. Let X be the random variable which takes value 0 for tails and 1 for heads. As discussed in section 2.2.2, X has the Bernoulli distribution with parameter p. I will allow p to vary throughout this section, although I will focus on p = 0.5 and p = 0.6.

Recall that X has mean μ_X = p. (In section 2.2.2 we took -1 for tails, which is the opposite convention from the one here.) Its standard deviation is σ_X = √(p(1-p)), which is √0.25 = 0.5 and √0.24 ≈ 0.4899 for p = 0.5 and p = 0.6, respectively.

Now flip the coin a large number n of times — say n = 1000 — and count the number of heads. Using the notation of section 2.5.4, the number of heads is S_n. There are two ways to look at this:
- On one hand, from section 2.2.2 we know that S_n is binomial with parameters p and n — if we think of the 1000 tosses as a single experiment.
This is precisely what we will do in the next section. The PMF of S_n is the one involving binomial coefficients. We would expect μ_S = np, i.e. 500 or 600, and σ_S = √(np(1-p)), which is √250 ≈ 15.81 and √240 ≈ 15.49 for p = 0.5 and p = 0.6, respectively.
- On the other hand, the central limit theorem (section 2.6.3) says that, as n increases, the distribution of S_n begins to look normal. (This PDF involves the exponential function.) As shown in section 2.5.4, the mean of the sums S_n is μ_S = n μ_X = np — again, 500 or 600. The standard deviation of those sums about the means 500 or 600 is σ_S = √n σ_X = √(np(1-p)), which are again 15.81 and 15.49, respectively.

Note that the binomial PMF and the normal PDF are not the same, even though they produce the same means and standard deviations: the binomial random variable has an integer-valued PMF — P(499.1 ≤ S_n ≤ 499.9) = 0, and likewise S_n can never be anything out of the range from 0 to 1000. The normal PDF, on the other hand, gives P(499.1 ≤ S_n ≤ 499.9) ≠ 0, since we are taking area under a curve where the function is non-negative. Since the output of the exponential function is never 0, the normal PDF gives non-zero (although admittedly very, very tiny) probability of S_n being less than 0 or greater than 1000. See also [MM] for some very nice plots.

Now we can ask about fairness of coins. The probabilistic point of view is to fix p and ask about the probabilities of various values of S_n: if the coin is fair, what are my chances of flipping anywhere between 470 and 530 heads? Using the binomial PMF directly is a mess — in fact, my calculator can't compute the factorials involved without overflowing. Using the normal approximation, though, is easy: I asked my TI calculator to integrate its normalpdf function with μ = 500 and σ = 15.81 from 470 to 530, and it told me 0.9422.

How surprised should I be if I toss 580 heads? The standardization — which statisticians call the z-score — of S_n is (definition 2.123)

    z = (S_n - μ_S)/σ_S.

This counts how many standard deviations away from the mean a given observation is. I have 80/15.81 ≈ 5.06, so this result is more than five standard deviations away from the mean. I would not think the coin is fair.
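With exact integer arithmetic (`math.comb`), the binomial probability that the calculator couldn't handle is no trouble, and we can compare it against the normal-curve value of 0.9422 quoted above. A sketch, not from the notes:

```python
import math

n, p = 1000, 0.5

def binom_pmf(k):
    # P(S_n = k) for S_n ~ Binomial(n, p); math.comb keeps the arithmetic exact
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

exact = sum(binom_pmf(k) for k in range(470, 531))  # P(470 <= S_n <= 530), exactly

# z-score of 580 heads under p = 0.5:
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
z = (580 - mu) / sigma

print(exact, z)  # exact is about 0.946; z is about 5.06
```

The exact value (≈ 0.946) differs slightly from the 0.9422 obtained by integrating the normal PDF without a continuity correction — a concrete illustration of the PMF-vs-PDF caveat above.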
If I redo that computation with p = 0.6, μ_S = 600, and σ_S = 15.49, I get a z-score of -1.29, which is not surprising if the coin has parameter p = 0.6.

The point of view used in statistics is to start with the data, and from that try to estimate, with various levels of confidence, what the parameters are. Given a suspicious coin, what experiments would we have to run to be 95% sure that we've found out what the coin's parameter p is, to within, say, ±0.01? Continuing the example from section 2.6.4, we ask for ε = 0.01. We had

    ε = 1.96 σ_X / √n,

so, setting ε = 0.01 and solving for n, we have

    n = (1.96 σ_X / ε)² = (1.96 / 0.01)² p(1-p) = 38416 p(1-p).

Now p(1-p) has a maximum at p = 0.5, for which n = 9604 — so that many flips would determine p to within ±0.01, with 95% confidence. Re-doing the arithmetic with 0.001 in place of 0.01 gives n = 960,400. Generalizing, we see that each additional decimal place costs 100 times as many runs.

A.2 Batches of coin flips

The second experiment is 1000 tosses of a coin, where each coin has probability p of heads. (Or, think of simultaneously tossing 1000 identical such coins.) Then Ω = {T,H}^1000, with |Ω| = 2^1000 ≈ 10^301. Let Y: Ω → R be the random variable which counts the number of heads. This is an example where the random variable is far easier to deal with than the entire sample space, which is huge. As discussed in section 2.2.2, Y has the binomial distribution with parameters p and n = 1000.

Recall from the previous section that Y has mean μ_Y = 1000p, e.g. 500 or 600. Likewise, its standard deviation is

    σ_Y = √(1000 p(1-p)),

which is √250 ≈ 15.81 and √240 ≈ 15.49 for p = 0.5 and p = 0.6, respectively.

Let Ȳ_N be the average of N runs of this experiment — that is, the sample mean. The central limit theorem (section 2.6.3) says that, as N increases, the distribution of Ȳ_N begins to look normal. The mean of the sample means is

    μ_Ȳ = μ_Y = 1000p,
population mean and a true population standard deviation If p 015 then ay m 1581 and even if we7re on the three millionth iteration of the ip1000 coins experiment its still going to be quite likely as ever that welll get a 514 or a 492 and so on If we don7t know the true p of the coins then while the true population mean My and the true population standard deviation O39y exist we don7t know what they are All we have is some suspiciouslooking identical coins and our laboratory equipment As we run and rerun the ip1000 coins experiment the following happens 0 The sample mean MVN for increasingly larger N will approach the population mean My 0 The variations of the sample mean will decrease We might say the error in our estimate of the population mean is shrinking o The population standard deviation appears in the above formulas via its effects on the standard devi ation of the sample mean 0 Nothing we have done so far has given us a reliable connection between the sample standard deviation and the population standard deviation We can guess that the sample standard deviation approaches the population standard deviation but in this course we have not developed any information about the error in that computationi Here is a numerical example 1 run simulated on a computer the ip1000 coins experiment 400 times The rst time 1 get 482 so the sample mean is 482 The second time 1 get 521 so the sample mean is 482 5212i The third time 1 get 494 so the sample mean is 482 521 4943 and so on Here is p 015 l N l Y l 7N l Sample stdi devi l 1 482 4821000 NA 2 521 5011500 271577 3 494 4991000 191975 4 512 5021250 171557 5 485 498800 171050 6 507 5001167 151613 395 493 5001258 161163 396 505 5001270 161144 397 501 5001272 161124 398 494 5001256 161107 399 501 5001258 161086 400 474 5001192 161120 Here is p 0 6 41 N Y 7N Sample std deV 1 616 616000 NA 2 584 600000 221627 3 620 6061667 191732 4 617 6091250 161919 5 583 604000 181775 6 613 6051500 171190 395 608 5991484 17031 396 615 
    396  615  599.523   17.027
    397  609  599.547   17.013
    398  607  599.565   16.995
    399  604  599.576   16.975
    400  611  599.605   16.964

B Bayes' theorem

Bayes' theorem is so important that it merits multiple points of view: algebraic, graphical, and numerical.

B.1 Algebraic approach

Recall from definition 2.12 that if A is an event, and if B is another event with non-zero probability, the conditional probability of A given B is

    P(A|B) = P(A ∩ B) / P(B).

Bayes' theorem tells us how to invert this: how to compute the probability of B given A. First, the algebraic treatment.

Theorem B.1 (Bayes' theorem). Let A and B be events with non-zero probability. Then

    P(B|A) = P(A|B) P(B) / P(A).

Proof. Using the definition (intersection over given), we have

    P(B|A) = P(B ∩ A) / P(A).

Multiplying top and bottom by P(B) — which is OK, since P(B) ≠ 0 — we get

    P(B|A) = P(B ∩ A) P(B) / (P(B) P(A)).

Now notice that B ∩ A is the same as A ∩ B, so in particular P(B ∩ A) is the same as P(A ∩ B). Transposing the terms in the denominator gives us

    P(B|A) = [P(A ∩ B) / P(B)] · [P(B) / P(A)] = P(A|B) P(B) / P(A),

as desired. □

B.2 Graphical / numerical approach

The following example is adapted from Kaplan and Kaplan. Suppose that 0.8% of the general population has a certain disease, and suppose that we have a test for it. Specifically, if a person actually has the disease, the test says so 90% of the time. If a person does not have the disease, the test gives a false diagnosis 7% of the time. When a particular patient tests positive, what is the probability they have the disease?

We can write this symbolically as follows. Let D be the event that the person has the disease, and D̄ be its complement; let Y (for "yes") be the event that the test says the person has the disease, with complement
with and without the disease and those for whom the test is positive or negative Suppose in particular that we have a sample of 1000 people who are representative of the general population Here are some very rectangular Venn diagrams i D D ll 992 8 923 77 lt7 Bayes7 theorem has to do with how these two partitions intersect to make four groups lii Fl 7 Dllil lii Fl 7 Dllil PmD 992 7 130713 s 7 and PDlY 923 7 PDlY 923 7 Pm 992 7 130713 s 7 HEY 77 7 PDlY 77 7 ltlt ltlt We can use the theorem to nd the probability our patient has the disease given the positive test result PD PltD7Ygt PltY7Dgt 0008 0 90 39 0077 0094 That is there7s only a oneineleven chance the patient actually has the disease Kaplan and Kaplan7s idea is to look at this surprising result in the context of the other 999 people also tested 1 will elaborate on this working out all the math We have two conditional probabilities given PYlD and PYlb We found PDlY What about PDl7 We can use the partition theorem theorem 218 again to solve for what we don7t know in terms of what 44 we do know PD PDlYPY PDl7 137 PD PDlY PO P D 7 7 lt l gt Pm 7 0008 7 0094 0077 0923 00008 We now have all four conditional probabilities POW 007 PDl7 00008 PYlD 090 PDlY 0094 Now we can ll out the foursquare table 0 Since POW 0077 seven percent of the 992 diseasefree people 70 of them get false positives the rest 922 get a correct negative result 0 Since PYlD 090 ninety percent of the 8 people with the disease test positive ie all but one of them one of the 8 gets a false sense of security 0 Since PDl7 000087 008 of the 923 with negative test results one person does in fact have the disease the other 922 as we found just above get a correct negative result 0 Since PDlY 00947 only 94 of the 77 people with positive test results 7 people have the disease the other 70 get a scare and7 presumably7 a retest So7 the sample of 1000 people splits up as follows nul 922 1 70 7 Moreover7 we can rank events by likelihood 1 Healthy people 
correctly diagnosed 922 2 False positives 7 3 People with the disease7 correctly diagnosed 07 4 False negatives 01 Now its no surprise our patient got a false positive this happens 10 times as often as a correct positive diagnosis B3 Asymptotics The speci c example provided some insight7 but what happens when we vary the parameters We had PYlD 131 7 PYlDPD Pam PltYgt PltYDgtPltDgtPltYFgtPltDgt 45 Let7s turn this into a design problemi Let p PD a PmD b Pm 175 Paw 5 HEY How would you choose the testdesign parameters a and 12 ie how good would your test have to be to get 5 small What if the disease is rarer p smaller Suppose we want a high likelihood of correct detection ie 5 small Then my W HEY Wig 71177 Solving for a and b we get a 1 i 51 i P 7 gt i b 8p There are two free parameters a and b so 1711 just consider their ratio Now the function 171 I blows up near zero for small 1 it s approximately lzi For small 5 and p we have b i lt 511 a If say we have p and 5 both 0001 then a and I need to differ by a factor of a million Recall that a and b are both probabilities and so range between zero and one To test a oneinamillion event with 999976 con dence p 10 6 and 5 10 4 I must be less than 10 10i That tells us how to choose POW to in order to get PblY ie the probability of false positives small What about false negatives PDlV If you do similar algebra to the above you should nd that getting less than 5 requires 5 1 7 a lt 7 p This is a less strict constraint PYlD needs to be very close to 1 only when 8 lt p B 4 Conclusions Some points 46 o PYlD is information known to the person who creates the test 7 say at a pharmaceutical company PDlY is information relevant to the people who give and receive the test 7 for example at the doctors of ce This duality between design and implementation suggests that Bayes7 theorem has important consequences in many practical situations 0 The results can be surprising 7 after all in the example above the test was 90 accurate was it not 
Bayes7 theorem is important to know precisely because it is counterintuitivei 0 We can see from the example and the asymptotics above that rare events are hard to test accurately for If we want certain testing for rare events the test might be impossible or overly expensive to design in practical terms 47 C Probability and measure theory Probability theory is measure theory with a soul 7 Mark Kaci Modern probability is a special case of measure theory but this course avoids the latter Here we draw the connections for the reader With a measuretheoretic background Full information may be found in FG but I like to have a brief handy reference See also Fol Rud or Royi C1 Dictionary Measure theory analysis Probability The sample space 9 is simply a set Same Measure theory analysis Probability A a eld f on Q is a subset of 29 satisfying the Same axioms of de nition 25 7 must contain at least 0 and Q and it must be closed under complements countable unions and countable intersections Note that even if Q is uncountable for example 9 R 29 still satis es the axioms for a a eldi lf 9 is a topological space eigi Rd the standard a eld is the Borel a eld Which is the one generated by all the open sets of 9 Measure theory analysis Probability The pair 9 is called a measurable spacei Same This is an unfortunate misnomer since it may not be possible to put a measure on it 7 in Which case we would certainly think of it as unmeasurable Measure theory analysis Probability Elements of f are called measurable sets This is also a misnomer because we haven7t de ned measures yetl An event is nothing more than a measurable set 48 Measure theory analysis Probability A measure is a function M f A 0oo with the following properties 0 M0 0 o For all A E f MA 2 0 o If 141142 is a nite or countable sub set of f with the Ails all pairwise dis joint then PUiAl This is the countable additivity property of the probability measure M Note that if Q is nite or countably in nite it is possible to de ne a 
measure on the biggest a eld on 9 namely 7 29 lf 9 R then it is not possible to Measure theory analysis A probability measure is a measure with the additional requirement that MQ 1 Thus we have M f A 01 de ne a measure on f 29 See Fol for a proof Probability A measure space is a triple 97 M where M is a measure on 7 Measure theory analysis A probability space is a triple 97 P where P is a probability measure on 7 Probability A measurable function is a function f from one measurable space 9 to another measurable space 119 such that the preimage under f of each measurable set in 11 is a measurable set in 9 That is for all B 6 Q f 1B e f Measure theory analysis A random variable is a measurable function X from a probability space 51 P to a measur able space 11 Q For this course that measurable space has been 11 R with Q being the Borel sets in R Probability Expectation f9 Xw dPoJ Remark C1 In an undergraduate context we use the terms probability density function and cumulative Expectation EX distribution function In a graduate context we use the following 0 We have a realvalued random variable X That is we have X Q A R where Q 7 P is a probability space and the measurable space is R 3 namely the reals with the Borel aalgebra o This gives us a probability measure MX on R by MX B PX 1B This is called simply the distribution of X 0 We can then measure the particular Borel sets 700 This the CDF is called the distribution function of X o If dMX is absolutely continuous with respect to Lebesgue measure ie dMX fX dz where fX is the familiar PDF we say fX is the density of X 49 This de nes a function FX MX700 Measure theory analysis Probability The Laplace and Fourier transforms respectively ofszHRare wt R fltzgt dt ma R fltzgt dt xxx 829 cty of nite measures xxx three kinds of measures discrete cts singular xxx 95 and 924 MCTXZ DCTXZ Fub Fatou The momentgenerating and characteristic func tions respectively of a realvalued random vari able X are Mxt EetX and BXQ EeitX Let M 
be the distribution of Xi We say that the momentgenerating and characteristic func tions are the Laplace transform and Fourier transform respectively of the measure 2 LMemdz and fMem du R R If the distribution of X has a density function fz ilei do Where do is absolutely continuous With respect to Lebesgue measure dz then these are Mxlttgt Ele Xl R fltzgtdz wt and gm Eei X R em 161 dz Hwy xxx cvgce aislWipll ilpl iimlsl in dist 10 29 amp 1031 avorsl xxx 914 Borel Cantellil xxx 926 a eld indepl xxx 928 prod meas xxx 1019 Kolm 01 and tail eld C 2 Measurability xxx include the notless re ned than picturesi C3 Independence and measurability xxx type up handwritten notes XXX emph summarizing content from the main body of the paper con necting it With the more abstract and more puzzling measuretheoretic notionsi xxx mention calculus of expectations in the independent and measurable casesl XXX perhaps mention SDEs i i i 50 xxx caveat l7m Writing everything in terms of discrete random variables xxx recall the following PX I PX 1I and PAB PA O B The righthand sides are settheoretic the lefthand sides are in more common use Events 0 Events A and B are independent if PA N B PAPB 0 De nition of conditional probability PA nB PAlBW 0 Partition theorem PltAgt Z PltA l B 1331 Random variables 0 Partition theorem n Em ZElX l Bi PltBigt i1 Independence of a random variable and an event PX 1 NB PX zPB for all z E R Independence of two random variables PXz Yy PXIPYy for all LyeR Conditional expectation ET I P X 1 1 B EX B ZIPX I l B ZIPX 11 l B Note given means relative to o If X is independent of B7 then PB factors out of the numerator Em B Enggme zEzPlt gB zgtgtPltBgt ZIPX1I Em ac 51 o If X is Bmeasurable then X 1z 0 BBCQ E z PX 1z n B XBPB n B XBCPBC n B XBPB ElX Bl H3 H3 H3 XB EmwB Xw ELIPMgtWNWQ EXQ Pm ZIPX WD XBPB XB PB Ele EXZJ0 Claim this is just X When Written on partitions Claimz a nite aalgebra is generated by a partition xxx include the notless re nedthan pictures 0 
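On a finite sample space, the conditional-expectation formulas above reduce to weighted sums and are easy to check numerically. Here is a minimal Python sketch; the fair-die example and all function names are my own illustration, not from the notes. It computes E[X] and E[X | B] for a die roll X and the event B = "the roll is even":

```python
from fractions import Fraction

# Hypothetical example (mine, not from the notes): a fair die.
# Omega = {1,...,6}, P({k}) = 1/6, X(omega) = omega.
omega = list(range(1, 7))
P = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}

def prob(A):
    # P(A) = sum of the point masses in A
    return sum(P[w] for w in A)

def expect(X):
    # E[X] = sum over omega of X(omega) P({omega})
    return sum(X[w] * P[w] for w in omega)

def cond_expect(X, B):
    # E[X | B] = sum_{omega in B} X(omega) P({omega}) / P(B),
    # which equals sum_x x P({X = x} n B) / P(B) from the notes.
    return sum(X[w] * P[w] for w in B) / prob(B)

B = {2, 4, 6}  # the event "the roll is even"
print(expect(X))          # 7/2
print(cond_expect(X, B))  # 4
```

Exact `Fraction` arithmetic is used instead of floats so that identities like P(A | B) = P(A ∩ B)/P(B) hold exactly rather than up to rounding.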
• Example: Let Ω = {1, 2, 3, 4} with P({k}) = 1/4 for k = 1, 2, 3, 4. Let X(1) = X(3) = 2 and X(2) = X(4) = 3. Let B = {1, 3}. Then E[X] = 2.5 and E[X | B] = 2, which is what one would expect.
• Conditional PMF: f_{X|Y}(x | y) = P(X = x | Y = y). xxx 4×4 dot figure here.

Example C.2. Let Ω = {1, 2, 3, 4}, F = 2^Ω, and P({k}) = 1/4 for k = 1, 2, 3, 4. Let X(1) = 1, X(2) = 0, X(3) = 1, X(4) = 0. That is, X is the parity random variable. Here is a σ-algebra G such that X is not G-measurable. Let G = {∅, {1, 2}, {3, 4}, Ω}. Then X⁻¹(1) = {1, 3} ∉ G. Thus X is not G-measurable. However, X⁻¹(1) = {1, 3} ∈ F, so X is F-measurable.

Next, I want to enumerate the subsets G of Ω such that X is independent of G. We have (xxx xref): X is independent of G if, for all x ∈ R, P({X = x} ∩ G) = P(X = x) P(G).

For brevity, so that the table below will fit on the page, write A₀ = X⁻¹(0) = {2, 4} and A₁ = X⁻¹(1) = {1, 3}.

Ordered by sizes of G's:

| G | A₀ ∩ G | A₁ ∩ G | P(G) | P(A₀)P(G) | P(A₀ ∩ G) | P(A₁)P(G) | P(A₁ ∩ G) | Indep.? |
|---|---|---|---|---|---|---|---|---|
| ∅ | ∅ | ∅ | 0 | 0 | 0 | 0 | 0 | yes |
| {1} | ∅ | {1} | 1/4 | 1/8 | 0 | 1/8 | 1/4 | no |
| {2} | {2} | ∅ | 1/4 | 1/8 | 1/4 | 1/8 | 0 | no |
| {3} | ∅ | {3} | 1/4 | 1/8 | 0 | 1/8 | 1/4 | no |
| {4} | {4} | ∅ | 1/4 | 1/8 | 1/4 | 1/8 | 0 | no |
| {1, 3} | ∅ | {1, 3} | 1/2 | 1/4 | 0 | 1/4 | 1/2 | no |
| {2, 4} | {2, 4} | ∅ | 1/2 | 1/4 | 1/2 | 1/4 | 0 | no |
| {1, 2} | {2} | {1} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {1, 4} | {4} | {1} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {2, 3} | {2} | {3} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {3, 4} | {4} | {3} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {1, 2, 3} | {2} | {1, 3} | 3/4 | 3/8 | 1/4 | 3/8 | 1/2 | no |
| {1, 2, 4} | {2, 4} | {1} | 3/4 | 3/8 | 1/2 | 3/8 | 1/4 | no |
| {1, 3, 4} | {4} | {1, 3} | 3/4 | 3/8 | 1/4 | 3/8 | 1/2 | no |
| {2, 3, 4} | {2, 4} | {3} | 3/4 | 3/8 | 1/2 | 3/8 | 1/4 | no |
| {1, 2, 3, 4} | {2, 4} | {1, 3} | 1 | 1/2 | 1/2 | 1/2 | 1/2 | yes |

Ordered by independence and dependence:

| G | A₀ ∩ G | A₁ ∩ G | P(G) | P(A₀)P(G) | P(A₀ ∩ G) | P(A₁)P(G) | P(A₁ ∩ G) | Indep.? |
|---|---|---|---|---|---|---|---|---|
| ∅ | ∅ | ∅ | 0 | 0 | 0 | 0 | 0 | yes |
| {1, 2} | {2} | {1} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {1, 4} | {4} | {1} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {2, 3} | {2} | {3} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {3, 4} | {4} | {3} | 1/2 | 1/4 | 1/4 | 1/4 | 1/4 | yes |
| {1, 2, 3, 4} | {2, 4} | {1, 3} | 1 | 1/2 | 1/2 | 1/2 | 1/2 | yes |
| {1} | ∅ | {1} | 1/4 | 1/8 | 0 | 1/8 | 1/4 | no |
| {2} | {2} | ∅ | 1/4 | 1/8 | 1/4 | 1/8 | 0 | no |
| {3} | ∅ | {3} | 1/4 | 1/8 | 0 | 1/8 | 1/4 | no |
| {4} | {4} | ∅ | 1/4 | 1/8 | 1/4 | 1/8 | 0 | no |
| {1, 3} | ∅ | {1, 3} | 1/2 | 1/4 | 0 | 1/4 | 1/2 | no |
| {2, 4} | {2, 4} | ∅ | 1/2 | 1/4 | 1/2 | 1/4 | 0 | no |
| {1, 2, 3} | {2} | {1, 3} | 3/4 | 3/8 | 1/4 | 3/8 | 1/2 | no |
| {1, 2, 4} | {2, 4} | {1} | 3/4 | 3/8 | 1/2 | 3/8 | 1/4 | no |
| {1, 3, 4} | {4} | {1, 3} | 3/4 | 3/8 | 1/4 | 3/8 | 1/2 | no |
| {2, 3, 4} | {2, 4} | {3} | 3/4 | 3/8 | 1/2 | 3/8 | 1/4 | no |

⊲

D A proof of the inclusion-exclusion formula

Proposition (inclusion-exclusion formula). Let A₁, ..., Aₙ be events. Then
$$P\Bigl(\bigcup_{i=1}^n A_i\Bigr) = \sum_{1\le i\le n} P(A_i) - \sum_{1\le i<j\le n} P(A_i\cap A_j) + \sum_{1\le i<j<k\le n} P(A_i\cap A_j\cap A_k) - \cdots + (-1)^{n+1}\,P(A_1\cap\cdots\cap A_n).$$

Proof. The proof is by strong induction. For the base case n = 1, the left-hand side is P(A₁) and the right-hand side is also P(A₁).

A bonus case, n = 2, is not necessary but helps to illustrate what's going on. The formula is
$$P(A\cup B) = P(A) + P(B) - P(A\cap B).$$
This is easy to prove using Venn diagrams and finite additivity of P on the disjoint sets A \ B, A ∩ B, and B \ A. (I am not including a picture in this note.) The point, though, is that if A and B overlap, then A ∩ B is counted twice (overcounted) in P(A) + P(B), so we need to subtract off P(A ∩ B) to compensate.

Now for the induction step. Suppose the inclusion-exclusion formula is true for 1, 2, ..., n − 1 (we'll only need 2 and n − 1), and show it's true for n. Notationally this is a mess, so I'll do the n = 3 case first, since it is easier to understand. This will illuminate how to proceed in the messier general case.

For n = 3, we are asked to show that
$$P(A\cup B\cup C) = P(A) + P(B) + P(C) - P(A\cap B) - P(A\cap C) - P(B\cap C) + P(A\cap B\cap C).$$
Since we want to use induction, we can try to isolate C from A and B. We can write
$$P(A\cup B\cup C) = \bigl[P(A) + P(B) - P(A\cap B)\bigr] + P(C) - P(A\cap C) - P(B\cap C) + P(A\cap B\cap C).$$
We have P(A) + P(B) − P(A ∩ B) = P(A ∪ B) by the induction hypothesis at n − 1 = 2. By isolating the terms with C, we have found terms involving one fewer set. For the moment, to make things a little clearer, write X = A ∪ B. Then we need to show that
$$P(X\cup C) = P(X) + P(C) - P(A\cap C) - P(B\cap C) + P(A\cap B\cap C).$$
Using the induction hypothesis for the two sets X and C, we know that
$$P(X\cup C) = P(X) + P(C) - P(X\cap C).$$
Looking at these last two equations, the first of which we need to prove and the second of which we already know is true, we see that we'll be done if only we can show that
$$-P(A\cap C) - P(B\cap C) + P(A\cap B\cap C) = -P(X\cap C).$$
Toggling the negative signs, this is
$$P(X\cap C) = P(A\cap C) + P(B\cap C) - P(A\cap B\cap C).$$
I put X = A ∪ B only for convenience; I'm done with it now. The statement I need to prove is
$$P\bigl((A\cup B)\cap C\bigr) = P(A\cap C) + P(B\cap C) - P(A\cap B\cap C).$$
The trick is that A ∩ B ∩ C = (A ∩ C) ∩ (B ∩ C), and (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C). That is, I distribute the C's. So the statement I need to prove is
$$P\bigl((A\cap C)\cup(B\cap C)\bigr) = P(A\cap C) + P(B\cap C) - P\bigl((A\cap C)\cap(B\cap C)\bigr).$$
Now we again have one fewer set involved: this is the inclusion-exclusion formula for the two sets A ∩ C and B ∩ C. Thus this statement is true by the induction hypothesis. And that was the last thing we needed to prove.
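The proposition is also easy to sanity-check numerically for small n. The following Python sketch (the sets are chosen by me for illustration, not taken from the notes) compares P(A₁ ∪ A₂ ∪ A₃) against the alternating sum of intersection probabilities, under the uniform measure on Ω = {1, ..., 8}:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical check (sets mine): uniform measure on Omega = {1,...,8}
# and three overlapping events.
omega = set(range(1, 9))
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 4, 6, 7}]

def prob(A):
    # Uniform measure: P(A) = |A| / |Omega|
    return Fraction(len(A), len(omega))

def inclusion_exclusion(sets):
    # Right-hand side of the proposition: for each k, the probabilities
    # of all k-fold intersections enter with sign (-1)^(k+1).
    total = Fraction(0)
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(sets, k):
            total += sign * prob(set.intersection(*combo))
    return total

lhs = prob(set.union(*events))   # P(A1 u A2 u A3), computed directly
rhs = inclusion_exclusion(events)
print(lhs, rhs)                  # both 7/8: the union is {1,...,7}
```

The same `inclusion_exclusion` function works for any finite list of events, so it can also be used to spot-check the general-n formula proved below by the induction argument.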
Now, guided by the n = 3 case, we can confidently wade into the morass of subscripts which is the general induction step. We are asked to show that
$$P\Bigl(\bigcup_{i=1}^n A_i\Bigr) = \sum_{1\le i\le n} P(A_i) - \sum_{1\le i<j\le n} P(A_i\cap A_j) + \sum_{1\le i<j<k\le n} P(A_i\cap A_j\cap A_k) - \cdots + (-1)^{n+1}\,P(A_1\cap\cdots\cap A_n).$$
Since we want to use induction, we can try to isolate Aₙ from the others. We can write
$$\begin{aligned}
P\Bigl(\Bigl(\bigcup_{i=1}^{n-1} A_i\Bigr)\cup A_n\Bigr)
&= \sum_{1\le i\le n-1} P(A_i) + P(A_n)\\
&\quad - \sum_{1\le i<j\le n-1} P(A_i\cap A_j) - \sum_{1\le i\le n-1} P(A_i\cap A_n)\\
&\quad + \sum_{1\le i<j<k\le n-1} P(A_i\cap A_j\cap A_k) + \sum_{1\le i<j\le n-1} P(A_i\cap A_j\cap A_n)\\
&\quad - \cdots\\
&\quad + (-1)^n\,P(A_1\cap\cdots\cap A_{n-1}) + (-1)^n \sum_{1\le i\le n-1} P(A_1\cap\cdots\cap\widehat{A_i}\cap\cdots\cap A_{n-1}\cap A_n)\\
&\quad + (-1)^{n+1}\,P(A_1\cap\cdots\cap A_n),
\end{aligned}$$
where the hat notation $\widehat{A_i}$ means: omit $A_i$ from the intersection. We have
$$P\Bigl(\bigcup_{i=1}^{n-1} A_i\Bigr) = \sum_{1\le i\le n-1} P(A_i) - \sum_{1\le i<j\le n-1} P(A_i\cap A_j) + \sum_{1\le i<j<k\le n-1} P(A_i\cap A_j\cap A_k) - \cdots + (-1)^n\,P(A_1\cap\cdots\cap A_{n-1})$$
by the induction hypothesis at n − 1. By isolating the terms with Aₙ, we have again found terms involving one fewer set. As above, for clarity, temporarily write $X = \bigcup_{i=1}^{n-1} A_i$. Then we need to show that
$$P(X\cup A_n) = P(X) + P(A_n) - \sum_{1\le i\le n-1} P(A_i\cap A_n) + \sum_{1\le i<j\le n-1} P(A_i\cap A_j\cap A_n) - \cdots + (-1)^n \sum_{1\le i\le n-1} P(A_1\cap\cdots\cap\widehat{A_i}\cap\cdots\cap A_{n-1}\cap A_n) + (-1)^{n+1}\,P(A_1\cap\cdots\cap A_n).$$
Using the induction hypothesis for the two sets X and Aₙ, we know that
$$P(X\cup A_n) = P(X) + P(A_n) - P(X\cap A_n).$$
Looking at these last two equations, the first of which we need to prove and the second of which we already know is true, we see that we'll be done if only we can show that
$$- \sum_{1\le i\le n-1} P(A_i\cap A_n) + \sum_{1\le i<j\le n-1} P(A_i\cap A_j\cap A_n) - \cdots + (-1)^n \sum_{1\le i\le n-1} P(A_1\cap\cdots\cap\widehat{A_i}\cap\cdots\cap A_{n-1}\cap A_n) + (-1)^{n+1}\,P(A_1\cap\cdots\cap A_n) = -P(X\cap A_n).$$
Toggling the negative signs, this is
$$P(X\cap A_n) = \sum_{1\le i\le n-1} P(A_i\cap A_n) - \sum_{1\le i<j\le n-1} P(A_i\cap A_j\cap A_n) + \cdots + (-1)^{n+1} \sum_{1\le i\le n-1} P(A_1\cap\cdots\cap\widehat{A_i}\cap\cdots\cap A_{n-1}\cap A_n) + (-1)^{n+2}\,P(A_1\cap\cdots\cap A_n).$$
Note that $(-1)^{n+2}$ is the same as $(-1)^n$. So we need to show
$$P(X\cap A_n) = \sum_{1\le i\le n-1} P(A_i\cap A_n) - \sum_{1\le i<j\le n-1} P(A_i\cap A_j\cap A_n) + \cdots + (-1)^{n+1} \sum_{1\le i\le n-1} P(A_1\cap\cdots\cap\widehat{A_i}\cap\cdots\cap A_{n-1}\cap A_n) + (-1)^n\,P(A_1\cap\cdots\cap A_n).$$
As before, I don't need to write $X = \bigcup_{i=1}^{n-1} A_i$ anymore. The statement I need to prove is
$$P\Bigl(\Bigl(\bigcup_{i=1}^{n-1} A_i\Bigr)\cap A_n\Bigr) = \sum_{1\le i\le n-1} P(A_i\cap A_n) - \sum_{1\le i<j\le n-1} P(A_i\cap A_j\cap A_n) + \cdots + (-1)^{n+1} \sum_{1\le i\le n-1} P(A_1\cap\cdots\cap\widehat{A_i}\cap\cdots\cap A_{n-1}\cap A_n) + (-1)^n\,P(A_1\cap\cdots\cap A_n).$$
The distribution tricks are
$$A_i\cap A_j\cap A_n = (A_i\cap A_n)\cap(A_j\cap A_n) \quad\text{and}\quad (A_1\cup\cdots\cup A_{n-1})\cap A_n = (A_1\cap A_n)\cup\cdots\cup(A_{n-1}\cap A_n).$$
So the statement I need to prove is
$$P\Bigl(\bigcup_{i=1}^{n-1}(A_i\cap A_n)\Bigr) = \sum_{1\le i\le n-1} P(A_i\cap A_n) - \sum_{1\le i<j\le n-1} P\bigl((A_i\cap A_n)\cap(A_j\cap A_n)\bigr) + \cdots + (-1)^n\,P\bigl((A_1\cap A_n)\cap\cdots\cap(A_{n-1}\cap A_n)\bigr).$$
Now we again have one fewer set involved: this is the inclusion-exclusion formula for the n − 1 sets A₁ ∩ Aₙ through A₍ₙ₋₁₎ ∩ Aₙ. Thus this statement is true by the induction hypothesis. Since that is all that remained to be shown, we are done. ∎

References

[Bha] Bhattacharya, R. Theoretical Statistics. Course notes, University of Arizona Math 567A, spring 2008.
[CB] Casella, G. and Berger, R.L. Statistical Inference, 2nd ed. Duxbury Press, 2001.
[Fol] Folland, G.B. Real Analysis: Modern Techniques and Their Applications, 2nd ed. Wiley-Interscience, 1999.
[FG] Fristedt, B. and Gray, L. A Modern Approach to Probability Theory. Birkhäuser, 1997.
[GS] Grimmett, G. and Stirzaker, D. Probability and Random Processes, 3rd ed. Oxford, 2001.
[Kennedy] Kennedy, T. Math 564 course at the University of Arizona, spring 2007.
[KK] Kaplan, M. and Kaplan, E. Chances Are...: Adventures in Probability. Penguin, 2007.
[MM] Moore, D.S. and McCabe, G.P. Introduction to the Practice of Statistics. Freeman and Co., 2005.
[Roy] Royden, H.L. Real Analysis, 2nd ed. Macmillan, 1968.
[Rud] Rudin, W. Principles of Mathematical Analysis, 3rd ed. McGraw-Hill, 1976.
