# PROBABILITY I M 362K


This 96-page set of class notes was uploaded by Reyes Glover on Sunday, September 6, 2015. The notes are for M 362K at the University of Texas at Austin, taught by Staff in Fall. Since the upload, they have received 5 views. For similar materials see /class/181473/m-362k-university-of-texas-at-austin in Mathematics (M) at the University of Texas at Austin.


Date Created: 09/06/15

## Lecture 12: Absorption and Reward

Course: M362K (Stochastic Processes I). Term: Fall 2008. Instructor: Gordan Zitkovic. Last updated: November 25, 2008.

**Note.** Even though it is not by any means necessary, we assume from this point onwards that all Markov chains have finite state spaces.

### 12.1 Absorption

Remember the Tennis example from a few lectures ago, and the question we asked there: how does the probability of winning a single point affect the probability of winning the overall game? An algorithm that will help you answer that question is described in this lecture.

The first step is to understand the structure of the question in the light of the canonical decomposition of the previous lecture. In the Tennis example, all the states except for the winning ones are transient, and there are two one-element recurrent classes: "Amélie wins" and "Björn wins". The chain starts from the transient state $(0,0)$, moves around a bit, and eventually gets absorbed in one of the two. The probability we are interested in is *not* the probability that the chain will eventually get absorbed: that probability is always $1$ (this is true in every finite Markov chain, but we do not give a proof of it). We are instead interested in the probability that the absorption will occur in a *particular* state, e.g. the state "Amélie wins" in the Tennis example.

A more general version of the problem is the following: let $i\in S$ be any state and let $j$ be a recurrent state. If the set of all recurrent states is denoted by $C$, and if $\tau_C$ is the first hitting time of the set $C$, then $X_{\tau_C}$ denotes the first recurrent state visited by the chain. Equivalently, $X_{\tau_C}$ is the value of $X$ at the random time $\tau_C$: its value is the name of the state in which the chain happens to find itself the first time it hits the set of all recurrent states. For any two states $i,j\in S$, the **absorption probability** $u_{ij}$ is defined as

$$u_{ij}=\mathbb{P}[X_{\tau_C}=j\,|\,X_0=i]=\mathbb{P}[\text{the first recurrent state visited by }X\text{ is }j\,|\,X_0=i].$$

When $j$ is not a recurrent state, $u_{ij}=0$ ($j$ cannot possibly be the first *recurrent* state we hit; it is not even recurrent). When $i=j$ is a recurrent state, $u_{ii}=1$ (we are in $i$ right from the start). The situation $i\in T$ (transient), $j\in C$, is the interesting one.

In many calculations related to Markov chains, the method of **first-step decomposition** works miracles. Simply, we cut the probability space according to what happened in the first step and use the law of total probability. Assuming $i\in T$, $j\in C$,

$$u_{ij}=\mathbb{P}[X_{\tau_C}=j\,|\,X_0=i]=\sum_{k\in S}\mathbb{P}[X_{\tau_C}=j\,|\,X_1=k,\,X_0=i]\,\mathbb{P}[X_1=k\,|\,X_0=i]=\sum_{k\in S}\mathbb{P}[X_{\tau_C}=j\,|\,X_1=k]\,p_{ik}.$$

The conditional probability $\mathbb{P}[X_{\tau_C}=j\,|\,X_1=k]$ is an absorption probability, too. If $k=j$, then $\mathbb{P}[X_{\tau_C}=j\,|\,X_1=k]=1$. If $k\in C\setminus\{j\}$, then we are already in $C$ but in a state different from $j$, so $\mathbb{P}[X_{\tau_C}=j\,|\,X_1=k]=0$. Therefore the sum above can be written as

$$u_{ij}=\sum_{k\in T}p_{ik}u_{kj}+p_{ij},\tag{12.1}$$

which is a system of linear equations for the family $(u_{ij})_{i\in T,\,j\in C}$.

Linear systems are typically better understood when represented in matrix form. Let $U$ be the $T\times C$ matrix $U=(u_{ij})_{i\in T,\,j\in C}$, let $Q=(p_{ij})_{i\in T,\,j\in T}$ be the portion of the transition matrix $P$ corresponding to transitions from $T$ to $T$, and let $R=(p_{ij})_{i\in T,\,j\in C}$ contain all transitions from $T$ to $C$. If $P_0=(p_{ij})_{i\in C,\,j\in C}$ denotes the matrix of all transitions from $C$ to $C$, then the canonical form of $P$ looks like this:

$$P=\begin{pmatrix}P_0&0\\R&Q\end{pmatrix}.$$

The system (12.1) now becomes $U=QU+R$, i.e. $(I-Q)U=R$. If the matrix $I-Q$ happens to be invertible, we are in business, because we then have an explicit expression for $U$:

$$U=(I-Q)^{-1}R.$$

So, is $I-Q$ invertible? It is when the state space $S$ is finite, but you will not see the proof in these notes. When the inverse $(I-Q)^{-1}$ exists, it is called the **fundamental matrix** of the Markov chain.

**Example 12.1.** Before we turn to the Tennis example, let us analyze the simpler case of Gambler's ruin with $a=3$. The states $0$ and $3$ are absorbing and all the others are transient. Therefore $C_1=\{0\}$, $C_2=\{3\}$, and $T=T_1=\{1,2\}$. With the rows and columns ordered $(0,3,1,2)$, the transition matrix $P$ in canonical form is

$$P=\begin{pmatrix}1&0&0&0\\0&1&0&0\\1-p&0&0&p\\0&p&1-p&0\end{pmatrix},\qquad\text{so}\qquad Q=\begin{pmatrix}0&p\\1-p&0\end{pmatrix},\quad R=\begin{pmatrix}1-p&0\\0&p\end{pmatrix}.$$
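The step $U=(I-Q)^{-1}R$ is easy to cross-check numerically. Below is a minimal, illustrative Python sketch (not the notes' Mathematica session) for the Gambler's ruin chain with $a=3$; the function name and the hard-coded $2\times2$ inversion are my own.

```python
# Illustrative sketch: U = (I - Q)^(-1) R for Gambler's ruin with a = 3.
# Transient states are ordered (1, 2); recurrent columns are (0, 3).

def absorption_probabilities(p):
    q = 1.0 - p
    # Q: transient-to-transient transitions; R: transient-to-recurrent.
    Q = [[0.0, p], [q, 0.0]]
    R = [[q, 0.0], [0.0, p]]
    # Invert I - Q by the 2x2 adjugate formula.
    a, b = 1.0 - Q[0][0], -Q[0][1]
    c, d = -Q[1][0], 1.0 - Q[1][1]
    det = a * d - b * c                      # det(I - Q) = 1 - p(1 - p)
    F = [[d / det, -b / det], [-c / det, a / det]]   # fundamental matrix
    # U = F . R
    return [[sum(F[i][k] * R[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

U = absorption_probabilities(0.5)
# Row 0 is the start state 1; column 1 is absorption in state 3 ("getting rich").
print(U[0][1])  # p^2 / (1 - p(1 - p)) = 1/3 for p = 1/2
```

For a fair game ($p=\tfrac12$) and initial wealth $1$, the chain is absorbed at $3$ with probability $\tfrac13$ and at $0$ with probability $\tfrac23$, and each row of $U$ sums to $1$, as it must.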
The matrix $I-Q$ is a $2\times2$ matrix, so it is easy to invert:

$$(I-Q)^{-1}=\frac{1}{1-p(1-p)}\begin{pmatrix}1&p\\1-p&1\end{pmatrix},\qquad\text{so}\qquad U=(I-Q)^{-1}R=\frac{1}{1-p(1-p)}\begin{pmatrix}1-p&p^2\\(1-p)^2&p\end{pmatrix}.$$

Therefore, for example, if the initial wealth is $1$, the probability of getting rich before bankruptcy is $\frac{p^2}{1-p(1-p)}$. We have already calculated this probability in one of the homework problems (HW4, Problem 4.4); there we obtained the desired probability in a different algebraic form, and you can check that the two expressions are really the same.

**Example 12.2.** In the Tennis example, the transition matrix is $20\times20$ with only $2$ recurrent states, each in its own class. The matrix given in Lecture 8 is already in canonical form: the recurrent states correspond to the first two rows/columns. In order to get $(I-Q)^{-1}$ we need to invert an $18\times18$ matrix. This is a job for a computer, and we use Mathematica. Here `P` is the full transition matrix, and $3$ is the position of the state $(0,0)$ in the internal representation; after we remove the absorbing states, $(0,0)$ becomes state $1$ in the internal order, which is why we extract the element in the first row and first column of `U` to get the result:

    In[114]:= Q = P[[3 ;; 20, 3 ;; 20]]; R = P[[3 ;; 20, 1 ;; 2]];
    In[115]:= F = Inverse[IdentityMatrix[18] - Q];
    In[116]:= U = F.R;
    In[117]:= U[[1, 1]]
    Out[117]= (20 p^5 q^3)/(1 - 2 p q) + p^4 (1 + 4 q + 10 q^2)

(here $q=1-p$). *(Figure omitted: this game-winning probability plotted against the value of $p$.)*

### 12.2 Expected reward

Suppose that each time you visit a transient state $j\in T$ you receive a reward $g(j)\in\mathbb{R}$. The name "reward" is a bit misleading, since a negative $g(j)$ corresponds more to a fine than to a reward, but it is just a name anyway. Can we compute the expected total reward before absorption,

$$v_i=\mathbb{E}\Big[\sum_{n=0}^{\tau_C-1}g(X_n)\,\Big|\,X_0=i\Big]?$$

And if we can, what is it good for? Many things, actually, as the following two special cases show:

1. If $g(j)=1$ for all $j\in T$, then $v_i$ is the expected time until absorption. We will calculate $v_{(0,0)}$ for the Tennis example to compute the expected duration of a tennis game.
2. If $g(k)=1$ and $g(j)=0$ for $j\neq k$, then $v_i$ is the expected number of visits to the state $k$ before absorption. In the Tennis example, if $k=(40,40)$, the value of $v_{(0,0)}$ is the expected number of times the score $(40,40)$ is seen in a tennis game.
We compute $v_i$ using the first-step decomposition. For $i\in T$,

$$v_i=\mathbb{E}\Big[\sum_{n=0}^{\tau_C-1}g(X_n)\,\Big|\,X_0=i\Big]=g(i)+\sum_{k\in S}p_{ik}\,\mathbb{E}\Big[\sum_{n=1}^{\tau_C-1}g(X_n)\,\Big|\,X_1=k\Big].$$

If $k\in T$, then the homogeneity of the chain implies

$$\mathbb{E}\Big[\sum_{n=1}^{\tau_C-1}g(X_n)\,\Big|\,X_1=k\Big]=\mathbb{E}\Big[\sum_{n=0}^{\tau_C-1}g(X_n)\,\Big|\,X_0=k\Big]=v_k.$$

If $k\notin T$, then

$$\mathbb{E}\Big[\sum_{n=1}^{\tau_C-1}g(X_n)\,\Big|\,X_1=k\Big]=0,$$

because we have arrived and no more rewards are going to be collected. Therefore, for $i\in T$, we have

$$v_i=g(i)+\sum_{k\in T}p_{ik}v_k.$$

If we organize all $v_i$ and all $g(i)$ into column vectors $v=(v_i)_{i\in T}$ and $g=(g(i))_{i\in T}$, we get $v=Qv+g$, i.e.

$$v=(I-Q)^{-1}g.$$

Having derived the general formula for various rewards, we can give an interpretation of the fundamental matrix itself. Let us pick a transient state $j$ and use the reward function

$$g(k)=\begin{cases}1,&k=j\\0,&k\neq j.\end{cases}$$

By the discussion above, the $i$-th entry in $v=(I-Q)^{-1}g$ is the expected reward when we start from the state $i$; given the form of the reward function, $v_i$ is the expected number of visits to the state $j$ when we start from $i$. On the other hand, as the product of the matrix $(I-Q)^{-1}$ and the vector $g=(0,\dots,0,1,0,\dots,0)^\top$, $v_i$ is nothing but the $(i,j)$-entry in $(I-Q)^{-1}$.

**Proposition 12.3.** For two transient states $i$ and $j$, the $(i,j)$-th entry in the fundamental matrix $(I-Q)^{-1}$ is the expected number of visits to the state $j$ prior to absorption, if the chain starts from the state $i$ (the time $n=0$ is counted).

**Example 12.4.** We continue the analysis of the Tennis chain from Example 12.2. We set $g(i)=1$ for all transient states $i$ to find the expected duration of a tennis game:

    In[..]:= G = Table[1, {i, 1, 18}];
    In[..]:= f[p_] = Simplify[(F.G)[[1]]]

The output is a rational function of $p$ whose denominator is $1-2p+2p^2=1-2pq$ (with $q=1-p$); conditioning on how the game ends gives the equivalent form

$$f(p)=4(p^4+q^4)+20(p^4q+pq^4)+60(p^4q^2+p^2q^4)+20p^3q^3\Big(6+\frac{2}{1-2pq}\Big).$$

*(Figure omitted: the expected game length plotted as a function of $p$.)* When $p=0$ the course of the game is totally predictable, and Amélie wins in exactly $4$ points; the same holds when $p=1$, only it is Björn who wins with probability $1$ this time.
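The reward formula $v=(I-Q)^{-1}g$ can be cross-checked on the small Gambler's ruin chain of Example 12.1 rather than the $18\times18$ Tennis chain. The following Python sketch is illustrative (the helper name and hard-coded $2\times2$ inversion are my own, not from the notes); with $g\equiv1$ it returns the expected number of steps before absorption.

```python
# Illustrative sketch: v = (I - Q)^(-1) g for Gambler's ruin with a = 3,
# transient states ordered (1, 2); g is the reward vector (g(1), g(2)).

def expected_reward(p, g):
    q = 1.0 - p
    # I - Q for Q = [[0, p], [q, 0]]
    a, b = 1.0, -p
    c, d = -q, 1.0
    det = a * d - b * c                     # det(I - Q) = 1 - p(1 - p)
    F = [[d / det, -b / det], [-c / det, a / det]]   # fundamental matrix
    return [F[i][0] * g[0] + F[i][1] * g[1] for i in range(2)]

# g = (1, 1): expected time until absorption from each transient state.
v = expected_reward(0.5, [1.0, 1.0])
print(v[0])  # expected duration started from wealth 1; equals 2 for p = 1/2
```

For the symmetric walk this reproduces the classical result that the expected absorption time from wealth $i$ with barriers $0$ and $N$ is $i(N-i)$: here $1\cdot(3-1)=2$.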
In between, we see that the expected game length varies between $4$ and about $7$ (actually, the exact maximum is $6.75$), and it is longest when $p=\tfrac12$.

How about the expected number of deuces (scores $(40,40)$)? We can compute that too, by setting $g(i)=0$ if $i\neq(40,40)$ and $g((40,40))=1$; the number $16$ used below is just the Mathematica internal number of the state $(40,40)$ among the transient states:

    In[..]:= G = Table[If[i == 16, 1, 0], {i, 1, 18}];
    In[..]:= f[p_] = Simplify[(F.G)[[1]]]
    Out[..]= (20 (1 - p)^3 p^3)/(1 - 2 p + 2 p^2)

*(Figure omitted: the expected number of deuces plotted as a function of $p$.)*

## Lecture 2: Mathematica in 15 Minutes

Course: M362K (Stochastic Processes I). Term: Spring 2008. Instructor: Gordan Zitkovic. Last updated: September 2, 2008.

Mathematica is a glorified calculator.¹ Here is how to use it.

### 2.1 Basic Syntax

- The symbols `+ - * / ^` are all supported by Mathematica. Multiplication can also be represented by a space between variables: `a x + b` and `a*x + b` are identical.
- **Warning:** Mathematica is case-sensitive. For example, the command to exit is `Quit`, and not `quit` or `QUIT`.
- Brackets `[ ]` are used around function arguments. Write `Sin[x]`, not `Sin(x)` or `Sin{x}`.
- Parentheses `( )` group terms for math operations: `(Sin[x] + Cos[y]) (Tan[z] + z^2)`.
- If you end an expression with a semicolon `;`, it will be executed but its output will not be shown. This is useful for simulations, for example.
- Braces `{ }` are used for lists:

      In[..]:= A = {1, 2, 3}
      Out[..]= {1, 2, 3}

- Names can refer to variables, expressions, functions, matrices, graphs, etc. A name is assigned using `name = object`. An expression may contain undefined names:

      In[5]:= A = (a + b)^3
      Out[5]= (a + b)^3
      In[6]:= A^2
      Out[6]= (a + b)^6

- The percent sign `%` stores the value of the previous result:

      In[7]:= 5 + 3
      Out[7]= 8
      In[8]:= %^2
      Out[8]= 64

¹ Actually, this is just the tip of the iceberg: it can do many, many, many other things.
### 2.2 Numerical Approximation

- `N[expr]` gives the approximate numerical value of an expression, variable, or command:

      In[9]:= N[Sqrt[2]]
      Out[9]= 1.41421

- `N[%]` gives the numerical value of the previous result:

      In[17]:= E + Pi
      Out[17]= e + π
      In[18]:= N[%]
      Out[18]= 5.85987

- `N[expr, n]` gives `n` digits of precision for the expression:

      In[14]:= N[Pi, 30]
      Out[14]= 3.14159265358979323846264338328

- Expressions whose result can't be represented exactly don't give a numerical value unless you request an approximation:

      In[11]:= Sin[3]
      Out[11]= Sin[3]
      In[12]:= N[Sin[3]]
      Out[12]= 0.14112

### 2.3 Expression Manipulation

- `Expand[expr]` algebraically expands the expression `expr`:

      In[19]:= Expand[(a + b)^2]
      Out[19]= a^2 + 2 a b + b^2

- `Factor[expr]` factors the expression `expr`:

      In[20]:= Factor[a^2 - b^2]
      Out[20]= (a - b) (a + b)
      In[21]:= Factor[x^2 + 5 x + 6]
      Out[21]= (2 + x) (3 + x)

- `Simplify[expr]` performs all kinds of simplifications on the expression `expr`:

      In[35]:= A = x/(1 - x) + x/(1 + x)
      In[36]:= Simplify[A]
      Out[36]= (2 x)/(1 - x^2)

### 2.4 Lists and Functions

- If `L` is a list, its length is given by `Length[L]`, and the `n`-th element of `L` can be accessed by `L[[n]]` (note the double brackets):

      In[31]:= L = {1, 2, 3, 4, 5}
      Out[31]= {1, 2, 3, 4, 5}
      In[44]:= L[[3]]
      Out[44]= 3

- If you want to add all the elements of a list `L`, use `Total[L]`. The list of the same length as `L`, but whose `k`-th element is the sum of the first `k` elements of `L`, is given by `Accumulate[L]`:

      In[..]:= Total[L]
      Out[..]= 15
      In[..]:= Accumulate[L]
      Out[..]= {1, 3, 6, 10, 15}

- Addition, subtraction, multiplication and division can be applied to lists element by element:

      In[1]:= L = {1, 3, 4}; K = {3, 4, 2};
      In[2]:= L + K
      Out[2]= {4, 7, 6}
      In[43]:= 2 {1, 2, 3, 4, 5}
      Out[43]= {2, 4, 6, 8, 10}

- If the expression `exp` depends on a variable, say `i`, then `Table[exp, {i, m, n}]` produces the list of the values of `exp` as `i` ranges from `m` to `n`:

      In[37]:= Table[i^2, {i, 0, 5}]
      Out[37]= {0, 1, 4, 9, 16, 25}

- The same works with two indices; you will get a list of lists:

      In[40]:= Table[i^j, {i, 1, 3}, {j, 2, 3}]
      Out[40]= {{1, 1}, {4, 8}, {9, 27}}

- It is possible to define your own functions in Mathematica. Just use the underscore syntax `f[x_] = exp`, where `exp` is some expression involving `x`:

      In[47]:= f[x_] = x^2
      In[48]:= f[x + y]
      Out[48]= (x + y)^2

- To apply a function `f` (either built-in, like `Sin`, or defined by you) to each element of a list `L`, you can use the command `Map`, with syntax `Map[f, L]`:

      In[50]:= f[x_] = 3 x
      In[51]:= L = {1, 2, 3, 4};
      In[52]:= Map[f, L]
      Out[52]= {3, 6, 9, 12}
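For readers working outside Mathematica, the list commands above have direct Python analogues; the snippet below is an illustrative side-by-side, not part of the notes.

```python
# Python analogues of the Mathematica list commands Total, Length,
# Part ([[n]]), Accumulate, and element-wise arithmetic.
from itertools import accumulate

L = [1, 2, 3, 4, 5]
print(sum(L))               # Total[L]      -> 15
print(len(L))               # Length[L]     -> 5
print(L[2])                 # L[[3]]        -> 3   (Python is 0-indexed)
print(list(accumulate(L)))  # Accumulate[L] -> [1, 3, 6, 10, 15]
print([2 * x for x in L])   # 2 L           -> [2, 4, 6, 8, 10]
```

Note the off-by-one convention: Mathematica's `L[[3]]` is Python's `L[2]`.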
### 2.5 Linear Algebra

- In Mathematica, a matrix is a nested list, i.e. a list whose elements are lists. By convention, matrices are represented row by row: the inner lists are the row vectors:

      In[59]:= A = {{2, 1, 3}, {5, 6, 9}}
      Out[59]= {{2, 1, 3}, {5, 6, 9}}

- To access the element in the `i`-th row and `j`-th column of the matrix `A`, type `A[[i, j]]`:

      In[61]:= A[[2, 3]]
      Out[61]= 9

- `MatrixForm[exp]` displays `exp` as a matrix, provided it is a nested list:

      In[9]:= A = Table[i 2^j, {i, 2, 5}, {j, 1, 2}]
      Out[9]= {{4, 8}, {6, 12}, {8, 16}, {10, 20}}
      In[10]:= MatrixForm[A]

- The commands `Transpose[A]`, `Inverse[A]`, `Det[A]`, `Tr[A]` and `MatrixRank[A]` return the transpose, inverse, determinant, trace and rank of the matrix `A`, respectively.
- To compute the `n`-th power of the matrix `A`, use `MatrixPower[A, n]`:

      In[21]:= A = {{1, 1}, {1, 0}}
      Out[21]= {{1, 1}, {1, 0}}
      In[22]:= MatrixPower[A, 3]
      Out[22]= {{3, 2}, {2, 1}}

- The identity matrix of order `n` is `IdentityMatrix[n]`.
- If `A` and `B` are matrices of the same order, `A+B` and `A-B` are their sum and difference. If `A` and `B` are of compatible orders, `A.B` (that is, a dot between them) is the matrix product of `A` and `B`.
- For a square matrix `A`, `CharacteristicPolynomial[A, x]` is the characteristic polynomial $\det(xI-A)$ in the variable `x`:

      In[40]:= A = {{3, 4}, {2, 1}}
      In[42]:= CharacteristicPolynomial[A, x]
      Out[42]= -5 - 4 x + x^2

- To get eigenvalues and eigenvectors, use `Eigenvalues[A]` and `Eigenvectors[A]`. The results are the list containing the eigenvalues (in the `Eigenvalues` case) and the list of eigenvectors of `A` (in the `Eigenvectors` case):

      In[52]:= A = {{3, 4}, {2, 1}}
      In[53]:= Eigenvalues[A]
      Out[53]= {5, -1}
      In[54]:= Eigenvectors[A]
      Out[54]= {{2, 1}, {-1, 1}}

### 2.6 Predefined Constants

- A number of constants are predefined by Mathematica: `Pi` ($\pi\approx3.14159$), `I` ($i=\sqrt{-1}$), `E` ($e\approx2.71828$), `Infinity` ($\infty$). Don't use `I`, `E` or `D` for variable names; Mathematica will object.
- A number of standard functions are built into Mathematica: `Sqrt[]`, `Exp[]`, `Log[]`, `Sin[]`, `ArcSin[]`, `Cos[]`, etc.
### 2.7 Calculus

- `D[f[x], x]` gives the derivative of `f` with respect to `x`. For the first few derivatives you can use `f'[x]`, `f''[x]`, etc.:

      In[66]:= D[x^k, x]
      Out[66]= k x^(-1 + k)

- `D[f[x], {x, n}]` gives the `n`-th derivative of `f` with respect to `x`, and `D[f[x, y], x, y]` gives the mixed derivative with respect to `x` and `y`.
- `Integrate[f[x], x]` gives the indefinite integral of `f` with respect to `x`:

      In[67]:= Integrate[Log[x], x]
      Out[67]= -x + x Log[x]

- `Integrate[f[x], {x, a, b}]` gives the definite integral of `f` on the interval $(a,b)$; `a` or `b` can be `Infinity` ($\infty$) or `-Infinity` ($-\infty$):

      In[72]:= Integrate[Exp[-2 x], {x, 0, Infinity}]
      Out[72]= 1/2

- `NIntegrate[f[x], {x, a, b}]` gives a numerical approximation of the definite integral. This usually returns an answer when `Integrate` doesn't work:

      In[75]:= Integrate[1/(x + Sin[x]), {x, 1, 2}]
      Out[75]= (returned unevaluated)
      In[77]:= NIntegrate[1/(x + Sin[x]), {x, 1, 2}]
      Out[77]= 0.414085

- `Sum[expr, {n, a, b}]` evaluates the finite or infinite sum; use `NSum` for a numerical approximation:

      In[..]:= Sum[1/k^4, {k, 1, Infinity}]
      Out[..]= π^4/90

- `DSolve[eqn, y[x], x]` gives the general solution to an ordinary differential equation for the function `y` in the variable `x`:

      In[58]:= DSolve[y''[x] == -y[x], y[x], x]
      Out[58]= {{y[x] -> C[1] Cos[x] + C[2] Sin[x]}}

- To calculate using initial or boundary conditions, use `DSolve[{eqn, cond}, y[x], x]`:

      In[93]:= DSolve[{y'[x] == y[x], y[0] == 1}, y[x], x]
      Out[93]= {{y[x] -> E^x}}

### 2.8 Solving Equations

- Algebraic equations are solved with `Solve[lhs == rhs, x]`, where `x` is the variable with respect to which you want to solve the equation. Be sure to use `==` and not `=` in equations. Mathematica returns the list with all solutions:

      In[81]:= Solve[x^3 == x, x]
      Out[81]= {{x -> -1}, {x -> 0}, {x -> 1}}

- `FindRoot[f[x] == g[x], {x, x0}]` is used to find a root when `Solve` does not work. It solves for `x` numerically, using an initial value of `x0`:

      In[82]:= FindRoot[Cos[x] == x, {x, 1}]
      Out[82]= {x -> 0.739085}

### 2.9 Graphics

- `Plot[expr, {x, a, b}]` plots the expression `expr` in the variable `x` from `a` to `b`:

      In[..]:= Plot[Sin[x], {x, 1, 3}]

- `Plot3D[expr, {x, a, b}, {y, c, d}]` produces a 3D plot in two variables:

      In[..]:= Plot3D[Sin[x + 2 y], {x, 2, 3}, {y, -2, 4}]

- If `L` is a list of the form `L = {{x1, y1}, {x2, y2}, ..., {xn, yn}}`, you can use the command `ListPlot[L]` to display a graph consisting of the points $(x_1,y_1),\dots,(x_n,y_n)$:

      In[..]:= L = Table[{i, i^2}, {i, 0, 4}]
      Out[..]= {{0, 0}, {1, 1}, {2, 4}, {3, 9}, {4, 16}}
      In[..]:= ListPlot[L]
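As a quick sanity check of the `NIntegrate` example above (the integral of $e^{-2x}$ over $(0,\infty)$ equals $\tfrac12$), here is a minimal Python stand-in using a composite trapezoid rule; this is an illustrative sketch only, since Mathematica's `NIntegrate` is adaptive and far more sophisticated.

```python
# A minimal trapezoid-rule integrator (illustrative stand-in for NIntegrate).
import math

def trapezoid(f, a, b, n=100000):
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# Integrate[Exp[-2 x], {x, 0, Infinity}] = 1/2; truncate the tail at x = 20,
# where e^(-2x) is already negligible.
val = trapezoid(lambda x: math.exp(-2 * x), 0.0, 20.0)
print(val)  # close to 0.5
```

The truncation point and step count are arbitrary choices made for this sketch; the trapezoid error here is far below the displayed precision.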
### 2.10 Probability Distributions and Simulation

- `PDF[distr, x]` and `CDF[distr, x]` return the pdf (pmf in the discrete case) and the cdf of the distribution `distr` in the variable `x`. Here `distr` can be one of `NormalDistribution[m, s]`, `ExponentialDistribution[l]`, `UniformDistribution[{a, b}]`, `BinomialDistribution[n, p]`, and many, many others (see the documentation for `PDF` and follow the various links from there).
- Use `ExpectedValue[expr, distr, x]` to compute the expectation $\mathbb{E}[f(X)]$, where `expr` is the expression for the function $f$ in the variable `x`:

      In[23]:= distr = PoissonDistribution[λ]
      In[25]:= PDF[distr, x]
      Out[25]= E^(-λ) λ^x / x!
      In[27]:= ExpectedValue[x^3, distr, x]
      Out[27]= λ + 3 λ^2 + λ^3

- There is no command for the generating function, but you can get it by computing the characteristic function and changing the variable a bit: `CharacteristicFunction[distr, -I Log[s]]`:

      In[22]:= distr = PoissonDistribution[λ]
      In[23]:= CharacteristicFunction[distr, -I Log[s]]
      Out[23]= E^((-1 + s) λ)

- To get a random number uniformly distributed between 0 and 1, use `RandomReal[]`. A uniformly distributed random number on the interval $(a,b)$ can be obtained by `RandomReal[{a, b}]`. For a list of `n` uniform random numbers on $(a,b)$, write `RandomReal[{a, b}, n]`:

      In[2]:= RandomReal[]
      Out[2]= 0.168904
      In[3]:= RandomReal[{7, 9}]
      Out[3]= 7.83027
      In[5]:= RandomReal[{0, 1}, 3]
      Out[5]= {0.368422, 0.961658, 0.692345}

- If you need a random number from a particular continuous distribution (normal, say), use `RandomReal[distr]`, or `RandomReal[distr, n]` if you need `n` draws. When drawing from a discrete distribution, use `RandomInteger` instead.
- If `L` is a list of numbers, `Histogram[L]` displays a histogram of `L` (you need to load the package Histograms by issuing the command `<<Histograms` before you can use it).

## M 362K Probability: Lecture Notes

August 11–15, 2008. Sasha Kocic.

### 6.3 Sums of independent random variables

Let $X$ and $Y$ be independent continuous random variables with probability density functions $f_X$ and $f_Y$. Then

$$F_{X+Y}(a)=\mathbb{P}[X+Y\le a]=\iint_{x+y\le a}f_X(x)f_Y(y)\,dx\,dy=\int_{-\infty}^{\infty}\Big(\int_{-\infty}^{a-y}f_X(x)\,dx\Big)f_Y(y)\,dy=\int_{-\infty}^{\infty}F_X(a-y)f_Y(y)\,dy.$$

Thus $F_{X+Y}$ is the convolution of $F_X$ and $f_Y$.
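Differentiating the convolution identity above gives $f_{X+Y}(a)=\int f_X(a-y)f_Y(y)\,dy$, which can be checked numerically. The sketch below (illustrative, not from the notes; the Riemann-sum helper is my own) evaluates this integral for two Uniform$(0,1)$ densities and recovers the triangular density.

```python
# Numerical check that the convolution of two Uniform(0,1) densities is the
# triangular density: f(a) = a on [0,1] and f(a) = 2 - a on (1,2).

def f_uniform(x):
    return 1.0 if 0.0 < x < 1.0 else 0.0

def f_sum(a, n=20000):
    # f_{X+Y}(a) = integral over y of f_X(a - y) f_Y(y) dy; since f_Y is the
    # Uniform(0,1) density, integrate over (0,1) by a midpoint Riemann sum.
    h = 1.0 / n
    return sum(f_uniform(a - (i + 0.5) * h) for i in range(n)) * h

print(f_sum(0.5), f_sum(1.5))  # both close to 0.5
```

At $a=0.5$ the integrand is $1$ exactly on an interval of length $0.5$, and likewise at $a=1.5$, matching the triangular formula.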
Differentiating,

$$f_{X+Y}(a)=\frac{d}{da}F_{X+Y}(a)=\int_{-\infty}^{\infty}\frac{d}{da}F_X(a-y)f_Y(y)\,dy=\int_{-\infty}^{\infty}f_X(a-y)f_Y(y)\,dy.$$

Thus $f_{X+Y}$ is the convolution of $f_X$ and $f_Y$.

**Example 3a.** Let $X$ and $Y$ be independent random variables, each uniformly distributed over $(0,1)$. Find $f_{X+Y}(a)$.

Here

$$f_X(x)=f_Y(x)=\begin{cases}1,&0<x<1\\0,&\text{otherwise,}\end{cases}\qquad f_{X+Y}(a)=\int_{-\infty}^{\infty}f_X(a-y)f_Y(y)\,dy.$$

For $0\le a\le1$:

$$f_{X+Y}(a)=\int_0^1 f_X(a-y)\,dy=\int_0^a dy=a.$$

For $1<a<2$:

$$f_{X+Y}(a)=\int_0^1 f_X(a-y)\,dy=\int_{a-1}^1 dy=2-a.$$

Therefore

$$f_{X+Y}(a)=\begin{cases}a,&0\le a\le1\\2-a,&1<a<2\\0,&\text{otherwise.}\end{cases}$$

**Proposition.** If $X_i$, $i=1,\dots,n$, are normally distributed independent random variables with expectations $\mu_i$ and variances $\sigma_i^2$, then $\sum_{i=1}^nX_i$ is a normally distributed random variable with expectation $\sum_{i=1}^n\mu_i$ and variance $\sum_{i=1}^n\sigma_i^2$.

*Proof.* Consider two independent normally distributed random variables $X$ and $Y$, where $X$ has mean $0$ and variance $\sigma^2$, and $Y$ has mean $0$ and variance $1$. We wish to find the convolution $f_{X+Y}(a)$, but consider first the integrand

$$f_X(a-y)f_Y(y)=\frac{1}{2\pi\sigma}\,e^{-\frac{(a-y)^2}{2\sigma^2}-\frac{y^2}{2}}.$$

The exponent on the right-hand side is

$$-\frac{(a-y)^2}{2\sigma^2}-\frac{y^2}{2}=-\frac{a^2}{2\sigma^2}+\frac{ay}{\sigma^2}-\frac{y^2}{2}\Big(1+\frac{1}{\sigma^2}\Big)=-\frac{a^2}{2(1+\sigma^2)}-\frac{1+\sigma^2}{2\sigma^2}\Big(y-\frac{a}{1+\sigma^2}\Big)^2,$$

by completing the square in $y$. Therefore

$$f_{X+Y}(a)=\int_{-\infty}^{\infty}f_X(a-y)f_Y(y)\,dy=\frac{1}{2\pi\sigma}\,e^{-\frac{a^2}{2(1+\sigma^2)}}\int_{-\infty}^{\infty}e^{-\frac{1+\sigma^2}{2\sigma^2}\left(y-\frac{a}{1+\sigma^2}\right)^2}dy=\frac{1}{\sqrt{2\pi}\sqrt{1+\sigma^2}}\,e^{-\frac{a^2}{2(1+\sigma^2)}},$$

i.e. $X+Y$ is normal with mean $0$ and variance $1+\sigma^2$.

Now consider two general independent normals $X_1$ and $X_2$. We can write

$$X_1+X_2=\mu_1+\mu_2+\sigma_2\Big(\frac{X_1-\mu_1}{\sigma_2}+\frac{X_2-\mu_2}{\sigma_2}\Big).$$

Both terms within the parentheses contribute $0$ to the mean; the first term contributes $\sigma_1^2/\sigma_2^2$ to the variance and the second contributes $1$. By the computation above, the sum in parentheses is normal with variance $\sigma_1^2/\sigma_2^2+1$, so $X_1+X_2$ is normal with mean $\mu_1+\mu_2$ and variance $\sigma_2^2(\sigma_1^2/\sigma_2^2+1)=\sigma_1^2+\sigma_2^2$. The proof can now be completed by induction for all $n$. $\square$

**Example 3e.** If $X$ and $Y$ are independent Poisson random variables with parameters $\lambda_1$ and $\lambda_2$, then $X+Y$ is a Poisson random variable with parameter $\lambda_1+\lambda_2$.

**Example 3f.** If $X$ and $Y$ are independent binomial random variables with parameters $(n,p)$ and $(m,p)$, then $X+Y$ is binomial with parameters $(n+m,p)$.

### Conditional distribution: discrete case

Recall that $\mathbb{P}[E|F]=\mathbb{P}[EF]/\mathbb{P}[F]$, provided that $\mathbb{P}[F]>0$. If $X$ and $Y$ are discrete random variables, it is natural to define the conditional probability mass function

$$p_{X|Y}(x|y)=\mathbb{P}[X=x\,|\,Y=y]=\frac{p(x,y)}{p_Y(y)},$$

and the conditional cumulative distribution function

$$F_{X|Y}(x|y)=\mathbb{P}[X\le x\,|\,Y=y]=\sum_{a\le x}p_{X|Y}(a|y).$$

If $X$ and $Y$ are independent, then $p_{X|Y}(x|y)=p_X(x)$.
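The normal-convolution computation in the proof above can also be verified numerically. The sketch below (illustrative; the midpoint-rule helper and parameter choices are my own) compares $\int\varphi_{\sigma^2}(a-y)\varphi_1(y)\,dy$ with the $N(0,1+\sigma^2)$ density at a point.

```python
# Numerical check that the convolution of the N(0, s2) and N(0, 1) densities
# equals the N(0, 1 + s2) density.
import math

def phi(x, var):
    # Normal density with mean 0 and variance var.
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def convolution_at(a, s2, lim=10.0, n=200000):
    # Midpoint Riemann sum of phi(a - y, s2) * phi(y, 1) over [-lim, lim];
    # the tails beyond |y| = 10 are negligible.
    h = 2.0 * lim / n
    return sum(phi(a - y, s2) * phi(y, 1.0)
               for y in (-lim + (i + 0.5) * h for i in range(n))) * h

a, s2 = 0.7, 1.0
print(convolution_at(a, s2), phi(a, 1.0 + s2))  # nearly equal
```

The agreement holds for any choice of `a` and `s2`, reflecting the identity proved analytically above.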
### Conditional distribution: continuous case

If $X$ and $Y$ are continuous with joint probability density function $f(x,y)$, then the conditional probability density of $X$ given that $Y=y$ is

$$f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}.$$

Motivation:

$$\frac{\mathbb{P}[x\le X\le x+dx,\;y\le Y\le y+dy]}{\mathbb{P}[y\le Y\le y+dy]}\approx\frac{f(x,y)\,dx\,dy}{f_Y(y)\,dy}=f_{X|Y}(x|y)\,dx,$$

so that $\mathbb{P}[X\in A\,|\,Y=y]=\int_A f_{X|Y}(x|y)\,dx$.

**Example 5b.** Find $\mathbb{P}[X>1\,|\,Y=y]$ when

$$f(x,y)=\begin{cases}\dfrac{e^{-x/y}\,e^{-y}}{y},&0<x<\infty,\ 0<y<\infty\\[2pt]0,&\text{otherwise.}\end{cases}$$

Here $f_Y(y)=e^{-y}$, so $f_{X|Y}(x|y)=\frac1y e^{-x/y}$ and $\mathbb{P}[X>1\,|\,Y=y]=\int_1^{\infty}\frac1y e^{-x/y}\,dx=e^{-1/y}$.

### 7.1 Properties of expectations

Recall that $\mathbb{E}[X]=\sum_x x\,p(x)$ in the discrete case and $\mathbb{E}[X]=\int_{-\infty}^{\infty}x f(x)\,dx$ in the continuous case. If $\mathbb{P}[a\le X\le b]=1$, then $a\le\mathbb{E}[X]\le b$.

### 7.2 Expectation of sums of random variables

**Proposition.** If $X$ and $Y$ are jointly distributed with pmf $p(x,y)$ or density $f(x,y)$, then

$$\mathbb{E}[g(X,Y)]=\sum_y\sum_x g(x,y)\,p(x,y)\qquad\text{and}\qquad\mathbb{E}[g(X,Y)]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x,y)\,f(x,y)\,dx\,dy.$$

*Proof.* Assume $g(X,Y)\ge0$. Then, by Lemma 2.1 ($\mathbb{E}[W]=\int_0^{\infty}\mathbb{P}[W>t]\,dt$ for $W\ge0$),

$$\mathbb{E}[g(X,Y)]=\int_0^{\infty}\mathbb{P}[g(X,Y)>t]\,dt=\int_0^{\infty}\iint_{(x,y):\,g(x,y)>t}f(x,y)\,dx\,dy\,dt=\iint g(x,y)\,f(x,y)\,dx\,dy,$$

where the last equality follows by switching the order of integration. $\square$

One particular result:

$$\mathbb{E}[X+Y]=\mathbb{E}[X]+\mathbb{E}[Y].$$

**Example 2c (the sample mean).** $X_1,\dots,X_n$ are independent and identically distributed with mean $\mu$ and distribution $F$; they are said to constitute a sample from the distribution $F$. Find $\mathbb{E}[\bar X]$, where $\bar X=\frac1n\sum_{i=1}^nX_i$ is the sample mean. By linearity, $\mathbb{E}[\bar X]=\mu$.

**Example 2e (expectation of a binomial random variable).** $X$ is binomial $(n,p)$, so $X=X_1+X_2+\dots+X_n$, where each $X_i$ is a Bernoulli random variable with $\mathbb{E}[X_i]=1\cdot p+0\cdot(1-p)=p$. So $\mathbb{E}[X]=np$.

**Example 2i (a random walk in the plane).** Each step is one unit at a random angle, uniformly distributed between $0$ and $2\pi$. Find $\mathbb{E}[D_n^2]$, where $D_n$ is the distance from the origin after $n$ steps. Let $X_i=\cos\theta_i$ and $Y_i=\sin\theta_i$. Then

$$D_n^2=\Big(\sum_{i=1}^nX_i\Big)^2+\Big(\sum_{i=1}^nY_i\Big)^2=\sum_{i=1}^n(X_i^2+Y_i^2)+\sum_{i\neq j}(\cos\theta_i\cos\theta_j+\sin\theta_i\sin\theta_j)=n+\sum_{i\neq j}\cos(\theta_i-\theta_j),$$

so $\mathbb{E}[D_n^2]=n$, since the $\theta_i$ are independent and $\mathbb{E}[\cos\theta_i]=\mathbb{E}[\sin\theta_i]=0$.

**Example 2d (Boole's inequality).** For $n$ events $A_1,\dots,A_n$, let

$$X_i=\begin{cases}1,&A_i\text{ occurs}\\0,&A_i\text{ doesn't occur,}\end{cases}$$

so that $X=\sum_iX_i$ is the number of events that occur, and let $Y=1$ if $X\ge1$ and $Y=0$ otherwise. Then $X\ge Y$, so $\mathbb{E}[X]\ge\mathbb{E}[Y]$, and since $\mathbb{E}[Y]=\mathbb{P}[\text{at least one of the }A_i\text{ occurs}]$,

$$\sum_{i=1}^n\mathbb{P}[A_i]\ge\mathbb{P}\Big[\bigcup_{i=1}^nA_i\Big].$$
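Linearity of expectation, which drives every example in this subsection, holds with no independence assumption at all. The following tiny sketch (illustrative; the joint pmf is made up) checks $\mathbb{E}[X+Y]=\mathbb{E}[X]+\mathbb{E}[Y]$ for a pair of *dependent* random variables.

```python
# E[X + Y] = E[X] + E[Y] even for dependent X and Y.
# A made-up joint pmf; note P(X=1)P(Y=1) = 0.3 != 0.5 = P(X=1, Y=1),
# so X and Y are dependent.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 1): 0.5}   # P(X = x, Y = y)

E_sum = sum(p * (x + y) for (x, y), p in pmf.items())
E_X = sum(p * x for (x, y), p in pmf.items())
E_Y = sum(p * y for (x, y), p in pmf.items())
print(E_sum, E_X + E_Y)  # equal (both 1.1)
```

The same computation with indicator variables is exactly the argument behind Boole's inequality above.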
**Example.** Ten hunters are waiting for ducks to fly by. When a flock of ducks flies overhead, the hunters fire at the same time, but each chooses his target at random, independently of the others. If each hunter hits his target with probability $p$, what is the expected number of ducks that escape unhurt when a flock of size 10 flies overhead?

Let

$$X_i=\begin{cases}1,&\text{the }i\text{-th duck escapes}\\0,&\text{otherwise,}\end{cases}$$

and let $X=\sum_{i=1}^{10}X_i$. A given hunter fails to hurt duck $i$ if he either aims elsewhere or misses, which has probability $1-\frac{p}{10}$; by independence of the hunters,

$$\mathbb{E}[X_i]=\Big(1-\frac{p}{10}\Big)^{10},\qquad\mathbb{E}[X]=10\Big(1-\frac{p}{10}\Big)^{10}.$$

### 7.3 Moments of the number of events that occur

**Definition (moments).** The $k$-th moment of a random variable $X$ is $\mathbb{E}[X^k]$.

Given the events $A_1,\dots,A_n$, let $X$ be the number of events that occur. Define the indicator variables

$$I_i=\begin{cases}1,&A_i\text{ occurs}\\0,&A_i\text{ doesn't occur,}\end{cases}$$

so that $X=\sum_{i=1}^nI_i$ and

$$\mathbb{E}[X]=\sum_{i=1}^n\mathbb{E}[I_i]=\sum_{i=1}^n\mathbb{P}[A_i].$$

The number of pairs of events that occur is $\binom{X}{2}=\sum_{i<j}I_iI_j$. Then

$$\mathbb{E}\Big[\frac{X(X-1)}{2}\Big]=\sum_{i<j}\mathbb{P}[A_iA_j],\qquad\text{i.e.}\qquad\mathbb{E}[X^2]-\mathbb{E}[X]=2\sum_{i<j}\mathbb{P}[A_iA_j].$$

Analogously,

$$\mathbb{E}\Big[\binom{X}{k}\Big]=\sum_{i_1<\dots<i_k}\mathbb{P}[A_{i_1}A_{i_2}\cdots A_{i_k}].$$

**Example 3a (moments of binomial random variables).** Let $A_i$ be the event that trial $i$ results in a success, each with probability $p$. Then $\mathbb{P}[A_iA_j]=p^2$, so

$$\mathbb{E}[X(X-1)]=2\binom{n}{2}p^2=n(n-1)p^2.$$

### 7.4 Covariance, variance of sums, and correlations

**Proposition 4.1.** If $X$ and $Y$ are independent, then $\mathbb{E}[g(X)h(Y)]=\mathbb{E}[g(X)]\,\mathbb{E}[h(Y)]$.

*Proof.*

$$\mathbb{E}[g(X)h(Y)]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x)h(y)f_X(x)f_Y(y)\,dx\,dy=\Big(\int_{-\infty}^{\infty}g(x)f_X(x)\,dx\Big)\Big(\int_{-\infty}^{\infty}h(y)f_Y(y)\,dy\Big).\ \square$$

**Definition.** The covariance between $X$ and $Y$ is

$$\mathrm{Cov}(X,Y)=\mathbb{E}\big[(X-\mathbb{E}X)(Y-\mathbb{E}Y)\big]=\mathbb{E}[XY]-\mathbb{E}[X]\,\mathbb{E}[Y].$$

If $X$ and $Y$ are independent, then $\mathrm{Cov}(X,Y)=0$. The converse is not true, however.

**Definition.** The correlation of $X$ and $Y$ is

$$\rho(X,Y)=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},\qquad -1\le\rho(X,Y)\le1,$$

and $\rho=-1$ or $\rho=1$ if and only if $Y=a+bX$ for some constants $a,b$.

### 7.7 Moment generating functions

**Definition.** The moment generating function $M:\mathbb{R}\to\mathbb{R}$ of a random variable $X$ is defined by $M(t)=\mathbb{E}[e^{tX}]$. Given $M(t)$ we have $M'(0)=\mathbb{E}[X]$, $M''(0)=\mathbb{E}[X^2]$, and in general $M^{(n)}(0)=\mathbb{E}[X^n]$, $n\ge1$.

**Example 7a (binomial).** If $X$ is binomial $(n,p)$, then $M(t)=(pe^t+1-p)^n$, $\mathbb{E}[X]=np$, $\mathbb{E}[X^2]=n(n-1)p^2+np$, and $\mathrm{Var}(X)=np(1-p)$.

**Example 7b (Poisson).** If $X$ is Poisson with parameter $\lambda$, then

$$M(t)=\mathbb{E}[e^{tX}]=\sum_{n=0}^{\infty}e^{tn}\,\frac{e^{-\lambda}\lambda^n}{n!}=e^{\lambda(e^t-1)}.$$

**Example 7d (normal distribution).** For the standard normal $Z$, $M_Z(t)=e^{t^2/2}$; for $X$ normal $(\mu,\sigma^2)$, $M_X(t)=e^{\mu t+\sigma^2t^2/2}$.

### Limit Theorems

### 8.2 Chebyshev's inequality and the law of large numbers

**Proposition (Markov's inequality).** If $X$ is a random variable that takes only nonnegative values, then for all $a>0$,

$$\mathbb{P}[X\ge a]\le\frac{\mathbb{E}[X]}{a}.$$

*Proof.* Let

$$I=\begin{cases}1,&X\ge a\\0,&\text{otherwise.}\end{cases}$$

Since $X\ge0$, we have $I\le\frac{X}{a}$, so $\mathbb{P}[X\ge a]=\mathbb{E}[I]\le\frac{\mathbb{E}[X]}{a}$. $\square$

**Proposition (Chebyshev's inequality).** If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then for any value $a>0$,

$$\mathbb{P}[|X-\mu|\ge a]\le\frac{\sigma^2}{a^2}.$$

*Proof.* Apply Markov's inequality with $a^2$ to the nonnegative random variable $(X-\mu)^2$:

$$\mathbb{P}[|X-\mu|\ge a]=\mathbb{P}[(X-\mu)^2\ge a^2]\le\frac{\mathbb{E}[(X-\mu)^2]}{a^2}=\frac{\sigma^2}{a^2}.\ \square$$
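The duck-hunting example above reduces to a single closed-form expression once linearity of expectation is applied; the sketch below (illustrative, with hypothetical parameter names) evaluates it.

```python
# Expected number of escaping ducks: each of the 10 ducks must survive all
# 10 hunters, and a given hunter hurts a given duck with probability p/10
# (he must both choose it and hit), so E[escapes] = 10 * (1 - p/10)^10.

def expected_escapes(p, hunters=10, ducks=10):
    return ducks * (1.0 - p / ducks) ** hunters

print(expected_escapes(1.0))  # perfect shots: 10 * 0.9^10, about 3.487
```

Even when every hunter hits with certainty, about a third of the flock escapes on average, because the hunters' random choices overlap.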
**Example 2a.** The number of items produced in a factory during a week is a random variable with mean 50. (a) Estimate the probability that this week's production will exceed 75. (b) If the variance is known to be 25, estimate the probability that this week's production will be between 40 and 60.

(a) By Markov's inequality,

$$\mathbb{P}[X>75]\le\frac{\mathbb{E}[X]}{75}=\frac{50}{75}=\frac23.$$

(b) By Chebyshev's inequality,

$$\mathbb{P}[|X-50|\ge10]\le\frac{25}{10^2}=\frac14,\qquad\text{so}\qquad\mathbb{P}[40<X<60]\ge1-\frac14=\frac34.$$

**Proposition 2.3.** If $\mathrm{Var}(X)=0$, then $\mathbb{P}[X=\mathbb{E}[X]]=1$.

*Proof.* By Chebyshev's inequality, with $\mu=\mathbb{E}[X]$,

$$\mathbb{P}\Big[|X-\mu|>\frac1n\Big]=0\quad\text{for all }n>0,$$

and letting $n\to\infty$,

$$0=\lim_{n\to\infty}\mathbb{P}\Big[|X-\mu|>\frac1n\Big]=\mathbb{P}\Big[\lim_{n\to\infty}\Big\{|X-\mu|>\frac1n\Big\}\Big]=\mathbb{P}[X\neq\mu].\ \square$$

**Theorem 2.1 (the weak law of large numbers).** Let $X_1,X_2,\dots$ be a sequence of independent, identically distributed random variables with $\mathbb{E}[X_i]=\mu$ for all $i$. Then, for all $\varepsilon>0$,

$$\mathbb{P}\Big[\Big|\frac{X_1+\dots+X_n}{n}-\mu\Big|\ge\varepsilon\Big]\to0\quad\text{as }n\to\infty.$$

*Proof.* Assume additionally that all of the random variables have the same finite variance $\sigma^2$. Then

$$\mathbb{E}\Big[\frac{X_1+\dots+X_n}{n}\Big]=\mu\qquad\text{and}\qquad\mathrm{Var}\Big(\frac{X_1+\dots+X_n}{n}\Big)=\frac{\sigma^2}{n},$$

and it follows from Chebyshev's inequality that

$$\mathbb{P}\Big[\Big|\frac{X_1+\dots+X_n}{n}-\mu\Big|\ge\varepsilon\Big]\le\frac{\sigma^2}{n\varepsilon^2}\to0.\ \square$$

The weak law states that $\bar X_n\to\mu$ *in probability* as $n\to\infty$, i.e. for all $\varepsilon>0$, $\mathbb{P}[|\bar X_n-\mu|<\varepsilon]\to1$.

### 8.3 The central limit theorem

**Theorem 3.1 (the central limit theorem).** Let $X_1,X_2,\dots$ be a sequence of independent, identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Then the distribution of

$$\frac{X_1+\dots+X_n-n\mu}{\sigma\sqrt n}$$

tends to the standard normal as $n\to\infty$. That is,

$$\mathbb{P}\Big[\frac{X_1+\dots+X_n-n\mu}{\sigma\sqrt n}\le a\Big]\to\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a}e^{-x^2/2}\,dx\quad\text{as }n\to\infty.$$

**Example 3b.** The number of students enrolling in a psychology course is a Poisson random variable with mean 100. The professor has decided that if the number of students enrolling is 120 or more, he will teach two sections; if it is less than 120, he will teach a single section. What is the probability that he will teach two sections?

The exact solution is

$$\mathbb{P}[X\ge120]=\sum_{i=120}^{\infty}e^{-100}\,\frac{100^i}{i!}.$$

Recall that a Poisson variable with mean 100 can be seen as the sum of 100 independent Poisson variables with mean 1; also, the variance of a Poisson random variable is the same as its mean. Applying the central limit theorem with a continuity correction, we obtain

$$\mathbb{P}[X\ge120]=\mathbb{P}[X\ge119.5]=\mathbb{P}\Big[\frac{X-100}{\sqrt{100}}\ge\frac{119.5-100}{\sqrt{100}}\Big]\approx1-\Phi(1.95).$$
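The normal approximation in the enrollment example is a one-liner once a standard normal cdf is available; Python's standard library exposes the error function, from which $\Phi$ follows. This is an illustrative sketch, not part of the notes.

```python
# Normal approximation with continuity correction for the enrollment example:
# X ~ Poisson(100), so E[X] = Var[X] = 100, and
# P(X >= 120) ~ P(Z >= (119.5 - 100)/10) = 1 - Phi(1.95).
import math

def std_normal_cdf(x):
    # Phi(x) expressed through the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_two_sections = 1.0 - std_normal_cdf((119.5 - 100.0) / 10.0)
print(p_two_sections)  # about 0.0256
```

So the professor teaches two sections with probability of roughly 2.6%.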
**Example 3c.** If 10 fair dice are rolled, find the approximate probability that the sum obtained is between 30 and 40.

Let $X_i$ be the value of the $i$-th die, $i=1,2,\dots,10$. Then $\mathbb{E}[X_i]=\frac72$ and $\mathrm{Var}(X_i)=\mathbb{E}[X_i^2]-(\mathbb{E}[X_i])^2=\frac{35}{12}$. With the continuity correction,

$$\mathbb{P}[29.5\le X\le40.5]=\mathbb{P}\Big[\frac{29.5-35}{\sqrt{350/12}}\le\frac{X-35}{\sqrt{350/12}}\le\frac{40.5-35}{\sqrt{350/12}}\Big]\approx2\,\Phi(1.0184)-1\approx0.692.$$

**Example 3d.** Let $X_i$, $i=1,\dots,10$, be independent random variables, each uniformly distributed over $(0,1)$. Calculate an approximation to $\mathbb{P}[\sum X_i>6]$. Here $\mathbb{E}[X_i]=\frac12$ and $\mathrm{Var}(X_i)=\frac1{12}$, so

$$\mathbb{P}\Big[\sum_{i=1}^{10}X_i>6\Big]=\mathbb{P}\Big[\frac{\sum X_i-5}{\sqrt{10/12}}>\frac{6-5}{\sqrt{10/12}}\Big]\approx1-\Phi\big(\sqrt{1.2}\big)\approx0.1367.$$

**Theorem 3.2 (central limit theorem for independent random variables).** Let $X_1,X_2,\dots$ be a sequence of independent random variables with $\mu_i=\mathbb{E}[X_i]$ and $\sigma_i^2=\mathrm{Var}(X_i)$. If (a) the $X_i$ are uniformly bounded, i.e. for some $M$, $\mathbb{P}[|X_i|<M]=1$ for all $i$, and (b) $\sum_{i=1}^{\infty}\sigma_i^2=\infty$, then

$$\mathbb{P}\Big[\frac{\sum_{i=1}^n(X_i-\mu_i)}{\sqrt{\sum_{i=1}^n\sigma_i^2}}\le a\Big]\to\Phi(a)\quad\text{as }n\to\infty.$$

### 8.4 The strong law of large numbers

**Theorem 4.1.** Let $X_1,X_2,\dots$ be a sequence of independent and identically distributed random variables with mean $\mu$. Then, with probability 1,

$$\frac{X_1+X_2+\dots+X_n}{n}\to\mu\quad\text{as }n\to\infty.$$

The strong law states that $\bar X_n\to\mu$ *strongly* (almost surely) as $n\to\infty$, i.e. $\mathbb{P}[\lim_{n\to\infty}\bar X_n=\mu]=1$.

**An application.** Consider a sequence of independent trials, and let

$$X_i=\begin{cases}1,&E\text{ occurs on the }i\text{-th trial}\\0,&E\text{ does not occur on the }i\text{-th trial.}\end{cases}$$

With probability 1, the law of large numbers implies

$$\frac{X_1+\dots+X_n}{n}\to\mathbb{E}[X_1]=\mathbb{P}[E],$$

i.e. the long-run relative frequency of $E$ converges to its probability.

### 8.5 Other inequalities

**Proposition (one-sided Chebyshev inequality).** Given $X$ with $\mathbb{E}[X]=0$ and $\mathrm{Var}(X)=\sigma^2$, for all $a>0$,

$$\mathbb{P}[X\ge a]\le\frac{\sigma^2}{\sigma^2+a^2}.$$

**Corollary 5.1.** If $\mathbb{E}[X]=\mu$ and $\mathrm{Var}(X)=\sigma^2$, then for all $a>0$,

$$\mathbb{P}[X>\mu+a]\le\frac{\sigma^2}{\sigma^2+a^2}.$$

**Definition (Chernoff bounds).**

$$\mathbb{P}[X\ge a]\le e^{-ta}M(t)\quad\text{for all }t>0,\qquad\mathbb{P}[X\le a]\le e^{-ta}M(t)\quad\text{for all }t<0.$$

**Definition (Jensen's inequality).** If $f$ is a convex function, then $\mathbb{E}[f(X)]\ge f(\mathbb{E}[X])$.

## Lecture 6: Random Walks — Advanced Methods

Course: M362K (Stochastic Processes I). Term: Spring 2008. Instructor: Gordan Zitkovic. Last updated: October 9, 2008.

### 6.1 Stopping times

The last application of generating functions dealt with sums evaluated between $0$ and some random time $N$. An especially interesting case occurs when the value of $N$ depends directly on the evolution of the underlying stochastic process. Even more important is the case where time's arrow is taken into account: if you think of $N$ as the time you stop adding new terms to the sum, it is usually the case that you are not allowed (or able) to see the values of the terms you would get if you continued adding.
Think of an investor in the stock market. Her decision to stop and sell her stocks can depend only on the information available up to the moment of the decision. Otherwise she would sell at the absolute maximum and buy at the absolute minimum, making tons of money in the process. Of course, this is not possible unless you are clairvoyant, so mere mortals have to restrict their choices to so-called stopping times.

**Definition 6.1.** Let $(X_n)_{n\in\mathbb{N}_0}$ be a stochastic process. A random variable $T$ taking values in $\mathbb{N}_0\cup\{\infty\}$ is said to be a **stopping time** with respect to $(X_n)_{n\in\mathbb{N}_0}$ if for each $n\in\mathbb{N}_0$ there exists a function $G_n:\mathbb{R}^{n+1}\to\{0,1\}$ such that

$$\mathbf{1}_{\{T=n\}}=G_n(X_0,X_1,\dots,X_n)\quad\text{for all }n\in\mathbb{N}_0.$$

The functions $G_n$ are called the **decision functions**, and should be thought of as a black box which takes the values of the process $(X_n)$ observed up to the present point and outputs either $0$ or $1$. The value $0$ means "keep going" and $1$ means "stop". The whole point is that the decision has to be based only on the available observations, and not on the future ones.

**Example 6.2.**

1. The simplest examples of stopping times are non-random (deterministic) times. Just set $T\equiv5$, or $T\equiv725$, or $T\equiv n_0$ for any $n_0\in\mathbb{N}_0$, no matter what the state of the world $\omega\in\Omega$ is. The family of decision rules is easy to construct:

$$G_n(x_0,x_1,\dots,x_n)=\begin{cases}1,&n=n_0\\0,&n\neq n_0.\end{cases}$$

The decision functions $G_n$ do not depend on the values of $X_0,X_1,\dots,X_n$ at all. A gambler who stops gambling after 20 games, no matter what the winnings or losses are, uses such a rule.

2. Probably the best-known examples of stopping times are **first hitting times**. They can be defined for general stochastic processes, but we will stick to simple random walks for the purposes of this example. So let $X_n=\sum_{k=1}^n\delta_k$ be a simple random walk, and let $T_l$ be the first time $X$ hits the level $l\in\mathbb{N}$. More precisely, we use the following slightly non-intuitive but mathematically correct definition:

$$T_l=\min\{n\in\mathbb{N}_0:X_n=l\}.$$

The set $\{n\in\mathbb{N}_0:X_n=l\}$ is the collection of all time points at which $X$ visits the level $l$; the earliest one, the minimum of that set, is the first hitting time of $l$. In states of the world $\omega\in\Omega$ in which the level $l$ just never gets reached, i.e. when $\{n\in\mathbb{N}_0:X_n(\omega)=l\}$ is an empty set, we set $T_l(\omega)=\infty$.

In order to show that $T_l$ is indeed a stopping time, we need to construct the decision functions $G_n$, $n\in\mathbb{N}_0$. Let us start with $n=0$: we would have $T_l=0$ only in the impossible case $X_0=l$.
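Definition 6.1's decision functions are concrete enough to code directly. The sketch below (illustrative; the function name and path encoding are my own) implements $G_n$ for the first hitting time $T_l$: it looks only at the observed values $x_0,\dots,x_n$, never at the future of the path.

```python
# Decision functions for the first hitting time T_l = min{n : X_n = l}:
# G_n(x_0, ..., x_n) = 1 iff x_n = l and no earlier x_k equals l.

def G(l, xs):
    # xs = (x_0, ..., x_n); return 1 ("stop") iff T_l equals n on this path.
    *past, x_n = xs
    return 1 if x_n == l and all(x != l for x in past) else 0

path = [0, 1, 0, 1, 2]          # a walk that first hits level 2 at time 4
print([G(2, path[:n + 1]) for n in range(len(path))])  # [0, 0, 0, 0, 1]
```

Exactly one $G_n$ fires along any path that reaches the level, which is the defining property $\mathbf{1}_{\{T=n\}}=G_n(X_0,\dots,X_n)$.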
collection of all time points at which X visits the level L The earliest one the minimum of that set is the rst hitting time of L In states of the world a E Q in which the level 1 just never get reached ie when n E No Km 1 is an empty set we set Tla oltgti In order to show that T1 is indeed a stopping time we need to construct the decision functions G n 6 NOT Let us start with n 0 We would have T1 0 in the impossible case Last Updated October 9 2008 Lecture 6 RANDOM WALKS ADVANCED METHODS 2of7 X0 1 so we always have 6000 0 How about 11 E N For the value of T1 to be equal to exactly 11 two things must happen a Km 1 the level 1 must actually be hit at time n and b Xvi l Xmg l X1 1 X0 l the level 1 has not been hit before Therefore 1 O 2 for a particular trajectory of a symmetric simple random xo lx1 lxn1 lxn l Gnx0x1xn otherwise The hitting time T2 of the level 1 walk is depicted below 6 4 How about something that is not a stopping time 7 Let no be an arbitrary time7horizon and let TM be the last time during 0 no that the random walk visits its maximum during 0n0 see picture above If you bought a stock at time t 0 had to sell it some time before no and had the ability to predict the future this is one of the points you would choose to sell it at Of course it is impossible to decide whether TM 11 for some n E 0 no 7 1 without the knowledge of the values of the random walk after n More precisely let us sketch the proof of the fact that TM is not a stopping time Suppose to the contrary that it is and let G be the family of decision functions Consider the following two trajectories 01 2 5 n 71 n and 012 5 H 7111 7 2 The differ only in the direction of the last step They also differ in the fact that TM 11 for the rst one and TM 11 7 l for the second one On the other hand by the de nition of the decision functions we have 1TM n71 GnilOCOuwxrkl The right7hand side is equal for both trajectories while the left7hand side equals to O for the rst one and l for the second one A 
contradiction.

6.2 Wald's identity

Having defined the notion of a stopping time, let us try to compute something with it. The random variables (xi_n)_{n in N} in the statement of the theorem below are only assumed to be independent of each other and identically distributed. To make things simpler, you can think of (xi_n)_{n in N} as the increments of a simple random walk. Before we state the main result, here is an extremely useful identity.

Proposition 6.5. Let N be an N0-valued random variable. Then E[N] = sum_{k=1}^infinity P[N >= k].

Proof. Clearly P[N >= k] = sum_{j=k}^infinity P[N = j], so (note what happens to the indices when we switch the sums)

sum_{k=1}^infinity P[N >= k] = sum_{k=1}^infinity sum_{j=k}^infinity P[N = j] = sum_{j=1}^infinity sum_{k=1}^{j} P[N = j] = sum_{j=1}^infinity j P[N = j] = E[N].  QED

Theorem 6.4 (Wald's Identity). Let (xi_n)_{n in N} be a sequence of independent, identically distributed random variables with E[|xi_1|] < infinity. Set X_n = sum_{k=1}^n xi_k, n in N0. If T is an (X_n)_{n in N0}-stopping time such that E[T] < infinity, then E[X_T] = E[xi_1] E[T].

Proof. Here is another way of writing the sum sum_{k=1}^T xi_k:

X_T = sum_{k=1}^T xi_k = sum_{k=1}^infinity xi_k 1_{k <= T}.

The idea behind it is simple: add all the values of xi_k for k <= T, and keep adding zeros (since xi_k 1_{k <= T} = 0 for k > T) after that. Taking the expectation of both sides and switching E and Sigma (this can be justified, but the argument is technical and we omit it here) yields

E[X_T] = sum_{k=1}^infinity E[xi_k 1_{k <= T}].    (6.4)

Let us examine the term E[xi_k 1_{k <= T}] in some detail. We first note that

1_{k <= T} = 1 - 1_{k > T} = 1 - 1_{T <= k-1} = 1 - sum_{j=0}^{k-1} 1_{T = j}.

Therefore,

E[xi_k 1_{k <= T}] = E[xi_k] - sum_{j=0}^{k-1} E[xi_k 1_{T = j}].

By the assumption that T is a stopping time, the indicator 1_{T = j} can be represented as 1_{T = j} = G_j(X_0, X_1, ..., X_j), and because each X_i is just a sum of the increments, we can actually write 1_{T = j} as a function of xi_1, ..., xi_j only, say 1_{T = j} = H_j(xi_1, ..., xi_j). By the independence of xi_1, ..., xi_j from xi_k (because j < k) we have

E[xi_k 1_{T = j}] = E[xi_k H_j(xi_1, ..., xi_j)] = E[xi_k] E[H_j(xi_1, ..., xi_j)] = E[xi_k] P[T = j].

Therefore,

E[xi_k 1_{k <= T}] = E[xi_k] - sum_{j=0}^{k-1} E[xi_k] P[T = j] = E[xi_k] P[T >= k] = E[xi_1] P[T >= k],

where the last equality follows from the fact that all the xi_k have the same distribution. Going back to (6.4), we
get T 00 00 EM E12516 213mm 2 kl El 1l ZlP lT 2 k1 13151113111 k1 k1 k1 Last Updated October 9 2008 Lecture 6 RANDOM WALKS ADVANCED METHODS 4 of 7 where we use Proposition 6 for the last equality D Example 65 Gambler39s ruin problem i A gambler start with x E N dollars and repeatedly plays a game in which he wins a dollar with probability and loses a dollar with probability He decides to stop when one of the following two things happens 1 he goes bankrupt iiei his wealth hits 0 or 2 he makes enough money iiei his wealth reaches some level a gt If The classical quotGambler39s ruin problem asks the following question what is the probability that the gambler will make 1 dollars before he goes bankrupt Gambler s wealth WMHEN is modelled by a simple random walk starting from at whose increments 5k Wk 7 W164 are coin7tossesi Then Wm 95 Xn where Km 22 05k 11 6 NOT Let T be the time the gambler stops We can represent T in two different but equivalent ways On the one hand we can think of T as the smaller of the two hitting times Tx and Ta x of the levels 795 and a 7 x for the random walk XMHENO remember that Wm 95 Xn so these two correspond to the hitting times for the process WMHENO of the levels 0 and a On the other hand we can think of T as the rst hitting time of the two7element set 7xa 7 x for the process X nENOi In either case it is quite clear that T is a stopping time can you write down the decision functions We will see later that the probability that the gambler39s wealth will remain strictly between 0 and a forever is zero so MT lt 00 1 Moreover we will prove that lET lt 00 What can we say about the random variable KT 7 the gambler39s wealth minus I at the random time T Clearly it is either equal to 7x or to a 7 x and the probabilities p0 and pa with which it takes these values are exactly what we are after in this problem We know that since there are no other values XT can take we must have p0 pa 1 Wald39s identity gives us the second equation for p0 and pa 
E[X_T] = E[xi_1] E[T] = 0 * E[T] = 0, so

0 = E[X_T] = p_0 * (-x) + p_a * (a - x).

These two linear equations with two unknowns yield

p_0 = (a - x)/a,  p_a = x/a.

It is remarkable that the two probabilities are proportional to the amounts of money the gambler stands to gain or lose in the two outcomes. The situation is different when p != 1/2.

6.3 The distribution of the first hitting time T1

6.3.1 A recursive formula

Let (X_n)_{n in N0} be a simple random walk with probability p of stepping up. Let T1 = min{n in N0 : X_n = 1} be the first hitting time of the level 1, and let (p_n)_{n in N0} be its pmf, i.e., p_n = P[T1 = n], n in N0. The goal of this section is to use the powerful generating-function methods to find (p_n)_{n in N0}.

You cannot get from 0 to 1 in an even number of steps, so p_{2n} = 0, n in N0. Also, p_1 = p: you just have to go up on the first step. What about n > 1? In order to go from 0 to 1 in n > 1 steps (and not before), the first step needs to be down, and then you need to climb up from -1 to 1 in n - 1 steps. Climbing from -1 to 1 is exactly the same as climbing from -1 to 0 and then climbing from 0 to 1. If it took j steps to go from -1 to 0, it will have to take n - 1 - j steps to go from 0 to 1, where j can be anything from 1 to n - 2 (in order to finish the job in exactly n - 1 steps). So, in formulas, we have

P[T1 = n] = q sum_{j=1}^{n-2} P[exactly j steps to first hit 0 from -1, and exactly n - 1 - j steps to first hit 1 from 0].    (6.2)

Taking j steps from -1 to 0 is exactly the same as taking j steps from 0 to 1, so

P[exactly j steps to first hit 0 from -1] = P[T1 = j] = p_j.

By the same token,

P[exactly n - 1 - j steps to first hit 1 from 0] = P[T1 = n - 1 - j] = p_{n-1-j}.

Finally, I claim that the two events are independent of each other. Indeed, once we have reached 0, the future increments of the random walk behave exactly the same as the increments of a fresh random walk starting from zero: they are independent of the past. A knowledge of everything that happened until the moment the random walk hit 0 for the first time does not change our
perception and estimation of what is going to happen later, in this case the likelihood of hitting 1 in exactly n - 1 - j steps. This property is called the regeneration property (or the strong Markov property) of random walks. More precisely, but still not entirely precisely, we can make the following claim: let (X_n)_{n in N0} be a simple random walk, and let T be any N0-valued stopping time. Define the process (Y_n)_{n in N0} by Y_n = X_{T+n} - X_T. Then (Y_n)_{n in N0} is also a simple random walk, and it is independent of the path of X up to time T. In order to check your understanding, try to convince yourself that the requirement that T be a stopping time is necessary: find an example of a random time T which is not a stopping time for which the statement above fails.

We can now go back to the distribution of the hitting time T1 and use our newly-found independence, together with (6.2), to obtain the following recursion:

p_n = q sum_{j=1}^{n-2} p_j p_{n-1-j} for n > 1, with p_0 = 0, p_1 = p.    (6.5)

6.3.2 The generating-function approach

This is where generating functions step in. We will use (6.5) to derive an equation for the generating function P(s) = sum_{k=0}^infinity p_k s^k. The sum on the right-hand side of (6.5) looks a little bit like a convolution, so let us compare it to the following expansion of the square:

P(s)^2 = sum_{k=0}^infinity (sum_{i=0}^{k} p_i p_{k-i}) s^k.

The inner sum sum_{i=0}^{k} p_i p_{k-i} needs to be split into several parts to get an expression which matches (6.5): for k >= 2,

sum_{i=0}^{k} p_i p_{k-i} = p_0 p_k + sum_{i=1}^{k-1} p_i p_{k-i} + p_k p_0 = sum_{i=1}^{k-1} p_i p_{k-i} = q^{-1} p_{k+1},

because p_0 = 0 and, by (6.5) with n = k + 1, p_{k+1} = q sum_{i=1}^{k-1} p_i p_{k-i}. Therefore, since the coefficients of P(s)^2 start at s^2, we have

q s P(s)^2 = q s sum_{k=2}^infinity q^{-1} p_{k+1} s^k = sum_{k=2}^infinity p_{k+1} s^{k+1} = P(s) - p_1 s = P(s) - p s,

which is nothing but a quadratic equation for P(s). It admits two solutions for each s:

P(s) = (1 +/- sqrt(1 - 4pq s^2)) / (2qs).

One of the two solutions is always greater than 1 in absolute value, so it cannot correspond to a value of a generating function, and we conclude that

P(s) = (1 - sqrt(1 - 4pq s^2)) / (2qs), for |s| <= 1/(2 sqrt(pq)).

[Footnote: A demanding reader will object at this point, since my definitions of the two events are somewhat loose. I beg forgiveness.]

It
remains to extract the information about (p_n)_{n in N0} from P. The obvious way to do it is to compute higher and higher derivatives of P at s = 0. There is an easier way, though. The square root appearing in the formula for P is an expression of the form (1 + x)^alpha, and the generalized binomial formula can be used:

(1 + x)^alpha = sum_{k=0}^infinity C(alpha, k) x^k, where C(alpha, k) = alpha (alpha - 1) ... (alpha - k + 1) / k!, k in N, alpha in R.

Therefore,

sqrt(1 - 4pq s^2) = sum_{k=0}^infinity C(1/2, k) (-4pq)^k s^{2k},

and so

P(s) = (1 - sqrt(1 - 4pq s^2)) / (2qs) = -(1/(2q)) sum_{k=1}^infinity C(1/2, k) (-4pq)^k s^{2k-1},

which gives

p_{2k-1} = -(1/(2q)) C(1/2, k) (-4pq)^k, k in N.

This expression can be simplified a bit further: the formula for C(1/2, k) evaluates to

C(1/2, k) = (-1)^{k-1} / (k 2^{2k-1}) * C(2k-2, k-1).

Thus

p_{2k-1} = (1/k) C(2k-2, k-1) p^k q^{k-1}, k in N.

This last expression might remind you of something related to the reflection principle. And it is! Can you derive the formula for p_{2k-1} from the reflection principle? How would you deal with the fact that the random walk here is not symmetric?

6.3.3 Do we actually hit 1 sooner or later?

What happens if we try to evaluate P(1)? We should get 1, right? In fact, what we get is the following:

P(1) = (1 - sqrt(1 - 4pq)) / (2q) = (1 - |p - q|) / (2q) = 1 if p >= 1/2, and p/q if p < 1/2.

Clearly, P(1) < 1 when p < q. The explanation is simple: the random walk may fail to hit the level 1 at all if p < q. In that case P(1) = sum_{k=0}^infinity p_k = P[T1 < infinity] < 1, or, equivalently, P[T1 = infinity] > 0. It is remarkable that if p = 1/2 the random walk will always hit 1 sooner or later, but this does not need to happen if p < 1/2. What we have here is an example of a phenomenon known as criticality: many physical systems exhibit qualitatively different behavior depending on whether the value of a certain parameter p lies above or below a certain critical value p_c; here p_c = 1/2.

6.3.4 Expected time until we hit 1

Another question that generating functions can help us answer is the following one: how long, on average, do we need to wait before 1 is hit? When p < 1/2, P[T1 = infinity] > 0, so we can immediately conclude that E[T1] = infinity. The case p >= 1/2 is more interesting. Following the recipe from the lecture on generating functions, we compute the derivative of P(s) and get

P'(s) = (1 - sqrt(1 - 4pq s^2)) /
(2q s^2 sqrt(1 - 4pq s^2)).

When p = 1/2 we get

P'(s) = (1 - sqrt(1 - s^2)) / (s^2 sqrt(1 - s^2)) -> infinity as s -> 1,

and conclude that E[T1] = infinity. For p > 1/2 the situation is less severe:

lim_{s -> 1} P'(s) = 1/(p - q).

We can summarize the situation in the following table:

           P[T1 < infinity]    E[T1]
p < 1/2    p/q                 infinity
p = 1/2    1                   infinity
p > 1/2    1                   1/(p - q)

M 362K, Fall 03: RANDOM VARIABLES

In calculus you learned things like: "If an object is tossed straight up from initial height h_0 with initial velocity v_0, then its height at
time t seconds after being released is h(t) = h_0 + v_0 t - (g/2) t^2." If you read the book carefully and/or listened carefully in lecture, you might have heard a qualification like "if you ignore air resistance." In reality, air resistance can be important; in most cases it results in a "terminal velocity" which the object cannot exceed. However, finding an equation of motion taking air resistance into account presents real problems, since the air resistance depends on the mass and also the shape of the object. And if the object is a parachutist, their shape may be changing as they fall. In fact, can you really know the initial velocity exactly? The initial height? And what if there's wind?

As this example shows, in real life we often don't have deterministic (that is, exact) formulas like h(t) = h_0 + v_0 t - (g/2) t^2. Instead, we have to deal with approximations and uncertainty. So we need stochastic (that is, probabilistic) methods. A key idea in probabilistic methods is the idea of a random variable. The height of our real falling object can be considered as a random variable: we may be able to find a formula (taking into account air resistance) that will give an approximate description of the object's motion, but there are still uncertain factors, such as wind and our ability to measure the initial velocity or determine the effect of air resistance exactly.

One way to define a random variable is: a variable that depends on a random process. To help understand this idea, and the related idea of a random process, here are some examples.

1. Toss a die and look at what number is on the side that lands up. Tossing the die is an example of a random process; the number on top is the random variable.

2. Toss two dice and take the sum of the numbers that land up. Tossing the dice is the random process; the sum is the random variable.

3. Toss two dice and take the product of the numbers that land up. Tossing the dice is the random process; the product is the random variable.

Examples 2 and 3 together show that the same random process can
be involved in two different random variables.

4. Randomly pick a UT student and measure their height. Picking the student is the random process; their height is the random variable.

5. Randomly pick a student in this class and measure their height. Picking the student is the random process; their height is the random variable.

Examples 4 and 5 illustrate that using the same variable (height) but different random processes gives different random variables.

6. Measure the height of the third student who walks into this class. What's the random process? In Example 5, the random process was done deliberately; in Example 6, the random process is one that occurs naturally. Can you explain how the different random processes make these two random variables different?

7. Toss a coin and see whether it comes up heads or tails. Tossing the coin is the random process; the variable is "heads or tails." This example shows that a random variable doesn't necessarily have to take on numerical values. However, in this class we will mainly study random variables that take on numerical values. Many non-numerical random variables can be turned into numerical random variables; in this example, we could say the variable has the value 0 if the coin shows heads and the value 1 if the coin shows tails.

8. The time it takes for an IF shuttle bus to get from 45th and Speedway to the Dean Keeton stop is a random variable. "Whoa," you may say, "where's the random process?" I've given this example this way precisely because random variables are often defined in this way. The random process here is "implicit," at least for those used to defining random variables in this way. What is really meant is: "Randomly pick an IF shuttle bus run and measure the time it takes to get from 45th and Speedway to the Dean Keeton stop." So the random process is picking the shuttle bus run, and the random variable is the time measured.

9. (a) The height, t minutes after its release, of an object tossed straight up from initial height h_0 with
initial velocity v_0. (b) The height we measure, t minutes after its release, of an object tossed straight up from initial height h_0 with initial velocity v_0. (c) The formula we obtain, taking everything we can into account, for the height t minutes after its release of an object tossed straight up from initial height h_0 with initial velocity v_0. How are these three random variables different? What are the random processes involved in each? (Hint: they're complex.)

These examples should give you some feel for what a random variable is, but they only make a start at answering the question: what is a random process? In particular, what do we mean by "random"? My dictionary gives as the first definition of random: "Having no specific pattern or objective; haphazard." This is NOT the meaning of random that we will use in this class. We will use a more technical meaning, which you will come to understand gradually. For starters:

A. We consider a process such as tossing a die or a coin to be random.

B. When we talk about randomly picking a UT student, or randomly picking an IF shuttle run, we mean using a process that gives each possible UT student (or IF shuttle run) an equal chance of being chosen. We can imagine (but only imagine), for example, numbering all UT students 1, 2, 3, etc., and having a huge die with as many sides as there are UT students. Tossing the die and taking the student whose number came up would be a way of randomly picking a UT student. In practice, random selections such as this are made by replacing our imaginary huge die with a computer program, called a pseudo-random number generator (or random number generator for short), that is designed to give essentially the same result.

These examples, however, might lead one to believe that a random process always has to give every outcome an equal chance of happening. This is not the case. We could imagine, for example, a "loaded" (or "biased") die that was made so that one of the sides came up more frequently than the others. We would still consider
tossing this die to be a random process. One important aspect of a random process is that, although there may be (and usually is) a pattern in the long run, there is no way of knowing in advance the result of one occurrence of the process. In other words, the result of one occurrence of a random process is uncertain, but we can (at least in theory) say something about the long-term behavior, that is, what happens over many, many occurrences of the process.

Examples of random variables that people are interested in studying include the following (using the convention mentioned in Example 8):

- The time to breakdown of a computer.
- The yield per acre of a field of wheat.
- The birth weight of a child.
- The length of time a person lives.
- The Dow Jones Index.
- The concentration of ozone in the air.
- How much time is required to read a disk sector into main memory.

One way to give information about (that is, to say something about the long-term behavior of) a random variable is to show its distribution. To begin, we will talk about what are called empirical distributions. That means that they are based on collecting some actual examples of values of the random variable and plotting them in a histogram.

Example. Here is a histogram that gives the empirical distribution of the heights of students in a particular beginning statistics class. (Figure 1.) The first bar shows the number of people whose height is between 58 and 59.5 inches, inclusive; the second bar, the number of people whose height is between 60 and 61.5 inches; and so on. Please note: there are lots of different conventions for drawing histograms, so please use caution in interpreting them. Based on this histogram, and what you know about people's heights, what would you guess the proportion of males and females in this class to be? How would you expect the histogram to differ if that proportion were different?

Although frequency histograms such as the one above are used a lot in statistics, for probability we usually use a density histogram:

- In a frequency histogram, the bar
above each interval shows the number of values in the interval.
- In a density histogram, the bar above each interval shows the proportion of values in the interval.

Here is a density histogram for the heights of students in the same class. (Figure 2: density histogram of the same heights; horizontal axis "ht" from 59 to 75 inches, vertical axis density, ticks at 0.05 and 0.10.) What is the same about the two histograms? Why?
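The distinction between the two kinds of histogram can be sketched in a few lines of code. (This is a hypothetical helper, not part of the handout; the heights and bin edges below are made up for illustration.)

```python
def density_histogram(values, edges):
    """Proportion of values falling in [edges[i], edges[i+1]) for each bin.

    A frequency histogram would return `counts` instead; dividing every
    count by the number of values turns it into a density histogram,
    so the bar heights sum to 1.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    n = len(values)
    return [c / n for c in counts]

# made-up class heights in inches, and bin boundaries
heights = [62, 63, 65, 66, 66, 67, 68, 68, 69, 70, 71, 73]
edges = [58, 62, 66, 70, 74]

print(density_histogram(heights, edges))   # [0.0, 0.25, 0.5, 0.25]
```

Note that the proportions always sum to 1 (provided every value lands in some bin), which is what makes a density histogram directly comparable across classes of different sizes.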
Lecture 8: MARKOV CHAINS

Course: M 362K (Stochastic Processes I). Term: Spring 2008. Instructor: Gordan Zitkovic.

8.1 The Markov property

Simply put, a stochastic process has the Markov property if its future evolution depends only on its current position, not on how it got there. Here is a more precise mathematical definition. It will be assumed throughout this course that any stochastic process (X_n)_{n in N0} takes values in a countable set S, the state space. Usually, S will be either N0 (as in the case of branching processes) or Z (random walks). Sometimes a more general, but still countable, state space S will be needed. A generic element of S will be denoted by i or j.

Definition 8.1. A stochastic process (X_n)_{n in N0} taking values in a countable state space S is called a Markov chain (or said to have the Markov property) if

P[X_{n+1} = i_{n+1} | X_n = i_n, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] = P[X_{n+1} = i_{n+1} | X_n = i_n],    (8.1)

for all n in N0 and all i_0, i_1, ..., i_n, i_{n+1} in S, whenever the two conditional probabilities are well defined, i.e., when P[X_n = i_n, ..., X_1 = i_1, X_0 = i_0] > 0.

The Markov property is typically checked in the following way: one computes the left-hand side of (8.1) and shows that its value does not depend on i_{n-1}, i_{n-2}, ..., i_1, i_0 (why is that enough?). A condition of the form P[X_n = i_n, ..., X_0 = i_0] > 0 will be assumed, without explicit mention, every time we write a conditional expression like the one in (8.1). All chains in this course will be time-homogeneous, i.e., the transition probabilities will not depend on the current time n in N0: P[X_{n+1} = j | X_n = i] = P[X_{m+1} = j | X_m = i] for all m, n in N0.

Markov chains are relatively easy to work with because the Markov property allows us to compute all the probabilities, expectations, etc. we might be interested in by using only two ingredients:

1. The initial distribution a^(0) = (a^(0)_i, i in S), a^(0)_i = P[X_0 = i], the initial probability distribution of the process, and

2. The transition probabilities p_ij = P[X_{n+1} = j | X_n = i], the mechanism that the
process uses to jump around. Indeed, if one knows all the a^(0)_i and all the p_ij and wants to compute a joint distribution P[X_n = i_n, X_{n-1} = i_{n-1}, ..., X_0 = i_0], one needs to use the definition of conditional probability and the Markov property several times (the multiplication theorem from your elementary probability course) to get

P[X_n = i_n, ..., X_0 = i_0]
  = P[X_n = i_n | X_{n-1} = i_{n-1}, ..., X_0 = i_0] P[X_{n-1} = i_{n-1}, ..., X_0 = i_0]
  = p_{i_{n-1} i_n} P[X_{n-1} = i_{n-1}, ..., X_0 = i_0].

Repeating the same procedure, we get

P[X_n = i_n, ..., X_0 = i_0] = p_{i_{n-1} i_n} * p_{i_{n-2} i_{n-1}} * ... * p_{i_0 i_1} * a^(0)_{i_0}.

(Last Updated: October 28, 2008.)

When S is finite, there is no loss of generality in assuming that S = {1, 2, ..., n}, and then we usually organize the entries of a^(0) into a row vector a^(0) = (a^(0)_1, a^(0)_2, ..., a^(0)_n), and the transition probabilities p_ij into a square matrix P, where

P = ( p_11 p_12 ... p_1n
      p_21 p_22 ... p_2n
      ...
      p_n1 p_n2 ... p_nn ).

In the general case (S possibly infinite) one can still use the vector and matrix notation as before, but it becomes quite clumsy. For example, if S = Z, P is a doubly-infinite matrix

P = ( ...
      ... p_{-1,-1} p_{-1,0} p_{-1,1} ...
      ... p_{0,-1}  p_{0,0}  p_{0,1}  ...
      ... p_{1,-1}  p_{1,0}  p_{1,1}  ...
      ... ).

8.2 Examples

Here are some examples of Markov chains; for each one, we write down the transition matrix. The initial distribution is sometimes left unspecified because it does not really change anything.

1. Random walks. Let (X_n)_{n in N0} be a simple random walk. Let us show that it indeed has the Markov property (8.1). Remember first that X_n = sum_{k=1}^n xi_k, where the xi_k are independent coin tosses. For a choice of i_0, ..., i_{n+1} such that i_0 = 0 and i_{k+1} - i_k = +/-1 for all k, we have

P[X_{n+1} = i_{n+1} | X_n = i_n, ..., X_1 = i_1, X_0 = i_0]
  = P[X_n + xi_{n+1} = i_{n+1} | X_n = i_n, ..., X_1 = i_1, X_0 = i_0]
  = P[xi_{n+1} = i_{n+1} - i_n | X_n = i_n, ..., X_1 = i_1, X_0 = i_0]
  = P[xi_{n+1} = i_{n+1} - i_n],

where the last equality follows from the fact that the increment xi_{n+1} is independent of the previous increments, and therefore of the values of X_1, X_2, ..., X_n. The last line above does not depend on i_{n-1}, ..., i_1, i_0, so X indeed has the Markov property. The state space S of (X_n) is the set Z of all integers, and the initial distribution is very simple: we start at 0 with probability 1, so that a^(0) = delta_0, which is simple to write down
down nENO l and a O for i 0 The transition probabilities are p j i l Pif q j i 7 l 0 otherwise Last Updated October 28 2008 Lecture 8 MARKOV CHAINS 5 of 10 These can be written down in an in nite matrix ooooaow oooaotgw ooa ouow oa ouoow gt ouooow but it does not help our understanding much 2 Branching processes Let anneNo be a simple Branching process with the branching distribution pnhENd As you surely remember it is constructed as follows X0 1 and XHH quot1Xnk where XnkhENOkEN is a family of independent random variables with distribution pm dif cult to show that XMHENO is a Markov chain Plxml in1Xn iannrl nilH er firXO fol nENOi It is now not very Xquot Plzxnk in1Xn iannrl nilH er firXO fol k1 Plzxnk in1Xn iannrl nilH er firXO fol k1 in1lr where just like in the random walk case the last equality follows from the fact that the random vari ables Xnk k E N are independent of all ka m lt n k E N In particular they are independent of XnXn1i i i X1X0 which are obtained as combinations of Xmlk m lt n k E N The computation above also reveals the structure of the transition probabilities pi i j E S No Pi Plzxnk Jl k 1 There is little we can do to make the expression above more explicit but we can remember generating functions and write 1315 Zfoo p175f remember that each row of the transition matrix is a probability distribution Thus 1315 Psl why7 where Ps Zzoopksk is the generating function of the branching probability Analogously to the random walk case we have 0 1 i 1 a 1 O i 1 I Gambler s ruini ln Gambler s ruin a gambler starts with 95 where 0 g x g a E N and in each play wins a dollar with probability p E O 1 and loses a dollar with probability q l 7 p When the gambler reaches either 0 or a the game stops The transition probabilities are similar to those of a random walk but differ from them at the boundaries 0 and L The state space is nite S 01 l i i a and the matrix Last Updated October 28 2008 Lecture 8 MARKOV CHAINS 4 of 10 P is therefore given by 1 O O O O O 0 
  q 0 p 0 ... 0 0
  0 q 0 p ... 0 0
  ...
  0 0 0 ... q 0 p
  0 0 0 ... 0 0 1 ),

where the first and last rows say that the states 0 and a are absorbing. The initial distribution is deterministic: a^(0)_x = 1 and a^(0)_i = 0 for i != x.

4. Regime switching. Consider a system with two different states; think of a simple weather forecast (rain/no rain), a high/low water level in a reservoir, a high/low volatility regime in a financial market, a high/low level of economic growth, etc. Suppose that the states are called 0 and 1, and that the probabilities p_01 and p_10 of switching states are given. The probabilities p_00 = 1 - p_01 and p_11 = 1 - p_10 correspond to the system staying in the same state. The transition matrix for this Markov chain with S = {0, 1} is

P = ( p_00 p_01
      p_10 p_11 ).

When p_01 and p_10 are large (close to 1), the system nervously jumps between the two states. When they are small, there are long periods of stability, i.e., of staying in the same state.

5. Deterministically monotone Markov chain. A stochastic process (X_n)_{n in N0} with state space S = N0 such that X_n = n for n in N0 (no randomness here!) is called a deterministically monotone Markov chain (DMMC). The transition matrix looks something like this:

P = ( 0 1 0 0 ...
      0 0 1 0 ...
      0 0 0 1 ...
      ... ).

6. Not a Markov chain. Consider a frog jumping from lotus leaf to lotus leaf on a small forest pond. Suppose that there are N leaves, so that the state space can be described as S = {1, 2, ..., N}. The frog starts on leaf 1 at time n = 0 and jumps around in the following fashion: at time 0 it chooses any leaf except for the one it is currently sitting on (with equal probability) and jumps to it; at each time n > 0 it chooses any leaf other than the one it is sitting on and the one it visited immediately before (with equal probability) and jumps to it. The position (X_n)_{n in N0} of the frog is not a Markov chain. Indeed, we have

P[X_3 = 1 | X_2 = 2, X_1 = 3] = 1/(N-2), while P[X_3 = 1 | X_2 = 2, X_1 = 1] = 0.

A more dramatic version of this example would be the one where the frog remembers all the leaves it had visited before and only chooses among the remaining ones for the next jump.

7. Making a non-
Markov chain into a Markov chain. How can we turn the process of Example 6 into a Markov chain? Obviously, the problem is that the frog has to remember the number of the leaf it came from in order to decide where to jump next. The way out is to make this information a part of the state. In other words, we need to change the state space. Instead of just S = {1, 2, ..., N}, we set S = {(i, j) : i, j in {1, 2, ..., N}}. In words, the state of the process will now contain not only the number of the current leaf (i.e., i) but also the number of the leaf we came from (i.e., j). There is a bit of freedom with the initial state, but we simply assume that we start from (1, 1). Starting from the state (i, j), the frog can jump to any state of the form (k, i), k != i, k != j, with equal probabilities. Note that some states will never be visited (like (i, i) for i != 1), so we could have reduced the state space a little bit right from the start.

8. A more complicated example. Let (X_n)_{n in N0} be a simple symmetric random walk. The absolute-value process Y_n = |X_n|, n in N0, is also a Markov chain; this process is sometimes called the reflected random walk. In order to establish the Markov property, we let i_0, ..., i_{n+1} be non-negative integers with |i_{k+1} - i_k| = 1 for all 0 <= k <= n (the state space is S = N0). We need to show that the conditional probability

P[|X_{n+1}| = i_{n+1} | |X_n| = i_n, ..., |X_0| = i_0]

does not depend on i_{n-1}, ..., i_0. Write B = {|X_n| = i_n, |X_{n-1}| = i_{n-1}, ..., |X_0| = i_0}, and assume that i_n > 0 (the case i_n = 0 is easier and is left to the reader who needs practice). Let us use the law of total probability (see Problem 1 in HW 6) with A_1 = {X_n = i_n} and A_2 = {X_n = -i_n}. Since A_2 intersected with B equals {X_n = -i_n} intersected with B, we have

P[|X_{n+1}| = i_{n+1} | B]
  = P[|X_{n+1}| = i_{n+1} | B, X_n = i_n] P[X_n = i_n | B]
  + P[|X_{n+1}| = i_{n+1} | B, X_n = -i_n] P[X_n = -i_n | B].

Conditionally on X_n = i_n, the probability that |X_{n+1}| = i_{n+1} does not depend on the extra information B might contain. Therefore P[|X_{n+1}| = i_{n+1} | B, X_n = i_n] = P[|X_{n+1}| = i_{n+1} | X_n = i_n], and similarly P[|X_{n+1}| = i_{n+1} | B, X_n = -i_n] =
P[|X_{n+1}| = i_{n+1} | X_n = -i_n]. How about the term P[X_n = i_n | B] (the term P[X_n = -i_n | B] being completely analogous)? If we show that

P[X_n = i_n | B] = P[X_n = -i_n | B] = 1/2,    (8.4)

we will be making great progress. There is a rigorous way of doing this, but it is quite technical and not very illuminating. The idea is simple, though: for every path (0, X_1(w), ..., X_n(w)), the flipped path (0, -X_1(w), ..., -X_n(w)) is equally likely, and it gives exactly the same sequence of absolute values. Therefore, the knowledge of B does not permit us to distinguish between them. In particular, for every possible sequence of absolute values i_0 = 0, i_1, ..., i_n, there are as many actual paths (x_0, x_1, ..., x_n) that end up at i_n as those that end up at -i_n. Therefore, P[X_n = i_n | B] = P[X_n = -i_n | B] = 1/2.

Going back to the initial expression and using (8.4), for i_{n+1} > 0 we have

P[X_{n+1} = i_{n+1} | B] = (1/2) P[X_{n+1} = i_{n+1} | B, X_n = i_n] + (1/2) P[X_{n+1} = i_{n+1} | B, X_n = -i_n].

Given that i_n > 0, the first conditional probability equals P[X_{n+1} = i_{n+1} | X_n = i_n] = 1/2, because of the Markov property of X (as far as X_{n+1} is concerned, the information in B together with {X_n = i_n} is the same as just {X_n = i_n}). The second conditional probability is 0: it is impossible for X_{n+1} to be equal to i_{n+1} > 0 if X_n = -i_n < 0. Therefore P[X_{n+1} = i_{n+1} | B] = 1/4. An essentially identical argument shows that P[X_{n+1} = -i_{n+1} | B] = 1/4. Therefore,

P[|X_{n+1}| = i_{n+1} | B] = 1/4 + 1/4 = 1/2,

which is independent of i_{n-1}, ..., i_1, i_0. (It looks like it is also independent of i_n, but it is not: this probability equals 0 unless i_{n+1} = i_n +/- 1.)

9. A more realistic example. In a game of tennis, the scoring system is as follows. Both players (let us call them Amelie and Bjorn) start with a score of 0. Each time Amelie wins a point, her score moves a step up in the following hierarchy: 0 -> 15 -> 30 -> 40. Once Amelie reaches 40 and scores a point, three things can happen: (1) if Bjorn's score is 30 or less, Amelie wins the game; (2) if Bjorn's score is 40, Amelie's score moves up to "advantage"; and (3) if
Bgorns score is qdvgntage qA f4 nothing happens to Amelie s score mu q Biamwms but Bjorn39s score falls back to 40 I I g n Flnally 1f Amehes score 15 advantage Figure Markov chains with a nite number of states are and She Wins a mint She Wins the game usually represented by directed graphs like the one in the The situation is entirely symmetric for gure above The nodes are states two states 17 are linked Bj rn We suppose that the probability that by a directed edge if the transition probability pg is non Amelie wins each point is p 6 01 inde zero and the number pg is writen above the link If pg 0 pendently of the current score 39 no edge is drawn A situation like this is a typical example of a Markov chain in an applied setting What are the states of the process We obviously need to know both players39 scores and we also need to know if one of the players has won the game Therefore a possible state space is the following S quotAmelie wins quotBjorn wins 0 00150 50 0 40 15 0151515 50 15 40 85 50 0 30 15 50 50 50 40 40 0 40 15 40 50 40 40 40 Adv Adv 40 It is not hard to assign probabilities to transitions between states Once we reach either quotAmelie wins or quotBjorn wins the game stops We can assume that the chain remains in that state forever ie the state is absorbing The initial distribution is quite simple we aways start from the same state 00 so that 180 1 and a 0 for all i e S 0 How about the transition matrix 7 When the number of states is big 5 20 in this case transition matrices are useful in computer memory but not so much on paper lust for the fun of it here is the Last Updated October 28 2008 Lecture 8 MARKOV CHAINS 7 of 10 transition matrix for our game of tennis chain with the states ordered as in 85 l O O O O O O O O O O O O O O O O O O O O l O O O O O O O O O O O O O O O O O O O O 0 q 0 O p O O O O O O O O O O O O O O O O 0 q 0 O p O O O O O O O O O O O O O O O O 0 q 0 O p O O O O O O O O O O O 0 q 0 O O O O O O p O O O O O O O O O O O O O O O O 0 q 
transition matrix for our game-of-tennis chain, with the states ordered as in (8.5):

[a 20 x 20 matrix whose entries are 0, 1, p and q = 1 - p; each row corresponding to a non-absorbing state has exactly two non-zero entries, p and q, and the two rows corresponding to "Amelie wins" and "Bjorn wins" each have a single 1 on the diagonal. The matrix itself does not survive reproduction here.]

Here is a question we will learn how to answer later.

Question 8.2. Does the structure of a game of tennis make it easier or harder for the better player to win? In other words, if you had to play against Roger Federer (I am rudely assuming that he is better than you), would you have a better chance of winning if you only played a point, or if you actually played the whole game?

8.5 Chapman-Kolmogorov relations

The transition probabilities p_ij, i, j in S, tell us how a Markov chain jumps from a state to a state in one step. How about several steps, i.e., how does one compute probabilities like P[X_{k+n} = j | X_k = i], n in N? Since we are assuming that all of our chains are homogeneous (transition probabilities do not change with time), this probability does not depend on the time k, and we set

p^(n)_ij = P[X_{k+n} = j | X_k = i] = P[X_n = j | X_0 = i].

It is sometimes useful to have a more compact notation for this last conditional probability, so we write P_i[A] = P[A | X_0 = i] for any event A. Therefore, p^(n)_ij = P_i[X_n = j]. For n = 0 we clearly have

p^(0)_ij = 1 if i = j, and p^(0)_ij = 0 if i != j.

Once we have defined the multi-step transition probabilities p^(n)_ij, i, j in S, n in N_0, we need to be able to compute them. This computation is central in various applications of Markov chains: it relates the small-time (one-step) behavior, which is usually easy to observe and model, to the long-time (multi-step) behavior, which is really of interest.
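Before developing that theory, the tennis question above (Question 8.2) can already be explored by simulation. The sketch below plays many independent games point by point; the function and parameter names are mine, and it uses the fact that the scoring rules above are equivalent to "first player to reach 4 points with a margin of at least 2 wins" (an assumption worth checking against the rules, not something stated in the notes):

```python
import random

def play_game(p, rng):
    """Play one game of tennis; return True if Amelie wins.

    a and b count points won by Amelie and Bjorn; the 0/15/30/40/Adv
    bookkeeping of the chain reduces to 'first to 4, margin >= 2'."""
    a, b = 0, 0
    while True:
        if rng.random() < p:
            a += 1
        else:
            b += 1
        if a >= 4 and a - b >= 2:
            return True
        if b >= 4 and b - a >= 2:
            return False

rng = random.Random(0)
p = 0.55            # probability that Amelie wins a single point
n = 100_000
wins = sum(play_game(p, rng) for _ in range(n))
print(wins / n)     # noticeably larger than p itself
```

The estimate comes out well above p, which already suggests the answer to Question 8.2: the game structure amplifies the better player's edge.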
Before we state the main result in this direction, let us remember how matrices are multiplied. When A and B are n x n matrices, the product C = AB is also an n x n matrix, and its (i,j) entry C_ij is given by

C_ij = sum_{k=1}^n A_ik B_kj.

There is nothing special about finiteness in the above definition. If A and B were infinite matrices, A = (A_ij), i,j in S, and B = (B_ij), i,j in S, for some countable set S, the same procedure could be used to define C = AB. Indeed, C will also be an S x S matrix, and

C_ij = sum_{k in S} A_ik B_kj,

as long as the infinite sum above converges absolutely. In the case of a typical transition matrix P, convergence will not be a problem, since P is a stochastic matrix, i.e., it has the following two properties (why?):
1. p_ij >= 0 for all i, j in S, and
2. sum_{j in S} p_ij = 1 for all i in S; in particular, p_ij in [0,1] for all i, j.
When P = (p_ij), i,j in S, and P' = (p'_ij), i,j in S, are two S x S stochastic matrices, the series sum_{k in S} p_ik p'_kj converges absolutely, since 0 <= p'_kj <= 1 for all k in S, and so

sum_{k in S} |p_ik p'_kj| <= sum_{k in S} p_ik = 1, for all i, j in S.

Moreover, the product C of two stochastic matrices A and B is always a stochastic matrix: the entries of C are clearly non-negative, and, by Tonelli's theorem,

sum_{j in S} C_ij = sum_{j in S} sum_{k in S} A_ik B_kj = sum_{k in S} sum_{j in S} A_ik B_kj = sum_{k in S} A_ik ( sum_{j in S} B_kj ) = sum_{k in S} A_ik = 1.

Proposition 8.5. Let P^n be the n-th matrix power of the transition matrix P. Then p^(n)_ij = (P^n)_ij, for i, j in S.

Proof. We proceed by induction. For n = 1, the statement follows directly from the definition of the matrix P. Supposing that p^(n)_ij = (P^n)_ij for all i, j, we have

p^(n+1)_ij = P[X_{n+1} = j | X_0 = i]
= sum_{k in S} P[X_n = k | X_0 = i] P[X_{n+1} = j | X_n = k, X_0 = i]
= sum_{k in S} P[X_n = k | X_0 = i] P[X_{n+1} = j | X_n = k]
= sum_{k in S} P[X_n = k | X_0 = i] P[X_1 = j | X_0 = k]
= sum_{k in S} p^(n)_ik p_kj,

where the second equality follows from the law of total probability, the third one from the Markov property, and the fourth one from homogeneity. The last sum above is nothing but the expression for the matrix product of P^n and P, and so we have proven the induction step. []

Using Proposition 8.5, we can write a simple expression for the distribution of the random variable X_n, for n in N_0. Remember that the initial distribution (the distribution of X_0) is denoted by a^(0) = (a^(0)_i, i in S). Analogously, we define the vector a^(n) = (a^(n)_i, i in S) by a^(n)_i = P[X_n
= i], i in S. Using the law of total probability, we have

a^(n)_j = P[X_n = j] = sum_{k in S} P[X_0 = k] P[X_n = j | X_0 = k] = sum_{k in S} a^(0)_k p^(n)_kj.

We usually interpret a^(0) as a row vector, so the above relationship can be expressed using vector-matrix multiplication: a^(n) = a^(0) P^n. The following corollary shows a simple, yet fundamental, relationship between the different multi-step transition probabilities p^(n)_ij.

Corollary 8.4 (Chapman-Kolmogorov relations). For n, m in N_0 and i, j in S, we have

p^(n+m)_ij = sum_{k in S} p^(n)_ik p^(m)_kj.

Proof. The statement follows directly from the matrix equality P^(n+m) = P^n P^m. []

It is usually difficult to compute P^n for a general transition matrix P and a large n. We will see later that it will be easier to find the limiting values lim_{n -> infinity} p^(n)_ij. In the meantime, here is a simple example where the computation can be done by hand.

Example 8.5. In the setting of a Regime-Switching chain (Example 4), let us write a for p_01 and b for p_10 to simplify the notation, so that the transition matrix looks like this:

P = [ 1-a    a  ]
    [  b    1-b ]

The characteristic equation det(lambda I - P) = 0 of the matrix P is

0 = det(lambda I - P) = (lambda - (1-a))(lambda - (1-b)) - ab = (lambda - 1)(lambda - (1 - a - b)).

The eigenvalues are therefore lambda_1 = 1 and lambda_2 = 1 - a - b. The eigenvectors are v_1 = (1, 1)^T and v_2 = (a, -b)^T, so that with

V = [ 1    a ]        D = [ 1        0     ]
    [ 1   -b ]   and      [ 0    1 - a - b ]

we have PV = VD, i.e., P = V D V^{-1}. This representation is very useful for taking matrix powers:

P^n = (V D V^{-1})(V D V^{-1}) ... (V D V^{-1}) = V D^n V^{-1}, where D^n = [ 1, 0 ; 0, (1-a-b)^n ].

Assuming a + b > 0 (i.e., P != I), we have V^{-1} = (1/(a+b)) [ b, a ; 1, -1 ], and

P^n = V D^n V^{-1} = (1/(a+b)) [ b, a ; b, a ] + ((1-a-b)^n/(a+b)) [ a, -a ; -b, b ].

The expression for P^n above tells us a lot about the structure of the multi-step probabilities p^(n)_ij for large n. Note that the second matrix on the right-hand side comes multiplied by (1-a-b)^n, which tends to 0 as n -> infinity, unless we are in one of the uninteresting situations a = b = 0 or a = b = 1. Therefore,

P^n ~ (1/(a+b)) [ b, a ; b, a ] for large n.

The fact that the rows of the right-hand side above are equal points to the fact that, for large n, p^(n)_ij does not depend much on the initial state i.
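The closed-form expression for P^n in Example 8.5 is easy to check numerically; here is a quick sketch (the values a = 0.3, b = 0.1 are my own arbitrary choices):

```python
import numpy as np

a, b = 0.3, 0.1
P = np.array([[1 - a, a],
              [b, 1 - b]])

n = 50
Pn = np.linalg.matrix_power(P, n)   # brute-force n-th matrix power

# the closed form from Example 8.5
lam = 1 - a - b
closed = (np.array([[b, a], [b, a]])
          + lam**n * np.array([[a, -a], [-b, b]])) / (a + b)

print(np.allclose(Pn, closed))   # True
print(Pn.round(6))               # both rows are (almost) (b, a)/(a+b)
```

For n = 50 the factor (1-a-b)^n = 0.6^50 is already of order 1e-12, so both rows of P^n agree with (b, a)/(a+b) = (0.25, 0.75) to many decimal places, illustrating how the chain forgets its initial state.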
In other words, this Markov chain forgets its initial condition after a long period of time. This is a rule more than an exception, and we will study such phenomena in the following lectures.

Lecture 3: STOCHASTIC PROCESSES AND SIMULATION (1 of 7). Course: M362K Stochastic Processes I. Term: Spring 2008. Instructor: Gordan Zitkovic.

Lecture 3 STOCHASTIC PROCESSES AND SIMULATION

3.1 Stochastic Processes

Definition 3.1. Let T be a subset of [0, infinity). A family of random variables (X_t), t in T, indexed by T, is called a stochastic (or random) process. When T = N or T = N_0, (X_t), t in T, is said to be a discrete-time process, and when T = [0, infinity) it is called a continuous-time process.

When T is a singleton, say T = {1}, the process is really just a single random variable X_1. When T is finite, e.g. T = {1, 2, ..., n}, we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different: while the components of a random vector usually (not always) stand for different spatial coordinates, the index t in T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, infinity) (continuous-time processes), the value of the process can change every instant; when T = N (discrete-time processes), the changes occur discretely. In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density or a probability mass function for a stochastic process. Without going into the details of why exactly this is a problem, let me just mention that the main culprit is the infinity. One usually considers the family of finite-dimensional distributions instead, i.e., the joint distributions of the random vectors (X_{t_1}, X_{t_2}, ..., X_{t_n}), for all n in N and all choices t_1, ..., t_n in T. The notion of a stochastic process is very important both in mathematical theory and in its applications in science, engineering, economics, etc. It is used to model a large number of various phenomena where the quantity of
interest varies discretely or continuously through time, in a non-predictable fashion.

Every stochastic process can be viewed as a function of two variables, t and w. For each fixed t, w -> X_t(w) is a random variable, as postulated in the definition. However, if we change our point of view and keep w fixed, we see that the stochastic process is a function mapping w to the real-valued function t -> X_t(w). These functions are called the trajectories of the stochastic process X.

[Figures, not reproduced here: the figures on the left show two different trajectories of a simple random walk, i.e., each one corresponds to a different frozen w in Omega, while t goes from 0 to 30.]

We will define the simple random walk later. For now, let us just say that it behaves as follows: it starts at x = 0 for t = 0. After that, a fair coin is tossed, and we move up to x = 1 if heads is observed, and down to x = -1 if we see tails. The procedure is repeated at t = 1, 2, ..., and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following: -k, -k+2, ..., k-2, k).

Last Updated: September 8, 2008

[Figures, not reproduced here: unlike the figures above, the two pictures on the right show two time-slices of the same random process. In each graph the time t is fixed (t = 15 vs. t = 25), but the various values the random variables X_15 and X_25 can take are presented through their probability mass functions.]

3.2 The canonical probability space

When one deals with infinite-index (|T| = infinity) stochastic processes, the construction of the probability space (Omega, F, P) needed to support a given model is usually quite a technical matter. This course does not suffer from that problem, because all our models can be implemented on a special probability space. We start with the sample space Omega:

Omega = [0,1] x [0,1] x [0,1] x ...,

and any generic element of Omega will be a sequence w = (w_0, w_1, w_2, ...) of real numbers in [0,1]. For n in N_0, we define the mapping eta_n : Omega -> [0,1] which simply chooses the n-th coordinate:

eta_n(w) = w_n.

The proof
of the following theorem can be found in advanced probability books.

Theorem 3.2. There exists a sigma-algebra F and a probability P on Omega such that
1. each eta_n, n in N_0, is a random variable with the uniform distribution on [0,1], and
2. the sequence (eta_n), n in N_0, is independent.

Remark 3.3. One should think of the sample space Omega as the source of all the randomness in the system: the elementary event w in Omega is chosen by a process beyond our control, and the exact value of w is assumed to be unknown. All the other parts of the system are (possibly complicated, but) deterministic functions of w, i.e., random variables. When a coin is tossed, only a single drop of randomness is needed: the outcome of a coin toss. When several coins are tossed, more randomness is involved, and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T), a large sample space Omega is needed.

3.3 Constructing the Random Walk

Let us show how to construct the simple random walk on the canonical probability space (Omega, F, P) from Theorem 3.2. First of all, we need a definition of the simple random walk.

Definition 3.4. A stochastic process (X_n), n in N_0, is called a simple random walk if
1. X_0 = 0,
2. the increment X_{n+1} - X_n is independent of (X_0, X_1, ..., X_n) for each n in N_0, and
3. the increment X_{n+1} - X_n has the coin-toss distribution, i.e., P[X_{n+1} - X_n = 1] = P[X_{n+1} - X_n = -1] = 1/2.

For the sequence (eta_n), n in N, of random variables given by Theorem 3.2, define the following new sequence (xi_n), n in N:

xi_n = 1 if eta_n >= 1/2, and xi_n = -1 otherwise.

Then we set

X_0 = 0 and X_n = sum_{k=1}^n xi_k, n in N.

Intuitively, we use each xi_n to emulate a coin toss, and then define the value of the process X at time n as the cumulative sum of the first n coin tosses.

Proposition 3.5. The sequence (X_n), n in N_0, defined above is a simple random walk.

Proof. (1) is trivially true. To get (2), we first note that (xi_n), n in N, is an independent sequence, as it has been obtained by applying the same deterministic function to each element of the independent sequence (eta_n), n in N. Therefore, the increment X_{n+1}
- X_n = xi_{n+1} is independent of all the previous coin tosses xi_1, ..., xi_n. What we need to prove, though, is that it is independent of all the previous values of the process X. These previous values are nothing but linear combinations of the coin tosses xi_1, ..., xi_n, so they must also be independent of xi_{n+1}. Finally, to get (3), we compute

P[X_{n+1} - X_n = 1] = P[xi_{n+1} = 1] = P[eta_{n+1} >= 1/2] = 1/2.

A similar computation shows that P[X_{n+1} - X_n = -1] = 1/2. []

3.4 Simulation

Another way of thinking about sample spaces, and about randomness in general, is through the notion of simulation. Simulation is what I did to produce the two trajectories of the random walk above: a computer tossed a fair coin for me 30 times, and I followed the procedure described above to construct a trajectory of the random walk. If I asked the computer to repeat the process, I would get different 30 coin tosses.(1) This procedure is the exact same one we imagine nature (or casino equipment) follows whenever a non-deterministic situation is involved. The difference is, of course, that if we use the random walk to model our winnings in a fair gamble, it is much cheaper and faster to use the computer than to go out and stake (and possibly lose) large amounts of money. Another obvious advantage of the simulation approach is that it can be repeated: a simulation can be run many times, and various statistics (mean, variance, etc.) can be computed. More technically, every simulation involves two separate inputs. The first one is the actual sequence of outcomes of coin tosses. The other one is the structure of the model: I have to teach the computer to "go up" if heads shows and to "go down" if tails shows, and to repeat the same procedure several times. In more complicated situations this structure will be more complicated. What is remarkable is that the first ingredient, the coin tosses, will stay almost as simple as in the random-walk case even in the most complicated models. In fact, all we need is a sequence of so-called random numbers.

(1) Actually, I would get the exact same 30 coin tosses with probability of about 0.000000001.
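The construction above (uniforms eta_n turned into coin tosses xi_n, then summed) translates directly into code; here is a minimal sketch, with function and variable names of my own choosing:

```python
import random

def random_walk(n_steps, seed=None):
    """Simulate one trajectory (X_0, ..., X_n) of the simple random walk."""
    rng = random.Random(seed)
    path = [0]                          # X_0 = 0
    for _ in range(n_steps):
        eta = rng.random()              # eta_n ~ U(0,1)
        xi = 1 if eta >= 0.5 else -1    # coin toss xi_n
        path.append(path[-1] + xi)      # X_n = X_{n-1} + xi_n
    return path

path = random_walk(30, seed=1)
print(path)
```

Running it twice with different seeds produces two different trajectories, exactly as in the figures above; fixing the seed reproduces the same one.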
If I can get my computer to produce an independent sequence of uniformly distributed numbers between 0 and 1 (these are the random numbers), I can simulate trajectories of all important stochastic processes; you will see this through the many examples presented in this course. Just to start you thinking, here is how to produce a coin toss from a random number: declare heads if the random number drawn is between 0 and 0.5, and declare tails otherwise.

3.4.1 Random number generation

Before we get into the intricacies of simulation of complicated stochastic processes, let us spend some time on the seemingly simple procedure of the generation of a single random number. In other words, how do you teach a computer to give you a random number between 0 and 1? Theoretically, the answer is: you can't. In practice, you can get quite close. The question of what actually constitutes a random number is surprisingly deep, and we will not even touch it in this course. Suppose we have written a computer program, a random number generator (RNG), call it rand, which produces a random number between 0 and 1 every time we call it. So far, there is nothing that prevents rand from always returning the same number 0.4, or from alternating between 0.3 and 0.83.
Such an implementation of rand will, however, hardly qualify as an RNG, since the values it spits out come in a predictable order. We should therefore require any candidate for a random number generator to produce a sequence of numbers which is as unpredictable as possible. This is admittedly a hard task for a computer, having only deterministic functions in its arsenal, and that is why random-generator design is such a difficult field. The state of affairs is that we speak of good (or less good) random number generators based on some statistical properties of the produced sequences of numbers. One of the most important requirements is that our RNG produce uniformly distributed numbers in [0,1]: namely, the sequence of numbers produced by rand will have to cover the interval [0,1] evenly, and, in the long run, the number of random numbers in each subinterval [a,b] of [0,1] should be proportional to the length b - a of the interval. This requirement is hardly enough, because a sequence like

0, 0.1, 0.2, ..., 0.8, 0.9, 1, 0.05, 0.15, 0.25, ..., 0.85, 0.95, 0.025, 0.075, 0.125, 0.175, ...

will do the trick while being perfectly predictable. To remedy the inadequacy of RNGs satisfying only the requirement of uniform distribution, we might require rand to have the property that the pairs of produced numbers cover the square [0,1] x [0,1] uniformly. That means that, in the long run, the proportion of pairs falling in a patch A of the square [0,1] x [0,1] will be proportional to its area. Of course, one could continue with such requirements and ask for triples, quadruples, ... of random numbers to be uniform in [0,1]^3, [0,1]^4, etc. The highest dimension n such that the RNG produces uniformly distributed numbers in [0,1]^n is called the order of the RNG. A widely used RNG, called the Mersenne Twister, has an order of 623.

Another problem with RNGs is that the numbers produced will start to repeat after a while; this is a fact of life and of the finiteness of your computer's memory. The number of calls it takes for an RNG to start repeating its output is called the period of the RNG. You might have wondered how it is that an RNG produces a different number each time it is called, since, after all, it is only a function written in some programming language. Most often, RNGs use a hidden variable called the random seed, which stores the last output of rand and is used as an invisible input to the function rand the next time it is called. If we use the same seed twice, the RNG will produce the same number, and so the period of the RNG is limited by the number of possible seeds. It is worth remarking that actual random number generators usually produce a random integer between 0 and some large number RAND_MAX, and report the result normalized
(divided by RAND_MAX) to get a number in [0,1].

3.4.2 Simulation of Random Variables

Having found a random number generator good enough for our purposes (the one used by Mathematica is just fine), we might want to use it to simulate random variables with distributions different from the uniform on [0,1]: coin tosses, normal, exponential, etc. This is almost always achieved through transformations of the output of an RNG, and we will present several methods for dealing with this problem. A typical procedure (see the Box-Muller method below for an exception) works as follows: a real, deterministic function f : [0,1] -> R, called the transformation function, is applied to rand. The result is a random variable whose distribution depends on the choice of f. Note that the transformation function is by no means unique: in fact, if eta ~ U(0,1), then 1 - eta ~ U(0,1) as well, so f(eta) and g(eta), where g(x) = f(1-x), have the same distribution (why?). What follows is a list of procedures commonly used to simulate popular random variables.

1. (Discrete Random Variables) Let X have a discrete distribution given by

X ~ ( x_1 x_2 ... x_n ; p_1 p_2 ... p_n ).

(For discrete distributions taking an infinite number of values, we can always truncate at a very large n and approximate with a distribution similar to the one of X.) We know that the probabilities p_1, ..., p_n add up to 1, so we define the numbers 0 = q_0 < q_1 < ... < q_{n-1} < 1 by

q_0 = 0, q_1 = p_1, q_2 = p_1 + p_2, ..., q_{n-1} = p_1 + p_2 + ... + p_{n-1}.

To simulate our discrete random variable X, we call rand and then return x_1 if 0 <= rand < q_1, return x_2 if q_1 <= rand < q_2, and so on. It is quite obvious that this procedure indeed simulates a random variable with the distribution of X. The transformation function f is in this case given by

f(x) = x_1 for 0 <= x < q_1, x_2 for q_1 <= x < q_2, ..., x_n for q_{n-1} <= x <= 1.

2. (The Method of Inverse Functions) The basic observation in this method is that, for any continuous random variable X with distribution function F_X, the random variable Y = F_X(X) is uniformly distributed on [0,1]. By inverting the distribution function F_X and applying it to Y, we
recover X. Therefore, if we wish to simulate a random variable with an invertible distribution function F, we first simulate a uniform random variable on [0,1] (using rand) and then apply the function F^{-1} to the result. In other words, we use f = F^{-1} as the transformation function. Of course, this method fails if we cannot write F^{-1} in closed form.

Example 3.6 (Exponential Distribution). Let us apply the method of inverse functions to the simulation of an exponentially distributed random variable X with parameter lambda. Remember that the density f_X of X is given by

f_X(x) = lambda exp(-lambda x), x > 0, and so F_X(x) = 1 - exp(-lambda x), x > 0, and F_X^{-1}(y) = -(1/lambda) log(1-y).

Since 1 - rand has the same U(0,1) distribution as rand, we conclude that f(x) = -(1/lambda) log(x) works as a transformation function in this case, i.e., that -log(rand)/lambda has the required Exp(lambda) distribution.

Example 3.7 (Cauchy Distribution). The Cauchy distribution is defined through its density function

f_X(x) = 1 / (pi (1 + x^2)).

The distribution function F_X can be determined explicitly in this example:

F_X(x) = (1/pi) int_{-infinity}^x 1/(1 + xi^2) d xi = (1/pi) (pi/2 + arctan(x)),

and so F_X^{-1}(y) = tan(pi y - pi/2), yielding that f(x) = tan(pi x - pi/2) is a transformation function for the Cauchy random variable, i.e., tan(pi (rand - 0.5)) will simulate a Cauchy random variable for you.

3. (The Box-Muller method) This method is useful for simulating normal random variables, since for them the method of inverse functions fails (there is no closed-form expression for the distribution function of a standard normal). Note that this method does not fall under the category of transformation-function methods as described above; you will see, though, that it is very similar in spirit. It is based on a clever trick, but the complete proof is a bit technical, so we omit it.

Proposition 3.8. Let eta_1 and eta_2 be independent U(0,1)-distributed random variables. Then the random variables

X_1 = sqrt(-2 log(eta_1)) cos(2 pi eta_2), X_2 = sqrt(-2 log(eta_1)) sin(2 pi eta_2)

are independent and standard normal (N(0,1)).
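The method of inverse functions from the two examples above is a one-liner in code. A small sketch (function names are mine; I feed log the value 1 - rand, which lies in (0,1], to avoid log(0)):

```python
import math
import random

def exponential(lam, rng):
    """Exp(lam) draw via the inverse of F(x) = 1 - exp(-lam*x)."""
    # rand and 1 - rand have the same U(0,1) law; 1 - rand is never 0
    return -math.log(1.0 - rng.random()) / lam

def cauchy(rng):
    """Standard Cauchy draw via F^{-1}(y) = tan(pi*(y - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(42)
sample = [exponential(2.0, rng) for _ in range(100_000)]
print(sum(sample) / len(sample))   # should be close to 1/lambda = 0.5
```

The empirical mean of the exponential sample lands near the theoretical mean 1/lambda, which is a quick sanity check that the transformation function is the right one.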
Therefore, in order to simulate a normal random variable with mean mu = 0 and variance sigma^2 = 1, we call the function rand twice, to produce two random numbers rand_1 and rand_2. The numbers

X_1 = sqrt(-2 log(rand_1)) cos(2 pi rand_2), X_2 = sqrt(-2 log(rand_1)) sin(2 pi rand_2)

will be two independent normals. Note that it is necessary to call the function rand twice, but we also get two normal random numbers out of it. It is not hard to write a procedure which will produce 2 normal random numbers in this way on every second call, return one of them, and store the other for the next call. In the spirit of the discussion above, the function f = (f_1, f_2) : [0,1] x [0,1] -> R^2, given by

f_1(x,y) = sqrt(-2 log(x)) cos(2 pi y), f_2(x,y) = sqrt(-2 log(x)) sin(2 pi y),

can be considered a transformation function in this case.

4. (Method of the Central Limit Theorem) The following algorithm is often used to simulate a normal random variable:
(a) simulate 12 independent uniform random variables (rands) eta_1, eta_2, ..., eta_12;
(b) set X = eta_1 + eta_2 + ... + eta_12 - 6.
The distribution of X is very close to the distribution of a unit normal, although not exactly equal (e.g., P[X > 6] = 0, while P[Z > 6] != 0 for a true normal Z). The reason why X approximates the normal distribution well comes from the following theorem.

Theorem 3.9 (Central Limit Theorem). Let X_1, X_2, ... be a sequence of independent random variables, all having the same square-integrable distribution. Set mu = E[X_1] = E[X_2] = ... and sigma^2 = Var[X_1] = Var[X_2] = .... Then the sequence of normalized random variables

(X_1 + X_2 + ... + X_n - n mu) / (sigma sqrt(n))

converges to the standard normal random variable (in a mathematically precise sense).

The choice of exactly 12 rands (as opposed to 11 or 55) comes from practice: it seems to achieve satisfactory performance with relatively low computational cost. Also, the standard deviation of a U(0,1) random variable is 1/sqrt(12), so the denominator sigma sqrt(n) conveniently becomes 1 for n = 12. It might seem a bit wasteful to use 12 calls of rand in order to produce one draw from the unit normal. If you try it out, you will see, however, that it is of comparable speed to the Box-Muller method described above.
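The two normal generators just described can be sketched side by side, so that the speed comparison in the text can be timed on your own machine (function names are mine):

```python
import math
import random

def box_muller(rng):
    """Return two independent N(0,1) draws from two uniforms."""
    u1, u2 = 1.0 - rng.random(), rng.random()  # keep log's argument in (0,1]
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

def clt_normal(rng):
    """Approximate N(0,1) as the sum of 12 uniforms minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(7)
zs = [z for _ in range(50_000) for z in box_muller(rng)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
print(round(mean, 2), round(var, 2))   # close to 0 and 1
```

Note that clt_normal can never return a value outside [-6, 6], illustrating the remark above that its distribution is only approximately normal.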
While Box-Muller uses the computationally expensive cos, sin and log, this method uses only addition and subtraction. The final verdict of the comparison of the two methods will depend on the architecture you are running the code on, and on the quality of the implementation of the functions cos, sin, etc.

5. (Other methods) There is a number of other methods for transforming the output of rand into random numbers with a prescribed density (the rejection method, the Poisson trick, ...). You can read about them in the free online copy of Numerical Recipes in C at http://www.library.cornell.edu/nr/bookcpdf.html.

3.5 Monte Carlo Integration

Having described some of the procedures and methods used for simulation of various random objects (variables, vectors, processes), we turn to an application in probability and numerical mathematics. We start off with the following version of the Law of Large Numbers, which constitutes the theory behind most of the Monte Carlo applications.

Theorem 3.10 (Law of Large Numbers). Let X_1, X_2, ... be a sequence of independent, identically distributed random variables, and let g : R -> R be a function such that mu = E[g(X_1)] = E[g(X_2)] = ... exists. Then

(g(X_1) + g(X_2) + ... + g(X_n)) / n -> mu = int_{-infinity}^{infinity} g(x) f_{X_1}(x) dx, as n -> infinity.

The key idea of Monte Carlo integration is the following. Suppose that the quantity y we are interested in can be written as

y = int_{-infinity}^{infinity} g(x) f_X(x) dx

for some random variable X with density f_X and some function g, and that x_1, x_2, ... are random numbers distributed according to the distribution with density f_X. Then the average

(1/n) (g(x_1) + g(x_2) + ... + g(x_n))

will approximate y. It can be shown that the accuracy of the approximation behaves like 1/sqrt(n), so that you have to quadruple the number of simulations if you want to double the precision of your approximation.

Example 3.11.

1. (numerical integration) Let g be a function on [0,1]. To approximate the integral int_0^1 g(x) dx, we can take a sequence of n U(0,1) random numbers x_1, x_2, ..., and then

(1/n) sum_{k=1}^n g(x_k) ~ int_0^1 g(x) dx,

because the density of X ~ U(0,1) is given by f_X(x) = 1 for 0 <= x <= 1, and 0 otherwise.

2. (estimating probabilities) Let Y be a random variable with the density
function f_Y. If we are interested in the probability P[Y in [a,b]] for some a < b, we simulate n draws y_1, y_2, ..., y_n from the distribution F_Y, and the required approximation is

P[Y in [a,b]] ~ (number of the y_k's falling in the interval [a,b]) / n.

One of the nicest things about the Monte Carlo method is that, even if the density of the random variable is not available, but you can simulate draws from it, you can still perform the calculation above and get the desired approximation. Of course, everything works in the same way for probabilities involving random vectors, in any number of dimensions.

3. (approximating pi) We can devise a simple procedure for approximating pi ~ 3.141592... by using the Monte Carlo method. All we have to do is remember that pi is the area of the unit disk. Therefore, pi/4 equals the portion of the area of the unit disk lying in the positive quadrant, and we can write

pi/4 = int_0^1 int_0^1 g(x,y) dx dy, where g(x,y) = 1 if x^2 + y^2 <= 1, and 0 otherwise.

So, simulate n pairs (x_i, y_i), i = 1, ..., n, of uniformly distributed random numbers and count how many of them fall in the upper quarter of the unit disk, i.e., how many satisfy x_i^2 + y_i^2 <= 1; divide by n and multiply your result by 4, and you should be close to pi. How close? Well, that is another story. Experiment!
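Taking up that invitation to experiment, the pi-approximation procedure just described takes only a few lines (a sketch, with names of my own choosing):

```python
import random

def estimate_pi(n, seed=None):
    """Monte Carlo estimate of pi from n uniform points in the unit square."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n

print(estimate_pi(1_000_000, seed=0))   # close to 3.1416, error of order 1/sqrt(n)
```

Since the error behaves like 1/sqrt(n), a million points give roughly three reliable decimal digits; quadrupling n buys only one extra binary digit of precision, just as the text warns.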
Lecture 5: GENERATING FUNCTIONS (1 of 6). Course: M362K Stochastic Processes I. Term: Fall 2008. Instructor: Gordan Zitkovic.

Lecture 5 GENERATING FUNCTIONS

The path-counting method used in the previous lecture only works for computations related to the first n steps of the random walk, where n is given in advance. We will see later that most of the interesting questions do not fall into this category. For example, the distribution of the time it takes for the random walk to hit the level l != 0 is like that: there is no way to give an a-priori bound on the number of steps it will take to get to l (in fact, the expectation of this random variable is infinite). To deal with a wider class of properties of random walks and other processes, we need to develop some new mathematical tools.

5.1 Definition and first properties

The distribution of an N_0-valued random variable X is completely determined by the sequence (p_n), n in N_0, of numbers in [0,1] given by p_n = P[X = n], n in N_0. As a sequence of real numbers, (p_n), n in N_0, can be used to construct a power series

P_X(s) = sum_{k=0}^infinity p_k s^k.   (5.1)

It follows from the fact that sum_n |p_n| <= 1 that the radius of convergence(1) of (p_n), n in N_0, is at least equal to 1. Therefore, P_X is well defined for s in [-1,1], and perhaps elsewhere too.

Definition 5.1. The function P_X given by P_X(s) = sum_{k=0}^infinity p_k s^k is called the generating function of the random variable X (or, more precisely, of its pmf (p_n), n in N_0).

Before we proceed, let us find expressions for the generating functions of some of the popular N_0-valued random variables.

Example 5.2.

1. (Bernoulli, b(p)) Here p_0 = q, p_1 = p and p_n = 0 for n >= 2. Therefore, P_X(s) = ps + q.

2. (Binomial, b(n,p)) Since p_k = C(n,k) p^k q^{n-k}, k = 0, ..., n, we have

P_X(s) = sum_{k=0}^n C(n,k) p^k q^{n-k} s^k = (ps + q)^n,

by the binomial theorem.

3. (Geometric, g(p)) For k in N_0, p_k = q^k p, so that

P_X(s) = sum_{k=0}^infinity q^k p s^k = p sum_{k=0}^infinity (qs)^k = p / (1 - qs).

(1) Remember that the radius of convergence of a power series sum_{k=0}^infinity a_k s^k is the largest number R in [0, infinity] such that sum_{k=0}^infinity a_k s^k converges absolutely whenever |s| < R.

Last Updated: September 25, 2008

4. (Poisson, p(lambda)) Given that p_k = e^{-lambda} lambda^k / k!, k in N_0, we have
P_X(s) = sum_{k=0}^infinity e^{-lambda} (lambda^k / k!) s^k = e^{-lambda} sum_{k=0}^infinity (lambda s)^k / k! = e^{-lambda} e^{lambda s} = e^{lambda (s-1)}.

Some of the most useful analytic properties of P_X are listed in the following proposition.

Proposition 5.3. Let X be an N_0-valued random variable, let (p_n), n in N_0, be its pmf, and let P_X be its generating function. Then
1. P_X(s) = E[s^X], s in [-1,1];
2. P_X(s) is convex and non-decreasing, with 0 <= P_X(s) <= 1, for s in [0,1];
3. P_X(s) is infinitely differentiable on (-1,1), with

(d^n/ds^n) P_X(s) = sum_{k=n}^infinity k(k-1)...(k-n+1) s^{k-n} p_k, n in N.   (5.2)

In particular, p_n = (1/n!) (d^n/ds^n) P_X(s) |_{s=0}, and so s -> P_X(s) uniquely determines the sequence (p_n), n in N_0.

Proof. Statement 1 follows directly from the formula E[g(X)] = sum_{k=0}^infinity g(k) p_k, applied to g(x) = s^x. As far as 3 is concerned, we only note that the expression (5.2) is exactly what you would get if you differentiated the expression (5.1) term by term. The rigorous proof of the fact that this is allowed is beyond the scope of these notes. With 3 at our disposal, 2 follows from the fact that the first two derivatives of the function P_X are non-negative, and that P_X(1) = 1. []

Remark 5.4.
1. If you know about moment generating functions, you will notice that P_X(s) = M_X(log s) for s in (0,1], where M_X(lambda) = E[exp(lambda X)] is the moment generating function of X.
2. Generating functions can be used with sequences (a_n), n in N_0, which are not necessarily pmfs of random variables. The method is useful for any sequence (a_n), n in N_0, such that the power series sum_{k=0}^infinity a_k s^k has a positive (non-zero) radius of convergence.
3. The name "generating function" comes from the last part of Proposition 5.3: the knowledge of P_X implies the knowledge of the whole sequence (p_n), n in N_0. Put differently, P_X generates the whole distribution of X.

Remark 5.5. Note that the true radius of convergence varies from distribution to distribution. It is infinite in cases 1, 2 and 4, and equal to 1/q > 1 in case 3 of Example 5.2. For the distribution with pmf given by p_k = C/(k+1)^2, where C = ( sum_{k=0}^infinity 1/(k+1)^2 )^{-1}, the radius of convergence is exactly equal to 1. (Can you see why?)

5.2 Convolution and moments

The true power of generating functions comes from the fact that they behave very well under the usual operations in probability.
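The closed forms of Example 5.2 are easy to check against the defining power series (5.1); here is a quick numeric sketch for the Poisson case, with my own variable names and a truncation of the series at 60 terms (more than enough for these parameters):

```python
import math

lam, s = 1.7, 0.6

# truncated power series  sum_k p_k s^k  for the Poisson(lam) pmf
series = sum(math.exp(-lam) * lam**k / math.factorial(k) * s**k
             for k in range(60))

closed = math.exp(lam * (s - 1))   # the closed form e^{lambda(s-1)}
print(series, closed)              # the two agree to machine precision
```

The neglected tail of the series is astronomically small here (the terms decay factorially), which is why 60 terms already reproduce the closed form to full double precision.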
and (q_n)_{n \in N_0} be two probability mass functions. The convolution p * q of (p_n)_{n \in N_0} and (q_n)_{n \in N_0} is the sequence (r_n)_{n \in N_0}, where

  r_n = \sum_{k=0}^{n} p_k q_{n-k},  n \in N_0.

This abstractly defined operation will become much clearer once we prove the following proposition.

Proposition 5.7. Let X, Y be two independent N_0-valued random variables with pmfs (p_n)_{n \in N_0} and (q_n)_{n \in N_0}. Then the sum Z = X + Y is also N_0-valued, and its pmf is the convolution of (p_n)_{n \in N_0} and (q_n)_{n \in N_0} in the sense of Definition 5.6.

Proof. Clearly, Z is N_0-valued. To obtain an expression for its pmf, we use the law of total probability:

  P[Z = n] = \sum_{k=0}^{n} P[X = k] P[Z = n | X = k].

However, P[Z = n | X = k] = P[X + Y = n | X = k] = P[Y = n - k | X = k] = P[Y = n - k], where the last equality follows from the independence of X and Y. Therefore,

  P[Z = n] = \sum_{k=0}^{n} P[X = k] P[Y = n - k] = \sum_{k=0}^{n} p_k q_{n-k}.

Corollary 5.8. Let (p_n)_{n \in N_0} and (q_n)_{n \in N_0} be any two pmfs.
1. Convolution is commutative, i.e., p * q = q * p.
2. The convolution of two pmfs is a pmf, i.e., r_n \ge 0 for all n \in N_0 and \sum_{n=0}^{\infty} r_n = 1, for r = p * q.

Corollary 5.9. Let (p_n)_{n \in N_0} and (q_n)_{n \in N_0} be any two pmfs, and let

  P(s) = \sum_{k=0}^{\infty} p_k s^k  and  Q(s) = \sum_{k=0}^{\infty} q_k s^k

be their generating functions. Then the generating function R(s) = \sum_{k=0}^{\infty} r_k s^k of the convolution r = p * q is given by

  R(s) = P(s) Q(s).

Equivalently, the generating function P_{X+Y} of the sum of two independent N_0-valued random variables is equal to the product

  P_{X+Y}(s) = P_X(s) P_Y(s)

of the generating functions P_X and P_Y of X and Y.

Example 5.10.
1. The binomial b(n,p) distribution is a sum of n independent Bernoulli b(p) random variables. Therefore, if we apply Corollary 5.9 n times to the generating function q + ps of the Bernoulli b(p) distribution, we immediately get that the generating function of the binomial is (q + ps)(q + ps) \cdots (q + ps) = (q + ps)^n.
2. More generally, we can show that the sum of m independent random variables, each with the b(n,p) distribution, has the binomial distribution b(nm, p). If you try to sum binomials with different values of the parameter p, you will not get a binomial.
3. What is even more interesting, the following statement can be shown: suppose that the sum Z of two independent
N_0-valued random variables X and Y is binomially distributed with parameters n and p. Then both X and Y are binomial, with parameters (n_X, p) and (n_Y, p), where n_X + n_Y = n. In other words, the only way to get a binomial as a sum of independent random variables is the trivial one.

Another useful thing about generating functions is that they make the computation of moments easier.

Proposition 5.11. Let (p_n)_{n \in N_0} be the pmf of an N_0-valued random variable X, and let P(s) be its generating function. For n \in N, the following two statements are equivalent:
1. E[X(X-1) \cdots (X-n+1)] < \infty;
2. \frac{d^n P}{ds^n}(1-) exists, in the sense that the left limit \lim_{s \uparrow 1} \frac{d^n P}{ds^n}(s) exists.
In either case, we have

  E[X(X-1) \cdots (X-n+1)] = \lim_{s \uparrow 1} \frac{d^n P}{ds^n}(s).

The quantities E[X], E[X(X-1)], E[X(X-1)(X-2)], ... are called the factorial moments of the random variable X. You can get the classical moments from the factorial moments by solving a system of linear equations. It is very simple for the first few:

  E[X] = P'(1-),
  E[X^2] = E[X(X-1)] + E[X],
  E[X^3] = E[X(X-1)(X-2)] + 3 E[X(X-1)] + E[X].

A useful identity which follows directly from the above results is the following:

  Var[X] = P''(1-) + P'(1-) - (P'(1-))^2,

and it is valid whenever the first two (left) derivatives of P at 1 exist.

Example 5.12. Let X be a Poisson random variable with parameter \lambda. Its generating function is given by P_X(s) = e^{\lambda(s-1)}. Therefore,

  \frac{d^n}{ds^n} P_X(s) |_{s=1} = \lambda^n,

and so the sequence E[X], E[X(X-1)], E[X(X-1)(X-2)], ... of factorial moments of X is just \lambda, \lambda^2, \lambda^3, .... It follows that

  E[X] = \lambda,  E[X^2] = \lambda^2 + \lambda,  Var[X] = \lambda,  E[X^3] = \lambda^3 + 3\lambda^2 + \lambda.

5.3 Random sums

Our next application of generating functions in the theory of stochastic processes deals with the so-called random sums. Let (\xi_n)_{n \in N} be a sequence of random variables, and let N be a random time (a random time is simply an N_0 \cup {\infty}-valued random variable). We can define the random variable

  Y = \sum_{k=1}^{N} \xi_k,  i.e.,  Y(\omega) = \sum_{k=1}^{N(\omega)} \xi_k(\omega) for \omega \in \Omega

(with the convention that Y = 0 on the event {N = 0}). More generally, for an arbitrary stochastic process (X_n)_{n \in N_0} and a random time N with P[N = \infty] = 0, we define the random variable X_N by X_N(\omega) = X_{N(\omega)}(\omega), for \omega \in \Omega. When N is a constant (N = n), X_N is simply equal to X_n. In
general, think of X_N as the value of the stochastic process X taken at a time which is itself random. If X_n = \sum_{k=1}^{n} \xi_k, then X_N = \sum_{k=1}^{N} \xi_k.

Example 5.13. Let (\xi_n)_{n \in N} be the increments of a symmetric simple random walk (coin tosses), and let N have the following distribution:

  n:        0    1    2
  P[N=n]:  1/3  1/3  1/3

and be independent of (\xi_n)_{n \in N} (it is very important to specify the dependence structure between N and (\xi_n)_{n \in N} in this setting!). Let us compute the distribution of Y = \sum_{k=1}^{N} \xi_k in this case. This is where we typically use the formula of total probability:

  P[Y = m] = P[Y = m | N = 0] P[N = 0] + P[Y = m | N = 1] P[N = 1] + P[Y = m | N = 2] P[N = 2]
           = \frac{1}{3} ( P[0 = m] + P[\xi_1 = m] + P[\xi_1 + \xi_2 = m] ).

When m = 1, for example, we get

  P[Y = 1] = \frac{1}{3} ( 0 + \frac{1}{2} + 0 ) = \frac{1}{6}.

Perform the computation for some other values of m yourself.

What happens when N and (\xi_n)_{n \in N} are dependent? This will usually be the case in practice, as the value of the time N at which we stop adding increments will typically depend on the behaviour of the sum itself.

Example 5.14. Let (\xi_n)_{n \in N} be as above; we can think of a situation where a gambler repeatedly plays the same game in which a coin is tossed, and the gambler wins a dollar if the outcome is heads and loses a dollar otherwise. A "smart" gambler enters the game and decides on the following tactic: "Let's see how the first game goes. If I lose, I'll play another 2 games and hopefully cover my losses; if I win, I'll quit then and there." The described strategy amounts to the choice of the random time N as follows:

  N = 1 if \xi_1 = 1, and N = 3 if \xi_1 = -1.

Then

  Y = 1 if \xi_1 = 1, and Y = -1 + \xi_2 + \xi_3 if \xi_1 = -1.

Therefore,

  P[Y = 1] = P[Y = 1 | \xi_1 = 1] P[\xi_1 = 1] + P[Y = 1 | \xi_1 = -1] P[\xi_1 = -1]
           = 1 \cdot P[\xi_1 = 1] + P[\xi_2 + \xi_3 = 2] P[\xi_1 = -1] = \frac{1}{2} + \frac{1}{4} \cdot \frac{1}{2} = \frac{5}{8}.

Similarly, we get P[Y = -1] = 1/4 and P[Y = -3] = 1/8. The expectation E[Y] is equal to

  1 \cdot \frac{5}{8} + (-1) \cdot \frac{1}{4} + (-3) \cdot \frac{1}{8} = 0.

This is not an accident. One of the first powerful results of the beautiful martingale theory states that, no matter how smart a strategy you employ, you cannot beat a fair gamble. We will return to the general (non-independent) case in the next
lecture. Let us use generating functions to give a full description of the distribution of Y = \sum_{k=1}^{N} \xi_k in the independent case.

Proposition 5.15. Let (\xi_n)_{n \in N} be a sequence of independent N_0-valued random variables, all of which share the same distribution with pmf (p_n)_{n \in N_0} and generating function P_\xi. Let N be a random time independent of (\xi_n)_{n \in N}. Then the generating function P_Y of the random sum Y = \sum_{k=1}^{N} \xi_k is given by

  P_Y(s) = P_N(P_\xi(s)).

Proof. We use the idea from Example 5.13 and condition on the possible values of N. We also use the following fact (Tonelli's theorem), which we state without proof:

  (5.3)  if a_{ij} \ge 0, then \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_{ij} = \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} a_{ij}.

We have

  P_Y(s) = \sum_{k=0}^{\infty} s^k P[Y = k]
         = \sum_{k=0}^{\infty} s^k \sum_{i=0}^{\infty} P[Y = k | N = i] P[N = i]
         = \sum_{k=0}^{\infty} s^k \sum_{i=0}^{\infty} P[\sum_{j=1}^{i} \xi_j = k] P[N = i]   (by independence)
         = \sum_{i=0}^{\infty} P[N = i] \sum_{k=0}^{\infty} s^k P[\sum_{j=1}^{i} \xi_j = k]   (by (5.3)).

By iteration of Corollary 5.9, we know that the generating function of the random variable \sum_{j=1}^{i} \xi_j — which is exactly what the inner sum above represents — is (P_\xi(s))^i. Therefore, the chain of equalities above can be continued as

  P_Y(s) = \sum_{i=0}^{\infty} P[N = i] (P_\xi(s))^i = P_N(P_\xi(s)).

Corollary 5.16 (Wald's Identity). Let (\xi_n)_{n \in N} and N be as in Proposition 5.15. Suppose also that E[N] < \infty and E[\xi_1] < \infty. Then

  E[ \sum_{k=1}^{N} \xi_k ] = E[N] E[\xi_1].

Proof. We just apply the composition rule for derivatives (the chain rule) to the equality P_Y = P_N \circ P_\xi to get

  P_Y'(s) = P_N'(P_\xi(s)) P_\xi'(s).

After we let s \uparrow 1, we get

  E[Y] = P_Y'(1-) = P_N'(P_\xi(1-)) P_\xi'(1-) = P_N'(1-) P_\xi'(1-) = E[N] E[\xi_1].

Example 5.17. Every time the Springfield Wildcats play in the Superbowl, their chance of winning is p \in (0,1). The number of years between two Superbowls they get to play in has the Poisson distribution p(\lambda), \lambda > 0. What is the expected number of years Y between consecutive Superbowl wins?

Let (\xi_n)_{n \in N} be the sequence of independent p(\lambda) random variables modeling the numbers of years between consecutive Superbowl appearances by the Wildcats. Moreover, let N be the number of appearances up to and including the next win — a geometric random variable with success probability p. Then

  Y = \sum_{k=1}^{N} \xi_k.

Indeed, every time the Wildcats lose the Superbowl, another \xi years have to pass before they get another chance, and the whole
thing stops when they finally win. To compute the expectation of Y, we use Corollary 5.16:

  E[Y] = E[N] E[\xi_1] = \frac{1}{p} \cdot \lambda = \frac{\lambda}{p}.
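The identities from this lecture are easy to check numerically. Below is a minimal Python sketch (not part of the original notes; all function names are mine) that builds the binomial pmf by repeatedly convolving Bernoulli pmfs (Definition 5.6), checks the product rule of Corollary 5.9 at a single point, and recovers the mean and variance from the first two factorial moments as in Proposition 5.11.

```python
from math import comb, isclose

def convolve(p, q):
    """Convolution of two pmfs (Definition 5.6): the pmf of X + Y
    for independent X and Y with pmfs p and q."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def pgf(p, s):
    """Evaluate the generating function P(s) = sum_k p_k s^k."""
    return sum(pk * s**k for k, pk in enumerate(p))

def factorial_moment(p, n):
    """E[X(X-1)...(X-n+1)], i.e. the n-th left derivative of P at 1."""
    out = 0.0
    for k, pk in enumerate(p):
        term = pk
        for j in range(n):
            term *= (k - j)
        out += term
    return out

prob, n = 0.3, 5
bern = [1 - prob, prob]            # Bernoulli b(p): p_0 = q, p_1 = p
binom = [1.0]                      # pmf of the constant 0 (empty sum)
for _ in range(n):                 # n-fold convolution -> binomial b(n, p)
    binom = convolve(binom, bern)

# the convolved pmf matches the binomial formula of Example 5.2
for k, pk in enumerate(binom):
    assert isclose(pk, comb(n, k) * prob**k * (1 - prob)**(n - k))

# Corollary 5.9: P_{X+Y}(s) = P_X(s) P_Y(s), here iterated n times
assert isclose(pgf(binom, 0.7), pgf(bern, 0.7) ** n)

m1 = factorial_moment(binom, 1)    # E[X]
m2 = factorial_moment(binom, 2)    # E[X(X-1)]
assert isclose(m1, n * prob)                              # E[X] = np
assert isclose(m2 + m1 - m1**2, n * prob * (1 - prob))    # Var[X] = npq
```

The same three helpers work for any pmf with finite support; for infinite-support distributions such as the geometric or the Poisson, one would truncate the pmf at a point where the tail mass is negligible.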
Some continuous and discrete distributions

Table of contents

I. Continuous distributions and transformation rules
  A. Standard uniform distribution U(0,1)
  B. Uniform distribution U(a,b)
  C. Standard normal distribution N(0,1)
  D. Normal distribution N(\mu, \sigma)
  E. Standard exponential distribution
  F. Exponential distribution with mean \lambda
  G. Standard Gamma distribution \Gamma(r, 1)
  H. Gamma distribution \Gamma(r, \lambda)
II. Discrete distributions and transformation rules
  A. Bernoulli random variables
  B. Binomial distribution
  C. Poisson distribution
  D. Geometric distribution
  E. Negative binomial distribution
  F. Hypergeometric distribution

1 Continuous distributions

Each continuous distribution has a "standard" version and a more general rescaled version. The transformation from one to the other is always of the form Y = aX + b, with a > 0, and the resulting identities are:

  (1)  f_Y(y) = \frac{1}{a} f_X((y - b)/a)
  (2)  F_Y(y) = F_X((y - b)/a)
  (3)  E[Y] = a E[X] + b
  (4)  Var[Y] = a^2 Var[X]
  (5)  M_Y(t) = e^{bt} M_X(at)

1.1 Standard uniform U(0,1)

This distribution is "pick a random number between 0 and 1".

  f_X(x) = 1 if 0 < x < 1, and 0 otherwise
  F_X(x) = 0 if x \le 0;  x if 0 \le x \le 1;  1 if x \ge 1
  E[X] = 1/2,  Var[X] = 1/12
  M_X(t) = (e^t - 1)/t

1.2 Uniform U(a,b)

This distribution is "pick a random number between a and b". To get a random number
between 0 and 1, multiply it by b - a, and add a. The properties of this random variable are obtained by applying rules (1)-(5) to the previous subsection:

  f_X(x) = 1/(b-a) if a < x < b, and 0 otherwise
  F_X(x) = 0 if x \le a;  (x-a)/(b-a) if a \le x \le b;  1 if x \ge b
  E[X] = (a+b)/2,  Var[X] = (b-a)^2/12
  M_X(t) = (e^{bt} - e^{at}) / (t(b-a))

1.3 Standard normal N(0,1)

This is the most important distribution in all of probability because of the Central Limit Theorem, which states that the sums (or averages) of a large number of independent random variables are approximately normal, no matter what the original distributions look like. Specifically, if X is a random variable with mean \mu and standard deviation \sigma, and if X_1, X_2, ... are independent copies of X, and if S_n = X_1 + ... + X_n, then for large values of n, S_n is approximately normal with mean n\mu and standard deviation \sigma \sqrt{n}, and

  \frac{S_n - n\mu}{\sigma \sqrt{n}}

is well approximated by the standard normal distribution.

  f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
  F_X(x) is given in the table at the back of the book
  E[X] = 0,  Var[X] = 1
  M_X(t) = e^{t^2/2}

1.4 Normal N(\mu, \sigma)

To work with a normal random variable X, convert everything to "Z-scores", where Z = (X - \mu)/\sigma. Z is then described by the standard normal distribution, which you can look up in the back of the book. Here are the formulas for X:

  f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}
  F_X(x) is computed from Z-scores
  E[X] = \mu,  Var[X] = \sigma^2
  M_X(t) = e^{\mu t + \sigma^2 t^2/2}

1.5 Standard exponential

The exponential distribution describes the time between successive events in a Poisson process. How long until the next click on my Geiger counter? How long until this lightbulb burns out? How long until the next campaign contribution comes in? A key feature is that it is memoryless: a one-year-old lightbulb has the same chance of burning out tomorrow as a brand new lightbulb.

  f_X(x) = e^{-x},  x > 0
  F_X(x) = 1 - e^{-x},  x > 0
  E[X] = 1,  Var[X] = 1
  M_X(t) = 1/(1-t)

1.6 Exponential with mean \lambda

This is obtained by multiplying a standard exponential by \lambda. Unfortunately, the letter \lambda is used differently for Poisson and exponential distributions: if a Poisson distribution has an average rate of r, then the waiting time is exponential with mean 1/r. When talking about the Poisson distribution we'd be inclined to
say \lambda = rt, while when talking about the exponential distribution we'd be inclined to say \lambda = 1/r.

  f_X(x) = \lambda^{-1} e^{-x/\lambda},  x > 0
  F_X(x) = 1 - e^{-x/\lambda},  x > 0
  E[X] = \lambda,  Var[X] = \lambda^2
  M_X(t) = 1/(1 - \lambda t)

1.7 Standard Gamma distribution \Gamma(r, 1)

The sum of r independent standard exponential random variables is called a Gamma random variable. It describes the time you need to wait for r Poisson events to happen, e.g., the time it takes for 10 lightbulbs to burn out, for the Geiger counter to record 10 clicks, or for 10 people to send in campaign contributions. The formula for f_X isn't obvious, and that for F_X is complicated, but the others are directly related to those of the exponential distribution:

  f_X(x) = \frac{x^{r-1} e^{-x}}{(r-1)!},  x > 0
  F_X(x) is complicated
  E[X] = r,  Var[X] = r
  M_X(t) = (1/(1-t))^r

1.8 Gamma distribution \Gamma(r, \lambda)

This is a standard Gamma variable multiplied by \lambda, or, equivalently, the sum of r independent exponential variables, each with mean \lambda:

  f_X(x) = \frac{\lambda^{-r} x^{r-1} e^{-x/\lambda}}{(r-1)!},  x > 0
  F_X(x) is complicated
  E[X] = \lambda r,  Var[X] = \lambda^2 r
  M_X(t) = (1/(1 - \lambda t))^r

2 Discrete distributions and transformation rules

The discrete random variables we will consider always take on integer values, so we never rescale them. Also, the cdf is rarely useful, with the notable exception of the geometric distribution. The transformations that are more relevant are those for adding two independent random variables. If Z = X + Y, with X and Y independent, then

  (6)  f_Z(z) = \sum_x f_X(x) f_Y(z - x)
  (7)  E[Z] = E[X] + E[Y]
  (8)  Var[Z] = Var[X] + Var[Y]
  (9)  M_Z(t) = M_X(t) M_Y(t)

2.1 Bernoulli

A Bernoulli random variable is a variable that can only take on the values 0 and 1. We let p be the probability of 1, and 1-p the probability of 0. This example is easy to analyze, and MANY interesting random variables can be built from this simple building block:

  f_X(x) = 1-p if x = 0;  p if x = 1;  0 otherwise
  E[X] = p,  Var[X] = p(1-p)
  M_X(t) = 1 + p(e^t - 1)

2.2 Binomial

A binomial random variable is the sum of n independent Bernoulli random variables, all with the same value of p. The usual application is counting the number of successes in n independent tries at something, where each try has a probability p of success. It also applies to sampling with
replacement, and is a very good approximation to sampling without replacement from very large populations. When np and n(1-p) are both large (say, 20 or bigger), the binomial distribution is well approximated by the normal distribution. When n is large (at least 30) and p is small (less than 0.1), the binomial distribution is approximated well by the Poisson distribution.

  f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}
  E[X] = np,  Var[X] = np(1-p)
  M_X(t) = (1 + p(e^t - 1))^n

2.3 Poisson

The Poisson distribution (pronounced "pwah-SON") is the limit of the binomial distribution when n is large and p is small; the correspondence is \lambda = np. The Poisson distribution replicates itself, in that the sum of a Poisson(\lambda) random variable and an independent Poisson(\mu) random variable is a Poisson(\lambda + \mu) random variable. Anything that involves the sum of many, many long-shot events — e.g., the number of people hit by lightning in a year, the number of broken bones from playing soccer, or the number of clicks on a Geiger counter — will be described by the Poisson distribution.

Closely related to the Poisson distribution is a Poisson process. In a Poisson process, events accumulate at a certain average rate r. The number of events in "time" t is then given by Poisson(rt). Examples include radioactivity (where you have r clicks per unit time), typos (where r is errors/page and t is the number of pages), and industrial defects (r equals the average number of defects per foot of sheet metal and t is the number of feet).

  f_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}
  E[X] = \lambda,  Var[X] = \lambda
  M_X(t) = e^{\lambda(e^t - 1)}

2.4 Geometric

The geometric distribution describes the waiting time between successes in a sequence of Bernoulli trials. For instance, flip coins and count the turns until the first head, or roll dice and count the turns until you get a 6. It is very much like the exponential distribution, with \lambda corresponding to 1/p, except that the geometric distribution is discrete while the exponential distribution is continuous.

  f_X(x) = p q^{x-1},  x = 1, 2, ...,  where q = 1-p
  E[X] = 1/p,  Var[X] = q/p^2
  M_X(t) = \frac{p e^t}{1 - q e^t}

2.5 Negative binomial

The sum X of r independent geometric random variables is given by the discrete analog of the Gamma
distribution, which describes the sum of r independent exponential random variables:

  f_X(x) = \binom{x-1}{r-1} q^{x-r} p^r,  x = r, r+1, ...,  where q = 1-p
  E[X] = r/p,  Var[X] = rq/p^2
  M_X(t) = \left( \frac{p e^t}{1 - q e^t} \right)^r
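The rescaling rules (1)-(5) are also how one simulates these distributions in practice: generate the standard version from a U(0,1) random number and rescale. Here is a small Python sketch (mine, not part of the handout) that samples U(a,b) by the stretch-and-shift rule, samples the exponential with mean \lambda by inverting its tabulated cdf F(x) = 1 - e^{-x/\lambda}, and then checks the tabulated means and variances by simulation.

```python
import random
from math import log

def sample_uniform(a, b, u):
    """U(a,b) from a standard uniform u: stretch by (b - a), shift by a."""
    return a + (b - a) * u

def sample_exponential(lam, u):
    """Exponential with mean lam via the inverse cdf:
    solving 1 - exp(-x/lam) = u gives x = -lam * log(1 - u)."""
    return -lam * log(1.0 - u)

random.seed(0)
n, lam, a, b = 200_000, 2.0, 3.0, 7.0
us = [random.random() for _ in range(n)]

exp_samples = [sample_exponential(lam, u) for u in us]
mean = sum(exp_samples) / n
var = sum((x - mean) ** 2 for x in exp_samples) / n
assert abs(mean - lam) < 0.05          # E[X] = lam
assert abs(var - lam**2) < 0.2         # Var[X] = lam^2

uni_samples = [sample_uniform(a, b, u) for u in us]
umean = sum(uni_samples) / n
assert abs(umean - (a + b) / 2) < 0.02  # E[X] = (a + b)/2
```

The tolerances are deliberately loose compared to the O(1/\sqrt{n}) sampling error, so the checks are robust to the choice of seed.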
Lecture 7: BRANCHING PROCESSES (1 of 6)

Course: M362K Stochastic Processes I
Term: Spring 2008
Instructor: Gordan Zitkovic

Lecture 7: BRANCHING PROCESSES

7.1 A bit of history

In the mid-19th century, several aristocratic families in Victorian England realized that their family names could become extinct. Was it just unfounded paranoia, or did something real prompt them to come to this conclusion? They decided to ask around, and Sir Francis Galton — a "polymath, anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, psychometrician and statistician" and half-cousin of Charles Darwin — posed the following question (1875, Educational Times):

  How many male children (on average) must each generation of a family have in order for the family name to continue in perpetuity?

The first complete answer came from Reverend Henry William Watson soon after, and the two wrote a joint paper entitled "On the probability of extinction of families" in 1874. By the end of this lecture, you will be able to give a precise answer to Galton's question.

7.2 A mathematical model

The model proposed by Watson was the following:

1. A population starts with one individual at time n = 0: Z_0 = 1.
2. After one unit of time (at time n = 1), the sole individual produces Z_1 identical clones of itself and dies. Z_1 is an N_0-valued
random variable.
3. (a) If Z_1 happens to be equal to 0, the population is dead and nothing happens at any future time n \ge 2.
   (b) If Z_1 > 0, a unit of time later, each of the Z_1 individuals gives birth to a random number of children and dies. The first one has Z_{1,1} children, the second one Z_{1,2} children, etc. The last, Z_1-th, one gives birth to Z_{1,Z_1} children. We assume that the distribution of the number of children is the same for each individual in every generation, and independent of both the number of individuals in the generation and of the number of children the others have. This distribution, shared by all Z_{n,i} and Z_1, is called the offspring distribution. The total number of individuals in the second generation is now

     Z_2 = \sum_{k=1}^{Z_1} Z_{1,k}.

   (c) The third, fourth, etc. generations are produced in the same way. If it ever happens that Z_n = 0 for some n, then Z_m = 0 for all m \ge n — the population is extinct. Otherwise,

     Z_{n+1} = \sum_{k=1}^{Z_n} Z_{n,k}.

Definition 7.1. A stochastic process with the properties described in 2. and 3. above is called a (simple) branching process.

The mechanism that produces the next generation from the present one can differ from application to application. It is the offspring distribution alone that determines the evolution of a branching process. With this new formalism, we can pose Galton's question more precisely:

  Under what conditions on the offspring distribution will the process (Z_n)_{n \in N_0} never go extinct, i.e., when does

    P[Z_n \ge 1 for all n \in N_0] = 1

  hold?

7.3 Construction and simulation of branching processes

Before we answer Galton's question, let us figure out how to simulate a branching process for a given offspring distribution (p_n)_{n \in N_0}, p_k = P[Z_1 = k]. The distribution (p_n)_{n \in N_0} is N_0-valued; we have learned how to simulate such distributions in an earlier lecture. We can therefore assume that a transformation function F is known, i.e., that the random variable \eta = F(\gamma) is N_0-valued with pmf (p_n)_{n \in N_0}, where \gamma \sim U(0,1). Some time ago, we assumed that a probability space with a sequence
(\gamma_n)_{n \in N_0} of independent U(0,1) random variables is given. We think of (\gamma_n)_{n \in N_0} as a sequence of random numbers produced by a computer. Let us first apply the function F to each member of (\gamma_n)_{n \in N_0} to obtain an independent sequence (\eta_n)_{n \in N_0} of N_0-valued random variables with pmf (p_n)_{n \in N_0}. In the case of a simple random walk, we would be done at this point: an accumulation of the first n elements of (\eta_n)_{n \in N_0} would give you the value X_n of the random walk at time n. Branching processes are a bit more complicated: the increment Z_{n+1} - Z_n depends on Z_n — the more individuals in a generation, the more offspring they will produce. In other words, we need a "black box" with two inputs (randomness and Z_n) which will produce Z_{n+1}.

What do we mean by "randomness"? Ideally, we would need exactly Z_n unused elements of (\eta_n)_{n \in N_0} to simulate the number of children for each of the Z_n members of generation n. This is exactly how one would do it in practice: given the size Z_n of generation n, one would draw Z_n simulations from the distribution (p_n)_{n \in N_0} and sum up the results to get Z_{n+1}. Mathematically, it is easier to be more wasteful. The sequence (\eta_n)_{n \in N_0} can be rearranged into a double sequence(1) (Z_{n,i})_{n \in N_0, i \in N}. In words, instead of one sequence of independent random variables with pmf (p_n)_{n \in N_0}, we have a sequence of sequences. Such an abundance allows us to feed the whole row (Z_{n,i})_{i \in N} into the black box which produces Z_{n+1} from Z_n. You can think of Z_{n,i} as the number of children the i-th individual in the n-th generation would have, had he been born. The black box uses only the first Z_n elements of (Z_{n,i})_{i \in N} and discards the rest:

  Z_{n+1} = Z_{n,1} + Z_{n,2} + ... + Z_{n,Z_n} = \sum_{i=1}^{Z_n} Z_{n,i},

where all (Z_{n,i})_{n \in N_0, i \in N} are independent of each other and have the same distribution with pmf (p_n)_{n \in N_0}. Once we learn a bit more about the probabilistic structure of (Z_n)_{n \in N_0}, we will describe another way to simulate it.

(1) Can you find a one-to-one and onto mapping from N into N x N?

Last Updated: October 14, 2008
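The "practical" recipe described above — draw Z_n samples from the offspring distribution and sum them to obtain Z_{n+1} — takes only a few lines of Python. (The code and the particular offspring pmf below are mine, chosen for illustration; they are not part of the notes.)

```python
import random

def sample_offspring(pmf, u):
    """The transformation function F: maps u ~ U(0,1) to an N_0-valued
    draw by walking through the cumulative offspring probabilities."""
    cum = 0.0
    for k, pk in enumerate(pmf):
        cum += pk
        if u < cum:
            return k
    return len(pmf) - 1  # guard against floating-point rounding

def simulate_branching(pmf, generations, rng):
    """One path Z_0, Z_1, ..., starting from Z_0 = 1; stops early once
    the population dies out (extinct populations stay extinct)."""
    z = [1]
    for _ in range(generations):
        z.append(sum(sample_offspring(pmf, rng.random()) for _ in range(z[-1])))
        if z[-1] == 0:
            break
    return z

rng = random.Random(42)
pmf = [0.25, 0.5, 0.25]   # example offspring pmf: 0, 1 or 2 children, mean 1
paths = [simulate_branching(pmf, 30, rng) for _ in range(2000)]

extinct = sum(1 for z in paths if z[-1] == 0)
print(f"fraction of paths extinct by generation 30: {extinct / 2000:.3f}")
```

With mean offspring exactly 1 (the "critical" case), most of the 2000 paths above die out within 30 generations — a first numerical hint at the answer to Galton's question.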

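As a preview of the answer to Galton's question (a standard branching-process result that is not derived in this excerpt): the extinction probability is the smallest fixed point in [0,1] of the offspring generating function P from Lecture 5, and iterating s <- P(s) starting from s = 0 converges to it. A short Python sketch, using a made-up offspring distribution:

```python
def extinction_probability(pmf, iterations=200):
    """Iterate s <- P(s) starting from 0; the iterates increase to the
    smallest fixed point of the offspring generating function in [0, 1]."""
    def P(s):
        return sum(pk * s**k for k, pk in enumerate(pmf))
    s = 0.0
    for _ in range(iterations):
        s = P(s)
    return s

# supercritical example: mean offspring 0.25*1 + 0.5*2 = 1.25 > 1, and
# 0.25 + 0.25 s + 0.5 s^2 = s has the two roots s = 1/2 and s = 1
q = extinction_probability([0.25, 0.25, 0.5])
assert abs(q - 0.5) < 1e-9

# with mean offspring <= 1, extinction is certain (probability 1)
assert abs(extinction_probability([0.5, 0.5]) - 1.0) < 1e-6
```

The iteration converges geometrically here, so 200 steps are far more than enough for the tolerances used.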