Probability Theory and Stochastic Processes
ECE 541: NOTES ON PROBABILITY THEORY AND STOCHASTIC PROCESSES
MAJEED M. HAYAT

These notes are made available to students of the course ECE 541 at the University of New Mexico. They cannot and will not be used for any type of profitable reproduction or sale. Parts of the materials have been extracted from texts and notes by other authors. These include the text Probability Theory by Y. Chow and H. Teicher, the notes on stochastic analysis by T. G. Kurtz, the text Real and Complex Analysis by W. Rudin, the notes on measure and integration by A. Beck, the text Probability and Measure by P. Billingsley, the text Introduction to Stochastic Processes by P. G. Hoel, S. C. Port, and C. J. Stone, and possibly other sources. In compiling these notes I tried, as much as possible, to make explicit reference to these sources; if any omission is found, it is due to my oversight, for which I seek the authors' forgiveness. I am truly indebted to these outstanding scholars and authors, from whom I obtained most of the materials, and I am truly privileged to have been taught by some of them. I also would like to thank Dr. Bradley Ratliff for helping in the tedious task of typing these notes.

This material is intended as a graduate-level treatment of probability and stochastic processes. It requires basic undergraduate knowledge of probability, random variables, probability distributions and density functions, and moments; a course like ECE 340 covers these topics at an undergraduate level. The material also requires some knowledge of elementary analysis concepts such as limits and continuity, basic set theory, some basic topology, Fourier and Laplace transforms, and elementary linear systems theory.

Date: December 17, 2008.

1. FUNDAMENTAL CONCEPTS

1.1 Experiments. The most fundamental component in probability theory is the notion of a physical (or sometimes imaginary) experiment whose outcome is revealed when the experiment is completed. Probability theory aims to
provide the tools that will enable us to assess the likelihood of an outcome or, more generally, the likelihood of a collection of outcomes. Let us consider the following example.

Example 1 (Shooting a dart). Consider shooting a single dart at a target board represented by the unit closed disc D, centered at the point (0,0). We write D = {(x,y) ∈ ℝ² : x² + y² ≤ 1}. Here ℝ denotes the set of real numbers (same as (−∞,∞)), and ℝ² is the set of all points in the plane: ℝ² = ℝ × ℝ, where × denotes the Cartesian product of sets. We read the above description of D as "the set of all points (x,y) in ℝ² such that (or with the property that) x² + y² ≤ 1."

1.2 Outcomes and the Sample Space. Now we define what we mean by an outcome. An outcome can be missing the target ("missed" for short), in which case the dart misses the board entirely, or it can be the dart's location in the scenario that it hits the board. Note that we have implicitly decided (or chosen) that we do not care where the dart lands whenever it misses the board. The definition of an outcome is totally arbitrary and therefore not unique for any experiment; it depends on whatever makes sense to us. Mathematically, we form what is called the sample space as the set containing all possible outcomes. If we call this set Ω, then for our dart example Ω = {missed} ∪ D, where the symbol ∪ denotes set union (we say x ∈ A ∪ B if and only if x ∈ A or x ∈ B). We write ω ∈ Ω to denote a specific outcome from the sample space Ω. For example, we may have ω = missed, ω = (0,0), or ω = (0.1, 0.2); however, according to our definition of an outcome, ω cannot be (1,1).

1.3 Events. An event is a collection of outcomes, that is, a subset of Ω. Such a subset can be associated with a question that we may ask about the outcome of the experiment and whose answer can be determined after the outcome is revealed. For example, the question Q1: "Did the dart land within 0.5 of the bull's eye?" can be associated with the subset of Ω (or event) given by E = {(x,y) ∈ ℝ² : x² + y² ≤ 1/4}. We would like to call the set E an event.
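The outcome/sample-space setup of Example 1 can be modeled directly. A minimal Python sketch (the function names and the 0.5 miss probability are illustrative choices, not part of the notes):

```python
import random

def shoot_dart(rng, miss_prob=0.5):
    """Draw one outcome from Ω = {"missed"} ∪ D.
    The miss probability is an arbitrary modeling choice for the simulation."""
    if rng.random() < miss_prob:
        return "missed"
    # Rejection-sample a uniform point on the closed unit disc D.
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return (x, y)

def in_event_E(omega):
    """Membership in E = {(x, y) : x² + y² ≤ 1/4},
    i.e. 'the dart landed within 0.5 of the bull's eye'."""
    return omega != "missed" and omega[0] ** 2 + omega[1] ** 2 <= 0.25

rng = random.Random(0)
outcomes = [shoot_dart(rng) for _ in range(1000)]
```

Every simulated outcome is either "missed" or a point of D, mirroring the definition of Ω above.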
Now consider the complement of Q1, that is, Q2: "Did the dart not land within 0.5 of the bull's eye?", with which we can associate the event Eᶜ, where the superscript c denotes set complementation relative to Ω. Namely, Eᶜ = Ω \ E, which is the set of outcomes (points, members) of Ω that are not already in E. The notation "\" represents set difference (or subtraction): in general, for any two sets A and B, A \ B is the set of all elements that are in A but not in B. Note that Eᶜ = {(x,y) ∈ D : x² + y² > 1/4} ∪ {missed}. The point here is that if E is an event, then we would want Eᶜ to qualify as an event as well, since we would like to be able to ask the logical negative of any question. In addition, we would also like to be able to form a logical "or" of any two questions about the experiment outcome; thus, if E₁ and E₂ are events, we would like their union to also be an event. Here is another illustrative example: for each n = 1, 2, …, define the subset E_n = {(x,y) ∈ ℝ² : x² + y² ≤ 1 − n⁻¹}, and let E = ∪_{n=1}^∞ E_n be their countable union. (Notation: we say ω ∈ ∪_{n=1}^∞ E_n if and only if ω ∈ E_n for some n. Similarly, we say ω ∈ ∩_{n=1}^∞ E_n if and only if ω ∈ E_n for all n.) It is not hard to see (prove it!) that ∪_{n=1}^∞ E_n = E = {(x,y) ∈ ℝ² : x² + y² < 1}, which corresponds to the valid question "Did the dart land strictly inside the board?" Thus, we would want E, which is the countable union of events, to be an event as well.

Example 2. For each n ≥ 1 take A_n = (−1/n, 1/n). Then ∩_{n=1}^∞ A_n = {0} and ∪_{n=1}^∞ A_n = (−1, 1). (Prove these.)

Finally, we should be able to ask whether or not the experiment was conducted; that is, we would like to label the sample space Ω as an event as well. With this (hopefully motivating) introduction, we proceed to formally define what we mean by events.

Definition 1. A collection F of subsets of Ω is called a σ-algebra (read "sigma algebra") if:
(1) Ω ∈ F;
(2) ∪_{n=1}^∞ E_n ∈ F whenever E_n ∈ F, n = 1, 2, …;
(3) Eᶜ ∈ F whenever E ∈ F.
If F is a σ-algebra, then its members are called events. Note that F is a collection of subsets, not a union of subsets; thus F itself is not a subset of Ω. Here are some consequences
of the above definition (you should prove all of them):
(1) ∅ ∈ F;
(2) ∩_{n=1}^∞ E_n ∈ F whenever E_n ∈ F, n = 1, 2, …. Here the countable intersection ∩_{n=1}^∞ E_n is defined as follows: ω ∈ ∩_{n=1}^∞ E_n if and only if ω ∈ E_n for all n;
(3) A \ B ∈ F whenever A, B ∈ F. (First prove that A \ B = A ∩ Bᶜ.)

Generally, members of any σ-algebra (not necessarily one corresponding to a random experiment and a sample space Ω) are called measurable sets. Measurable sets were first introduced in the branch of mathematics called analysis; they were adopted into probability by the great mathematician Kolmogorov. Clearly, in probability theory we call measurable subsets of Ω events.

Definition 2. Let Ω be a sample space and let F be a σ-algebra of events. We call the pair (Ω, F) a measurable space.

Definition 3. A collection D of subsets of Ω is called a sub-σ-algebra of F if: (1) D ⊆ F (this means that if A ∈ D then automatically A ∈ F); (2) D is itself a σ-algebra.

Example 3. {∅, Ω} is a sub-σ-algebra of any other σ-algebra.

Example 4. The power set of Ω, which is the set of all subsets of Ω, is a σ-algebra. In fact, it is a maximal σ-algebra in the sense that it contains any other σ-algebra. The power set of a set Ω is often denoted by 2^Ω.

Interpretation. Once again, we emphasize that it is meaningful to think of a σ-algebra as a collection of all valid questions that one may ask about an experiment. The collection has to satisfy certain self-consistency rules dictated by the requirements for a σ-algebra, but what we mean by "valid" is really up to us, as long as the self-consistency rules defining the collection of events are met.

Generation of σ-algebras. Let M be a collection of events, not necessarily a σ-algebra; that is, M ⊆ F. This collection could be a collection of certain events of interest. For example, in the dart experiment we may define M = {{missed}, {(x,y) ∈ ℝ² : 1/4 ≤ x² + y² ≤ 1/2}}. The main question now is the following: can we construct a minimal (smallest) σ-algebra that contains M? If such a σ-algebra exists, call it F_M; then it must possess the following minimality property.
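For a finite sample space, the defining properties of a σ-algebra in Definition 1 can be checked by brute force (for finite collections, closure under pairwise union gives closure under countable union). A small sketch with illustrative names:

```python
def is_sigma_algebra(omega, F):
    """Check Definition 1 on a finite Ω: contains Ω, closed under
    complementation and (pairwise, hence countable) union.

    omega: frozenset of outcomes; F: set of frozensets (candidate events)."""
    if frozenset(omega) not in F:
        return False
    for A in F:
        if omega - A not in F:        # complement Aᶜ = Ω \ A
            return False
        for B in F:
            if A | B not in F:        # union A ∪ B
                return False
    return True

omega = frozenset({"missed", "hit"})
F_trivial = {frozenset(), frozenset(omega)}
F_bad = {frozenset(omega), frozenset({"hit"})}   # missing the complement {"missed"}
```

Closure under intersections and set differences (the consequences listed above) then follows automatically via De Morgan's laws, so only these two closures need checking.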
If D is another σ-algebra containing M, then necessarily F_M ⊆ D; hence F_M is minimal. The following theorem states that there is such a minimal σ-algebra.

Theorem 1. Let M be a collection of events; then there is a minimal σ-algebra containing M.

Before we prove the theorem, let us look at an example.

Example 5. Suppose that Ω = (−∞, ∞) and M = {(−∞, −1], (0, ∞)}. It is easy to check that the minimal σ-algebra containing M is F_M = {∅, Ω, (−∞, −1], (0, ∞), (−1, 0], (−∞, 0], (−1, ∞), (−∞, −1] ∪ (0, ∞)}. (Explain where each member is coming from.)

Proof of Theorem 1. Let C_M be the collection of all σ-algebras that contain M. We observe that such a collection is not empty, since it contains at least the power set 2^Ω. Let us label each member of C_M by an index α, namely D_α, where α ∈ I and I is some index set. Define F_M = ∩_{α∈I} D_α. We need to show (1) that F_M is a σ-algebra containing M, and (2) that F_M is minimal. Note that each D_α contains Ω, since each D_α is a σ-algebra; thus F_M contains Ω. Next, if A ∈ F_M, then A ∈ D_α for each α ∈ I. Thus Aᶜ ∈ D_α for each α ∈ I (since each D_α is a σ-algebra), which implies that Aᶜ ∈ ∩_{α∈I} D_α. Next, suppose that A₁, A₂, … ∈ F_M. Then we know that A₁, A₂, … ∈ D_α for each α ∈ I. Moreover, ∪_{n=1}^∞ A_n ∈ D_α for each α ∈ I, again since each D_α is a σ-algebra, and thus ∪_{n=1}^∞ A_n ∈ ∩_{α∈I} D_α. This completes the proof that F_M is a σ-algebra. Now suppose that F̃_M is another σ-algebra containing M; we will show that F_M ⊆ F̃_M. First note that, since C_M is the collection of all σ-algebras containing M, it must be true that F̃_M = D_{α′} for some α′ ∈ I. Now if A ∈ F_M, then necessarily A ∈ D_{α′}, since D_{α′} is one of the members of the intersection that defines F_M, and consequently A ∈ F̃_M. This establishes that F_M ⊆ F̃_M and completes the proof of the theorem. □

Example 6. Let U be the collection of all open sets in ℝ. Then, according to the above theorem, there exists a minimal σ-algebra containing U. This σ-algebra is called the Borel σ-algebra, B, and its elements are called the Borel subsets of ℝ. Note that, by virtue of set complementation,
union, and intersection, B contains all closed sets, all half-open intervals, their countable unions and intersections, and so on.

Reminder: A subset U of ℝ is called open if for every x ∈ U there exists an open interval centered at x which lies entirely in U. Closed sets are defined as complements of open sets. These definitions extend to ℝⁿ in a straightforward manner.

Restrictions of σ-algebras. Let F be a σ-algebra. For any measurable set U, we define F ∩ U as the collection of all intersections between U and the members of F; that is, F ∩ U = {V ∩ U : V ∈ F}. It is easy to show that F ∩ U is also a σ-algebra, which is called the restriction of F to U.

Example 7. Back to the dart experiment: what is a reasonable σ-algebra for this experiment? I would say any such σ-algebra should contain all the Borel subsets of D and the "missed" event. So we can take M = {{missed}} ∪ (B ∩ D). It is easy to check that in this case

(1) F_M = [(B ∩ D) ∪ {missed}] ∪ (B ∩ D),

where for any σ-algebra F and any measurable set U we define F ∪ U as the collection of all unions between U and the members of F; that is, F ∪ U = {V ∪ U : V ∈ F}. Note that F ∪ U is not always a σ-algebra (contrary to F ∩ U), but in this example it is, because {missed} is the complement of D.

1.4 Random Variables. Motivation: Recall the dart experiment, and define the following transformation on the sample space, X : Ω → ℝ, given by

X(ω) = 1 if ω ∈ D, and X(ω) = 0 if ω = missed,

where, as before, D = {(x,y) ∈ ℝ² : x² + y² ≤ 1}. Consider the collection of outcomes that we can identify if we knew that X fell in the interval (−∞, r]. More precisely, we want to identify {ω ∈ Ω : X(ω) ∈ (−∞, r]}, or equivalently the set X⁻¹((−∞, r]), which is the inverse image of the set (−∞, r] under the transformation (or function) X. It can be checked (you must work these out carefully!) that for r < 0, X⁻¹((−∞, r]) = ∅; for 0 ≤ r < 1, X⁻¹((−∞, r]) = {missed}; and for 1 ≤ r < ∞, X⁻¹((−∞, r]) = Ω. The important thing to note is that in each case X⁻¹((−∞, r]) is an event, that is, X⁻¹((−∞, r]) ∈ F_M, where F_M was defined earlier for this experiment. This is a direct consequence of the way we defined the function X.
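The three regimes of X⁻¹((−∞, r]) computed above can be tabulated directly; a tiny illustrative sketch (the symbolic labels are my own):

```python
def X(omega):
    """X = 1 on the board D, 0 on 'missed' (the indicator of a hit)."""
    return 0.0 if omega == "missed" else 1.0

def preimage_X(r):
    """Return X⁻¹((−∞, r]) as a symbolic label, matching the three regimes."""
    if r < 0:
        return "empty set"
    elif r < 1:
        return "{missed}"
    else:
        return "Omega"
```

For instance, with r = 0.3 only the outcome "missed" satisfies X(ω) ≤ r, in agreement with the middle case.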
Here is another transformation defined on the outcomes of the dart experiment. Define

(2) Y(ω) = 10 if ω = missed, and Y(ω) = √(x² + y²) if ω = (x,y) ∈ D.

Let's consider the collection of outcomes that correspond to {Y < 1/2}, which we can write as {ω ∈ Ω : Y(ω) < 1/2} = Y⁻¹((−∞, 1/2)). Note that Y⁻¹((−∞, 1/2)) = {(x,y) ∈ ℝ² : x² + y² < 1/4} ∈ F_M. Moreover, we can also show that
Y⁻¹((−∞, 2]) = D ∈ F_M,
Y⁻¹((−∞, 11]) = D ∪ {missed} = Ω ∈ F_M,
Y⁻¹((−∞, 0)) = ∅ ∈ F_M.
We emphasize again that Y⁻¹((−∞, r]) is always an event, that is, Y⁻¹((−∞, r]) ∈ F_M, where F_M was defined earlier for this experiment. Again, this is a direct consequence of the way we defined the function Y. Motivated by these examples, we proceed to define what we mean by a random variable in general.

Definition 4. Let (Ω, F) be a measurable space. A transformation X : Ω → ℝ is said to be F-measurable if for every r ∈ ℝ, X⁻¹((−∞, r]) ∈ F. Any measurable X is called a random variable.

Now let X be a random variable and consider the collection of events M = {{ω ∈ Ω : X(ω) ∈ (−∞, r]} : r ∈ ℝ}, which can also be written more conveniently as {X⁻¹((−∞, r]) : r ∈ ℝ}. As before, let F_M be the minimal σ-algebra containing M. Then F_M is the information that the random variable X conveys about the experiment. We denote such a σ-algebra from this point on by σ(X), the σ-algebra generated by the random variable X. In the above example of X, σ(X) = {∅, {missed}, Ω, D}. This concept is demonstrated in the next example. (Also see HW 1 for another example of σ(X).)

Example 8. Back to the dart example. Let M = {{missed}}; then it is easy to check that F_M = {∅, {missed}, Ω, D}, which also happens to be a subset of the F_M defined in (1). However, we observe that in fact X⁻¹((−∞, r]) ∈ F_M for any r ∈ ℝ. Intuitively, F_M, which we also call σ(X), can be identified as the set of all events that the function X can convey about the experiment. In particular, F_M consists of precisely those events whose occurrence or nonoccurrence can be determined through our observation of the value of X. In other words, F_M is all the information that the mapping X can provide about the outcome of the experiment. Note that F_M is much smaller than the original σ-algebra F, which was [(B ∩ D) ∪ {missed}] ∪ (B ∩ D). As we have seen before, {∅, {missed}, Ω, D} ⊆ [(B ∩ D) ∪ {missed}] ∪ (B ∩ D). Thus X can only partially inform us of the true outcome of the experiment: it can only tell us if we hit the target or missed it, nothing else, which is precisely the information contained in σ(X).

Facts about Measurable Transformations.

Theorem 2. Let (Ω, F) be a measurable space. The following statements are equivalent:
(1) X is a measurable transformation;
(2) X⁻¹((−∞, r]) ∈ F for all r ∈ ℝ;
(3) X⁻¹((−∞, r)) ∈ F for all r ∈ ℝ;
(4) X⁻¹((r, ∞)) ∈ F for all r ∈ ℝ;
(5) X⁻¹([r, ∞)) ∈ F for all r ∈ ℝ;
(6) X⁻¹((a, b]) ∈ F for all a < b;
(7) X⁻¹([a, b)) ∈ F for all a < b;
(8) X⁻¹((a, b)) ∈ F for all a < b;
(9) X⁻¹(B) ∈ F for all B ∈ B.

The equivalence among (1), (2), and (3) is left as an exercise (see HW 1). The equivalence of (3) with (4) through (8) can be shown using the same technique as in HW 1. To show that (1) implies (9) requires additional work and will not be shown here. In fact, (9) is the most powerful definition of a random variable. Using (9), we can equivalently define σ(X) as {X⁻¹(B) : B ∈ B}, which can be directly shown to be a σ-algebra (see HW 1).

1.5 Probability Measure.

Definition 5. Consider the measurable space (Ω, F) and a function P that maps F into ℝ. P is called a probability measure if:
(1) P(E) ≥ 0 for any E ∈ F;
(2) P(Ω) = 1;
(3) if E₁, E₂, … ∈ F and E_i ∩ E_j = ∅ whenever i ≠ j, then P(∪_{n=1}^∞ E_n) = Σ_{n=1}^∞ P(E_n).

The following properties follow directly from the above definition.

Property 1. P(∅) = 0.
Proof: Put E₁ = Ω and E₂ = E₃ = … = ∅ in (3) and use (2) to get 1 = P(Ω) = P(Ω ∪ ∅ ∪ ∅ ∪ …) = P(Ω) + Σ_{i=2}^∞ P(∅) = 1 + Σ_{i=2}^∞ P(∅). Thus Σ_{i=2}^∞ P(∅) = 0, which implies P(∅) = 0, since P(∅) ≥ 0 according to (1).

Property 2 (finite additivity). If E₁, E₂, …, E_n ∈ F and E_i ∩ E_j = ∅ whenever i ≠ j, then P(∪_{i=1}^n E_i) = Σ_{i=1}^n P(E_i).
Proof: Put E_{n+1} = E_{n+2} = … = ∅, and the result follows from (3), since P(∅) = 0 by Property 1.

Property 3. If E₁, E₂ ∈ F and E₁ ⊆ E₂, then P(E₁) ≤ P(E₂).
Proof: Note that E₂ = E₁ ∪ (E₂ \ E₁) and E₁ ∩ (E₂ \ E₁) = ∅. Thus, by Property 2, P(E₂) = P(E₁) + P(E₂ \ E₁) ≥ P(E₁), since P(E₂ \ E₁) ≥ 0.

Property 4. If A₁ ⊆ A₂ ⊆ A₃ ⊆ … ∈ F, then

(3) lim_{n→∞} P(A_n) = P(∪_{i=1}^∞ A_i).

Proof: Put B₁ = A₁, B₂ = A₂ \ A₁, …, B_n = A_n \ A_{n−1}, …. Then it is easy to check that ∪_{i=1}^∞ A_i = ∪_{i=1}^∞ B_i and ∪_{i=1}^m A_i = ∪_{i=1}^m B_i for any m ≥ 1, and that B_i ∩ B_j = ∅ whenever i ≠ j.
Hence, P(∪_{i=1}^∞ A_i) = P(∪_{i=1}^∞ B_i) = Σ_{i=1}^∞ P(B_i). But ∪_{i=1}^n B_i = A_n, so that P(A_n) = Σ_{i=1}^n P(B_i). Now, since Σ_{i=1}^n P(B_i) converges to Σ_{i=1}^∞ P(B_i) as n → ∞, we conclude that P(A_n) converges to Σ_{i=1}^∞ P(B_i), which is equal to P(∪_{i=1}^∞ A_i). □

Property 5. If A₁ ⊇ A₂ ⊇ A₃ ⊇ …, then lim_{n→∞} P(A_n) = P(∩_{n=1}^∞ A_n).
Proof: See HW.

Property 6. For any A ∈ F, 0 ≤ P(A) ≤ 1.

The triplet (Ω, F, P) is called a probability space.

Example 9. Recall the dart experiment. We now define P on (Ω, F_M) as follows: assign P({missed}) = 0.5, and for any A ∈ B ∩ D assign P(A) = area(A)/(2π). It is easy to check that P defines a probability measure. For example, P(Ω) = P(D ∪ {missed}) = P(D) + P({missed}) = area(D)/(2π) + 0.5 = 0.5 + 0.5 = 1. (Check the other requirements as an exercise.)

1.6 Distributions and distribution functions. Consider a probability space (Ω, F, P), and consider a random variable X defined on it. Up to this point, we have developed a formalism that allows us to ask questions of the form: what is the probability that X assumes a value in a Borel set B? Symbolically, this is written as P({ω ∈ Ω : X(ω) ∈ B}), or, for short, P(X ∈ B), with the understanding that the set {X ∈ B} is an event (i.e., a member of F). Answering all questions of the above form is tantamount to assigning a number in the interval [0,1] to every Borel set. Thus, we can think of a mapping from B into [0,1] whose knowledge will provide an answer to all questions of the form described earlier. We call this mapping the distribution of the random variable X, and denote it by μ_X. Formally, we have μ_X : B → [0,1] according to the rule μ_X(B) = P(X ∈ B), B ∈ B.

Proposition 1. μ_X defines a probability measure on the measurable space (ℝ, B).
Proof: See HW.

Distribution Functions. Recall from your undergraduate probability that we often associate with each random variable X a distribution function, defined as F_X(x) = P(X ≤ x). This function can also be obtained from the distribution μ_X of X by evaluating μ_X at B = (−∞, x], which is a Borel set. That is, F_X(x) = μ_X((−∞, x]). Note that for any x ≤ y, μ_X((x, y]) = F_X(y) − F_X(x).

Property 7. F_X is nondecreasing.
Proof: For x₁ ≤ x₂, (−∞, x₁] ⊆ (−∞, x₂], and μ_X((−∞, x₁]) ≤ μ_X((−∞, x₂]), since μ_X is a probability measure (see Property 3 above).
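Example 9's measure is easy to sanity-check numerically on a few simple events (the event encoding below is an illustrative choice, covering only centered discs):

```python
import math

def P_dart(event):
    """Example 9's measure on a few illustrative events:
    ("missed",) -> the single outcome 'missed'
    ("disc", r) -> centered disc of radius r <= 1 inside the board D
    ("omega",)  -> the whole sample space Ω
    """
    if event == ("missed",):
        return 0.5
    if event == ("omega",):
        return P_dart(("missed",)) + P_dart(("disc", 1.0))
    _, r = event
    return (math.pi * r * r) / (2.0 * math.pi)   # area(A) / (2π)
```

The checks P(Ω) = 1, P(D) = 0.5, and monotonicity under set inclusion (Property 3) all come out as expected.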
Property 8. lim_{x→∞} F_X(x) = 1 and lim_{x→−∞} F_X(x) = 0.
Proof: Note that (−∞, ∞) = ∪_{n=1}^∞ (−∞, n], and by Property 4 above, lim_{n→∞} μ_X((−∞, n]) = μ_X(∪_{n=1}^∞ (−∞, n]) = μ_X((−∞, ∞)) = 1, since μ_X is a probability measure. Thus we have proved that lim_{n→∞} F_X(n) = 1. Now the same argument can be repeated if we replace the sequence n by any increasing sequence x_n ↑ ∞. Thus lim_{n→∞} F_X(x_n) = 1 for any increasing sequence x_n, and consequently lim_{x→∞} F_X(x) = 1. The proof of the second assertion is left as an exercise.

Property 9. F_X is right continuous; that is, lim_{n→∞} F_X(x_n) = F_X(y) for any monotonic and convergent sequence x_n for which x_n ↓ y.
Proof: For simplicity, assume that x_n = y + n⁻¹. Note that F_X(y) = μ_X((−∞, y]) and (−∞, y] = ∩_{n=1}^∞ (−∞, y + n⁻¹]. So by Property 5 above, lim_{n→∞} μ_X((−∞, y + n⁻¹]) = μ_X(∩_{n=1}^∞ (−∞, y + n⁻¹]) = μ_X((−∞, y]). Thus we have proved that lim_{n→∞} F_X(y + n⁻¹) = F_X(y). In the same fashion we can generalize the result to obtain lim_{n→∞} F_X(x_n) = F_X(y) for any sequence x_n ↓ y. This completes the proof.

Property 10. F_X has a left limit at every point; that is, lim_{n→∞} F_X(x_n) exists for any monotonic and convergent sequence x_n ↑ y.
Proof: For simplicity, assume that x_n = y − n⁻¹ for some y. Note that (−∞, y) = ∪_{n=1}^∞ (−∞, y − n⁻¹]. So by Property 4 above, lim_{n→∞} F_X(y − n⁻¹) = lim_{n→∞} μ_X((−∞, y − n⁻¹]) = μ_X(∪_{n=1}^∞ (−∞, y − n⁻¹]) = μ_X((−∞, y)). In the same fashion we can generalize the result to obtain lim_{n→∞} F_X(x_n) = μ_X((−∞, y)) for any sequence x_n ↑ y.

Property 11. F_X has at most countably many discontinuities.
Proof: Let D be the set of discontinuity points of F_X. We first show a simple fact: the jumps (discontinuity intervals) corresponding to distinct discontinuity points are disjoint. More precisely, pick α, β ∈ D and suppose, without loss of generality, that α < β. Let I_α = (F_X(α⁻), F_X(α)] and I_β = (F_X(β⁻), F_X(β)] represent the discontinuities associated with α and β, respectively. Note that F_X(α) ≤ F_X(β⁻); this follows from the definition of F_X, the fact that α < β, and the fact that F_X is nondecreasing. Hence I_α and I_β are disjoint. From this we also conclude that the jumps associated with the points of D form a collection of disjoint intervals. Next, note that D = ∪_{n=1}^∞ D_n, where D_n = {x ∈ ℝ : F_X(x) − F_X(x⁻) > n⁻¹}. In words, D_n is the set of
all discontinuity points that have jumps greater than n⁻¹. Since the discontinuities corresponding to the points of D_n form a collection of disjoint intervals, the total length of the union of these disjoint intervals cannot exceed 1 (why?). Hence, if we denote the cardinality of D_n by #D_n, then n⁻¹ · #D_n ≤ 1, or #D_n ≤ n. Hence D is countable, since it is a countable union of finite sets (this is a fact from elementary set theory). □
Note: We could also have finished the proof by associating the points of D with a disjoint collection of intervals, each containing a rational number; in turn, we can associate the points of D with distinct rational numbers, which proves that D is countable.

Discrete random variables. If the distribution function of a random variable is piecewise constant, then we say that the random variable is discrete. Note that in this case the number of discontinuities is at most countably infinite, and the random variable may assume at most countably many values, say a₁, a₂, ….

Absolutely continuous random variables. If there is a Borel function f_X : ℝ → [0, ∞), with the property that ∫_{−∞}^∞ f_X(t) dt = 1, such that F_X(x) = ∫_{−∞}^x f_X(t) dt, then we say that X is absolutely continuous. Note that in this event we term f_X the probability density function (pdf) of X. Also note that if F_X is differentiable, then such a density exists and is given by the derivative of F_X.

Example 10. Let X be a uniformly distributed random variable in [0, 2]. Let the function g be as shown in Fig. 1 below, and let Y = g(X). Compute the distribution function of Y.

FIGURE 1. (Graphs of g and of F_Y; not reproduced here.)

Solution: If y < 0, F_Y(y) = 0, since Y is nonnegative. If 0 ≤ y ≤ 1, F_Y(y) = P({X ≤ y/2} ∪ {X > 1 − y/2}) = 0.5y + 0.5. Finally, if y > 1, F_Y(y) = 1, since Y ≤ 1. The graph of F_Y is shown above. Note that F_Y is indeed right continuous everywhere, but it is discontinuous at 0. In this example the random variable X is absolutely continuous (why?). On the other hand, the random variable Y is not absolutely continuous, because there is no Borel function f_Y so that F_Y(y) = ∫_{−∞}^y f_Y(t) dt. Observe that F_Y has a jump at 0.
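A quick numerical check of Example 10: Fig. 1 is not reproduced, but one function consistent with the computed F_Y is the tent g(x) = 2x on [0, 1/2], g(x) = 2(1 − x) on [1/2, 1], and g(x) = 0 on [1, 2]; this particular g is an assumption for illustration only. A Monte Carlo sketch:

```python
import random

def g(x):
    """Tent-shaped g consistent with F_Y(y) = 0.5 + 0.5y on [0, 1]
    (assumed form; the original Fig. 1 is lost)."""
    if x < 0.5:
        return 2.0 * x
    elif x < 1.0:
        return 2.0 * (1.0 - x)
    else:
        return 0.0

rng = random.Random(1)
ys = [g(rng.uniform(0.0, 2.0)) for _ in range(200_000)]
p_atom = sum(y == 0.0 for y in ys) / len(ys)   # estimates the jump F_Y(0) = 0.5
F_half = sum(y <= 0.5 for y in ys) / len(ys)   # estimates F_Y(0.5) = 0.75
```

The atom at 0 (the jump of F_Y) comes from the whole interval [1, 2] being mapped to 0, which has probability 0.5 under X ~ Uniform[0, 2].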
We cannot reproduce this jump by integrating a Borel function over it: we would need a delta function, which is not really a function, let alone a Borel function. Y is not a discrete random variable either, since Y can assume any value from an uncountable collection of real numbers in [0, 1].

1.7 Independence. Consider a probability space (Ω, F, P), and let D₁ and D₂ be two sub-σ-algebras of F. We say that D₁ and D₂ are independent if P(A ∩ B) = P(A)P(B) for every A ∈ D₁ and B ∈ D₂. For example, if X and Y are random variables on (Ω, F), then we say that they are independent if σ(X) and σ(Y) are independent. Note that in this case we automatically have μ_{X,Y} = μ_X × μ_Y, F_{X,Y}(x, y) = F_X(x)F_Y(y), and f_{X,Y}(x, y) = f_X(x)f_Y(y) if these densities exist.

2. EXPECTATIONS

Recall that in an undergraduate probability course one talks about the expectation, average, or mean of a random variable. This is done by carrying out an integration, in the Riemann sense, with respect to a probability density function (pdf). It turns out that the definition of an expectation does not require having a pdf; it is based on a more or less intuitive notion of an average. We will follow this general approach here and then connect it to the usual expectation with respect to a pdf whenever the pdf exists. We begin by introducing the expectation of a nonnegative random variable, and will generalize thereafter.

Consider a nonnegative random variable X, and for each n ≥ 1 define the sum

S_n = Σ_{i=1}^∞ (i/2^{n+1}) P(i/2^{n+1} < X ≤ (i+1)/2^{n+1}).

We claim that S_n is nondecreasing. If this is the case (to be proven shortly), then we know that S_n is either convergent to a finite number or S_n ↑ ∞. In either case, we call the limit of S_n the expectation of X, and symbolically we denote it as E[X]. Thus, we define E[X] = lim_{n→∞} S_n. To see the monotonicity of S_n, we follow Chow and Teicher [1] and observe that

{i/2^{n+1} < X ≤ (i+1)/2^{n+1}} = {2i/2^{n+2} < X ≤ (2i+1)/2^{n+2}} ∪ {(2i+1)/2^{n+2} < X ≤ (2i+2)/2^{n+2}};

thus

S_n = Σ_{i=1}^∞ (2i/2^{n+2}) [P(2i/2^{n+2} < X ≤ (2i+1)/2^{n+2}) + P((2i+1)/2^{n+2} < X ≤ (2i+2)/2^{n+2})]
    ≤ Σ_{j=1}^∞ (j/2^{n+2}) P(j/2^{n+2} < X ≤ (j+1)/2^{n+2}) = S_{n+1},

where the inequality holds because S_n assigns the weight 2i/2^{n+2} to both halves of the interval (i/2^{n+1}, (i+1)/2^{n+1}], while S_{n+1} assigns that same weight to the lower half and the larger weight (2i+1)/2^{n+2} to the upper half. If E[X] < ∞, we say that X is integrable.

General Case. If X is any random variable, then we decompose it into two parts: a positive part and a negative part. More precisely, let X⁺ = max(0, X) and X⁻ = max(0, −X); then clearly X = X⁺ − X⁻, and both X⁺ and X⁻ are nonnegative. Hence, we use our definition of expectation for a nonnegative random variable to define E[X] = E[X⁺] − E[X⁻] whenever E[X⁺] < ∞ or E[X⁻] < ∞ (or both). In cases where E[X⁺] < ∞ and E[X⁻] = ∞, or E[X⁻] < ∞ and E[X⁺] = ∞, we define E[X] = −∞ and E[X] = ∞, respectively. Finally, E[X] is not defined whenever E[X⁺] = E[X⁻] = ∞. Also, E[|X|] = E[X⁺ + X⁻] exists if and only if E[X] does.

Special Case. Consider any event E and let X = I_E, a binary random variable: I_E(ω) = 1 if ω ∈ E and I_E(ω) = 0 otherwise (it is called the indicator function of the event E). Then E[X] = P(E).
Proof: Note that in this case the only nonzero term in S_n is the one whose dyadic interval contains the value 1, which contributes (1 − 2^{−(n+1)}) P(E). Thus S_n → P(E).

When P(E) = 1, we say that X = 1 with probability one, or almost surely. In general, if X is equal to a constant c almost surely, then E[X] = c.

Another Special Case. Consider a random variable that takes only finitely many values, that is, X = Σ_{i=1}^n x_i I_{A_i}, where the A_i, i = 1, …, n, are events. It is straightforward to show that E[X] = Σ_{i=1}^n x_i P(A_i).

Discrete random variables. It is easy to see that for a discrete random variable X, E[X] = Σ_{i=1}^∞ a_i P(X = a_i) = Σ_{i=1}^∞ a_i (F_X(a_i) − F_X(a_i⁻)). (See HW; prove it following the approach used in the previous case.)

Notation and Terminology. E[X] is also written as ∫_Ω X(ω) P(dω), which is called the Lebesgue integral of X with respect to the probability measure P. Often, cumbersome notation is avoided by writing ∫_Ω X P(dω) or simply ∫_Ω X dP.

Linearity of Expectation. The expectation is linear; that is, E[aX + bY] = aE[X] + bE[Y]. This can be seen, for example, by observing that any nonnegative random variable can be approximated from below by functions of the form Σ_{i=1}^n x_i I_{A_i}(ω), where, for any event E, the random variable I_E(ω) = 1 if ω ∈ E and I_E(ω) = 0 otherwise.
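The dyadic sums S_n defined above can be evaluated in closed form for a concrete distribution. A sketch for X ~ Exponential(1), where P(a < X ≤ b) = e^(−a) − e^(−b) and E[X] = 1 (the truncation level x_max is an implementation choice; the neglected tail is negligible):

```python
import math

def S(n, x_max=30.0):
    """Dyadic lower sum S_n = Σ_i (i/2^(n+1)) P(i/2^(n+1) < X ≤ (i+1)/2^(n+1))
    for X ~ Exp(1).  The infinite sum is truncated at x_max."""
    h = 2.0 ** -(n + 1)
    total, i = 0.0, 1
    while i * h <= x_max:
        total += (i * h) * (math.exp(-i * h) - math.exp(-(i + 1) * h))
        i += 1
    return total

sums = [S(n) for n in range(1, 9)]   # S_1, ..., S_8: nondecreasing, increasing to E[X] = 1
```

As the monotonicity argument above predicts, each refinement increases the sum, and S_n approaches E[X] = 1 from below (the deficit is at most the grid width 2^(−(n+1)) plus the truncated tail).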
Recall that I_E is called the indicator function of the set E. Indeed, we have seen such an approximation through our definition of the expectation. Namely, if we define

(5) X_n(ω) = Σ_{i=1}^∞ (i/2^{n+1}) I_{{i/2^{n+1} < X ≤ (i+1)/2^{n+1}}}(ω),

then it is easy to check that X_n(ω) → X(ω) as n → ∞. In fact, E[X] was defined as lim_{n→∞} E[X_n], where E[X_n] precisely coincides with the S_n described above (recall that E[I_E] = P(E)). Now, to prove the linearity of expectation, we note that if X and Y are random variables with defined expectations, then we can approximate them by X_n and Y_n, respectively; moreover, X_n + Y_n approximates X + Y. Next we observe that, for nonnegative X and Y,

E[X_n] + E[Y_n] = Σ_{i=1}^∞ (i/2^{n+1}) P(i/2^{n+1} < X ≤ (i+1)/2^{n+1}) + Σ_{j=1}^∞ (j/2^{n+1}) P(j/2^{n+1} < Y ≤ (j+1)/2^{n+1})
              = Σ_{i≥0} Σ_{j≥0} ((i + j)/2^{n+1}) P(i/2^{n+1} < X ≤ (i+1)/2^{n+1}, j/2^{n+1} < Y ≤ (j+1)/2^{n+1}),

where each marginal event has been split over the other variable's dyadic partition (the i = 0, resp. j = 0, cell is the base interval [0, 2^{−(n+1)}], on which X_n, resp. Y_n, vanishes). But the last double sum is precisely E[X_n + Y_n] (think about it!). Thus we have shown that E[X_n] + E[Y_n] = E[X_n + Y_n]. Now take limits of both sides and use the definitions of E[X], E[Y], and E[X + Y] as the limits of E[X_n], E[Y_n], and E[X_n + Y_n], respectively, to conclude that E[X] + E[Y] = E[X + Y]. The homogeneity property E[aX] = aE[X] can be proved similarly. It is easy to show that if X ≥ 0 then E[X] ≥ 0; in addition, if X ≤ Y almost surely (this means that P(X ≤ Y) = 1), then E[X] ≤ E[Y].

Expectations in the Context of Distributions. Recall that for a nonnegative random variable X, E[X] = lim_{n→∞} Σ_{i=1}^∞ (i/2^{n+1}) P(i/2^{n+1} < X ≤ (i+1)/2^{n+1}). But we had seen earlier that P(i/2^{n+1} < X ≤ (i+1)/2^{n+1}) = μ_X((i/2^{n+1}, (i+1)/2^{n+1}]). So we can write the above sum as Σ_{i=1}^∞ (i/2^{n+1}) μ_X((i/2^{n+1}, (i+1)/2^{n+1}]). We denote the limit of the latter by ∫_ℝ x dμ_X, which is read as the Lebesgue integral of x with respect to the probability measure (or distribution) μ_X. In summary, we have E[X] = ∫_Ω X dP = ∫_ℝ x dμ_X. Of course, we can extend this notion to a general X in the usual way, i.e., by splitting X into its positive and negative parts.

A pair of random variables. Suppose that X and Y are r.v.'s defined on the measurable space (Ω, F). For any B₁, B₂ ∈ B we use the convenient notation {X ∈ B₁, Y ∈ B₂} for the event {X ∈ B₁} ∩ {Y ∈ B₂}. Moreover, we can also think of the pair X and Y as a vector (X, Y) in ℝ². Now consider any B ∈ B², the set of all Borel sets in the plane, i.e., the smallest σ-algebra containing all open
sets in the plane of the form B₁ × B₂, where B₁, B₂ are open sets in ℝ (i.e., rectangles with open sides, or simply open rectangles). We note that {(X, Y) ∈ B} is always an event when B is an open rectangle, because {(X, Y) ∈ B₁ × B₂} = {X ∈ B₁} ∩ {Y ∈ B₂}. In fact, {(X, Y) ∈ B} is an event for any B ∈ B², not just those that are open rectangles. Here is the proof. Let D be the collection of all Borel sets B ∈ B² for which {(X, Y) ∈ B} ∈ F. As we have already seen, D contains all open rectangles (i.e., open sets of the form B₁ × B₂). Further, it is easy to show that D qualifies to be called a Dynkin class of sets (definition to follow below). Also, the collection S of all Borel sets in the plane that are open rectangles is closed under finite intersection; this is because (B₁ × B₂) ∩ (C₁ × C₂) = (B₁ ∩ C₁) × (B₂ ∩ C₂). To summarize: the collection D is a Dynkin class, and it contains the collection S, which is closed under finite intersection and contains all open rectangles in the plane. Now, by the Dynkin class theorem (see below), D contains the σ-algebra generated by S, which is just B² (why?). Hence {(X, Y) ∈ B} ∈ F for any B ∈ B². Thus, for any Borel set B in the plane, we can define the joint distribution of X and Y as μ_{X,Y}(B) = P((X, Y) ∈ B). This can also be generalized in the obvious way to define a joint distribution of multiple (say n) random variables over Bⁿ, the Borel subsets of ℝⁿ.

Now back to the definition of a Dynkin class and the Dynkin class theorem. A collection D of subsets of any set Ω is called a Dynkin class if: (1) Ω ∈ D; (2) E₂ \ E₁ ∈ D whenever E₁ ⊆ E₂ and E₁, E₂ ∈ D; and (3) if E₁ ⊆ E₂ ⊆ …, then ∪_{n=1}^∞ E_n ∈ D. In the same way that we proved that for any collection of sets M there is a minimal σ-algebra containing M, we can prove that there is a minimal Dynkin class, D_M, that contains M. It is easily shown that any σ-algebra is automatically a Dynkin class (prove this). A collection S of subsets of any set Ω is called a π-class if A ∩ B ∈ S whenever A, B ∈ S. The Dynkin class theorem (see Chow and Teicher, for example) states:

Theorem 3. If S ⊆ D, where S is a π-class and D is a Dynkin class,
then D contains σ(S), the σ-algebra generated by S. As a special case, D_S = σ(S), where D_S denotes the minimal Dynkin class containing S.

Before we prove the theorem, we prove the following lemma.

Lemma 1. If D is a Dynkin class as well as a π-class, then it is automatically a σ-algebra.

Proof: For any E ∈ D, Ω \ E = Eᶜ ∈ D, since D is a Dynkin class. Now suppose that A₁, A₂, … ∈ D. Put B₁ = A₁, B₂ = A₁ ∪ A₂, …, B_n = ∪_{i=1}^n A_i. Note that B_nᶜ = ∩_{i=1}^n A_iᶜ ∈ D, since A_iᶜ ∈ D, i ≥ 1, and D is a π-class; hence B_n ∈ D, n ≥ 1. Note that B₁ ⊆ B₂ ⊆ …, so ∪_{n=1}^∞ A_n = ∪_{n=1}^∞ B_n ∈ D. Hence D is a σ-algebra. □

Proof of the theorem (from Chow & Teicher): We will prove D ⊇ σ(S) by showing that D_S is a σ-algebra, which can be established, in turn, by using the above lemma if we can show that D_S is a π-class; this gives D ⊇ σ(S), since D_S ⊆ D. To this end, define U = {A ⊆ Ω : A ∩ S ∈ D_S for every S ∈ S}. Note that S ⊆ U automatically, since S is a π-class. We now show that U is a Dynkin class. Clearly Ω ∈ U. Suppose that E₁ ⊆ E₂ and E₁, E₂ ∈ U. For any S ∈ S, (E₂ \ E₁) ∩ S = (E₂ ∩ S) \ (E₁ ∩ S) ∈ D_S, since E₂ ∩ S, E₁ ∩ S ∈ D_S and E₂ ∩ S ⊇ E₁ ∩ S. Now suppose that E₁ ⊆ E₂ ⊆ … ∈ U. Then for any S ∈ S, (∪_{i=1}^∞ E_i) ∩ S = ∪_{i=1}^∞ (E_i ∩ S) ∈ D_S, since E₁ ∩ S ⊆ E₂ ∩ S ⊆ … ∈ D_S. This establishes that U is a Dynkin class, in which case we must have U ⊇ D_S. Consequently, for any A ∈ D_S and any S ∈ S, A ∩ S ∈ D_S. Now define Ũ = {A ⊆ Ω : A ∩ S ∈ D_S for every S ∈ D_S}. Note that Ũ ⊇ S because of our earlier observation that for any A ∈ D_S and any S ∈ S, A ∩ S ∈ D_S. We can also show, in a similar way as for U, that Ũ is a Dynkin class. Hence Ũ ⊇ D_S, which implies A ∩ B ∈ D_S whenever A ∈ D_S and B ∈ D_S. Hence D_S is a π-class, which, together with the above lemma, establishes that D_S is a σ-algebra containing S. Hence D ⊇ D_S ⊇ σ(S). To prove the last conclusion of the theorem, recall that σ(S) is also a Dynkin class containing S (because it is a σ-algebra containing S); hence σ(S) ⊇ D_S, since D_S is the minimal Dynkin class containing S. However, if we apply the above theorem to D = D_S, we conclude D_S ⊇ σ(S). Thus D_S = σ(S). □

Expectation in the context of the distribution function. Note that we can use
the definition of a distribution function to write, for a non-negative rv X, E[X] = lim_{n→∞} Σ_{k≥0} k2⁻ⁿ [F_X((k+1)2⁻ⁿ) − F_X(k2⁻ⁿ)]. Now, without being too fussy, we can imagine that if F_X is differentiable with derivative f_X, then the above limit is the Riemann integral ∫₀^∞ x f_X(x) dx, which is the usual formula for the expectation of a random variable that has a probability density function.

Expectation of a function of a random variable. Now that we have E[X], we can define E[g(X)] in the obvious way, where g is a Borel-measurable transformation from ℝ to ℝ. Recall that this means that g⁻¹(B) ∈ B for any B ∈ B. In particular, E[g(X)] = ∫_Ω g(X) dP = ∫_ℝ g dμ_X. Also, in the event that X has a density function, we obtain the usual expression ∫_{−∞}^∞ g(x) f_X(x) dx.

Markov Inequality. If φ is a non-negative, non-decreasing function on [0, ∞), then P(|X| > ε) ≤ E[φ(|X|)]/φ(ε). In particular, if we take φ(t) = t² and consider X − X̄ in place of X, where X̄ = E[X], then we have P(|X − X̄| > ε) ≤ E[|X − X̄|²]/ε². Note that the numerator is simply the variance of X. Also, if we take φ(t) = t, then we have P(|X| > ε) ≤ E[|X|]/ε.

Proof. Since φ is non-decreasing, φ(|X|) ≥ φ(ε) on {|X| > ε}. Further, note that 1 ≥ I_{|X|>ε}(ω) for any ω ∈ Ω, since any indicator function is either 0 or 1. Now note that φ(|X|) ≥ φ(|X|) I_{|X|>ε} ≥ φ(ε) I_{|X|>ε}. Now take expectations of both ends of the inequalities to obtain E[φ(|X|)] ≥ φ(ε) E[I_{|X|>ε}] = φ(ε) P(|X| > ε), and the desired result follows. (See the first special case after the definition of expectation.) □

Chernoff Bound. This is an application of the Markov inequality, and it is a useful tool in upper-bounding error probabilities in digital communications. Let X be a non-negative random variable and let θ be non-negative. Then, by taking φ(t) = e^{θt}, we obtain P(X > x) ≤ e^{−θx} E[e^{θX}]. The right-hand side can be minimized over θ > 0 to yield what is called the Chernoff bound for a random variable X.

3. ELEMENTARY HILBERT SPACE THEORY

Most of the material in this section is extracted from the excellent book by W. Rudin, Real & Complex Analysis. A complex vector space H is called an inner product space if for all x, y ∈ H we have a complex-valued scalar ⟨x, y⟩ (read: the inner product
between vectors x and y) such that the following properties are satisfied:

(1) ⟨x, y⟩ = conj(⟨y, x⟩) for all x, y ∈ H;
(2) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ H;
(3) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ H, α ∈ ℂ;
(4) ⟨x, x⟩ ≥ 0 for all x ∈ H, and ⟨x, x⟩ = 0 only if x = 0.

Note that (3) and (1) together imply ⟨x, αy⟩ = ᾱ⟨x, y⟩. Also, (3) and (4) together imply ⟨x, x⟩ = 0 if and only if x = 0. It will be convenient to write ⟨x, x⟩ as ‖x‖², which we later use to introduce a norm on H.

3.1. Schwarz Inequality. It follows from properties (1)–(4) that |⟨x, y⟩| ≤ ‖x‖ ‖y‖ for all x, y ∈ H.

Proof. Put A = ‖x‖², B = |⟨x, y⟩|, and C = ‖y‖². By (4), ⟨x − αry, x − αry⟩ ≥ 0 for any choice of α ∈ ℂ and all r ∈ ℝ, which can be further written (using (2) and (3)) as

(6) A − rα⟨y, x⟩ − rᾱ⟨x, y⟩ + r²‖y‖²|α|² ≥ 0.

Now choose α so that α⟨y, x⟩ = |⟨x, y⟩|. Thus (6) can be cast as A − 2r|⟨x, y⟩| + r²‖y‖² ≥ 0, or A − 2rB + r²C ≥ 0, which is true for any r ∈ ℝ. Let r₁,₂ denote the roots of the equation A − 2rB + r²C = 0. Since A − 2rB + r²C ≥ 0, it must be true that B² − AC ≤ 0 (since the roots cannot be real unless they are the same), which implies |⟨x, y⟩|² ≤ ‖x‖²‖y‖², or |⟨x, y⟩| ≤ ‖x‖ ‖y‖. Note that |⟨x, y⟩| = ‖x‖ ‖y‖ whenever x = βy, β ∈ ℂ. Also, if |⟨x, y⟩| = ‖x‖ ‖y‖, then it is easy to verify that ⟨x − (⟨x, y⟩/‖y‖²)y, x − (⟨x, y⟩/‖y‖²)y⟩ = 0, which implies, using (4), x − (⟨x, y⟩/‖y‖²)y = 0. Thus |⟨x, y⟩| = ‖x‖ ‖y‖ if and only if x is proportional to y. □

3.2. Triangle Inequality. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ H.

Proof. This follows from the Schwarz inequality. Note that ‖x + y‖² ≥ 0 and expand to obtain ‖x + y‖² = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩ = ‖x‖² + ‖y‖² + 2 Re⟨x, y⟩, and by the Schwarz inequality ‖x + y‖² ≤ ‖x‖² + ‖y‖² + 2‖x‖ ‖y‖ = (‖x‖ + ‖y‖)², from which the desired result follows. Note that ‖x + y‖² = ‖x‖² + ‖y‖² if ⟨x, y⟩ = 0 (why?). This is a generalization of the customary triangle inequality for complex numbers, which states that |z + y| ≤ |z| + |y| for all z, y ∈ ℂ.

3.3. Norm. We say ‖·‖ is a norm on H if:

(1) ‖x‖ ≥ 0;
(2) ‖x‖ = 0 only if x = 0;
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖;
(4) ‖αx‖ = |α| ‖x‖ for all α ∈ ℂ.

With the triangle inequality at hand, we can define ‖·‖ on members of H as follows: ‖x‖ = ⟨x, x⟩^{1/2}. (You should check that this actually defines a norm.) This yields a yardstick for the "distance" between two vectors x, y ∈ H,
defined as ‖x − y‖. We can now say that H is a normed space.

3.4. Convergence. We can then talk about convergence: a sequence xₙ ∈ H is said to be convergent to x (written as xₙ → x, or lim_{n→∞} xₙ = x) if for every ε > 0 there exists N ∈ ℕ such that ‖xₙ − x‖ < ε whenever n > N.

3.5. Completeness. An inner product space H is complete if any Cauchy sequence in H converges to a point in H. A sequence (yₙ)ₙ₌₁^∞ in H is called Cauchy if for any ε > 0 there exists N ∈ ℕ such that ‖yₙ − yₘ‖ < ε for all n, m > N. Now, if H is complete, then H is called a Hilbert space. Fact: if H is complete, then it is closed. This is because any convergent sequence is automatically a Cauchy sequence.

3.6. Convex Sets. A set E in a vector space V is said to be a convex set if for any x, y ∈ E and t ∈ [0, 1], the point z_t = tx + (1 − t)y ∈ E. In other words, the line segment between x and y lies in E. Note that if E is a convex set, then the translation of E, E + z = {y + z : y ∈ E}, is also a convex set.

3.7. Parallelogram Law. For any x and y in an inner product space, ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖². This can be simply verified using the properties of an inner product. See the schematic below.

FIGURE 2.

3.8. Orthogonality. If ⟨x, y⟩ = 0, then we say that x and y are orthogonal. We write x ⊥ y. Note that the relation ⊥ is a symmetric relation; that is, x ⊥ y ⇒ y ⊥ x. Pick a vector x ∈ H, then find all vectors in H that are orthogonal to x. We write x^⊥ = {y ∈ H : ⟨x, y⟩ = 0}, and x^⊥ is a closed subspace. To see that it is a subspace, we must show that x^⊥ is closed under addition and scalar multiplication; these are immediate from the definition of an inner product (see Assignment 4). To see that x^⊥ is closed, we must show that the limit of every convergent sequence in x^⊥ is also in x^⊥. Let (xₙ) be a sequence in x^⊥ and assume that xₙ → x₀. We need to show that x₀ ∈ x^⊥. To see this, note that |⟨x₀, x⟩| = |⟨x₀ − xₙ + xₙ, x⟩| = |⟨x₀ − xₙ, x⟩ + ⟨xₙ, x⟩| = |⟨x₀ − xₙ, x⟩| ≤ ‖x₀ − xₙ‖ ‖x‖ by the Schwarz inequality; but the term on the right converges to zero, so |⟨x₀, x⟩| = 0, which implies x₀
⊥ x and x₀ ∈ x^⊥. Let M ⊆ H be a subspace of H. We define M^⊥ = ∩_{x∈M} x^⊥ = {y ∈ H : y ⊥ x for all x ∈ M}. It is easy to see that M^⊥ is actually a subspace of H and that M^⊥ ∩ M = {0}. It is also true that M^⊥ is closed, which is simply because it is an intersection of closed sets (recall that x^⊥ is closed).

Theorem 4. Any non-empty, closed, and convex set E in a Hilbert space H contains a unique element with smallest norm. That is, there exists x₀ ∈ E such that ‖x₀‖ ≤ ‖x‖ for all x ∈ E.

Proof (see also Fig. 3). Existence: Let δ = inf{‖x‖ : x ∈ E}. In the parallelogram law, replace x by x/2 and y by y/2 to obtain ‖(x + y)/2‖² + ‖(x − y)/2‖² = ‖x‖²/2 + ‖y‖²/2. Since (x + y)/2 ∈ E (because E is convex), ‖(x + y)/2‖² ≥ δ². Hence

(7) ‖x − y‖² ≤ 2‖x‖² + 2‖y‖² − 4δ².

By the definition of δ, there exists a sequence (yₙ)ₙ₌₁^∞ in E such that lim_{n→∞} ‖yₙ‖ = δ. Now replace x by yₙ and y by yₘ in (7) to obtain

(8) ‖yₙ − yₘ‖² ≤ 2‖yₙ‖² + 2‖yₘ‖² − 4δ².

Hence lim_{n,m→∞} ‖yₙ − yₘ‖² ≤ 2δ² + 2δ² − 4δ² = 0, and we conclude that (yₙ)ₙ₌₁^∞ is a Cauchy sequence in H. Since H is complete, there exists x₀ ∈ H such that yₙ → x₀; since E is closed, x₀ ∈ E. We next show that ‖x₀‖ = δ, which would complete the existence of a minimal-norm member in E. Note that by the triangle inequality, ‖x₀‖ ≤ ‖x₀ − yₙ‖ + ‖yₙ‖ and ‖yₙ‖ ≤ ‖yₙ − x₀‖ + ‖x₀‖. Now take limits of both sides to conclude that lim_{n→∞} ‖yₙ‖ = ‖x₀‖. But we already know that lim_{n→∞} ‖yₙ‖ = δ, which implies that ‖x₀‖ = δ.

Uniqueness: suppose that x ≠ x′, x, x′ ∈ E, and ‖x‖ = ‖x′‖ = δ. Replace y in (7) by x′ to obtain ‖x − x′‖² ≤ 2δ² + 2δ² − 4δ² = 0, which implies that x = x′. Hence the minimal-norm element in E is unique. □

Example 11. For any fixed n, the set ℂⁿ of all n-tuples of complex numbers x = (x₁, x₂, …, xₙ), xᵢ ∈ ℂ, is a Hilbert space, where with y = (y₁, y₂, …, yₙ) we define ⟨x, y⟩ = Σᵢ₌₁ⁿ xᵢ ȳᵢ.

FIGURE 3. (Shortest distance from x to M.)

FIGURE 4. (Elevating M by x. Correction: the upper M should read M^⊥.)

Example 12. L²(a, b) = {f : ∫ₐᵇ |f(ρ)|² dρ < ∞} is a Hilbert space with ⟨f, g⟩ = ∫ₐᵇ f(ρ) ḡ(ρ) dρ and ⟨f, f⟩ = ∫ₐᵇ |f(ρ)|² dρ = ‖f‖². (Actually, we need the Schwarz inequality for expectations before we can see that we have an inner product in this case. More on this to follow.)

Theorem 5 (Projection Theorem). If M ⊆ H is a closed subspace of a Hilbert space H, then for every x ∈ H
there exists a unique decomposition x = Px + Qx, where Px ∈ M and Qx ∈ M^⊥. Moreover:

(1) ‖x‖² = ‖Px‖² + ‖Qx‖².
(2) If we think of Px and Qx as mappings from H to M and M^⊥, respectively, then the mappings P and Q are linear.
(3) Px is the nearest point in M to x, and Qx is the nearest point in M^⊥ to x.

Px and Qx are called the orthogonal projections of x onto M and M^⊥, respectively.

Proof. Existence: Consider x + M = {x + y : y ∈ M}; we claim that x + M is closed. Recall that M is closed, which means that if xₙ ∈ M and xₙ → x₀, then x₀ ∈ M. Pick a convergent sequence in x + M, call it (zₙ). Now zₙ = x + yₙ for some yₙ ∈ M. Since (zₙ) is convergent, so is (yₙ); but the limit of (yₙ) is in M, so lim_{n→∞} zₙ = x + lim_{n→∞} yₙ ∈ x + M. We next show that x + M is convex. Pick x₁ and x₂ ∈ x + M. We need to show that for any 0 < α < 1, αx₁ + (1 − α)x₂ ∈ x + M. But x₁ = x + y₁, y₁ ∈ M, and x₂ = x + y₂, y₂ ∈ M. So αx₁ + (1 − α)x₂ = x + αy₁ + (1 − α)y₂ ∈ x + M, since αy₁ + (1 − α)y₂ ∈ M. By the minimal-norm theorem, there exists a member of x + M of smallest norm. Call it Qx, and let Px = x − Qx. Note that Px ∈ M. We need to show that Qx ∈ M^⊥; namely, ⟨Qx, y⟩ = 0 for all y ∈ M. Call Qx = z and note that ‖z‖ ≤ ‖z′‖ for all z′ ∈ x + M. Pick z′ = z − αy, where y ∈ M; then ‖z‖² ≤ ‖z − αy‖² = ⟨z − αy, z − αy⟩, or 0 ≤ −α⟨y, z⟩ − ᾱ⟨z, y⟩ + |α|²‖y‖². Pick α = t⟨z, y⟩ with t > 0. We obtain 0 ≤ −2t|⟨z, y⟩|² + t²|⟨z, y⟩|²‖y‖². Since t > 0 can be arbitrarily small, this can hold only if ⟨z, y⟩ = 0; i.e., z is orthogonal to every y ∈ M. Therefore Qx ∈ M^⊥.

Uniqueness: Suppose that x = Px + Qx = Px′ + Qx′, where Px, Px′ ∈ M and Qx, Qx′ ∈ M^⊥. Then Px − Px′ = Qx′ − Qx, where the left side belongs to M while the right side belongs to M^⊥. Hence each side can only be the zero vector (why?), and we conclude that Px = Px′ and Qx = Qx′.

Minimum-distance properties: To show that Px is the nearest point in M to x, pick any y ∈ M and observe that ‖x − y‖² = ‖Px + Qx − y‖² = ‖Qx + (Px − y)‖² = ‖Qx‖² + ‖Px − y‖², since Px − y ∈ M. The right-hand side is minimized when ‖Px − y‖ = 0, which happens if and only if y = Px. The fact that Qx is the nearest point in M^⊥ to x can be shown similarly.

Linearity: Take x, y ∈ H. Then we have x = Px + Qx and y = Py + Qy. Now ax + by = aPx + bPy + aQx + bQy. On the other hand, ax + by = P(ax + by) + Q(ax + by). Thus, we can write P(ax + by) − aPx − bPy =
−(Q(ax + by) − aQx − bQy), and observe that the left side is in M while the right side is in M^⊥. Hence each side can only be the zero vector. Therefore P(ax + by) = aPx + bPy and Q(ax + by) = aQx + bQy. □

4. CONDITIONAL EXPECTATIONS FOR L² RANDOM VARIABLES

4.1. Hölder inequality for expectations. Consider the probability space (Ω, F, P), and let X and Y be random variables. The next result is called the Hölder inequality for expectations.

Theorem 6. If p > 1 and q > 1, for which p⁻¹ + q⁻¹ = 1, then E[|XY|] ≤ E[|X|^p]^{1/p} E[|Y|^q]^{1/q}.

Proof. (See, for example, [1], p. 105.) Note that if E[|X|^p]^{1/p} = 0, then X = 0 a.s., and hence E[|XY|] = 0 and the inequality holds. If E[|X|^p]^{1/p} > 0 and E[|Y|^q]^{1/q} = ∞, then the inequality also holds trivially. Next, consider the case when 0 < E[|X|^p]^{1/p} < ∞ and 0 < E[|Y|^q]^{1/q} < ∞. Let U = |X|/E[|X|^p]^{1/p} and V = |Y|/E[|Y|^q]^{1/q}, and note that E[U^p] = E[V^q] = 1. Using the concavity of the logarithm function, it is easy to see that for any a, b > 0,

log(a^p/p + b^q/q) ≥ (1/p) log(a^p) + (1/q) log(b^q),

and the last term is simply log(ab). From this it follows that ab ≤ a^p/p + b^q/q. Thus E[UV] ≤ (1/p)E[U^p] + (1/q)E[V^q] = 1/p + 1/q = 1, from which the desired result follows. □

When p = q = 2, we have E[|XY|] ≤ E[X²]^{1/2} E[Y²]^{1/2}, and this result is called the Schwarz inequality for expectations.

Let H be the collection of square-integrable random variables. Now E[(aX + bY)²] = a²E[X²] + b²E[Y²] + 2abE[XY] ≤ a²E[X²] + b²E[Y²] + 2|ab| E[X²]^{1/2}E[Y²]^{1/2} < ∞, where we have used the Schwarz inequality for expectations in the last step. Hence H is a vector space. Next, the Schwarz inequality for expectations also tells us that if X and Y are square integrable, then E[XY] is defined (i.e., finite). We can therefore define ⟨X, Y⟩ = E[XY] and see that it defines an inner product. (Technically, we need to consider the equivalence classes of random variables that are equal a.s., as otherwise we may not have an inner product. Can you see why?) We can also define the norm ‖X‖₂ = ⟨X, X⟩^{1/2}. This is called the L₂-norm associated with (Ω, F, P). Hence we can recast H as the vector space of all random variables that have finite L₂-norm. This collection is often written as L²(Ω, F, P). It can be shown that L²(Ω, F, P) is
complete (see [2], pp. 67). Hence L²(Ω, F, P) is a Hilbert space. In fact, L²(Ω, D, P) is a Hilbert space for any σ-algebra D.

4.2. The L² conditional expectation. Let X, Y ∈ L²(Ω, F, P). Clearly, L²(Ω, σ(X), P) is a closed (why?) subspace of L²(Ω, F, P). We can now apply the projection theorem to Y, with L²(Ω, σ(X), P) being our closed subspace, and obtain the decomposition Y = PY + QY, where PY ∈ L²(Ω, σ(X), P) and QY ∈ L²(Ω, σ(X), P)^⊥. We also have the property that ‖Y − PY‖₂ ≤ ‖Y − Y′‖₂ for all Y′ ∈ L²(Ω, σ(X), P). We call PY the conditional expectation of Y given σ(X), which we will write from this point on as E[Y|σ(X)].

Exercise 1. (1) Show that if X is a D-measurable rv, then so is aX, a ∈ ℝ. (2) Show that if X and Y are D-measurable, so is X + Y. (See Homework.)

The above exercise shows that the collection of square-integrable D-measurable random variables is indeed a vector space.

Theorem 7. Consider (Ω, F, P) and let X be a random variable. A random variable Z is a σ(X)-measurable random variable if and only if Z = h(X) for some Borel function h. □

The proof is based on a corollary to the Dynkin class theorem. We omit the proof, which can be found in [1]. Since E[X|σ(Y)] is σ(Y)-measurable, by definition, we can write it explicitly as a Borel function of Y. For this reason we oftentimes write E[X|Y] in place of E[X|σ(Y)].

4.3. Properties of conditional expectation. Our definition of conditional expectation lends itself to many powerful properties that are useful in practice. One of the properties also leads to an equivalent definition of the conditional expectation, which is actually the way it is commonly done; see the comments after Property 4.3.3. However, proving the existence of the conditional expectation according to the alternative definition requires knowledge of a major theorem in measure theory, called the Radon-Nikodym Theorem, which is beyond the scope of this course. That is why in this course we took a path for defining the conditional expectation that is based on the projection theorem, which is an important theorem in signal processing for other reasons as well. Here are some key properties of
conditional expectations.

4.3.1. Property 1. For any constant a, E[a|D] = a. This follows trivially from the observation that P(a) = a (why?).

4.3.2. Property 2 (Linearity). For any constants a, b: E[aX + bY|D] = aE[X|D] + bE[Y|D]. This follows trivially from the linearity of the projection operator P (see the Projection Theorem).

4.3.3. Property 3. Let Z = E[X|D]; then E[XY] = E[ZY] for all Y ∈ L²(Ω, D, P). Interpretation: Z contains all the information that X contains that is relevant to any D-measurable random variable Y.

Proof. Note that E[XY] = E[(PX + QX)Y] = E[(PX)Y] + E[(QX)Y] = E[(PX)Y] = E[ZY]. The last equality follows from the definition of Z. Conversely, if a random variable Z ∈ L²(Ω, D, P) has the property that E[ZY] = E[XY] for all Y ∈ L²(Ω, D, P), then Z = E[X|D]. To see this, we need to show that Z = PX almost surely. Note that by assumption, E[Y(X − Z)] = 0 for any Y ∈ L²(Ω, D, P). Therefore E[Y(PX + QX − Z)] = 0, or E[Y PX] + E[Y QX] − E[YZ] = 0, or E[Y(Z − PX)] = 0 for all Y ∈ L²(Ω, D, P), since E[Y QX] = 0. In particular, if we take Y = Z − PX, we will conclude that E[(Z − PX)²] = 0, which implies Z = PX almost surely. Thus we arrive at an alternative definition for the L² conditional expectation.

Definition 6. We define Z = E[X|D] if (1) Z ∈ L²(Ω, D, P), and (2) E[ZY] = E[XY] for all Y ∈ L²(Ω, D, P). We will use this new definition frequently in the remainder of this chapter.

4.3.4. Property 4 (Smoothing Property). For any X, E[X] = E[E[X|D]]. To see this, note that if Z = E[X|D], then E[ZY] = E[XY] for all Y ∈ L²(Ω, D, P). Now take Y = 1 to conclude that E[X] = E[Z].

4.3.5. Property 5. If Y ∈ L²(Ω, D, P), then E[XY|D] = Y E[X|D]. To show this, we check the second definition for conditional expectations. Note that Y E[X|D] ∈ L²(Ω, D, P), so all we need to show is that E[XYW] = E[Y E[X|D] W] for any W ∈ L²(Ω, D, P). But we already know from the definition of E[X|D] that E[WY E[X|D]] = E[WYX], since WY ∈ L²(Ω, D, P). As a special case, we have E[XY] = E[E[XY|Y]] = E[Y E[X|Y]].

4.3.6. Property 6. For any Borel function g: ℝ → ℝ, E[g(Y)|D] = g(Y) whenever g(Y) ∈ L²(Ω, D, P). For example, we have E[g(Y)|Y] = g(Y). To prove this, all we need to do is to observe that g(Y) ∈ L²(Ω, D, P) and then apply Property 4.3.5 with X = 1.

4.3.7. Property 7 (Iterated Conditioning). Let D₁ ⊆ D₂ ⊆ F. Then E[X|D₁] = E[E[X|D₂]|D₁].

Proof. Let Z = E[E[X|D₂] | D₁]. First
note that Z ∈ L²(Ω, D₁, P), so we only need to show that E[ZY] = E[XY] for all Y ∈ L²(Ω, D₁, P). Now E[ZY] = E[Y E[E[X|D₂]|D₁]]. Since Y is D₁-measurable, it is also D₂-measurable; thus, we have E[Y E[E[X|D₂]|D₁]] = E[E[E[YX|D₂]|D₁]] for all Y ∈ L²(Ω, D₁, P). But by Property 4.3.4, E[E[E[YX|D₂]|D₁]] = E[E[YX|D₂]] = E[YX], which completes the proof. □

Also note that E[X|D₁] = E[E[X|D₁]|D₂]. This is much easier to prove. (Prove it!) The next property is very important in applications.

4.3.8. Property 8. If X and Y are independent rv's, and if g: ℝ² → ℝ is Borel measurable, then E[g(X, Y)|σ(Y)] = h(Y), where for any scalar t, h(t) = E[g(X, t)]. As a consequence, E[g(X, Y)] = E[h(Y)].

Proof. We prove this in the case g(x, y) = g₁(x)g₂(y), where g₁ and g₂ are bounded, real-valued Borel functions. (The general result follows through an application of a corollary to the Dynkin class theorem.) First note that E[g₁(X)g₂(Y)|σ(Y)] = g₂(Y)E[g₁(X)|σ(Y)]. Also note that h(t) = g₂(t)E[g₁(X)], so h(Y) = g₂(Y)E[g₁(X)]. We need to show that h(Y) = E[g₁(X)g₂(Y) | σ(Y)]. To do so, we need to show that E[h(Y)Z] = E[g₁(X)g₂(Y)Z] for every Z ∈ L²(Ω, σ(Y), P). But E[h(Y)Z] = E[g₂(Y)Z E[g₁(X)]] = E[g₁(X)]E[g₂(Y)Z], and by the independence of X and Y the latter is equal to E[g₁(X)g₂(Y)Z]. The stated consequence follows from Property 4.3.4. Also, the above result can be easily generalized to random vectors. □

4.4. Some applications of conditional expectations.

Example 13 (Photon counting).

FIGURE 5. (Radiation impinges on a detector; the detection of N photons produces electric pulses, which a counter integrates into the photon count M.)

It is known that the energy of a photon is hν, where h is Planck's constant and ν is the optical frequency (color) of the photon in Hertz. We are interested in detecting and counting the number of photons in a certain time interval. We assume that we have a detector, and upon the detection of the i-th photon it generates an instantaneous pulse (a delta function) whose area is a random non-negative integer Gᵢ. We then integrate the sum of all delta functions (i.e., responses) in a given interval and call that the photon count M. We will also assume that the Gᵢ's are mutually independent and that they are also independent of the number of photons N that
impinge on the detector in the given time interval. Mathematically, we can write

M = Σᵢ₌₁^N Gᵢ.

Let us try to calculate E[M], the average number of detected photons in a given time interval. Assume that we know P(N = k), k = 0, 1, 2, 3, …, and that P(G₁ = k) is also known, k = 0, 1, 2, 3, …. By using Property 4.3.4, we know that E[M] = E[E[M|N]]. To find E[M|N], we must appeal to Property 4.3.8. We think of the random sequence (Gᵢ)ᵢ₌₁^∞ as X and the random variable N as Y. With these, the function h becomes h(t) = tE[G₁], and h(N) = E[G₁]N. Therefore E[M] = E[h(N)] = E[N]E[G₁].

What is the variance of M? Let us first calculate E[M²]. By Property 4.3.4, E[M²] = E[E[M²|N]], and by Property 4.3.8, E[M²|N] is equal to h₂(N), where

h₂(t) = E[(Σᵢ₌₁ᵗ Gᵢ)²] = E[Σᵢ₌₁ᵗ Σⱼ₌₁ᵗ GᵢGⱼ] = Σ_{i≠j} E[Gᵢ]E[Gⱼ] + Σᵢ₌₁ᵗ E[Gᵢ²] = t(t − 1)E[G₁]² + tE[G₁²] = t²E[G₁]² + tσ_G²,

and σ_G² = E[(G₁ − E[G₁])²] is the variance of G₁, which is common to all the other Gᵢ's since they are identically distributed.

Next, E[M²] = E[h₂(N)] = E[N²]E[G₁]² + σ_G²E[N] = E[G₁]²(σ_N² + N̄²) + σ_G²N̄, where N̄ = E[N] and σ_N² = E[N²] − N̄² is the variance of N, so that σ_M² = E[M²] − (N̄E[G₁])² = E[G₁]²σ_N² + σ_G²N̄. Finally, if we assume that N is a Poisson random variable (as in coherent light), then σ_N² = N̄, and the variance of M becomes σ_M² = E[G₁]²N̄ + σ_G²N̄ = E[G₁²]N̄.

Exercise. For an integer-valued random variable X, we define the generating function of X as ψ_X(s) = E[s^X], s ∈ ℂ, |s| ≤ 1. We can think of the generating function as the z-transform of the probability mass function associated with X. For the photon-counting example described above, show that ψ_M(s) = ψ_N(ψ_{G₁}(s)).

Example 14 (First Occurrence of Successive Heads). Consider the problem of flipping a coin successively. Suppose P(H) = p and P(T) = q = 1 − p. In this case we define Ω = ⨉ᵢ₌₁^∞ Ωᵢ, where for each i, Ωᵢ = {H, T}. This means that members of Ω are simply of the form ω = (ω₁, ω₂, …), where ωᵢ ∈ Ωᵢ; for example, ω = (H, T, T, H, …). We take the F associated with Ω as the minimal σ-algebra containing all cylinders with measurable bases in Ω. An event E in F is called a cylinder with a measurable base if E is of the form {(ω₁, ω₂, …) ∈ Ω : ω_{i₁} ∈ B₁, …, ω_{i_k} ∈ B_k}, k finite, Bⱼ ⊆ Ω_{iⱼ}. In words, in a cylinder only finitely many of the coordinates are specified. For example, think of a vertical cylinder in ℝ³, where we specify only two coordinates
out of three; now extend this to ℝ^∞. Let Xᵢ = I_{ωᵢ = H} be a 0-1-valued random variable, and define X on Ω as X = (X₁, X₂, …). Finally, we define P on cylinders in Ω by forming the product of the probabilities of the coordinates (enforcing independence): for each ωᵢ, we assign probability p to {ωᵢ = H} and q = 1 − p to {ωᵢ = T}. We would like to define a random variable that tells us when a run of n successive heads appears for the first time. More precisely, define the random variable T₁, as a function of X, as follows: T₁ = min{i ≥ 1 : Xᵢ = 1}. For example, if X = 0010110111…, then T₁ = 3. More generally, we define T_k = min{i : X_{i−k+1} = X_{i−k+2} = ⋯ = Xᵢ = 1}. For each k, T_k is a rv on Ω. For example, if X = 0010110111…, then T₂ = 6 and T₃ = 10.

Let y_k = E[T_k]. In what follows, we will characterize y_k.

Special case k = 1: It is easy to see that P(T₁ = 1) = p, P(T₁ = 2) = qp, P(T₁ = 3) = q²p, …, P(T₁ = n) = q^{n−1}p. Recall from undergraduate probability that the above probability law is called the geometric law, and T₁ is called a geometric random variable. Also, y₁ = Σₙ₌₁^∞ n q^{n−1} p = 1/p.

General case k > 1: We begin by observing that T_k is actually an explicit function, f say, of T_{k−1}, X_{T_{k−1}+1}, X_{T_{k−1}+2}, …. Note, however, that T_{k−1} and (X_{T_{k−1}+1}, X_{T_{k−1}+2}, …) are independent. Therefore, by Property 4.3.8, E[T_k] = E[f(T_{k−1}, X_{T_{k−1}+1}, X_{T_{k−1}+2}, …)] = E[h(T_{k−1})], where h(t) = E[f(t, X_{t+1}, X_{t+2}, …)]. It is easy to check (using the equivalent definition of a conditional expectation) that E[f(t, X_{t+1}, …)|X_{t+1}] = (t + 1)I_{X_{t+1}=1} + (t + 1 + y_k)I_{X_{t+1}=0}. This essentially says that if it took us t flips to see k − 1 consecutive heads for the first time, then if the (t+1)-st flip is a head, we have achieved k successive heads at time t + 1; alternatively, if the (t+1)-st flip is a tail, then we have to start all over again (start afresh), while we have already wasted t + 1 units of time. Now E[f(t, X_{t+1}, …)] = E[E[f(t, X_{t+1}, …)|X_{t+1}]] = (t + 1)p + (t + 1 + y_k)q. Thus h(t) = (t + 1)p + (t + 1 + y_k)q = t + 1 + y_k q, and y_k = E[h(T_{k−1})] = E[T_{k−1}] + 1 + y_k q = y_{k−1} + 1 + y_k q. Since 1 − q = p, we obtain

(9) y_k = p⁻¹(y_{k−1} + 1).

We now invoke the initial condition y₁ = 1/p, which completes the characterization of y_k. For example, if p = 1/2, then y₂ = 2(y₁ + 1) = 6, and so on.

Example 15 (Importance of the Independence Hypothesis in Property 4.3.8). Let Y = XZ, where Z = 1 + X. Now E[Y] = E[XZ] = E[X(1 + X)] = X̄ + E[X²].
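The recursion (9) is easy to sanity-check numerically. The following is a minimal sketch (the function names and the Monte Carlo setup are ours, not part of the notes): it iterates y_k = p⁻¹(y_{k−1} + 1) from y₁ = 1/p and compares the result against a direct simulation of T_k.

```python
import random

def mean_time_to_run(p, k):
    # Recursion (9): y_1 = 1/p,  y_k = (y_{k-1} + 1) / p.
    y = 1.0 / p
    for _ in range(2, k + 1):
        y = (y + 1.0) / p
    return y

def time_to_run(p, k, rng):
    # Flip a p-coin until k consecutive heads appear; return the flip count T_k.
    run = t = 0
    while run < k:
        t += 1
        run = run + 1 if rng.random() < p else 0
    return t

rng = random.Random(1)
trials = 20000
mc_mean = sum(time_to_run(0.5, 2, rng) for _ in range(trials)) / trials
print(mean_time_to_run(0.5, 2), round(mc_mean, 2))
```

For p = 1/2 the recursion gives y₂ = 6 and y₃ = 14, consistent with the closed form y_k = Σᵢ₌₁ᵏ p⁻ⁱ obtained by unrolling (9).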
However, if we erroneously attempt to use Property 4.3.8 by ignoring the dependence of Z on X, we will have h(t) = E[Xt] = X̄t, and E[E[XZ]] = E[X]E[Z] = E[X](1 + E[X]) = X̄ + X̄². Note that the two are different, since E[X²] ≠ X̄² in general.

4.5. The L¹ conditional expectation. Consider a probability space (Ω, F, P) and let X be an integrable random variable; that is, E[|X|] < ∞. We write X ∈ L¹(Ω, F, P). Let D be a sub-σ-algebra. For each M ≥ 1, we define X ∧ M as X when X ≤ M, and M otherwise. Note that X ∧ M ∈ L²(Ω, F, P), since it is bounded, so we can talk about E[X ∧ M|D], which we call Z_M. The question is: can we somehow use Z_M to define E[X|D]? Intuitively, we would want to think of E[X|D] as some kind of limit of Z_M as M → ∞. So one approach for the construction of E[X|D] is to take a limit of Z_M and then show that the limit, Z, say, satisfies Definition 6 with L² replaced by L¹. It turns out that such an approach would lead to the following definition, which we will adopt from now on.

Definition 7 (L¹ Conditional Expectation). If X is an integrable random variable, then we define Z = E[X|D] if Z has the following two properties: (1) Z is D-measurable; (2) E[ZY] = E[XY] for any bounded D-measurable random variable Y.

All the properties of the L² conditional expectation carry over to the L¹ conditional expectation. We make the final remark that if a random variable X is square integrable, then it is integrable. The proof is easy. Suppose that X is square integrable. Then E[|X|] = E[|X|I_{|X|≤1}] + E[|X|I_{|X|>1}] ≤ E[I_{|X|≤1}] + E[X²I_{|X|>1}] ≤ 1 + E[X²] < ∞.

4.6. Conditional probabilities. Now we will connect conditional expectations to the familiar conditional probabilities. In particular, we recall from undergraduate probability that if A and B are events, with P(B) > 0, then we define the conditional probability of A given that the event B has occurred as P(A|B) = P(A ∩ B)/P(B). What is the connection between this definition and our notion of a conditional expectation? Consider the σ-algebra D = {Ω, ∅, B, Bᶜ}, and consider E[I_A|D]. Because of the special form of this D, and because we know that this
conditional expectation is D-measurable, we infer that E[I_A|D] can assume only two values: one value on B and another value on Bᶜ. That is, we can write E[I_A|D] = aI_B + bI_{Bᶜ}, where a and b are constants. We claim that a = P(A ∩ B)/P(B) and b = P(A ∩ Bᶜ)/P(Bᶜ). Note that because P(B) > 0 and 1 − P(B) > 0, we are not dividing by zero. As seen from a homework problem, we can prove that [P(A ∩ B)/P(B)]I_B + [P(A ∩ Bᶜ)/P(Bᶜ)]I_{Bᶜ} is actually the conditional expectation E[I_A|D], by showing that [P(A ∩ B)/P(B)]I_B + [P(A ∩ Bᶜ)/P(Bᶜ)]I_{Bᶜ} satisfies the two defining properties listed in Definition 6 (or Definition 7). Thus, E[I_A|D] encompasses both P(A|B) and P(A|Bᶜ); that is, P(A|B) and P(A|Bᶜ) are simply the values of E[I_A|D] on B and Bᶜ, respectively. Also note that P(·|B) is actually a probability measure, for each B, as long as P(B) > 0.

4.7. Joint densities and marginal densities. Let X and Y be random variables on (Ω, F, P), with a distribution μ_{XY}(B) defined on all Borel subsets B of the plane. Suppose that X and Y have a joint density. That is, there exists an integrable function f_{XY}: ℝ² → ℝ such that

(10) μ_{XY}(B) = ∫∫_B f_{XY}(x, y) dx dy, for any B ∈ B².

4.7.1. Marginal densities. Consider μ_{XY}(A × ℝ), where A ∈ B. Notice that A ↦ μ_{XY}(A × ℝ) is a probability measure for A ∈ B. Also, by the integral representation shown in (10),

(11) μ_{XY}(A × ℝ) = ∫∫_{A×ℝ} f_{XY}(x, y) dx dy = ∫_A (∫_ℝ f_{XY}(x, y) dy) dx = ∫_A f_X(x) dx,

where f_X(x) = ∫_ℝ f_{XY}(x, y) dy is called the marginal density of X. Note that f_X qualifies as the pdf of X. Thus we arrive at the familiar result that the pdf of X can be obtained from the joint pdf of X and Y through integration. Similarly, f_Y(y) = ∫_ℝ f_{XY}(x, y) dx. […] E[Y h(X)] = […] h(x) f_X(x) […]

In the above we have used two facts about integration: (1) that we can write a double integral as an iterated integral, and (2) that we can exchange the order of the integration in the iterated integrals. These two properties are always true as long as the integrand is non-negative; this is a consequence of what is called Fubini's Theorem in analysis ([2], Chapter 8). To complete the proof, we need to address the concern over the points at which f_X(x) = 0, in which case we will not be able to cancel the f_X's in
the numerator and the denominator in the above development. However, it is straightforward to show that P(f_X(X) = 0) = 0, which guarantees that we can exclude these problematic points from the integration in the x direction without changing the integral. □

With the above results, we can think of f_{XY}(x, y)/∫_ℝ f_{XY}(x, y) dy as a conditional pdf. In particular, if we define

(14) f_{Y|X}(y|x) = f_{XY}(x, y)/∫_ℝ f_{XY}(x, y) dy = f_{XY}(x, y)/f_X(x),

then we can calculate the conditional expectation E[Y|X] using the formula ∫_ℝ y f_{Y|X}(y|X) dy, which is the familiar result that we know from undergraduate probability.

5. CONVERGENCE OF RANDOM SEQUENCES

Consider a sequence of random variables X₁, X₂, …, defined on the product space (Ω, F, P), where, as before, Ω = ⨉ᵢ₌₁^∞ Ωᵢ is the infinite-dimensional Cartesian product space and F is the smallest σ-algebra containing all cylinders. Since each Xᵢ is a random variable, when we talk about convergence of random variables we must do so in the context of functions.

5.1. Pointwise convergence. Let us take Ω = [0, 1], F = B([0, 1]), and take P as Lebesgue measure (generalized length) on [0, 1]. Consider the sequence of random variables Xₙ(ω) defined as follows:

(15) Xₙ(ω) = n²ω for 0 ≤ ω ≤ (1/2)n⁻¹; Xₙ(ω) = n − n²ω for (1/2)n⁻¹ < ω ≤ n⁻¹; Xₙ(ω) = 0 otherwise.

It is straightforward to see that for any ω, the sequence of random variables Xₙ(ω) converges to the constant function 0. We say that Xₙ(ω) → 0 pointwise, everywhere, or surely. Generally, a sequence of random variables Xₙ converges to a random variable X if for every ω ∈ Ω, Xₙ(ω) → X(ω). To be more precise: for every ε > 0 and every ω ∈ Ω, there exists n₀ ∈ ℕ such that |Xₙ(ω) − X(ω)| < ε whenever n ≥ n₀. Note that in this definition, n₀ depends not only upon the choice of ε but also on the choice of ω. Note that in the example above we cannot drop the dependence of n₀ on ω. To see that, observe that sup_ω |Xₙ(ω) − 0| = n/2; thus there is no single n₀ that will work for every ω.

5.2. Uniform convergence. The strongest type of convergence is what is called uniform convergence, in which case n₀ will be independent of the choice of ω. More
precisely, a sequence of random variables Xₙ converges to a random variable X uniformly if for every ε > 0 there exists n₀ ∈ ℕ such that |Xₙ(ω) − X(ω)| < ε for any ω ∈ Ω and whenever n ≥ n₀. This is equivalent to saying that for every ε > 0 there exists n₀ ∈ ℕ such that sup_{ω∈Ω} |Xₙ(ω) − X(ω)| < ε whenever n ≥ n₀. Can you make a small change in the example above to make the convergence uniform?

5.3. Almost-sure convergence, or almost-everywhere convergence. Now consider a slight variation of the example in (15):

(16) Xₙ(ω) = 1 for ω ∈ [0, 1] ∩ ℚ, and otherwise Xₙ(ω) is as in (15): Xₙ(ω) = n²ω for 0 < ω ≤ (1/2)n⁻¹, Xₙ(ω) = n − n²ω for (1/2)n⁻¹ < ω ≤ n⁻¹, and Xₙ(ω) = 0 elsewhere on [0, 1] ∩ ℚᶜ.

Note that in this case Xₙ(ω) → 0 for ω ∈ [0, 1] ∩ ℚᶜ. Note that the Lebesgue measure (or generalized length) of the set of points for which Xₙ(ω) does not converge to the function 0 is zero. The latter is because the measure of the set of rational numbers is zero. To see that, let r₁, r₂, … be an enumeration of the rational numbers. Pick any ε > 0 and define the intervals Jₙ = (rₙ − 2^{−(n+1)}ε, rₙ + 2^{−(n+1)}ε). Now, if we sum up the lengths of all of these intervals, we obtain ε. However, since ℚ ⊆ ∪ₙ₌₁^∞ Jₙ, the length of the set ℚ cannot exceed ε. Since ε can be selected arbitrarily small, we conclude that the Lebesgue measure of ℚ must be zero. In this case we say that the sequence of functions Xₙ converges to the constant random variable 0 almost everywhere.

Generally, we say that a sequence of random variables Xₙ converges to a random variable X almost surely if there exists A ∈ F with the property P(A) = 1 such that for every ω ∈ A, Xₙ(ω) → X(ω). This is the strongest convergence statement that we can make in a probabilistic sense, because we don't care about the points that belong to a set that has probability zero. Note that we can express this type of convergence by saying that Xₙ → X almost surely (or a.s.), or by writing P({ω ∈ Ω : lim_{n→∞} Xₙ(ω) = X(ω)}) = 1, or simply P(lim_{n→∞} Xₙ = X) = 1.

5.4. Convergence in probability, or convergence in measure. We say that the sequence Xₙ converges to a random variable X in probability (also called in measure) when for every ε > 0, lim_{n→∞} P(|Xₙ −
X| > ε) = 0. This is a weaker type of convergence than almost-sure convergence, as seen next.

Theorem 8. Almost-sure convergence implies convergence in probability.

Proof. Let ε > 0 be given. For each N ≥ 1, define E_N = {|Xₙ − X| > ε for some n ≥ N}. If we define E = ∩_{N=1}^∞ E_N, we'll observe that ω ∈ E implies that Xₙ(ω) does not converge to X(ω). This is because if ω ∈ E, then no matter how large we pick N (N₀, say), there will be an n ≥ N₀ such that |Xₙ − X| > ε. Hence, since we know that Xₙ converges to X a.s., it must be true that P(E) = 0. Now observe that since E₁ ⊇ E₂ ⊇ …, P(E) = lim_{N→∞} P(E_N), and therefore lim_{N→∞} P(E_N) = 0. Observe that if ω ∈ {|Xₙ − X| > ε} and n ≥ N, then ω ∈ {|Xₘ − X| > ε for some m ≥ N}. Thus, for n ≥ N, {|Xₙ − X| > ε} ⊆ E_N, and P(|Xₙ − X| > ε) ≤ P(E_N). Hence lim_{n→∞} P(|Xₙ − X| > ε) = 0. □

This theorem is not true, however, in infinite measure spaces. For example, take Ω = ℝ, F = B, and the Lebesgue measure m. Let Xₙ(ω) = ω/n. Then clearly Xₙ → 0 everywhere, but m(|Xₙ − 0| > ε) = ∞ for any ε > 0. (Where did we use the fact that P is a finite measure in the proof of Theorem 8?)

5.5. Convergence in the mean-square sense, or in L². We say that the sequence Xₙ converges to a random variable X in the mean-square sense (also: in L²) if lim_{n→∞} E[|Xₙ − X|²] = 0. Recall that this is precisely the type of convergence we defined in the Hilbert space context, which we used to introduce the conditional expectation. This is a stronger type of convergence than convergence in probability, as seen next.

Theorem 9. Convergence in the mean-square sense implies convergence in probability.

Proof. This is an easy consequence of the Markov inequality when taking φ(t) = t², which yields P(|Xₙ − X| > ε) ≤ E[|Xₙ − X|²]/ε². □

Convergence in probability does not imply almost-sure convergence, as seen from the following example.

Example 16 (The marching-band function). Consider Ω = [0, 1], take P to be Lebesgue measure on [0, 1], and define Xₙ as follows:

(17) Xₙ = I_{[2⁻¹(n−1), 2⁻¹n]}(ω) for n ∈ {1, 2}; Xₙ = I_{[2⁻²(n−1−2¹), 2⁻²(n−2¹)]}(ω) for n ∈ {1 + 2¹, …, 2¹ + 2²}; …; Xₙ = I_{[2⁻ᵏ(n−1−2¹−⋯−2^{k−1}), 2⁻ᵏ(n−2¹−⋯−2^{k−1})]}(ω) for n ∈ {1 + 2¹ + ⋯ + 2^{k−1}, …, 2¹ + ⋯ + 2ᵏ}; ….

In words, within the k-th block of indices the indicator interval has length 2⁻ᵏ and marches across [0, 1]. Note that for any ω ∈ [0, 1], Xₙ(ω) = 1 for infinitely many n's; thus Xₙ does not converge to 0 anywhere. At the same time, if n ∈ {1 + 2¹ + ⋯ + 2^{k−1}, …, 2¹ + ⋯ + 2ᵏ}, then P(|Xₙ − 0| > ε) = 2⁻ᵏ (for 0 < ε < 1); hence lim_{n→∞} P(|Xₙ − 0| > ε) = 0, and Xₙ converges to 0 in probability.

Note that in the above example lim_{n→∞} E[(Xₙ − 0)²] = 0, so Xₙ also converges to 0 in L². Later we shall prove that any sequence of random variables that converges in probability will have a subsequence that converges almost surely to the same limit.

Recall the example given in (15) and let's modify it slightly as follows:

(18) Xₙ(ω) = n^{3/2}ω for 0 ≤ ω ≤ (1/2)n⁻¹; Xₙ(ω) = n^{1/2} − n^{3/2}ω for (1/2)n⁻¹ < ω ≤ n⁻¹; Xₙ(ω) = 0 otherwise.

In this case, Xₙ continues to converge to 0 at every point; nonetheless, E[(Xₙ − 0)²] = 1/12 for every n. Thus Xₙ does not converge to 0 in L².

5.6. Convergence in distribution. A sequence Xₙ is said to converge to a random variable X in distribution if the distribution functions F_{Xₙ}(x) converge to the distribution function of X, F_X(x), at every point of continuity of F_X.

Theorem 10. Convergence in probability implies convergence in distribution.
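The marching-band behavior of Example 16 can be explored numerically. The sketch below is ours (function names and the explicit block indexing are our own, chosen to match (17), with half-open intervals for convenience): P(Xₙ ≠ 0) equals the length 2⁻ᵏ of the supporting interval in block k, which shrinks to zero, yet any fixed ω is covered exactly once per block, hence infinitely often.

```python
from fractions import Fraction

def block(n):
    # Block index k of n under the grouping in (17): {1,2}, {3,...,6}, {7,...,14}, ...
    return (n + 1).bit_length() - 1

def X(n, w):
    # Marching-band indicator: the 2^{-k}-long interval sweeps across [0,1) within block k.
    k = block(n)
    offset = n - (2**k - 1)          # position of n inside its block, 0 .. 2^k - 1
    lo = Fraction(offset, 2**k)
    return 1 if lo <= w < lo + Fraction(1, 2**k) else 0

def prob_nonzero(n):
    # P(X_n != 0) = Lebesgue measure of the supporting interval.
    return Fraction(1, 2**block(n))

w = Fraction(1, 3)                   # any fixed point of [0, 1)
hits = [n for n in range(1, 2**11 - 1) if X(n, w)]
print(prob_nonzero(1000), len(hits))
```

Over the first ten complete blocks, ω = 1/3 is hit exactly once per block (ten hits), while the tail probabilities 2⁻ᵏ vanish: convergence in probability without convergence at any point.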
and m as Lebesgue measure, i.e., m(A) = ∫_A dω. Consider the functions ψ_n(t) = n·1_{(0,1/n]}(t) (sketched in the figure of the original notes, omitted here). Note that ψ_n(t) → 0 for every t ∈ Ω, but E[ψ_n] = 1 ≠ 0 for every n. Therefore E[ψ_n] does not converge to E[0] = 0.

Proof of Theorem 11 (adapted from A. Beck): Let E_n = {|X_n| > M} and note that P(E_n) = 0 by the boundedness assumption. Define E₀ = ∪_n E_n and note that P(E₀) ≤ Σ_n P(E_n) = 0.

We first show that |X| ≤ M a.s. Consider W_{n,k} = {|X_n − X| > 1/k}, k = 1, 2, …. Since X_n converges to X in probability, P(W_{n,k}) → 0 as n → ∞. Let F_k = {|X| > M + 1/k}. Off E₀ we have F_k ⊆ W_{n,k} regardless of n, so P(F_k) ≤ lim_{n→∞} P(W_{n,k}) = 0, i.e., P(F_k) = 0. Therefore P(∪_k F_k) = 0, and since ∪_k F_k = {|X| > M}, we conclude that |X| ≤ M a.s.

Now let ε > 0 be given and put G_n = {|X_n − X| > ε}. Splitting Ω into G_n and its complement (and discarding the null set E₀ ∪ {|X| > M}),
|E[X_n] − E[X]| ≤ E[|X_n − X|] = E[|X_n − X| 1_{G_n}] + E[|X_n − X| 1_{G_n^c}] ≤ 2M P(G_n) + ε.
Since P(G_n) → 0, there exists n₀ such that n ≥ n₀ implies P(G_n) ≤ ε/(2M). Hence, for n ≥ n₀, |E[X_n] − E[X]| ≤ 2ε, which establishes E[X_n] → E[X]. □

Lemma 2 (adapted from T. G. Kurtz). Suppose that X ≥ 0 a.s. Then lim_{M→∞} E[X ∧ M] = E[X].

Proof: First note that the statement is true when X is a discrete random variable. Then recall that we can always approximate a random variable X monotonically from below by discrete random variables. More precisely, define X_n ≤ X as in (5) and note that X_n ↑ X and E[X_n] ↑ E[X]. Now since X_n ∧ M ≤ X ∧ M ≤ X, we have
E[X_n] = lim_{M→∞} E[X_n ∧ M] ≤ liminf_{M→∞} E[X ∧ M] ≤ limsup_{M→∞} E[X ∧ M] ≤ E[X].
Take the limit as n → ∞ to obtain
E[X] ≤ liminf_{M→∞} E[X ∧ M] ≤ limsup_{M→∞} E[X ∧ M] ≤ E[X],
from which the lemma follows. □

Theorem 12 (Monotone Convergence Theorem, adapted from T. Kurtz). Suppose that 0 ≤ X_n ≤ X and X_n converges to X in probability. Then lim_{n→∞} E[X_n] = E[X].

Proof: For M > 0, X_n ∧ M ≤ X_n ≤ X; thus E[X_n ∧ M] ≤ E[X_n] ≤ E[X]. By the bounded convergence theorem, lim_{n→∞} E[X_n ∧ M] = E[X ∧ M]. Hence
E[X ∧ M] = lim_{n→∞} E[X_n ∧ M] ≤ liminf_{n→∞} E[X_n] ≤ limsup_{n→∞} E[X_n] ≤ E[X].
Now by Lemma 2, lim_{M→∞} E[X ∧ M] = E[X], and we finally obtain
E[X] ≤ liminf_{n→∞} E[X_n] ≤ limsup_{n→∞} E[X_n] ≤ E[X],
and the theorem is proven. □

Lemma 3 (Fatou's lemma, adapted from T. G. Kurtz). Suppose that X_n ≥ 0 a.s. and X_n converges to X in probability. Then liminf_{n→∞} E[X_n] ≥ E[X].

Proof: For M > 0, X_n ∧ M ≤ X_n. Thus
E[X ∧ M] = lim_{n→∞} E[X_n ∧ M] ≤ liminf_{n→∞} E[X_n],
where the left equality is due to the bounded convergence theorem. Now by Lemma 2, lim_{M→∞} E[X ∧ M] = E[X], and we obtain E[X] ≤ liminf_{n→∞} E[X_n]. □

Theorem 13 (Dominated Convergence Theorem). Suppose that X_n → X in probability, |X_n| ≤ Y, and E[Y] < ∞. Then E[X_n] → E[X].

Proof: Since X_n + Y ≥ 0 and X_n + Y → X + Y in probability, we use Fatou's lemma to write liminf_{n→∞} E[Y + X_n] ≥ E[Y + X]. This implies E[Y] + liminf_{n→∞} E[X_n] ≥ E[Y] + E[X], or liminf_{n→∞} E[X_n] ≥ E[X]. Similarly, liminf_{n→∞} E[Y − X_n] ≥ E[Y − X], which implies liminf_{n→∞}(−E[X_n]) ≥ −E[X]. But liminf_{n→∞}(−E[X_n]) = −limsup_{n→∞} E[X_n], and therefore limsup_{n→∞} E[X_n] ≤ E[X]. In summary, limsup_{n→∞} E[X_n] ≤ E[X] ≤ liminf_{n→∞} E[X_n], which implies lim_{n→∞} E[X_n] = E[X]. □

5.7. The L¹ conditional expectation revisited. Consider a probability space (Ω, F, P) and let X ∈ L¹(Ω, F, P). Let D be a sub-σ-algebra. For each n ≥ 1 define the truncation X_n = (X ∧ n) ∨ (−n). Note that X_n ∈ L²(Ω, F, P), since it is bounded, so we can talk about E[X_n|D] as the projection of X_n onto L²(Ω, D, P), which we call Z_n. We will now show that Z_n is a Cauchy sequence in L¹, which means lim_{m,n→∞} E[|Z_n − Z_m|] = 0. We already know that E[(Z_n − Z_m)Y] = E[(X_n − X_m)Y] for any Y ∈ L²(Ω, D, P). In particular, pick Y = 1_{{Z_n − Z_m > 0}} − 1_{{Z_n − Z_m ≤ 0}}; in this case (Z_n − Z_m)Y = |Z_n − Z_m|. Thus we conclude that
E[|Z_n − Z_m|] = E[(X_n − X_m)(1_{{Z_n − Z_m > 0}} − 1_{{Z_n − Z_m ≤ 0}})].
But the right-hand side is no greater than E[|X_n − X_m|] (why?), and we obtain E[|Z_n − Z_m|] ≤ E[|X_n − X_m|]. However, E[|X_n − X_m|] ≤ E[|X| 1_{{|X| > n ∧ m}}] → 0 by the dominated convergence theorem (verify this), and we conclude that lim_{m,n→∞} E[|Z_n − Z_m|] = 0. From a key theorem in analysis (e.g., see Rudin, Chapter 3) we know that any L¹ space is complete. Thus there exists Z ∈ L¹(Ω, D, P) such that lim_{n→∞} E[|Z_n − Z|] = 0. Further, Z_n has a subsequence Z_{n_k} that converges almost surely to Z. We take this Z as a candidate for E[X|D]. But first we must show that Z satisfies E[ZY] = E[XY] for any bounded D-measurable Y. This is easy to show. Suppose that |Y| < M for some M. We already know that E[Z_nY] = E[X_nY], and by the dominated convergence theorem E[X_nY] → E[XY], since |X_n| ≤ |X|. Also, |E[Z_nY] − E[ZY]| ≤ E[|Z_n − Z||Y|] ≤ M E[|Z_n − Z|] → 0. Hence E[Z_nY] → E[ZY]. Consequently, E[ZY] = E[XY] for any bounded D-measurable Y.
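The defining property just established, E[ZY] = E[XY] for all bounded D-measurable Y, can be seen concretely on a finite probability space. The following sketch uses a hypothetical six-point Ω with the uniform measure and a D generated by a two-block partition; the conditional expectation is then simply the block average of X.

```python
# Finite sanity check of the defining property E[ZY] = E[XY] of the
# conditional expectation Z = E[X|D].  Omega = {0,...,5} with uniform
# measure; D is generated by the (hypothetical) partition {0,1,2}, {3,4,5}.
from fractions import Fraction

omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {0: 4, 1: 1, 2: 1, 3: 2, 4: 8, 5: 2}      # an arbitrary choice of X
partition = [{0, 1, 2}, {3, 4, 5}]             # the blocks generating D

def expect(f):
    return sum(P[w] * f[w] for w in omega)

# Z = E[X|D]: constant on each block, equal to the block average of X.
Z = {}
for block in partition:
    avg = sum(Fraction(X[w]) * P[w] for w in block) / sum(P[w] for w in block)
    for w in block:
        Z[w] = avg

# Any D-measurable Y is constant on the blocks; check E[ZY] = E[XY].
Y = {w: (3 if w in partition[0] else -5) for w in omega}
XY = {w: X[w] * Y[w] for w in omega}
ZY = {w: Z[w] * Y[w] for w in omega}
print(expect(ZY) == expect(XY))   # → True
```

Exact rational arithmetic (Fraction) is used so the identity holds with equality rather than up to floating-point error.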
In summary, we have found Z ∈ L¹(Ω, D, P) such that E[ZY] = E[XY] for any bounded D-measurable Y. We define this Z as E[X|D].

5.8. Central limit theorems. To establish this theorem we need the concept of a characteristic function. Let Y be a r.v. We define the characteristic function of Y, Φ_Y(u), as E[e^{juY}], where E[e^{juY}] exists if E[|Re e^{juY}|] < ∞ and E[|Im e^{juY}|] < ∞. Note that |Re e^{juY}| ≤ |e^{juY}| = 1 and |Im e^{juY}| ≤ 1; therefore E[|Re e^{juY}|] ≤ 1 and E[|Im e^{juY}|] ≤ 1. Hence E[e^{juY}] is well defined for any u ∈ R. If Y has a pdf f_Y(y), then Φ_Y(u) = ∫ e^{juy} f_Y(y) dy, which is the Fourier transform of f_Y evaluated at −u.

Example 17. Y ∈ {0, 1} and P(Y = 1) = p. In this case Φ_Y(u) = E[e^{juY}] = p e^{ju} + (1 − p) e^{ju·0} = 1 − p + p e^{ju}.

Example 18. Y ~ N(0, 1). Then Φ_Y(u) = e^{−u²/2} (justification below).

We also need the following theorem, which relates pdf's to characteristic functions.

5.9. Lévy's inversion lemma.

Lemma 4. If Φ_X(u) is the characteristic function of X, then
lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−jua} − e^{−jub})/(ju)] Φ_X(u) du = P(a < X < b) + [P(X = a) + P(X = b)]/2.
See Chow and Teicher for a proof. As a special case, if X is an absolutely continuous random variable with pdf f_X, then f_X(x) = (1/2π) ∫_{−∞}^{∞} e^{−jux} Φ_X(u) du.

Theorem 14 (Central limit theorem). Suppose X_k is a sequence of i.i.d. random variables with E[X_k] = μ and Var[X_k] = σ². Let S_n = Σ_{k=1}^{n} X_k and Y_n = (S_n − nμ)/(σ√n). Then Y_n converges to Z in distribution, where Z ~ N(0, 1), the zero-mean Gaussian random variable with unit variance.

Sketch of the proof: Observe that
Φ_{Y_n}(ω) = E[exp(jω Σ_{k=1}^{n} (X_k − μ)/(σ√n))] = Π_{k=1}^{n} E[exp(jω (X_k − μ)/(σ√n))] = (Φ_{X_1 − μ}(ω/(σ√n)))^n.
We now expand:
E[e^{jω(X_k − μ)/(σ√n)}] = 1 + (jω/(σ√n)) E[X_k − μ] − (ω²/(2σ²n)) E[(X_k − μ)²] + o(1/n) = 1 − ω²/(2n) + o(1/n),
since E[X_k − μ] = 0 and E[(X_k − μ)²] = σ². Then
Φ_{Y_n}(ω) = (1 − ω²/(2n) + o(1/n))^n and lim_{n→∞} Φ_{Y_n}(ω) = e^{−ω²/2}.
On the other hand, the characteristic function of Z is
Φ_Z(ω) = E[e^{jωZ}] = ∫_{−∞}^{∞} e^{jωz} (1/√(2π)) e^{−z²/2} dz = e^{−ω²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z − jω)²/2} dz = e^{−ω²/2}.
Therefore lim_{n→∞} Φ_{Y_n}(ω) = Φ_Z(ω), and F_{Y_n} → F_Z by the inversion lemma. □

5.10. Central limit theorem for non-identical random variables. Suppose X_k is a sequence of random variables that are independent but non-identical: E[X_k] = μ_k and Var[X_k] = σ_k². Let S_n = Σ_{k=1}^{n} X_k. Then E[S_n] = Σ_{k=1}^{n} μ_k ≜ m_n and Var[S_n] = Σ_{k=1}^{n} σ_k² ≜ s_n². Let Y_n = (S_n − m_n)/s_n, and suppose X_k satisfies the following two conditions:
(a) lim_{n→∞} s_n² = lim_{n→∞} Σ_{k=1}^{n} σ_k² = ∞;
(b) there exist α > 2 and c < ∞ such that E[|X_k − μ_k|^α] ≤ c for all k.
Then Y_n converges in distribution to a zero-mean, unit-variance Gaussian random variable. For a proof see, for example, Chow and Teicher. Conditions equivalent to the above two are:
(a′) there is a constant δ > 0 such that σ_k² > δ for all k;
(b′) all densities f_k(x), if they exist, are zero outside a finite interval [−C, C] for some C.

5.11. Strong law of large numbers. Suppose that X₁, X₂, … are i.i.d. random variables, E[X₁] = μ < ∞, and E[(X₁ − μ)²] = σ² < ∞. Let S_n ≜ n^{−1} Σ_{i=1}^{n} X_i. Note that E[S_n] = μ < ∞ and E[(S_n − μ)²] = n^{−1}σ². By Chebyshev's inequality,
P(|S_n − μ| > ε) ≤ σ²/(nε²),
and therefore we conclude that S_n converges to μ in probability. This is called the weak law of large numbers. As it turns out, this convergence occurs almost surely, yielding what is called the strong law of large numbers. To prove one version of the strong law, one needs the notion of a sequence of events occurring infinitely often. Roughly speaking, if A₁, A₂, … is a sequence of events (think, for example, of the sequence {|S_n − μ| > ε}), then one can look for the collection of all outcomes ω each of which belongs to infinitely many of the events A_n. For example, if ω₀ is such an outcome, and if we take A_n = {|S_n − μ| > ε} where ε > 0, then we would know that S_n(ω₀) cannot converge to μ. More generally, we define the event {A_n occurs infinitely often}, or for short {A_n i.o.}, as
{A_n i.o.} ≜ ∩_{n=1}^{∞} ∪_{k=n}^{∞} A_k.
The above event is also referred to as the limit superior of the sequence A_n, that is, {A_n i.o.} = limsup_n A_n. The terminology of limit superior makes sense since ∪_{k=n}^{∞} A_k is a decreasing sequence in n and the intersection yields the infimum of the decreasing sequence. Before we prove a key result on the probability of events such as {A_n i.o.}, we will give an example showing their use.
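Before that example, a brief numerical aside on the weak law stated above. The sketch below simulates Bernoulli(1/2) sample means (σ² = 1/4) and compares the empirical frequency of the deviation event {|S_n − μ| > ε} with the Chebyshev bound σ²/(nε²); the sample sizes, ε, and trial count are arbitrary choices.

```python
# Weak law of large numbers, numerically: the frequency of the event
# {|sample mean - 1/2| > eps} shrinks with n, and sits below the
# Chebyshev bound 1/(4 n eps^2).
import random

def deviation_freq(n, eps, trials=2000, seed=1):
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            bad += 1
    return bad / trials

for n in (10, 100, 1000):
    bound = 1 / (4 * n * 0.1 ** 2)
    print(n, deviation_freq(n, 0.1), min(bound, 1.0))
```

The Chebyshev bound is quite loose here; for n = 1000 the empirical frequency is essentially zero while the bound is 0.025.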
Let Y_n be any sequence of random variables and let Y be a random variable (possibly a candidate limit of the sequence, provided that the sequence is convergent). Consider the event {lim_{n→∞} Y_n = Y}, the collection of all outcomes ω for which Y_n(ω) → Y(ω). It is easy to see that
(20) {lim_n Y_n = Y}^c = ∪_{ε > 0, ε ∈ Q} {|Y_n − Y| > ε i.o.}.
This is because if ω₀ ∈ {|Y_n − Y| > ε i.o.} for some ε > 0, then we are guaranteed that for any n ≥ 1, |Y_k(ω₀) − Y(ω₀)| > ε for infinitely many k ≥ n; thus Y_n(ω₀) does not converge to Y(ω₀).

Lemma 5 (Borel–Cantelli lemma). Let A₁, A₂, … be any sequence of events. If Σ_{n=1}^{∞} P(A_n) < ∞, then P(A_n i.o.) = 0.

Proof: Note that Σ_{n=1}^{∞} P(A_n) < ∞ implies Σ_{k=n}^{∞} P(A_k) → 0 as n → ∞. Since P(∪_{k=n}^{∞} A_k) ≤ Σ_{k=n}^{∞} P(A_k),
P(∩_{n=1}^{∞} ∪_{k=n}^{∞} A_k) = lim_{n→∞} P(∪_{k=n}^{∞} A_k) = 0
(recall that ∪_{k=n}^{∞} A_k is a decreasing sequence of events). Hence P(A_n i.o.) = 0. □

The next theorem is an application of the Borel–Cantelli lemma.

Theorem 15 (A strong law of large numbers). Let X₁, X₂, … be an i.i.d. sequence with E[X₁] = μ < ∞, E[X₁²] < ∞, and E[(X₁ − μ)⁴] < ∞. Then S_n = n^{−1} Σ_{i=1}^{n} X_i → μ almost surely.

Proof: Without loss of generality we will assume that μ = 0 (so that E[X₁²] = σ² and E[X₁⁴] = γ, say). From (20) we know that
(21) {lim_n S_n = 0}^c = ∪_{ε > 0, ε ∈ Q} {|S_n| > ε i.o.}.
By Markov's inequality with p = 4,
P(|S_n| > ε) ≤ E[S_n⁴]/ε⁴ = (Σ_{i,j,k,l} E[X_i X_j X_k X_l]) / (n⁴ ε⁴).
Considering the term in the numerator, we observe that all the terms in the summation that are of the form E[X_i³]E[X_j], E[X_i²]E[X_j]E[X_k], or E[X_i]E[X_j]E[X_k]E[X_m], with i ≠ j ≠ k ≠ m, are zero. Hence
Σ_{i,j,k,l} E[X_i X_j X_k X_l] = n E[X₁⁴] + 3n(n − 1) E[X₁²]E[X₁²],
and therefore
P(|S_n| > ε) ≤ (nγ + 3n(n − 1)σ⁴)/(n⁴ ε⁴).
Hence we conclude that Σ_{n=1}^{∞} P(|S_n| > ε) < ∞. By the Borel–Cantelli lemma, P(|S_n| > ε i.o.) = 0. Hence, by (21), P({lim_n S_n = 0}^c) = 0, and the theorem is proved. □

Actually, the usual version of the strong law of large numbers does not require any assumptions on the second and fourth moments. We will not prove this version here, but it basically states that if E[|X₁|] < ∞, then n^{−1} Σ_{i=1}^{n} X_i → μ almost surely. We next prove a corollary showing that the law of large numbers holds even when E[X₁] = ∞.

Corollary 1. Let X₁, X₂, … be an i.i.d. sequence of nonnegative random variables with E[X₁] = ∞. Then n^{−1} Σ_{i=1}^{n} X_i → ∞ almost surely.

Proof: Let M be an integer, and note that
n^{−1} Σ_{i=1}^{n} X_i ≥ n^{−1} Σ_{i=1}^{n} (X_i ∧ M).
Therefore
liminf_{n→∞} n^{−1} Σ_{i=1}^{n} X_i ≥ lim_{n→∞} n^{−1} Σ_{i=1}^{n} (X_i ∧ M) = E[X₁ ∧ M] a.s.,
where the last equality follows from the law of large numbers. Hence it follows that
liminf_{n→∞} n^{−1} Σ_{i=1}^{n} X_i ≥ lim_{M→∞} E[X₁ ∧ M] = ∞,
where the last limit follows from Lemma 2. □

The next theorem uses the Borel–Cantelli lemma to characterize convergence in probability.

Theorem 16 (From Billingsley). A necessary and sufficient condition for X_n → X in probability is that each subsequence {X_{n_k}} contains a further subsequence {X_{n_{k(i)}}} such that X_{n_{k(i)}} → X a.s. as i → ∞.

Proof: If X_n → X in probability, then given {n_k}, choose a subsequence {n_{k(i)}} so that P(|X_{n_{k(i)}} − X| ≥ i^{−1}) < 2^{−i}; therefore Σ_i P(|X_{n_{k(i)}} − X| ≥ i^{−1}) < ∞. By the Borel–Cantelli lemma, with probability 1, |X_{n_{k(i)}} − X| < i^{−1} for all but finitely many i. Therefore lim_{i→∞} X_{n_{k(i)}} = X a.s. Conversely, if X_n does not converge to X in probability, there is some ε > 0 and a subsequence {n_k} for which P(|X_{n_k} − X| ≥ ε) > ε for all k. No subsequence of {X_{n_k}} can converge to X in probability; hence none can converge to X almost surely. □

Corollary 2. If g: R → R is continuous and X_n → X in probability, then g(X_n) → g(X) in probability.

Proof: Left as an exercise.

6. MARKOV CHAINS

The majority of the material of this section is extracted from the book by Billingsley (Probability and Measure) and the book by Hoel et al. (Introduction to Stochastic Processes).

Up to this point we have seen many examples and properties (e.g., convergence) of i.i.d. sequences. However, many sequences have correlation among their members. Markov chains (MCs) are a class of sequences that exhibit a simple type of correlation in the sequence of random variables. In words: as far as calculating probabilities pertaining to future values of the sequence conditional on the present and the past, it is sufficient to know the present. In other words, as far as the future is concerned, knowledge of the present contains all the knowledge of the past. We next define discrete-space MCs formally.
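Before the formal definition, the verbal description above can be tested empirically on a simulated two-state chain. In the sketch below the transition probabilities (0.1 and 0.2) are hypothetical choices; the point is that the estimated one-step-ahead probability given the present state barely changes when we additionally condition on the previous state.

```python
# Empirical illustration of the Markov property for a two-state chain:
# P(X_{n+1}=1 | X_n=0) should match P(X_{n+1}=1 | X_n=0, X_{n-1}=1).
import random

def simulate(steps=200_000, seed=7):
    rng = random.Random(seed)
    p = {0: 0.1, 1: 0.8}          # p[i] = P(X_{n+1} = 1 | X_n = i)
    x, path = 0, [0]
    for _ in range(steps):
        x = 1 if rng.random() < p[x] else 0
        path.append(x)
    return path

path = simulate()
idx = range(1, len(path) - 1)
den1 = sum(1 for t in idx if path[t] == 0)
num1 = sum(1 for t in idx if path[t] == 0 and path[t + 1] == 1)
den2 = sum(1 for t in idx if path[t - 1] == 1 and path[t] == 0)
num2 = sum(1 for t in idx if path[t - 1] == 1 and path[t] == 0 and path[t + 1] == 1)
p1, p2 = num1 / den1, num2 / den2
print(round(p1, 3), round(p2, 3))   # both close to 0.1
```

Both conditional frequencies estimate the same transition probability p₀₁ = 0.1, which is exactly the content of the Markov property.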
Consider a sequence of discrete random variables X₀, X₁, X₂, …, defined on some probability space (Ω, F, P). Let S be a finite or countable set representing the set of all possible values that each X_n may assume. Without loss of generality, assume that S = {0, 1, 2, …}. The sequence is a Markov chain if
P(X_{n+1} = j | X₀ = i₀, …, X_n = i) = P(X_{n+1} = j | X_n = i).
For each pair i and j in S, we define p_ij = P(X_{n+1} = j | X_n = i). Note that we have implicitly assumed, for now, that P(X_{n+1} = j | X_n = i) does not depend on n, in which case the chain is called homogeneous. Also note that the definition of p_ij implies Σ_{j∈S} p_ij = 1, i ∈ S. The numbers p_ij are called the transition probabilities of the MC. The initial probabilities are defined as α_i = P(X₀ = i); the only restriction on the α_i is that they are nonnegative and must add up to 1. We denote the matrix whose ij-th entry is p_ij by P, which is termed the transition matrix of the MC. Note that if S is infinite, then P is a matrix with infinitely many rows and columns. The transition matrix P is called a stochastic matrix, as each row sums up to one and all its elements are nonnegative. Figure 6 shows a state diagram illustrating the transition probabilities for a two-state MC.

[Figure 6: state diagram of a two-state MC, states 0 and 1, with self-loop probabilities p₁₁, p₂₂ and cross-transition probabilities p₁₂, p₂₁.]

6.1. Higher-order transitions. Consider P(X₀ = i₀, X₁ = i₁, X₂ = i₂). By applying Bayes' rule repeatedly (i.e., P(A ∩ B ∩ C) = P(A | B ∩ C) P(B | C) P(C)), we obtain
P(X₀ = i₀, X₁ = i₁, X₂ = i₂) = P(X₀ = i₀) P(X₁ = i₁ | X₀ = i₀) P(X₂ = i₂ | X₀ = i₀, X₁ = i₁) = α_{i₀} p_{i₀i₁} p_{i₁i₂}.
More generally, it is easy to verify that
P(X₀ = i₀, X₁ = i₁, …, X_m = i_m) = α_{i₀} p_{i₀i₁} ⋯ p_{i_{m−1}i_m}
for any sequence i₀, i₁, …, i_m of states. Further, it is also easy to see that
(22) P(X_{m+1} = j₁, …, X_{m+n} = j_n | X₀ = i₀, …, X_m = i) = p_{ij₁} p_{j₁j₂} ⋯ p_{j_{n−1}j_n}.
With these preliminaries we can define the n-step transition probability p_ij^{(n)} = P(X_{m+n} = j | X_m = i). If we now observe that
{X_{m+n} = j} = ∪_{j₁,…,j_{n−1}} {X_{m+n} = j, X_{m+n−1} = j_{n−1}, …, X_{m+1} = j₁},
we conclude
p_ij^{(n)} = Σ_{k₁,…,k_{n−1}} p_{ik₁} p_{k₁k₂} ⋯ p_{k_{n−1}j}.
We recognize p_ij^{(n)} as the ij-th entry of Pⁿ, the n-th power of the transition matrix P. It is convenient to use the convention p_ij^{(0)} = δ_ij, consistent with the fact that P⁰ is the identity matrix I. Finally, it is straightforward to verify that
p_ij^{(n)} = Σ_{k∈S} p_ik^{(l)} p_kj^{(n−l)}, 0 ≤ l ≤ n, and Σ_{j∈S} p_ij^{(n)} = 1.
This means Pⁿ = P^k P^l whenever k + l = n, which is consistent with taking powers of a matrix. For convenience we now define the conditional probabilities P_i(A) = P(A | X₀ = i), A ∈ F. With this notation, and as an immediate consequence of (22),
P(X_{m+1} = j₁, …, X_{m+n} = j_n | X₁ = i₁, …, X_m = j) = P_j(X₁ = j₁, …, X_n = j_n).

Example 19 (a gambling problem; see the related homework problem). The initial fortune of a gambler is X₀ = x₀. After playing each hand he/she increases (decreases) the fortune by one dollar with probability p (respectively q = 1 − p). The gambler quits if either her/his fortune becomes 0 (bankruptcy) or reaches a goal of L dollars. Let X_n denote the fortune at time n. Note that
(23) P(X_n = j | X₁ = i₁, …, X_{n−1} = i_{n−1}) = P(X_n = j | X_{n−1} = i_{n−1}).
Hence the sequence X_n is a Markov chain with state space S = {0, 1, …, L}. Also, the time-independent transition probabilities are given by
(24) p_ij = P(X_{n+1} = j | X_n = i) = p if j = i + 1, 1 ≤ i ≤ L − 1; q if j = i − 1, 1 ≤ i ≤ L − 1; 1 if i = j = L or i = j = 0; 0 otherwise.
The (L + 1) × (L + 1) probability transition matrix P is therefore
(25) P =
[ 1 0 0 ⋯ 0 ]
[ q 0 p 0 ⋯ ]
[ 0 q 0 p ⋯ ]
[ ⋮      ⋱  ]
[ 0 ⋯ 0 0 1 ].
Note that the sum of any row of P is 1, a characteristic of a stochastic matrix.

Exercise 2. Show that λ = 1 is always an eigenvalue of any stochastic matrix P.

Let P(x) be the probability of achieving the goal of L dollars, where x is the initial fortune. In a homework problem we have shown that
(26) P(x) = p P(x + 1) + q P(x − 1), 1 ≤ x ≤ L − 1,
with boundary conditions P(0) = 0 and P(L) = 1. Similarly, define Q(x) as the probability of going bankrupt. Then
(27) Q(x) = p Q(x + 1) + q Q(x − 1), 1 ≤ x ≤ L − 1,
with boundary conditions Q(0) = 1 and Q(L) = 0. For example, if p = q, then (see the homework solutions) P(x) = x/L. Thus lim_{L→∞} P(x) = 0 (what is the implication of this?).

Exercise 3. Show that
(30) P(x) = (1 − (q/p)^x)/(1 − (q/p)^L) when q ≠ p.

Back to the gambler's ruin problem: take L = 4 and p = 0.6. Then
(31) P =
[ 1   0   0   0   0  ]
[ 0.4 0   0.6 0   0  ]
[ 0   0.4 0   0.6 0  ]
[ 0   0   0.4 0   0.6]
[ 0   0   0   0   1  ],
(32) P² =
[ 1    0    0    0    0   ]
[ 0.4  0.24 0    0.36 0   ]
[ 0.16 0    0.48 0    0.36]
[ 0    0.16 0    0.24 0.6 ]
[ 0    0    0    0    1   ],
and
(33) P^∞ = lim_{n→∞} Pⁿ =
[ 1      0 0 0 0      ]
[ 0.5846 0 0 0 0.4154 ]
[ 0.3077 0 0 0 0.6923 ]
[ 0.1231 0 0 0 0.8769 ]
[ 0      0 0 0 1      ].
This can be obtained, for example, using the Cayley–Hamilton theorem. Note that the first column of P^∞ is the vector of bankruptcy probabilities
(34) Q = [1, 0.5846, 0.3077, 0.1231, 0]ᵀ,
and the last column is the vector of success probabilities
(35) P = [0, 0.4154, 0.6923, 0.8769, 1]ᵀ.
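The limiting matrix (33) can be checked numerically by iterating the transition matrix for L = 4, p = 0.6 and comparing the absorption column against the closed form of Exercise 3. The sketch below uses plain Python lists; 200 extra multiplications are more than enough for the transient part to die out.

```python
# Iterate P^n for the gambler's chain (L = 4, p = 0.6) and compare the
# probability of reaching L with (1 - (q/p)^x) / (1 - (q/p)^L).
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p, q, L = 0.6, 0.4, 4
P = [[0.0] * (L + 1) for _ in range(L + 1)]
P[0][0] = P[L][L] = 1.0
for i in range(1, L):
    P[i][i - 1], P[i][i + 1] = q, p

M = P
for _ in range(200):                 # M -> essentially P^infinity
    M = matmul(M, P)

for x in range(L + 1):
    exact = (1 - (q / p) ** x) / (1 - (q / p) ** L)
    print(x, round(M[x][L], 4), round(exact, 4))
```

The printed pairs agree to four decimals with the last column of (33), e.g. 0.4154 for x = 1.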
6.2. Transience and recurrence. Suppose that X₀ = i and define E_ij = ∪_{n=1}^{∞} {X_n = j} (under P_i) as the event that the chain eventually visits state j, provided that X₀ = i. A state i is said to be recurrent if P_i(E_ii) = 1; otherwise, it is said to be transient. Note that we can write E_ij as the disjoint union
E_ij = ∪_{k=1}^{∞} {X_k = j, X_{k−1} ≠ j, …, X₁ ≠ j}.
In the above, we are saying that the chain can eventually visit state j precisely in the following mutually exclusive ways: visit j for the first time in one step, or visit j for the first time in two steps, and so on. If we define
f_ij^{(n)} ≜ P_i(X₁ ≠ j, …, X_{n−1} ≠ j, X_n = j)
as the probability of a first visit to state j at time n provided that X₀ = i, then the quantity
f_ij ≜ Σ_{n=1}^{∞} f_ij^{(n)}
is precisely P_i(E_ij). Hence, a state i is recurrent precisely when f_ii = 1, and it is transient when f_ii < 1.

6.2.1. Visiting a state infinitely often. Suppose that X₀ = i and consider the event A_{j,k}, defined as the event that the chain visits state j at least k times. Note that if A_{j,k} occurs, then the chain must have made a visit from i to j and then revisited j at least k − 1 times. Realizing that the P_i-probability that the chain visits state j from state i is f_ij, and that the probability that the chain revisits state j from state j (k − 1) times is f_jj^{k−1}, we conclude that P_i(A_{j,k}) ≤ f_ij f_jj^{k−1}. On the other hand, if the chain happens to visit j from i and also revisits j at least k − 1 times, then the event A_{j,k} has occurred. Hence P_i(A_{j,k}) = f_ij f_jj^{k−1}. Note that A_{j,k} is a decreasing sequence of events in k. Consequently,
P_i(∩_k A_{j,k}) = lim_{k→∞} f_ij f_jj^{k−1} = 0 if f_jj < 1, and = f_ij if f_jj = 1.
However, ∩_k A_{j,k} is precisely the event {X_n = j i.o.}. Hence we arrive at the conclusion
(36) P_i(X_n = j i.o.) = 0 if f_jj < 1; f_ij if f_jj = 1.
Taking i = j yields
(37) P_i(X_n = i i.o.) = 0 if f_ii < 1; 1 if f_ii = 1.
Thus P_i(X_n = i i.o.) is either 0 or 1. We have therefore proved the following result.

Theorem 17 (Adapted from Billingsley). Recurrence of i is equivalent to P_i(X_n = i i.o.) = 1, and transience of i is equivalent to P_i(X_n = i i.o.) = 0.

The next theorem further characterizes recurrence and transience.

Theorem 18 (Adapted from Billingsley). Recurrence of i is equivalent to Σ_n p_ii^{(n)} = ∞; transience of i is equivalent to Σ_n p_ii^{(n)} < ∞.

Proof: Since p_ii^{(n)} = P_i(X_n = i), by the Borel–Cantelli lemma Σ_n p_ii^{(n)} < ∞ implies P_i(X_n = i i.o.) = 0. According to (37), this implies f_ii < 1. Hence we have shown that Σ_n p_ii^{(n)} < ∞ implies transience; consequently, recurrence implies Σ_n p_ii^{(n)} = ∞. We next prove that f_ii < 1 (transience) implies Σ_n p_ii^{(n)} < ∞, which in turn implies that Σ_n p_ii^{(n)} = ∞ implies recurrence, and the entire theorem will therefore be proved.

Now back to proving "f_ii < 1 implies Σ_n p_ii^{(n)} < ∞". Observe that the chain visits j from i in n steps precisely in the following mutually exclusive ways: visit j from i for the first time in n steps; visit j from i for the first time in n − 1 steps and then revisit j in one step; visit j from i for the first time in n − 2 steps and then revisit j in two steps; and so on. More precisely,
p_ij^{(n)} = Σ_{s=0}^{n−1} f_ij^{(n−s)} p_jj^{(s)}.
Put j = i and sum over n = 1 to n = t:
Σ_{n=1}^{t} p_ii^{(n)} = Σ_{n=1}^{t} Σ_{s=0}^{n−1} f_ii^{(n−s)} p_ii^{(s)} ≤ Σ_{s=0}^{t} p_ii^{(s)} Σ_{m=1}^{t} f_ii^{(m)} ≤ f_ii Σ_{s=0}^{t} p_ii^{(s)}.
The last inequality comes from the fact that Σ_{m=1}^{t} f_ii^{(m)} ≤ f_ii. Realizing that p_ii^{(0)} = 1, we conclude (1 − f_ii) Σ_{n=1}^{t} p_ii^{(n)} ≤ f_ii. Hence, if f_ii < 1, then Σ_{n=1}^{t} p_ii^{(n)} ≤ f_ii/(1 − f_ii), and the series Σ_n p_ii^{(n)} is therefore convergent, since the partial sums are increasing and bounded. □

Exercise 4. Show that in the gambler's problem discussed earlier, states 0 and L are recurrent but states 1, …, L − 1 are transient.

6.3. Irreducible chains. A MC is said to be irreducible if for every i, j ∈ S there exists an n (generally dependent upon i and j) such that p_ij^{(n)} > 0. In words, if the chain is irreducible, then we can always go from any state to any other state in finite time with a nonzero probability. The next theorem shows that for irreducible chains either all states are recurrent or all states are transient; there is no option in between. Thus we can say that an irreducible MC is either recurrent or transient.
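The series criterion of Theorem 18 is easy to explore numerically. For the simple symmetric random walk on the integers (treated analytically in the example below), p₀₀^(2n) = C(2n, n) 2^(−2n) ~ 1/√(πn), so the partial sums grow without bound, signalling recurrence. The term is computed iteratively here to avoid overflowing floating point with huge binomial coefficients.

```python
# Partial sums of p_00^(n) for the 1-D symmetric random walk:
# p_00^(2n) = C(2n, n) 4^{-n} satisfies t_n = t_{n-1} * (2n-1)/(2n).
def partial_sum(N):
    s, t = 0.0, 1.0
    for n in range(1, N + 1):
        t *= (2 * n - 1) / (2 * n)   # t = C(2n, n) 4^{-n}
        s += t
    return s

for N in (10, 100, 1000):
    print(N, round(partial_sum(N), 3))
```

The sums keep growing roughly like 2√(N/π), consistent with Σ_n p₀₀^(n) = ∞ and hence with recurrence of the one-dimensional walk.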
Theorem 19 (Adapted from Billingsley). If the Markov chain is irreducible, then one of the following two alternatives holds:
(i) All states are recurrent, P_i(∩_j {X_n = j i.o.}) = 1 for all i, and Σ_n p_ij^{(n)} = ∞ for all i and j.
(ii) All states are transient, P_i(∪_j {X_n = j i.o.}) = 0 for all i, and Σ_n p_ij^{(n)} < ∞ for all i and j.

Proof: By irreducibility, for each i and j there exist integers r and s so that p_ij^{(r)} > 0 and p_ji^{(s)} > 0. Note that if a chain with X₀ = i visits j from i in r steps, then visits j from j in n steps, and then visits i from j in s steps, then it has visited i from i in r + n + s steps. Therefore
p_ii^{(r+s+n)} ≥ p_ij^{(r)} p_jj^{(n)} p_ji^{(s)}.
Now if we sum up both sides of the above inequality over n and use the fact that p_ij^{(r)} p_ji^{(s)} > 0, we conclude that Σ_n p_ii^{(n)} < ∞ implies Σ_n p_jj^{(n)} < ∞ (why do we need the fact that p_ij^{(r)} p_ji^{(s)} > 0 to come up with this conclusion?). In particular, if one state is transient, then all other states are also transient, i.e., f_jj < 1 for all j. In this case (36) tells us that P_i(X_n = j i.o.) = 0 for all i and j. Therefore
P_i(∪_j {X_n = j i.o.}) ≤ Σ_j P_i(X_n = j i.o.) = 0 for all i.
Next, note that
Σ_n p_ij^{(n)} = Σ_n Σ_{m=1}^{n} f_ij^{(m)} p_jj^{(n−m)} ≤ (Σ_m f_ij^{(m)}) (Σ_s p_jj^{(s)}) ≤ Σ_s p_jj^{(s)},
where the last inequality follows from the fact that Σ_m f_ij^{(m)} = f_ij ≤ 1. Hence, if j is transient, in which case Σ_n p_jj^{(n)} < ∞ according to Theorem 18, then Σ_n p_ij^{(n)} < ∞ for all i. We have therefore established the second alternative of the theorem.

If any state is not transient, then the only possibility is that all states are recurrent. In this case P_j(X_n = j i.o.) = 1 by Theorem 17, and {X_n = j i.o.} is an almost-certain event for all j with respect to P_j; namely, P_j(A ∩ {X_n = j i.o.}) = P_j(A) for any A ∈ F. Hence we can write
p_ji^{(m)} = P_j(X_m = i, X_n = j i.o.) ≤ P_j(X_m = i, X_n = j for some n > m) = p_ji^{(m)} f_ij.
By irreducibility there is an m₀ for which p_ji^{(m₀)} > 0; hence f_ij = 1. Now by (36), P_i(X_n = j i.o.) = 1, and therefore P_i(∩_j {X_n = j i.o.}) = 1. The only item left to prove is that Σ_n p_ij^{(n)} = ∞ under the first alternative. Note that if Σ_n p_ij^{(n)} < ∞ for some i and j, then the Borel–Cantelli lemma would dictate that P_i(X_n = j i.o.) = 0, and the chain would not be recurrent, which is a contradiction. Hence, under the first alternative, Σ_n p_ij^{(n)} = ∞ for all i and j. □

Special case: Suppose that Σ_n p_jj^{(n)} < ∞ for all j and that S is a finite set. Then Σ_{j∈S} Σ_n p_ij^{(n)} < ∞. But since Σ_{j∈S} p_ij^{(n)} = 1, we obtain Σ_{j∈S} Σ_n p_ij^{(n)} = Σ_n 1 = ∞, which is a contradiction. Hence, in a finite irreducible MC, alternative (ii) is impossible: every finite-state-space irreducible MC is recurrent. Consequently, if the transition matrix of a finite-state-space MC has all positive entries (making the chain irreducible), then the chain is recurrent.

Random walk example (from Billingsley; Pólya's theorem). For the symmetric k-dimensional random walk, all states are recurrent if k = 1 or k = 2, and all states are transient if k ≥ 3. We will only give the proof for the case k = 1. Note first that p_ii^{(n)} is the same for all states i. Clearly p_ii^{(2n+1)} = 0. Since return in 2n steps means n steps right and n steps left,
p_ii^{(2n)} = C(2n, n) 2^{−2n}.
By Stirling's formula, p_ii^{(2n)} ~ (πn)^{−1/2}. Therefore Σ_n p_ii^{(n)} = ∞, and all states are recurrent by Theorem 19.

6.4. Birth and death Markov chains. The material here is extracted from the book by Hoel et al. Birth-and-death chains are common examples of Markov chains; they include random walks. Consider a Markov chain X_n on the set of infinite or finite nonnegative integers S = {0, …, d} (in the case of an infinite set we take d = ∞). The transition function is of the form
p_ij = q_i if j = i − 1; r_i if j = i; p_i if j = i + 1; and 0 otherwise.
Note that p_i + q_i + r_i = 1, q₀ = 0, and p_d = 0 if d < ∞. We also assume that p_i and q_i are positive for 0 < i < d. With these assumptions the chain can be shown to be irreducible. Hence the question is whether such a birth-and-death chain is transient or recurrent. We will address this question next.

For any i ∈ S we define T_i = min{n ≥ 1: X_n = i} as the time of first entrance, or first hitting time, of state i, and assume that X₀ = 1. If state 0 is recurrent (i.e., f₀₀ = 1), then according to (36), P₁(X_n = 0 i.o.) = f₁₀. But according to Theorem 19, P₁(X_n = 0 i.o.) = 1, so f₁₀ = 1. Therefore, since f₁₀ = P₁(T₀ < ∞), we conclude that if state 0 is recurrent, then P₁(T₀ < ∞) = 1. We will next determine precisely when P₁(T₀ < ∞) = 1. Following Hoel et al., for a and b in S such that a < b, define the probabilities
u(i) = P_i(T_a < T_b), a < i < b, with u(a) = 1 and u(b) = 0.
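The hitting probabilities u(i) just defined are easy to estimate by Monte Carlo. The sketch below uses a birth-and-death chain with constant, hypothetical parameters p_i = 0.4, q_i = 0.6, r_i = 0 and endpoints a = 0, b = 5, and compares the estimate of u(1) with the gambler's-ruin closed form from Exercise 3.

```python
# Monte Carlo estimate of u(1) = P_1(T_0 < T_b) for a birth-death chain
# with p_i = 0.4, q_i = 0.6 (hypothetical values), a = 0, b = 5.
import random

def hit_zero_first(b=5, p=0.4, start=1, rng=None):
    x = start
    while 0 < x < b:
        x += 1 if rng.random() < p else -1
    return x == 0

rng = random.Random(0)
trials = 20_000
est = sum(hit_zero_first(rng=rng) for _ in range(trials)) / trials

r = 0.6 / 0.4                                # q/p
exact = 1 - (1 - r ** 1) / (1 - r ** 5)      # 1 - P_1(reach b before 0)
print(round(est, 3), round(exact, 3))
```

With the downward drift (q > p) the chain hits 0 first with probability about 0.924, and the simulated frequency matches the closed form to within sampling error.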
Recall the transition probabilities q_j, r_j, and p_j, and note that we can use conditional expectations to write
u(j) = q_j u(j − 1) + r_j u(j) + p_j u(j + 1), a < j < b.
Since r_j = 1 − p_j − q_j, we can rewrite the above equation as
u(j + 1) − u(j) = (q_j/p_j)(u(j) − u(j − 1)), a < j < b.
For convenience, define γ₀ = 1 and
γ_j = (q₁ ⋯ q_j)/(p₁ ⋯ p_j), 0 < j < d,
and use this to rewrite the above recursion as
(38) u(j + 1) − u(j) = (γ_j/γ_a)(u(a + 1) − u(a)), a ≤ j < b.
Now sum the above equation over j = a, …, b − 1, use u(a) = 1 and u(b) = 0, and obtain
(39) u(a) − u(a + 1) = γ_a / Σ_{j=a}^{b−1} γ_j.
Thus (38) becomes
u(j) − u(j + 1) = γ_j / Σ_{i=a}^{b−1} γ_i, a ≤ j < b.
Let us now sum this equation over j = i, …, b − 1 and use the fact that u(b) = 0 to obtain
u(i) = (Σ_{j=i}^{b−1} γ_j)/(Σ_{j=a}^{b−1} γ_j), a < i < b.
It now follows from the definition of u that
P_i(T_a < T_b) = (Σ_{j=i}^{b−1} γ_j)/(Σ_{j=a}^{b−1} γ_j), a ≤ i < b,
or
P_i(T_b < T_a) = (Σ_{j=a}^{i−1} γ_j)/(Σ_{j=a}^{b−1} γ_j), a ≤ i < b.
As a special case,
(40) P₁(T₀ < T_n) = 1 − 1/Σ_{j=0}^{n−1} γ_j, n > 1.
Note that
(41) 1 ≤ T₂ < T₃ < ⋯,
which implies that {T₀ < T_n}, n > 1, are nondecreasing events. Hence
(42) lim_{n→∞} P₁(T₀ < T_n) = P₁(∪_{n>1} {T₀ < T_n}).
Equation (41) implies T_n ≥ n − 1, and thus T_n → ∞ as n → ∞. Hence ∪_{n>1} {T₀ < T_n} = {T₀ < ∞}. We can therefore rewrite (42) as lim_{n→∞} P₁(T₀ < T_n) = P₁(T₀ < ∞). Combining this with (40) yields
(43) P₁(T₀ < ∞) = 1 − 1/Σ_{j=0}^{∞} γ_j.
We now show that the birth-and-death chain is recurrent if and only if Σ_{j=0}^{∞} γ_j = ∞. From our earlier discussion we know that if the irreducible birth-and-death chain is recurrent, then P₁(T₀ < ∞) = 1, in which case (43) implies Σ_{j=0}^{∞} γ_j = ∞. On the other hand, suppose Σ_{j=0}^{∞} γ_j = ∞, in which case (43) implies P₁(T₀ < ∞) = 1. Note that we can write, using conditional expectations,
P₀(T₀ < ∞) = r₀ + p₀ P₁(T₀ < ∞) = r₀ + p₀ = 1,
and 0 is therefore a recurrent state.

6.5. Stationary distributions. A stationary distribution is a vector π = (π₀, π₁, …) that solves the equation πᵀP = πᵀ, or equivalently Σ_i π_i p_ij = π_j, such that π_i ≥ 0 and Σ_i π_i = 1. Note that if π is a stationary distribution and X₀ ~ π, then X_n ~ π for all n. That is, the distribution of X_n remains π so long as we select the distribution of X₀ to be π. This is why we call such a π stationary: the distribution becomes invariant in time. The main result here is Theorem 8.6 in Billingsley, which we restate without proof.

Theorem 20 (From Billingsley). Suppose that an irreducible, aperiodic chain possesses a stationary distribution π. Then the chain is recurrent,
lim_{n→∞} p_ij^{(n)} = π_j for all i and j,
the π_j are all positive, and the stationary distribution is unique.

The main point of the conclusion is that the effect of the initial state wears off. Also, lim_{n→∞} Pⁿ converges to a matrix whose rows are all identical to πᵀ.

6.5.1. Stationary distributions for birth-and-death chains. Consider a birth-and-death chain on the nonnegative integers. We further assume that the chain is irreducible, i.e., p_i > 0 for i ≥ 0 and q_i > 0 for i ≥ 1. The system of equations Σ_i π_i p_ij = π_j for the stationary distribution yields
π₀r₀ + π₁q₁ = π₀,
π_{j−1}p_{j−1} + π_j r_j + π_{j+1}q_{j+1} = π_j, j ≥ 1.
Using p_j + q_j + r_j = 1, we obtain
q₁π₁ − p₀π₀ = 0, and q_{j+1}π_{j+1} − p_jπ_j = q_jπ_j − p_{j−1}π_{j−1}, j ≥ 1.
By induction we obtain q_{j+1}π_{j+1} − p_jπ_j = 0, and hence
π_{j+1} = (p_j/q_{j+1}) π_j, j ≥ 0.
Consequently, we obtain
(44) π_j = (p₀p₁ ⋯ p_{j−1})/(q₁q₂ ⋯ q_j) π₀, j ≥ 1.
If we define
π̃_i = 1 if i = 0, and π̃_i = (p₀p₁ ⋯ p_{i−1})/(q₁q₂ ⋯ q_i) if i ≥ 1,
we can recast (44) as π_i = π̃_i π₀, i ≥ 0. Realizing that Σ_i π_i < ∞ if and only if
Σ_{i=0}^{∞} π̃_i = 1 + Σ_{i=1}^{∞} (p₀ ⋯ p_{i−1})/(q₁ ⋯ q_i) < ∞,
we can say that if Σ_i π̃_i < ∞, then the birth-and-death chain has a unique (why is it unique?) stationary distribution given by
π_i = π̃_i / Σ_{j=0}^{∞} π̃_j, i ≥ 0.
On the other hand, if Σ_i π̃_i = ∞, no stationary distribution exists (why?). If the irreducible birth-and-death chain is finite, then the above formula for the stationary distribution still holds with ∞ replaced by d.

7. PROPERTIES OF COVARIANCE MATRICES AND DATA WHITENING

7.1. Preliminaries. For a random vector X = (X₁, …, X_n)ᵀ, the covariance matrix C_X is the n × n matrix E[(X − E[X])(X − E[X])ᵀ]. This matrix has the following properties:

(1) C_X is symmetric, which is clear from the definition.

(2) There is a matrix Φ such that ΦᵀC_XΦ = Λ, where Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ₁, …, λ_n of the matrix C_X (assume that they are distinct). The corresponding eigenvectors v₁, …, v_n form the columns of the matrix Φ = [v₁ ⋯ v_n]. This fact comes from elementary linear algebra: because C_X is symmetric, the eigenvectors corresponding to distinct eigenvalues are orthogonal (prove this). Moreover, Φ can be selected so that the eigenvectors are orthonormal, that is, v_iᵀv_j = δ_ij. The fact that the v_i are orthonormal implies that they span Rⁿ, and we also obtain the following representation for X:
X = Σ_{i=1}^{n} c_i v_i, with c_i = Xᵀv_i.

(3) C_X is positive semi-definite, which means xᵀC_Xx ≥ 0 for any x ∈ Rⁿ. To see this property, note that
xᵀC_Xx = xᵀE[(X − E[X])(X − E[X])ᵀ]x = E[((X − E[X])ᵀx)²] ≥ 0.
This calculation also shows that as long as no component of X can be written as a linear combination of the other components, xᵀC_Xx > 0, which is the defining property of positive-definite matrices. Recall that for any random variable Y, E[Y²] = 0 if and only if Y = 0 almost surely.

In the homework, we looked at an estimate of C_X from K samples of the random vector X, as follows:
Ĉ_X = (1/K) Σ_{i=1}^{K} X^{(i)} X^{(i)ᵀ},
assuming, without loss of generality, that E[X] = 0; this estimate has rank at most K (see the homework problems). Therefore, if n > K, then Ĉ_X is not full rank, and hence Ĉ_X is not positive definite.

7.2. Whitening of correlated data. We say that the components X₁, …, X_n are white if E[(X_i − E[X_i])(X_j − E[X_j])] = δ_ij for all i, j. There is an n × n matrix A such that if Y = AX, then C_Y = I, the identity matrix. To see this, put B = ΦΛ^{1/2}, where
(45) Λ^{1/2} = diag(√λ₁, …, √λ_n),
and as before λ_i is the i-th eigenvalue of C_X. Recall that ΦᵀC_XΦ = Λ. Observe that
BBᵀ = ΦΛ^{1/2}Λ^{1/2}Φᵀ = ΦΛΦᵀ = C_X.
Hence B⁻¹ = Λ^{−1/2}Φᵀ. Now take A = B⁻¹. Let Y = AX; then
C_Y = AC_XAᵀ = Λ^{−1/2}ΦᵀC_XΦΛ^{−1/2} = Λ^{−1/2}ΛΛ^{−1/2} = I,
which is the desired result. Let us have better insight into this transformation. Note that
(46) Y = AX = Λ^{−1/2}ΦᵀX.
If Y_i is the i-th component of Y, then Y_i = λ_i^{−1/2} v_iᵀX = λ_i^{−1/2} Σ_j Φ_{ji} X_j. Therefore we can think of A as a linear filter with h_ij = λ_i^{−1/2} Φ_{ji} as its impulse response, whereby Y_i = Σ_j h_ij X_j.

[Figure 7: X(k) → whitening filter h → Y(k), a white sequence.]

Now recall the representation X = Σ_{k=1}^{n} (Xᵀv_k) v_k and use the fact that Y_k = λ_k^{−1/2} v_kᵀX to obtain
X = Σ_{k=1}^{n} √λ_k Y_k v_k.
This is the reconstruction formula from whitened data.

7.3. Simultaneous diagonalization of two covariance matrices. In addition to C_X, suppose that C_Z is a second covariance matrix, corresponding to a random vector Z. Our goal is to find a transformation Γ that simultaneously diagonalizes C_X and C_Z.

To begin, we may want to try A again. We know that A whitens X. Put T = AZ and note that C_T = AC_ZAᵀ = Λ^{−1/2}ΦᵀC_ZΦΛ^{−1/2}, which is not necessarily a diagonal matrix (show this by example). Note, however, that if W is a matrix consisting of the eigenvectors of C_T corresponding to the eigenvalues η₁, …, η_n, then WᵀC_TW = M, where M is the diagonal matrix whose diagonal elements are η₁, …, η_n. Consider Γ = WᵀA as a candidate for the desired transformation. We need to check whether this new transformation diagonalizes both C_X and C_Z. To do so, consider D = ΓX and note that
C_D = WᵀAC_XAᵀW = WᵀIW = WᵀW = I,
because W is an orthogonal matrix (i.e., W⁻¹ = Wᵀ). Next, let F = ΓZ and write C_F = ΓC_ZΓᵀ = WᵀAC_ZAᵀW. But AC_ZAᵀ = C_T. Thus C_F = WᵀC_TW = M.

7.3.1. Summary. Let C_X and C_Z be given. Let B be as before and let W be the matrix whose i-th column is the eigenvector of B⁻¹C_Z(B⁻¹)ᵀ corresponding to the i-th eigenvalue. Let Γ = WᵀB⁻¹; then Y = ΓX and T = ΓZ have the following properties: (1) C_Y = I; (2) C_T = M, where
(47) M = diag(η₁, …, η_n),
and the η_i are the eigenvalues of B⁻¹C_Z(B⁻¹)ᵀ.

Example 20.
(48) C_X = [2 1; 1 2], C_Z = [3 −1; −1 3].
Find the transformation Γ. We first calculate Φ. Set |C_X − λI| = 0; we find that λ₁ = 1 and λ₂ = 3. Next we find v₁ and v₂ using C_Xv₁ = λ₁v₁ and C_Xv₂ = λ₂v₂. Normalize these so
ElSt a 7 t2l7 where ht gtk Xt is the lter output If 04 07 this problem is called the Wiener ltering problem if 04 gt 07 it is called the Wiener prediction problem7 and if 04 lt 07 it is called the Wiener smoothing problem Let7s characterize the solution to this problem We begin by writing Elezl EHSto Ooo hTXtTdTl2 7 mm 7 7 2 0 name aXt 7 7 A A h7hoEXt 7 TXt 7 0drdo 13550 2 OOO hTRXST ad739 000 OOO hrhoRXXo 7 739drdo Suppose that there exists an optimum lter denoted by ho7397 and consider a variation from that optimal lter de ned as hTT h0739 rg7 7 where g is some arbitrary function and r 2 0 The condition E620 0 is a necessary condition for the optimality but not suf cient7 where 62r is the error associated with the 76 MAJEED M HAYAT lter hr Thus7 we can pursue this differentiation approach and write E62r R550 7 2000h0739 rgTRXST adT 000 Joel107 r97lhoa Tg0RXXU deU R550 7 2000 h0TRXST adT 7 2 000 rgTRXST adT rg739h0a rgah0TRXXU 7 739de0 0 0 7 h07h0aRXXa 7 739de0 7 rzg739gaRXXU 7 739de0 0 0 0 0 Note that R550 does not depend on 7 Now take partial derivative with respect to r and obtain gEEUN 2 OOORXSTagTdT W mime100 9Ulh0TlRXXU 7 Tgtd7da 27quot OO 00 gTgURXXU 7 739de0 0 0 Set E620 0 and obtain 72 0 3X50 7 Cayman 7 0 Ooog739h0a 7 gUh0TRXXU 7 Tdea 7 0 Using the fact that RXXT is syrnrnetric7 ie7 00 00 gTh0URXXU 7 739de0 OO 00 gah0TRXXU 7 739d739dU7 0 0 0 0 we get 000 97 RXSQ39 04 000 h0URXXU 7 Tdad739 0 Since this is true for all g the only way this can be true is that the multiplier of 9 inside the outer integral must be zero for almost all 739 2 0 with respect to Lebesgue measure Thus7 we get the so called Wiener Integral Equation RX5T 04 fooo RXXT 7 Uhoada7 739 2 0 ECE 541 77 81 Noncausal Wiener Filtering If we replace fooo with ff in Weiner7s Equa tion ie RX5T 04 RXXT 7 Uh0ada 739 6 R we obtain the characterization of the optimal non causal non realizable lter In the frequency domain we have stw6jwa SxxwH0w and hence How 7353 which is the optimum non realizable 
lter Suppose the signal and noise are uncorrelated then RXXm EXtXtTl EStNtStTNt7l RSSTRNNT Similarly RX5T R55T Then HOW Example Suppose that Nt E 0 and 04 gt 0 ideal prediction problem then HOW 67W1 or h0t 6t 04 which is the trivial non realizable predictor Before we attempt solving the Wiener integral equation for the realizable optimal lter we will review germane aspects of spectral factorization 82 Review of Spectral Factorization Note that we can approximate any power spectrum as white noise passed through a linear time invariant LTl lter why7 Suppose the transfer function of the lter is w an1w 1 alw a0 Hw AM bd1wd 1 blw be i i i i i N i If the input is white n01se w1th power spectrum 70 the output power spectrum is Note that we can write AM ijW HM 78 MAJEED M HAYAT where A B C and D are polynomials with real coef cients Thus WW AW MW 02w2 w2D2w2 Observe that the output power spectrum will have only real coef cients in its numer ator and denominator polynomials Now recall that the roots of polynomials with real coef cients are either real or complex conjugate pairs Consider the complex variable 5 039 jw lf 5 0k jwk is a root then s 0k 7 jun is also a root Since the polynomials are even in u we also know that if 5 0k jwk is a root then 75k 70k 7 jun is also a root Now replace everywhere jw appears in the expression for the output power spec trum with 5 Then write 55 5wl ws where S has all its zeroes and poles in the left hand plane LHP and S s has all its zeroes and poles in the right hand plane RHP Furthermore lSsl lS sl since poles and zeroes occur in complex conjugate pairs Finally 55 518 lSi8l2 Example 2mg ltjw WW7 3 SM t21 WNW 1 That is s 3 V287 3 SW8 5 18 s 1 s 7 1 We next investigate the connection between causality and the location of poles Suppose that ft is a function that vanishes on the negative half time ie ft 0 for t lt 0 and suppose that ff lftldt lt 00 Then the magnitude of the Fourier transform of this function is e mftdtl S fooo le m t dt lt 00 
Va Let s 039 jw and observe that the magnitude of the Laplace transform has the property that S fooo lfte e mldt lt 00 thus F has no singularities in the RHP ECE 541 79 Similarly if ft 0 for t gt 0 we can show that Fs has no singularities in the LHP The converses of these statements are also true We therefore conclude that the inverse Laplace transform of S is causal and that of S is anticausal 83 Back to Causal Wiener Filtering Recall the Wiener integral equation and de ne 97 7 RXSTO7AOORXX77UhOUdU RXSTO 1 RXXT Uh07d7 Since 97 is anticausal by invoking the Wiener equation Gs has no singularities in the LHP And Gs SX55050 7 SXX5H05 SXS865 7 S XSS XSHo8 Now de ne 17 17 7 07 where 17 7 1 f Memdw g 700 Smw b7 if L w e lmdw and 07 S XwH0w07de Note that 00 SX w 17 is anti causal since 592 has no singularities in the LHP Thus 17 0 V7 2 0 XX 9 and therefore b7 07 V7 2 0 Also note that S XwH0w has no singularities in the RHP thus 07 0 V7 3 0 Hence 070 739Wd7 fooo 070 739Wd7 S XwH0w fooo b70 739 7d7 and we obtain HOQU SJrlw fooo 6771M ff jigst li ejw aejw wadT 84 BodeShannon method A simple implementation of the above solution can be obtained as follows Let H1w SXXL by H3w fay De ne h2t h3tut Then we will have Consider the unrealizable lter de ned 5 HOW H1WH2W Example Xt St Nt Nt is zero mean white Gaussian noise that is independent of St 7 A 7 5550 i w2k2 i jwk 7jwk 80 MAJEED M HAYAT 2 k2 SXXQU 5550 l SNNQ wzzfm l w 27 ij 7 Where 6 Nick NeXt7 7 N 39k 1 7 7 N 7 47k r 7 707ij1 v 7 70 ijwk 39 We must now take the inverse Fourier transform to nd h3t f 1s wejw XX jW Therefore7 f4 Ss w5jwa 1 Z ke Sxx ewwiwwm 2 1 5k70 7 gt 7a i MT 7 haw NOZ H9 2 1x11 ekV1ETD TltiOt Jive2 7 7 7 0 739 7 1 2 1 7 f0 h3ltTgtg JL WdT 7 1ijwk7 If 04 0 ltering problem7 then H2w Finally7 h0t WK 1mu t 7 7 2 1 and HOW 7 H1WH2W WW39 Acknowledgement I wish to thank Professor Jim Bucklew for developing most of the matrial of this section ECE 541 81 1 Y S Chow and H Teicher7 Probability Theory Independence 
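As a sanity check, the simultaneous-diagonalization recipe of the summary above can be verified numerically. The following is a sketch using NumPy with the matrices of Example 20; the variable names (CX, CZ, G) are illustrative choices, not from the notes:

```python
import numpy as np

# Covariance matrices from Example 20.
CX = np.array([[2.0, 1.0], [1.0, 2.0]])
CZ = np.array([[3.0, -1.0], [-1.0, 3.0]])

# Step 1: whiten X. Eigendecomposition CX = Q Lam Q^T, then A = Lam^{-1/2} Q^T.
lam, Q = np.linalg.eigh(CX)          # eigenvalues 1 and 3
A = np.diag(lam ** -0.5) @ Q.T       # A CX A^T = I

# Step 2: eigendecompose the transformed second covariance CT = A CZ A^T.
CT = A @ CZ @ A.T
eta, W = np.linalg.eigh(CT)          # eta contains 2/3 and 4

# Step 3: the joint transformation Gamma = W^T A.
G = W.T @ A

print(np.round(G @ CX @ G.T, 10))    # identity
print(np.round(G @ CZ @ G.T, 10))    # diag(eta)
```

Both products come out diagonal at once, which is exactly the property claimed for Gamma in the summary.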
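The spectral factorization identity S_XX(omega) = |S^+(j omega)|^2 can also be checked numerically. This sketch assumes the spectrum of the factorization example is (omega^2 + 3)/(omega^2 + 1) with S^+(s) = (s + sqrt(3))/(s + 1):

```python
import numpy as np

w = np.linspace(-5.0, 5.0, 101)                # sample frequencies
Sxx = (w**2 + 3) / (w**2 + 1)                  # power spectrum of the example
Splus = (1j * w + np.sqrt(3)) / (1j * w + 1)   # S+(s) evaluated on s = j*w
print(np.max(np.abs(np.abs(Splus) ** 2 - Sxx)))  # ~0
```

The maximum discrepancy is at machine-precision level, confirming that the LHP factor alone reproduces the spectrum on the j-omega axis.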
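The closed forms in the Bode-Shannon example (the factored S_XX and the alpha = 0 filter H_0 = H_1 H_2) can be checked numerically as well. In this sketch the parameter values A, k, N_0 are arbitrary choices for the check, not values from the notes:

```python
import numpy as np

A, k, N0 = 2.0, 1.0, 0.5                      # arbitrary illustrative parameters
beta = np.sqrt(k**2 + 2 * A / N0)
w = np.linspace(-10.0, 10.0, 201)

# S_XX = S_SS + S_NN must match the factored form (N0/2)(w^2+beta^2)/(w^2+k^2).
Sxx = A / (w**2 + k**2) + N0 / 2
Sxx_factored = (N0 / 2) * (w**2 + beta**2) / (w**2 + k**2)

# Causal Wiener filter for alpha = 0: H0 = H1 * H2.
H1 = np.sqrt(2 / N0) * (1j * w + k) / (1j * w + beta)   # 1 / S+_XX
H2 = A * np.sqrt(2 / N0) / ((k + beta) * (1j * w + k))
H0 = H1 * H2
H0_closed = (2 * A / N0) / ((k + beta) * (1j * w + beta))

print(np.max(np.abs(Sxx - Sxx_factored)))   # ~0
print(np.max(np.abs(H0 - H0_closed)))       # ~0
```

Both identities hold to machine precision: the pole at j omega = -k cancels in the product H_1 H_2, leaving the single-pole filter at beta, consistent with h_0(t) being a one-sided exponential.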
REFERENCES

[1] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales. Springer Texts in Statistics, New York, 3rd ed., 1997. ISBN 0-387-40607-7.
[2] W. Rudin, Real and Complex Analysis. New York: McGraw-Hill, 1987.
[3] T. G. Kurtz, Stochastic Analysis. Class notes at the University of Wisconsin-Madison.
[4] A. Beck, Integration and Measure. Class notes at the University of Wisconsin-Madison.
[5] P. G. Hoel, S. C. Port, and C. J. Stone, Introduction to Stochastic Processes. Waveland Press, Inc., 1987.
[6] P. Billingsley, Probability and Measure, 3rd ed. Wiley Series in Probability and Mathematical Statistics, 1995.
[7] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1976.