# Stochastic Processes in Electronic Systems ECE 6010

Utah State University


These 71 pages of class notes for ECE 6010 (Stochastic Processes in Electronic Systems) at Utah State University, taught by Staff in Fall, were uploaded by Seth Gibson on Wednesday, October 28, 2015.


## ECE 6010 Lecture 2: More on Random Variables

Readings from G&S: Sections 3.3, 3.4, 3.6, 3.7, 4.3, 4.6, 5.1, 5.2, 5.6, 5.7, 5.8.

### Expectation

When we say expectation we mean *average*, the average being roughly what you would think of, i.e., the arithmetic average (as opposed to a median or mode). For a discrete rv $X$ we define the expectation as

$$E[X] = \sum_i x_i\, p_X(x_i).$$

For a continuous rv we define the expectation as

$$E[X] = \int x\, f_X(x)\,dx.$$

Now, a bit of technicality regarding integration, which introduces commonly used notation. When you integrate, you are typically doing a Riemann integral:

$$\int_a^b x f_X(x)\,dx = \lim_{\max_i (x_{i+1}-x_i)\to 0}\ \sum_{i=1}^{n-1} x_i\, f_X(x_i)\,(x_{i+1}-x_i), \qquad a = x_1 < x_2 < \cdots < x_n = b.$$

In other words, we break up the interval into little slices and add up the vertical rectangular pieces. Another way of writing this is to recognize that

$$f_X(x_i)(x_{i+1}-x_i) \approx P(x_i < X \le x_{i+1}) = F_X(x_{i+1}) - F_X(x_i),$$

and that in the limit the approximation becomes exact. Note, however, that this is expressed in terms of the cdf, not the pdf, and so exists for all random variables, not just continuous ones. This gives rise to what is known as the **Riemann–Stieltjes integral**:

$$E[X] = \lim_{\max_i (x_{i+1}-x_i)\to 0}\ \sum_i x_i\,\big[F_X(x_{i+1}) - F_X(x_i)\big].$$

We write the limit as $\int_a^b x\,dF_X(x)$. This notation describes continuous, discrete, and mixed cases. That is,

$$E[X] = \int_{-\infty}^{\infty} x\,dF_X(x).$$

We have defined the Riemann–Stieltjes integral in a context of expectation. However, it has a more general definition:

$$\int_a^b f(x)\,dg(x) = \lim_{\max_i (x_{i+1}-x_i)\to 0}\ \sum_i f(x_i)\,\big[g(x_{i+1}) - g(x_i)\big].$$

When $g(x) = x$ this reduces to the ordinary Riemann integral. Sufficient conditions for existence:

- $g$ of bounded variation and $f$ continuous on $[a,b]$; or
- $f$ of bounded variation and $g$ continuous.

The first case covers the case of expectation. In a directly analogous way we define

$$\int g(x)\,dF_X(x) = \lim \sum_i g(x_i)\,\big[F_X(x_{i+1}) - F_X(x_i)\big].$$

Now consider the rv $Y = g(X)$, with $E[Y] = \int y\,dF_Y(y)$. Note that $dF_Y$ is the representation of the limiting value

$$F_Y(y_{i+1}) - F_Y(y_i) = P(y_i < Y \le y_{i+1}) = P(y_i < g(X) \le y_{i+1}) = P\big(g^{-1}(y_i) < X \le g^{-1}(y_{i+1})\big) = P(x_i < X \le x_{i+1}),$$

which in the limit is equal to $dF_X(x)$ when $y = g(x)$. Thus

$$E[Y] = \int_{-\infty}^{\infty} y\,dF_Y(y) = \int_{-\infty}^{\infty} g(x)\,dF_X(x).$$
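As a quick numeric sanity check of the Riemann–Stieltjes sum above (a sketch, not part of the original notes; the Exponential(1) distribution and the grid are illustrative assumptions): the cdf is $F_X(x) = 1 - e^{-x}$ and the true mean is 1, so the sum $\sum_i x_i [F_X(x_{i+1}) - F_X(x_i)]$ over a fine partition should come out close to 1.

```python
import numpy as np

# Riemann-Stieltjes sum E[X] ~ sum_i x_i [F(x_{i+1}) - F(x_i)]
# for an Exponential(1) rv, whose cdf is F(x) = 1 - exp(-x) and whose mean is 1.
x = np.linspace(0.0, 50.0, 200001)            # fine partition of [0, 50]
F = 1.0 - np.exp(-x)                          # cdf evaluated on the grid
stieltjes_mean = np.sum(x[:-1] * np.diff(F))  # sum of x_i * (F(x_{i+1}) - F(x_i))
print(stieltjes_mean)                         # close to 1
```

The tail mass beyond 50 is negligible ($e^{-50}$), so the only error is the finite slice width.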
Let us put this in more familiar terms. If $Y = g(X)$, then

$$E[Y] = \int g(x)\, f_X(x)\,dx. \tag{1}$$

One might think that finding $E[Y]$ would require finding $f_Y$. However, as (1) shows, all that is necessary is to substitute $g(x)$ for $x$ in the expectation. This is sometimes called the **law of the unconscious statistician**, since it can be done nearly thoughtlessly.

An interesting result is obtained through the use of indicator functions. Let $I_A : \Omega \to \mathbb{R}$ be defined by

$$I_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A. \end{cases}$$

In other words, the indicator function indicates whether its argument is in the set which is the subscripted argument. We define a **simple function** as one which is a linear combination of indicator functions: for some collection $A_1, A_2, \ldots, A_n$,

$$g(\omega) = \sum_{k=1}^n a_k\, I_{A_k}(\omega).$$

This gives us a piecewise-constant function on $\Omega$. It also defines a random variable. Note that the collection need not be disjoint. However, we can shuffle things around to write the function as

$$g(\omega) = \sum_{k=1}^m b_k\, I_{A'_k}(\omega),$$

where the $A'_k$ are disjoint and the $b_k$ are unique. Note that $A'_k = \{\omega : g(\omega) = b_k\}$. Now note that $E[I_A] = P(A)$. Based on this and the disjointness of the $A'_k$ we can write

$$E[g] = \sum_{k=1}^m b_k\, P(A'_k).$$

There are many instances where indicator functions are used to get a handle on the probability of an event.

Now we will get a bit more technical, dealing with some issues related to the existence of expectations. We have seen how to define expectations for simple functions, which are random variables. But what about more general random variables? Let $X \ge 0$ be a random variable. We define

$$E[X] = \sup_{g \text{ simple},\ g \le X} E[g],$$

where the sup means least upper bound and the limit is taken over all simple functions $g$ satisfying $g \le X$. It can be shown that this limit always exists, though it may be infinite. There is thus no question of convergence or anything like that.

Generalizing further, let $X$ be an arbitrary rv. Since the previous result holds for nonnegative random variables, let us split $X = X^+ - X^-$, where $X^+$ is the positive part and $X^-$ is the negative part:

$$X^+(\omega) = \max(X(\omega), 0), \qquad X^-(\omega) = -\min(X(\omega), 0).$$

Now $X^+$ and $X^-$ have well-defined expectations. We take

$$E[X] = E[X^+] - E[X^-],$$

which is defined in every case except when $E[X^+]$ and $E[X^-]$ are both infinite, in which case the difference is undefined.

Let us examine the expectation in light of the Riemann–Stieltjes integral. We define

$$E[X] = \int_{-\infty}^{\infty} x\,dF_X(x) = \lim_{a\to-\infty,\ b\to\infty} \int_a^b x\,dF_X(x).$$

This is a stronger sense of the limit than, for example,

$$\lim_{a\to\infty} \int_{-a}^{a} x\,dF_X(x).$$

For example, $\sin(x)$ has an integral in the latter sense (which is equal to 0) but not in the former sense. Now we will consider an example of a density where the expectation does not exist.

**Example 1 (Cauchy density).**

$$f_X(x) = \frac{1}{\pi(1 + x^2)}.$$

It is straightforward to show that this satisfies the requirements for a pdf. It looks a lot like a Gaussian but has heavy tails. It can be shown that a Cauchy rv can be obtained as a ratio of Gaussians, $X = Y/Z$. Now let us attempt to compute

$$E[X] = \lim_{a\to-\infty,\ b\to\infty} \int_a^b \frac{x}{\pi(1 + x^2)}\,dx.$$

If $a = -b$ then the integral is zero, no matter what. But if you fix $a$ (or $b$) and take the limit over the other one, the result diverges. That is, $E[X^+]$ and $E[X^-]$ both exist, but both are $\infty$, and they cannot be subtracted.

**Properties of expectations**

1. If $X = c$ then $E[X] = c$.
2. If $Y = aX + b$ then $E[Y] = aE[X] + b$.

$E[X]$ acts kind of like an integral of $X(\omega)$ over $\Omega$, weighted by $P$. One way that the expectation is expressed is

$$E[X] = \int_\Omega X(\omega)\,P(d\omega).$$

An integral in this form is said to be a **Lebesgue–Stieltjes integral**. Since $X$ induces a probability $P_X$ on $(\mathbb{R}, \mathcal{B})$, as we have observed, we can also think of the probability space $(\mathbb{R}, \mathcal{B}, P_X)$. We can write

$$E[X] = \int_{\mathbb{R}} x\,P_X(dx),$$

where now $X$ is the identity rv on the real line. We thus have two equivalent definitions:

$$\int_\Omega X(\omega)\,P(d\omega) = \int_{\mathbb{R}} x\,P_X(dx).$$

Back to properties:

3. If $Y = g \circ X$ then

$$E[Y] = \int_\Omega g(X(\omega))\,P(d\omega) = \int_{\mathbb{R}} g(x)\,P_X(dx).$$
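The Cauchy example above can be illustrated numerically (a sketch; the sample size and seed are arbitrary choices). A ratio of independent standard Gaussians is standard Cauchy, for which $P(|X| \le 1) = 1/2$, even though $E[X]$ does not exist.

```python
import numpy as np

rng = np.random.default_rng(1)
# A Cauchy rv as a ratio of Gaussians, as noted in Example 1.
# For a standard Cauchy, P(|X| <= 1) = 1/2, yet E[X] is undefined.
y = rng.standard_normal(200000)
z = rng.standard_normal(200000)
x = y / z                                # standard Cauchy samples
frac_inside = np.mean(np.abs(x) <= 1.0)  # should be near 1/2
print(frac_inside)
```

Running averages of these samples never settle down, which is the practical face of the nonexistent mean.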
### Pairs of random variables

Ultimately we will be dealing with infinite sequences of random variables. As steps along the way, we will examine carefully pairs of random variables, then vectors of random variables.

On $\mathbb{R}^2$, the smallest $\sigma$-field of interest is $\mathcal{B}^2$, which is the smallest $\sigma$-field containing all of the rectangles. This is the Borel $\sigma$-field of $\mathbb{R}^2$.

**Definition 1.** A bivariate random variable $(X, Y)$ is a measurable mapping from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^2, \mathcal{B}^2)$. That is, $\{\omega \in \Omega : (X(\omega), Y(\omega)) \in B\} \in \mathcal{F}$ for all $B \in \mathcal{B}^2$.

Note that two rvs $X, Y$ on $(\Omega, \mathcal{F})$ form a bivariate rv.

**Definition 2.** The joint (or bivariate) distribution of $(X, Y)$ is $P_{XY}(B) = P(\{\omega \in \Omega : (X(\omega), Y(\omega)) \in B\})$ for $B \in \mathcal{B}^2$.

**Definition 3.** The joint cdf of $(X, Y)$ is defined as

$$F_{XY}(a,b) = P(X \le a,\ Y \le b) = P\big((X,Y) \in R_{a,b}\big),$$

where $R_{a,b} = \{(x,y) \in \mathbb{R}^2 : x \le a,\ y \le b\}$ is the semi-infinite rectangle.

Properties of the joint cdf:

1. $\lim_{a\to\infty,\, b\to\infty} F_{XY}(a,b) = 1$.
2. $\lim_{a\to-\infty} F_{XY}(a,b) = 0$ (and similarly as $b \to -\infty$).
3. $\lim_{a\to\infty} F_{XY}(a,b) = F_Y(b)$, the marginal cdf of $Y$; $\lim_{b\to\infty} F_{XY}(a,b) = F_X(a)$, the marginal cdf of $X$.
4. $F_{XY}(a,b)$ is continuous from the northeast.
5. $F_{XY}(x,y)$ is monotonically increasing (or, more precisely, nondecreasing) in both variables.

Any function with these properties is a legitimate cdf and completely characterizes the family of joint cdfs.

**Joint discrete rvs**

**Definition 4.** If $X, Y$ are discrete rvs taking values in the sets $\{x_1, x_2, \ldots\}$ and $\{y_1, y_2, \ldots\}$ respectively, then $(X, Y)$ forms a discrete bivariate rv, and its joint pmf is defined by $p_{XY}(a,b) = P(X = a,\ Y = b)$.

Properties of $p_{XY}$:

1. $p_{XY} \ge 0$, and $p_{XY}(a,b) = 0$ if $a \notin \{x_1, \ldots\}$ or $b \notin \{y_1, \ldots\}$.
2. $\sum_i \sum_j p_{XY}(x_i, y_j) = 1$.
3. $F_{XY}(a,b) = \sum_{i,j:\ x_i \le a,\ y_j \le b} p_{XY}(x_i, y_j)$.
4. Marginals: $p_X(x_i) = \sum_j p_{XY}(x_i, y_j)$ and $p_Y(y_j) = \sum_i p_{XY}(x_i, y_j)$.

**Joint continuous rvs**

**Definition 5.** $X$ and $Y$ are jointly continuous rvs if there is a function $f_{XY} : \mathbb{R}^2 \to \mathbb{R}$ such that

$$F_{XY}(a,b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f_{XY}(x,y)\,dx\,dy$$

for all $(a,b) \in \mathbb{R}^2$. The function $f_{XY}$ is called the joint pdf of $X$ and $Y$, when it exists.

Properties of the joint pdf:

1. $f_{XY} \ge 0$.
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$.
3. We can get the pdf from the cdf: $f_{XY}(x,y) = \dfrac{\partial^2}{\partial x\,\partial y} F_{XY}(x,y)$.
4. Marginals: $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$.

If $X$ and $Y$ are jointly continuous then they are marginally continuous, that is, $f_X$ and $f_Y$ exist. However, the opposite is not true.

**Example 2 (important example).** $X$ and $Y$ are said to be jointly Gaussian with parameters $\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho$, where $\mu_x, \mu_y \in \mathbb{R}$, $\sigma_x^2, \sigma_y^2 \ge 0$, and $\rho \in [-1, 1]$, if they have the joint pdf

$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right\}.$$

We write $(X, Y) \sim \mathcal{N}(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$. In this case the marginals satisfy $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$.
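One way to get a feel for the role of $\rho$ in Example 2 is to generate jointly Gaussian pairs. A minimal sketch, assuming zero means and unit variances (an illustrative special case, not the notes' construction): $X = U$ and $Y = \rho U + \sqrt{1-\rho^2}\,V$, with $U, V$ independent standard normals, is jointly Gaussian with correlation coefficient $\rho$ and standard normal marginals.

```python
import numpy as np

rng = np.random.default_rng(2)
# Construct jointly Gaussian (X, Y) with a chosen correlation rho from
# independent standard normals; each marginal is then N(0, 1).
rho = 0.7
u = rng.standard_normal(200000)
v = rng.standard_normal(200000)
x = u
y = rho * u + np.sqrt(1.0 - rho**2) * v   # Cov(X, Y) = rho by construction
sample_rho = np.corrcoef(x, y)[0, 1]
print(sample_rho)                          # near 0.7
```

A scatter plot of `(x, y)` shows the tilted elliptical contours discussed next.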
If we make contour plots of this function, that is, plots of constant probability density, we obtain ellipses. When $\rho = 0$ we get circles. As $\rho$ increases (positive) the ellipses tilt one way; as $\rho$ decreases they tilt the other way. $\rho$ is called the **correlation coefficient**. It is a measure of how much $X$ "looks like" $Y$.

### Independence of rvs

**Definition 6.** $X$ and $Y$ are independent if $P(X \in A,\ Y \in B) = P(X \in A)P(Y \in B)$ for all $A, B \in \mathcal{B}$.

- $X$ and $Y$ are independent iff $F_{XY}(a,b) = F_X(a)F_Y(b)$ for all $(a,b) \in \mathbb{R}^2$.
- If $X$ and $Y$ are jointly continuous, they are independent iff $f_{XY}(x,y) = f_X(x)f_Y(y)$.
- If $X$ and $Y$ are discrete, they are independent iff $p_{XY}(a,b) = p_X(a)p_Y(b)$.
- If $X$ and $Y$ are jointly Gaussian random variables, they are independent iff $\rho = 0$. (Show this using the pdf.) **Caution:** Gaussian rvs are special this way. As a general rule, uncorrelated does not imply independent.

In practice it is common to assume that random variables are independent based on physical arguments, rather than to prove it by identifying a joint density and computing the marginals. Many times independence is also taken as an assumption even when it is not strictly true. This independence assumption frequently simplifies analysis; however, the validity of the assumption must be validated, e.g., using computer simulations.

### Expectations of functions of two rvs

Let $g : \mathbb{R}^2 \to \mathbb{R}$ be measurable, i.e., $\{(x,y) \in \mathbb{R}^2 : g(x,y) \in B\} \in \mathcal{B}^2$ for all $B \in \mathcal{B}$. Then for a bivariate rv $(X, Y)$ we can define $Z = g(X, Y)$, and

$$E[Z] = \int_{\mathbb{R}} z\,P_Z(dz) = \int_{\mathbb{R}^2} g(x,y)\,P_{XY}(dx\,dy) = \begin{cases} \displaystyle\iint g(x,y)\,f_{XY}(x,y)\,dx\,dy & X, Y \text{ jointly continuous,} \\[2mm] \displaystyle\sum_i\sum_j g(x_i, y_j)\,p_{XY}(x_i, y_j) & X, Y \text{ discrete.} \end{cases}$$

Properties:

1. $E[X + Y] = E[X] + E[Y]$.
2. If $X \ge Y$ then $E[X] \ge E[Y]$.
3. If $X$ and $Y$ are independent, then $E[g_1(X)g_2(Y)] = E[g_1(X)]\,E[g_2(Y)]$ for all measurable, well-defined $g_1, g_2$.

Comments: if $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$. However, if $E[XY] = E[X]E[Y]$, this does not mean that they are independent: uncorrelated does not imply independent. However, if $E[g_1(X)g_2(Y)] = E[g_1(X)]E[g_2(Y)]$ for *all* appropriate functions $g_1, g_2$, then $X$ and $Y$ are independent. In fact, this is necessary and sufficient for independence.
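The "uncorrelated does not imply independent" comment above can be made concrete (a sketch; the distribution choice is the classic textbook counterexample, not one from these notes). With $X \sim \mathrm{Uniform}(-1,1)$ and $Y = X^2$, $\mathrm{Cov}(X,Y) = E[X^3] = 0$, yet $Y$ is a deterministic function of $X$; a nonlinear pair of test functions exposes the dependence.

```python
import numpy as np

rng = np.random.default_rng(3)
# X ~ Uniform(-1,1), Y = X^2: uncorrelated but completely dependent.
x = rng.uniform(-1.0, 1.0, 200000)
y = x**2
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # near 0
# Dependence shows up through g1(X) = X^2, g2(Y) = Y:
lhs = np.mean(x**2 * y)              # E[X^2 Y] = E[X^4] = 1/5
rhs = np.mean(x**2) * np.mean(y)     # E[X^2] E[Y] = (1/3)^2 = 1/9
print(cov_xy, lhs, rhs)
```

If $X$ and $Y$ were independent, `lhs` and `rhs` would agree; they clearly do not.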
**Definition 7.** The covariance of $X$ and $Y$ is defined as

$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].$$

The variance of $X$ is defined as $\mathrm{Var}(X) = \mathrm{Cov}(X, X)$.

4. $\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y]$, and $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.
5. If $X$ and $Y$ are independent, then $\mathrm{Cov}(X,Y) = 0$. If $\mathrm{Cov}(X,Y) = 0$ we say that $X$ and $Y$ are **uncorrelated**. Again, uncorrelated does not imply independent.
6. $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$. If $\mathrm{Cov}(X,Y) = 0$, then $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
7. $\mathrm{Cov}(aX + b,\ cY + d) = ac\,\mathrm{Cov}(X,Y)$ for all constants $a, b, c, d \in \mathbb{R}$. Thus $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.

**Definition 8.** If $0 < \mathrm{Var}(X) < \infty$ and $0 < \mathrm{Var}(Y) < \infty$, the correlation coefficient between $X$ and $Y$ is

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

This is a normalized version of the covariance.

8. $|\rho| \le 1$. This can be shown using the Cauchy–Schwarz inequality. $|\rho| = 1$ iff $X$ and $Y$ are linearly related: $X = aY + b$ for some constants $a, b$ with $a \ne 0$.

**Example 3.** If $(X,Y) \sim \mathcal{N}(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$, then $\rho(X, Y) = \rho$. As we have observed before, if $X, Y$ are jointly Gaussian and $\rho = 0$, then they are independent. Otherwise, $\rho = 0$ does not imply independence.

### Characteristic functions

The characteristic function is essentially the Fourier transform of the pdf (or pmf). Characteristic functions are useful in practice, not for the usual reasons engineers use Fourier transforms (e.g., frequency content), but because they provide a means of computing moments, as we will see, and because they are useful in finding distributions of sums of independent random variables.

**Definition 9.** Let $X$ be a rv. The characteristic function (chf) of $X$ is

$$\phi_X(u) = E[e^{iuX}], \qquad u \in \mathbb{R}.$$

Here $i = \sqrt{-1}$; we will not use $j$.

Let us write some more explicit formulas. Suppose $X$ is a continuous random variable. Then, by the law of the unconscious statistician,

$$\phi_X(u) = \int_{-\infty}^{\infty} e^{iux} f_X(x)\,dx.$$

This may be recognized as the Fourier transform of $f_X$, where $u$ is the frequency variable. (Comment on the sign of the exponent.) Note that given $\phi_X$, we can determine $f_X$ by an inverse Fourier transform. If $X$ is a discrete rv,

$$\phi_X(u) = \sum_i e^{iux_i}\, p_X(x_i),$$

which we recognize as the discrete-time Fourier transform, and as before $u$ is the frequency variable. (Comment on the sign of the exponent.) Given a $\phi_X$, we can find $p_X$ by the inverse discrete-time Fourier transform.

Properties:

1. $\phi_X(0) = 1$.
2. $|\phi_X(u)| \le 1$.
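The characteristic function just defined can be estimated directly from samples by averaging $e^{iuX}$. A sketch for $X \sim \mathcal{N}(0,1)$, whose chf is $e^{-u^2/2}$ (a standard fact); the sample size, seed, and test point $u$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
# Empirical chf: phi_hat(u) = (1/n) sum_k exp(i u x_k), compared with the
# exact chf of N(0,1), phi(u) = exp(-u^2 / 2).
x = rng.standard_normal(200000)
u = 1.5
phi_hat = np.mean(np.exp(1j * u * x))
phi_true = np.exp(-u**2 / 2.0)
print(abs(phi_hat - phi_true))      # small
```

Note that $\phi_X(0) = 1$ holds exactly for the empirical version too, since it averages $e^{0} = 1$.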
3. $\phi_X$ and $f_X$ form a unique Fourier transform pair, $\phi_X \leftrightarrow f_X$. Thus $\phi_X$ provides yet another way of displaying the probability structure of $X$.
4. $\phi_X(u) = \int_{-\infty}^{\infty} e^{iux}\,dF_X(x)$. This is referred to as the Fourier–Stieltjes transform of $F_X$.
5. $\phi_X$ is uniformly continuous.

**Definition 10.** For an rv $X$, the $k$th moment of $X$ is $E[X^k]$, for $k \in \mathbb{N}$. We can write $E[X^k] = \int_{-\infty}^{\infty} x^k\,dF_X(x)$.

**Theorem 1.** If $E[|X|^k] < \infty$, then

$$E[X^k] = i^{-k}\left.\frac{d^k}{du^k}\phi_X(u)\right|_{u=0}.$$

That is, we can obtain moments by differentiating the characteristic function. For this reason, characteristic functions (or functions which are very similarly defined) are sometimes referred to as moment generating functions.

*Proof.*

$$\phi_X(u) = E[e^{iuX}] = E\left[\sum_{k=0}^{\infty}\frac{(iuX)^k}{k!}\right] = \sum_{k=0}^{\infty}\frac{(iu)^k E[X^k]}{k!}.$$

Then

$$\frac{d^j}{du^j}\phi_X(u) = i^j E[X^j] + (\text{terms carrying factors of } u),$$

so at $u = 0$,

$$\left.\frac{d^j}{du^j}\phi_X(u)\right|_{u=0} = i^j E[X^j]. \qquad\square$$

**Example 4.** $X \sim \mathcal{N}(\mu, \sigma^2)$. Then it can be shown (homework) that

$$\phi_X(u) = e^{iu\mu - u^2\sigma^2/2}.$$

It is straightforward to verify (homework) that $E[X] = \mu$ and $E[X^2] = \mu^2 + \sigma^2$, so that $\mathrm{Var}(X) = \sigma^2$.

**Definition 11.** For a joint rv $(X, Y)$ we define a joint characteristic function

$$\phi_{XY}(u, v) = E\big[e^{i(uX + vY)}\big].$$

Then $\phi_{XY}$ and $F_{XY}$ are uniquely related (two-dimensional Fourier transforms).

**Definition 12.** The $n$th-order moments of two random variables are the quantities of the form

$$m_{kl} = E[X^k Y^l], \qquad k, l \ge 0,\ k + l = n.$$

The $n$th-order central moments are

$$\mu_{kl} = E\big[(X - E[X])^k (Y - E[Y])^l\big], \qquad k, l \ge 0,\ k + l = n.$$

**Example 5.** For $n = 2$, the second-order moments are $E[X^2]$, $E[Y^2]$, and $E[XY]$. The central moments are $\mathrm{Cov}(X, Y)$, $\mathrm{Var}(X)$, and $\mathrm{Var}(Y)$.

Properties:

1. Moments:

$$m_{kl} = i^{-(k+l)}\left.\frac{\partial^n}{\partial u^k\,\partial v^l}\phi_{XY}(u,v)\right|_{u=0,\,v=0}, \qquad k + l = n.$$

2. $X$ and $Y$ are independent if and only if $\phi_{XY}(u,v) = \phi_X(u)\phi_Y(v)$ for all $(u,v) \in \mathbb{R}^2$.

### Sums of independent random variables

Let $X$ and $Y$ be independent rvs and let $Z = X + Y$. Then

$$\phi_Z(u) = E[e^{iuZ}] = E[e^{iuX}e^{iuY}] = E[e^{iuX}]\,E[e^{iuY}] = \phi_X(u)\,\phi_Y(u).$$

If $X$ and $Y$ are continuous rvs, then so is $Z$, and

$$f_Z(z) = \mathcal{F}^{-1}\big[\phi_Z(u)\big] = \mathcal{F}^{-1}\big[\phi_X(u)\phi_Y(u)\big] = (f_X * f_Y)(z)$$

by the convolution theorem. Thus, when continuous independent random variables are added, the pdf of the sum is the convolution of the pdfs (and likewise the pmf of the sum is the convolution of the pmfs for discrete independent rvs).
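The convolution result can be checked numerically (a sketch; the uniform distribution and grid spacing are illustrative choices). For $X, Y \sim \mathrm{Uniform}(0,1)$ independent, $f_Z = f_X * f_Y$ is the triangular density on $[0, 2]$, peaking at value 1 when $z = 1$, and $P(Z \le 1) = 1/2$ by symmetry.

```python
import numpy as np

rng = np.random.default_rng(4)
# pdf of Z = X + Y as the numeric convolution of two Uniform(0,1) pdfs.
h = 0.001
grid = np.arange(0.0, 1.0, h)
f = np.ones_like(grid)            # uniform pdf on [0, 1]
f_z = np.convolve(f, f) * h       # triangular density on [0, 2], peak ~1
# Monte Carlo cross-check on P(Z <= 1):
z = rng.random(400000).reshape(2, -1).sum(axis=0)
frac = np.mean(z <= 1.0)          # should be near 1/2
print(f_z.max(), frac)
```

The factor `h` converts the discrete convolution sum into an approximation of the convolution integral.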
**Example: jointly Gaussian.** If $(X, Y) \sim \mathcal{N}(0, 0, \sigma_x^2, \sigma_y^2, \rho)$, then

$$\phi_{XY}(u, v) = \exp\left\{-\tfrac{1}{2}\big(u^2\sigma_x^2 + v^2\sigma_y^2 + 2uv\rho\sigma_x\sigma_y\big)\right\}.$$

We make an observation here: the Gaussian pdf is the exponential of a quadratic, and the Fourier transform of the exponential of a quadratic is again the exponential of a quadratic. This little fact gives rise to much of the analytical and practical usefulness of Gaussian rvs.

**Characteristic functions: marginals.** We observe that $\phi_{XY}(u, 0) = \phi_X(u)$. In our Gaussian example we have

$$\phi_{XY}(u, 0) = \exp\left(-\tfrac{1}{2}u^2\sigma_x^2\right),$$

which is the chf for a Gaussian, $X \sim \mathcal{N}(0, \sigma_x^2)$. We could of course have obtained this result via integration, but this is much easier.

### Some important inequalities

In general, when we observe an outcome of a random variable, we expect it to be near the mean, that is, near the expected value. Further, the farther the outcome is from the mean, the less likely we expect the outcome to be. There are some very useful inequalities which quantify these intuitive expectations: the Markov inequality and its consequences, the Chebyshev inequality and the Chernoff bound. We introduce these here.

Let $B \in \mathcal{B}$ be a Borel set, and recall the indicator $I_B(x)$. Let $X$ be a random variable and let $Y = I_B(X)$. This is a measurable function, so $Y$ is another random variable, and

$$E[Y] = E[I_B(X)] = P(X \in B).$$

We will use this "expectation as probability" idea to get a bound. Suppose $g$ is a nonnegative, nondecreasing function, and suppose $b \in \mathbb{R}$ with $g(b) > 0$. Consider the function

$$h(x) = \frac{g(x)}{g(b)} - I_{[b,\infty)}(x) \ge 0 \quad \forall x.$$

Observe that $g(x)/g(b) \ge I_{[b,\infty)}(x)$ for all $x$, since $g(x)/g(b) \ge 1$ for $x \ge b$. Now we have

$$0 \le E[h(X)] = \frac{E[g(X)]}{g(b)} - P(X \ge b).$$

Thus

$$P(X \ge b) \le \frac{E[g(X)]}{g(b)}.$$

A similar result can be established if $g$ is nonnegative, nondecreasing on $[0, \infty)$, and symmetric about 0. We can thus establish that

$$P(|X| \ge b) \le \frac{E[g(X)]}{g(b)}.$$

**Special case.** Assume $X \ge 0$ and let $g(x) = x$: for all $b > 0$, $P(X \ge b) \le E[X]/b$. Somewhat more generally, the **Markov inequality** says

$$P(|X| \ge b) \le \frac{E[|X|]}{b}$$

for any $b > 0$.

**Special case.** Take $g(x) = x^2$, which satisfies the requirements for $g$. Then

$$P(|Y| \ge b) \le \frac{E[Y^2]}{b^2}.$$

Let $Y = X - E[X]$. We obtain the **Chebyshev inequality**:

$$P(|X - E[X]| \ge b) \le \frac{E\big[(X - E[X])^2\big]}{b^2} = \frac{\mathrm{Var}(X)}{b^2}.$$

Interpretation: the probability that $X$ differs from its mean by more than some amount $b$ is at most the variance of $X$ over $b^2$. Farther away: less probable. Higher variance: more probable.

**Special case: the Chernoff bound.** Take $g(x) = e^{sx}$ for $s > 0$. Then we obtain

$$P(X \ge b) \le e^{-sb}\, E[e^{sX}].$$

There is some flexibility in the choice of $s$, which may be selected to make the bound as tight as possible. The Chernoff bound is a powerful tool which has been put to good use in digital communications (see, e.g., Wozencraft and Jacobs).

### Jensen's inequality

Jensen's inequality can be used in some cases to interchange expectation and function evaluation, at least approximately. It is based on the idea of convex functions.

**Definition 13.** A function $c : \mathbb{R} \to \mathbb{R}$ is convex if

$$c\big((1-\alpha)x + \alpha y\big) \le (1-\alpha)c(x) + \alpha c(y)$$

for all $x, y \in \mathbb{R}$ and $0 \le \alpha \le 1$. That is, a function is convex if the chord connecting the points $(x, c(x))$ and $(y, c(y))$ lies above the function between $x$ and $y$. (Draw a picture.)

It can be shown that if $c$ is twice differentiable, then $c$ is convex iff $c''(x) \ge 0$ for all $x \in \mathbb{R}$. Examples: $c(x) = x^2$, $c(x) = ax + b$. It can be shown that all convex functions are measurable with respect to the Borel field $\mathcal{B}$.

**Example 6.** $c(x) = e^{sx}$.

**Theorem 2.** If $c : \mathbb{R} \to \mathbb{R}$ is convex, then

$$E[c(X)] \ge c(E[X]),$$

with equality if and only if $c$ is affine on the support of $X$ (in particular, when $X$ is a constant). In other words, we can interchange expectation and functions and at least get a bound.

**Example 7.** $E[X^2] \ge (E[X])^2$; $E[e^{sX}] \ge e^{sE[X]}$; for $X > 0$, $E[1/X] \ge 1/E[X]$; and $-E[\log X] \ge -\log E[X]$.

### Cauchy–Schwarz inequality

This is an inequality that holds in any Hilbert space. It more or less forms the theme for the first several weeks of 6030.

**Theorem 3.** If $E[X^2] < \infty$ and $E[Y^2] < \infty$, then

$$\big(E[XY]\big)^2 \le E[X^2]\,E[Y^2].$$

For example, $\mathrm{Cov}(X,Y)^2 \le \mathrm{Var}(X)\mathrm{Var}(Y)$, implying $|\rho| \le 1$. Observe that $\mathrm{Var}(X) = E[X^2] - (E[X])^2 \ge 0$, using the Schwarz and Jensen inequalities.
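The Chebyshev bound is valid but often loose. A numeric sketch (the Gaussian distribution, threshold, and sample size are illustrative choices): for $X \sim \mathcal{N}(0,1)$ and $b = 2$, the bound gives $1/4$, while the true two-sided tail probability is about $0.0455$.

```python
import numpy as np

rng = np.random.default_rng(5)
# Chebyshev: P(|X - mu| >= b) <= Var(X) / b^2, checked empirically for N(0,1).
x = rng.standard_normal(200000)
b = 2.0
tail = np.mean(np.abs(x) > b)     # true tail probability ~ 0.0455
bound = np.var(x) / b**2          # Chebyshev bound ~ 0.25
print(tail, bound)
```

The bound holds for every distribution with finite variance, which is exactly why it cannot be tight for any particular one.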
### Conditional expectations and distributions

Suppose $X$ is a discrete rv. We define the conditional distribution of another rv $Y$, given $X = x_k$ at some point where $P(X = x_k) > 0$, by

$$F_{Y|X}(y|x_k) = P(Y \le y \mid X = x_k).$$

By the law of total probability,

$$F_Y(y) = \sum_k F_{Y|X}(y|x_k)\,P(X = x_k).$$

As we discussed before, when we condition on an event we are shrinking the sample space under consideration, so there is some normalization that takes place. We also define

$$E[Y \mid X = x_k] = \int y\,dF_{Y|X}(y|x_k).$$

Note that this depends on the value of $x_k$: it is a function of $x_k$. Let us now take the expectation with respect to $X$:

$$E\big[E[Y|X]\big] = \sum_k E[Y \mid X = x_k]\,P(X = x_k) = E[Y].$$

We can think of $E[Y \mid X = x_k]$ as a discrete random variable that is a function of $X$. For a discrete rv $X$, the conditional law $F_{Y|X}$ could correspond to either a discrete or a continuous rv:

- Discrete: $p_{Y|X}(y|x_k) = P(Y = y \mid X = x_k)$.
- Continuous: there exists a function $f_{Y|X}$ such that $F_{Y|X}(y|x_k) = \int_{-\infty}^{y} f_{Y|X}(z|x_k)\,dz$.

We can also write $F_{Y|X}(y|x_k) = E\big[I_{(-\infty,y]}(Y) \mid X = x_k\big]$. If $Y$ is discrete, we have

$$p_{Y|X}(y_j|x_k) = \frac{p_{XY}(x_k, y_j)}{p_X(x_k)}.$$

When $X$ is a continuous rv, conditional probabilities and expectations are somewhat more complicated, because $P(X = x) = 0$ for any particular value of $x$. Recall that $E[Y \mid X = x_k] = g(x_k)$ for some function $g$.

**Definition 14.** Suppose $Y$ is an rv on the probability space $(\Omega, \mathcal{F}, P)$ with $E[|Y|] < \infty$. Then for $A \in \mathcal{F}$, define

$$\int_A Y\,dP = E[I_A Y].$$

**Definition 15.** Suppose $X$ and $Y$ are random variables and $E[|Y|] < \infty$. The conditional expectation of $Y$ given $X = x$ is any measurable function $g(x) = E[Y \mid X = x]$ of $x$ satisfying

$$\int_B g(x)\,dP_X(x) = \int_{X^{-1}(B)} Y\,dP \tag{2}$$

for all $B \in \mathcal{B}$, where $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$.

1. It can be shown that under the stated conditions such a function always exists.
2. If $X$ is discrete, then $E[Y \mid X = x_k]$ as defined earlier satisfies the property.
3. $E[Y \mid X = x]$ is unique in the sense that if there are two functions $g(x)$ and $h(x)$, both satisfying (2), then $P(g(X) = h(X)) = 1$. When a condition is true with probability 1, we say that it is true *almost surely* (a.s.).

Once we have defined conditional expectation, we can define a conditional cdf:

$$F_{Y|X}(y|x) = E\big[I_{(-\infty,y]}(Y) \mid X = x\big].$$

Properties:

1. This definition agrees with the previous one when $X$ is discrete.
2. $F_Y(y) = \int_{\mathbb{R}} F_{Y|X}(y|x)\,dP_X(x)$.
3. $F_{Y|X}$ is a cdf as a function of $y$, because it satisfies all the properties of a cdf.
4. If $X$ and $Y$ are jointly continuous, then $F_{Y|X}$ has a density for every $x$:

$$f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}.$$

There is another interpretation:

$$F_{Y|X}(y|x) = \lim_{\Delta x \to 0^+} P\big(Y \le y \mid x - \tfrac{\Delta x}{2} < X \le x + \tfrac{\Delta x}{2}\big) = \lim_{\Delta x \to 0^+} \frac{P\big(Y \le y,\ x - \tfrac{\Delta x}{2} < X \le x + \tfrac{\Delta x}{2}\big)}{P\big(x - \tfrac{\Delta x}{2} < X \le x + \tfrac{\Delta x}{2}\big)} = \frac{\frac{\partial}{\partial x}F_{XY}(x, y)}{f_X(x)}.$$

5. If $X$ and $Y$ are jointly continuous, then $\frac{\partial}{\partial y}F_{Y|X}(y|x)$ exists and

$$f_{Y|X}(y|x) = \frac{\partial}{\partial y}F_{Y|X}(y|x), \qquad E[Y \mid X = x] = \int_{-\infty}^{\infty} y\,f_{Y|X}(y|x)\,dy.$$

Analogously, for discrete random variables, $E[Y \mid X = x_k] = \sum_j y_j\,p_{Y|X}(y_j|x_k)$.

### A more general definition of conditional expectation

We will explore conditional expectation in terms of probability spaces. Suppose $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$. A function $g : \Omega \to \mathbb{R}$ is measurable with respect to $\mathcal{G}$ if $\{\omega \in \Omega : g(\omega) \in B\} \in \mathcal{G}$ for all $B \in \mathcal{B}$. Any such function $g$ would be a rv too, but this $g$ is more restricted.

We will now define conditioning on a $\sigma$-field. Suppose $Y$ is a rv with $E[|Y|] < \infty$ and $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$. Then $E[Y|\mathcal{G}]$ is any $\mathcal{G}$-measurable random variable such that

$$\int_A E[Y|\mathcal{G}]\,dP = \int_A Y\,dP \qquad \text{for all } A \in \mathcal{G}.$$

If $Y$ itself were $\mathcal{G}$-measurable, it would be its own conditional expectation.

**Example 8.** Let $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B})$, and let $\mathcal{G} = \{\emptyset,\ (-\infty, 0),\ [0, \infty),\ \mathbb{R}\}$. This is a $\sigma$-field. Let $Y$ be a rv. Then $E[Y|\mathcal{G}]$ is any $\mathcal{G}$-measurable rv satisfying

$$\int_{\mathbb{R}} E[Y|\mathcal{G}]\,dP = E[Y], \qquad \int_{(-\infty,0)} E[Y|\mathcal{G}]\,dP = \int_{(-\infty,0)} Y\,dP, \qquad \int_{[0,\infty)} E[Y|\mathcal{G}]\,dP = \int_{[0,\infty)} Y\,dP.$$

Note: to be measurable on $\mathcal{G}$ means being constant on $(-\infty, 0)$ and on $[0, \infty)$. The $\mathcal{G}$-measurable rvs in this case are simple functions,

$$g(\omega) = b_1\, I_{(-\infty,0)}(\omega) + b_2\, I_{[0,\infty)}(\omega),$$

so

$$b_1\,P\big((-\infty,0)\big) = \int_{(-\infty,0)} Y\,dP, \qquad b_2\,P\big([0,\infty)\big) = \int_{[0,\infty)} Y\,dP.$$

This gives us two equations in two unknowns:

$$b_1 = \frac{\int_{(-\infty,0)} Y\,dP}{P\big((-\infty,0)\big)}, \qquad b_2 = \frac{\int_{[0,\infty)} Y\,dP}{P\big([0,\infty)\big)}.$$

**Definition 16.** If $X$ is an rv, define $\sigma(X)$, the $\sigma$-field generated by $X$, to be $\big\{\{\omega \in \Omega : X(\omega) \in B\} : B \in \mathcal{B}\big\}$.

**Fact.** A rv $Y$ is measurable with respect to $\sigma(X)$ if and only if there is a measurable function $g : \mathbb{R} \to \mathbb{R}$ such that $Y = g(X)$.

We now define conditional expectation with respect to a $\sigma$-field.

**Definition 17.** If $X$ and $Y$ are rvs with $E[|Y|] < \infty$, we define $E[Y|X] = E[Y|\sigma(X)]$.

Properties:

1. By the fact stated above, we can write $E[Y|X] = g(X)$ for some function $g$, with $g(x) = E[Y \mid X = x]$.
2. $E[Y] = E\big[E[Y|X]\big]$.
3. If $Y$ itself is $\mathcal{G}$-measurable, then $E[Y|\mathcal{G}] = Y$.
4. $E[aY_1 + bY_2 \mid \mathcal{G}] = a\,E[Y_1|\mathcal{G}] + b\,E[Y_2|\mathcal{G}]$.
5. If $Y \ge 0$, then $E[Y|\mathcal{G}] \ge 0$.
6. If $E[|Y|] < \infty$ and $\mathcal{G} \subset \mathcal{H} \subset \mathcal{F}$, then $E\big[E[Y|\mathcal{H}]\,\big|\,\mathcal{G}\big] = E[Y|\mathcal{G}]$.
Idea: if you first condition on a field that is less coarse than $\mathcal{G}$, you get a rv; then condition on $\mathcal{G}$.

**Definition 18.** Two $\sigma$-fields $\mathcal{G}$ and $\mathcal{H}$ are independent if $P(G \cap H) = P(G)P(H)$ for all $G \in \mathcal{G}$ and $H \in \mathcal{H}$. Note: $X$ and $Y$ are independent rvs iff $\sigma(X)$ and $\sigma(Y)$ are independent $\sigma$-fields.

7. If $\sigma(Y)$ is independent of $\mathcal{G}$, then $E[Y|\mathcal{G}] = E[Y]$.
8. If $Y$ is $\mathcal{G}$-measurable, then $E[Y|\mathcal{G}] = Y$. So, for example, if $\mathcal{G} = \sigma(X)$ and $Y = g(X)$ for some $g : \mathbb{R} \to \mathbb{R}$ (that is, $Y$ is $\mathcal{G}$-measurable), then $E[Y|X] = Y = g(X)$.

## ECE 6010 Lecture 9: Linear Minimum Mean-Square Error Filtering

### Background

Recall that for random variables $X$ and $Y$ with finite variance, the MSE $E[(X - h(Y))^2]$ is minimized by $h(Y) = E[X|Y]$. That is, the best estimate of $X$ using a measured value of $Y$ is the conditional average of $X$. One aspect of this estimate is that *the error is orthogonal to the data*. More precisely, the error $X - E[X|Y]$ is orthogonal to $Y$ and to every function of $Y$:

$$E\big[(X - E[X|Y])\,g(Y)\big] = 0$$

for all measurable functions $g$ (we will assume that $E[g^2(Y)] < \infty$).

We want to show that $h$ minimizes $E[(X - h(Y))^2]$ if and only if $E[(X - h(Y))g(Y)] = 0$ (orthogonality) for all measurable $g$ such that $E[g^2(Y)] < \infty$. For $h(Y) = E[X|Y]$:

$$E\big[(X - E[X|Y])g(Y)\big] = E\Big[E\big[(X - E[X|Y])g(Y) \mid Y\big]\Big] = E\Big[\big(E[X|Y] - E[X|Y]\big)g(Y)\Big] = 0.$$

Conversely, suppose for some $g$, $E[(X - h(Y))g(Y)] \ne 0$. Consider the estimate

$$\tilde h(Y) = h(Y) + a\,g(Y), \qquad a = \frac{E[(X - h(Y))g(Y)]}{E[g^2(Y)]}.$$

Then

$$E\big[(X - \tilde h(Y))^2\big] = E\big[(X - h(Y))^2\big] - \frac{\big(E[(X - h(Y))g(Y)]\big)^2}{E[g^2(Y)]} < E\big[(X - h(Y))^2\big].$$

Suppose now we are given two random processes $X_t$ and $Y_t$ that are statistically related, that is, not independent. Suppose to begin that $T = \mathbb{R}$. Suppose we observe $Y_t$ over the interval $[a, b]$, and based on the information gained we want to estimate $X_t$ for some fixed $t$, as a function of $\{Y_\tau,\ a \le \tau \le b\}$. That is, we form

$$\hat X_t = h\big(\{Y_\tau,\ a \le \tau \le b\}\big)$$

for some functional $h$ mapping the observed waveform to real numbers.

- If $t < b$, we say the operation of the function is **smoothing**.
- If $t = b$, we say the operation of the function is **filtering**.
- If $t > b$, we say the operation of the function is **prediction**.

The error in the estimate is $X_t - \hat X_t$; the mean-squared error is $E[(X_t - \hat X_t)^2]$.

**Fact** (built on our previous intuition): the MSE $E[(X_t - \hat X_t)^2]$ is minimized by the conditional expectation

$$\hat X_t = E\big[X_t \mid Y_\tau,\ a \le \tau \le b\big].$$
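For a single scalar observation, the orthogonality characterization above reduces to the familiar linear MMSE estimator. A sketch (the signal-plus-noise model, sample size, and seed are illustrative assumptions, not the notes' setup): with $\hat X = aY + c$, $a = \mathrm{Cov}(X,Y)/\mathrm{Var}(Y)$ and $c = E[X] - aE[Y]$, the empirical error is uncorrelated with the data $Y$.

```python
import numpy as np

rng = np.random.default_rng(6)
# Estimate X from Y = X + N (independent unit-variance signal and noise).
# The linear MMSE coefficient is a = Cov(X,Y)/Var(Y) = 1/2 here.
s = rng.standard_normal(200000)        # "signal" X
n = rng.standard_normal(200000)        # independent "noise"
x = s
y = s + n                              # observation
a = np.cov(x, y)[0, 1] / np.var(y)
c = np.mean(x) - a * np.mean(y)
err = x - (a * y + c)                  # estimation error
print(a, np.mean(err * y))             # a near 0.5; error nearly orthogonal to y
```

Perturbing `a` in either direction increases `np.mean(err**2)`, which is the sufficiency half of the orthogonality argument.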
Furthermore, the orthogonality principle applies: $X_t - E[X_t \mid Y_\tau,\ a \le \tau \le b]$ is orthogonal to every function of $\{Y_\tau,\ a \le \tau \le b\}$. While we know the theoretical result, it is difficult in general to compute the desired conditional expectation.

**Definition 1.** Suppose $Y_t$ is second order. Let $H_Y$ be the set of all random variables of the form $\sum_{l=1}^{n} a_l Y_{t_l} + c$, for $n \in \mathbb{Z}^+$, $a_l, c \in \mathbb{R}$, and $t_l \in [a, b]$.

Note that $H_Y$ may include infinite sequences, so we assume mean-square limits. The set $H_Y$ contains mean-square derivatives, mean-square integrals, and other linear transformations of $\{Y_t,\ t \in [a,b]\}$. The set $H_Y$ is the Hilbert space generated by the linear span of the $Y_\tau$.

Let's now solve

$$\min_{\hat X_t \in H_Y}\ E\big[(X_t - \hat X_t)^2\big].$$

A couple of important properties:

1. If $E[X_t^2] < \infty$, then $\hat X_t \in H_Y$ solves the minimization if and only if

$$E\big[(X_t - \hat X_t)Z\big] = 0 \qquad \text{for all } Z \in H_Y.$$

That is, the error is orthogonal to all elements of $H_Y$.

*Proof.* (If) Suppose $\hat X_t \in H_Y$ satisfies $E[(X_t - \hat X_t)Z] = 0$ for all $Z \in H_Y$. Let $\tilde X_t$ be any element of $H_Y$. Then

$$E\big[(X_t - \tilde X_t)^2\big] = E\big[(X_t - \hat X_t + \hat X_t - \tilde X_t)^2\big] = E\big[(X_t - \hat X_t)^2\big] + 2\,E\big[(X_t - \hat X_t)(\hat X_t - \tilde X_t)\big] + E\big[(\hat X_t - \tilde X_t)^2\big].$$

The middle term is zero, since $\hat X_t - \tilde X_t \in H_Y$. Hence

$$E\big[(X_t - \tilde X_t)^2\big] = E\big[(X_t - \hat X_t)^2\big] + E\big[(\hat X_t - \tilde X_t)^2\big] \ge E\big[(X_t - \hat X_t)^2\big].$$

So the orthogonality condition is sufficient for achieving minimum MSE.

(Only if) Suppose $\hat X_t \in H_Y$ and there is an element $Z \in H_Y$ such that $E[(X_t - \hat X_t)Z] \ne 0$. We will show that there would then be a better estimate. Let

$$a = \frac{E\big[(X_t - \hat X_t)Z\big]}{E[Z^2]}, \qquad \tilde X_t = \hat X_t + aZ.$$

Then

$$E\big[(X_t - \tilde X_t)^2\big] = E\big[(X_t - \hat X_t)^2\big] - \frac{\big(E[(X_t - \hat X_t)Z]\big)^2}{E[Z^2]} < E\big[(X_t - \hat X_t)^2\big].$$

So $\hat X_t$ cannot be the MMSE estimator, which implies the necessity of the orthogonality condition. $\square$

2. $E[(X_t - \hat X_t)Z] = 0$ for all $Z \in H_Y$ if and only if

$$E[X_t - \hat X_t] = 0 \quad\text{and}\quad E\big[(X_t - \hat X_t)Y_\tau\big] = 0 \ \text{ for all } \tau \in [a, b].$$

This is a restatement of orthogonality, but for a restricted set.

*Proof.* (Only if) This comes by definition, since $1 \in H_Y$ and $Y_\tau \in H_Y$ for each $\tau \in [a, b]$. (If) Suppose $Z \in H_Y$, $E[X_t - \hat X_t] = 0$, and $E[(X_t - \hat X_t)Y_\tau] = 0$ for all $\tau \in [a, b]$; that is, the error is orthogonal to each $Y_\tau$. Then for

$$Z = \lim_{n\to\infty}\left(\sum_{l=1}^{n} a_l Y_{t_l} + c\right)$$

(a mean-square limit), we have

$$E\big[(X_t - \hat X_t)Z\big] = \lim_{n\to\infty}\left(\sum_{l=1}^{n} a_l\,E\big[(X_t - \hat X_t)Y_{t_l}\big] + c\,E[X_t - \hat X_t]\right) = 0,$$

where the limit may be interchanged because $X_t$ is assumed to be second order. $\square$

Suppose we further restrict $\hat X_t$ to be of the form

$$\hat X_t = \int_a^b h(t, \tau)\,Y_\tau\,d\tau + c_t.$$

That is, $\hat X_t$ is the output of a linear filter driven by $Y_t$; note that $\hat X_t \in H_Y$. By property 2 we must have

$$E[X_t] = E[\hat X_t] = \int_a^b h(t, \tau)\,E[Y_\tau]\,d\tau + c_t,$$

so that

$$c_t = E[X_t] - \int_a^b h(t, \tau)\,\mu_Y(\tau)\,d\tau,$$

and

$$E[X_t Y_\tau] = E[\hat X_t Y_\tau] \quad \text{for } \tau \in [a, b],$$

that is,

$$R_{XY}(t, \tau) = \int_a^b h(t, \sigma)\,R_Y(\sigma, \tau)\,d\sigma + c_t\,\mu_Y(\tau).$$

This gives us two equations in the unknowns $c_t$ and $h$. We can eliminate $c_t$; since we are dealing with covariances, the means are eliminated:

$$C_{XY}(t, \tau) = \int_a^b h(t, \sigma)\,C_Y(\sigma, \tau)\,d\sigma, \qquad \tau \in [a, b].$$

The optimal $h$ is that which solves this integral equation. It is frequently assumed that $X_t$ and $Y_t$ have zero means; in this case the covariances are equal to the correlations, and we can write

$$R_{XY}(t, \tau) = \int_a^b h(t, \sigma)\,R_Y(\sigma, \tau)\,d\sigma.$$

This equation is called the **Wiener–Hopf equation**. An integral equation of this form is called a Fredholm equation; the theory on the existence of solutions of Fredholm integral equations is well known. In practice, solutions are usually numerical. The solution $h$ is sometimes called a **Wiener filter**.

**Example 1 (a noncausal Wiener filter).** Suppose $a = -\infty$ and $b = \infty$, and suppose that $X_t$ and $Y_t$ are individually and jointly WSS. Then the Wiener–Hopf equation becomes

$$R_{XY}(t - \tau) = \int_{-\infty}^{\infty} h(t, \sigma)\,R_Y(\sigma - \tau)\,d\sigma, \qquad -\infty < \tau < \infty.$$

Observe that the left-hand side depends on $t$ and $\tau$ only through $s = t - \tau$. Thus, if there is a solution, there must be a solution which is independent of $t$; this means that there is a time-invariant solution, which we will call $h_0$. With a change of variables we can write

$$R_{XY}(s) = \int_{-\infty}^{\infty} h_0(s - \nu)\,R_Y(\nu)\,d\nu,$$

that is, $R_{XY} = h_0 * R_Y$. How to solve for $h_0$? The easiest way is to use Fourier transforms:

$$S_{XY}(\omega) = H_0(\omega)\,S_Y(\omega) \quad\Longrightarrow\quad H_0(\omega) = \frac{S_{XY}(\omega)}{S_Y(\omega)}.$$

The filter in this case is called a **noncausal Wiener filter**.
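The frequency-domain solution $H_0(\omega) = S_{XY}(\omega)/S_Y(\omega)$ can be sketched numerically. Assume, hypothetically, a Lorentzian signal spectrum observed in orthogonal white noise, so that $S_{XY} = S_S$ and $S_Y = S_S + S_N$ (these modeling choices are illustrative assumptions): the gain is then near 1 where the SNR is large and near 0 in the spectral tails.

```python
import numpy as np

# Noncausal Wiener filter gain H(w) = Ss(w) / (Ss(w) + Sn(w)) for a
# hypothetical Lorentzian signal spectrum plus white noise of level N0/2.
A, alpha, N0 = 1.0, 1.0, 0.01
w = np.linspace(-100.0, 100.0, 20001)
Ss = 2.0 * A**2 * alpha / (alpha**2 + w**2)   # signal spectrum
Sn = np.full_like(w, N0 / 2.0)                # white-noise spectrum
H = Ss / (Ss + Sn)
print(H[len(w) // 2], H[0])   # gain near w = 0 vs far out in the tail
```

Plotting `H` against `w` shows the smooth roll-off from passing the signal band to suppressing noise-dominated frequencies.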
**Example 2.** Suppose $Y_t = S_t + N_t$, where $S_t$ is some signal random process of interest and $N_t$ is some noise process. Assume that $S_t$ and $N_t$ are independent and individually and jointly WSS; also assume that they are zero mean. Let $X_t = S_{t+\lambda}$. Given $\{Y_t\}$, we want to estimate $X_t$. If $\lambda = 0$ this is a filtering problem; if $\lambda > 0$ this is a prediction problem; if $\lambda < 0$ this is a smoothing problem. We find

$$S_Y(\omega) = S_S(\omega) + S_N(\omega) + 2\,\mathrm{Re}\,S_{SN}(\omega), \qquad S_{XY}(\omega) = e^{i\omega\lambda}\big(S_S(\omega) + S_{SN}(\omega)\big).$$

We obtain the transfer function

$$H_0(\omega) = \frac{S_{XY}(\omega)}{S_Y(\omega)} = \frac{e^{i\omega\lambda}\big(S_S(\omega) + S_{SN}(\omega)\big)}{S_S(\omega) + S_N(\omega) + 2\,\mathrm{Re}\,S_{SN}(\omega)}.$$

If signal and noise are orthogonal,

$$H_0(\omega) = e^{i\omega\lambda}\,\frac{S_S(\omega)}{S_S(\omega) + S_N(\omega)}.$$

Let us look at the amplitude-gain part:

$$\frac{S_S}{S_S + S_N} = \frac{S_S/S_N}{S_S/S_N + 1} \approx \begin{cases} 1 & S_S/S_N \gg 1, \\ 0 & S_S/S_N \ll 1. \end{cases}$$

It can be shown that the residual error for the noncausal Wiener filter is

$$\mathrm{MMSE} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(S_X(\omega) - \frac{|S_{XY}(\omega)|^2}{S_Y(\omega)}\right)d\omega.$$

This can be seen as follows:

$$E\big[(X_t - \hat X_t)^2\big] = E\big[(X_t - \hat X_t)X_t\big] - E\big[(X_t - \hat X_t)\hat X_t\big].$$

By orthogonality the last term is 0, which implies that

$$E\big[(X_t - \hat X_t)^2\big] = E[X_t^2] - E[\hat X_t X_t] = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(S_X(\omega) - \frac{|S_{XY}(\omega)|^2}{S_Y(\omega)}\right)d\omega.$$

The MMSE is sometimes written as

$$\mathrm{MMSE} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\big(1 - |\rho_{XY}(\omega)|^2\big)\,d\omega, \qquad |\rho_{XY}(\omega)|^2 = \frac{|S_{XY}(\omega)|^2}{S_X(\omega)S_Y(\omega)}.$$

**Example 3.** For the signal-plus-noise problem we have

$$\mathrm{MMSE} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{S_S(\omega)\,S_N(\omega)}{S_S(\omega) + S_N(\omega)}\,d\omega.$$

**Example 4.** Let us now do the signal-plus-noise problem for a particular signal source. Suppose

$$S_S(\omega) = \frac{2A^2\alpha}{\alpha^2 + \omega^2}, \qquad S_N(\omega) = \frac{N_0}{2} \ \text{(white noise)}.$$

Then

$$H_0(\omega) = \frac{S_S(\omega)}{S_S(\omega) + S_N(\omega)} = \frac{2A^2\alpha}{2A^2\alpha + \frac{N_0}{2}(\alpha^2 + \omega^2)} = \frac{4A^2\alpha/N_0}{\beta^2 + \omega^2}, \qquad \beta^2 = \alpha^2 + \frac{4A^2\alpha}{N_0},$$

whose inverse transform is the double-sided exponential

$$h_0(t) = \frac{2A^2\alpha}{N_0\,\beta}\,e^{-\beta|t|}.$$

This is not a causal filter. (Plot for various values of the parameters.)

### Causal Wiener filtering

The examples we have seen so far have produced noncausal filters, i.e., practically nonimplementable in many cases. We will see what can be done now to make causal filters. Let us take $a = -\infty$ and $b = t$, with $X_t$ and $Y_t$ jointly and individually WSS (filtering). Furthermore, assume that the filter is time-invariant and causal. Then the Wiener–Hopf equations can be written

$$R_{XY}(s) = \int_0^{\infty} h(\nu)\,R_Y(s - \nu)\,d\nu, \qquad s \ge 0.$$

The question is: how can this be solved? Because the lower limit of integration is 0 rather than $-\infty$, we cannot use conventional transform techniques. Here are some facts to help. Suppose $S_Y$ satisfies the condition

$$\int_{-\infty}^{\infty}\frac{|\log S_Y(\omega)|}{1 + \omega^2}\,d\omega < \infty.$$

This is known as the **Paley–Wiener condition**. Then it turns out that we can write

$$S_Y(\omega) = S_Y^+(\omega)\,S_Y^-(\omega),$$

where $S_Y^-(\omega) = \big(S_Y^+(\omega)\big)^*$, and $\mathcal{F}^{-1}[S_Y^+(\omega)]$ is zero for negative times (that is, it is causal) while $\mathcal{F}^{-1}[S_Y^-(\omega)]$ is zero for positive times (it is anticausal). Moreover, $\mathcal{F}^{-1}[1/S_Y^+(\omega)]$ is also causal, and $\mathcal{F}^{-1}[1/S_Y^-(\omega)]$ is also anticausal. The proof in general is rather difficult; we will skip it but give some examples. This factorization is known as the **spectral factorization**.

## ECE 6010 Lecture 10: Markov Processes

### Basic concepts

A Markov process $X_t$ is one such that

$$P\big(X_{t_{k+1}} = x_{k+1} \mid X_{t_k} = x_k,\ X_{t_{k-1}} = x_{k-1},\ \ldots,\ X_{t_1} = x_1\big) = P\big(X_{t_{k+1}} = x_{k+1} \mid X_{t_k} = x_k\big)$$

for a discrete random process, or

$$f\big(x_{t_{k+1}} \mid x_{t_k}, \ldots, x_{t_1}\big) = f\big(x_{t_{k+1}} \mid x_{t_k}\big)$$

for a continuous random process. The most recent observation determines the state of the process, and prior observations have no bearing on the outcome if the state is known.

**Example 1.** Let $X_i$ be i.i.d. and let $S_n = X_1 + \cdots + X_n$. Then

$$P(S_{n+1} = s_{n+1} \mid S_n = s_n, \ldots, S_1 = s_1) = P(X_{n+1} = s_{n+1} - s_n) = P(S_{n+1} = s_{n+1} \mid S_n = s_n).$$

**Example 2.** Let $N_t$ be a Poisson process:

$$P(N_{t_{k+1}} = n_{k+1} \mid N_{t_k} = n_k, \ldots, N_{t_1} = n_1) = P\big(n_{k+1} - n_k \text{ events in } (t_k, t_{k+1}]\big) = P(N_{t_{k+1}} = n_{k+1} \mid N_{t_k} = n_k).$$

Let $X_t$ be a Markov rp. The joint probability has the following factorization:

$$P(X_{t_3} = x_3,\ X_{t_2} = x_2,\ X_{t_1} = x_1) = P(X_{t_3} = x_3 \mid X_{t_2} = x_2)\,P(X_{t_2} = x_2 \mid X_{t_1} = x_1)\,P(X_{t_1} = x_1).$$

(Why?)

### Discrete-time Markov chains

**Definition 1.** An integer-valued Markov random process is called a **Markov chain**.

Let the time-index set be the set of integers. Let $p_j(0) = P(X_0 = j)$ be the initial probabilities; note that $\sum_j p_j(0) = 1$. We can write the factorization as

$$P(X_n = i_n, \ldots, X_0 = i_0) = P(X_n = i_n \mid X_{n-1} = i_{n-1}) \cdots P(X_1 = i_1 \mid X_0 = i_0)\,P(X_0 = i_0).$$

If the probability $P(X_{n+1} = j \mid X_n = i)$ does not change with $n$, then the rp $X_n$ is said to have **homogeneous** transition probabilities. We will assume that this is the case, and write

$$p_{ij} = P(X_{n+1} = j \mid X_n = i).$$

Note that $\sum_j P(X_{n+1} = j \mid X_n = i) = 1$; that is, $\sum_j p_{ij} = 1$. We can represent these transition probabilities in matrix form:

$$P = \begin{bmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ p_{10} & p_{11} & p_{12} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$

The rows of $P$ sum to 1. This is called a **stochastic matrix**.
Frequently, discrete-time Markov chains are modeled with state diagrams.

Example 3. Two light bulbs are held in reserve. After a day, the probability that we need a light bulb is p. Let Y_n be the number of new light bulbs at the end of day n. (Draw the diagram.)

Now let us look further ahead. Let p_ij(n) = P(X_{n+k} = j | X_k = i). If the rp is homogeneous, then p_ij(n) = P(X_n = j | X_0 = i). Let us develop a formula for the case that n = 2:

  P(X_2 = j, X_1 = l | X_0 = i) = P(X_2 = j | X_1 = l, X_0 = i) P(X_1 = l | X_0 = i) = p_lj(1) p_il(1) = p_il p_lj.

Now marginalize:

  P(X_2 = j | X_0 = i) = Σ_l P(X_2 = j, X_1 = l | X_0 = i) = Σ_l p_il p_lj.

Let P(2) be the matrix of two-step transition probabilities. Then we have P(2) = P(1)P(1) = P². In general, by induction, we have P(n) = Pⁿ. Let p(n) = [P(X_n = 0), P(X_n = 1), …] (or whatever the outcomes are). Then

  p_j(n) = P(X_n = j) = Σ_i P(X_n = j | X_{n−1} = i) P(X_{n−1} = i) = Σ_i p_ij p_i(n−1).

Stacking these up, we obtain the equation

  p(n) = p(n−1)P = p(0)Pⁿ.

If we run the Markov rp for a long time, what happens to the probabilities? That is, what is p_j(n) as n → ∞? Let us denote π = lim_{n→∞} p(n). If there is a limit, the probability vector π should satisfy

  π = πP, or Pᵀπᵀ = πᵀ.

This is an eigenvalue problem.

Continuous-Time Markov Processes

Let us still deal with discrete outcomes. If X(t) is homogeneous, then P(X(s+t) = j | X(s) = i) = P(X(t) = j | X(0) = i). Let p_ij(t) = P(X(t) = j | X(0) = i) and form a matrix P(t) = [p_ij(t)], with P(0) = I.

Example 4. Suppose X(t) is a Poisson counting process:

  p_ij(t) = P(j − i events in t seconds) = ((λt)^{j−i}/(j − i)!) e^{−λt}, j ≥ i.

Then

  P(t) = [e^{−λt}, λt e^{−λt}, ((λt)²/2!) e^{−λt}, …; 0, e^{−λt}, λt e^{−λt}, …; ⋮].

Let us now consider the question of how long the rp remains in a state. Let T_i be the time spent in a state i. The probability of spending more than t seconds in the state is P(T_i > t). Suppose that the process has been in state i already for s seconds. What is the probability that it remains for t more seconds?

  P(T_i > t + s | T_i > s) = P(T_i > t + s | X(a) = i, 0 ≤ a ≤ s).

But recall that X(t) is Markov:

  P(T_i > t + s | X(a) = i, 0 ≤ a ≤ s) = P(T_i > t + s | T_i > s) = P(T_i > t).

Such a process is said to be memoryless. Let us look at these computations again:

  P(T_i > t + s | T_i > s) = P(T_i > t + s, T_i > s)/P(T_i > s) = P(T_i > t + s)/P(T_i > s).

We have seen that this probability must be P(T_i > t), so that P(T_i > t + s) = P(T_i > t) P(T_i > s).
There is thus a sort of cancellation that takes place. The only distribution which has this property is the exponential:

  P(T_i > t) = e^{−ν_i t}.

Using this we have

  e^{−ν_i(t+s)}/e^{−ν_i s} = e^{−ν_i t}.

So the waiting time for a Poisson rp is exponential. (We have derived this another way in the homework.) This result has the following rather curious interpretation: the amount of additional time you have to wait does not depend on the amount of time you have already waited.

We can describe the operation of a continuous-time Markov chain as follows:
1. Enter a state i.
2. Wait a random amount of time T_i (this random variable is continuous).
3. Select a new state according to a discrete-time Markov chain with transition probabilities we will call q̃_ij.
4. Repeat.

In discrete time we have the probability update p(k+1) = p(k)P. We will develop an analogous result for continuous time. Instead of a set of coupled difference equations, we will get a set of coupled differential equations. Let δ be a small time increment:

  P(T_i > δ) = e^{−ν_i δ} = 1 − ν_i δ + o(δ).

The probability that we remain in the same state at time δ is

  p_ii(δ) = P(T_i > δ) = 1 − ν_i δ + o(δ), or 1 − p_ii(δ) = ν_i δ + o(δ).

Now consider the transition. When leaving state i, we move to state j with probability q̃_ij:

  p_ij(δ) = [1 − p_ii(δ)] q̃_ij + o(δ) = ν_i q̃_ij δ + o(δ)  ("leave state i, then go to j").

Let γ_ij = ν_i q̃_ij, so that p_ij(δ) = γ_ij δ + o(δ). We say that γ_ij is the rate at which X(t) enters state j from state i. Define γ_ii = −ν_i, so that

  1 − p_ii(δ) = −γ_ii δ + o(δ), or p_ii(δ) = 1 + γ_ii δ + o(δ).

Summarizing what we have so far:

  p_ii(δ) = 1 + γ_ii δ + o(δ), p_ij(δ) = γ_ij δ + o(δ), j ≠ i.

Divide by δ and take the limit:

  lim_{δ→0} (p_ii(δ) − 1)/δ = γ_ii = −ν_i, lim_{δ→0} p_ij(δ)/δ = γ_ij.

Now define p_j(t) = P(X(t) = j). Then we have

  p_j(t + δ) = P(X(t+δ) = j) = Σ_i P(X(t+δ) = j | X(t) = i) P(X(t) = i) = Σ_i p_ij(δ) p_i(t),

and

  p_j(t + δ) − p_j(t) = Σ_i p_ij(δ) p_i(t) − p_j(t) = Σ_{i≠j} p_ij(δ) p_i(t) + [p_jj(δ) − 1] p_j(t).

Divide both sides by δ and take the limit:

  p_j′(t) = Σ_{i≠j} γ_ij p_i(t) − ν_j p_j(t) = Σ_i γ_ij p_i(t).

Example 5. Let us model a two-state system having an idle state and a busy state. In the idle state, the system is waiting for work to arrive.
Assume that the waiting time is an exponential rv with mean 1/α. In the busy state, the machine works for a random amount of time with an exponential distribution having mean 1/β. We can think of the rate of motion from idle (state 0) to busy (state 1) as α, and the rate of motion from busy to idle as β. The underlying discrete-time Markov chain has

  Q̃ = [0 1; 1 0]

(when a transition occurs, there is no ambiguity about where it goes). We find

  γ_00 = −α, γ_01 = α q̃_01 = α, γ_10 = β q̃_10 = β, γ_11 = −β.

We obtain the coupled differential equations

  p_0′(t) = −α p_0(t) + β p_1(t)
  p_1′(t) = α p_0(t) − β p_1(t).

We also have the auxiliary equation p_0(t) + p_1(t) = 1, i.e., p(t)[1 1]ᵀ = 1. We can solve this as p(t) = p(0)e^{At}, where A = [−α α; β −β]. Somewhat more explicitly, using Laplace transforms,

  s p(s) − p(0) = p(s)A, so p(s) = p(0)(sI − A)⁻¹, and p_0(s) = [(s + β) p_0(0) + β p_1(0)]/[s(s + α + β)].

Now it is a matter of straightforward but careful computation to show that

  p_0(t) = β/(α+β) + [p_0(0) − β/(α+β)] e^{−(α+β)t}, p_1(t) = 1 − p_0(t).

In the limit,

  p_0(∞) = β/(α+β), p_1(∞) = α/(α+β).

What are the steady-state conditions in general? Setting p_j′(t) = 0,

  0 = Σ_i γ_ij p_i.

Since γ_jj = −ν_j, we can write ν_j p_j = Σ_{i≠j} γ_ij p_i, and since ν_j = Σ_{k≠j} γ_jk, we can write

  Σ_{k≠j} γ_jk p_j = Σ_{i≠j} γ_ij p_i

(the rate of flow out of state j balances the rate of flow in).

Classes of States

Definition 2. State j is accessible from state i if p_ij(n) > 0 for some n; more informally, a path from i to j exists in the state diagram. States i and j communicate if they are accessible from each other. Two states are said to be in the same class if they communicate with each other. A Markov chain with a single class is irreducible.

Example 6. (State diagram with 3 classes.)
Example 7. (One class; irreducible.)
Example 8. (State diagram; identify the classes.)
Example 9. (One class.)

Definition 3. A class is recurrent if the process returns to each of its states with probability 1. Let f_i = P(ever returning to state i). Then state i is recurrent if f_i = 1. If f_i < 1, then state i is said to be transient.

- If started in a transient state, the state does not recur an infinite number of times.
- If in a recurrent state, the state recurs an infinite number of times.

Let X_n denote the Markov chain with X_0 = i. Let I_n = 1 if X_n = i and I_n = 0 otherwise.
Then

  E[number of returns to state i] = E[Σ_{n=1}^∞ I_n | X_0 = i] = Σ_{n=1}^∞ p_ii(n).

We see that recurrent means that Σ_{n=1}^∞ p_ii(n) = ∞; transient means that Σ_{n=1}^∞ p_ii(n) < ∞.

Example 10. For the chain shown, p_00(n) = (1/2)ⁿ, so

  Σ_{n=1}^∞ p_00(n) = 1 < ∞,

and state 0 is transient. What if we start in state 1? The corresponding sum Σ_n p_11(n) diverges, so state 1 is recurrent.

Example 11. For the chain shown, Σ_n p_11(n) < ∞, so state 1 is transient.

Example 12. Random walk (step right with probability p, left with probability 1 − p). Start in state 0. We can return only if we make as many right-hand moves (with probability p) as left-hand moves (with probability 1 − p). The total number of moves, left and right, must be an even number; take the total number as 2n. There are C(2n, n) ways of making n right-hand moves:

  p_00(2n) = C(2n, n) pⁿ(1 − p)ⁿ.

Summing to see if transient:

  Σ_{n=1}^∞ p_00(2n) = Σ_{n=1}^∞ C(2n, n)[p(1 − p)]ⁿ.

How to sum this? We can get a good approximation using Stirling's formula, n! ≈ √(2πn) nⁿ e⁻ⁿ. Then

  C(2n, n) ≈ 4ⁿ/√(πn), and p_00(2n) ≈ [4p(1 − p)]ⁿ/√(πn).

Now sum:

  Σ_{n=1}^∞ p_00(2n) ≈ Σ_{n=1}^∞ [4p(1 − p)]ⁿ/√(πn).

Still a little hard. But take the particular case of p = 1/2. Then we get

  Σ_{n=1}^∞ 1/√(πn) = ∞,

so the walk is recurrent. If p ≠ 1/2, then 4p(1 − p) < 1 and the sum converges, so the walk is transient.

Observation: the states of an irreducible finite-state Markov chain are all recurrent.

Limiting probabilities. If all states are transient, then all the state probabilities approach 0 as n → ∞. If a MC has some transient classes and some recurrent classes, then eventually the process enters and remains in one of the recurrent classes. For limiting purposes we can focus on individual recurrent classes. Suppose a MC starts in a recurrent state i at time 0. Let T_i(1), T_i(2), … denote the times between returns to state i, where T_i(k) is the time that elapses between the (k−1)th and kth returns. The T_i(k) form an iid sequence. The proportion of time spent in state i after k returns is

  k / Σ_{j=1}^k T_i(j).

In the limit,

  π_i = proportion of time spent in state i = 1/E[T_i]

by the law of large numbers, where E[T_i] is the mean recurrence time. If E[T_i] < ∞, we say that state i is positive recurrent (π_i > 0). If E[T_i] = ∞, we say that state i is null recurrent (π_i = 0).
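The recurrence criterion for the random walk can be checked numerically by accumulating partial sums of p₀₀(2n) = C(2n, n)pⁿ(1 − p)ⁿ, computed in log space to avoid overflow. The values p = 1/2 and p = 0.7 and the cutoffs below are arbitrary choices for illustration:

```python
import math

# Partial sums of p00(2n) = C(2n, n) p^n (1-p)^n for the simple random walk.
def p00_2n(n, p):
    # log C(2n, n) + n log(p(1-p)), evaluated stably via lgamma
    log_term = (math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)
                + n * math.log(p * (1 - p)))
    return math.exp(log_term)

def return_sum(p, N):
    return sum(p00_2n(n, p) for n in range(1, N + 1))

s_half = [return_sum(0.5, N) for N in (100, 400, 1600)]    # keeps growing
s_biased = [return_sum(0.7, N) for N in (100, 400, 1600)]  # saturates
print(s_half, s_biased)
```

At p = 1/2 the partial sums keep growing (roughly like √N, matching the 1/√(πn) terms), while for p = 0.7 they saturate almost immediately: recurrence versus transience.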
Example 13. The MC shown returns to state 0 in two steps with probability 1/2 and in four steps with probability 1/2. The mean recurrence time is

  E[T_0] = (1/2)(2) + (1/2)(4) = 3.

State 0 is positive recurrent; π_0 = 1/3.

Example 14. In the random walk with p = 1/2, the process is recurrent. However, it can be shown that the mean recurrence time is ∞ in this case. This means that the process is null recurrent.

We can find the π_j using π = πP, and solve this if the number of states is finite.

Summary. The proportion of time spent in state j is π_j.
- State j transient: π_j = 0.
- State j null recurrent: π_j = 0.
- State j positive recurrent and aperiodic: lim_{n→∞} p_jj(n) = π_j > 0.
- State j positive recurrent and periodic with period d: lim_{n→∞} p_jj(nd) = d π_j.

ECE 6010 Lecture 1: Introduction; Review of Random Variables

Readings from G&S: Chapter 1; Sections 2.1, 2.3, 2.4, 3.1, 3.2, 3.5, 4.1, 4.2, 4.4, 4.5.

Why study probability?
1. Communication systems: noise, information.
2. Control systems: noise in observations, noise in interference.
3. Computer systems: random loads, networks, random packet arrival times.

Probability can become a powerful engineering tool. One way of viewing it is as quantified common sense. Great success will come to those whose tool is sharp.

Set theory. Probability is intrinsically tied to set theory, so we will review some set-theory concepts. We will use ᶜ to denote complementation of a set with respect to its universe. ∪: union; A ∪ B is the set of elements that are in A or B. ∩: intersection; A ∩ B is the set of elements that are in A and B (we will also denote this as AB). a ∈ A: a is an element of the set A. A ⊂ B: A is a subset of B. A = B ⟺ A ⊂ B and B ⊂ A. Note that A ∪ Aᶜ = Ω, where Ω is the universe.

Notation for some special sets: ℝ, the set of all real numbers; ℤ, the set of all integers; ℤ⁺, the set of all positive integers; ℕ, the set of all natural numbers {0, 1, 2, …}; ℝⁿ, the set of all n-tuples of real numbers; ℂ, the set of complex numbers.

Definition 1. A field (or algebra) of sets is a collection of sets that is closed under complementation and finite union.
That is, if F is a field and A ∈ F, then Aᶜ must also be in F (closed under complementation), and if A and B are in F (which we will write as A, B ∈ F), then A ∪ B ∈ F. Note that the properties of a field imply that F is also closed under finite intersection (De Morgan's law: AB = (Aᶜ ∪ Bᶜ)ᶜ).

Definition 2. A σ-field (or σ-algebra) of sets is a field that is also closed under countable unions and intersections.

What do we mean by countable? A set with a finite number of elements is countable. A set whose elements can be matched one-for-one with ℤ⁺ is countable, even if it has an infinite number of elements. (Are there noncountable sets?)

Note: for any collection F of sets there is a σ-field containing F, denoted by σ(F). This is called the σ-field generated by F.

Definition of probability

We now formally define what we mean by a probability space. A probability space has three components. The first is the sample space, which is the collection of all possible outcomes of some experiment. The sample space is frequently denoted by Ω.

Example 1. Suppose the experiment involves throwing a die: Ω = {1, 2, 3, 4, 5, 6}.

We deal with subsets of Ω. For example, we might have an event which is all even throws of the die, or all outcomes in {3, 4}. We denote the collection of subsets of interest as F. The elements of F, that is, the subsets of Ω, are called events; F is called the event class. We will restrict F to be a σ-field.

Example 2. Let Ω = {1, 2, 3, 4, 5, 6} and let F contain {2, 4, 6} and {1, 3, 5}. What do we need to finish this off (so that F is a σ-field)?

Example 3. Ω = {1, 2, 3}. We could take F as the set of all subsets of Ω:

  F = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.

This is frequently denoted as F = 2^Ω and is called the power set of Ω.

Example 4. Ω = ℝ. F is restricted to something smaller than all subsets of Ω so that probabilities can be applied consistently: F could be the smallest σ-field which contains all intervals of ℝ. This is called the Borel field B.

The tuple (Ω, F) is called a pre-probability space, because we haven't assigned probabilities yet. This brings us to the third element of a probability space.
Given a pre-probability space (Ω, F), a probability distribution (or a measure) on (Ω, F) is a mapping P from F to ℝ, which we will write P: F → ℝ, with the properties:

1. P(Ω) = 1 (this is a normalization that is always applied for probabilities, but there are other measures which don't use this).
2. P(A) ≥ 0 for all A ∈ F (measures are nonnegative).
3. If A₁, A₂, … ∈ F are such that A_i A_j = ∅ for all i ≠ j (that is, the sets are disjoint, or mutually exclusive), then

  P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

This is called the σ-additive (countably additive) property.

These three properties are called the axioms of probability. The triple (Ω, F, P) is called a probability space: Ω tells what individual outcomes are possible, F tells what sets of outcomes (events) are possible, and P tells what the probabilities of these events are.

Some properties of probabilities which follow from the axioms:

1. P(Aᶜ) = 1 − P(A).
2. P(∅) = 0.
3. A ⊂ B ⇒ P(A) ≤ P(B).
4. P(A ∪ B) = P(A) + P(B) − P(AB).
5. If A₁, A₂, … ∈ F, then P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).

There is also the continuity of probability property. Suppose A₁ ⊂ A₂ ⊂ A₃ ⊂ ⋯ (this is called an increasing sequence). Define

  lim_{n→∞} A_n = ∪_{i=1}^∞ A_i ≡ A.

We write this as A_n ↑ A (A_n converges up to A). Similarly, if A₁ ⊃ A₂ ⊃ A₃ ⊃ ⋯ (a decreasing sequence), define

  lim_{n→∞} A_n = ∩_{i=1}^∞ A_i ≡ A, written A_n ↓ A.

If A_n ↑ A, then P(A_n) ↑ P(A). If A_n ↓ A, then P(A_n) ↓ P(A).

Example 5. Let (Ω, F) = (ℝ, B).
1. Take A_n = (−∞, 1/n]. Then A_n ↓ (−∞, 0], so

  P((−∞, 0]) = lim_{n→∞} P((−∞, 1/n]).

2. Take A_n = (−∞, −1/n]. Then A_n ↑ (−∞, 0), so

  P((−∞, 0)) = lim_{n→∞} P((−∞, −1/n]).

We will introduce more properties later.

Some examples of probability spaces:
1. Ω = {ω₁, ω₂, …, ω_n}, a discrete set of outcomes; F = 2^Ω. Let p₁, p₂, …, p_n be a set of nonnegative numbers satisfying Σ_{i=1}^n p_i = 1. Define the function P: F → ℝ by

  P(A) = Σ_{i: ω_i ∈ A} p_i.

Then (Ω, F, P) is a probability space.
2. Uniform distribution. Let Ω = [0, 1] and F = B([0, 1]) (the smallest σ-field containing all intervals in [0, 1]). We can take, without proving that this actually works (but it does),

  P([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1,

and P(union of disjoint intervals) = sum of the probabilities of the individual intervals.
Conditional probability and independence

Conditional probability is perhaps the most important probability concept from an engineering point of view. It allows us to describe mathematically how our information changes when we are given a measurement. We define conditional probability as follows. Suppose (Ω, F, P) is a probability space and A, B ∈ F with P(B) > 0. Define the conditional probability of A given B as

  P(A|B) = P(AB)/P(B).

Essentially, what we are saying is that the sample space is restricted from Ω down to B; dividing by P(B) provides the correct normalization for this probability measure. Some properties of conditional probability (consequences of the axioms of probability) are as follows:

1. P(A|B) ≥ 0.
2. P(Ω|B) = 1.
3. For A₁, A₂, … ∈ F with A_i A_j = ∅ for i ≠ j,

  P(∪_{i=1}^∞ A_i | B) = Σ_{i=1}^∞ P(A_i | B).

4. AB = ∅ ⇒ P(A|B) = 0.
5. P(B|B) = 1.
6. A ⊂ B ⇒ P(A|B) ≥ P(A).
7. B ⊂ A ⇒ P(A|B) = 1.

Definition 3. A₁, A₂, …, A_n ∈ F form a partition of Ω if A_i A_j = ∅ for i ≠ j, ∪_{i=1}^n A_i = Ω, and P(A_i) > 0.

Example 6. Let Ω = {1, 2, 3, 4, 5, 6} and A₁ = {1}, A₂ = {2, 5, 6}, A₃ = {3, 4}.

The Law of Total Probability: if A₁, …, A_n is a partition of Ω and A ∈ F, then

  P(A) = Σ_{i=1}^n P(A|A_i) P(A_i).

(Draw picture.) Bayes' formula is a simple formula for turning around the conditioning. Because conditioning is so important in engineering, Bayes' formula turns out to be a tremendously important tool, even though it is very simple. We will see applications of this throughout the semester. Suppose A, B ∈ F, P(A) ≠ 0 and P(B) ≠ 0. Then

  P(A|B) = P(B|A) P(A)/P(B). (Why?)

Definition 4. The events A and B are independent if P(AB) = P(A)P(B).

Is independent the same as disjoint? Note: for P(B) > 0, if A and B are independent, then P(A|B) = P(A). Since they are independent, B can provide no information about A, so the probability remains unchanged. If P(B) = 0, then B is independent of A for any other event A ∈ F. (Why?)

Definition 5. A₁, …, A_n ∈ F are independent if for each k ∈ {2, …, n} and each subset {i₁, …, i_k} of {1, …, n},

  P(∩_{j=1}^k A_{i_j}) = Π_{j=1}^k P(A_{i_j}).

Example 7. Take n = 3. Independent if P(A₁A₂) = P(A₁)P(A₂), P(A₁A₃) = P(A₁)P(A₃), P(A₂A₃) = P(A₂)P(A₃), and P(A₁A₂A₃) = P(A₁)P(A₂)P(A₃).
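A small numeric illustration of total probability and Bayes' formula, using made-up numbers for a binary diagnostic test (the prior and likelihoods below are assumptions for illustration only):

```python
# Bayes' formula on an assumed binary test.
p_disease = 0.01              # P(A): prior
p_pos_given_disease = 0.95    # P(B|A)
p_pos_given_healthy = 0.05    # P(B|A^c)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes: P(A|B) = P(B|A)P(A)/P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_pos, p_disease_given_pos)
```

Even with an accurate test, the posterior stays small because the prior is small; this is the sense in which conditioning "turns around" the given information.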
The next idea is important in a lot of practical problems of engineering interest.

Definition 6. A₁ and A₂ are conditionally independent given B ∈ F if P(A₁A₂|B) = P(A₁|B)P(A₂|B). (Draw a picture to illustrate the idea.)

Random variables

Up to this point the outcomes in Ω could be anything; they could be elephants, computers, or mitochondria, since Ω is simply expressed in terms of sets. But we frequently deal with numbers and want to describe events associated with sets of numbers. This leads to the idea of a random variable.

Definition 7. Given a probability space (Ω, F, P), a random variable is a function X mapping Ω to ℝ, that is, X: Ω → ℝ, such that for each a ∈ ℝ,

  {ω ∈ Ω: X(ω) ≤ a} ∈ F.

A function X: Ω → ℝ such that {ω ∈ Ω: X(ω) ≤ a} ∈ F (that is, such that the events involved are in F) is said to be measurable with respect to F. That is, F is divided into sufficiently small pieces that the events in it can describe all of the sets associated with X.

Example 8. Let Ω = {1, 2, 3, 4, 5, 6}, F = {∅, {1, 2, 3}, {4, 5, 6}, Ω}. Define

  X(ω) = 0 if ω is odd, 1 otherwise.

Then for X to be a random variable, we must have {ω ∈ Ω: X(ω) ≤ a} ∈ F for every a ∈ ℝ.
space we use the measure PXB PX e B So we get the probability space R B PX As a matter of practicality if the sample space is R with the Borel eld must mappings to R will be random variables To summarize 22 f P Xgt R B PX where PXB Pw 6 9W0 6 Bl forB e B Distribution functions The cumulative distribution function cdf of an rv X is de ned for each a e R as FX11 PX S a Pw 6 9W0 5 al PXOO l1 Properties of cdf ECE 6010 Lecture 1 7 Introduction Review of Random Variables 7 1 FX is nondecreasing lfa lt b then FXa f FXb 2 llmaquXQl l 3 lima30 FXa z 0 4 FX is rightcontinuous limba FXb z FXa Draw typical picture These four properties completely characterize the family of cdfs on the real line Any function which satis es these has a corresponding probability distribution 5 Forb gt a Pa ltX Sb FXb FXa 6 PX a0 FXa0 lim a0 0 agao Fx 1 Thus if FX is continuous at 110 PX From these properties we can assign probabilities to all intervals from knowledge of the cdf Thus we can extend this to all Borel sets Thus F X determines a unique probability distribution on R B so F X and PX are uniquely related Pure types of rvs 1 Discrete rvs i an rv whose possible values can be enumerated 2 Continuous rvs i an rv whose distribution function can be written as the regular integral of another function 3 Singular but not discrete 7 Any other rv Discrete rvs A random variable that can take on at most a countable number of possible values is said to be a discrete rv XQ gt x1x2 De nition 9 For a discrete rv X we de ne the probability mass function pmf or discrete density function by PXlt11gtPltX11gt 11 ER where pXa z 0 ifa 74 x for any rv outcome xi 1 Properties of pmfs l Nonnegativity 20 aex1x2 0 else PXa ECE 6010 Lecture 1 7 Introduction Review of Random Variables 8 2 Total probability 0 Z pXltxigt 1 3921 These two properties completely characterize the class of all pdfs on a given set x1 x2 3 Relation to cdf PXa FX11 11m FXb b Hl39 To the pdf and the cdf contain the same information Note that 
for a discrete rv the cdf is piecewise constant Draw picture 4 FXW z Em Mia PXltxi Example 9 Bernoulli with parameter n 0 f n f l X G 0 1 l n a z 0 pXa z n a l 0 otherwise 0 lt FXa l n 031131 2 l The Bernoulli is often used to model bits bit errors random coin ips etc 1 Example 10 Binomial n n 0 f n f l n e Z Xe01n pm n 1anal nquot a a e01n 0 otherwise The binomial n n can be viewed as the sum ofn repeated Bernoulli trials We can ask such questions as what is the probability of k bits out of n being ipped if the probability of an individual bit being ipped is 71 How do we show the total probability property Use the binomial theorem Plot the probability function 1 Example 11 Poisson A X 2 gt N Ade k a e N a z a pXlt gt 0 otherwise The Poisson distribution is often used to model the number of rare events occurring in a length of time F or example what is the number of photon emmissions from a substance over a period of time What is the number of cars passing on a road Later in the course we will clarify the assumptions and derive this expression How do we nd 220 PX k 1 ECE 6010 Lecture 1 7 Introduction Review of Random Variables 9 Continuous 1 vs A rv X is said to be continuous is there is a function f X R gt R such that a mm fXltxgt dx 30 for all a e R In this case F Xa is an absolutely continuous function1 Properties 1 fXx Z 0 2 30 fXxdx 1 These two properties completely characterize the f Xs fX called the probability density function pdf of X 3 PX e B B fXxdx PXB B e B 4 PX a z 0foralla e R 5 PX G x0x0 Ax fx0Ax PX 6 dx PXdx z dP z fxdx Example 12 Uniform on at 8 with 8 gt at ocltxlt8oroc x 8 fXx 5 H otherwise Plot pdf and cdf Uses random number Phase distribution 1 Example 13 ExponentialA A gt 0 Ae x z 0 fXltxgt 0 otherwise Plot pdf and cdf Uses Waiting time We ll see later what we mean by this 1 Example 14 Gaussian normal N01 02 l fXx explt ltx mzzaZ V2710 Plot Uses Noise Sums of random variables This is the most important distribution we will deal with The cdf 
Let Z N0 1 De ne l ltIgtx F206 30 m e zz2 dz We also use 1 3 zZ2 Qxl ltIgtxPZgtxi 6 dz xZTL x 1 1A anction F is said to be absolutely continuous if for every 6 gt 0 there exists a 3 such that for each nite collection ofnonoverlapping intervals 11 b1 i 1 k 2211 Fb1 7 F011 lt 6 if ELI b 7a lt 3 1 p 433 Being absolutely continuous is a stronger property than being continuous ECE 6010 Lecture 1 7 Introduction Review of Random Variables 10 Properties of Gaussian Random Variables LetX N1 02 i IfY 04X 3 then Y NocJ 3 04202 X MW Nlt0 1 FXltagt ltIgtltlta We ltIgt x 1 ltIgtx 59 5 A Gaussian rv is completely characterized by its rst two moments ie the mean and variance We will also see and make use of many other properties of Gaussian random processes such as the fact that an uncorrelated Gaussian rp is also independent References l P Billingsley Probability andMeasure New York Wiley 1986 ECE 6010 Lecture 5 Sequences and Limit Theorems Convergent sequences of real numbers and functions De nition 1 Let x1 x2 be a sequence of real numbers This sequence converges to a pointx e R if for every 6 gt 0 there is an N e Z such that ixquot x lt e for all n 2N We write xquot gt xorlimnooxn x 1 For real numbers which are complete a necessary and su lcient condition x fle converges gt lim sup ixquot xquot 0 quot90 mgtn The latter condition says that xn is a Cauchy sequence De nition 2 Suppose f1 f2 H is a sequence of mcl ians 2 gt R This sequence converges pointwise to f 2 gt R if fx gt fx for everyx e 52 That is for every x e S ande gt 0 there is anN e Z suchthat fnx fx lt e for alln z N 1 It may be necessary to choose a different N for each x De nition3 We say that fquot converges uniformly to f if for each a gt 0 there is an N e Z suchthat lfnx fx lt e for alln z Nandfor all x e Q 1 Modes of convergence of sequences of rvs Suppose X1 X2 is a sequence of random variables de ned on Q 7 P How can we de ne a limit of this sequence As it turns out there are several different and 
inequivalent ways of de ning convergence Almost sure convergence This is a very strong form of convergence and usually quite dif cult to prove De nition 4 A sequence of rvs Xn 21 converges almost surely as to the rv X if P620 l where 520 w E Q Xnw gt Xw This is also called convergence with probability 1 1 One tool for showing as convergence is the following fact Xquot gt X as if and only if P lim sup Xn X 0 l quot mmw39l Example 1 Let Q 01 7 B0 1 Let Xnw rte quotw w e 01 andn e Z Note that Xnw gt 0 for all u e 0 l Xn0 gt n diverges So if P0 0 then Xquot gt 0 as But if P0 gt 0 then Xquot doesn t converge in the almost sure sense 1 ECE 6010 Lecture 5 7 Sequences and Limit Theorems 2 Mean square convergence This is a strong mode of convergence which is usually easier to show than as It is widely used in engineering De nition 5 The sequence X n 21 converges to the rv X in the meansquare sense if lim EXn X2 0 n gtOQ 1 We write Xquot gt X ms or Xquot gt X qm quadratic mode There is a Cauchy criterion for ms convergence If EX lt 00 for all n e Z then X n converges in meansquare if and only if lim sup EX Xn2 0 quot mmgtn Example 2 LetQ 0 l f B0 l and P is uniform Pa 2 b 11 Let n we01n3n EZ X n w 0 otherwise Then 1 1 EX3 n2P0 1n3 02Pln3 1 712 7 gt 0 n n So Xquot gt 0 ms What about as convergence in this case 1 Here is an interesting fact len gt X ms and Xquot gt Y as thenX Y as Convergence in Probability De nition 6 The sequence Xn 21 converges toX in probability ip if POX X gt e gt 0 asn gt ooforeverye gt 0 Equivalently we say that POX X s e gt 1 1 Example 3 LetQ 0 l f B0 l and P is uniform Let Xquot n w e 0ln 0 otherw1se Note Xquot gt 0 as butXn does not converge in ms PXn 0 gt e P0 ln ln gt 0 so Xquot gt 0 ip 1 Convergence in Distribution De nition 7 The sequence X quot11 converges in distribution or in law to the random variable X if FXK x gt FXx at all continuous points of FX 1 Example 4 Q 0 l f B0 l P is uniform Let lwzln X quot 0 wltln ECE 6010 Lecture 5 7 Sequences 
and Limit Theorems 3 Then FXnxgt1ngtuxgt11ngtux 1 Then X n gt X where F Xx ux 1 Draw the distributions 1 Note from this example that the X n values don t really approach any value 7 the values are still 1 and 0 This is in distinction to the rst three modes of convergence in which Xn X gt 0 in some sense By the de nition of this mode of convergence we don t have to worry about the points of discontinuity of F X Example 5 Let Xquot ln for all w e Q So it doesn t matter what the underlying P is The pointwise convergence is hm FXKOC l x gt 0 nota cd f n gtOQ 0 x 3 0 Why isn t this a cd f Take 1 x gt 0 F WC 0 x lt0 This is a cd f but different from lim F X However the difference is at a point of disconti nuity Hence X n gt 0 in distribution 1 Why and Which We have de ned several differentmodes of convergence Why so many The basic answer is that they are inequivalenti one does not meet all the analytical needs Some are stronger than others Xquot gt X ms gt Xquot gt X ip Xquot gt X as gt Xquot gt X ip Xquot gt X ip gt Xquot gt X in distribution So convergence in distribution is weaker than ip ms or as In general none of the implications can be reversed And ms and as do not imply each other Venn diagram 7 dist on the outside then ip with ms and as overlapping inside Proof oan gt X ms gt Xquot gt X ip By Chebyshev s inequality for every 6 gt 0 PX X gt e g EX X262 SoifEX X2 gt0thenPX Xgte gtOforalle 1 Proof oan gt X as gt Xquot gt X ip Chooses gt 0 Write Bquot w e S supmgtn Xm w Xw gt a This isadecreasing family of sets Suppose m lt m and w EABnZ Then by the de nition to 311 so Em C Bm Note that 2an limnam Bquot Now consider the set Z w e 00 Xw 71gt Xw For a given a we see that 2an is a subset of Z Since Xquot gt X as we have PXn 79 X 0 Thus Plim Bquot 3 PXn 79 X 0 It Now notice that w Xn X gt e e 3 since we are looking at only one point and not supman So PX X gt e E PB which we just showed gt 0 11 ECE 6010 Lecture 5 7 Sequences and Limit Theorems 4 Proof oan gt X ip gt 
Xquot gt X in distribution Suppose Xquot gt X ip Choose a gt 0 and let x be a continuity point of FX Then FXx ePX x ePX x EXn3xPng EXngtx J FXnx PX g x eX g xPX gt x eX g x 8 Solving for the bracketed term in the second and substituting it into the rst we obtain FXx e FXnxPX Ex eX gtx PX gtx eX 3x 3 FXnx PX E x eX gt x Observethat XEx E Xn gt35 C an Xi gt5 for example let Xquot x 81 andX x e 82 with 81 gt 0 and 82 z 0 Then Xn X le8182l gt e sothatPX Ex Xn gt x s PXn X gt 5 Thus FXx e g FXnx PX x gt 6 Similarly FXKOC S FXOC E 13an Xl gt 6 Since we have convergence in probability limpqu PX X gt e 0 so FXx e 3 lim FXnx quot00 Similarly also FXx e 2 lim FXnx n gtOQ Combining these FXx e 3 lim FXnx E FXx e n gtOQ Since x is a continuity point of F X and e is chosen arbitrarily we can write lim FXx e FXx 3 lim FXnx E FXx lim FXx 6 E gt0 71 00 E gt0 So F X x gt F Xx Convergence in distribution 1 Some examples of invalid implications To see which modes are stronger than others we can consider some counterexamples Example 6 LetXn gt X ip Can we say that Xquot gt X ms Let Q 7 P 0 l B0 luniform Let n w e 0 ln Xn 0 otherw1se We ve shown thatXn gt 0 ip but EX gt 00 so Xquot 79 0 ms Since Xquot gt 0 as we also see that as ms 1 Example 7 Does ip imply as De ne a sequence of rvs as follows on Q 0 l X1w l ForX2X3 divide into two parts 0 12 12 1 withX2w l on the rst half and X3w l on the second half For X4 X5 X5 X7 split into fourths with X4w l on the rst fourth etc Poxquot 01gt e PXn 1 which decreases at a rate approximately 1 log n as n gt 00 So Xquot gt 0 ip ECE 6010 Lecture 5 7 Sequences and Limit Theorems 5 However for as convergence we see that X n alternates non uniformly between 0 and 1 So Xquot 71gt 0 as Note that this example also converges in ms because the 2ndmoment is PXn l k l log2n gt 0 1 Example 8 What about convergence in distribution and convergence ip LetX N0 1 anan lquotX Note thatXn N0 1 So FXK Fx foralln But Plt1Xn X gt E OPGX gt 62 n odd 11 even So it does not 
gt 0 for all 639 it alternates All the other modes of convergence depend on joint distributions but convergence in distribution depends on marginals which don t tell us the whole picture 1 Some other relationships 1 len gt X ip then there is a subsequence Xnkhfil such that limk00 Xnk X as 2 len gt X and there is arv Y with nite second moment such that Wquot E Y as for every n e Z thean gt X ms 3 len gt C in distribution thean gt C ip Limit Theorems Laws of Large Numbers Suppose X1 X2 is a sequence of rvs We are often interested in sums 2le X as n becomes large What can we say about such sums Suppose all X have the same means 11 E X i u and are uncorrelated We would expect the average Z 2le X to approach 11 in some way as n gt 00 lfvarx lt 00 consider 1 n 7 E X u n 11 Let us look at ms convergence 1 quot 2 1 2 E 222n u E 2200 10 11 1 1 n7EZltX1 m l 1 772EZZltX1 ugtltXj w 1 J 1 1 quot n7 ZZcovltXiXJ n7 ZcovXXj i j i1 1 n n7 E varX i1 ECE 6010 Lecture 5 7 Sequences and Limit Theorems 6 summarizmg If E Xi M and Xi are mutually uncorrelated and have nite variance 2121 V3139Xi gt 0 so that 1 quot 1 quot 7 ZXi gt pms gt 7 ZXi gt u1p n i1 n i1 This is an example of a weak law of large numbers De nition 8 Suppose Xii 1 is a sequence of rvs and blL9 is a sequence of reals diverging to 00 Then X ho satis es a weak law of large numbers W39LLN if there is another sequence 11 i 1 of real numbers such that 1 quot a ZXi 11 gt 01p i1 In the example we just gave 1 n and 11 11 De nition 9 A strong law of large numbers is the same as the preceding de nition except that convergence is almost sure as 1 Kolmogorov s Strong Law De nition 10 An in nite sequence of rvs is independent if every nite subcollection of the rvs is independent 1 Theorem 1 Kalmagarov s Strong Law Suppose X n 9 is a sequence of independent rvs with nite quot1 meansfar each 139 I n varX Z T lt 00 izl 1 then 1 n FZXi an gt 0as quot i1 where Zquot i1 Mi an 71quot Example 91Hquot n and 11quot M then Kolmogorov s law 
implies n n varX l 2711 lt00gtZX gtuas 11 11 Note that in the case that all the variances are bounded eg varX lt 02 lt 00 for alli then so so varX 2 1 Z 7i s a Z 72 lt 00 11 11 So if the variances grow sublinearly the theorem can apply We can get an even stronger conclusion ECE 6010 Lecture 5 7 Sequences and Limit Theorems 7 Theorem 2 Kinchine s Strong Law of Large Numbers Suppose Xii 1 is an i id sequence i e a sequence ofiid rvs with nite mean lElXillll1 l lt 00 Then the sample mean converges almost surely to the ensemble mean 1 n 7 ZXi gt p as n 11 Proving these types of theorems The proofs follow from more general limit theorems De nition 11 Let Anz1 be a sequence of events The limit superior lim sup of An is lim sup Aquot 021 Uiin Ak n This is the set of all points that are in An in nitely often 1 So u e lim supquot An gt w is in in nitely many of the sets Aquot It keeps coming back Another notation is lim supquot Aquot Aquot io in nitely often We observe that ifAn T A or Aquot l A then Aquot io A Lemma 1 The Borel Cantelli lemma This is frequently a good problem for math quali ers I PA lt 00 then PAn io 0 Thatis PAn gt 0 2 ConverselyIfAz1 are independenteventsandZ iquot1 PA octhenPA io 1 Proof 1 1310 Ak C UiinAk for alln So 2 A io 00 PltAn 10 s man1k ZPltAkgt gt 0 kzn asn gt 00 iii PAk lt 00 So PAn io 0 ifziil PAk lt 00 2 Using DeMorgan s law An LOlc U211 013quot A Pickn and N with n lt N Consider N PmlfinAi H PAi by independence N N H1 PAk g Te PM since 1 x g equotC kzn kzn N exp Emmi kzn If 221 PAk diverges then 22quot PAk diverges too and thus N 133nm exp Zia1w gt o kzn ECE 6010 Lecture 5 7 Sequences and Limit Theorems 8 So ngnumiin 0 for all n ie Pmiin i 0 for all n Now lim supquot Aquot is just the union of all of those intersections so 00 PltU211 mi 2 2 Z szin i 0 quot1 so thatPA io l Kolmogorov s Inequality Suppose X1X2 i i i are independent with zero means and nite variances De ne Squot to be the running sum n Squot 2X kzl Then for each at gt 0 1 P 
This is a lot like the Chebyshev inequality, but instead of looking at the variance of each of the partial sums, we simply look at the variance of the last one.

Central Limit Theorems

Theorem 3 (Central Limit Theorem). Suppose $\{X_n\}$ is a sequence of i.i.d. random variables with mean $\mu < \infty$ and variance $\sigma^2 < \infty$. Then

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu) \to X \text{ in distribution},$$

where $X \sim \mathcal{N}(0, \sigma^2)$. That is,

$$P\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu) \le x\Big) \to \int_{-\infty}^x \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^2/2\sigma^2}\,dt.$$

The main point: sums of i.i.d. random variables tend to look Gaussian. To work our way up to this, here are a couple of lemmas.

Lemma 2. Suppose $\{X_n\}$ is a sequence of rvs with characteristic functions $\phi_{X_n}$. If there exists a rv $X$ with chf $\phi_X$ such that

$$\lim_{n\to\infty} \phi_{X_n}(u) = \phi_X(u)$$

for all $u \in \mathbb{R}$, then $X_n \to X$ in distribution.

Lemma 3. Suppose $X$ is a rv with $E[X^2] < \infty$. Then $\phi_X$ has the expansion

$$\phi_X(u) = 1 + iuE[X] - \frac{u^2}{2}E[X^2] + u^2\varepsilon(u),$$

where $\lim_{u\to 0} \varepsilon(u) = 0$.

Proof of the Central Limit Theorem. For convenience, w.l.o.g. take $\mu = 0$. Define

$$\phi_n(u) = E\Big[\exp\Big(iu\frac{S_n}{\sqrt n}\Big)\Big], \qquad S_n = \sum_{i=1}^n X_i.$$

Then

$$\phi_n(u) = E\Big[\exp\Big(i\frac{u}{\sqrt n}\sum_{i=1}^n X_i\Big)\Big] = \prod_{i=1}^n E\Big[\exp\Big(i\frac{u}{\sqrt n}X_i\Big)\Big] = \Big(1 + i\frac{u}{\sqrt n}E[X] - \frac{u^2}{2n}E[X^2] + \frac{u^2}{n}\varepsilon(u/\sqrt n)\Big)^n = \Big(1 - \frac{u^2}{2n}\big(\sigma^2 - 2\varepsilon(u/\sqrt n)\big)\Big)^n,$$

using $E[X] = 0$ and $E[X^2] = \sigma^2$. From elementary calculus we recall that $(1 + a_n/n)^n \to e^{\lim_n a_n}$. Thus

$$\phi_n(u) \to \exp\Big(-\lim_{n\to\infty} \frac{u^2}{2}\big(\sigma^2 - 2\varepsilon(u/\sqrt n)\big)\Big) = \exp(-\sigma^2 u^2/2).$$

This is the form of a characteristic function of a Gaussian with zero mean. $\square$

Summarizing: if $X_k$ has zero mean and variance 1, then

$$\frac{1}{n}\sum_{k=1}^n X_k \to 0 \text{ a.s.}, \qquad \frac{1}{\sqrt n}\sum_{k=1}^n X_k \to \mathcal{N}(0, 1) \text{ in distribution}.$$

ECE 6010 Lecture 4: Change of Variables

Reading from G&S: Sections 4.7, 4.8, 4.9, 4.10, 4.11.

Changing variables: one dimension, a simply invertible function. Let $Y = g(X)$, where $X$ is a continuous rv and $g$ is a one-to-one, onto, measurable function. Then (for increasing $g$)

$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y)),$$

so we can determine the distribution of $Y$. Let us now take a different point of view that will allow us to generalize to higher dimensions and develop and understand a commonly used formula. Consider an interval along the $X$ axis: $P(x \le X \le x + dx) \approx f_X(x)\,dx$. Suppose the function $g(x)$ has a positive derivative. The corresponding interval along the $Y$ axis when $Y = g(X)$ has length $dy \approx \frac{dg(x)}{dx}\,dx$ at the point $x$.
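Stepping back briefly to the central limit theorem above, it can be illustrated by simulation. A sketch (the i.i.d. Uniform(0,1) summands, $n = 50$, and the sample count are assumed parameters for illustration only): the standardized sum should put roughly 68% of its mass within one standard deviation, like a standard Gaussian.

```python
import math
import random

random.seed(1)

def standardized_sum(n):
    # Sum of n i.i.d. Uniform(0,1) draws, centered and scaled: the sum has
    # mean n/2 and variance n/12, so this is approximately N(0,1) for large n.
    s = sum(random.random() for _ in range(n))
    return (s - n / 2.0) / math.sqrt(n / 12.0)

samples = [standardized_sum(50) for _ in range(20_000)]
frac_within_one_sigma = sum(1 for z in samples if abs(z) <= 1.0) / len(samples)
# For a true N(0,1) variable, this fraction is about 0.6827.
```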
The probability that $X$ falls in its interval is the probability that $Y$ falls in its interval:

$$P(x \le X \le x + dx) = P(y \le Y \le y + dy),$$

where $y = g(x)$, or equivalently $x = g^{-1}(y)$. Then $f_X(x)\,dx = f_Y(y)\,dy$. That is,

$$f_Y(y) = f_X(g^{-1}(y))\,\frac{dg^{-1}(y)}{dy}.$$

If we take the other case, that $g(x)$ has a negative derivative, we have to take $f_X(x)\,dx = -f_Y(y)\,dy$. Combining these together, we obtain

$$f_Y(y) = f_X(g^{-1}(y))\,\Big|\frac{dg^{-1}(y)}{dy}\Big|.$$

Example 1. Let $Y = aX + b$. Then

$$f_Y(y) = \frac{1}{|a|}\,f_X\Big(\frac{y - b}{a}\Big).$$

Example 2. Suppose $f_X(x) = \dfrac{\alpha/\pi}{x^2 + \alpha^2}$ (Cauchy). Let $Y = 1/X$. Then

$$f_Y(y) = \frac{(1/\alpha)/\pi}{y^2 + (1/\alpha)^2},$$

Cauchy with parameter $1/\alpha$.

Example 3. Suppose $X \sim \mathcal{U}(a, b)$ with $0 < a < b$, so $f_X(x) = \frac{1}{b - a}$ for $x \in (a, b)$. Let $Y = 1/X$. Then

$$f_Y(y) = \frac{1}{(b - a)\,y^2} \quad\text{for } \frac{1}{b} < y < \frac{1}{a}.$$

Example 4. Let $Y = e^X$. Then $f_Y(y) = \frac{1}{y} f_X(\ln y)$, $y > 0$. If $X \sim \mathcal{N}(\mu, \sigma^2)$, then

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma\, y}\exp\Big(-\frac{(\ln y - \mu)^2}{2\sigma^2}\Big), \quad y > 0.$$

This density is lognormal.

Example 5. Suppose $y = g(x) = \tan x$, or $x = \tan^{-1} y$. This has an infinite number of solutions; however, if we take $x \in (-\pi/2, \pi/2)$, then there is a unique inverse. We have

$$g'(x) = \frac{1}{\cos^2 x} = 1 + y^2.$$

Now let $X \sim \mathcal{U}(-\pi/2, \pi/2)$. Then

$$f_Y(y) = \frac{1/\pi}{1 + y^2},$$

Cauchy.

Example 6. Suppose $X$ has continuous distribution $F_X$, and $Y = F_X(X)$. That is, the function we use to transform is actually the cdf of $X$. Then

$$g'(x) = \frac{dF_X(x)}{dx} = f_X(x), \quad\text{and}\quad f_Y(y) = \frac{f_X(x)}{f_X(x)} = 1, \quad 0 < y < 1.$$

That is, $Y$ is uniformly distributed. In the image processing literature this is called histogram equalization.

Example 7. Let $X \sim \mathcal{U}(0, 1)$, and let $Y$ have a specified cdf $F_Y$, which we take to be continuous. Use the transformation $x = F_Y(y)$, $0 < x < 1$; that is, $g^{-1}(y) = F_Y(y)$. Then $f_Y(y) = f_X(g^{-1}(y))\,\big|\frac{dg^{-1}(y)}{dy}\big|$. Since $f_X$ is 1 for all values in $(0, 1)$, and since

$$\frac{dg^{-1}(y)}{dy} = \frac{dF_Y(y)}{dy} = f_Y(y),$$

putting all the pieces together we find $f_Y(y) = f_Y(y)$. That is, $Y$ has the desired distribution. The point of this is that if we can generate $\mathcal{U}(0, 1)$ rvs, we can in principle transform them to produce any other continuous distribution.

Multiple inverses. It may happen that $g$ is not a uniquely invertible function; that is, for a given $y$ there may be more than one value of $x$ such that $y = g(x)$. For example, if $y = g(x) = x^2$, then $x = \sqrt y$ and $x = -\sqrt y$ are both inverses. We will prove the concept for two solutions. Let $y = g(x_1) = g(x_2)$, assuming, to be specific, that the slope is positive
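Example 7 above is the basis of inverse-transform sampling. A sketch (the exponential target with rate $\lambda = 2$ is an arbitrary choice for illustration): since $F_Y(y) = 1 - e^{-\lambda y}$, applying $F_Y^{-1}(u) = -\ln(1 - u)/\lambda$ to uniform draws yields exponential samples.

```python
import math
import random

random.seed(2)

def exponential_via_inverse_cdf(lam):
    # F_Y(y) = 1 - exp(-lam * y)  =>  F_Y^{-1}(u) = -ln(1 - u) / lam.
    u = random.random()          # a U(0,1) draw
    return -math.log(1.0 - u) / lam

lam = 2.0
draws = [exponential_via_inverse_cdf(lam) for _ in range(100_000)]
emp_mean = sum(draws) / len(draws)   # should approach 1/lam = 0.5
```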
at $x_1$ and negative at $x_2$. Then

$$P(y < Y < y + dy) = P(x_1 < X < x_1 + dx_1) + P(x_2 - |dx_2| < X < x_2).$$

That is, $f_Y(y)\,dy = f_X(x_1)\,|dx_1| + f_X(x_2)\,|dx_2|$. From this,

$$f_Y(y) = f_X(x_1)\,\Big|\frac{dx_1}{dy}\Big| + f_X(x_2)\,\Big|\frac{dx_2}{dy}\Big|.$$

This is sometimes written

$$f_Y(y) = \frac{f_X(x_1)}{|g'(x_1)|} + \frac{f_X(x_2)}{|g'(x_2)|}.$$

In general, with $n$ solutions $x_1, x_2, \ldots, x_n$, we have

$$f_Y(y) = \frac{f_X(x_1)}{|g'(x_1)|} + \frac{f_X(x_2)}{|g'(x_2)|} + \cdots + \frac{f_X(x_n)}{|g'(x_n)|}.$$

Example 8. Suppose $X \sim \mathcal{U}(-\pi, \pi)$ and $Y = a\sin(X + \theta)$. Generally the sin function has an infinite number of inverses, but there are only two inverses in the stated range of $X$. We have

$$g'(x) = a\cos(x + \theta) = \sqrt{a^2 - y^2}.$$

The density of $X$ is $f_X(x) = \frac{1}{2\pi}$ for $x \in (-\pi, \pi)$. Then

$$f_Y(y) = \frac{1}{2\pi}\frac{1}{\sqrt{a^2 - y^2}} + \frac{1}{2\pi}\frac{1}{\sqrt{a^2 - y^2}} = \frac{1}{\pi\sqrt{a^2 - y^2}}, \quad |y| < a.$$

$g(X)$ constant in an interval. If the function $g(X)$ is constant over any interval, then there is no inverse, nor even multiple inverses. However, we can still compute the distribution. Let $g(x) = y_1$ for $x_0 < x \le x_1$ (i.e., constant). Then

$$P(Y = y_1) = P(x_0 < X \le x_1) = F_X(x_1) - F_X(x_0).$$

Hence there is probability mass at the point $y_1$. This results in a cdf which is not continuous at that point.

Example 9. Suppose $g(x)$ is the limiter

$$g(x) = \begin{cases} -b & x < -b \\ x & -b \le x \le b \\ b & x > b. \end{cases}$$

Then $P(Y = -b) = P(X \le -b) = F_X(-b)$ and $P(Y = b) = P(X > b) = 1 - F_X(b)$. For $-b \le y < b$, $F_Y(y) = F_X(y)$.

Example 10. Suppose

$$g(x) = \begin{cases} 1 & x > 0 \\ -1 & x < 0. \end{cases}$$

Then $P(Y = -1) = P(X \le 0) = F_X(0)$ and $P(Y = 1) = P(X > 0) = 1 - F_X(0)$. We have a two-valued discrete random variable.

Changing Variables: Multiple dimensions

Consider now multiple variables. Let $g: \mathbb{R}^n \to \mathbb{R}^n$, where $Y = g(X)$. We go with the equal-probability idea: the probability of falling in a region in $X$ space should be the same as the probability of falling in the corresponding region in $Y$ space. We'll draw pictures in two dimensions, but the concepts apply to higher dimensions.

$$P(x_1 < X_1 < x_1 + dx_1,\; x_2 < X_2 < x_2 + dx_2) \approx f_X(x_1, x_2)\,dx_1\,dx_2.$$

Suppose the region $dx_1\,dx_2$ maps to the region $dA$ in the $y$ coordinates. Equating probabilities, we have

$$f_Y(y_1, y_2)\,dA = f_X(x_1, x_2)\,dx_1\,dx_2.$$

We need to evaluate $dA$. The region $dA$ is a parallelepiped described by the vectors

$$\Big(\frac{\partial y_1}{\partial x_1}\,dx_1,\; \frac{\partial y_2}{\partial x_1}\,dx_1\Big) \quad\text{and}\quad \Big(\frac{\partial y_1}{\partial x_2}\,dx_2,\; \frac{\partial y_2}{\partial x_2}\,dx_2\Big).$$

Fact: recall from calculus that the signed area of the parallelepiped described by the vectors $v = (v_1, v_2)$ and $w = (w_1, w_2)$ is
obtained from the cross product. Let us express this in matrix form:

$$v \times w = \det\begin{bmatrix} i & j & k \\ v_1 & v_2 & 0 \\ w_1 & w_2 & 0 \end{bmatrix}.$$

In our case we have

$$\det\begin{bmatrix} i & j & k \\ \frac{\partial y_1}{\partial x_1}dx_1 & \frac{\partial y_2}{\partial x_1}dx_1 & 0 \\ \frac{\partial y_1}{\partial x_2}dx_2 & \frac{\partial y_2}{\partial x_2}dx_2 & 0 \end{bmatrix}.$$

The signed area is then

$$dA = \Big(\frac{\partial y_1}{\partial x_1}\frac{\partial y_2}{\partial x_2} - \frac{\partial y_1}{\partial x_2}\frac{\partial y_2}{\partial x_1}\Big)\,dx_1\,dx_2.$$

Let

$$J = \det\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{bmatrix}.$$

This matrix of partial derivatives is called the Jacobian of the function $g$. Back to probabilities: we have

$$f_Y(y_1, y_2)\,|J|\,dx_1\,dx_2 = f_X(x_1, x_2)\,dx_1\,dx_2 \quad\Rightarrow\quad f_Y(y_1, y_2) = |J|^{-1}f_X(x_1, x_2),$$

or in general, for an invertible function $g$,

$$f_Y(y) = |J|^{-1}f_X(g^{-1}(y)).$$

Example 11 (Box-Muller transformation). Let $X_1 \sim \mathcal{U}(0,1)$ and $X_2 \sim \mathcal{U}(0,1)$, independent. Let

$$Y_1 = \sqrt{-2\ln X_1}\,\cos(2\pi X_2), \qquad Y_2 = \sqrt{-2\ln X_1}\,\sin(2\pi X_2).$$

Then $Y_1 \sim \mathcal{N}(0, 1)$ and $Y_2 \sim \mathcal{N}(0, 1)$, independent.

Many-to-one mappings. Let $Y = g(X_1, X_2)$, where $X_1$ and $X_2$ are jointly distributed random variables. For a given value of $y$, the inverse may form a curve in $(x_1, x_2)$ space. Let $A(y)$ denote the region in the $(x_1, x_2)$ plane such that $g(x_1, x_2) \le y$ (this may not be a connected region). Then $\{Y \le y\} = \{g(X_1, X_2) \le y\} = \{(X_1, X_2) \in A(y)\}$ and

$$F_Y(y) = \iint_{A(y)} f_{X_1 X_2}(x_1, x_2)\,dx_1\,dx_2.$$

Let $\Delta A(y)$ denote the region of the $(x_1, x_2)$ plane such that $y < g(x_1, x_2) \le y + dy$; then

$$f_Y(y)\,dy = \iint_{\Delta A(y)} f_{X_1 X_2}(x_1, x_2)\,dx_1\,dx_2.$$

Example 12. Let $Z = X + Y$. The region in the $(x, y)$ plane such that $x + y \le z$ is the part of the plane below the line $x + y = z$. We have

$$F_Z(z) = \int_{-\infty}^\infty \int_{-\infty}^{z-y} f_{XY}(x, y)\,dx\,dy.$$

Differentiating this with respect to $z$, we have

$$f_Z(z) = \int_{-\infty}^\infty f_{XY}(z - y, y)\,dy.$$

If $X$ and $Y$ are independent, this is the convolution $f_Z(z) = \int_{-\infty}^\infty f_X(z - y)\,f_Y(y)\,dy$.

Example 13. Let $Z = XY$. The region of the plane such that $xy \le z$ can be determined as follows. Fix $z$. For $y > 0$ we want the region where $x \le z/y$, and for $y < 0$ we want the region where $x \ge z/y$. We obtain

$$F_Z(z) = \int_0^\infty \int_{-\infty}^{z/y} f_{XY}(x, y)\,dx\,dy + \int_{-\infty}^0 \int_{z/y}^\infty f_{XY}(x, y)\,dx\,dy.$$

To get the density: the region $\Delta A$ is the sector bounded by the curves $xy = z$ and $xy = z + dz$. The $x$ coordinate of a point on the curve is $x = z/y$, and the area of a differential element is $\frac{1}{|y|}\,dy\,dz$. We obtain

$$f_Z(z)\,dz = \int_{-\infty}^\infty f_{XY}\Big(\frac{z}{y}, y\Big)\frac{1}{|y|}\,dy\,dz,$$

and cancelling $dz$,

$$f_Z(z) = \int_{-\infty}^\infty f_{XY}\Big(\frac{z}{y}, y\Big)\frac{1}{|y|}\,dy.$$

Example 14. Let $Z = \sqrt{X^2 + Y^2}$. The region $A$ is the circle $x^2 + y^2 \le z^2$. If

$$f_{XY}(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2 + y^2)/2\sigma^2},$$

we
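Examples 11 and 14 can be checked together by simulation (a sketch; the sample size and tolerances are arbitrary): Box-Muller turns uniform pairs into independent $\mathcal{N}(0,1)$ pairs, and the magnitude of each pair should then follow the Rayleigh cdf $1 - e^{-z^2/2}$.

```python
import math
import random

random.seed(3)

def box_muller():
    # Example 11: map two independent U(0,1) draws to two independent
    # N(0,1) draws.  Using 1 - x1 (also U(0,1)) avoids log(0).
    x1, x2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - x1))
    return r * math.cos(2.0 * math.pi * x2), r * math.sin(2.0 * math.pi * x2)

pairs = [box_muller() for _ in range(50_000)]

# First two moments of Y1 should be near 0 and 1.
y1 = [p[0] for p in pairs]
m = sum(y1) / len(y1)
v = sum(t * t for t in y1) / len(y1) - m * m

# Example 14: Z = sqrt(Y1^2 + Y2^2) should be Rayleigh (sigma = 1),
# so P(Z <= 1) = 1 - exp(-1/2), about 0.3935.
frac = sum(1 for a, b in pairs if math.hypot(a, b) <= 1.0) / len(pairs)
```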
have (in polar coordinates)

$$F_Z(z) = \int_0^{2\pi}\int_0^z \frac{1}{2\pi\sigma^2}\,e^{-r^2/2\sigma^2}\,r\,dr\,d\theta = 1 - e^{-z^2/2\sigma^2}, \quad z > 0,$$

$$f_Z(z) = \frac{z}{\sigma^2}\,e^{-z^2/2\sigma^2}, \quad z > 0.$$

This is a Rayleigh distribution.

Sometimes it is helpful to introduce an auxiliary variable, then integrate it out. Suppose $Z = XY$. Introduce the auxiliary variable $W = X$. Then the inverse functions are straightforward to compute: $X = W$ and $Y = Z/W$. The Jacobian is

$$J = \det\begin{bmatrix} \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \\ \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \end{bmatrix} = \det\begin{bmatrix} y & x \\ 1 & 0 \end{bmatrix} = -x = -w.$$

The joint density is

$$f_{ZW}(z, w) = \frac{1}{|w|}\,f_{XY}\Big(w, \frac{z}{w}\Big).$$

Then the density of $Z$ can be obtained by integration:

$$f_Z(z) = \int_{-\infty}^\infty \frac{1}{|w|}\,f_{XY}\Big(w, \frac{z}{w}\Big)\,dw.$$

ECE 6010 Lecture 6: Basic Concepts of Random Processes

Basic definitions and concepts

Definition 1. A random process (or stochastic process) on a probability space $(\Omega, \mathcal{F}, P)$ is an indexed collection of random variables $\{X_t, t \in T\}$, each defined on $(\Omega, \mathcal{F}, P)$, where $T$ is an indexing set of real numbers.

- If $T$ is a singleton (one element), then $\{X_t, t \in T\}$ is a rv.
- If $T = \{t_1, t_2\}$, then $\{X_t, t \in T\}$ is a bivariate rv.
- If $T$ consists of a finite number of elements, then $\{X_t, t \in T\}$ is a random vector.
- If $T$ is countable, then $\{X_t, t \in T\}$ is a random sequence.

For most applications we think of $t$ as time. In some cases $T$ is multidimensional; then $X_t$ is called a random field.

Three interpretations of a rp:

1. A collection of waveforms that occur randomly. That is, it is defined on some probability space; for each $\omega \in \Omega$ there is a corresponding waveform $X_t(\omega)$, $t \in T$, as a function of $t$ with $\omega$ fixed. Think of having a big bag of waveforms: we reach into the bag and pick out a waveform (a function of $t$) at random.
2. A collection of random variables: for each fixed $t \in T$ we have a random variable $X_t$.
3. A real-valued function of two variables, $X: \Omega \times T \to \mathbb{R}$.

Definition 2. A function $X_t(\omega)$, $t \in T$, assumed by $\{X_t, t \in T\}$ for a fixed $\omega \in \Omega$, is called a realization of the process, also known as a sample function or sample path. A realization is just a function; it does not exhibit the randomness.

Definition 3. If $T$ contains a continuum of values (e.g., $T = \mathbb{R}$ or $T = [0, 1]$), then $\{X_t, t \in T\}$ is a continuous-time random process.

Definition 4. If $T$ contains only countably many values (e.g., $T = \mathbb{Z}$ or $T = \mathbb{Z}^+$), then $\{X_t, t \in T\}$ is a discrete-time
random process.

Definition 5. Let $n$ be a positive integer and $\{X_t, t \in T\}$ a random process. The set of $n$-dimensional distributions of $\{X_t, t \in T\}$ is the collection of all multivariate distributions of collections $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$, where $t_i \in T$. The set of all $n$-dimensional distributions for all orders $n$ is called the set of finite-dimensional distributions (fdds) of $\{X_t, t \in T\}$. We will assume this set completely characterizes the statistical distribution of the process.

Definition 6. $T$ is closed under addition if $\tau_1, \tau_2 \in T$ implies $\tau_1 + \tau_2 \in T$.

Definition 7. Suppose $T$ is closed under addition. The random process $\{X_t, t \in T\}$ is stationary to order $k$ if for all $t_1, t_2, \ldots, t_k \in T$, the distribution of $(X_{t_1 + h}, X_{t_2 + h}, \ldots, X_{t_k + h})$ does not depend on $h$, for $h \in T$. If this is true for all orders $k$, the rp is strictly stationary. Strict stationarity is a fairly strong condition, and we don't always need it.

Example 1. Stationarity to order 1 means that $F_{X_t}$ is the same for every $t \in T$. Stationarity to order 2 means that $F_{X_t, X_s}$ depends only on the difference between $t$ and $s$.

Ergodicity. Assume throughout that $X_t$ is stationary. Loosely speaking, a random process $X_t$ is ergodic if time averages are equal to ensemble averages. That is, averages over $\omega$ (i.e., expectations) are the same as averages over $t$: ensemble averages are the same as sample averages. Here is an example. Suppose $\{X_n\}$ is an i.i.d. sequence. The ensemble mean is

$$\mu = \int_\Omega X(\omega)\,P(d\omega).$$

The sample mean is $\frac{1}{n}\sum_{k=1}^n X_k(\omega)$. By the SLLN, we have

$$\frac{1}{n}\sum_{k=1}^n X_k(\omega) \to \mu.$$

This is an example of an ergodic property.

Means and Autocorrelations

Definition 8. The mean function of a rp $\{X_t, t \in T\}$ is $\mu_X(t) = E[X_t]$, $t \in T$.

Definition 9. The autocorrelation function of a rp $\{X_t, t \in T\}$ is $R_X(t, s) = E[X_t X_s]$, $t, s \in T$.

Definition 10. A random process is second order if $E[X_t^2] < \infty$ for all $t \in T$. For a second-order rp, $|\mu_X(t)| < \infty$ and $|R_X(t, s)| < \infty$ for all $t, s \in T$.

Properties of autocorrelation functions:

1. $R_X(t, t) = E[X_t^2]$. This is the second moment.
2. $|R_X(t, s)|^2 \le R_X(t, t)\,R_X(s, s)$ (Schwarz inequality).
3. $R_X(t, s) = R_X(s, t)$:
symmetric.

Wide-sense stationarity

Definition 11. Let $T$ be closed under addition. A second-order random process $\{X_t, t \in T\}$ is said to be wide-sense stationary (WSS) if $\mu_X(t)$ is a constant ($\mu_X(t) = \mu_X$) and $R_X(t + h, t)$ depends only on $h$, for all $t, h \in T$. Since $R_X(t + h, t)$ depends only on $h$, we write, by an abuse of notation, $R_X(t + h, t) = R_X(h)$. Thus $R_X(t, s) = R_X(t - s + s, s) = R_X(t - s)$.

If a random process is second order and strictly stationary, it must also be WSS. On the other hand, if a process is WSS it is not necessarily strictly stationary.

Definition 12. The autocovariance function of a random process $\{X_t, t \in T\}$ is

$$C_X(t, s) = \operatorname{cov}(X_t, X_s), \quad t, s \in T.$$

We say a process is covariance stationary if $C_X(t, s)$ depends only on $t - s$, or equivalently, $C_X(t + h, t)$ depends only on $h$.

Properties of $R_X$ for WSS rps:

1. $R_X(0) = E[X_t^2]$, independent of $t$.
2. $|R_X(\tau)| \le \sqrt{E[X_{t+\tau}^2]\,E[X_t^2]} = R_X(0)$.
3. $R_X(\tau) = R_X(-\tau)$ (even function).
4. A defining property of these functions:

$$\sum_{k=1}^n \sum_{l=1}^n a_k a_l R_X(t_k - t_l) \ge 0$$

for all $t_1, \ldots, t_n \in T$, all $a_1, \ldots, a_n$, and all $n \in \mathbb{Z}^+$. Any function with this property is a nonnegative definite function.

Before proceeding with more properties, a few examples.

A Sinusoidal Process. Let $T = \mathbb{R}$. Assume $A$ and $\Theta$ are independent rvs with $E[A^2] < \infty$ and $\Theta \sim \mathcal{U}(-\pi, \pi)$. Define $\{X_t, t \in T\}$ by

$$X_t = A\sin(\omega_0 t + \Theta),$$

where $\omega_0$ is a known constant. A typical realization is a sinusoid. This is an example of a deterministic random process, that is, a random process determined by random parameters.

$$\mu_X(t) = E[A\sin(\omega_0 t + \Theta)] = E[A]\,E[\sin(\omega_0 t + \Theta)] = E[A]\int_{-\pi}^{\pi} \sin(\omega_0 t + \theta)\,\frac{1}{2\pi}\,d\theta = 0.$$

We observe that this process is ergodic in the mean: a time average is equal to the ensemble average.

$$R_X(t, s) = E[A^2\sin(\omega_0 t + \Theta)\sin(\omega_0 s + \Theta)] = E[A^2]\,E[\sin(\omega_0 t + \Theta)\sin(\omega_0 s + \Theta)] = E[A^2]\,\frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1}{2}\big[\cos(\omega_0(t - s)) - \cos(\omega_0(t + s) + 2\theta)\big]\,d\theta = \frac{E[A^2]}{2}\cos(\omega_0(t - s)).$$

We observe that $R_X(t, s)$ depends only on the time difference $t - s$; hence the rp is WSS. Let $\tau = t - s$. We can write

$$R_X(\tau) = \frac{E[A^2]}{2}\cos(\omega_0\tau).$$

Checking the properties, observe that we have a local maximum at $\tau = 0$ and that the function is symmetric.

The Homogeneous Poisson Counting Process. Let $T = [0, \infty)$. Suppose
events occur randomly in time in the following fashion:

1. The numbers of events occurring in nonoverlapping intervals of time are independent.
2. The probability of exactly one event in any interval of length $\Delta t$ is equal to $\lambda\Delta t + o(\Delta t)$ for $\Delta t$ sufficiently small, where $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$; that is, $o(\Delta t)$ is the generic term for terms of order higher than $\Delta t$. Also, the probability of more than one event occurring during an interval of length $\Delta t$ is $o(\Delta t)$.

Now define a rp $\{X_t, t \in T\}$ with $X_t$ the number of events occurring in the interval $(0, t]$. Then $X_t$ has the following properties:

1. $X_t - X_s$ is Poisson with parameter $\lambda(t - s)$:

$$P(X_t - X_s = k) = \frac{[\lambda(t - s)]^k\,e^{-\lambda(t - s)}}{k!}, \quad k = 0, 1, \ldots,\; t \ge s \ge 0.$$

2. $X_{t_1} - X_{s_1}$ and $X_{t_2} - X_{s_2}$ are independent rvs for all nonoverlapping intervals $(s_1, t_1]$ and $(s_2, t_2]$.

The parameter $\lambda$ is called the rate of $X_t$. Property 2 follows from the first assumption; we say that such a process has independent increments. Such a process is called a Poisson counting process (PCP) with rate $\lambda$. These two properties completely determine a PCP: all finite-dimensional distributions (fdds) of the process can be determined from them.

How do we show the Poisson distribution property? Pick $t > s \ge 0$. Let

$$p_k(t, s) = P(\text{exactly } k \text{ occurrences in } (s, t])$$

for $k \ge 0$. Then

$$p_k(t + \Delta t, s) = P(k \text{ occurrences in } (s, t])\,P(0 \text{ in } (t, t + \Delta t]) + P(k - 1 \text{ occurrences in } (s, t])\,P(1 \text{ occurrence in } (t, t + \Delta t]) + P(\text{fewer than } k - 1 \text{ occurrences in } (s, t])\,P(\text{all the rest}).$$

By assumption 2,

$$p_k(t + \Delta t, s) = p_k(t, s)\big(1 - \lambda\Delta t + o(\Delta t)\big) + p_{k-1}(t, s)\big(\lambda\Delta t + o(\Delta t)\big) + o(\Delta t).$$

Now, using $\lim_{\Delta t \to 0} o(\Delta t)/\Delta t = 0$,

$$\frac{p_k(t + \Delta t, s) - p_k(t, s)}{\Delta t} \to \frac{\partial}{\partial t}p_k(t, s) = -\lambda p_k(t, s) + \lambda p_{k-1}(t, s), \quad t \ge s,\; k \ge 1,$$

where $p_{-1}(t, s) = 0$. When $k = 0$ we get

$$\frac{\partial}{\partial t}p_0(t, s) = -\lambda p_0(t, s) \quad\Rightarrow\quad p_0(t, s) = Ce^{-\lambda t}.$$

We have the boundary condition $p_0(s, s) = 1$, giving $p_0(t, s) = e^{-\lambda(t - s)}$. Now we could proceed to solve the set of equations for $k = 1, 2, \ldots$. For example, when $k = 1$,

$$\frac{\partial}{\partial t}p_1(t, s) = -\lambda p_1(t, s) + \lambda p_0(t, s).$$

This could be solved, e.g., using Laplace transforms. In general we would find

$$p_k(t, s) = \frac{[\lambda(t - s)]^k\,e^{-\lambda(t - s)}}{k!}, \quad k = 0, 1, \ldots,\; t \ge s \ge 0.$$

As stated, the properties allow us to find all finite-dimensional distributions.
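The counting-process properties derived above can be sanity-checked by simulation. A sketch (simulating via i.i.d. Exponential($\lambda$) interarrival times is a standard equivalent construction of the PCP, and the parameters $\lambda = 3$, $t = 2$ are arbitrary): the count $X_t$ should have mean and variance both equal to $\lambda t$.

```python
import random

random.seed(5)

def poisson_count(lam, t):
    # Count events in (0, t] for a rate-lam Poisson process by accumulating
    # i.i.d. Exponential(lam) interarrival times.
    elapsed, count = 0.0, 0
    while True:
        elapsed += random.expovariate(lam)
        if elapsed > t:
            return count
        count += 1

lam, t = 3.0, 2.0
counts = [poisson_count(lam, t) for _ in range(20_000)]
count_mean = sum(counts) / len(counts)
count_var = sum(c * c for c in counts) / len(counts) - count_mean * count_mean
# Both should be close to lam * t = 6 for a Poisson(lam * t) count.
```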
For example, suppose we want to find the joint distribution of $X_{t_1}$ and $X_{t_2}$ for $t_1 < t_2$:

$$P(X_{t_1} = i, X_{t_2} = j) = P(X_{t_1} = i,\; X_{t_2} - X_{t_1} = j - i) = \frac{(\lambda t_1)^i\,e^{-\lambda t_1}}{i!}\cdot\frac{[\lambda(t_2 - t_1)]^{j-i}\,e^{-\lambda(t_2 - t_1)}}{(j - i)!}, \quad j \ge i,$$

where the factorization occurs because of independent increments. (Draw a typical sample path.) The process is called homogeneous because the rate at which the events occur does not depend on $t$.

Let us work out the mean and autocorrelation functions:

$$\mu_X(t) = E[X_t] = E[X_t - X_0] = \lambda t,$$

since $X_t - X_0$ is Poisson. Assume $t > s$:

$$E[X_t X_s] = E[(X_t - X_s)X_s] + E[X_s^2] = E[X_t - X_s]\,E[X_s] + E[X_s^2] = \lambda(t - s)\lambda s + \lambda s + (\lambda s)^2 = \lambda^2 ts + \lambda s,$$

and if $t < s$, $R_X(t, s) = \lambda^2 ts + \lambda t$. Thus $R_X(t, s) = \lambda^2 ts + \lambda\min(t, s)$. This process is not WSS: the mean is not constant, and the autocorrelation is not a function of the time difference.

Now create a process $Z_t = X_{t + \Delta t} - X_t$ for some fixed $\Delta t$. The random process $Z_t$ is WSS: the increase in the number of counts over some fixed interval does not depend on the time.

We could create an inhomogeneous Poisson process if the probability of an occurrence in the interval $(t, t + \Delta t]$ is $\lambda(t)\Delta t + o(\Delta t)$. Then $X_t - X_0$ is Poisson with parameter $\int_0^t \lambda(x)\,dx$.

Gaussian Random Processes

Definition 13. A Gaussian random process (GRP) is a rp all of whose finite-dimensional distributions are Gaussian. That is, if $\{X_t, t \in T\}$ is a Gaussian rp, then for all $n \in \mathbb{Z}^+$ and all sample instants $t_1, t_2, \ldots, t_n$, the random vector $(X_{t_1}, \ldots, X_{t_n})$ has a multidimensional Gaussian distribution.

For a GRP, the entire distribution is completely determined by the mean $(\mu_X(t_1), \ldots, \mu_X(t_n))$ and the covariance matrix

$$\begin{bmatrix} \operatorname{cov}(X_{t_1}, X_{t_1}) & \operatorname{cov}(X_{t_1}, X_{t_2}) & \cdots & \operatorname{cov}(X_{t_1}, X_{t_n}) \\ \vdots & & & \vdots \\ \operatorname{cov}(X_{t_n}, X_{t_1}) & \operatorname{cov}(X_{t_n}, X_{t_2}) & \cdots & \operatorname{cov}(X_{t_n}, X_{t_n}) \end{bmatrix},$$

which are determined by $\mu_X(t)$ and $R_X(t, s)$. That is, the entire distribution is determined by just the first two moments. It follows, therefore, that a WSS Gaussian process is also strictly stationary.

More properties and definitions. Recall that a second-order process is WSS if its mean function is constant and its autocorrelation function depends only on the difference of its arguments.

Definition 14. Suppose $T = \mathbb{R}$. The power spectral density (or spectrum) of a WSS process $\{X_t, t \in T\}$ is defined as $S_X = \mathcal{F}[R_X]$.
That is, the PSD is the Fourier transform of the autocorrelation function:

$$S_X(\omega) = \int_{-\infty}^\infty e^{-i\omega\tau}R_X(\tau)\,d\tau,$$

assuming the transform exists. Suppose $T = \mathbb{Z}$. Then the power spectral density (or spectrum) is $S_X = \mathcal{F}[R_X]$, where the discrete-time Fourier transform is used:

$$S_X(\omega) = \sum_{k=-\infty}^\infty e^{-i\omega k}R_X(k),$$

assuming the transform exists. A sufficient condition for the existence of $S_X$ is that $\int_{-\infty}^\infty |R_X(\tau)|\,d\tau < \infty$, or $\sum_k |R_X(k)| < \infty$.

Example 2. Suppose $R_X(\tau) = \sigma^2 e^{-\alpha|\tau|}$. Then

$$S_X(\omega) = \frac{2\alpha\sigma^2}{\omega^2 + \alpha^2}.$$

Such a process is called a wide-sense Markov process. (Comment on changes with $\alpha$.) A GRP with this spectrum is called an Ornstein-Uhlenbeck process. It is sometimes used as a model for wideband (e.g., nearly white) noise which has been lowpass filtered.

Example 3 (The Ideal Lowpass Process). Suppose

$$S_X(\omega) = \begin{cases} S_0 & |\omega| < \omega_0 \\ 0 & \text{otherwise.} \end{cases}$$

This is a model for wideband noise in the passband of a system of interest.

$$R_X(\tau) = \frac{S_0\omega_0}{\pi}\,\frac{\sin(\omega_0\tau)}{\omega_0\tau}.$$

Observe from the sinc function that there are delays where the samples of the signal are uncorrelated: $\tau = \pi/\omega_0$, $2\pi/\omega_0$, etc. If the signal were Gaussian, samples at those spacings would also be independent.

Example 4. Let $R_X(k) = \sigma^2 r^{|k|}$ for $|r| < 1$ and $k \in \mathbb{Z}$. Then

$$S_X(\omega) = \frac{\sigma^2(1 - r^2)}{1 - 2r\cos\omega + r^2}.$$

Properties of Spectra

1. Symmetry:

$$S_X(\omega) = \int_{-\infty}^\infty \cos(\omega\tau)R_X(\tau)\,d\tau \;\;(T = \mathbb{R}), \qquad S_X(\omega) = \sum_{k=-\infty}^\infty \cos(k\omega)R_X(k) \;\;(T = \mathbb{Z}).$$
there is no Fourier transform 1 We examine this and other cases that are WSS but do not have a Fourier transform in the conventional sense 1 Suppose T R Then RX 739 is continuous and is the autocorrelation function of a WSS rp if and only ifthere is a cdf GX satisfying GXb 17 GXb such that RXrRX0 I eleGXQu This transform is called the FourierStieltjes transform 2 Suppose T Z Then RXk is the autocorrelation function of a WSS rp if and only if there exists a GX satisfying GX b 1 7 GX b such that 12XkILK0i7r ewkdaxw ECE 6010 Lecture 6 EBasic Concepts of Random Processes 9 Thus the acf acts like a characteristic function but also has symmetry If S X 1 exists then Gm m sawammo In this case SXw27rRX0 is in fact a pdf More generally 27rRX 0GX is a spectral distribution of Xi Example 6 Random sinusoid RXTRX0 coswo739 1 e WdGXw where 1 02M gm 20 uw 7 W rightcontinuous 1 Joint properties of Two Random Processes Suppose we have two rps Xht E T Y t E T De nition 15 The cross correlation function is RXYtS Eithl Properties of the cross correlation function 1 RXyt s Ryxs t symmetry 2 lRXyOE S S Rxt tRys s Schwartz inequality 3 iRXYtSi S lPXtat Iii33 De nition 16 The random processes Xt and Yt are orthogonal if RXYt s 0 for all 313 E T 1 De nition 17 The crosscovariance function of X t and Yt is CXyt s COVXt 1 De nition 18 The random processes X t and Yt are jointly widesense stationary if MX t and Myt are constant and nyth t is independent oft for all t E T and for all h E T In this case we write RXyt h t RXyh 1 Properties of RXyh forjointly WSS l RXy0 Ryx0 2 RXyUt Ryx7h 3 IfX andY are individuallyWSS then RXYM S xRX0Ry0 and nym g lPX0 RY0l ECE 6010 Lecture 6 EBasic Concepts of Random Processes 10 De nition 19 If Xt and Yt are jointly WSS random processes the cross power spectrum is de ned as 30 e WTRXyU39 d7 T R SXYQU flRXY7 l 2700 eEzkaXYU T Z Properties of Spectra l SXyw Sinw 2 If Xt and Yt are individually and jointly WSS then SXYltltUgtj2 S SxwSyw 3 ReSXyw ReSXy7w and lmSXyw 
7 Uncorrelated and independent De nition 20 Two random processes are uncorrelated if CXyt S 0 for all s t E T 1 De nition 21 The random processes Xt and Yt are independent if XM th and Y5 Ysm are independent random vectors for all 7177thJr andallthtn31smET 1 Note Independence implies uncorrelated The converse is not true De nition 22 The random processes Xt and Yt arejointly Gaussian if Xt1 th Y51 Ysm is a Gaussian random vector for all n m E Z and all t1 tn 31 Sm E For jointly Gaussian random processes we can characterize by a mean vector and a covariance matrix All fdds are determined by MEG pyt RXt s Ryts and RXYOZ 3 For this case it is true that uncorrelated implies independence ECE 6010 Lecture 3 Random Vectors Grimmet amp Stirzaker Section 49 Random Vectors Random vectors are an extension of the bivariate random variables n rvs 1 2 n e ne a measurable mapping from an underlying sample space 9 f to OR Bquot Where Bquot is the smallest a eld containing all sets of the form 951962awafn1l11 lt961 b13a2ltx2 52 aanltxn bn De nition 1 Thejoint distribution of X1 X7 is PXlXQWXJB Pw E Q X1wX2w X w E B for all B E B This probability is denoted as PXB 1 De nition 2 The joint cumulative distribution function cdf if PXiXQVVVXna1a2 a PX1 S 11 Xn 3 an Fxa a E Rquot De nition 3 The joint probability mass function pm f is PX3 PX1 aiawaXn an 1 De nition 4 The joint probability density function pdf fxa is the function that satis es a a 71 a1 Fxa fxxdx1dx2 5195 for a continuous random vector 1 Fact X1X2 Xn are independent if FX or px or fx factor into products of marginals Suppose g OR Bquot A OR B is measurable Then 9X1 Xn is a random variable Law of unconscious statistician EgX1 Xn fgx1 znfxxdx continuous 2EmmeimmodesmeMast dlscrete Covariance Suppose X 9 A R B andY 9 A Rquot Bm that is X and Y are random vectors of dimension n and m respectively De nitionS covX Y E EXY E EYT E RW quot Exy Where ECE 6010 Lecture 3 iRandom Vectors 2 1 Note X is frequently used as a symbol to 
denote covariance It should not be confused with a summation sign and is usually clear from context Property covXY covY XT lfAisk gtlt nandBisl gtltmanda lRlC andbElthhen MAX 3 BY b AEXyBT covX X XIX is called the covariance of X It is a symmetric matrix non negative de nite or positive semide nite and thus has all nonnegative eigenvalues If X1 X2 X7 are mutually uncorrelated then 2X diagwi 0 t t t at Where 0 varXC Suppose we partition X of n dimensions as KO X lxwl of k and n 7 k elements respectively let uEWF Where it EX1 and M2 EX2 Similarly X111 X112 1 X i221 X22 Where 211 COVX1X1gt 212 COVX1gt X2 222 COVX2gt X2 or in general V 21139 COVX1gt Xm Characteristic functions De nition 6 The characteristic function of an ndim ensional random vector X is de ned as mwm m Where u E Rquot 1 As before this is just an ndimensional Fourier transform De nition 7 X is a Gaussian random vector with parameters M and X if T 1 T gtXu exp2u M 7 in 211 We write X N NW 2 1 Properties of Gaussian random vectors 1 EX p 2 X1 X2 X7 independent if and only ifX is a diagonal matrix ECE 6010 Lecture 3 iRandom Vectors 3 3 4 Lquot lfY AX b then Y is also Gaussian Y N NAt b AEAT Linear functions of Gaussians are Gaussians Said another way Family of Gaussians closed under af ne transformations Suppose X is positive de nite Then it can be factored as 2 COT where C is an n X n invertible lowertriangular matrix This factorization is called the Cholesky factorization This is essentially a matrix square root Suppose X N Nt E with X pd LetY C 1X 7 u ThenY is normal with p 0 and X I This process of diagonalizing the covariance matrix is called whitening We say that uncorrelated iid components are white leI gt 0 ie pd then X is a continuous rv with fxX W exp irax WTE RX W where 2 det2 product of eigenvalues Important Suppose X N Nt E with X gt 0 Partition X KO lxwl where X0 has k elements It turns out that X0 is also Gaussian How could we easily show this Let us partition 1 MV 212 Ma 222 211 1 i221 
Then X0 N NM1211 X0 N NW0 X122 Consider X0 conditioned on X 1 X1I fXX 2 1 fXlt2gtXlt1X lX fxmwm Then it can be shown that w WNwU where M M0 2212131061 7 0 2 X22 X12121711212 This is smaller than 222 Discuss implications Draw pictures Note For a Gaussian vector the conditional density is Gaussian ECE 6010 Lecture 3 7Random Vectors 4 An Application MMSE Prediction Suppose we have a random sequence X1 X2 i Xn and we observe the rst n 7 1 of them X1 I1X2 962awaXn71 xn71 Given this data we want to predict the value of X 7 Our estimate of 95 will be denoted as 5 Clearly it could be a function of all the observed data 557 M951 952 l l l 95771 for some function h Rquot 1 7gt R One thing we could try is to minimize the average of 957 7 hx1uixn12 That is we would like to solve mhinEKXn h951 962 In712i It is easy to see HW that the best such 1 is M951 952 l l l yon EXW X1 x1 X2 952 l l l Xn1 95771 That is the best estimator in a minimum meansquared error sense is the conditional expectation Now let us take a speci c distribution Suppose X N Nt E and partition according to X1 X2 X7171 X71 Given X1 x1 i i Xn1 9574 the variable X7 is N 02 where 961 i 1 962 i 2 M Mn Emn712l1 96n71 Mn71 2 7 2 71 T 0 i 0n E xn 12n712nm71 E 2H 25m Ema 07 where and 27171 COViX1 aXL71Jii En kl COVX7 X1 covXn X1 l l l covXn X 1l So M is the conditional mean that we want and 02 is the variance of the conditional distri bution 72 varXn X1 l l l Xn71 EX 7 Mf Xl 951 l l lXn1 95771 This is the minimum meansquared error WSE Notationally write T 71 a Eanlz ili ECE 6010 Lecture 3 iRandom Vectors 5 Then 961 i 1 in M aT 39 967k 1 Mn7 1 This is just a digital lter We can also show that 02 S 03 so that 39 decreases our uncertainty Note For a Gaussian rv the MNISE estimator is linear from Estimation in a Markov model Suppose that PanXn1Xn2i HX2X1 PanXn1 That is given Xn1 X7 is independent of X1 X2 i i Xnd This is actually quite common it doesn tmatter how you got to Where you came from only Where you came from Such 
a model is called a Markov model Under the assumption of a Markov model Eanlea Xnill Eanan71 and COVX7LX7L1 va Xnil x7171 7171 567 M This can be written as an M pXnXn1varXn 94 7 tn1gtvarXn1gt and 72 varXnl 7 p2Xn X 1i
