Basic Statistical Methods
Basic Statistical Methods, ISYE 2028
These class notes were uploaded by Maryse Thiel on Monday, November 2, 2015. The notes belong to ISYE 2028 at Georgia Institute of Technology - Main Campus, taught by Kobi Abayomi in Fall. Since upload, they have received 15 views. For similar materials see /class/234198/isye-2028-georgia-institute-of-technology-main-campus in Industrial Engineering at Georgia Institute of Technology - Main Campus.
ISYE 2028 A and B, Lecture 8. Kobi Abayomi. March 25, 2009.

1 Independent Random Variables

Two random variables are independent if

p_{X,Y}(x,y) = p_X(x) p_Y(y)  (1)

or

f_{X,Y}(x,y) = f_X(x) f_Y(y)  (2)

This is directly analogous to the general probability rules. The conditional probability mass and density functions are then just

p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y) = p_X(x) p_Y(y) / p_Y(y) = p_X(x)

f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y) = f_X(x) f_Y(y) / f_Y(y) = f_X(x)

Dependence is any violation of this condition.

1.1 Example

Let f_{X1,X2}(x1,x2) = x1 + x2, 0 < x1 < 1, 0 < x2 < 1. Are X1 and X2 independent? Well,

f_1(x1) = ∫_0^1 (x1 + x2) dx2 = x1 + 1/2, 0 < x1 < 1

f_2(x2) = ∫_0^1 (x1 + x2) dx1 = x2 + 1/2, 0 < x2 < 1

But (x1 + 1/2)(x2 + 1/2) ≠ x1 + x2. The answer is no.

1.2 Independence is factorization of the pdf

In general, for X1, X2: if f(x1,x2) = g(x1) h(x2), this implies that X1 is independent of X2. (This is not a formal proof.) We know that we can always write a joint pdf as the product of a conditional and a marginal:

f_{X1,X2}(x1,x2) = f_{X2|X1}(x2|x1) f_{X1}(x1)

If the functional form of f_{X2|X1}(x2|x1) does not depend on x1, say f_{X2|X1}(x2|x1) = h(x2), then integrate both sides over x1:

∫ f_{X1,X2}(x1,x2) dx1 = ∫ f_{X2|X1}(x2|x1) f_{X1}(x1) dx1

which yields

f_{X2}(x2) = h(x2) ∫ f_{X1}(x1) dx1 = h(x2)

and of course f_{X2}(x2) = h(x2) = f_{X2|X1}(x2|x1). Thus independence of X2 and X1 is equivalent to the factorability of the joint distribution.

The definitions for independence are straightforward. Often we can exploit the strong assumption of independence to simplify modelling, and get interesting results.

1.3 Example

In n + m independent trials, each with probability p of success, let X ≡ the number of successes in the first n trials and let Y ≡ the number of successes in the final m trials. Then

P(X = x, Y = y) = C(n,x) p^x (1-p)^{n-x} · C(m,y) p^y (1-p)^{m-y}

and it is apparent that X and Y are independent. What about Z = X + Y? Are Z and X independent?

P(X = x, Z = z) = P(X = x, Y = z - x) = C(n,x) p^x (1-p)^{n-x} · C(m, z-x) p^{z-x} (1-p)^{m-z+x}

which implies that X and Z are not independent.

2 Mutual Independence

Here's an example of mutual independence. Let f(x,y,z) = e^{-(x+y+z)}, 0 < x, y, z < ∞. The cumulative distribution, then, is

F(x,y,z) = ∫_0^x ∫_0^y ∫_0^z e^{-(u+v+w)} du dv dw = (1 - e^{-x})(1 - e^{-y})(1 - e^{-z})

The joint distribution is completely factorable: X, Y and Z are mutually independent.
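The non-factorization in the example above can be checked numerically; a small Python sketch (the grid size and the evaluation point are arbitrary choices, not from the lecture):

```python
# Numerical check that f(x1, x2) = x1 + x2 on (0,1)^2 does not factor.
# The marginals integrate to f1(x1) = x1 + 1/2 and f2(x2) = x2 + 1/2,
# and the product of the marginals does not recover the joint pdf.

def joint(x1, x2):
    return x1 + x2

def marginal(x, n=10_000):
    # midpoint Riemann sum over the other coordinate on (0, 1)
    h = 1.0 / n
    return sum(joint(x, (k + 0.5) * h) for k in range(n)) * h

f1 = marginal(0.1)             # close to 0.1 + 0.5 = 0.6
f2 = marginal(0.1)             # by symmetry, also close to 0.6
product = f1 * f2              # about 0.36
joint_value = joint(0.1, 0.1)  # 0.2, so joint != product: X1, X2 dependent
```

The same Riemann-sum marginal reproduces f_i(x) = x + 1/2 at any point of (0,1), so the mismatch is not special to the point chosen.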
In general, for X1, ..., Xn, we say they are mutually independent if

P(X1 ≤ x1, ..., Xn ≤ xn) = ∏_i P(Xi ≤ xi)  (3)

2.1 Expectations of independent random variables

Recall our result for expectations of sums of random variables: E(X1 + ... + Xn) = Σ_i E(Xi). This result holds regardless of the dependence structure of the joint distribution. For a product of random variables, E(∏_i Xi) = ∏_i E(Xi) holds only under independence.

2.2 Bivariate independence does not imply mutual independence

Asserting X1, ..., Xn are mutually independent implies that any Xi is independent of any Xj (as well as independence of subsets of greater length). This strongest condition requires that the joint distribution be completely factorable:

f(x1, ..., xn) = ∏_i f_i(xi)

But bivariate independence (Xi independent of Xj for any i, j) does not imply mutual independence.

Example: Let f(x1,x2,x3) = 1/4 on (x1,x2,x3) ∈ {(1,0,0), (0,1,0), (0,0,1), (1,1,1)}, which implies

f_{ij}(xi,xj) = 1/4 on (xi,xj) ∈ {(0,0), (0,1), (1,0), (1,1)} and f_i(xi) = 1/2 on xi ∈ {0,1}

Thus f_i(xi) f_j(xj) = 1/4 = f_{ij}(xi,xj): any Xi and Xj are independent. BUT

f_i(xi) f_j(xj) f_k(xk) = 1/8 ≠ 1/4

So there is not complete (mutual) independence.

3 Distributions for Sums of Independent Random Variables

Let X ⊥ Y. Calculate the distribution of Z = X + Y:

F_Z(z) = P(X + Y ≤ z) = ∬_{x+y≤z} f_{X,Y}(x,y) dx dy = ∬_{x+y≤z} f_X(x) f_Y(y) dx dy

by the independence of X and Y. Setting x = z - y,

F_Z(z) = ∫_{-∞}^{∞} F_X(z - y) f_Y(y) dy  (4)

yields the equation for the cumulative distribution function (cdf) of Z: this is the general form for the cdf of the sum of two independent random variables. Taking a derivative with respect to z yields

f_Z(z) = d/dz ∫_{-∞}^{∞} F_X(z - y) f_Y(y) dy = ∫_{-∞}^{∞} f_X(z - y) f_Y(y) dy  (5)

the equation for the probability density function (pdf) of Z: this is the general form for the pdf of the sum of two independent random variables.

3.1 Example

Let X ~ U(0,1), Y ~ U(0,1) and X ⊥ Y. Then the pdf for Z = X + Y is

f_Z(z) = z for 0 ≤ z ≤ 1; 2 - z for 1 < z ≤ 2; 0 otherwise.

3.2 Example

Let X ~ Γ(α1, 1/λ), Y ~ Γ(α2, 1/λ), X ⊥ Y, and Z = X + Y. Then X is the distribution of wait time for α1 Poisson events, each with parameter λ. Recall that if X ~ Γ(α, 1/λ) the pdf for X is

f_X(x) = λ^α x^{α-1} e^{-λx} / Γ(α)

It turns out (sketched in class) that the pdf for Z = X + Y is

f_Z(z) = λ^{α1+α2} z^{α1+α2-1} e^{-λz} / Γ(α1 + α2)

In words, the sum of two independent Gammas with the same scale parameter is also Gamma distributed.
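The convolution formula (5) can be exercised numerically for the uniform example; a Python sketch (the grid size is an arbitrary choice):

```python
# Approximate f_Z(z) = integral of f_X(z - y) f_Y(y) dy for X, Y ~ Uniform(0,1),
# and compare with the triangular pdf: z on [0,1], 2 - z on (1,2].

def f_uniform(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_Z(z, n=20_000):
    # midpoint Riemann sum over y in (0,1), where f_Y is nonzero
    h = 1.0 / n
    return sum(f_uniform(z - (k + 0.5) * h) for k in range(n)) * h

at_half = f_Z(0.5)          # triangular pdf says 0.5
at_peak = f_Z(1.0)          # triangular pdf says 1.0
at_three_halves = f_Z(1.5)  # triangular pdf says 2 - 1.5 = 0.5
```

The integrand is just the length of the overlap of the two unit intervals, which is exactly the triangular shape.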
In general, for Xi ~ Γ(αi, 1/λ), i = 1, ..., k,

Z = Σ_{i=1}^k Xi ~ Γ(Σ_{i=1}^k αi, 1/λ)

(You need to know only the result on sums of Gammas from this section.)

4 Joint Distributions of Functions of a Random Variable

We've seen the univariate case (refer to past lectures, where we investigated the distribution of Y = X² when X ~ U(-1,1)); let's state it explicitly. Take X ~ F_X, F_X known. What is the distribution f_Y if Y = g(X), with g some invertible bijection from the space of X to the space of Y? Well,

F_Y(y) = F_X(g^{-1}(y)), so f_Y(y) = f_X(g^{-1}(y)) |d g^{-1}(y)/dy|

4.1 Example

Let X ~ f_X = 2x, 0 < x < 1. Let Y = 8X³, so Y ∈ (0,8). What we do in general is to use the known distribution of X to generate the distribution of Y. The event {a < Y < b} on the space of Y (the image of the transformation) is the same as the event {(a/8)^{1/3} < X < (b/8)^{1/3}} on the space of X (the pre-image of the transformation). So

P(a < Y < b) = P( (a/8)^{1/3} < X < (b/8)^{1/3} )

which is just the integral on those limits of the pdf of X:

∫_{(a/8)^{1/3}}^{(b/8)^{1/3}} 2x dx

We change the variable from X to Y in the integral, to get a statement using the pdf of X but in terms of Y: x = g^{-1}(y) = y^{1/3}/2, which implies dx = (1/6) y^{-2/3} dy. (Aside: the derivative of the inverse of a function is d f^{-1}(x)/dx = 1 / f'(f^{-1}(x)).) This yields

P(a < Y < b) = ∫_a^b 2 · (y^{1/3}/2) · (1/6) y^{-2/3} dy = ∫_a^b (1/6) y^{-1/3} dy

Here the pdf f_Y(y) is the integrand:

f_Y(y) = (1/6) y^{-1/3}, 0 < y < 8

More generally,

P(a < Y < b) = ∫_a^b f(g^{-1}(y)) |d g^{-1}(y)/dy| dy, so f_Y(y) = f(g^{-1}(y)) |d g^{-1}(y)/dy|

Remarks:

1. We found the pdf f_Y by changing the variable on the definite integral for the probability P(g^{-1}(a) < X < g^{-1}(b)), with (a,b) on the space of Y.

2. We really only needed (i) the conditions for which f_Y > 0, and (ii) the integrand for P(a < Y < b). These constraints require |d g^{-1}/dy|, which is always non-negative.

So there are two conditions for the change of variable formula: (1) verify that Y = g(X) is a bijection, and (2) take the absolute value of the rate of change of the transformation. These two rules yield the univariate formula for the pdf of a function g(X):

f_Y(y) = f(g^{-1}(y)) |d g^{-1}(y)/dy|

In general, we call |d g^{-1}/dy| the absolute value of the Jacobian: the rate of change of the pdf under the transformation.
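The change-of-variable answer in the example, f_Y(y) = (1/6) y^{-1/3}, integrates to F_Y(y) = y^{2/3}/4 on (0,8); this can be spot-checked by simulation. A Python sketch (seed and sample size are arbitrary choices):

```python
# Monte Carlo check: X has pdf 2x on (0,1), so F_X(x) = x^2 and
# inverse-CDF sampling gives X = sqrt(U).  With Y = 8 X^3, the
# change-of-variable formula predicts F_Y(y) = y^(2/3) / 4 on (0, 8).
import random

random.seed(0)
n = 200_000
ys = [8 * (random.random() ** 0.5) ** 3 for _ in range(n)]

def ecdf(t):
    # empirical cdf of the simulated Y values
    return sum(y <= t for y in ys) / n

F_at_1 = ecdf(1.0)   # predicted: 1**(2/3) / 4 = 0.25
F_at_8 = ecdf(8.0)   # predicted: 1.0 (Y never exceeds 8)
```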
The absolute value is necessary so that f_Y(y) ≥ 0.

4.2 Example

Let f_X(x) = 1, 0 < x < 1, and Y = g(X) = -2 ln X. The inverse is g^{-1}(y) = e^{-y/2}. Here d g^{-1}(y)/dy = -(1/2) e^{-y/2} < 0, so it is important to remember to take the absolute value. Thus

f_Y(y) = f_X(g^{-1}(y)) |J| = f_X(e^{-y/2}) · (1/2) e^{-y/2} = (1/2) e^{-y/2}, y > 0

5 In two dimensions

The extension of the general formula to the multivariate case is as natural as the extension of the derivative in multivariate calculus to vector-valued functions. In two dimensions, let Y = g(X) be a vector-valued invertible function: a function that maps (X1, X2) to (Y1, Y2). Write g1^{-1}(y1,y2) = x1 and g2^{-1}(y1,y2) = x2 for the component-wise inverse of g, and thus

J = det | ∂x1/∂y1  ∂x1/∂y2 |
        | ∂x2/∂y1  ∂x2/∂y2 |

is the Jacobian for the transformation g^{-1}: the determinant of the matrix of derivatives. The full equation for the pdf of the transformation (Y1,Y2) = g(X1,X2) is, for (X1,X2) ~ f_{X1,X2},

f_{Y1,Y2}(y1,y2) = f_{X1,X2}( g1^{-1}(y1,y2), g2^{-1}(y1,y2) ) |J|  (6)

where |J| is the absolute value of the Jacobian.

5.1 Example

Let X1, X2 ~ f_{X1,X2} = 1, 0 < x1, x2 < 1, and let Y1 = g1(X1,X2) = X1 + X2 and Y2 = g2(X1,X2) = X1 - X2. This implies x1 = g1^{-1} = (Y1 + Y2)/2 and x2 = g2^{-1} = (Y1 - Y2)/2.

[Figure 1: Illustration of the transformation Y1 = X1 + X2, Y2 = X1 - X2. Note the boundaries of figures A and B.]

Then

J = det | 1/2   1/2 |  = -1/2, so |J| = 1/2
        | 1/2  -1/2 |

Thus f_{Y1,Y2}(y1,y2) = (1/2) · 1_{(y1,y2) ∈ S}, where S ⊂ {0 < y1 < 2, -1 < y2 < 1}, with the constraints illustrated in figure 1.

5.2 Example

The extension to higher dimensions is natural, direct. Let X1, X2 ~ Exp(λ), iid; thus (x1,x2) ∈ (0,∞) × (0,∞) and f = λ e^{-λ xi}. Let the transformation be Y1 = X1/(X1 + X2) and Y2 = X1 + X2; thus (y1,y2) ∈ (0,1) × (0,∞). Then

x1 = g1^{-1}(y1,y2) = y1 y2 and x2 = g2^{-1}(y1,y2) = y2(1 - y1)

And

J = det |  y2      y1    |  = y2(1 - y1) + y1 y2 = y2
        | -y2    1 - y1  |

If λ = 1, then f_{X1,X2} = e^{-(x1+x2)} and thus

f_{Y1,Y2}(y1,y2) = y2 e^{-y2} · 1_{0<y2<∞} · 1_{0<y1<1}

This implies Y1 ⊥ Y2, Y2 ~ Γ(2, 1), and Y1 ~ U(0,1).

6 Order Statistics

Now we come again to statistics (a statistic is any function of observed data; in our language, in this class, observed data are samples x1, ..., xn from a random process X), in this class moving focus from the random processes (variables) which generate data to drawing inference on a supposed process. The order statistics are ordered values of a random process. Here, the order statistics are not technically observed values, so we write them with big X's instead of little x's.
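The exponential example above (Y1 = X1/(X1+X2) and Y2 = X1+X2 for X1, X2 iid Exp(1)) can be spot-checked by simulation; a Python sketch (seed, sample size, and tolerances are arbitrary choices):

```python
# f(y1, y2) = y2 e^{-y2} factors, so Y1 ~ Uniform(0,1) and Y2 ~ Gamma(2,1):
# E(Y1) = 1/2, P(Y1 < 1/2) = 1/2, and E(Y2) = 2.
import random

random.seed(1)
n = 200_000
y1s, y2s = [], []
for _ in range(n):
    x1, x2 = random.expovariate(1.0), random.expovariate(1.0)
    y1s.append(x1 / (x1 + x2))
    y2s.append(x1 + x2)

mean_y1 = sum(y1s) / n                       # close to 1/2
below_half = sum(y < 0.5 for y in y1s) / n   # close to 1/2
mean_y2 = sum(y2s) / n                       # close to 2
```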
Let X1, ..., Xn be independent and identically distributed random variables (the X's are "iid," we say). The ordered values are written X_(1) ≤ ... ≤ X_(n), with X_(1) ≡ min(X1, ..., Xn), X_(n) ≡ max(X1, ..., Xn), and X_(k) the kth smallest value in X1, ..., Xn. That is, for X1, ..., X_{2n+1}, X_(n+1) is the median.

7 Distributions for Order Statistics

As a first example, let X1, X2 ~ f_{X1,X2}(x1,x2) with X1 ⊥ X2, so n = 2.

(n = 2) Distribution of X_(2). Let X_(2) = max(X1, X2) = X1 ∨ X2. Then, from first principles,

F_{X_(2)}(x) = P(X1 ∨ X2 ≤ x) = P(X1 ≤ x, X2 ≤ x) = F_{X1,X2}(x,x) = F_{X1}(x) · F_{X2}(x)

if X1 ⊥ X2. Per usual, the derivative yields the pdf:

f_{X_(2)}(x) = f_{X1}(x) F_{X2}(x) + F_{X1}(x) f_{X2}(x)

If X1, X2 are iid ~ F_X, then f_{X_(2)}(x) = 2 f_X(x) F_X(x).

(n = 2) Distribution of X_(1). Now let X_(1) = min(X1, X2) = X1 ∧ X2. Again, from first principles,

F_{X_(1)}(x) = P(X_(1) ≤ x) = P({X1 ≤ x} ∪ {X2 ≤ x}) = F_{X1}(x) + F_{X2}(x) - F_{X1,X2}(x,x)

If X1, X2 are iid ~ F_X, then

F_{X_(1)}(x) = F_X(x) + F_X(x) - F_X(x) F_X(x)

so the pdf is

f_{X_(1)}(x) = f_X + f_X - 2 f_X F_X = f_X(1 - F_X) + f_X(1 - F_X) = 2 f_X(x) (1 - F_X(x))

Recall these when we look at general n, below.

8 Distributions of Order Statistics

8.1 Distribution of maximum and minimum

Starting with the pdf for the maximum, X_(n):

f_{X_(n)}(x) = P(one of X1, ..., Xn ∈ [x, x + dx], all else < x) = n f_X(x) [F_X(x)]^{n-1}

The pdf for the minimum, X_(1), can be found similarly:

f_{X_(1)}(x) = P(one of X1, ..., Xn ∈ [x, x + dx], all else > x) = n f_X(x) [1 - F_X(x)]^{n-1}

8.2 Full Joint distribution

The joint distribution for the entire set of order statistics can be derived, heuristically, from what we know about pdfs of transformations. Let X1, ..., Xn ~ f_X, iid. Then set Y_k = X_(k), the kth order statistic. Then Y_k is a transformation, Y_k = g_k(X1, ..., Xn). It is easy to see that the determinant of the Jacobian (the matrix of partial derivatives) will be ±1. For n = 3, for instance, say y1 = x2, y2 = x3, y3 = x1, one possible outcome: the Jacobian has a single 1 in each row and column and 0 elsewhere, yielding a determinant of ±1. But there are 3! possible arrangements, so the pdf is

f_{X_(1),X_(2),X_(3)}(x1,x2,x3) = 3! f_X(x1) f_X(x2) f_X(x3), x1 ≤ x2 ≤ x3

In general the result is

f_{X_(1),...,X_(n)}(x1, ..., xn) = n! f_X(x1) ⋯ f_X(xn), x1 ≤ ⋯ ≤ xn

With the full joint pdf you can compute marginal densities for any one or several of X_(1), ..., X_(n) by integrating out the remaining densities.
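The max/min pdfs in 8.1 integrate to F_{X_(n)} = F^n and F_{X_(1)} = 1 - (1-F)^n; for Uniform(0,1) samples these are easy to check by simulation. A Python sketch (seed, n, and trial count are arbitrary choices):

```python
# For n iid Uniform(0,1): P(max <= t) = t^n and P(min <= t) = 1 - (1 - t)^n.
import random

random.seed(2)
n, trials, t = 5, 100_000, 0.5
hits_max = hits_min = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    hits_max += max(xs) <= t
    hits_min += min(xs) <= t

p_max = hits_max / trials   # predicted 0.5**5 = 0.03125
p_min = hits_min / trials   # predicted 1 - 0.5**5 = 0.96875
```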
The pdf for any single X_(j) is

f_{X_(j)}(x_j) = n! / ((j-1)!(n-j)!) · [F_X(x_j)]^{j-1} [1 - F_X(x_j)]^{n-j} f_X(x_j)

The pdf for any pair X_(i), X_(j) with i < j is

f_{X_(i),X_(j)}(x_i,x_j) = n! / ((i-1)!(j-i-1)!(n-j)!) · [F_X(x_i)]^{i-1} [F_X(x_j) - F_X(x_i)]^{j-i-1} [1 - F_X(x_j)]^{n-j} f_X(x_i) f_X(x_j)

There is an attractive heuristic for these pdfs: the probability f_{X_(i),X_(j)} is a multinomial probability. [F_X(x_i)]^{i-1} is the probability that i - 1 values are less than the ith order statistic; [F_X(x_j) - F_X(x_i)]^{j-i-1} is the probability of the values between the ith and jth order statistics; [1 - F_X(x_j)]^{n-j} is the probability of the values greater than the jth order statistic.

8.3 Example: distribution of the median

We have been calling x̃ the median, i.e. the value such that F_X(x̃) = 1/2. For X1, ..., X_{2n+1} iid random variables, the median is X_(n+1). The distribution is

f_{X_(n+1)}(x_{n+1}) = (2n+1)! / (n! n!) · [F_X(x_{n+1})]^n [1 - F_X(x_{n+1})]^n f_X(x_{n+1})

9 Exchangeable Random Variables

Random variables are called exchangeable if the probability distribution is invariant to permutations, i.e. if

P(X1 ≤ x1, ..., Xn ≤ xn) = P(X_{i1} ≤ x1, ..., X_{in} ≤ xn)

for any permutation (i1, ..., in). The distribution of the sample average X̄ = (1/n) Σ Xi is exchangeable, for example: the distribution does not depend on the order you see the items in the sum.

10 The Moment Generating Function

We've seen the mean and variance as functions of the pdfs of random variables. A generalizing take on functions of the pdf are the so-called Moment Generating Functions. These functions, for our purposes in this class, are complete for the pdf, in that there is a one-to-one correspondence between a Moment Generating Function (MGF) and the probability distribution. That is, in this class, the MGF uniquely determines the pdf: the pdf can be known directly from the MGF.

11 The MGF and Moments

Suppose ∃ an h > 0 such that when t ∈ (-h, h),

M_X(t) = E(e^{tX}) < ∞

We call M_X(t) the Moment Generating Function for X. The existence of the MGF is, in a sense, the existence of the band (-h, h) where e^{tX} is integrable. When t = 0, M_X(0) = 1.

Note:

• Not every distribution has an MGF: such an h > 0 need not exist.
• The uniqueness of the MGF, in this class, guarantees that if X ~ M(t) and Y ~ M(t), then X ~ Y.

11.1 Example

Say X is discrete with MGF

M_X(t) = (1/10) e^t + (2/10) e^{2t} + (3/10) e^{3t} + (4/10) e^{4t}

The uniqueness of the MGF, in this class, yields that f_X(x) = x/10, x = 1, 2, 3, 4.

11.2 Example

Say X ~ Ber(p). Then E(e^{tX}) = p e^{t·1} + (1 - p) e^{t·0} = p e^t + (1 - p).

11.3 Example

Say X ~ Bin(n,p). Then

E(e^{tX}) = Σ_k e^{tk} C(n,k) p^k (1-p)^{n-k} = Σ_k C(n,k) (p e^t)^k (1-p)^{n-k} = (p e^t + 1 - p)^n

11.4 Example

Say X ~ N(0,1). Then M_X(t) = e^{t²/2}. ("In this class": I'm going to stop saying that now. The reason that the MGF does not always uniquely determine the pdf relates to the condition on the integrability of E(e^{tX}), the existence of an h > 0. A generalization of the MGF, the so-called characteristic function E(e^{itX}), is complete for the pdf.)

11.5 Example

Say X ~ Exp(λ). Then

E(e^{tX}) = ∫_0^∞ e^{tx} λ e^{-λx} dx = λ ∫_0^∞ e^{-(λ-t)x} dx = λ/(λ - t), with t < λ

11.6 Moments

The eponymous result is that we can obtain moments directly from M_X. Take the first derivative of M_X(t):

d/dt M_X(t) = d/dt ∫ e^{tx} f(x) dx = ∫ x e^{tx} f(x) dx

This implies that M'_X(0) = E(X) = μ. Take another derivative:

d²/dt² M_X(t) = ∫ x² e^{tx} f(x) dx

which implies that M''_X(0) = E(X²). Thus σ² = M''(0) - [M'(0)]² = E(X²) - μ². In general, M^{(m)}_X(0) = E(X^m): the mth derivative of the MGF, at zero, is the mth moment of X.

Take note:

e^{tX} = 1 + tX + (tX)²/2! + ⋯ + (tX)^k/k! + ⋯

from the Taylor expansion of e^{tX} about 0. Taking an expected value yields

M_X(t) = E(e^{tX}) = 1 + E(X) t + E(X²) t²/2! + ⋯ + E(X^m) t^m/m! + ⋯

so the coefficient of t^m/m! in the Taylor series is the mth moment.

12 Properties of the MGF

12.1 MGF of aX + b

If X ~ M_X(t) and Y = aX + b, then

M_Y(t) = M_{aX+b}(t) = E(e^{(aX+b)t}) = E(e^{aXt} e^{bt}) = e^{bt} E(e^{aXt}) = e^{bt} M_X(at)

12.2 Joint MGF

If X ~ M_X and Y ~ M_Y, then the joint MGF of (X,Y) is

M_{X,Y}(t_X, t_Y) = E(e^{t_X X + t_Y Y})

and the (m,n)th joint moment of X and Y is

E(X^m Y^n) = ∂^{m+n}/(∂t_X^m ∂t_Y^n) M_{X,Y}(t_X, t_Y), evaluated at (0,0)

In general, for X1, ..., Xn, the joint MGF is M_{X1,...,Xn}(t1, ..., tn) = E(e^{Σ_i t_i X_i}).

12.3 MGF of Sums of Independent Random Variables

If X ~ M_X and Y ~ M_Y with X ⊥ Y, then

M_{c1 X + c2 Y}(t) = M_X(c1 t) · M_Y(c2 t)

and in general, for independent random variables X1, ..., Xn and constants c1, ..., cn,

M_{Σ ci Xi}(t) = ∏_i M_{Xi}(ci t)

If X1, ..., Xn are iid, then just write M_{Σ ci Xi}(t) = ∏_i M_X(ci t), or M_{Σ Xi}(t) = [M_X(t)]^n.

12.3.1 Example

Say Y = Σ_{i=1}^n Xi with Xi ~ Exp(λ), iid. Then

M_Y = ∏_{i=1}^n M_{Xi} = ( λ/(λ - t) )^n

This is the MGF for a Γ(n, 1/λ) random variable.
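The moments-from-the-MGF result can be illustrated numerically: differentiate M(t) = λ/(λ - t) at t = 0 by finite differences and compare with E X = 1/λ and E X² = 2/λ². A Python sketch (λ and the step size h are arbitrary choices):

```python
# Moments from the MGF of Exp(lambda): M(t) = lam / (lam - t), t < lam.
# M'(0) = 1/lam (the mean) and M''(0) = 2/lam^2 (the second moment).

lam = 2.0
M = lambda t: lam / (lam - t)

h = 1e-5
first = (M(h) - M(-h)) / (2 * h)             # central difference, ~ M'(0)
second = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # ~ M''(0)

mean = first                   # ~ 1/lam = 0.5
second_moment = second         # ~ 2/lam^2 = 0.5
variance = second - first**2   # ~ 1/lam^2 = 0.25
```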
13 Joint Distribution of X̄ and s²

Let X ~ (μ, σ²), with some unspecified distribution. Let Y = X̄ = (1/n) Σ_i Xi and s² = (1/(n-1)) Σ_i (Xi - X̄)². We know from earlier work that E(X̄) = μ and Var(X̄) = σ²/n. By the Central Limit Theorem we know that lim_{n→∞} X̄_n ~ N(μ, σ²/n). But what about the distribution of s²?

It turns out that we can write

Σ_i (Xi - μ)²/σ² = Σ_i (Xi - X̄)²/σ² + n(X̄ - μ)²/σ²

since Σ_i (Xi - X̄) = 0, so the cross term vanishes. Write W = Σ_i (Xi - μ)²/σ², W1 = n(X̄ - μ)²/σ², and W2 = Σ_i (Xi - X̄)²/σ². Then W = W1 + W2, and (for normal samples) W1 ⊥ W2. So

E(e^{tW}) = E(e^{tW1}) E(e^{tW2})

which is just M_W = M_{W1} · M_{W2}, by independence. But we know that W ~ χ²_n, because it is the distribution of the sum of n squared standardized deviations. We also know that W1 ~ χ²_1, since X̄ ~ N(μ, σ²/n) and we standardize it to get W1. That implies W2 ~ χ²_{n-1}: as it is independent, its MGF must be

M_{W2} = M_W / M_{W1} = (1 - 2t)^{-n/2} / (1 - 2t)^{-1/2} = (1 - 2t)^{-(n-1)/2}

which is χ²_{n-1}. Just one last bit of algebra:

W2 = Σ_i (Xi - X̄)²/σ² = (n - 1)s²/σ²

So the distribution of (n - 1)s²/σ² is χ²_{n-1}.

ISYE 2028 A and B, Lecture 4. Dr. Kobi Abayomi. January 20, 2009.

1 Introduction: Continuous Random Variables

We call a random variable continuous if it has an uncountable number of values: if it can take all values in an interval of values. Examples of continuous random variables: survival time of drinkers of Smoke Cola; time to recidivism for a parolee of the Savings and Loan Scandal; amount of weight lost. Etc., etc.

That the definition of continuous closely matches the version we use in single-variable calculus is natural and should make us feel good. We can extend what we've said already about discrete random variables using Σ's to say analogous things about continuous random variables using ∫'s. Remember that the integral ∫ is just the limit of Σ as the discrete index goes to an infinitesimal (in Leibniz's view of the calculus; now would be a good time to break out your Calc I textbook if you need to).

2 Probability Distribution of a Random Variable

Let's extend the definition of the probability distribution to the continuous case by first restating that the distribution is the complete specification of values of the random variable with assigned probabilities. In the discrete case we could use this heuristic to write down a function or a table. In the continuous case, the distribution of the random variable is explicitly functional.
Here is an explicit definition of a continuous probability distribution, or probability density function (pdf). For X a continuous random variable, the pdf of X is the function f(x) such that

P(a ≤ X ≤ b) = ∫_a^b f(x) dx  (1)

We call f(x) the density curve for X. We can also restate some of our probability rules using this new definition. For X a continuous random variable on the real line with density function f:

• F(x) = ∫_{-∞}^x f(u) du is called the distribution function for X: P(X ≤ x), the probability that the random variable X is less than or equal to x.

• ∫_{-∞}^{∞} f(x) dx = F(∞) - F(-∞) = 1 - 0 = 1. Pay attention to nuance here. The distribution function of X is 1 at infinity; every value of X is less than or equal to infinity. There is an analogous argument for F(-∞) = 0. And I point out that the area under the density curve must equal 1.

• For all x ∈ (-∞, ∞), f(x) ≥ 0.

2.1 Example

Say we have an interval A = {x : 0 ≤ x ≤ 2} where we observe a real-valued random variable X. Say we believe the distribution function is of some form cx², with c a constant. We can immediately determine c, since F(2) = 1 = 4c, so c = 1/4. Then the probabilities for any interval follow; for example,

P(X ≤ 1) = F(1) = 1/4

2.2 Features

The property of the complement yields

P(X > x) = 1 - P(X ≤ x) = 1 - F(x)  (2)

F̄(x) = 1 - F(x) is called the survival distribution for X. The median for a discrete distribution is

x̃ = min{x : F(x) ≥ 0.5}  (3)

For a continuous RV, the median is

x̃ = x s.t. ∫_{-∞}^x f(u) du = .5  (4)

The median is a special case of a p-tile, or percentile, of the distribution of X. In general, the pth p-tile, notated X_p, is

X_p = x s.t. ∫_{-∞}^x f(u) du = p  (5)

3 Functions using probability distributions: Mean and Variance

Here, for continuous r.v.'s, the mean μ and variance σ² are parameters which we calculate from the probability density function. For a continuous r.v. we weight each value by its probability f(x) and use the integral for calculation.

3.1 Population Mean

Given a random variable X, defined on the real line R¹ = (-∞, ∞), the expectation

μ = ∫ x f(x) dx = E(X)  (6)

is the mean or expectation or expected value of X.

Example: Say X ~ f(x) = 3x², 0 < x < 1. Then

E(X) = ∫_0^1 x · 3x² dx = 3/4

In general, for any function h,

E(h(X)) = ∫ h(x) f(x) dx  (7)

Example: Say X ~ f(x) = 3x², 0 < x < 1. Then

E(2X) = ∫_0^1 2x · 3x² dx = 2 · 3/4 = 3/2

Additionally, this equation (known as the layered representation) holds for non-negative random variables:

E(X) = ∫_{R+} (1 - F(x)) dx  (8)

3.2 Population Variance

σ² = E[(X - μ)²] = ∫ (x - μ)² f(x) dx = Var(X)  (9)

Again, this can be reduced to

σ² = Var(X) = E(X²) - μ²  (10)

Example: Say X ~ f(x) = 3x², 0 < x < 1. Then

Var(X) = ∫_0^1 x² · 3x² dx - (3/4)² = 3/5 - 9/16 = 3/80

4 General Joint Distributions

Two given random variables X and Y have a general joint distribution that is an extension of the single-variable definition. In the discrete case,

P(X = x, Y = y) = p(x,y)  (11)

the joint probability mass function. In the continuous case,

P((X,Y) ∈ A) = ∬_A f(x,y) dx dy  (12)

We generate the marginal distributions for X and Y alone, just as we did for contingency tables, by summing over all values of the other variable:

p_X(x) = Σ_y p(x,y), p_Y(y) = Σ_x p(x,y)  (13)

f_X(x) = ∫ f(x,y) dy, f_Y(y) = ∫ f(x,y) dx  (14)

Two random variables are independent if

p_{X,Y}(x,y) = p_X(x) p_Y(y)  (15)

or

f_{X,Y}(x,y) = f_X(x) f_Y(y)  (16)

This is directly analogous to the general probability rules in lectures 2 and 3. The conditional probability mass and density functions are then just

p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y)  (17)

f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y)  (18)

5 Means and Variances of Linear Transforms; General forms for Joint Distributions

I may have said this already: a linear transform is any aX + b of X. Look back in Lectures 1 and 3 for a refresher on summation notation and transforms. The linear property of the expectation holds for continuous random variables. If X has mean μ_x and a, b are constants (known), then

E(aX + b) = ∫ (ax + b) f(x) dx = a ∫ x f(x) dx + b ∫ f(x) dx = a E(X) + b  (19)

If Y has mean μ_y and c, d are constants (known) as well, then a natural extension of the single-variable general case is

E(aX + b + cY + d) = a μ_x + b + c μ_y + d  (20)

Same as the discrete case. Notice the variance for the general linear transform aX + b:

Var(aX + b) = a² E(X²) - a² μ_x² = a² Var(X)  (21)

since E[(aX + b)²] - [E(aX + b)]² = a²E(X²) + 2ab μ_x + b² - (a² μ_x² + 2ab μ_x + b²) = a²E(X²) - a² μ_x². Again, same as the discrete case.
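The expectation and variance integrals above can be sketched numerically; a Python toy with the density f(x) = 2x on (0,1) (the density and grid size are my choices for illustration), where E X = 2/3 and Var X = 1/18:

```python
# E X = integral of x f(x) dx and Var X = E X^2 - (E X)^2,
# computed by midpoint Riemann sums for the toy density f(x) = 2x on (0, 1).

def f(x):
    return 2.0 * x

def integrate(g, a=0.0, b=1.0, n=100_000):
    # midpoint rule on [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

total = integrate(f)                        # ~ 1: f is a valid pdf
mean = integrate(lambda x: x * f(x))        # ~ 2/3
second = integrate(lambda x: x * x * f(x))  # ~ 1/2
variance = second - mean**2                 # ~ 1/2 - 4/9 = 1/18
```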
6 Expectation and Covariance

The main idea is that expectation is a linear operator, and that the expectation of a function is the expectation taken over the values of the function. In a natural extension to two dimensions:

E(g(X,Y)) = Σ_{x,y} g(x,y) p(x,y)  (discrete)

E(g(X,Y)) = ∬ g(x,y) f(x,y) dx dy  (continuous)

6.1 Example

We get the moments we use in calculation of mean and variance, etc., by choosing the function we take an expectation of:

g1(X) = X gives E(g1(X)) = μ_X

g2(X,Y) = (X - μ_X)(Y - μ_Y) gives E(g2(X,Y)) = E(XY) - μ_X μ_Y

g3(Y) = (Y - μ_Y)² gives E(g3(Y)) = Var(Y)

7 Covariance

Let g(X,Y) = (X - μ_X)(Y - μ_Y). Then

E(g(X,Y)) = E[(X - μ_X)(Y - μ_Y)] = E(XY - μ_X Y - μ_Y X + μ_X μ_Y) = E(XY) - μ_X E(Y) - μ_Y E(X) + μ_X μ_Y = E(XY) - μ_X μ_Y

This expectation has a special name, the covariance of X, Y. So the covariance of X, Y is

Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)] = E(XY) - μ_X μ_Y  (22)

7.1 Properties of Covariance

7.1.1 Covariance can be negative

Cov(X,Y) ∈ R. Note that Var(X) ≥ 0.

7.1.2 Independence implies zero covariance

If X ⊥ Y then E(XY) = μ_X μ_Y, so Cov(X,Y) = μ_X μ_Y - μ_X μ_Y = 0. But:

7.1.3 Zero covariance does not imply independence

The fact here is that Cov(X,Y) = 0 does not imply X ⊥ Y. For an example, take X uniform on {-1, 0, 1} and Y = X². Thus E(X) = 0 and E(XY) = E(X³) = 0, so Cov(X,Y) = 0, but Y is obviously a function of X. In general, for many Y = g(X) where g is symmetric about zero, for instance, Cov(X,Y) = 0 but X is, of course, not independent of Y = g(X).

7.1.4 Covariance is symmetric

Cov(X,Y) = Cov(Y,X).

7.1.5 Cov(X,X) = Var(X)

Cov(X,X) = E[(X - μ_X)(X - μ_X)] = Var(X).

7.2 Covariance of linear transforms

Cov(aX + b, Y) = a Cov(X,Y)

The verification of this is an exercise.

7.3 Covariance of a sum is the sum of covariances

This is a generalization of the above:

Cov(Σ_i Xi, Σ_j Yj) = Σ_i Σ_j Cov(Xi, Yj)

The verification of this is an exercise, too.

7.4 Variance of a sum of non-independent random variables

Recall: if X1, ..., Xn ~ f_X are independent, then Var(Σ_i Xi) = Σ_i Var(Xi). But say X1, ..., Xn are not independent, i.e. Cov(Xi, Xj) ≠ 0. Then

Var(Σ_i Xi) = Cov(Σ_i Xi, Σ_j Xj) = Σ_i Σ_j Cov(Xi, Xj) = Σ_i Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)

and for a linear transformation,

Var(Σ_i ai Xi + b) = Σ_i ai² Var(Xi) + 2 Σ_{i<j} ai aj Cov(Xi, Xj)

8 Correlation Coefficient

The number ρ, which we introduced as a parameter to the multivariate normal distribution, is called the correlation coefficient.
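The 7.4 identity Var(X + Y) = Var X + Var Y + 2 Cov(X, Y) holds exactly for any finite population with equal weights; a Python check on arbitrary made-up values:

```python
# Population check of Var(X + Y) = Var X + Var Y + 2 Cov(X, Y)
# on a small finite population with equal weights (values are arbitrary).

xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 2.0]
n = len(xs)

def mean(v):
    return sum(v) / n

def cov(u, v):
    # population covariance; cov(u, u) is the population variance
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

sums = [a + b for a, b in zip(xs, ys)]
var_sum = cov(sums, sums)                            # Var(X + Y)
identity = cov(xs, xs) + cov(ys, ys) + 2 * cov(xs, ys)
```

For these values both sides equal 7.6875, and swapping in any other data keeps the two sides equal.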
The correlation coefficient is

ρ = Cov(X,Y) / (σ_X σ_Y) = E[(X - μ_X)(Y - μ_Y)] / sqrt( E[(X - μ_X)²] E[(Y - μ_Y)²] )  (23)

Fact (a version of the Cauchy-Schwarz inequality):

|E[(X - μ_X)(Y - μ_Y)]| ≤ sqrt( E[(X - μ_X)²] E[(Y - μ_Y)²] )

so ρ ∈ [-1, 1]. Notice E(XY) = μ_X μ_Y + ρ σ_X σ_Y, since Cov(X,Y) = ρ σ_X σ_Y.

8.1 Properties of ρ

Let X, Y ~ f_{X,Y}, with the conditional distribution of Y | X = x written f_{Y|X}. Then

E(Y | X = x) = ∫ y f_{Y|X}(y|x) dy = ∫ y [f(x,y)/f_X(x)] dy

Remember that the expected value of Y given X = x is a random variable depending upon the observed value of X = x. Say this expected value is a linear function; set E(Y | X = x) = a + bx. Let's derive a general result for the conditional expectation when it is constrained to be a linear function, i.e. let's solve for the constants a and b.

If we integrate both sides of this equation with respect to f_X(x) dx, we get

μ_Y = a + b μ_X

Now multiply both sides by x and integrate with respect to f_X(x) dx; this yields

E(XY) = a μ_X + b E(X²)

Realizing that E(X²) = σ_X² + μ_X², and solving the two above equations for the two unknowns a and b, the result is

E(Y | X = x) - μ_Y = ρ (σ_Y/σ_X)(x - μ_X)  (24)

NB: This is the same as the conditional expectation for the bivariate normal distribution. This suggests a role for the normal distribution in linear conditional expectation. Notice that in equation (24) the expectation is simply μ_Y if ρ = 0. For the bivariate normal distribution, this is equivalent to X ⊥ Y. Moreover, if Y = aX + b, then Cov(X,Y) = a σ_X² and

ρ = a σ_X² / (σ_X · |a| σ_X) = sgn(a) · 1 = ±1

8.2 Variance Again

This is important: a general equation for the variance of linear transforms aX + bY:

Var(aX + bY) = E[(aX + bY)²] - [E(aX + bY)]²
= E(a²X² + 2abXY + b²Y²) - (a μ_X + b μ_Y)²
= a²E(X²) + b²E(Y²) + 2ab E(XY) - a² μ_X² - b² μ_Y² - 2ab μ_X μ_Y
= a² Var(X) + b² Var(Y) + 2ab Cov(X,Y)  (25)

9 Canonical Continuous Random Variables

We use continuous random variables in situations where we have uncountably many values, like X ≡ Gross National Product of a nation, or where all values on an interval are available, like the weight of a gnat in decigrams. (Both these situations are mathematically equivalent: even the smallest intervals have uncountably many values. You can make (0,1) look as big as (0,1000) or (0,.001) if you take small enough partitions.) Continuous random variables, just like discrete random variables, are completely determined by the probability distribution. Remember, for discrete random variables we could write P(X = k) for the probability that the random variable X equals the number k.
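The ρ = ±1 fact for an exact linear relation Y = aX + b can be checked on any finite set of values; a Python sketch (the data and the constants a, b are arbitrary, with a = -3 giving correlation exactly -1):

```python
# For Y = aX + b with a < 0, the correlation coefficient is sgn(a) = -1.
import math

xs = [1.0, 2.0, 4.0, 8.0]
a, b = -3.0, 2.0
ys = [a * x + b for x in xs]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)   # -1 up to rounding
```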
With continuous random variables, P(X = k) is always equal to zero. (Computer packages that return nonzero numbers for P(X = k) really mean P(k - ε ≤ X ≤ k + ε), with ε very small; what the computer is returning is the height of the density curve at X = k.) A probability density function for a continuous random variable will often look something like

f(x) = some function of x, dependent upon the parameters

When we use continuous random variables as models, we need to state the capital letter for the random variable and its distribution with the associated parameters. We've talked implicitly about the Normal distribution; we'll continue with these canonical distributions in Lecture 7. For now:

10 Uniform Distribution

A random variable is said to be uniformly distributed over the interval (a, b) if the pdf is

f(x) = 1/(b - a), a < x < b; 0 otherwise  (26)

Notice that

P(α ≤ X ≤ β) = ∫_α^β 1/(b - a) dx = (β - α)/(b - a)

This model is used where it is reasonable to assume that the probability of X falling in an interval is proportional to the length of the interval.

ISYE 2028 A and B, Lecture 6: Canonical Continuous Random Variables and some brief results. Dr. Kobi Abayomi. February 10, 2009.

1 Introduction: The Normal Distribution, Normal Data

Remember that continuous data is data that can assume an uncountable number of values. We cannot list in a frequency table, for example, the possible values of a continuous variable: the list would be infinitely long. If you think for just a second, you can probably (that's supposed to be funny) convince yourself that the relative frequency of any one value of a continuous variable is very low when there are many observations. For a continuous variable we only determine the relative frequency for a range of values. Remember the histogram and how we bin values for discrete distributions.
1.1 What's normal?

Let's use a toy example: say we have the discrete distribution of clown shoe sizes. Let's say this is the distribution:

Shoe Size | Count | Relative Frequency | Cumulative Relative Frequency
17 | 1 | 1/7 | 1/7
18 | 1 | 1/7 | 2/7
19 | 1 | 1/7 | 3/7
20 | 2 | 2/7 | 5/7
22 | 1 | 1/7 | 6/7
24 | 1 | 1/7 | 7/7

These data are discrete, and the relative frequency sums to one. Let's look at an illustration of the distribution of this data.

[Figure 1: Histogram of observed shoe size. Remember that the area of the histogram sums to one when the y-axis is density.]

Notice how the observed data are binned. For this example, we'll use R:

x <- c(22, 24, 19, 17, 20, 18, 20)
hist(x, breaks = 7, col = "red", density = 20,
     main = "Histogram of Observed Shoe Size", freq = FALSE)
mean(x)

The mean of our observed data, the shoe sizes, is x̄ = 20. What if, for some reason (indulge this example), we could only observe some of the data, never all of it at once? Say we can take only four observations at a time.

For example, say we observe x1 = (22, 19, 18, 20). An estimate of the mean, or sample mean x̄ = n^{-1} Σ xi, based on this subsample is x̄1 = 19.75. If we could not observe all of the data at once, we might want to resample again and again and get x̄2, x̄3, x̄4, etc., etc. It is reasonable to expect the mean of these resampled means to be very close to x̄ = 20. As we take more and more resamples, we can say that, almost surely, the mean of the sample means is 20. Let's illustrate this using R:

samplesize <- 4      # here I'm going to take a sample of size 4 from the clown shoe sizes
numsamples <- 12500  # here I'm going to take a lot of these samples, over and over

# first, a place to put all of these samples so we can
# compute means and graph the results
xmatrix <- matrix(0, nrow = numsamples, ncol = samplesize)

# a simple loop to take the sample and put it in our new data table
for (i in 1:numsamples) {
  xmatrix[i, ] <- sample(x, samplesize)
}

# a graph of the results
hist(apply(xmatrix, 1, mean), breaks = 20)

These resampled means, which are now continuous variables (well, not really, since the number of samples of size 4 is finite), have a distribution of their own, illustrated by the histogram.
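If R isn't at hand, the same resampling loop can be mirrored in Python; a sketch (the seed is an arbitrary choice; `random.sample` draws without replacement, like R's `sample()` default):

```python
# Resample means of 4 shoe sizes (without replacement) and check that
# the mean of the sample means is close to the data mean, 20.
import random

random.seed(4)
shoe_sizes = [22, 24, 19, 17, 20, 18, 20]
num_samples, sample_size = 12_500, 4

means = []
for _ in range(num_samples):
    sub = random.sample(shoe_sizes, sample_size)
    means.append(sum(sub) / sample_size)

grand_mean = sum(means) / num_samples   # close to mean(shoe_sizes) = 20
```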
The distribution appears to be unimodal, symmetric, and with an arithmetic center at about 20. As we expected.

[Figure 2: Histogram of means of samples of observed shoe size. Remember that the area of the histogram sums to one when the y-axis is density.]

The relative frequency that the resampled mean is between 19.2 and 20 is approximately .2(.44) + .2(.42) + .2(.70) + .2(.41) ≈ .394.

We could restate the distribution of these sample means using a frequency table with intervals. We are now looking at continuous data; the number of possible values is unlimited, and it is now appropriate to characterize the distribution using intervals.

Interval | Height of Rectangle | Base × Height = Rel. Frequency | Cum. Rel. Frequency
18.4 < x̄ ≤ 18.6 | .29 | .2 × .29 = .058 | .058
18.6 < x̄ ≤ 18.8 | .13 | .2 × .13 = .026 | .084
18.8 < x̄ ≤ 19.0 | .22 | .2 × .22 = .044 | .128
19.0 < x̄ ≤ 19.2 | 0 | .2 × 0 = 0 | .128
19.2 < x̄ ≤ 19.4 | .44 | .2 × .44 = .088 | .216
19.4 < x̄ ≤ 19.6 | .42 | .2 × .42 = .084 | .300
19.6 < x̄ ≤ 19.8 | .70 | .2 × .70 = .140 | .440
19.8 < x̄ ≤ 20.0 | .41 | .2 × .41 = .082 | .522

The short story here is: we have data with a continuous distribution, and we can estimate frequencies using the areas in the histogram.

With what relative frequency is the resampled mean between 19.4 and 20.0? With what relative frequency is the resampled mean between 18.6 and 19.2? With what relative frequency is the resampled mean below 18.8? With what relative frequency is the resampled mean above 21.2? Which value of the resampled mean is greater than about 30 percent of the data?

Notice that we are taking the height of the histogram times the width of the bars (the area under the pdf) to approximate the relative frequency. It turns out that we could repeat this procedure for all sorts of distributions and find that the distribution of the sample mean would look similar. We talk about the Central Limit Theorem a little bit later; it gives us the result that the sample mean is normally distributed.
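The questions above can be answered by summing base × height over the table's bins; a Python sketch using the tabulated heights (bin width .2):

```python
# Relative frequencies from the frequency table: base (0.2) times height,
# summed over the bins that fall inside the interval of interest.
heights = {
    (18.4, 18.6): 0.29, (18.6, 18.8): 0.13, (18.8, 19.0): 0.22,
    (19.0, 19.2): 0.00, (19.2, 19.4): 0.44, (19.4, 19.6): 0.42,
    (19.6, 19.8): 0.70, (19.8, 20.0): 0.41,
}

def rel_freq(lo, hi):
    # sum base * height over bins fully contained in [lo, hi]
    return sum(0.2 * h for (a, b), h in heights.items() if lo <= a and b <= hi)

between_194_200 = rel_freq(19.4, 20.0)  # .2(.42 + .70 + .41) = .306
below_188 = rel_freq(18.4, 18.8)        # .2(.29 + .13) = .084
```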
[Figure 3: Histogram of means of samples of observed shoe size, with a graph of the approximating function, the normal curve. Remember that the area of the histogram sums to one when the y-axis is density; the area under the normal curve sums to 1.]

In R we can generate this graph:

mu <- mean(apply(xmatrix, 1, mean))
sigma <- sqrt(var(apply(xmatrix, 1, mean)))

hist(apply(xmatrix, 1, mean), breaks = 20, col = "red", density = 20,
     main = "12500 samples of 4 shoe sizes in each sample",
     xlab = "", freq = FALSE, xlim = c(16, 24))
par(new = TRUE)
curve(1 / (sigma * sqrt(2 * 3.14)) * exp(-.5 * ((x - mu) / sigma)^2),
      16, 24, xlim = c(16, 24), ylab = "", main = "", lwd = 3)

2 Normally distributed Random Variables

When we write X ~ N(μ, σ²) we say that "X is normally distributed with mean parameter mu and variance parameter sigma squared." The parameters for the Normal distribution are the mean μ and the variance σ². The pdf for any Normal random variable X with parameters (μ, σ²) is

f(x) = (2πσ²)^{-1/2} exp( -(x - μ)² / (2σ²) )  (1)

the density function for the normal random variable. This distribution, the Normal Distribution, is used as a model for continuous data that we believe to be unimodal and roughly symmetric. You see here that it arises naturally as the distribution of sample averages, via the Central Limit Theorem. It arises naturally, and often, in many circumstances. So if someone writes N(5, 25), you know to refer to a Normal distribution of mean 5 and variance 25 (standard deviation 5).

We use a trick to check that f is in fact a pdf. (Without loss of generality, let X ~ N(0,1).) We need to check that

∫_R f(x) dx = ∫_R (2π)^{-1/2} exp(-x²/2) dx = 1

Look at the square of this integral, with a substitution in the second product:

( ∫_R (2π)^{-1/2} e^{-x²/2} dx ) × ( ∫_R (2π)^{-1/2} e^{-y²/2} dy )

This yields

(1/2π) ∬_{R×R} e^{-(x²+y²)/2} dx dy

Use polar coordinates for a substitution: x = r cos θ, y = r sin θ, thus dx dy = r dr dθ and x² + y² = r². Then rewrite the integral:

(1/2π) ∫_0^{2π} ∫_0^∞ r e^{-r²/2} dr dθ
2.1 Normality of Transforms of type Y = aX + b

It turns out that if X ~ N(mu, sigma^2) and Y = aX + b, with a, b both constants, then Y ~ N(a mu + b, a^2 sigma^2). Briefly:

F_Y(y) = P(Y <= y) = P(aX + b <= y) = P(X <= (y - b)/a) = F_X((y - b)/a).

Then taking the derivative, dF_X = f_X:

f_Y(y) = f_X((y - b)/a) (1/a) = (1/(sqrt(2 pi) a sigma)) exp( -(y - (b + a mu))^2 / (2 a^2 sigma^2) ).

2.2 Standard Normal

The Standard Normal random variable is a special case: we set mu = 0 and sigma^2 = 1. We usually write Z ~ N(0, 1), reserving Z as the special letter for the standard normal random variable. This is a rough, rule-of-thumb style, take on the p-tiles of Z:

P(-1 <= Z <= 1) is about .66; P(-2 <= Z <= 2) is about .95; P(-3 <= Z <= 3) is about .99;
P(-inf <= Z <= 0) = .5 = P(0 <= Z <= inf); P(-inf <= Z <= inf) = 1.

2.2.1 Normal Approximation to Binomial

If np > 30 we often approximate the discrete Binomial distribution with the Normal model by setting

Z = (X - np +/- .5) / sqrt(np(1 - p)).

Then Z ~ N(0, 1), as usual. (The +/- .5 is the continuity correction.)

3 Introduction: Non-normal Random Variables

We have seen that the normal distribution arose naturally, via the Central Limit Theorem (to be covered in Chapter 8), as the distribution of sample averages. We use the normal distribution as a model for symmetric, unimodal probability distributions. There are many cases where the normal distribution is not applicable. Here we introduce three asymmetric continuous distributions.

4 Gamma Distribution

The Gamma Distribution (use Γ as the capital Greek Gamma) arises as a continuous extension of the Poisson distribution. Remember that the Poisson distribution was a large sample size n, small probability p, limiting approximation for the Binomial Distribution. Remember that the Binomial distribution asked how many successes occur in a fixed number of Bernoulli(p) trials. Remember that the Bernoulli distribution is the fundamental engine for all of probability: something happens with probability p, or it doesn't, with probability 1 - p. From the Bernoulli, we use counting principles (combinations and permutations, the factorial notation) and build up to the Poisson.

4.1 Continuous Time Factorial

We know that n! = n(n-1)(n-2)(n-3)...1; this is the number of distinct arrangements of n distinct objects, taken all at a time.
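The extension of the factorial that follows can be checked with the standard library's gamma function (a Python sketch; the non-integer argument 4.3 is an arbitrary choice of mine for checking the recursion):

```python
import math

g5 = math.gamma(5)                         # Gamma(5) = 4! = 24
g_half = math.gamma(0.5)                   # Gamma(1/2) = sqrt(pi)
ratio = math.gamma(4.3) / math.gamma(3.3)  # Gamma(z)/Gamma(z-1) = z - 1

print(g5, g_half, ratio)
```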
This notation and definition works well when n is an integer (real number). An extension of the factorial definition to z, any complex number, is the Gamma Function, defined as

Γ(z) = Int_0^inf t^(z-1) e^(-t) dt = (z - 1)!

(Remember z = a + bi, where a and b are real numbers and i = sqrt(-1); then z is a number on the plane defined by the real and complex axes. This all arises from the iterated differentiability of e^t. A brief "proof," by integration by parts: z! = Int_0^inf t^z e^(-t) dt = [-t^z e^(-t)]_0^inf + z Int_0^inf t^(z-1) e^(-t) dt = z (z-1)!.)

Here are a few facts about the Gamma Function, some following naturally from its definition as the factorial for integer (real) arguments:

- Γ(z) = (z - 1) Γ(z - 1)
- When n is an integer, Γ(n) = (n - 1)!
- Γ(1) = 1, Γ(1/2) = sqrt(pi)
- Γ(2) = Γ(1) = 1

4.2 General Form of the Gamma Distribution

Let's change a variable: x = beta t, so dx = beta dt, i.e. dt = dx/beta. Then

Γ(alpha) = Int_0^inf (x/beta)^(alpha - 1) e^(-x/beta) (1/beta) dx.

Rearranging things (and calling z = alpha):

1 = Int_0^inf x^(alpha - 1) e^(-x/beta) / (beta^alpha Γ(alpha)) dx.

Here is our definition of the pdf for a non-negative random variable X. We say X has a Gamma distribution with parameters alpha and beta, X ~ Gamma(alpha, beta):

f(x) = x^(alpha - 1) e^(-x/beta) / (beta^alpha Γ(alpha)), x >= 0.

The Gamma distribution arises as the distribution of waiting times until (between) a number of Poisson distributed events, each with parameter lambda. We set beta = 1/lambda.

4.2.1 Proof / interpretation of the Gamma

Say we have X := the random time needed to see exactly k Poisson events, each event U distributed Poi(lambda t). Then the cumulative distribution function for X is

P(X <= x) = 1 - P(X > x) = 1 - P(fewer than k Poisson events in an x-length interval), with U ~ Poi(lambda x),

since the length of the whole interval is x and the rate is lambda. I'll abbreviate this next part a bit, but the gist is (take the antiderivative of u^(k-1) e^(-u) with respect to u and you will see):

P(need x length of time to see exactly k Poisson events) = Int_{lambda x}^inf u^(k-1) e^(-u) / Γ(k) du.

Now substitute u = lambda v and remember our parameterizations, alpha = k and beta = 1/lambda:

F_X(x) = Int_0^x u^(alpha - 1) e^(-u/beta) / (beta^alpha Γ(alpha)) du,

which is just our cumulative distribution function for the Gamma. So, our interpretation of the Gamma is that if X ~ Gamma(alpha = k, beta = 1/lambda), then X is the amount of time to wait until k Poisson events happen, each with parameter lambda.
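A numeric sanity check on the Gamma pdf (a Python sketch; the values alpha = 3, beta = 2 are arbitrary illustrative choices of mine, and the integration is a simple midpoint rule truncated at x = 200, where the remaining tail mass is negligible):

```python
import math

alpha, beta = 3.0, 2.0

def gamma_pdf(x):
    # f(x) = x^(alpha-1) e^(-x/beta) / (beta^alpha * Gamma(alpha))
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

n, a, b = 200000, 0.0, 200.0
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]
mass = sum(gamma_pdf(x) for x in xs) * h      # should be ~1 (it is a pdf)
mean = sum(x * gamma_pdf(x) for x in xs) * h  # should be ~alpha * beta = 6
print(round(mass, 4), round(mean, 4))
```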
I'll just tell you: E(X) = mu_X = alpha beta and Var(X) = sigma_X^2 = alpha beta^2. The Gamma distribution is useful because of its flexibility; many non-negative processes can be modelled.

[Figure 4: The pdf for a Gamma distribution, with beta fixed and alpha increasing as we change from black to green.]

[Figure 5: The pdf for a Gamma distribution, with alpha fixed and beta increasing (thus the rate of each Poisson, lambda = 1/beta, is decreasing) as we change from black to green.]

5 Exponential Distribution

A special case of the Gamma distribution is when alpha = k = 1 and beta = 1/lambda; we call this the Exponential distribution, X ~ Exp(lambda):

f(x) = lambda e^(-lambda x), x >= 0.

The above is the pdf for the exponential distribution. The cumulative distribution function is F(x) = 1 - e^(-lambda x). So then, for X ~ Gamma(alpha = 1, beta = 1/lambda), we say X ~ Exp(lambda), so E(X) = alpha beta = 1/lambda and Var(X) = alpha beta^2 = 1/lambda^2. We've seen this distribution before.

One reason the Exponential distribution is so important is its "memoryless" property. That is, the distributions of successive intervals of waiting time are independent of past waiting time. To see this, take X ~ Exp(lambda) and look at the probability one has to wait s units of time to see a Poisson arrival, given that t_0 units of time have expired. That is

P(X >= t_0 + s | X >= t_0) = P(X >= t_0 + s, X >= t_0) / P(X >= t_0)
= (1 - (1 - e^(-lambda(t_0 + s)))) / (1 - (1 - e^(-lambda t_0)))
= e^(-lambda(t_0 + s)) / e^(-lambda t_0)
= e^(-lambda s)
= P(X >= s).

6 Chi-Squared Distribution

If we take X ~ Gamma(alpha = r/2, beta = 2), then

F(x) = Int_0^x t^(r/2 - 1) e^(-t/2) / (Γ(r/2) 2^(r/2)) dt.

This Gamma(r/2, 2) is the cumulative distribution for a strictly non-negative random variable that we call "Chi-squared." In notation we say X ~ chi^2(r), or "X has a chi-square distribution with r degrees of freedom." This distribution is widely used and is tabled in the back of most introductory statistics textbooks. There is a specific use of the Chi-squared distribution which illustrates its wide application.

6.0.2 Squared Deviations are Chi-Squared distributed

Take X ~ N(mu, sigma^2). We know we can "standardize" it by setting Z = (X - mu)/sigma, Z ~ N(0, 1), which you should be able to verify. Z is an absolute measure of the deviation of X, scaled by sigma. We know that, on average, E(Z) = 0, which may obscure the magnitude of many deviations: some will be positive, others negative; they will cancel.
When it is important, we construct Z^2 = (X - mu)^2 / sigma^2 as a measure of the squared deviation. Set V = Z^2 and investigate the distribution of the squared deviation:

F_V(v) = P(V <= v) = P(-sqrt(v) <= Z <= sqrt(v)) = 2 Int_0^sqrt(v) (1/sqrt(2 pi)) e^(-z^2/2) dz.

Let z = sqrt(t); then

F_V(v) = Int_0^v (1/sqrt(2 pi)) t^(-1/2) e^(-t/2) dt,

which looks suspiciously like a Gamma(1/2, 2) distribution. Let's look at F' = f, the density function:

f(v) = v^(1/2 - 1) e^(-v/2) / (Γ(1/2) 2^(1/2)).

It is the Chi-Squared distribution (with 1 degree of freedom). So the square of a standard normal random variable is Chi-squared distributed.

7 A few Inequalities and Limits

When you know the probability distribution F_X of a random variable X, you know everything about the random variable. The limits and inequalities here hold with minimal assumptions: that an F_X exists, that Int x dF_X < inf, etc. The results are general; the inequalities induce limits which are inferior to those when the particular F_X is known; the asymptotic results are presented without convergence rates.

7.1 Markov's Inequality; Tchebyshev's Inequality

Given a random variable X > 0, f_X = dF_X, Int x dF_X < inf, we can write

P(X >= alpha) = Int_{x >= alpha} dF_X(t).

Which implies

alpha P(X >= alpha) <= Int_{x >= alpha} x dF_X(t),

since alpha <= X on that set. Which implies

alpha P(X >= alpha) <= Int x dF_X = E(X).

Which yields Markov's Inequality:

P(X >= alpha) <= E(X)/alpha = mu_X/alpha.

Now substitute (X - mu)^2 for X. Then

P((X - mu)^2 >= alpha^2) <= E((X - mu)^2) / alpha^2,

yielding

P(|X - mu_X| >= alpha) <= Var(X)/alpha^2,

which is Tchebyshev's Inequality.

7.1.1 Example

Let X have E(X) = 50, Var(X) = 25. Then:

- P(X > 75) <= 50/75 = 2/3 (Markov).
- P(40 < X < 60) = P(|X - 50| < 10) >= 1 - Var(X)/10^2 = 1 - 25/100 = 3/4 (Tchebyshev).

NB: The Tchebyshev and Markov bounds are not tight; knowledge of the particular distribution gives an exact result, where these give only a bound.

7.1.2 Example

Let X ~ U(0, 10). Then E(X) = 5, Var(X) = 25/3. By Tchebyshev's Inequality,

P(|X - 5| > 4) <= (25/3)/4^2, which is about .52,

but, by using the exact distribution,

P(|X - 5| > 4) = .2.

7.2 Weak Law of Large Numbers

Given X_1, ..., X_n iid with E(X) = mu. We know from algebra that E(X-bar) = mu and Var(X-bar) = sigma^2/n. Using Tchebyshev's inequality yields

P(|X-bar - mu| >= epsilon) <= sigma^2/(n epsilon^2), for epsilon > 0.

Then, in the limit,

lim_{n -> inf} P(|X-bar - mu| >= epsilon) -> 0;

the Weak Law of Large Numbers states that the probability that the distance between the sample mean and the true mean exceeds any positive amount goes to zero as n goes to infinity.
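The uniform example above, in code (a Python sketch; the numbers are the document's: X ~ U(0, 10), E(X) = 5, Var(X) = 25/3, deviation threshold 4):

```python
# X ~ Uniform(0, 10): E(X) = 5, Var(X) = 25/3.
var_x = 25.0 / 3.0

# Tchebyshev bound for P(|X - 5| > 4): Var(X) / 4^2.
bound = var_x / 4 ** 2

# Exact probability: P(X < 1) + P(X > 9) = .1 + .1 = .2.
exact = (1 - 0) / 10 + (10 - 9) / 10

print(round(bound, 3), exact)  # 0.521 0.2
```

The bound holds, but it is far from tight, which is the point of the example.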
7.3 The Central Limit Theorem

The CLT is a result on the distribution of the sample mean. The assumptions: given X_1, ..., X_n ~ (mu, sigma^2) iid, let Z_n = (X-bar - mu)/(sigma/sqrt(n)). Then

P(Z_n <= z) -> Int_{-inf}^z (1/sqrt(2 pi)) e^(-s^2/2) ds = Phi(z).

7.4 The Strong Law of Large Numbers

Given X_1, ..., X_n iid with E(X) = mu = 0 and E(X^4) = K < inf. Let S_n = Sum_i X_i. Working with E(S_n^4), the expansion of the sum is

E(S_n^4) = E( Sum_i X_i^4 + c_1 Sum_{i != j} X_i^3 X_j + c_2 Sum_{i != j} X_i^2 X_j^2 + c_3 Sum_{i != j != k} X_i^2 X_j X_k + c_4 Sum_{i != j != k != l} X_i X_j X_k X_l ).

But X_i is independent of X_j, so E(X_i^3 X_j) = E(X_i^2 X_j X_k) = E(X_i X_j X_k X_l) = 0. Only the terms X_i^4 and X_i^2 X_j^2 remain:

E(S_n^4) = n E(X^4) + 6 C(n,2) E(X_i^2 X_j^2) <= nK + 3n(n-1)K.

(NB: Since Var(X_i^2) >= 0, E(X_i^2 X_j^2) = (E(X_i^2))^2 <= E(X^4) = K.) Dividing by n^4 gives

E(S_n^4 / n^4) <= K/n^3 + 3K/n^2.

Taking the limit as n goes to infinity gives

lim_{n -> inf} E(S_n^4 / n^4) = 0

(in fact the bound is summable over n, which is the step that upgrades the convergence), and this yields the Strong Law of Large Numbers:

S_n / n -> 0 = mu almost surely,

which is just that the sample mean converges to the true mean in the limit.

Remark: The WLLN states that the sample mean is likely to be near the true mean, but large values of the sample mean can occur. The SLLN, with the additional moment assumption used above, states that the sample mean converges to the true mean exactly.

7.5 Jensen's Inequality

We call a function phi convex if, for 0 <= epsilon <= 1, phi(epsilon x + (1 - epsilon) y) <= epsilon phi(x) + (1 - epsilon) phi(y).

ISYE 2028 A and B, Lecture 18. Dr. Kobi Abayomi. April 25, 2009.

1 Introduction: Bayesian Statistics

The Bayesian approach augments frequentist procedures by including prior information about the parameter of interest, theta. Consider X ~ f_theta(x), say X ~ N(mu, sigma^2). In the frequentist approach, mu, sigma^2 are constants which we estimate, say via the likelihood lik(x|theta) = Prod_i f(x_i|theta), for example. We derive estimates using the likelihood of the data, or the sampling distribution. Common estimates are mu-hat = x-bar and sigma^2-hat = s^2.

In the Bayesian approach, theta (say mu, sigma^2) is an instance of a random variable with a pdf, say pi(theta), and now we derive estimates using the additional randomness of pi(theta), via Bayes' Equation:

pi(theta|x) = f(x|theta) pi(theta) / g(x).

In this setup:

X ~ f(x|theta);
pi(theta) := the prior distribution for theta;
f(x|theta) := the likelihood, the probability of the data given theta;
g(x) := the marginal distribution of x_1, ..., x_n;
pi(theta|x) := the posterior distribution for theta.
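Bayes' Equation can be sketched numerically with a discrete two-point prior (a Python sketch; the values p in {.1, .2} with prior weights .6 and .4, data X ~ Bin(2, p), and observed x = 0 are the ones used in the worked example of this lecture):

```python
import math

def binom_pmf(x, n, p):
    # f(x | p) = C(n, x) p^x (1-p)^(n-x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

prior = {0.1: 0.6, 0.2: 0.4}  # two-point prior pi(p)
x_obs = 0                     # observed data, X ~ Bin(2, p)

# g(x) = sum over p of f(x|p) pi(p); posterior = f(x|p) pi(p) / g(x).
g = sum(binom_pmf(x_obs, 2, p) * w for p, w in prior.items())
posterior = {p: binom_pmf(x_obs, 2, p) * w / g for p, w in prior.items()}

print({p: round(v, 4) for p, v in posterior.items()})  # {0.1: 0.655, 0.2: 0.345}
```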
We get the marginal distribution for the data, g(x):

g(x) = Sum_theta f(x|theta) pi(theta), theta discrete;
g(x) = Int f(x|theta) pi(theta) dtheta, theta continuous.

2 Bayesian Computation

In the Bayesian approach, the posterior for theta, pi(theta|x), is a full pdf, or distribution. This distribution is the tool or method by which we conduct inference.

Example: Let X ~ Bin(2, p), with prior pi(p = .1) = .6, pi(p = .2) = .4. Then

f(x | p = .1) = C(2,x) (.1)^x (.9)^(2-x) and f(x | p = .2) = C(2,x) (.2)^x (.8)^(2-x).

And

g(x) = Sum_p f(x|p) pi(p) = C(2,x) (.1)^x (.9)^(2-x) (.6) + C(2,x) (.2)^x (.8)^(2-x) (.4).

So

pi(p = .1 | x) = f(x | p = .1)(.6) / g(x),

and here pi(p = .2 | x) = 1 - pi(p = .1 | x). Suppose we observe data x_obs = 0; then

pi(p = .1 | 0) = (.9)^2 (.6) / ( (.9)^2 (.6) + (.8)^2 (.4) ) = .6550, and pi(p = .2 | 0) = .3450.

Example 2: Let X_1, ..., X_n ~ Poi(lambda), and the prior lambda ~ Gamma(alpha, beta). Then

f(x|lambda) = Prod_i lambda^{x_i} e^{-lambda} / x_i!, and pi(lambda) = lambda^{alpha - 1} e^{-lambda/beta} / (Γ(alpha) beta^alpha).

Generate g(x) = Int_0^inf f(x|lambda) pi(lambda) dlambda. The posterior is then proportional to

pi(lambda|x) prop. to lambda^{Sum x_i + alpha - 1} e^{-lambda (n + 1/beta)}.

This is just

lambda | x ~ Gamma( Sum x_i + alpha, beta/(n beta + 1) ).

Example 3: Let X ~ Bin(2, p), f(x|p) = C(2,x) p^x (1-p)^(2-x), with pi(p) = 1, 0 <= p <= 1. Then

g(x) = Int_0^1 f(x|p) pi(p) dp = Int_0^1 C(2,x) p^x (1-p)^(2-x) dp.

In this case g(0) = g(1) = g(2) = 1/3, so

pi(p|x) = C(2,x) p^x (1-p)^(2-x) / (1/3) = 3 C(2,x) p^x (1-p)^(2-x).

Now, with a full pdf for p, we can find posterior estimates of parameters. Contrast this with the frequentist approach, where we found point estimates and used the sampling distribution: in the Bayesian approach the sampling distribution's role in inference on the parameter is replaced with the Bayesian posterior distribution.

Example: X ~ Bin(2, p) and pi(p) = 1, 0 <= p <= 1; we observe data x = 1. Thus

pi(p | 1) = 6 p (1 - p).

Thus

E(p | 1) = Int_0^1 p 6p(1 - p) dp = 1/2,

one possible estimate for the parameter p. Another possible estimate of p is the posterior mode, where d pi(p|1)/dp = 0: here also p = 1/2.

3 Bayesian Posterior Intervals

In the frequentist approach the Confidence Interval is the interval, say I, such that P(mu in I) = 1 - alpha in repeated experiments. In the Bayesian approach the Confidence Interval is the interval I such that the posterior probability that theta is in I is 1 - alpha.

ISYE 2028 A and B, Lecture 10: Sampling Distributions and Test Statistics. Dr. Kobi Abayomi. April 2, 2009.

1 Introduction: The context for Confidence Intervals and Hypothesis Testing; Sampling Distributions for Test Statistics

Here is a non-exhaustive illustration
of the population-sample dichotomy that is the center of what we are studying in this introductory course:

Population                           Sample
Random Variable                      Statistic
Population Mean (Expectation),       Sample Mean,
  a Parameter: mu = E(X)               an Estimate: x-bar

We make assumptions, or define a population, to "fit" observed data. Our data is information about events we wish to speak about, or gain inference about. The natural framework is that of an experiment: the population composes our assumptions about what might happen; the sample data compose what we actually observe. Our beliefs about what we see, that is the sample distribution, are related to our general assumptions, that is the population distribution.

We have canonical population models in our overview of random variables: Bernoulli, Binomial, Poisson, Normal, Exponential, etc. characterize types of experiments; we use these characterizations to make statements about data.

Bernoulli distribution: to model simple events that can either happen or not. Like whether a coin turns up heads or not.

Binomial distribution: to model sums or totals of Bernoulli events. Like whether a coin turns up heads k times in n tosses.

Poisson distribution: to model Binomial-type events when the probability of any event is very low and the number of events is very high. Like the number of soldiers who are kicked in the head in a military campaign in 18th century France.

Exponential distribution: to model continuous positive events, like waiting times or time to failure.

Normal distribution: to model averages of events, or events where the outcomes are continuous, or when we just don't know any better (ha!).

Chi-Square distribution: to model squared deviations, sums of squared deviations, and squared normal random variables.

Moving on: we use these canonical random variables to make statements about observed data. The setup is almost always this: we compare observed data to an expected value under our assumptions. This comparison yields a test statistic. We then use our probability model, i.e. our fundamental assumption
about the population for the data, to make a probabilistic statement about the population parameter. In general, a test statistic looks like this:

Test Statistic = (observed value - expected value) / standard error.

In general the "observed value" will be some statistic, or function of data. The "expected value" will be some parameter: the population correspondence of the statistic. We call statistics used in this context, to estimate population parameters, estimators. A popular notation is to use theta-hat (read "theta hat") as an estimator of the population parameter theta. We have already been exposed to one such estimator, mu-hat = x-bar, the sample mean. We use functions of data (statistics) to estimate parameters, and then our test statistics are rescalings by the standard deviation of our estimator. We call sqrt(Var(theta-hat)) the standard error of the estimator.

2 Sampling Distributions as Test Statistics

We haven't yet looked at hypothesis testing, but we have used a specific example of a test statistic for the population mean. For instance, if X ~ (mu, sigma^2) is a random variable model, and we collect some data: x-bar = (1/n) Sum x_i.

2.1 The distribution of the sample mean, variance known

You will recall that the sampling distribution X-bar ~ N(mu, sigma^2/n) can be used to construct the test statistic

Z = (X-bar - mu_0) / sqrt(sigma^2/n),

which has the standard normal distribution N(0, 1). The Z statistic is the deviation of the data from the null hypothesis over its standard deviation. In words,

Z = (obs - exp) / SD(obs)

is the statistic we want to use if we want to make a probabilistic statement about the true mean mu using observed data X_1, ..., X_n.

2.1.1 Example

We believe the life of a bulb is X ~ (mu = 800, sigma^2 = 40^2) hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Solution: We know X-bar ~ N(800, 40^2/16 = 100). So

P(X-bar < 775) = P(Z < (775 - 800)/10) = P(Z < -2.5) = .0062.

2.1.2 Example

We saw that we could use the number of successes in a Binomial experiment as an estimate of the parameter for a Bernoulli. We let p-hat = Y/n and Y ~ Bin(n, p); remember that Y = Sum X_i, and every X_i ~ Bern(p). Then p-hat := our estimate for the population proportion of success of the Bernoulli experiment, notated with p-hat.
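The light-bulb calculation in Example 2.1.1 above can be reproduced with the standard normal cdf, Phi(z) = (1 + erf(z/sqrt(2)))/2 (a Python sketch using the document's numbers: mu = 800, sigma = 40, n = 16, threshold 775):

```python
import math

mu, sigma, n = 800, 40, 16
xbar = 775

# Standard error of the sample mean, and the z statistic.
se = sigma / math.sqrt(n)   # 10.0
z = (xbar - mu) / se        # -2.5

# P(X-bar < 775) via the standard normal cdf.
p = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(z, round(p, 4))  # -2.5 0.0062
```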
Using some numbers for illustration: It is known that 42% of trick-or-treating nutritionists are overweight. How likely is it that 50 nutritionists out of a sample of 100 are pleasantly plump?

Solution: First notice that we are given the population proportion p = .42 by the words "it is known." Notice also that we have crossed into the world of data: if the word "sample" is deleted, then we could see this as merely a standard Binomial probability question. (And the answer would be P(Y >= 50) = Sum_{k=50}^{100} C(100,k) (.42)^k (.58)^(100-k).) Here we are asking a question about the distribution of p-hat, the sample estimate of the population proportion: P(p-hat >= .5). We know that the distribution of the sample estimate of the population proportion is approximately normal:

p-hat ~ N(.42, (.42)(.58)/100).

We'll use a z-statistic:

z_0 = (.5 - .42) / sqrt((.42)(.58)/100), which is about 1.62.

Then the probability that we'll see 50 or more heavy nutritionists out of a sample of 100 is

P(p-hat >= .5) = P(Z >= 1.62), about .05.

What do I want you to get from this illustration?

- First: The sampling distribution is truly a distribution; we can answer probability questions about sample data by appealing directly to the sampling distribution.
- Second: The distribution of the sample mean is Normal. This is the result of the central limit theorem. Regardless of the distribution of the parameter, if our estimate of it is a sample average (an average of data), we can use the CLT to make probability statements.
- Third: Notice the special use of the Binomial setup to generate estimates of the Bernoulli parameter.
- Fourth: Notice our usual construction of Z, so that we can use our standard normal tables in the back of the book, or on your computer.

Situations often arise where the sample mean cannot sufficiently describe, or test for, important hypothetical differences in populations. We must appeal to other distributions, to other quantifications of difference, to test other hypotheses. A useful alternative is:

2.2 The T Distribution
In many situations we cannot assume that we know the variance of the sample mean. As well, we often do not have enough samples to apply the central limit theorem to the sampling distribution. In these situations we construct the t-statistic:

t = (x-bar - mu) / sqrt(s^2/n).

The t distribution, T ~ t(df), is an approximation to the normal distribution. Notice I have written df as the parameter of the distribution. (What, if any, are the parameters for the Z ~ N(0,1) distribution? The parameters are mu = 0 and sigma^2 = 1.) The T distribution is centered at zero, just like the Z. (It turns out that E(T) = 0 and Var(T) = df/(df - 2).) We let df := degrees of freedom. When we talk about sample data, we loosely define "degrees of freedom" as the number of independent observations: the number of observations we have left after we subtract the number of parameters we have to estimate,

df := n - k,

where we let n = the number of observations and k = the number of parameters to be estimated.

Notice that our constructed t-statistic is a deviation, which we expect to be Normal, rescaled (or divided) by our estimate of the standard deviation. Notice that

T = [ (X-bar - mu)/sqrt(sigma^2/n) ] / sqrt(S^2/sigma^2)

is a standard normal random variable divided by (the square root of) a chi-square random variable over its degrees of freedom. We showed at the end of Lecture 8 that the distribution of X-bar is independent from the distribution of S^2, and they are Normal and Chi-Squared. The density function for the t distribution we get by writing

T = Z / sqrt(V/r),

with Z and V independent:

f_{Z,V}(z, v) = (1/sqrt(2 pi)) e^(-z^2/2) * v^(r/2 - 1) e^(-v/2) / (Γ(r/2) 2^(r/2)).

From what we know about transformations, we get the joint distribution by letting U = V, f_{T,U} = f_{Z,V}(h(t,u)) |J|, and we integrate over U (integrate out the Chi-Squared random variable) to get

f_T(t) = Γ((r+1)/2) / (sqrt(pi r) Γ(r/2)) * (1 + t^2/r)^(-(r+1)/2).

In practice the statistic simplifies to

t = (x-bar - mu) / (s/sqrt(n)),

which is the way you use it.

2.2.1 Illustration and Setup

Suppose we have a process X ~ (mu, sigma^2 unknown), and our estimator of sigma^2 is s^2. We want to look at the sample mean x-bar to gain inference about mu. We then need to look at the probability distribution for T.

Example: What is the probability of a sample of size 25 having a mean of 518 grams and
standard deviation of 40 grams, if the population mean yield is 500 grams?

Solution: The t statistic is

t_0 = (518 - 500) / (40/sqrt(25)) = 2.25.

Then P(X-bar > 518) = P(t(24) > 2.25), about .02.

3 The Chi-Squared Distribution and Test Statistic

Example: Say we are interested in the fairness of a die. Here is the observed distribution after 120 tosses:

Die Face:   1   2   3   4   5   6
Obs Count:  30  17  5   23  24  21

What is the probability that the die is fair? Using only what we have done so far, we could test the hypothesis that the die is fair by doing a test of the mean: what is the probability that the mean is 3.5? We calculate the sample mean to be x-bar = 3.475. Using the variance of a fair die, sigma^2 = 2.917, we can compute the sampling distribution, and thus the value of the Z statistic:

z_0 = (3.475 - 3.5) / sqrt(2.917/120), about -.16.

This yields P(Z > .16) = .4364, too high to say it is unfair.

For a better (more powerful) test in this situation, I'll point out that the expected number of counts for each die face, under the hypothesis that the die is fair and each face is equally likely, should be 120/6 = 20. Looking at our data, it seems we have more than 20 in some cases and less than 20 in others; the positive and negative deviances tend to cancel out.

Die Face:   1   2   3   4   5   6
Obs Count:  30  17  5   23  24  21
Exp Count:  20  20  20  20  20  20

A better test statistic here is the Chi-square statistic:

chi_0^2 = Sum_i (obs_i - exp_i)^2 / exp_i ~ chi^2(v).

(The chi^2 distribution, in general, is the sum of squared independent standard normal random variables: chi^2 = Sum Z_i^2, where each Z_i ~ N(0,1); we say this chi^2 has n degrees of freedom.) The Chi-squared distribution is strictly positive and takes one parameter, v: the degrees of freedom, or number of independent observations (number of observations minus the number of parameters to be estimated).

3.1 The Chi-Square test for Goodness of Fit

In general the Chi-square statistic is a test of Goodness of Fit, or how well the data fits distributionally. Large values of the Chi-square statistic indicate large deviations of the observed from the expected; thus we reject the null hypothesis for large values of the test statistic. For the Goodness-of-Fit type hypothesis tests, the deviations have already been squared; the tests are naturally one-sided. The observed count at each bin is obvious and collected in the data.
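The chi-square computation for the die, as a script (a Python sketch; the counts come from the table above, with an expected count of 20 per face):

```python
obs = [30, 17, 5, 23, 24, 21]
exp = [20] * 6

# Chi-square statistic: sum over bins of (obs - exp)^2 / exp.
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(round(chi2, 6))  # 18.0
```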
We must calculate the expected count for each bin under the null hypothesis. Here, if the die is fair, the probability of landing in any bin is P(Die is 1) = ... = P(Die is 6) = 1/6. So the expected number of counts in each bin is (1/6) * 120 = 20. This is the general procedure: if I call P(Bin_i) = pi_i, then Expected_i = pi_i * n, and we are testing

H_0: pi_i = pi_{0,i} for all i = 1, ..., n vs. H_a: pi_i != pi_{0,i} for at least one i.

Here, our observed test statistic is

chi_0^2 = Sum_i (obs_i - 20)^2 / 20 = 18.00.

The number of degrees of freedom is n - 1, here 6 - 1 = 5. Notice that the total number of observations is fixed; that is how we calculate the expected frequency. Once the total is set, we lose a degree of freedom. Now

P(chi^2(5) > 18.00) < .005,

from the table in the back of the book, which you should familiarize yourself with. So we say the die is unfair.

3.2 The Chi-Square test for independence: the Two-Way layout

The Chi-Square test is useful for the contingency table, or two-way setup. Remember the contingency table from Chapter 1: a variable on rows, a variable on columns; each cell has the observed counts for each bivariate value of the variables. We used this example:

                      Fashion Level
Class level   Low   Middle   High   Total
Graduate       6      4        1      11
PhD            5      1        2       8
Pre-K         30     25       75     130
Total         41     30       78     149

So, briefly, if we let X = Class Level and Y = Fashion Level, then the observed count of (X = Graduate and Y = Low) is 6, and the observed probability is P(X = Graduate and Y = Low) = 6/149 = .04. Under a null hypothesis that the distribution of Class level is independent from Fashion level, the probability P(X = Graduate and Y = Low) = P(X = Graduate) * P(Y = Low): we would just multiply the marginal probabilities.

I'll use this notation: n_ij := the number of observations in row i, column j. As well, n_i., n_.j, n are the sum over row i, the sum over column j, and the total, respectively. Then P(X = row i and Y = col j) = (n_i./n)(n_.j/n). So the expected number of counts in row i and column j, under a hypothesis of independence, is Expected_ij = n_i. n_.j / n.
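The expected-count formula and the resulting statistic, for the table above (a Python sketch):

```python
table = [
    [6, 4, 1],     # Graduate
    [5, 1, 2],     # PhD
    [30, 25, 75],  # Pre-K
]

row_tot = [sum(r) for r in table]        # 11, 8, 130
col_tot = [sum(c) for c in zip(*table)]  # 41, 30, 78
n = sum(row_tot)                         # 149

# Expected_ij = n_i. * n_.j / n under independence; accumulate chi-square.
chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n
        chi2 += (obs - exp) ** 2 / exp

print(round(chi2, 2))  # 14.92
```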
For our data here we calculate

chi_0^2 = (6 - 11*41/149)^2 / (11*41/149) + ... + (75 - 130*78/149)^2 / (130*78/149) = 14.92.

P(chi^2(4) > 14.92) < .005.

We conclude that class level and fashion are not independent.

4 F-distribution for ratio of variances

If X_1, ..., X_m is distributed N(mu_1, sigma_1^2) and Y_1, ..., Y_n is distributed N(mu_2, sigma_2^2), then the ratio

F = (S_1^2/sigma_1^2) / (S_2^2/sigma_2^2)

has what we call an F distribution, with numerator degrees of freedom m - 1 and denominator degrees of freedom n - 1. F is the ratio of two independent chi-squared variables, each over its degrees of freedom: call them U ~ chi^2(m-1) and V ~ chi^2(n-1). If U = (m-1) S_1^2 / sigma_1^2, then U ~ chi^2(m-1), and similarly for V. Then

F = (U/(m-1)) / (V/(n-1)),

which just simplifies to the ratio above. (It turns out that E(F) = v_2/(v_2 - 2), where F = (U/v_1)/(V/v_2), U ~ chi^2(v_1), V ~ chi^2(v_2), and U is independent of V.) An important identity for the F-distribution is

F_{1-alpha}(v_1, v_2) = 1 / F_{alpha}(v_2, v_1).

You'll notice that you may have to use this fact in looking up values on the F-table in some books.

5 Miscellanea

5.1 Boxplots

A boxplot is an illustration of the distribution of a sample.

Data Summary: Minimum 1; Lower Quartile 2; Median 3.5; Mean 4.25; Upper Quartile 6; Maximum 1.

[Figure 1: This boxplot displays data that is skewed to the right and with an IQR = 4. Marked: Min, Lower quartile, Median, Mean, Upper quartile.]

In R:

x <- rnorm(100)
y <- c(x, rnorm(20, 5))
boxplot(x)
boxplot(x, horizontal = TRUE)
boxplot(y, horizontal = TRUE)

5.2 Quantile-Quantile Plots

A quantile-quantile plot is a plot of data values on the ordinate (y) axis versus theoretical quantiles on the abscissa (x) axis. So a Q-Q plot, in the typical name, should look like a 45 degree line when these values are similar: the plot is x vs. y such that F_X(x_i) vs. F_n(x_i), where F_n is the empirical cdf, the cdf induced by the data, i.e.

F_n(w) = (1/n) Sum_{i=1}^n 1(x_i <= w).

A q-th quantile, 0 < q < 1, say, is the value of the random variable (or data) yielded by evaluating the inverse cumulative distribution function at q. That is, F^(-1)(q) = q-th quantile, or F(q-th quantile) = q.

ISYE 2028 A and B, Lecture 5. Dr. Kobi Abayomi. January 29, 2009.

1 Joint Distributions

Two given random variables X and Y have a general
distribution, a joint distribution, that is an extension of the single-variable definition and notation. We generate it from first principles:

F_{X,Y}(x, y) = P(X <= x, Y <= y).

Taking derivatives, dF_{X,Y}, yields: in the discrete case,

P(X = x, Y = y) = p(x, y),

the joint probability mass function; in the continuous case,

d^2 F_{X,Y}(x, y) / (dx dy) = f(x, y).

1.1 Marginal Distributions

We generate the marginal distributions for X and Y alone, just as we did for contingency tables, by summing over all values of the other variable:

p_X(x) = Sum_y p(x, y); p_Y(y) = Sum_x p(x, y);
f_X(x) = Int f(x, y) dy; f_Y(y) = Int f(x, y) dx.

(If you will recall our contingency table examples, we generated a marginal distribution by summing over columns of the table to yield the distributions of the margins.)

P(X <= x) = P(X <= x, Y <= inf) = lim_{y -> inf} P(X <= x, Y <= y) = lim_{y -> inf} F_{X,Y}(x, y).

The joint survival distribution can be generated from first principles as well:

P(X > x, Y > y) = 1 - P( (X > x, Y > y)^c )
= 1 - P( (X <= x) U (Y <= y) )                                  by deMorgan's laws
= 1 - ( P(X <= x) + P(Y <= y) - P(X <= x, Y <= y) )             by the inclusion-exclusion principle
= 1 - F_X(x) - F_Y(y) + F_{X,Y}(x, y).

You can use the facts above to verify

P(a <= X <= b, c <= Y <= d) = F_{X,Y}(b, d) - F_{X,Y}(a, d) - F_{X,Y}(b, c) + F_{X,Y}(a, c).

This is just the fact that a positive area on the (X, Y) plane has a positive F_{X,Y} volume above it, if F is really a distribution function. This is in direct extension of the non-decreasing quality of F_X in one dimension.

2 Examples

2.1 Example

Take an experiment where we toss a coin three times. Let X := the number of heads on the first two tosses; let Y := the number of total heads. The sample space is Omega = {TTT, ...}, each event with probability 1/8. The event space for the random variables is (X, Y) in {(0,0), (0,1), (1,1), (1,2), (2,2), (2,3)}. The full joint is

(X, Y):          (0,0)  (0,1)  (1,1)  (1,2)  (2,2)  (2,3)
P(X = x, Y = y):  1/8    1/8    2/8    2/8    1/8    1/8

The marginal for X is P(X = 0) = 1/8 + 1/8 = 2/8, P(X = 1) = 2/8 + 2/8 = 4/8, P(X = 2) = 1/8 + 1/8 = 2/8.

2.2 Example

(Recall the indicator function: 1_A(x) = 1 if x in A, 0 otherwise.) Let f_{X,Y}(x, y) = 6 x^2 y, 0 < x < 1, 0 < y < 1. Then

P(0 < X < 3/4, 1/3 < Y < 2) = Int_{1/3}^1 Int_0^{3/4} 6 x^2 y dx dy + Int_1^2 Int_0^{3/4} 0 dx dy = 3/8,

which is the volume under the surface 6 x^2 y.

2.3 Example

Let f_{X,Y}(x, y) = c on x^2 + y^2 <= R^2, c and R some constants.
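The probability in Example 2.2 can be corroborated by brute-force double midpoint integration of 6 x^2 y over 0 < x < 3/4, 1/3 < y < 1 (the part of (1/3, 2) inside the support); a Python sketch, with an arbitrary grid size:

```python
# f(x, y) = 6 x^2 y on the unit square; integrate over the rectangle
# 0 < x < 3/4, 1/3 < y < 1 by the midpoint rule.
nx = ny = 400
x0, x1 = 0.0, 0.75
y0, y1 = 1.0 / 3.0, 1.0
hx, hy = (x1 - x0) / nx, (y1 - y0) / ny

total = 0.0
for i in range(nx):
    x = x0 + (i + 0.5) * hx
    for j in range(ny):
        y = y0 + (j + 0.5) * hy
        total += 6 * x * x * y
total *= hx * hy

print(round(total, 4))  # 0.375, i.e. 3/8
```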
What is c? What are the marginal distributions? What is the distribution of the distance of any point (X, Y) from the origin?

First,

Int Int c dx dy = 1 implies c Int Int_{x^2 + y^2 <= R^2} dx dy = 1, so c = 1/(pi R^2).

Second,

f_X(x) = Int f_{X,Y}(x, y) dy = (1/(pi R^2)) Int_{-sqrt(R^2 - x^2)}^{sqrt(R^2 - x^2)} dy = 2 sqrt(R^2 - x^2) / (pi R^2).

Third, let D = sqrt(X^2 + Y^2). Then, from first principles,

F_D(d) = P(sqrt(X^2 + Y^2) <= d) = P(X^2 + Y^2 <= d^2) = Int Int_{x^2 + y^2 <= d^2} (1/(pi R^2)) dx dy = pi d^2 / (pi R^2) = d^2 / R^2.

2.4 Example: Multinomial Distribution

The multinomial distribution is an extension of the binomial distribution. In this model there are n identical experiments, each with k possible outcomes, each outcome having probability p_i, Sum_{i=1}^k p_i = 1. From first principles the probability mass function is

P(X_1 = n_1, ..., X_k = n_k) = C(n; n_1, ..., n_k) p_1^{n_1} ... p_k^{n_k},

where C(n; n_1, ..., n_k) = n! / (n_1! ... n_k!) is the multinomial coefficient, which you have seen before.

2.4.1 Example

Roll an unfair die: let the probability of rolling a one be 1/2; the probability of rolling either a two or a three be equal, and twice that of rolling a four or a five; and the probability of rolling a six be 1/4. (So p_1 = 1/2, p_2 = p_3 = 1/12, p_4 = p_5 = 1/24, p_6 = 1/4.) What is the probability of rolling 3 ones, 2 twos, 2 threes, 1 four, 1 five, and no 6's?

( 9! / (3! 2! 2! 1! 1! 0!) ) (1/2)^3 (1/12)^2 (1/12)^2 (1/24)^1 (1/24)^1 (1/4)^0.

3 The conditional distribution

Recall from first principles:

f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y).

This is the conditional probability density function for X given Y = y. We generate the conditional cumulative distribution function in the usual way:

P(X <= x | Y = y) = F_{X|Y}(x|y) = Int_{-inf}^x f_{X|Y}(t|y) dt.

As well, we generate the moments of the conditional distribution in the standard ways. The conditional expectation, stated in general for any function g of X_2:

E(g(X_2) | X_1 = x_1) = Int_{-inf}^inf g(x_2) f_{X_2|X_1}(x_2|x_1) dx_2;

and the conditional variance we can express as the difference between the expectation of the square and the expectation squared:

Var(X_2 | X_1 = x_1) = E(X_2^2 | X_1 = x_1) - ( E(X_2 | X_1 = x_1) )^2.

Of course, the indexes (X_2 given X_1) are arbitrary: switch them around for X_1 given X_2, etc.

4 Example

Take two random variables X_1, X_2 with f_{X_1,X_2}(x_1, x_2) = 2, 0 < x_1 < x_2 < 1. We generate the marginal distributions in the usual way, paying special attention to the
specification of the distribution, the limits in the indicator function:

f_{X_1}(x_1) = Int_{x_1}^1 2 dx_2 = 2(1 - x_1), 0 < x_1 < 1;
f_{X_2}(x_2) = Int_0^{x_2} 2 dx_1 = 2 x_2, 0 < x_2 < 1.

Which yields

f_{X_1|X_2}(x_1|x_2) = f_{X_1,X_2} / f_{X_2} = 2 / (2 x_2) = 1/x_2

for the conditional distribution of X_1 given X_2 = x_2 (uniform on (0, x_2)). The conditional expectation

E(X_1 | X_2 = x_2) = Int_0^{x_2} x_1 (1/x_2) dx_1 = x_2/2

in this instance is a function of x_2. The conditional variance

Var(X_1 | X_2 = x_2) = Int_0^{x_2} (x_1 - x_2/2)^2 (1/x_2) dx_1 = x_2^2/12

is also, in this instance, a function of x_2.

NB (nota bene): E(X_1) = Int_0^1 x_1 2(1 - x_1) dx_1 = 1/3. The expectation of X_1 is a constant, but the conditional expectation of X_1 given X_2 is a random variable.

Are X_1 and X_2 independent? Heuristically, by just looking at the pdf (with the indicator) we could conclude no. Does P(0 < X_1 < 1/2 | X_2 = 1/2) = P(0 < X_1 < 1/2)? Well, on the one hand,

P(0 < X_1 < 1/2 | X_2 = 1/2) = Int_0^{1/2} f_{X_1|X_2 = 1/2}(x_1) dx_1 = Int_0^{1/2} 2 dx_1 = 1,

but on the other,

P(0 < X_1 < 1/2) = Int_0^{1/2} f_{X_1}(x_1) dx_1 = Int_0^{1/2} 2(1 - x_1) dx_1 = 3/4.

5 Example

Take two random variables X_1, X_2 with f_{X_1,X_2}(x_1, x_2) = 6 x_2, 0 < x_2 < x_1 < 1. The marginal pdf for X_1 is

f_{X_1}(x_1) = Int_0^{x_1} 6 x_2 dx_2 = 3 x_1^2, 0 < x_1 < 1;

the conditional pdf for X_2 | X_1 = x_1 is

f_{X_2|X_1}(x_2|x_1) = 6 x_2 / (3 x_1^2) = 2 x_2 / x_1^2, 0 < x_2 < x_1 < 1;

and the conditional expectation is

E(X_2 | X_1 = x_1) = Int_0^{x_1} x_2 (2 x_2 / x_1^2) dx_2 = (2/3) x_1, 0 < x_1 < 1,

a random variable. Let Y = E(X_2 | X_1 = x_1); then Y is a random variable, dependent upon the value of X_1, where 0 < y < 2/3, since 0 < x_1 < 1. The cdf for Y is

F_Y(y) = P(Y <= y) = P((2/3) X_1 <= y) = P(X_1 <= (3/2) y),

which can be computed using the pdf for X_1 as

F_Y(y) = Int_0^{3y/2} 3 x_1^2 dx_1 = 27 y^3 / 8.

The pdf for Y is

f_Y(y) = dF_Y/dy = 81 y^2 / 8, 0 < y < 2/3;

the expectation for Y is

E(Y) = Int_0^{2/3} y (81 y^2 / 8) dy = 1/2,

and the variance is

Var(Y) = Int_0^{2/3} y^2 (81 y^2 / 8) dy - (1/2)^2 = 4/15 - 1/4 = 1/60.

So Y = E(X_2 | X_1 = x_1) is a random variable with (mu_Y = 1/2, sigma_Y^2 = 1/60).

NB: f_{X_2}(x_2) = Int_{x_2}^1 6 x_2 dx_1 = 6 x_2 (1 - x_2), 0 < x_2 < 1, which yields E(X_2) = 1/2 and Var(X_2) = 1/20.

6 E(E(X_2|X_1)) = E(X_2) and Var(E(X_2|X_1)) <= Var(X_2)

The big result, implied by the last example, is

E(E(X_2|X_1)) = E(X_2):

the expectation of the conditional expectation is the expectation; and

Var(E(X_2|X_1)) <= Var(X_2):

the variance of the conditional expectation is less than or equal to the variance. In general, for any X_1, X_2:

E(X_2) = Int x_2 f_{X_2}(x_2) dx_2 = Int Int x_2 f_{X_1,X_2}(x_1, x_2) dx_1 dx_2 = Int [ Int x_2 f_{X_2|X_1}(x_2|x_1) dx_2 ] f_{X_1}(x_1) dx_1 = Int E(X_2 | X_1 = x_1) f_{X_1}(x_1) dx_1 = E( E(X_2|X_1) ).
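The identities from the last example can be checked numerically (a Python sketch using f_{X_1}(x) = 3x^2, Y = (2/3) X_1, and f_{X_2}(x) = 6x(1 - x) from above; the integration is a midpoint rule on (0, 1)):

```python
# Midpoint grid on (0, 1).
n = 100000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]

# Y = E(X2 | X1) = (2/3) X1, with X1 having density 3 x^2.
ey = sum((2 / 3) * x * 3 * x * x for x in xs) * h          # E(Y) = 1/2
ey2 = sum(((2 / 3) * x) ** 2 * 3 * x * x for x in xs) * h  # E(Y^2) = 4/15
var_y = ey2 - ey ** 2                                      # 1/60

# X2 has marginal density 6 x (1 - x).
ex2 = sum(x * 6 * x * (1 - x) for x in xs) * h             # E(X2) = 1/2
ex2_2 = sum(x * x * 6 * x * (1 - x) for x in xs) * h       # E(X2^2) = 3/10
var_x2 = ex2_2 - ex2 ** 2                                  # 1/20

print(round(ey, 4), round(ex2, 4), round(var_y, 5), round(var_x2, 5))
```

So E(E(X_2|X_1)) matches E(X_2), and Var(E(X_2|X_1)) = 1/60 is indeed smaller than Var(X_2) = 1/20.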
Then, letting mu_2 = E(X_2):

Var(X_2) = E( (X_2 - mu_2)^2 )
= E( (X_2 - E(X_2|X_1) + E(X_2|X_1) - mu_2)^2 )
= E( (X_2 - E(X_2|X_1))^2 ) + E( (E(X_2|X_1) - mu_2)^2 ) + 2 E( (X_2 - E(X_2|X_1))(E(X_2|X_1) - mu_2) ).

The last term, 2 E( (X_2 - E(X_2|X_1))(E(X_2|X_1) - mu_2) ), equals zero:

Int Int (x_2 - E(X_2|X_1))(E(X_2|X_1) - mu_2) f_{X_1,X_2} dx_1 dx_2 = Int f_{X_1} (E(X_2|X_1) - mu_2) [ Int (x_2 - E(X_2|X_1)) f_{X_2|X_1} dx_2 ] dx_1,

but the inner integral is E(X_2|X_1) - E(X_2|X_1) = 0. Which implies

Var(X_2) = E( (X_2 - E(X_2|X_1))^2 ) + E( (E(X_2|X_1) - mu_2)^2 )
= E( (X_2 - E(X_2|X_1))^2 ) + E( (Y - mu_Y)^2 )
= E( (X_2 - E(X_2|X_1))^2 ) + Var( E(X_2|X_1) )
>= Var( E(X_2|X_1) ),

since the first term is non-negative.

ISYE 2028 A and B, Lecture 12: Confidence Intervals and Hypothesis Testing (cont.). Dr. Kobi Abayomi. March 25, 2009.

We have looked at hypothesis testing generally, but we have used only the specific example of a test for the population mean. For instance, if X ~ (mu, sigma^2) is a random variable model, and we collect some data, x-bar = (1/n) Sum x_i, then the hypotheses

H_0: mu = mu_0 vs. H_a: mu != mu_0

we use in a two-sided test of the population mean. You will recall that we use the sampling distribution X-bar ~ N(mu, sigma^2/n) to construct the test statistic

Z = (X-bar - mu_0) / sqrt(sigma^2/n),

which has the standard normal distribution N(0, 1). This setup is often sufficient: the Z statistic is the deviation of the data from the null hypothesis over its standard deviation. In words,

Z = (obs - exp) / SD(obs)

is the statistic we want to use if we want to test the proportion of people who vote for Pedro, or the mean income of Njoroge's in Kisumu, if the sample mean is representative of our population mean.

Situations often arise where the sample mean cannot sufficiently describe, or test for, important hypothetical differences in populations. We must appeal to other distributions, to other quantifications of difference, to test other hypotheses. A useful alternative is:

1 The Chi-Squared Distribution and associated Hypothesis Tests

Recall this example from Lecture 10: say we are interested in the fairness of a die. Here is the observed distribution after 120 tosses:

Die Face:   1   2   3   4   5   6
Obs Count:  30  17  5   23  24  21

The appropriate test statistic here is the Chi-square.

1.1 The Chi-Square test for Goodness of Fit

Formally, here, we are going to test H_0: the die is fair vs. H_a: the die is not fair. In general the
In general, the hypotheses are

H0: p_i = p_{i0} for all i vs. Ha: p_i ≠ p_{i0} for at least one i.

Remember, here, our observed test statistic is

χ²_obs = Σ_{i=1}^{k} (observed_i − expected_i)² / expected_i ≈ 16.50.

The number of degrees of freedom is k − 1; here, 6 − 1 = 5. Notice that the total number of observations is fixed — that is how we calculate the expected frequency. Once the total is set, we lose a degree of freedom. From the table,¹ χ²_{.95,5} = 11.07. Here χ²_obs > χ²_{.95,5} = 11.07, so we reject the null hypothesis. We conclude the die is unfair.

¹From the table in the back of the book, which you should familiarize yourself with.

1.2 The Chi-Square Test for Independence: the Two-Way Layout

The Chi-square test is useful for the contingency table, or two-way, setup. Remember the contingency table from Lecture 10: a variable on rows, a variable on columns; each cell has the observed counts for each bivariate value of the variables. We used this example:

                     Fashion Level
Class level    Low   Middle   High   Total
Graduate         6        4      1      11
PhD              5        1      2       8
Pre-K           30       25     75     130
Total           41       30     78     149

The formal hypotheses are

H0: p_ij = p_i· p_·j for all (i, j) vs. Ha: p_ij ≠ p_i· p_·j for at least one (i, j) pair,

with expected cell counts (row total × column total)/grand total. For our data here we calculated

χ²_obs = Σ_{ij} (observed_ij − expected_ij)² / expected_ij = 14.92,

with (r − 1)(c − 1) = 4 degrees of freedom; χ²_{.95,4} = 9.49. We reject the null hypothesis and conclude that class level and fashion are not independent.

1.3 The T Distribution

Remember: if we cannot assume that we know the variance of the sample mean, we appeal to the t distribution as the sampling distribution for the sample mean. We often do not have enough samples to apply the central limit theorem to the sampling distribution. In these situations we construct the t statistic as well. Formally, the two-sided hypothesis test is still one of location of the true mean:

H0: μ = μ0, σ² unknown vs. Ha: μ ≠ μ0, σ² unknown.

A confidence interval here is

x̄ ± t_{α/2,df} √(s²/n),

the associated margin of error is

ME = t_{α/2,df} √(s²/n),

and the appropriate number of samples, for a fixed 1 − α confidence level, is

n = t²_{α/2,df} s² / ME².

For the hypothesis testing setup of H0: μ = μ0 vs. Ha: μ ≠ μ0, our observed test statistic is

t_obs = (x̄ − μ0)/(s/√n).

2 Samples: Independent or Dependent?
One sample, or two, or many? In general, always remember:

1. The sampling distribution, which will yield
2. The confidence interval, which is immediately analogous to
3. The test statistic.

Everything is a variation on this theme — just a slightly different scenario.

2.1 Scenario 1: Two Sample Proportions

Say we wish to gain inference on the support for election reform in California and Georgia. Let p1 ≡ the proportion who support in Georgia and p2 ≡ the proportion who support in California. We estimate these, in the usual way, as p̂1 = y1/n1 and p̂2 = y2/n2: the sample proportions of voters who supported the reform over total voters, for each state. We know from the sampling distribution of p̂ that

E(p̂1) = p1, E(p̂2) = p2, and Var(p̂1) ≈ p1(1 − p1)/n1, Var(p̂2) ≈ p2(1 − p2)/n2.

The difference p̂1 − p̂2 is distributed

p̂1 − p̂2 ~ N( p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2 ).

This is the sampling distribution for the difference in proportions. The appropriate rescaled statistic is

( p̂1 − p̂2 − (p1 − p2) ) / SD(p̂1 − p̂2),

and it will have a standard normal distribution. Thus, a confidence interval for the difference in two proportions is

(p̂1 − p̂2) ± z_{α/2} √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ).

For the two-tailed hypothesis test

H0: p1 = p2 vs. Ha: p1 ≠ p2,

we exploit the fact that p1 = p2 implies p1 − p2 = 0, and write

p̂_pooled = (y1 + y2)/(n1 + n2)

to pool the estimate of the population proportion — since, under the null, here, p1 = p2. Then our test statistic is

z_obs = (p̂1 − p̂2) / √( p̂_pooled(1 − p̂_pooled)(1/n1 + 1/n2) ).

2.2 Scenario 2: Two Samples in General

In general, if we have data coming from two samples, (x̄1, s1²) and (x̄2, s2²), and we cannot assume knowledge of the variances, we get a sampling distribution for the difference in the population means μ1 − μ2 as

X̄1 − X̄2 ~ ( μ1 − μ2, s1²/n1 + s2²/n2 ),

which we approximate with a t distribution with n1 + n2 − 2 degrees of freedom.² Thus the confidence interval is

(x̄1 − x̄2) ± t_{α/2,n1+n2−2} √( s1²/n1 + s2²/n2 ).

The two-sided hypothesis test for differences in the population mean,

H0: μ1 − μ2 = Δ0 vs. Ha: μ1 − μ2 ≠ Δ0,

would use this test statistic:

t_obs = (x̄1 − x̄2 − Δ0) / √( s1²/n1 + s2²/n2 ).

Of course, one-sided tests are the usual variations on this. If you are willing to assume that σ1 = σ2, then you can pool the variance estimates.
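Returning to Scenario 1 for a moment: the pooled two-proportion statistic is mechanical. A Python sketch — the poll counts below are hypothetical.

```python
import math

def two_prop_z(y1, n1, y2, n2):
    # Pooled z statistic for H0: p1 = p2.
    p1, p2 = y1 / n1, y2 / n2
    pool = (y1 + y2) / (n1 + n2)
    se = math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 210/400 support in one state, 270/500 in the other (made-up numbers):
z = two_prop_z(210, 400, 270, 500)
print(z)
```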
The pooled estimate is

S_p² = ( (n1 − 1)s1² + (n2 − 1)s2² ) / (n1 + n2 − 2),

and we use this test statistic:

t_obs = (x̄1 − x̄2 − Δ0) / √( S_p²(1/n1 + 1/n2) ).

²The exact calculation for degrees of freedom here is more involved; using n1 + n2 − 2 is good enough.

2.3 Scenario 3: Two Samples, Dependent

In many cases it is not reasonable to assume that your two samples have arrived independently. We call data paired when it is natural to think of each sample as bivariate — like errors while playing piano with the right hand versus the left hand. In these cases we believe that the samples come from one element, perhaps, but two separate samplings. Let D = X1 − X2; thus d_i = x_{i1} − x_{i2}, and

d̄ = (1/n) Σ_{i=1}^{n} (x_{i1} − x_{i2}).

Here we have taken the differences in each observation, and then computed the average difference. A sampling distribution for D̄ is

D̄ ~ ( μ1 − μ2, S_d²/n ), where S_d² = (1/(n − 1)) Σ_{i=1}^{n} (d_i − d̄)².

We again approximate with the t distribution. Here the degrees of freedom are the number of pairs minus 1: df = n − 1. The confidence interval for paired differences of the population mean is then

d̄ ± t_{α/2,n−1} √( S_d²/n ),

and the hypothesis test for paired differences of the population mean — also known as a paired t-test —

H0: Δ = Δ0 vs. Ha: Δ ≠ Δ0

uses this test statistic:

t_obs = (d̄ − Δ0)/(S_d/√n).

3 Beyond the Sample Mean

Thus far all of our confidence intervals and hypothesis tests have been restricted to tests of the mean — tests of location. We have used the sample mean x̄ as the natural estimator. Now we introduce tests and intervals based upon the variance — tests of scale.

3.1 A Confidence Interval for σ²

We have to accept as fact³ that, for a random sample of size n from a normal distribution with parameters (μ, σ²),

(n − 1)S²/σ² ~ χ²(n − 1),

i.e. chi-squared with n − 1 degrees of freedom. We use this fact to set up a confidence interval for σ², using the estimator S². Since

P( χ²_{1−α/2,n−1} < (n − 1)S²/σ² < χ²_{α/2,n−1} ) = 1 − α

for α fixed,⁴ a 1 − α percent confidence interval follows.

³The proof involves techniques not introduced in this class, but look at Lectures 7–9 and you'll get the flavor.
⁴Notice that χ²_{1−α/2,n−1} ≠ χ²_{α/2,n−1}: the chi-squared distribution is not symmetric.
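Before leaving the two-sample setting: the paired statistic, sketched in Python (the right-hand/left-hand piano error counts below are hypothetical).

```python
import math
from statistics import mean, stdev

def paired_t(x1, x2, delta0=0.0):
    # t = (dbar - delta0) / (S_d / sqrt(n)), with df = n - 1.
    d = [a - b for a, b in zip(x1, x2)]
    return (mean(d) - delta0) / (stdev(d) / math.sqrt(len(d)))

right = [10, 12, 9, 14, 11]   # made-up error counts, right hand
left = [12, 15, 10, 15, 12]   # made-up error counts, left hand
t = paired_t(right, left)
print(t)  # -4.0
```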
Nor is the associated confidence interval symmetric. The interval

( (n − 1)s² / χ²_{α/2,n−1} , (n − 1)s² / χ²_{1−α/2,n−1} )

is a 1 − α percent confidence interval for σ². An intelligent reader like you understands that the interval for σ is just the square root of that for σ².

3.2 Ratio of Variances

Remember from Lecture 10: if X1, …, Xm is distributed N(μ1, σ1²) and Y1, …, Yn is distributed N(μ2, σ2²), then the ratio

F = (S1²/σ1²) / (S2²/σ2²)

has what we call an F distribution, with numerator degrees of freedom m − 1 and denominator degrees of freedom n − 1. From what we just learned in the previous section, F is the ratio of two chi-squared variables, each divided by its degrees of freedom: call them U ~ χ²(m − 1) and V ~ χ²(n − 1). If U = (m − 1)S1²/σ1², then U ~ χ²(m − 1); if V = (n − 1)S2²/σ2², then V ~ χ²(n − 1). Then

F = ( U/(m − 1) ) / ( V/(n − 1) ),

which just simplifies to the ratio above.⁵ Remember this identity for the F distribution:

F_{1−α,ν1,ν2} = 1 / F_{α,ν2,ν1}.

You'll notice you have to use this fact in looking up values on the F table in some books. Lastly, we can construct a confidence interval for the ratio of two variances using this fact — we've seen this reasoning before:

P( F_{1−α/2,ν1,ν2} < F < F_{α/2,ν1,ν2} ) = 1 − α,

and rewriting so that we isolate σ2²/σ1² gives

P( (S2²/S1²) F_{1−α/2,ν1,ν2} < σ2²/σ1² < (S2²/S1²) F_{α/2,ν1,ν2} ) = 1 − α.

⁵It turns out that E(F) = ν2/(ν2 − 2), and Var(F) is a (messier) function of ν1 and ν2, where U ~ χ²(ν1), V ~ χ²(ν2), and U is independent of V.

This yields

( (s2²/s1²) F_{1−α/2,ν1,ν2} , (s2²/s1²) F_{α/2,ν1,ν2} )

as a 1 − α percent confidence interval for the ratio σ2²/σ1².

4 R Example, Continued from Lecture 11

4.1 Part (b)

Here we are comparing costs of accidents in the non-ABS year (1991) and the ABS year (1992). We can treat the cost as a continuous, non-proportion variable. Remember the data from Lecture 4. A hypothesis test:

H0: Δμ = μ_NOABS − μ_ABS = 0 vs. H1: Δμ = μ_NOABS − μ_ABS > 0.

The variances are unknown, so we know we need to use a t test. But can we assume they are equal, and use a pooled variance estimator? First things first: this data has missing values.

> mean(data[, c("Cost1991", "Cost1992")])
Cost1991 Cost1992
2074.952       NA
# we could also use data3742

> mean(data, na.rm=T)
Cost1991 Cost1992
2074.952 1714.474
# here we have removed the missing values

> var(data, na.rm=T)
           Cost1991   Cost1992
Cost1991 441529.218   70081.93
Cost1992   70081.93 390409.445
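The σ² interval from Section 3.1 is mechanical once the two χ² quantiles are read from a table. A Python sketch with hypothetical numbers; the df = 19 table values below are hard-coded from a χ² table.

```python
def var_ci(n, s2, chi2_upper, chi2_lower):
    # ((n-1)s^2 / chi^2_{a/2,n-1}, (n-1)s^2 / chi^2_{1-a/2,n-1})
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Hypothetical sample: n = 20, s^2 = 4.0.  From a chi-squared table for df = 19:
# chi^2_{.025,19} = 32.852 and chi^2_{.975,19} = 8.907, for a 95% interval.
lo, hi = var_ci(20, 4.0, 32.852, 8.907)
print(lo, hi)
```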
This is the covariance matrix; for now we only need the diagonal elements. We should do an F test for equality of variances (I'll skip the hypothesis notation for this intermediate test) to know which form of the t test to apply:

> var(data, na.rm=T)[1,1]/var(data, na.rm=T)[2,2]
[1] 1.1233898
# our observed value of the F statistic

> pf(var(data, na.rm=T)[1,1]/var(data, na.rm=T)[2,2], 41, 37, lower.tail=FALSE)
[1] 0.2597970
# the p-value for our inherently two-tailed test

We can assume that the variances are equal, so our test statistic is

t_obs = (x̄_NOABS − x̄_ABS − 0) / √( S_p²(1/n_NOABS + 1/n_ABS) ),

where

S_p² = ( (n1 − 1)s1² + (n2 − 1)s2² ) / (n1 + n2 − 2).

The calculations in R:

> mean(data[,1], na.rm=T) - mean(data[,2], na.rm=T)
[1] 360.4787
# the difference in the sample means
> s1squared <- var(data, na.rm=T)[1,1]
> s2squared <- var(data, na.rm=T)[2,2]
> spsquared <- ((42-1)*s1squared + (38-1)*s2squared)/(42+38-2)
> spsquared

ISYE 2028 A and B — Lecture 9
Estimation and Sampling
Dr. Kobi Abayomi
March 25, 2009

1 Sampling

When we sample, we draw observations from a population with a distribution. Our samples are our observations. We often — almost always, really — say that our samples are independent and identically distributed. From our samples we generate estimates for our population parameters.

In general, our estimates are statistics: quantities whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore a statistic is a random variable, and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated, or observed, value of the statistic.

As a random variable, a statistic has a probability distribution: we call it the sampling distribution. The sampling distribution depends upon the population distribution (normal, uniform, etc.), the sample size n, and the method of sampling. We say that the rv's (random variables) X1, …, Xn are a simple random sample of size n if:

1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution (i.e., identical).

1.1 An Example in R

X <- c(40,40,45,45,45,50,50,50,50,50)
n <- 10
x <- sample(X, n, replace=TRUE)

Here X is our population, from
which we'll draw, and x is our representative sample. Remember that help(sample) will give you help. This little function will allow us to get the probability of a sample from a population:

jointp <- function(smple, pop){
  k <- length(smple)
  lpop <- length(pop)
  p <- rep(0, k)
  for(i in 1:k){
    h <- smple[i]
    t <- (pop == h)
    tts <- sum(t)
    p[i] <- tts/lpop
  }
  pp <- prod(p)
  print(pp)
}

We can calculate the true likelihood for this simple example using this information:

print(jointp(x, X))

In general, the likelihood for observed data is the probability under the model:

L(θ) = f(x; θ),

and for a simple random sample it usually looks like a product. Example: the likelihood of x = (0, 1, 0, 1, 0, 0, 1), if X ~ Ber(p), is

lik ∝ p³(1 − p)⁴.

Example: the likelihood of a random sample of an Exp(λ) is

L(λ) = λⁿ e^{−λ Σ_{i=1}^{n} x_i}.

Often we examine the log-likelihood, ln L(θ); in the above example,

ln L(λ) = n log λ − λ Σ_{i=1}^{n} x_i.

Or: we can resample from this distribution and calculate estimates. Notice that the sampled values tend to the distributional values:

par(mfrow=c(2,2))
for(n in c(10,100,1000,10000)){
  hist(sample(X, n, replace=TRUE), freq=FALSE, ylim=c(0,.6), main="")
}

Here is a function to calculate the sample means and variances for many samples:

smeenvar <- function(num, ssize, pop){
  xbarvec <- rep(0, num)
  varvec <- rep(0, num)
  for(i in 1:num){
    smple <- sample(pop, ssize, replace=TRUE)
    xbarvec[i] <- mean(smple)
    varvec[i] <- var(smple)
  }
  cbind(xbarvec, varvec)
}

We can look at the sampling distributions of these quantities:

h <- smeenvar(100, 50, X)
hist(h[,1])
hist(h[,2])

2 Remember the First Sampling Distribution, and Estimators

Remember that we observe data only — not values of the population parameters. We can use, though, our assumptions (or beliefs) about the population, and facts about random variables, to draw very powerful conclusions about our data. The facts are about the sampling distribution, or the distribution of the observed data. One very powerful fact concerns the distribution of estimates, or averages, from the data. Succinctly: when we take many samples, we believe our estimates are normally distributed.

2.1 The Sampling Distribution

Let's make this assumption: we have sampled some data x1, …, xn.
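Stepping back to the likelihood example above: the Exp(λ) log-likelihood ln L(λ) = n log λ − λ Σ x_i is maximized at λ̂ = n/Σ x_i. A Python sketch with a made-up sample:

```python
import math

def exp_loglik(lam, x):
    # n*log(lam) - lam*sum(x): the Exp(lambda) log-likelihood.
    return len(x) * math.log(lam) - lam * sum(x)

x = [0.5, 1.2, 0.3, 2.0, 1.0]      # hypothetical data; sums to 5.0
lam_hat = len(x) / sum(x)          # = 1.0 here
print(lam_hat, exp_loglik(lam_hat, x))
```

The maximizer beats nearby values of λ, which is easy to check by evaluating the function on either side of λ̂.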
The sample comes from a population that we believe is distributed X ~ (μ, σ²). Remember that x̄ = (1/n) Σ x_i. So, using what we know about expectations,

E(X̄) = E[ (1/n) Σ_{i=1}^{n} X_i ] = (1/n) Σ_{i=1}^{n} E(X_i) = (1/n) · nμ = μ.   (1)

In words: the expectation of the sample mean is the population mean. As well, using what we know about the variance,

Var(X̄) = Var[ (1/n) Σ_{i=1}^{n} X_i ] = (1/n²) Σ_{i=1}^{n} Var(X_i) = (1/n²) · nσ² = σ²/n.   (2)

In words: the variance of the sample mean is the population variance divided by the size of the sample. These statements — equations (1) and (2) — are true for any random variable X ~ (μ, σ²). These are the mean and variance of the sample mean.

2.1.1 Example

Say X ≡ the number of successes in n trials, each success with probability p. Then X has a Binomial distribution with parameters n and p: X ~ Bin(n, p). We know already that E(X) = np and Var(X) = np(1 − p). So if I have a sample x1, …, xn — that is, I have n experiments, each of n trials — then E(X̄) = np, since the expected value of each draw from the population is the same. We also know that Var(X̄) = Var(X)/n = np(1 − p)/n. Here X̄ ≡ the mean number of successes in n samples of a Binomial Bin(n, p) experiment.

2.1.2 Example

Say we let Y ≡ the proportion of successes in a Binomial experiment: Y = X/n. So

E(Y) = E(X)/n = np/n = p,¹

and

Var(Y) = Var(X)/n² = np(1 − p)/n² = p(1 − p)/n.

We often call Y our estimate for the true population proportion, p. We often call Y "p̂".

¹So the size of each sample is n, and the number of samples is n.

2.2 Central Limit Theorem

Let's step back to the general case: say that we have a random variable X ~ (μ, σ²). We know that if X ~ N(μ, σ²) — in words, X is normally distributed with expectation μ and variance σ² — then E(X̄) = μ and Var(X̄) = σ²/n, regardless of the sample size. There is another result. The central limit theorem states that if X ~ (μ, σ²) — from any distribution — then, as the size of the sample n increases,²

X̄ ~ (approximately) N(μ, σ²/n).

Figure 1: Histogram of means of samples of observed shoe size, with a graph of the approximating function — the normal curve. Remember that the area of the histogram sums to one when the y-axis is density; the area under the normal curve sums to 1.

²To infinity.

3 Estimators and Estimation

In general, point estimation is the
procedure of selecting a "best" value for a parameter, and thusly specifying a probability model. The setup, using mathematical notation, is like this: let a random variable X have a pdf f(x; θ). Some examples we have seen:

X ~ Poi(λ), so f(x; θ) = f(x; λ) = e^{−λ} λ^x / x!;

X ~ N(μ, σ²), so f(x; θ) = f(x; (μ, σ²)) = (2πσ²)^{−1/2} e^{−(x−μ)²/(2σ²)};

etc. The procedure of point estimation is using a function of the data, θ̂(x1, …, xn) — a statistic — which will be a good estimator of the parameter θ.

3.1 Bias

We've already introduced some notation: "theta hat," or θ̂, is our estimator for θ, and we call the bias the difference between the expected value of the estimator and the parameter being estimated:

Bias = E(θ̂) − θ.   (3)

We call an estimator unbiased if the bias is zero. For example,

S² = (n − 1)^{−1} Σ (x_i − x̄)²

is an unbiased estimator for σ², since E(S²) = σ².

3.2 Method of Moments Estimation

Say we have an X ~ Gamma(α, β). We know then that E(X) = αβ and Var(X) = αβ². We'll take a simple random sample X1, …, Xn; we believe each Xi ~ Gamma(α, β). Our task is to use the sample x1, …, xn to generate estimates for θ = (α, β). We know immediately that E(X̄) = αβ. A "reasonable" estimate of αβ, using this fact, is α̂β̂ = x̄.

This, in general, is called the method of moments for generating point estimates of parameters. A moment is an expectation of some power of a random variable. The kth population moment is the expectation of the random variable raised to the kth power. The kth sample moment is the arithmetic average of the sample data raised to the kth power: M_k = Σ x_i^k / n. The method of moments estimators for θ, when θ has p elements, are generated by:

- calculating the population moments E(X^k) for k = 1, …, p;
- calculating the sample moments M_k = Σ x_i^k / n;
- setting them equal for each k, and then solving for the p parameters.

To finish the X ~ Gamma(α, β) example: we set x̄ = α̂β̂, and we set s² = α̂β̂², which yields

α̂ = x̄²/s² and β̂ = s²/x̄

as the method of moments estimators for (α, β).

3.3 Maximum Likelihood Estimation

Another³ way to estimate parameters involves using "all" of the distribution, not just the moments.
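The method-of-moments recipe above reduces, for the Gamma case, to two lines. A Python sketch (the sample moments are hypothetical):

```python
def gamma_mom(xbar, s2):
    # Solve xbar = alpha*beta and s2 = alpha*beta^2 for (alpha, beta).
    beta_hat = s2 / xbar
    alpha_hat = xbar ** 2 / s2
    return alpha_hat, beta_hat

# If a sample gave xbar = 4.0 and s^2 = 2.0:
a_hat, b_hat = gamma_mom(4.0, 2.0)
print(a_hat, b_hat)  # 8.0 0.5
```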
Say we have a random variable — random process, whatever — X ~ f(x; θ), and a sample x1, …, xn, simple and random, each x_i ~ f(x; θ). In words: we have some random process with a pdf (distribution), and a sample where each of the observations has the same pdf; θ is the value of the parameter that specifies the exact distribution from the family.

The probability of observing a particular sample x1, …, xn is the joint probability of (x1, …, xn). We call this the likelihood:

lik(θ) = f(x1, …, xn; θ).

Notice that here we are considering the likelihood as a function of the parameter θ: the likelihood is the probability that we would observe this sample, at a value of θ. When we have a simple random sample, we assume each observation in the sample is distributed independently and equivalently, so we can rewrite the likelihood⁴ as

lik(θ) = Π_{i=1}^{n} f(x_i; θ).   (5)

³Ha ha — is there a word "nother"? There should be.
⁴Product notation: remember the summation notation? The product notation is the same, with a × separating each term instead of a +.

Now we just need to pick a value of θ such that (5) is maximized. So we turn the estimation problem into a calculus problem: we have to maximize a function. Often we take the logarithm of the likelihood function:⁵

loglik(θ) = log Π_{i=1}^{n} f(x_i; θ) = Σ_{i=1}^{n} log f(x_i; θ).   (6)

There are at least two good reasons for doing this. First, taking the logarithm allows us to consider a sum instead of a product; we like sums.⁶ Second, taking the logarithm introduces concavity. This allows us to find the value of the parameter θ, θ̂, which maximizes the log-likelihood. This value θ̂ we will call the maximum likelihood estimate for θ, and use it as our estimate.

For example: take X ~ N(μ, σ²) and observe a sample x1, …, xn. We know that

f(x; μ, σ²) = (2πσ²)^{−1/2} e^{−(x−μ)²/(2σ²)}.

Then

lik(μ, σ²; x) = Π_{i=1}^{n} (2πσ²)^{−1/2} e^{−(x_i−μ)²/(2σ²)};

this is the function we have to maximize to generate the maximum likelihood estimates. The log-likelihood is

loglik(μ, σ²; x) = −Σ_{i=1}^{n} (x_i − μ)²/(2σ²) − (n/2) log(2π) − n log σ.

We take a derivative with respect to each parameter, set each to zero, and solve for μ and σ²:

∂loglik/∂μ = (1/σ²) Σ_{i=1}^{n} (x_i − μ),

⁵I always write log; I always mean ln = log_e.
⁶The CLT refers to sums, not products.

and

∂loglik/∂σ = −n/σ + (1/σ³) Σ_{i=1}^{n} (x_i − μ)².

Equating
these equal to zero yields the normal, or estimating, equations — two equations and two unknowns here. This yields the estimator for μ, the mean:

μ̂_MLE = x̄.

So the sample mean is the maximum likelihood estimate for μ, and is also an unbiased estimator. The estimator for σ², the variance:

σ̂²_MLE = Σ (x_i − x̄)² / n.

So the maximum likelihood estimate differs from S² = Σ (x_i − x̄)²/(n − 1) by a factor of (n − 1)/n. So the MLE for σ² is biased — we already know S² is unbiased.

The MLE has a nice property: the estimate of a function of θ is equivalent to the value of the function at the estimate of θ. In notation, for any function h(θ), the estimate of the function h(θ) is h(θ̂); that is, ĥ(θ) = h(θ̂). This is the invariance property of the maximum likelihood estimator.

ISYE 2028 A and B — Lecture 7
Conditional Expectation and Prediction
Dr. Kobi Abayomi
February 10, 2009

1 Conditional Expectation, Again

We should recall the definition of the conditional expectation:

E(X | Y = y) = Σ_x x p(x|y) (discrete), or ∫ x f(x|y) dx (continuous),   (1)

where the conditional pmf/pdf is the joint over the marginal:

p(x|y) = p_{X,Y}(x, y)/p_Y(y), f(x|y) = f_{X,Y}(x, y)/f_Y(y).   (2)

If you are in doubt — i.e., you are trying to solve a problem — calculate f_Y(y) = ∫ f_{X,Y}(x, y) dx explicitly.

1.1 Example

Let (X, Y) have joint density f(x, y) = (e^{−x/y} e^{−y})/y, 0 < x, y < ∞. Find the conditional expectation E(X|Y). Well,

f(x|y) = f(x, y)/f_Y(y),

which yields

f(x|y) = (1/y) e^{−x/y}, 0 < x < ∞.

So the conditional distribution is just that of an exponential random variable with mean y. So

E(X | Y = y) = y.

2 Computing Expectations by Conditioning

Recall E(X) = E[E(X|Y)]: the expectation of the conditional expectation is the original, or unconditioned, expectation. We saw a proof of this in an earlier lecture. This fact allows us to compute expectations by conditioning on any random variable which makes the computation easy.

2.1 Example: Expectation of a Random Sum

Let X1, …, XN, with N a random number and the Xi independent of N. Find the expectation of the random sum Σ_{i=1}^{N} X_i. Condition on N:

E[ Σ_{i=1}^{N} X_i ] = E[ E( Σ_{i=1}^{N} X_i | N ) ] = E[ N E(X) ] = E(N) E(X).

2.2 Example: ρ in N2 is Correlation

Let (X, Y) ~ N2(μ_X, μ_Y, σ_X², σ_Y², ρ), the bivariate normal distribution.
We've defined the correlation

ρ_XY = ( E(XY) − μ_X μ_Y ) / (σ_X σ_Y).

Notice that the conditional expectation for the bivariate normal is

E(X | Y = y) = μ_X + ρ (σ_X/σ_Y)(y − μ_Y).

This implies

E(XY) = E[ E(XY | Y) ] = E[ Y E(X | Y) ] = E[ Y μ_X + ρ (σ_X/σ_Y) Y (Y − μ_Y) ] = μ_X μ_Y + ρ σ_X σ_Y.

And note this:

( E(XY) − μ_X μ_Y ) / (σ_X σ_Y) = ρ σ_X σ_Y / (σ_X σ_Y) = ρ,

so the parameter ρ is the correlation.

2.3 Example: Mean and Variance of the Geometric Distribution

Recall: if N ~ Geo(p), then E(N) = 1/p and Var(N) = (1 − p)/p². This can be calculated directly, using properties of geometric sums, in the exercises. Here, define a new random variable to condition on: let Y = 1 if the first Geometric trial is a success, and Y = 0 if it is not. Then

E(N) = E(N | Y = 1) P(Y = 1) + E(N | Y = 0) P(Y = 0) = 1·p + (1 + E(N))(1 − p),

which yields E(N) = 1/p. The variance can be derived by conditioning on Y as well:

E(N²) = E(N² | Y = 1) P(Y = 1) + E(N² | Y = 0) P(Y = 0) = p + (1 − p) E[(1 + N)²],

which yields E(N²) = (2 − p)/p², and so Var(N) = E(N²) − (1/p)² = (1 − p)/p².

3 Conditional Variance

Just applying the definition yields

Var(X|Y) = E[ (X − E(X|Y))² | Y ],

which is just the variance — expected squared deviation — applied over the conditional pdf. Using the expectation of the conditional expectation,

E[Var(X|Y)] = E[E(X²|Y)] − E[E(X|Y)²] = E(X²) − E[E(X|Y)²].   (3)

Taking the variance of the conditional expectation yields

Var(E(X|Y)) = E[E(X|Y)²] − ( E[E(X|Y)] )² = E[E(X|Y)²] − ( E(X) )².   (4)

Adding equations (3) and (4) yields

Var(X) = E[Var(X|Y)] + Var(E(X|Y)),

which is a familiar result. NB: the expected conditional variance is less than or equal to the variance.

3.1 Example: Variance of a Random Sum

Take X1, …, XN, with N random and the Xi independent of N:

Var( Σ_{i=1}^{N} X_i ) = E[ Var( Σ X_i | N ) ] + Var( E( Σ X_i | N ) ) = E(N) Var(X) + E(X)² Var(N).

4 Conditional Expectation and Prediction

The notion of conditional expectation is the expectation of a random variable (or process) given additional information — i.e., another random variable. In this class — and in almost every other statistics class — the conditional expectation is exploited in the regression setup. A regression is usually a conditional expectation. In a sense we describe, or predict, random variables by their expectations: a regression is a conditional prediction.

5 The Best Predictor: Mean Squared Error Criteria

Take X, Y, two random variables, and call X the "independent" random variable and Y the "dependent" random variable — i.e., observe X = x and predict Y via some function g(X).
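The two random-sum identities above can be checked by simulation. A Python sketch, with N uniform on {1, …, 10} and the Xi exponential with mean 2 (both choices hypothetical): then E[ΣXi] = E(N)E(X) = 5.5·2 = 11 and Var(ΣXi) = E(N)Var(X) + E(X)²Var(N) = 5.5·4 + 4·8.25 = 55.

```python
import math
import random
from statistics import mean, pvariance

random.seed(5)

def random_sum():
    # N ~ Uniform{1..10}; X_i ~ Exp with mean 2, drawn by inverse CDF.
    n = random.randint(1, 10)
    return sum(-2.0 * math.log(random.random()) for _ in range(n))

sums = [random_sum() for _ in range(100_000)]
m, v = mean(sums), pvariance(sums)
print(m, v)   # near 11 and 55
```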
We usually want some g(X) that yields the best prediction for Y, with respect to some criterion. One criterion is to minimize the expected squared difference,

min_g E[ (Y − g(X))² ],   (5)

or the mean squared error (MSE). As it turns out, the best g for the MSE criterion is g(X) = E(Y|X). The claim is that

E[ (Y − g(X))² ] ≥ E[ (Y − E(Y|X))² ]

for any choice of predictor g(X). A sketch of the proof goes:

E[ (Y − g(X))² | X ] = E[ (Y − E(Y|X) + E(Y|X) − g(X))² | X ]

— which is the usual add-subtract-and-partition thing we do all the time —

= E[ (Y − E(Y|X))² | X ] + E[ (E(Y|X) − g(X))² | X ] + 2 E[ (Y − E(Y|X))(E(Y|X) − g(X)) | X ],

and we argue that the cross term vanishes, since, given X, E(Y|X) − g(X) is constant and

E[ Y − E(Y|X) | X ] = E(Y|X) − E(Y|X) = 0.

So

E[ (Y − g(X))² | X ] = E[ (Y − E(Y|X))² | X ] + E[ (E(Y|X) − g(X))² | X ].

If we take expectations of both the left- and right-hand sides, E(LHS) = E(RHS), then

E[ (Y − g(X))² ] = E[ (Y − E(Y|X))² ] + E[ (E(Y|X) − g(X))² ],

which implies

E[ (Y − g(X))² ] ≥ E[ (Y − E(Y|X))² ],

with equality if and only if g(X) = E(Y|X). You can prove to yourself that min_c E[(Y − c)²] is attained at c = E(Y), and we can say:

- E(Y) is the best MSE predictor when there is no X to improve prediction;
- E(Y|X) is the best MSE predictor when an X is available.

6 The Best Linear Predictor

We have shown examples in previous lectures where imposing a linear constraint on E(Y|X) — that the conditional expectation be a linear function, i.e., E(Y|X) = a + bX — implies

g(X) = μ_Y + ρ_XY (σ_Y/σ_X)(X − μ_X).

The mean squared error here is

E[ (Y − μ_Y − ρ_XY (σ_Y/σ_X)(X − μ_X))² ] = σ_Y² − 2ρ_XY² σ_Y² + ρ_XY² σ_Y² = σ_Y²(1 − ρ_XY²).

Exercises

- Let Y = X² and X ~ U(−1, 1). Find the (minimum) mean square estimator of Y in terms of X, and its mean square error, i.e., E[(Y − g(X))²].
- Let Y = X² and X ~ U(−1, 1). Find the linear mean square estimator of Y in terms of X, and its mean square error.
- We call an estimator g(X) of Y unbiased if E(Y − g(X)) = 0. Let S² = (n − 1)^{−1} Σ (X_i − X̄)², with X̄ = n^{−1} Σ X_i, be an estimator of σ². Show that it is unbiased.
- Let X be exponential with mean 1. Find E(X | X > 1).
- Let X be Poisson with mean λ, and let the parameter λ be an Exponential random variable with mean 1. Show that P(X = n) = (1/2)^{n+1}.
- Show that E(X) = ∫ E(X | Y = y) f_Y(y) dy.
- You and a good friend flip a coin that is heads with probability p. The first one to obtain a head
is declared the winner. Call the probability that you — who goes first — win, f(p). Do you think that f(p) is a monotone function of p? Increasing or decreasing? What is the value of lim_{p→1} f(p)? What is the value of lim_{p→0} f(p)? Find f(p).

ISYE 2028 A and B, Spring 2009 — Lecture 17
Dr. Kobi Abayomi
April 25, 2009

1 Introduction: Non-Parametric Tests

We call X ~ F(x; θ) a parametric distribution for X. We have already investigated inference on θ:

- Hypothesis tests: H0: θ = θ0 vs. Ha: θ ≠ θ0.
- Confidence intervals: θ̂ ± z_{α/2} SE(θ̂).

For large samples, we used the Central Limit Theorem (CLT) for the sampling distribution of the estimator θ̂. In other settings (the t dist., the F dist.) we needed the explicit assumption X ~ N(μ, σ²). If the parameter we seek inference on is a quantile, we can use functions of the order statistics. We call such methods non-parametric, in the sense that the distribution of X need only be continuous.

2 Confidence Interval on Quantiles

Take X1, …, Xn ~ F(x), parameter unavailable or unknown, and the associated order statistics X(1), …, X(n). Let's look at the probability that two order statistics cover a quantile ξ_p. Since P(X < ξ_p) = F(ξ_p) = p, by definition of the CDF,

P( X(i) < ξ_p < X(j) ) = Σ_{k=i}^{j−1} C(n, k) p^k (1 − p)^{n−k}.

This is the probability that a quantile ξ_p is between two particular order statistics; observed data will yield a confidence interval for the quantile.

For example: let X1, …, X4 ~ F. Generate the order statistics X(1), …, X(4). Then

P( X(1) < ξ_.5 < X(4) ) = Σ_{k=1}^{3} C(4, k) (1/2)^4 = 14/16 = .875.

This means that (X(1), X(4)) yields an 87.5% confidence interval for the median. In general, though, non-parametric methods are less efficient — i.e., they will have greater variance for equal sample size — than parametric methods. When parametric methods are available, use them.

3 The Sign Test

Remember the χ² test for goodness of fit: we had a random variable X that could take values in A1, …, Ak, and

H0: P(X ∈ A_i) = p_i vs. Ha: P(X ∈ A_i) ≠ p_i for at least one i.

Let X ~ F(x) and consider the test H0: F(ξ) = p0 — that is, we wish to test the location of the p0-th quantile.
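The coverage probability above is a finite binomial sum, so it computes directly. A Python sketch: the first line reproduces the n = 4 median example, and the last computes the α of a Y ≥ 8 rejection region for Y ~ Bin(10, 1/2), the kind of calculation the sign-test examples use.

```python
from math import comb

def coverage(n, i, j, p=0.5):
    # P(X_(i) < xi_p < X_(j)) = sum_{k=i}^{j-1} C(n,k) p^k (1-p)^(n-k)
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, j))

print(coverage(4, 1, 4))                                   # 0.875
alpha = sum(comb(10, y) for y in range(8, 11)) / 2 ** 10   # P(Y >= 8), Y ~ Bin(10, 1/2)
print(alpha)                                               # about 0.0547
```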
The chi-squared test with k = 2 tests against all alternatives; say instead we want to test

H0: F(ξ) = p0 vs. Ha: F(ξ) > p0.

We need a different test: we want to test the fit at just one quantile — not several jointly — and the alternative hypothesis includes the direction of departure. Let Y ≡ the number of successes in n independent trials, or draws, of X, where a "success" is a value of X less than ξ. Then Y ≡ the number of observations less than or equal to ξ in a sample of X of size n. Under H0, Y ~ Bin(n, p0); under Ha, Y ~ Bin(n, p) for some other p > p0. We reject H0 iff Y ≥ c, with c such that P_{H0}(Y ≥ c) = α. The power function here is

K(p) = Σ_{y=c}^{n} C(n, y) p^y (1 − p)^{n−y}, p0 ≤ p < 1.

To test against Ha: p < p0, the rejection region is Y ≤ c; to test against p ≠ p0, the rejection region is {y < c1} ∪ {y > c2}. Often p0 = 1/2 — i.e., we are testing that ξ is the median.

Example: take X1, …, X10 ~ F. Test H0: F(72) = 1/2 vs. Ha: F(72) > 1/2. Let Y ≡ the number of observations < 72, say, and let the critical region be Y ≥ 8. Remember that the critical region for rejection of H0 vs. a particular Ha sets α. Then

α = P_{H0: F(72)=1/2}(Y ≥ 8) = Σ_{y=8}^{10} C(10, y) (1/2)^y (1/2)^{10−y} ≈ .055.

To figure out the p-value for a particular observed y, say y_obs, for this hypothesis test:

p-value = Σ_{y=y_obs}^{10} C(10, y) (1/2)^y (1/2)^{10−y}.

This is called the sign test, since Y ≡ the number of positive signs in X1 − ξ, …, Xn − ξ. If we test vs. Ha: F(ξ) < p0, then Y ≡ the number of non-positive signs in X1 − ξ, …, Xn − ξ. For the two-sided test the definition of Y is arbitrary — either positive or non-positive — since Y ~ Bin, and we reject for both large and small values of Y.

Example: say we observed this data (n = 11):

15, 22.7, 0.9, 13.7, 20.7, 16.7, 18.1, 8.7, 20.7, 12.7, 17.7.

We seek to test, at α = .05, H0: F(18) = .5 versus Ha: F(18) < .5. We use Y ≡ the number of non-positive signs in X1 − ξ, …, Xn − ξ; here ξ = 18. Then y = 7, and the p-value is

p-value = P_{H0: F(18)=.5}(Y ≥ 7) = Σ_{y=7}^{11} C(11, y) (1/2)^{11} ≈ .274.

We fail to reject the null hypothesis.

4 Wilcoxon Signed Rank Test

The signed rank test uses the signs and the magnitudes of the deviations X_i − ξ. Assume X ~ F, continuous. Take X1, …, Xn and rank |X1|, …, |Xn|; this yields R_i ≡ the rank of the magnitude of X_i. For example, for X1 = 5.7, X2 = −7.6, X3 = 1: R1 = 2, R2 = 3, R3 = 1. Thus (R1, …, Rn) is an
arrangement of the first n positive integers. Let

Z_i = −1 if X_i < 0, and Z_i = +1 if X_i > 0.

Since P(X_i = 0) = 0, we need not concern ourselves with X_i = 0: whether we set Z_i = 1 or Z_i = −1 there, it does not matter. We call

W = Σ_{i=1}^{n} Z_i R_i

the Wilcoxon statistic, and it turns out that, under the null hypothesis H0: F(ξ_.5) = .5,

W / √( n(n + 1)(2n + 1)/6 ) ~ (approximately) N(0, 1).

Example: test H0: ξ_.5 = 75 against Ha: ξ_.5 > 75 at α = .01. Say n = 18, and from the deviations X_i − 75 we calculate an observed w = 135; here n(n + 1)(2n + 1)/6 = 2109, and √2109 ≈ 45.92. Then

.01 = P( W/45.92 > 2.326 ) = P( W > 106.8 ).

Since w = 135 > 106.8, we reject H0.

ISYE 2028 A and B, Spring 2009 — Lecture 1
Dr. Kobi Abayomi
January 8, 2009

1 Introduction: What is Data?

In statistics we worry about what there is to observe — what we expect to see — and what we have actually observed — what we do in fact see. Data are the quantitative characterizations of what we see, often based upon what we expect, or often even desire, to see. Data are our observations, observed and quantified.¹

We call the population the set of all possible observations. We call a sample the observations we see at a glance, inspection, or study. A statistic is any quantity we derive from the observed data — i.e., any quantity we can generate from the sample. A parameter is any quantity we cannot observe about the data — one that is specific to the population. We often seek to estimate parameters using samples from the population.

¹These are my heuristic definitions.

2 Classifying and Observing Data

We often call the individual objects described by a set of data units of observation, or cases. A variable is an object that holds information about the same characteristic for many cases. A data table is an arrangement of data — the convention is to let rows represent cases and columns represent variables. Here is an example of a possible data table:

ID  Name     ShoeSize  TestScore  Classlevel  FashionLevel
1   Kobi        11         95     Graduate    Low
2   Djleroy     11        100     PhD         High
3   Ronald      22      50.15     Pre-K       Very Low

We classify variables as either quantitative, where the numbers act as numerical values, or categorical, which are either words or numerals that are treated as non-numeric. For the above example,
which variables are which?

Quantitative data can be either discrete — in that we can list all the possible values — or continuous, in that we cannot. A more particular distinction is to say a variable is discrete if it can take countably many values, and that a variable is continuous otherwise. The distinction is often apparent in use.

Quantitative variables in which both the order (in the greater-than/less-than sense) and the distance between data can be determined are called interval variables. Percent scores are examples of interval variables. Quantitative variables in which the order of data points can be determined, but not the distance, are called ordinal. Examples are letter grades. Categorical variables which are determined by categories that cannot be ordered — such as gender and color — are called nominal.

In math notation, a data table is a multivariate vector — i.e., a matrix — x, with dimension n × k ("n by k"): n rows and k columns. The observations are the rows, n; the number of measurement types is k, the number of columns. We select the ith observation of the jth variable with element x_ij from x.

In R, data are held in arrays. Think of an array as the broadest class of matrices, having at least two dimensions. A data table, then, is a two-dimensional array: a matrix. In R, matrices are restricted to be only numeric. Generally we work in R with data-table objects called data frames. The data frame is the most general object for holding data in R; the convention remains rows as observations and columns as variables. For example, in R:

worms <- read.csv(file="worms.csv")
# reads in the data frame as a comma-delimited file. i like this format;
# replace file="" with your file name. notice the ""

is.data.frame(worms)
# checks if the object is a data frame

names(worms)
dim(worms)
# returns the column names (i.e., the variable names), and the number of rows and columns

worms[7, 5]
# returns the pH of the soil in the church field observation (row 7)

attributes(worms)
worms$Area
mode(worms$Area)
# data frame objects carry attributes; we'll use
these a bit later.

3 Summation Notation

We'll need this to work mathematically with data:

Σ_{i=1}^{n} stuff_i = stuff_1 + stuff_2 + … + stuff_n   (1)

is translated as: start with stuff_1 and add it to stuff_2, and keep on adding until stuff_n. The stuff to do n times can be as simple as taking a bunch of numbers x1, x2, …, xn and dividing by the total number, i.e.,

(1/n) Σ_{i=1}^{n} x_i = (x1 + x2 + … + xn)/n,   (2)

or more complicated — like taking each of those numbers, subtracting off some other number, squaring the result, and dividing that by n − 1:

(1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)².   (3)

R is very good at doing things in blocks: taking groups of things and doing stuff to each member of the group. Here is a long way of interpreting the summation notation in R:

stuff <- c(5,5,3,4,5,6,7,8,7,10)
lengthstuff <- length(stuff)
summ <- 0
for(i in 1:lengthstuff){
  summ <- summ + stuff[i]
}
summ²

R has a built-in, shorter way:

sum(stuff)

How useful! You should familiarize yourself with this notation now. Prove to yourself that I'm not telling you any lies below:

1. Σ_{i=1}^{n} c = nc
2. Σ_{i=1}^{n} c·x_i = c Σ_{i=1}^{n} x_i
3. Σ_{i=1}^{n} (x_i + y_i) = Σ_{i=1}^{n} x_i + Σ_{i=1}^{n} y_i

²For now, humor me and agree to call the result of the above "x̄" — "xbar".

4 Descriptive Techniques: Data Tables

The first thing to do with data, where you can, is to make a picture. Part of being a statistician is using graphical methods to display and interpret data. Rudimentary graphical methods are data tables, which we use to illustrate data.

A frequency table is a list of the categories in a categorical variable and the counts (or percentage) of observations in each category. Example:

Classlevel  Count  Percent
Graduate      2       4
PhD           1       2
Pre-K        30      60

What should the counts column in a frequency table always add up to? What should the percent column add up to, always? A frequency table is one way of illustrating the distribution of a variable. The distribution is the complete information about a variable: its possible values and the relative frequency of each value.

R can give you a frequency table real, real easy —

table(stuff)

— for quantitative as well as categorical data:

hamm <- c("toe","foot","eye","tail","tail","foot","snout")
table(hamm)

R can also give you the percents. Again: real, real easy.

table(stuff)/length(stuff)
table(hamm)/length(hamm)

We usually rotate these tables when we include them in reports and such.
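The frequency tables above are one-liners in most languages; the notes use R's table(), and an equivalent Python sketch is:

```python
from collections import Counter

hamm = ["toe", "foot", "eye", "tail", "tail", "foot", "snout"]
counts = Counter(hamm)                                      # like table(hamm)
percents = {k: v / len(hamm) for k, v in counts.items()}    # like table(hamm)/length(hamm)
print(counts)
print(percents)
```

The relative frequencies always sum to one — the answer to the "what should the percent column add up to" question above.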
R can also give you the percents. Again, real real easy:

    table(stuff)/length(stuff)
    table(hamm)/length(hamm)

We usually rotate these tables when we include them in reports and such.

A contingency table shows how cases are distributed along each variable, contingent on all other variables. Let's look at our example:

                    Fashion Level
    Classlevel   Low   Middle   High
    Graduate     6     4        1
    PhD          5     1        2
    Pre-K        30    25       75

How many graduate students have a low fashion level? How many Pre-K's are well dressed? The contingency table can be used to reveal patterns in variables that may be contingent on the category of others.

In a contingency table, the marginal distribution of a variable is the distribution of that variable alone. The marginal distributions are displayed in the margins of the table:

                    Fashion Level
    Classlevel   Low   Middle   High   Total
    Graduate     6     4        1      11
    PhD          5     1        2      8
    Pre-K        30    25       75     130
    Total        41    30       78     149

The conditional distribution in a contingency table is viewed by looking at one column or one row of the table. The remaining distribution of a variable, then, is conditional on the value of the other variables in that restricted view. For illustration:

    Classlevel   Middle
    Graduate     4
    PhD          1
    Pre-K        25
    Total        30

is the distribution of class level after conditioning on the middle fashion level.

We say that two variables are independent if the conditional distributions of one are the same no matter what we condition on in the second, and vice versa. For instance, let's look at the distribution of fashion level by class level:

                    Fashion Level
    Classlevel   Low   Middle   High   Total
    Graduate     6     4        1      11
    PhD          5     1        2      8
    Pre-K        30    25       75     130
    Total        41    30       78     149

Notice how we look at the conditional distribution of fashion level by holding class level, the row, constant. So the conditional distribution is generated by holding the variable we condition on constant. Here, if we want the conditional distribution of class level, we hold fashion level constant by looking at the columns one by one.
In R we can enter:

    classtable <- matrix(c(6, 5, 30, 4, 1, 25, 1, 2, 75), nrow = 3)   # notice we enter the data columnwise
    classtable                      # recalls the data
    is.data.frame(classtable)
    ctable <- as.data.frame(classtable)   # assigns the data to a new object, which is a data frame
    ctable
    is.data.frame(ctable)
    is.matrix(ctable)
    names(ctable) <- c("low", "middle", "high")
    ctable
    rownames(ctable)
    rownames(ctable) <- c("grad", "phd", "prek")
    rownames(ctable)
    ctable
    ctable["grad", "low"]           # we can give data frames row and column names

Do the Pre-K students seem to be the best dressed? Why? Does fashion level appear to be independent of class level?

5 Continuing with more descriptive techniques

Above we began to look at ways of describing data, principally via tabular illustration. We introduced the data table, with cases on rows and variables on columns, and the contingency table, with two or more variables at a time. Now we'll continue to look at statistical methods, methods that depend on functions of the data, as a prelude to when we will begin to draw pictures, or plots, of our observed data.

6 Statistical Methods Functions of Data

Recall our definition of a statistic: any quantity that we derive from our observed data. It is useful to think of statistics in the way they are used to summarize and condense information.

61 Measures of Central Tendency Mean Median

In passing we introduced this function:

    x̄ = (1/n) Σ_{i=1}^{n} x_i    (4)

Explicitly, this x̄ is called the sample mean. It is the arithmetic average of the observed data. Remember that we can call x = (x_1, x_2, ..., x_n) our observations, the results of some experiment; n is the number of cases, or observations, and x_1 can be the shoe size of case 1, for instance.

It is worth highlighting here the distinction between population and sample. The sample mean is an estimate of the population mean. Remember that we don't usually see the population entirely; we only see a sample. We take the arithmetic average of the sample and we use it as an estimate for the unknown, true mean of the population.
By example, let's refer to the "data" from lecture 1:

    ID  Name     ShoeSize  TestScore  Classlevel  FashionLevel
    1   Kobi     11        95         Graduate    Low
    2   Djleroy  11        100        PhD         High
    3   Ronald   22        50.15      Pre-K       Very Low

In this example we can think of the population as all statistics students with shoes and fashion; we take a sample, that is, we only record data on the class here in Math 417 during fashion week. Let's say our class size is n = 10 and we get these observations for shoe sizes:

    x = (11, 11, 22, 6, 8, 10, 10, 9, 7, 9)

The sample mean, our estimate for the population mean, is 10.3. How did I get that? In R:

    x <- c(11, 11, 22, 6, 8, 10, 10, 9, 7, 9)
    mean(x)

The mean is known as a measure of central tendency, or a location measure: the mean tells us where the arithmetic center of the data is.

Another measure of central tendency is the sample median. While the sample mean is the arithmetic center of the distribution of the data, the sample median is the "physical" center of the data, i.e. the point in the very middle of the distribution. We find the median by:

1. Ordering the data from least to greatest.
2. When n is odd, the median is the point at the (n+1)/2 position.
3. When n is even, we take the n/2 and the n/2 + 1 points and average them.

Allow some more notation. Call x_(1), x_(2), ..., x_(n) the ordered sample values from least to greatest, or the order statistics for the sample. Then I can restate how to generate the sample median in a formulaic way:

1. Generate x_(1), ..., x_(n).
2. When n is odd, the median is the x_((n+1)/2) observation.
3. When n is even, the median is (x_(n/2) + x_(n/2 + 1))/2.

In R:

    x <- c(11, 11, 22, 6, 8, 10, 10, 9, 7, 9)
    median(x)

What could be easier?

62 Measures of Spread Variance Range Quartiles Interquartile Range

Measures of spread are also called scale measurements. Statistics which measure spread yield information on the dissimilarity or similarity of our observations. When scale measurements are large we think, on average, that there are highly dissimilar bits of data in our observations. When scale measurements are small we think, on average, that our bits of data are quite alike one another.
The most commonly used estimate of spread in the population is the sample variance, also known as the average of squared deviations. Let's look at how we calculate it:

    s² = Σ_{i=1}^{n} (x_i - x̄)² / (n - 1)

We call s² the sample variance.⁴ Notice how this estimate, the sample variance, depends upon our estimate of the sample mean. If I were to read the equation in words, I would say: "The sample variance, s squared, is the sum of the squared differences, or deviations, of each observation from the sample mean, divided by n minus 1."

A deviation, then, is a difference. We usually refer to the differences between an estimate (such as the sample mean) and observed values as deviations. Heuristically, the sample variance can be seen as an estimate of the average squared difference. We often work with the square root of this estimate, the sample standard deviation.

The sample range is the difference between the maximum and minimum sample values. Using the notation we introduced for the order statistics:

    range = x_(n) - x_(1)    (7)

Commonly we think of the sample quartiles, that is, the 25th percentile and the 75th percentile. The interquartile range is the difference between these values, or

    IQR = x_(.75·n) - x_(.25·n) = upper quartile - lower quartile    (8)

The five number summary is the minimum x_(1), the maximum x_(n), the quartiles (lower x_(.25·n) and upper x_(.75·n)), and the median.

⁴ Notice how all of these sample statistics are lowercase English characters. This is important, and we'll talk about it more later on.

In R:

    x <- c(11, 11, 22, 6, 8, 10, 10, 9, 7, 9)
    range(x)
    range(x)[2] - range(x)[1]
    order(x)
    orderedx <- x[order(x)]
    orderedx
    fractionalpart <- function(x) ceiling(x) - x
    fractionalpart(1.5)
    n <- length(x)
    # interpolate between neighboring order statistics
    upperquartile <- fractionalpart(.75*n) * (orderedx[ceiling(.75*n)] - orderedx[floor(.75*n)]) + orderedx[floor(.75*n)]
    upperquartile
    lowerquartile <- fractionalpart(.25*n) * (orderedx[ceiling(.25*n)] - orderedx[floor(.25*n)]) + orderedx[floor(.25*n)]
    iqr <- upperquartile - lowerquartile

63 Calculations by hand

The summation notation is good to interpret for programming, or for more theoretical work, and also perhaps for quick understanding.
Though you will rarely be asked to calculate statistics by hand, it's good to have a quick way to do so. A simple way is to arrange the data in columnar fashion, with the observations in the first column and the following columns being the statistics you will be calculating. For example, let's take our data x of shoe sizes:

    i    x_(i)   deviation x_(i) - x̄   squared deviation (x_(i) - x̄)²
    1    6       -4.3                   18.49
    2    7       -3.3                   10.89
    3    8       -2.3                   5.29
    4    9       -1.3                   1.69
    5    9       -1.3                   1.69
    6    10      -0.3                   0.09
    7    10      -0.3                   0.09
    8    11      0.7                    0.49
    9    11      0.7                    0.49
    10   22      11.7                   136.89

    x̄ = sum/n = 10.3;   s² = (sum of squared deviations)/(n - 1) ≈ 19.57;   s = √s² ≈ 4.42

7 Graphical Techniques Plotting the Data

Remember our definition of a statistic: a function of observed data. That is, we observe something (take observations) and we provide summaries of it (report statistics). Remember that we call our observations samples: representatives from a population of all possible outcomes. Remember statistics like the sample mean or sample variance are estimates of population parameters, which usually we do not know but seek to gain inference about. Most graphical techniques, or plots of the data, can be viewed as extensions of the introductory statistics covered in chapter 1.

8 Univariate Graphs The Histogram

Let's refer again to our example from above, the data table:

    ID  Name     ShoeSize  TestScore  Classlevel  FashionLevel
    1   Kobi     11        95         Graduate    Low
    2   Djleroy  11        100        PhD         High
    3   Ronald   22        50.15      Pre-K       Very Low

Let's take our observations from lecture 2 for shoe size. Remember, x = (11, 11, 22, 6, 8, 10, 10, 9, 7, 9) are our shoe sizes. A frequency table for just the shoe sizes could be:

    x <- c(11, 11, 22, 6, 8, 10, 10, 9, 7, 9)
    table(x)

Remember, both are illustrations of the observed frequency distribution, or the frequency of each of the possible outcomes. A graph of the observed frequency distribution is called a histogram. In R:

    hist(x)                                        # the function to generate a histogram of x
    hist(x, col = "red")                           # shade the bars red
    hist(x, density = 20, labels = T, breaks = 2)  # shade the bars with lines, 20 per inch; include labels
    hist(x, density = 20, labels = T, freq = TRUE) # include frequencies as labels
    help(hist)                                     # you can use help() or help.search()
In the frequency tables we listed each of the observed values explicitly, though our histograms grouped, or binned, the data into classes. In general a histogram is a graph of binned data. Often it is useful to bin data when we don't need the exact distribution, or when we wish to highlight a feature of the distribution.

[Figure 1: a histogram of the shoe size data. I used this code: hist(x, density = 10, labels = T, breaks = c(0, 10, 20, 30), col = "red", main = "ShoeSizes"). I would call this graph unimodal, asymmetric, and right skewed.]

[Figure 2: here I used this code: hist(x, density = 10, labels = T, freq = FALSE, breaks = c(0, 10, 20, 30), col = "red", main = "ShoeSizes"). Notice that the labels above the bars have changed.]

For example, say the below frequency table is of test scores from our initial example:

    TestScore  Count
    30         1
    ...        ...

(one row for each distinct observed score). How many rows would the original data table have?

It is often very useful to bin data for display, in both the frequency table and a histogram. For example:

    TestScore  Count
    Below 50   9
    50-64      10
    65-80      6
    81-90      4
    91-100     2

In R:

    testscores <- c(30, 36, 37, 37, 44, 45, 49, 49, 49, 52, 56, 58, 59, 59, 59,
                    61, 61, 63, 65, 67, 68, 77, 77, 77, 85, 85, 90, 90, 92, 100)
    hist(testscores)                                      # the simplest case: automatic binning
    hist(testscores, freq = TRUE)                         # histogram with observed frequencies/counts
    hist(testscores, freq = FALSE, labels = TRUE)         # histogram with observed densities/probabilities
    hist(testscores, breaks = c(0, 50, 65, 80, 90, 100))  # R will bin the data if you give the endpoints of the bins
    hist(testscores, breaks = 3)                          # R will also bin the data with a suggested number of bins
    hist(testscores, breaks = 10)
    hist(testscores, breaks = 30)

Notice how the histogram differs with the number and size of bins. Notice how you can obscure or highlight features of the observed distribution with binning. Notice also, for the histograms with a frequency axis, that the sum of all the heights is the number of cases.
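Both bookkeeping facts about histogram axes (frequency heights summing to the number of cases, and density bar areas summing to 1) can be checked directly in R, since hist() returns its bin counts and densities when called with plot = FALSE. A minimal sketch, using the shoe-size data from above:

```r
# Verify the frequency- and density-axis facts for the shoe-size data.
x <- c(11, 11, 22, 6, 8, 10, 10, 9, 7, 9)
h <- hist(x, breaks = c(0, 10, 20, 30), plot = FALSE)  # compute bins without drawing
h$counts                          # frequencies: one height per bin
sum(h$counts)                     # equals n, the number of cases
sum(h$density * diff(h$breaks))   # bar areas on the density scale; equals 1
```

The returned object also carries the break points and midpoints, which is handy when you want to annotate a plot by hand.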
Notice that for the histograms with a density axis, the widths of the bins times the heights of the bins sum to 1.

81 Descriptions of Histograms

We call the most frequent bin or class the modal class. A unimodal histogram has a single mode, or peak. A bimodal histogram has two modes, not necessarily equal in height. We call a histogram symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size. A histogram that is flat⁵ is called uniform. Here are some examples in R:

    unimodaldata <- c(1, 2, 3, 3, 3, 3, 4, 5)
    hist(unimodaldata)
    bimodaldata <- c(1, 1, 1, 2, 3, 4, 4, 4, 4, 5)
    hist(bimodaldata)
    uniformdata <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
    hist(uniformdata, breaks = c(.5, 1.5, 2.5, 3.5, 4.5, 5.5))
    bellshaped <- rnorm(100)
    hist(bellshaped)
    skewedleft <- c(bellshaped, -15, -16)
    hist(skewedleft)
    skewedright <- c(bellshaped, 15, 16)
    hist(skewedright)

⁵ which illustrates data that are equally frequent

82 Some things to note

Remember that the sample median is at the physical center of a distribution of observed values, and the sample mean is at the arithmetic center. A symmetric distribution will have the sample median and sample mean relatively close. A distribution that is negatively skewed, or skewed left, will have the sample median greater than the sample mean. A distribution that is positively skewed, or skewed right, will have the sample median less than the sample mean. In R:

    mediansl <- median(skewedleft)
    meansl <- mean(skewedleft)
    hist(skewedleft)
    abline(v = mediansl, col = "red")
    abline(v = meansl, col = "blue")
    mediansr <- median(skewedright)
    meansr <- mean(skewedright)
    hist(skewedright)
    abline(v = mediansr, col = "red")
    abline(v = meansr, col = "blue")

We call outliers values that differ greatly from the distribution of the sample data.

9 Other Univariate graphs

A stem-and-leaf plot is basically a text histogram. A dotplot is a histogram with dots instead of bars.⁶ A boxplot is an illustration of the interquartile range, median, and range of the data. In R:

    stem(testscores)

⁶ I've never used a dotplot.

ISYE 2028 A and B, Lecture 11: Confidence
Intervals and Hypothesis Testing. Dr. Kobi Abayomi. March 13, 2009.

1 Confidence Intervals

A confidence interval is an interval estimate for a population parameter. First, an interval estimate is a range of possible values. If I say "μ is between 8 and 12", or μ ∈ (8, 12), I am presenting an interval estimate for μ.

11 An Extraterrestrial Example

We set this illustration on another world for emphasis. There is what really exists (the population) and there is what we see (the sample). We can only suggest a model for the population and then seek to gain insight about it via the data. In this example we of course can never see the true population, but we can take the time and expense to observe some data and then make a statement about an assumed model for the population.

A good Biologist goes to Mars and collects a sample of Martians. She records how many eyes each Martian has in her sample, and then the average number of eyes per Martian. At a local bar I ask her, "How many eyes does a Martian have?" She responds, "Oh, I don't know for sure. My sample average, my estimate, is 10. I'm 95 percent sure the true mean is between 8 and 12. Between 8 and 12, that's my 95 percent confidence interval."

This means: if she were able to repeat her experiment 100 times, 95 out of 100 times she would generate an interval estimate that covered the true mean. Notice that this does not mean that she will get a sample mean of 10 on 95 out of 100 experiments. Notice also that this does not mean that her interval estimate will be (8, 12) on 95 out of 100 experiments. Notice also that implicit in her answer is the model assumption: Martians have eyes, and there is an expected number, or population mean number, of eyes for every Martian.

Somewhere God, who loves the Martians too, knows the true number of eyes each Martian has, and as well, of course, the mean number of eyes for all of his Martian children. The Biologist, in most belief systems, cannot know the mind of God. However, the Biologist does know the sampling distribution, and can make an inference about what He knows (μ) from what she saw (x̄).
It would be expensive to go back to Mars, so we use the distribution of the sample mean to make statements about the true, unknown population parameter. A confidence interval is our interval estimate attached to some probability. From one experiment we get one confidence interval, which suggests a range of values for the true population parameter. We feel pretty good about this. Think about this example more if you have to. Ask me to draw you a picture; I will.

12 The general setup

Take a random variable X ~ (μ, σ²), and let n be very, very large. Then X̄ ~ N(μ, σ²/n). We make our observed z-statistic and we ask: what are the interval limits for μ at a given probability? This is the same thing as asking, after rescaling to Z, what are the interval limits for Z at a given probability, say .95? Using our notation this looks like

    P(-1.96 ≤ Z ≤ 1.96) = .95

This we know is the answer to the last question, from our work with the Normal distribution:

    P(-1.96 ≤ (X̄ - μ)/(σ/√n) ≤ 1.96) = .95

This we know by just substituting for Z. Then

    P(X̄ - 1.96·σ/√n ≤ μ ≤ X̄ + 1.96·σ/√n) = .95

This is just algebra: I rewrote everything so that we have a statement about the limits of the interval for μ. So now I just read the last equation in English: the probability that the true mean is within 1.96 times the standard deviation of the sample mean is 95 percent. Thus, the 95 percent confidence interval for the true mean is x̄ ± 1.96 times the standard deviation of the sample mean.

Notice how we set the confidence level we wanted: 95 percent. Notice how this set the upper and lower limits for Z to be 1.96 and -1.96. In general, a 1 - α level confidence interval for μ, the population mean, is

    x̄ ± z_(α/2) · σ/√n

In "ideas", the confidence interval always looks something like

    point estimate ± (probability distribution percentile) × (SD of point estimate)

With a point estimate and a distribution for it, we can usually generate a confidence interval for the parameter we seek to estimate.

2 Margin of Error

The margin of error is the absolute deviation of the observed value of the sample mean, x̄, from its expected value, μ, the population mean.
In general,

    ME = |x̄ - μ| = z_(α/2) · σ_X̄ = z_(α/2) · σ/√n

just rewriting the inequalities from above as an equality.

If x̄ is the sample mean, then the margin of error for the sample mean is

    ME = z_(α/2) · σ/√n    (1)

If p̂ is the sample proportion, then the margin of error for the sample proportion is

    ME = |p̂ - p| = z_(α/2) · √(p̂q̂/n)    (2)

21 Sample size calculation for Margin of Error

Of course we can rewrite this in terms of n, the size of the sample:

    n ≥ (z_(α/2) · σ / ME)²    (3)

For a set 1 - α confidence level and a desired margin of error ME, we can estimate the number of samples we need. Notice the inverse relationship between margin of error and sample size: we need to increase samples to decrease margin of error; we need more samples if we desire to make a "tighter" confidence statement.

Nota Bene: Some textbooks replace ME with a quantity w. Simply, w = 2·ME; w is the width of the interval. In that case we would rewrite equations (1), (2), (3) as follows. The margin of error for the sample mean becomes

    w = 2 · z_(α/2) · σ/√n    (4)

the margin of error for the sample proportion becomes

    w = 2 · z_(α/2) · √(p̂q̂/n)    (5)

and the required sample size for a fixed interval width is

    n ≥ (2 · z_(α/2) · σ / w)²    (6)

3 Hypothesis Testing the complement to Confidence Intervals

31 Introduction

A confidence interval gives us an interval estimate for the population parameter. A hypothesis test yields a decision about the value of the population parameter at a desired confidence level.

Say I believed that the true mean number of eyes of Martians was 10 before I spoke with the Biologist. This initial belief, held before observing the sample mean and interval estimate, we call a Null Hypothesis; we often notate it H₀, pronounced "H naught" or "H zero". So H₀: μ = 10. If we observe data consistent with our null hypothesis, we state that we accept it; that is, we fail to reject it. This is a minimal statement: we never really say whether something is true or not, we only consider what the data point to.

On the other hand, an Alternative Hypothesis, notated Hₐ, is the conclusion we accept when our data are inconsistent with the null hypothesis.
Hₐ: μ ≠ 10 is a possible alternative hypothesis. Notice that saying μ ≠ 10 is equivalent to saying μ > 10 or μ < 10. This is a two-sided alternative. A one-sided alternative could be Hₐ: μ > 10 or Hₐ: μ < 10.

32 The general setup

The hypothesis testing procedure is as follows. We suppose a fixed value for the parameter of interest, say μ₀, and we set our null hypothesis H₀ to be the statement that the true value of the parameter is this fixed value:

    H₀: μ = μ₀

We set two-sided or one-sided alternatives to this hypothesis:

    Hₐ: μ ≠ μ₀   or   Hₐ: μ > μ₀   or   Hₐ: μ < μ₀

We use our test statistic, that is, the rescaled distribution of the sample estimate, to determine which hypothesis to accept. Here we are testing the population mean μ. The estimate of the population mean is the sample mean x̄. We know the distribution of the sample mean is X̄ ~ N(μ, σ²/n), so we use a Z statistic:

    Z = (x̄ - μ₀)/(σ/√n)

The Z statistic is the distance, in standard normal units, of what we observed (x̄) from what we expect to observe under the null hypothesis (μ₀). We divide the deviation x̄ - μ₀ by the standard deviation of our estimate, √(σ²/n). Look at the footnote.¹

We complete the hypothesis test by computing the probability that we would have seen such an extreme result in the data under the assumptions of the null hypothesis. This is called the p-value:

    p-value = P(observing a test statistic this extreme, given H₀ is true)

¹ Have I said this before? Test statistics have the general form (observed value - expected value)/(std. dev. of estimate).

If the p-value is bigger than the significance level for the test, α, we cannot reject the null hypothesis. (This is actually the way we say it!) If the p-value is smaller than α, we reject the null hypothesis in favor of the alternative. Wordy, but efficiently minimal, statements that rely on the distribution of the sample estimate, which, by the power of the central limit theorem, tends to be Normal. More about this in the next section.

4 Tails 1 and 2

Notice that we can see the test statistic as the magnitude of the deviation of our sample from the null hypothesis H₀.
This deviation, measured by the magnitude of the test statistic, need not be towards the alternative hypothesis in question.

41 2-tailed test

Say we have a hypothesis test

    H₀: μ = μ₀  vs.  Hₐ: μ ≠ μ₀

A sample of data is taken, x₁, ..., xₙ, of which we take the sample mean x̄. We construct the appropriate test statistic from the distribution of the estimate. Here, of course, the sample mean is distributed N(μ, σ²/n). Our test statistic z₀, then, is the rescaling of the sample mean, and thus has the distribution N(0, 1).

In this setup, where the alternative hypothesis is two-sided, any large deviation from the null hypothesis is evidence for the alternative. Thus a large-magnitude test statistic z₀, positive or negative, is evidence for rejection of the null hypothesis. In this, the two-sided hypothesis test, the p-value is

    P(|Z| ≥ |z₀|)

since we look at both sides of the distribution for evidence against the null hypothesis.

42 1-tailed test

In the 1-tailed test, the alternative hypothesis considers only departures from the null in one direction. These hypothesis tests,

    H₀: μ = μ₀ vs. Hₐ: μ > μ₀   or   H₀: μ = μ₀ vs. Hₐ: μ < μ₀

are examples of 1-tailed tests. For each of these tests, only a large deviation, i.e. a large-magnitude test statistic, in the direction of the alternative hypothesis is evidence against the null. A large test statistic not in the direction of the alternative is not evidence against the null hypothesis here. Let's look at this.

43 Example

Rep. Ren A. Ngogetim is concerned that referendum 777, on increasing congressional salaries, will not pass in his district. A referendum in his district fails if it garners less than .6 of the vote. Here is the hypothesis test of whether the referendum will fail:

    H₀: The proportion of voters who approve of referendum 777 is .6 (p = .6)
    Hₐ: The proportion of voters is less than .6 (p < .6)

Notice here I'm using a one-sided test. At a significance level α = .05, i.e. a confidence level of .95, we will reject H₀ if and only if the p-value, P(observed test statistic, given H₀ is true), is at most .05.
Here, since α = .05, we can set P(Z ≤ z_α) = .05 and look up in a book, or use a computer, to discern that we will reject H₀ if and only if z_observed ≤ z_α; and here z_.05 = -1.64, so we will reject when z_observed ≤ -1.64.

We survey n = 225 people, 147 of which state their support of 777. The test statistic is

    z = (p̂ - p₀)/√(p₀·q₀/n)

Here we calculate z₀ ≈ 1.65. Since 1.65 is not less than -1.64 (z₀ > z_α), we must accept the null hypothesis: there is no evidence to suggest that the true population proportion is less than .6.

Sen. Boxan D. Duck, on the other hand, is worried about referendum 777 and tests whether it might pass. Her hypothesis test is:

    H₀: The proportion of voters who approve of referendum 777 is .6 (p = .6)
    Hₐ: The proportion of voters is greater than .6 (p > .6)

Notice this is still a one-sided test. At a significance level α = .05, i.e. a confidence level of .95, we will reject H₀ if and only if z_observed ≥ 1.64. We already know that our observed z₀ = 1.65, so here we reject the null hypothesis in favor of the alternative. We conclude there is evidence that the referendum will pass.

Lastly, Gov. Dave Gravis, a former statistician and currently a notorious political hack, conducts a two-sided test to determine the true support for referendum 777; this will dictate which side of the issue he should publicly advocate. His hypothesis test is:

    H₀: The proportion of voters who approve of referendum 777 is .6 (p = .6)
    Hₐ: The proportion of voters is not .6 (p ≠ .6)

Here, he'll reject the null hypothesis only if |z₀| ≥ 1.96.² The observed z₀ = 1.65. He concludes that there is not enough evidence to reject the null hypothesis, and decides to keep his mouth shut.

5 Definitions Testing Errors

51 P-value

We've already defined the p-value as the probability that a test statistic would be more extreme if the null hypothesis H₀ is true. Say we conduct a 1-tailed test, using z₀ as our test statistic. Then our p-value is

    P(Z > z₀)  or  P(Z < z₀)

depending on whether our alternative hypothesis looks for deviations above or below, respectively.
Notice that in the first case z₀ is often positive: we look for large positive deviations from the null. In the second, z₀ is often negative: we look for large negative deviations from the null.

For a 2-tailed test, again using z₀ as our test statistic, the p-value is

    P(|Z| > |z₀|)

Here, any large deviation from the null is evidence for the alternative hypothesis. In general, when the p-value is small, we reject the null hypothesis.

We can also say that the p-value is the probability the test statistic is in the rejection region. For a 1-tailed test, the rejection region is far above or below the null hypothesis, determined by the significance level of the test. For a 2-tailed test, the rejection region is both above and below the null hypothesis: deviations in either direction are cause for rejection of the null hypothesis.

² Why?

52 α, β, and testing errors

We have been calling α the significance level; 1 - α is the familiar confidence level. I want to point out that

    α = P(a test statistic is in the rejection region | H₀)    (7)

That's why, when the p-value < α, we reject the null hypothesis. To restate: when the probability that any test statistic is in the rejection region is greater than the probability of our observed statistic, we reject the null hypothesis. On the other hand,

    β = P(a test statistic is NOT in the rejection region | Hₐ)    (8)

Notice that both α and β are probabilities of making a hypothesis testing error: choosing the wrong hypothesis when the other is true. α is commonly called type 1 error: a rejection of the null when it is in fact true. β is commonly called type 2 error: a selection of the null when it is in fact false.

In these hypothesis testing setups you have but two choices, and there are but four possible outcomes: you can make 2 decisions, there are 2 possible truths, and 2 × 2 = 4. Below is the same sort of table you would see in your text:

                         The Truth
    Your Decision    H₀                  Hₐ
    H₀               Confidence Level    β
    Hₐ               α                   Power

53 Power calculations
We call 1 - β the Power of a test. It is the probability that we will "pick up" a deviation from the null when in fact we should. It is the probability that we select Hₐ when it is true. We usually set β as how often we are willing to not choose the alternative when we should. We can calculate β for particular deviations from the null hypothesis; then 1 - β is the power of the test for that deviation.

Your textbook works out power calculations for several setups. You may find it useful to use these formulae. In general, though, you need only understand equations (7), (8), and the distribution of the test statistic under the "correct" hypothesis used to calculate probabilities.

In general, again, you will have either a 2- or 1-sided hypothesis test.

    H₀: μ = μ₀  vs.  Hₐ: μ ≠ μ₀

is the 2-sided setup, with rejection region, for a fixed α, all |Z| > z_(α/2), while

    H₀: μ = μ₀  vs.  Hₐ: μ > μ₀  OR  Hₐ: μ < μ₀

is the 1-sided setup, with rejection region, for a fixed α, all Z > z_α or Z < -z_α.

What you need for a power calculation is the probability of not rejecting, i.e. the probability of not being in the rejection region, under an alternative μ = μ′. Remember that the rejection region is set by α and refers to the value under the null hypothesis, μ₀. Under H₀ you would construct Z₀ and expect it to have mean zero, variance one. But it doesn't: instead,

    E(Z₀) = (μ′ - μ₀)/(σ/√n) ≠ 0   when μ = μ′

But you still need to use the table, so you still need a test statistic with mean zero and variance one. So you construct a Z* such that E(Z*) = 0 by

    Z* = Z₀ - (μ′ - μ₀)/(σ/√n)

Now E(Z*) = (μ′ - μ₀)/(σ/√n) - (μ′ - μ₀)/(σ/√n) = 0. You just set z₀ = z_(α/2) or z₀ = z_α as appropriate, translate to Z*, and use the ordinary standard normal table.

6 A few simple examples in R

Read in the lecture 11 and 12 example-data file and take a look at it. It is the cost of accidents for a certain type of car from a sample of 500 cars in two years, 1991 and 1992.

61 Part a

To investigate whether there is sufficient evidence that the accident rate is lower in cars with ABS, you are given data from two years, 1991 and 1992, on accident rates. In 1991, the year before ABS became available on a certain car (call it the Celico), 42 accidents were recorded out of 500 drivers.
In 1992 the accident rate was 38 accidents out of 500 drivers. Test whether ABS made a difference in the accident rate.

A possible hypothesis test:

    H₀: Δp = p_NoABS - p_ABS = 0   vs.   H₁: Δp = p_NoABS - p_ABS > 0

We will use the sample estimates p̂ᵢ = xᵢ/nᵢ for the population proportions, where pᵢ is the proportion of cars in group i involved in an accident, i ∈ {ABS, NoABS}, and xᵢ, nᵢ are the number of accidents and the total number of cars, respectively.

This is a test for a difference of proportions, so we use a Z test, making the assumption that the population proportions are normally distributed and independent. Recall the general format for a Z statistic:

    Z = (estimate - expected value)/SE(estimate),   where SE(anything) = √Var(anything)

Under the null hypothesis, Δp = p_NoABS - p_ABS = 0, and

    Var(Δp̂) = Var(p̂_NoABS - p̂_ABS) = p_NoABS·q_NoABS/n_NoABS + p_ABS·q_ABS/n_ABS

In this case, under H₀, p_NoABS = p_ABS, so the observed test statistic uses the pooled estimate p̄ = (x_NoABS + x_ABS)/(n_NoABS + n_ABS):

    > 42/500
    [1] 0.084       # p.hat, NoABS
    > 38/500
    [1] 0.076       # p.hat, ABS
    > (42+38)/(500+500)
    [1] 0.08        # pooled p
    > 1 - 42/500
    [1] 0.916       # q.hat, NoABS
    > 1 - 38/500
    [1] 0.924       # q.hat, ABS
    > ((42+38)/1000)*(1-(42+38)/1000)*(1/500 + 1/500)
    [1] 0.0002944   # variance of the difference estimator
    > (.084 - .076)/sqrt(.0002944)
    [1] 0.4662524   # observed z statistic
    > pnorm((.084 - .076)/sqrt(.0002944), lower.tail = FALSE)
    [1] 0.3205174   # the p-value of the observed z statistic
    > qnorm(.95)
    [1] 1.644854    # the z critical value at alpha = .05 for the one-tailed test

The p-value for the one-tailed test is well above a reasonable (say, .05) alpha level. We fail to reject the null hypothesis. In the context of the narrative: there is not enough evidence to conclude that the proportion of accidents is lower for cars equipped with ABS.

Part (b) of this exercise will be in lecture 12.

7 Exercises

- Read section 10.6 in your book, pages 341-342.
- Do exercises 9.4, 9.8, 9.11 on pages 285-286.
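The step-by-step pooled computation in part (a) above can be packaged as a small R function. This is a sketch (the function name and argument names are my own, not from the lecture), reproducing the same z statistic and one-sided p-value:

```r
# Pooled two-proportion z test, as computed step by step in part (a).
# x1, n1: accidents and drivers without ABS; x2, n2: with ABS.
# One-sided alternative: p1 > p2 (i.e. the accident rate is lower with ABS).
prop.z.test <- function(x1, n1, x2, n2) {
  p1 <- x1/n1
  p2 <- x2/n2
  p  <- (x1 + x2)/(n1 + n2)                 # pooled proportion under H0
  se <- sqrt(p * (1 - p) * (1/n1 + 1/n2))   # SE of the difference, pooled
  z  <- (p1 - p2)/se
  list(z = z, p.value = pnorm(z, lower.tail = FALSE))
}
prop.z.test(42, 500, 38, 500)   # z = 0.4662524, p-value = 0.3205174
```

For comparison, R's built-in prop.test() runs the corresponding chi-squared test on the same 2 x 2 counts; the hand-rolled function here mirrors the z-statistic form used in the lecture.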