Mathematical Statistics STAT 709
These 103 pages of class notes were uploaded by Mrs. Triston Collier on Thursday, September 17, 2015. The notes belong to STAT 709 at the University of Wisconsin - Madison, taught by staff in Fall.
Lecture 30 UMVUE: the method of conditioning

The second method of deriving a UMVUE is conditioning on a sufficient and complete statistic T(X): if U(X) is any unbiased estimator of ϑ, then E[U(X)|T] is the UMVUE of ϑ. We do not need the distribution of T, but we do need to work out the conditional expectation E[U(X)|T]. By the uniqueness of the UMVUE, it does not matter which U(X) is used; thus we should choose U(X) so as to make the calculation of E[U(X)|T] as easy as possible.

Example 3.3. Let X1, ..., Xn be i.i.d. from the exponential distribution E(0, θ), F_θ(x) = (1 − e^{−x/θ}) I_{(0,∞)}(x). Consider the estimation of ϑ = 1 − F_θ(t) = e^{−t/θ}. X̄ is sufficient and complete for θ > 0, and I_{(t,∞)}(X1) is unbiased for ϑ: E[I_{(t,∞)}(X1)] = P(X1 > t) = ϑ. Hence T(X̄) = E[I_{(t,∞)}(X1) | X̄] = P(X1 > t | X̄) is the UMVUE of ϑ. If the conditional distribution of X1 given X̄ were available, we could calculate P(X1 > t | X̄) directly. By Basu's theorem (Theorem 2.4), X1/X̄ and X̄ are independent, so by Proposition 1.10(vii),

P(X1 > t | X̄ = x̄) = P(X1/X̄ > t/x̄ | X̄ = x̄) = P(X1/X̄ > t/x̄).

To compute this unconditional probability we need the distribution of X1/(X1 + ··· + Xn). Using the transformation technique discussed in §1.3.1 and the fact that Σ_{i=2}^n Xi is independent of X1 and has a gamma distribution, we obtain that X1/Σ_{i=1}^n Xi has the Lebesgue pdf (n − 1)(1 − x)^{n−2} I_{(0,1)}(x). Hence

P(X1 > t | X̄ = x̄) = (n − 1) ∫_{t/(n x̄)}^{1} (1 − x)^{n−2} dx = (1 − t/(n x̄))^{n−1},

and the UMVUE of ϑ is T(X̄) = (1 − t/(n X̄))^{n−1} I_{(t,∞)}(n X̄).

Example 3.4. Let X1, ..., Xn be i.i.d. from N(μ, σ²) with unknown μ ∈ R and σ² > 0. From Example 2.18, T = (X̄, S²) is sufficient and complete for ϑ = (μ, σ²); X̄ and (n − 1)S²/σ² are independent; X̄ has the N(μ, σ²/n) distribution; and (n − 1)S²/σ² has the chi-square distribution χ²_{n−1}. Using the method of solving for h directly, we find that the UMVUE of μ is X̄; the UMVUE of μ² is X̄² − S²/n; the UMVUE of σ^r with r > 1 − n is k_{n−1,r} S^r, where

k_{n,r} = n^{r/2} Γ(n/2) / [2^{r/2} Γ((n + r)/2)];

and the UMVUE of μ/σ is k_{n−1,−1} X̄/S, if n > 2. Suppose that ϑ satisfies P(X1 ≤ ϑ) = p with a fixed p ∈ (0, 1). Let Φ be the cdf of the standard normal distribution. Then ϑ = μ + σ Φ^{−1}(p), and its UMVUE is X̄ + k_{n−1,1} S Φ^{−1}(p). Let c be a fixed constant and ϑ = P(X1 ≤ c) = Φ((c − μ)/σ). We can find the UMVUE of ϑ using the method of conditioning.
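The unbiasedness claim in Example 3.3 can be spot-checked numerically: with S = nX̄ ~ Gamma(n, scale θ), the estimator (1 − t/S)^{n−1} I_{(t,∞)}(S) should integrate to e^{−t/θ} against the gamma density. The following sketch (not part of the notes; the parameter values and quadrature settings are arbitrary choices) verifies this with a trapezoidal rule.

```python
import math

def gamma_pdf(s, n, theta):
    # pdf of Gamma(shape n, scale theta), the distribution of S = n * Xbar
    return s ** (n - 1) * math.exp(-s / theta) / (math.gamma(n) * theta ** n)

def expected_umvue(n, theta, t, upper=200.0, steps=100_000):
    # E[(1 - t/S)^(n-1) I(S > t)] via the trapezoidal rule on (t, upper);
    # the integrand vanishes at s = t and the gamma tail beyond `upper`
    # is negligible for these parameter values
    h = (upper - t) / steps
    total = 0.0
    for i in range(steps + 1):
        s = t + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * (1 - t / s) ** (n - 1) * gamma_pdf(s, n, theta)
    return total * h

# For n = 5, theta = 2, t = 1 the answer should be exp(-t/theta) = exp(-0.5)
print(expected_umvue(5, 2.0, 1.0), math.exp(-0.5))
```

The agreement to several decimal places reflects the exact identity E[(1 − t/S)^{n−1} I(S > t)] = e^{−t/θ}, obtained by the substitution u = s − t in the gamma integral.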
I_{(−∞,c]}(X1) is an unbiased estimator of ϑ, so the UMVUE of ϑ is E[I_{(−∞,c]}(X1) | T] = P(X1 ≤ c | T). By Basu's theorem, the ancillary statistic Z(X) = (X1 − X̄)/S is independent of T = (X̄, S²). Then, by Proposition 1.10(vii),

P(X1 ≤ c | T = (x̄, s²)) = P(Z ≤ (c − x̄)/s | T = (x̄, s²)) = P(Z ≤ (c − x̄)/s).

It can be shown that Z has the Lebesgue pdf

f(z) = [√n Γ((n−1)/2) / (√π (n−1) Γ((n−2)/2))] [1 − n z²/(n−1)²]^{(n−4)/2} I_{(0,(n−1)/√n)}(|z|).

Hence the UMVUE of ϑ is P(Z ≤ (c − X̄)/S) = ∫_{−(n−1)/√n}^{(c−X̄)/S} f(z) dz.

Suppose that we would like to estimate ϑ = σ^{−1} Φ′((c − μ)/σ), the Lebesgue pdf of X1 evaluated at a fixed c, where Φ′ is the first-order derivative of Φ. By the previous result, the conditional pdf of X1 given X̄ = x̄ and S² = s² is s^{−1} f((x − x̄)/s). Let f_T be the joint pdf of T = (X̄, S²). Then

ϑ = ∫ s^{−1} f((c − x̄)/s) f_T(t) dt = E[S^{−1} f((c − X̄)/S)].

Hence the UMVUE of ϑ is S^{−1} f((c − X̄)/S).

Example. Let X1, ..., Xn be i.i.d. with Lebesgue pdf f_θ(x) = θ x^{−2} I_{(θ,∞)}(x), where θ > 0 is unknown. Suppose that ϑ = P(X1 > t) for a constant t > 0. The smallest order statistic X_{(1)} is sufficient and complete for θ. Hence, the UMVUE of ϑ is

P(X1 > t | X_{(1)} = x_{(1)}) = P(X1/X_{(1)} > t/x_{(1)} | X_{(1)} = x_{(1)}) = P(X1/X_{(1)} > s) (Basu's theorem),

where s = t/x_{(1)}. If s ≤ 1, this probability is 1. Consider s > 1, and assume θ = 1 in the calculation (X1/X_{(1)} is ancillary):

P(X1/X_{(1)} > s) = ∫···∫_{x1 > s min(x2,...,xn), xi > 1} x1^{−2} ··· xn^{−2} dx1 ··· dxn = (n − 1)/(n s),

since, for s > 1, the event X1 > s X_{(1)} forces X_{(1)} = min(X2, ..., Xn), which has pdf (n − 1) x^{−n} I_{(1,∞)}(x), and P(X1 > s m) = 1/(s m) for m ≥ 1. This shows that the UMVUE of P(X1 > t) is

h(X_{(1)}) = (n − 1) X_{(1)} / (n t) if X_{(1)} < t, and h(X_{(1)}) = 1 if X_{(1)} ≥ t.

Another way of showing that h(X_{(1)}) is the UMVUE: note that the Lebesgue pdf of X_{(1)} is n θ^n x^{−(n+1)} I_{(θ,∞)}(x). If θ < t, then

E[h(X_{(1)})] = ∫_θ^t [(n−1)x/(nt)] n θ^n x^{−(n+1)} dx + ∫_t^∞ n θ^n x^{−(n+1)} dx = θ/t − (θ/t)^n + (θ/t)^n = θ/t = P(X1 > t).

If θ ≥ t, then P(X1 > t) = 1 and h(X_{(1)}) = 1 a.s. P_θ, since P_θ(X_{(1)} < t) = 0. Hence, for any θ > 0, E[h(X_{(1)})] = P(X1 > t).

Lecture 8 Conditional expectation

Conditional probability: P(B|A) = P(A ∩ B)/P(A) for events A and B with P(A) > 0; P(X ∈ B | Y = y) extends this idea to random variables.

Definition 1.6. Let X be an integrable random variable on (Ω, F, P). (i) Let A be a sub-σ-field of F. The conditional expectation of X given A, denoted by E(X|A), is the a.s.-unique random variable satisfying the following two conditions: (a) E(X|A) is measurable from (Ω, A) to (R, B); (b)
∫_A E(X|A) dP = ∫_A X dP for any A ∈ A. Note that the existence of E(X|A) follows from Theorem 1.4. (ii) Let B ∈ F. The conditional probability of B given A is defined to be P(B|A) = E(I_B|A). (iii) Let Y be measurable from (Ω, F, P) to (Λ, G). The conditional expectation of X given Y is defined to be E(X|Y) = E(X|σ(Y)). σ(Y) contains "the information in Y"; E(X|Y) is the expectation of X given the information provided by Y.

Lemma 1.2. Let Y be measurable from (Ω, F) to (Λ, G) and Z a function from (Ω, F) to R^k. Then Z is measurable from (Ω, σ(Y)) to (R^k, B^k) if and only if there is a measurable function h from (Λ, G) to (R^k, B^k) such that Z = h ∘ Y.

The function h in E(X|Y) = h ∘ Y is a Borel function on (Λ, G). Let y ∈ Λ. We define E(X|Y = y) = h(y) to be the conditional expectation of X given Y = y. Note that h(y) is a function on Λ, whereas h ∘ Y = E(X|Y) is a function on Ω. For a random vector X, E(X|A) is defined as the vector of conditional expectations of the components of X.

Example 1.21. Let X be an integrable random variable on (Ω, F, P), let A1, A2, ... be disjoint events on (Ω, F, P) such that ∪Ai = Ω and P(Ai) > 0 for all i, and let a1, a2, ... be distinct real numbers. Define Y = a1 I_{A1} + a2 I_{A2} + ···. We now show that

E(X|Y) = Σ_{i=1}^∞ [∫_{Ai} X dP / P(Ai)] I_{Ai}.

We need to verify (a) and (b) in Definition 1.6 with A = σ(Y). Since σ(Y) = σ({A1, A2, ...}), it is clear that the function on the right-hand side is measurable on (Ω, σ(Y)). For any B ∈ B, Y^{−1}(B) = ∪_{i: ai ∈ B} Ai. Using properties of integrals, we obtain

∫_{Y^{−1}(B)} X dP = Σ_{i: ai ∈ B} ∫_{Ai} X dP = Σ_{i: ai ∈ B} [∫_{Ai} X dP / P(Ai)] P(Ai) = ∫_{Y^{−1}(B)} Σ_{i=1}^∞ [∫_{Ai} X dP / P(Ai)] I_{Ai} dP.

This verifies (b) and thus the result. Let h be a Borel function on R satisfying h(ai) = ∫_{Ai} X dP / P(Ai). Then E(X|Y) = h ∘ Y and E(X|Y = y) = h(y).

Proposition 1.9. Let X be a random n-vector and Y a random m-vector. Suppose that (X, Y) has a joint pdf f(x, y) w.r.t. ν × λ, where ν and λ are σ-finite measures on (R^n, B^n) and (R^m, B^m), respectively. Let g(x, y) be a Borel function on R^{n+m} for which E|g(X, Y)| < ∞. Then

E[g(X, Y) | Y] = ∫ g(x, Y) f(x, Y) dν(x) / ∫ f(x, Y) dν(x) a.s.

Proof. Denote the right-hand side by h(Y). By Fubini's theorem, h is Borel. Then, by Lemma 1.2, h(Y) is Borel on (Ω, σ(Y)). Also, by Fubini's theorem, f_Y(y) = ∫ f(x, y) dν(x) is the pdf of Y w.r.t. λ. For B
∈ B^m,

∫_{Y^{−1}(B)} h(Y) dP = ∫_B h(y) f_Y(y) dλ(y) = ∫_B [∫ g(x, y) f(x, y) dν(x)] dλ(y) = ∫∫_{R^n × B} g(x, y) f(x, y) d(ν × λ)(x, y) = ∫_{Y^{−1}(B)} g(X, Y) dP,

where the first and the last equalities follow from Theorem 1.2, the second equality follows from the definition of h and the pdf's, and the third equality follows from Theorem 1.3 (Fubini's theorem).

Let (X, Y) be a random vector with a joint pdf f(x, y) w.r.t. ν × λ. The conditional pdf of X given Y = y is f_{X|Y}(x|y) = f(x, y)/f_Y(y), where f_Y(y) = ∫ f(x, y) dν(x) is the marginal pdf of Y w.r.t. λ. For each fixed y with f_Y(y) > 0, f_{X|Y}(x|y) is a pdf w.r.t. ν. Proposition 1.9 then states that

E[g(X, Y) | Y] = ∫ g(x, Y) f_{X|Y}(x|Y) dν(x) a.s.,

i.e., the conditional expectation of g(X, Y) given Y is equal to the expectation of g(X, Y) w.r.t. the conditional pdf of X given Y.

Properties. Proposition 1.10. Let X, Y, X1, X2, ... be integrable random variables on (Ω, F, P) and A a sub-σ-field of F.
(i) If X = c a.s., c ∈ R, then E(X|A) = c a.s.
(ii) If X ≤ Y a.s., then E(X|A) ≤ E(Y|A) a.s.
(iii) If a ∈ R and b ∈ R, then E(aX + bY | A) = aE(X|A) + bE(Y|A) a.s.
(iv) E[E(X|A)] = E(X).
(v) E[E(X|A) | A0] = E(X|A0) = E[E(X|A0) | A] a.s., where A0 is a sub-σ-field of A.
(vi) If σ(Y) ⊂ A and E|XY| < ∞, then E(XY|A) = Y E(X|A) a.s.
(vii) If X and Y are independent and E|g(X, Y)| < ∞ for a Borel function g, then E[g(X, Y) | Y = y] = E[g(X, y)] a.s. P_Y.
(viii) If E(X²) < ∞, then [E(X|A)]² ≤ E(X²|A) a.s.
(ix) (Fatou's lemma) If Xn ≥ 0 for any n, then E(liminf_n Xn | A) ≤ liminf_n E(Xn|A) a.s.
(x) (Dominated convergence theorem) Suppose that |Xn| ≤ Y for any n and Xn →a.s. X. Then E(Xn|A) →a.s. E(X|A).

Example 1.22. Let X be a random variable on (Ω, F, P) with E(X²) < ∞ and let Y be a measurable function from (Ω, F, P) to (Λ, G). One may wish to predict the value of X based on an observed value of Y. Let g(Y) be a predictor, i.e., g ∈ N = {all Borel functions g with E[g(Y)²] < ∞}. Each predictor is assessed by the "mean squared prediction error" E[X − g(Y)]². We now show that E(X|Y) is the best predictor of X in the sense that

E[X − E(X|Y)]² = min_{g ∈ N} E[X − g(Y)]².

First, Proposition 1.10(viii) implies E(X|Y) ∈ N. Next, for any g ∈ N,

E[X − g(Y)]² = E[X − E(X|Y) + E(X|Y) − g(Y)]²
= E[X − E(X|Y)]² + E[E(X|Y) − g(Y)]² + 2E{[X − E(X|Y)][E(X|Y) − g(Y)]}
= E[X − E(X|Y)]² + E[E(X|Y) − g(Y)]² + 2E{[E(X|Y) − g(Y)] E[X − E(X|Y) | Y]}
= E[X − E(X|Y)]² + E[E(X|Y) − g(Y)]² ≥ E[X − E(X|Y)]²,

where we used Proposition 1.10(iv) to insert the conditional expectation, Proposition 1.10(vi) to pull out E(X|Y) − g(Y), and the fact that E[X − E(X|Y) | Y] = 0 a.s.
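The best-predictor property in Example 1.22 can be illustrated by a seeded simulation (a sketch, not from the notes; the model and sample size are arbitrary choices). Take X = Y² + e with E(e) = 0, so that E(X|Y) = Y² exactly, and compare its empirical mean squared prediction error with that of the competitor g(Y) = Y.

```python
import random

random.seed(1)

# X = Y^2 + e with E(e) = 0, so E(X|Y) = Y^2 exactly; compare the mean
# squared prediction errors of g1(Y) = Y^2 (the conditional expectation)
# and of an arbitrary competitor g2(Y) = Y.
N = 100_000
mse_cond = 0.0
mse_other = 0.0
for _ in range(N):
    y = random.choice([0, 1, 2])
    x = y * y + random.gauss(0.0, 1.0)
    mse_cond += (x - y * y) ** 2
    mse_other += (x - y) ** 2
mse_cond /= N
mse_other /= N
print(mse_cond, mse_other)  # mse_cond is near Var(e) = 1; mse_other is larger
```

The decomposition in the text predicts the gap: E[X − g(Y)]² − E[X − E(X|Y)]² = E[E(X|Y) − g(Y)]² = E(Y² − Y)², which is 4/3 for this toy model.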
This completes Example 1.22: E(X|Y) attains the minimum mean squared prediction error, with equality if and only if g(Y) = E(X|Y) a.s.

Lecture 2 Product measure, measurable function, and distribution

Product space. Let I = {1, ..., k}, where k is finite or ∞, and let Γi, i ∈ I, be sets. The product set is Π_{i∈I} Γi = {(a1, ..., ak) : ai ∈ Γi}. Examples: R × R = R², R × R × R = R³. Let (Ωi, Fi), i ∈ I, be measurable spaces. Π_{i∈I} Fi is not necessarily a σ-field; σ(Π_{i∈I} Fi) is called the product σ-field on the product space Π_{i∈I} Ωi, and (Π_{i∈I} Ωi, σ(Π_{i∈I} Fi)) is denoted by Π_{i∈I} (Ωi, Fi). Example: Π_{i=1,...,k} (R, B) = (R^k, B^k).

Product measure. Consider a rectangle (a1, b1] × (a2, b2] ⊂ R². The usual area of (a1, b1] × (a2, b2] is (b1 − a1)(b2 − a2) = m((a1, b1]) m((a2, b2]). Is m((a1, b1]) m((a2, b2]) the same as the value of a measure defined on the product σ-field? A measure ν on (Ω, F) is said to be σ-finite if and only if there exists a sequence {A1, A2, ...} such that ∪Ai = Ω and ν(Ai) < ∞ for all i. Any finite measure (such as a probability measure) is clearly σ-finite. The Lebesgue measure on R is σ-finite, since R = ∪An with An = (−n, n), n = 1, 2, .... The counting measure is σ-finite if and only if Ω is countable.

Proposition 1.3 (Product measure theorem). Let (Ωi, Fi, νi), i = 1, ..., k, be measure spaces with σ-finite measures, where k ≥ 2 is an integer. Then there exists a unique σ-finite measure on the product σ-field σ(F1 × ··· × Fk), called the product measure and denoted by ν1 × ··· × νk, such that

ν1 × ··· × νk(A1 × ··· × Ak) = ν1(A1) ··· νk(Ak) for all Ai ∈ Fi, i = 1, ..., k.

Let P be a probability measure on (R^k, B^k). The cdf (or joint cdf) of P is defined by F(x1, ..., xk) = P((−∞, x1] × ··· × (−∞, xk]), xi ∈ R. There is a one-to-one correspondence between probability measures and joint cdf's on R^k. If F(x1, ..., xk) is a joint cdf, then

Fi(x) = lim_{xj → ∞, j = 1, ..., k, j ≠ i} F(x1, ..., xi−1, x, xi+1, ..., xk)

is a cdf and is called the i-th marginal cdf. Marginal cdf's are determined by their joint cdf, but a joint cdf cannot be determined by its k marginal cdf's. If F(x1, ..., xk) = F1(x1) ··· Fk(xk) for all (x1, ..., xk) ∈ R^k, then the probability measure corresponding to F is
the product measure P1 × ··· × Pk, with Pi being the probability measure corresponding to Fi.

Measurable function. Let f be a function from Ω to Λ (often Λ = R^k). The inverse image of B ⊂ Λ under f is f^{−1}(B) = {ω ∈ Ω : f(ω) ∈ B}. The inverse function f^{−1} need not exist for f^{−1}(B) to be defined. Useful properties: f^{−1}(B^c) = (f^{−1}(B))^c for any B ⊂ Λ, and f^{−1}(∪Bi) = ∪f^{−1}(Bi) for any Bi ⊂ Λ, i = 1, 2, .... Let C be a collection of subsets of Λ; define f^{−1}(C) = {f^{−1}(C) : C ∈ C}.

Definition 1.3. Let (Ω, F) and (Λ, G) be measurable spaces and f a function from Ω to Λ. The function f is called a measurable function from (Ω, F) to (Λ, G) if and only if f^{−1}(G) ⊂ F.

If f is measurable from (Ω, F) to (Λ, G), then f^{−1}(G) is a sub-σ-field of F (verify). It is called the σ-field generated by f and is denoted by σ(f). If f is measurable from (Ω, F) to (R, B), it is called a Borel function or a random variable. A random vector (X1, ..., Xn) is measurable from (Ω, F) to (R^n, B^n) iff each Xi is a random variable.

Examples. If F is the collection of all subsets of Ω, then any function f is measurable. Indicator function of A ⊂ Ω: I_A(ω) = 1 if ω ∈ A, and 0 if ω ∉ A. For any B ⊂ R,

I_A^{−1}(B) = ∅ if 0 ∉ B and 1 ∉ B; A if 1 ∈ B and 0 ∉ B; A^c if 0 ∈ B and 1 ∉ B; Ω if 0 ∈ B and 1 ∈ B.

Then σ(I_A) = {∅, A, A^c, Ω}, and I_A is Borel iff A ∈ F. σ(f) is much simpler than F.

Simple function: φ(ω) = Σ_{i=1}^k ai I_{Ai}(ω), where A1, ..., Ak are measurable sets on Ω and a1, ..., ak are real numbers. Let A1, ..., Ak be a partition of Ω, i.e., the Ai's are disjoint and A1 ∪ ··· ∪ Ak = Ω. Then the simple function φ with distinct ai's exactly characterizes this partition, and σ(φ) = σ({A1, ..., Ak}).

Proposition 1.4. Let (Ω, F) be a measurable space.
(i) f is Borel if and only if f^{−1}((a, ∞)) ∈ F for all a ∈ R.
(ii) If f and g are Borel, then so are fg and af + bg, where a and b are real numbers; also, f/g is Borel provided g(ω) ≠ 0 for any ω ∈ Ω.
(iii) If f1, f2, ... are Borel, then so are sup_n fn, inf_n fn, limsup_n fn, and liminf_n fn. Furthermore, the set A = {ω ∈ Ω : lim_n fn(ω) exists} is an event, and the function h(ω) = lim_{n→∞} fn(ω) for ω ∈ A, h(ω) = f1(ω) for ω ∉ A, is Borel.
(iv) Suppose that f is measurable from (Ω, F) to (Λ, G) and g is measurable from (Λ, G) to (Δ, H). Then the composite function g ∘ f is measurable from (Ω, F) to (Δ, H).
(v) Let Ω be a Borel set in R^p. If f is a continuous function from Ω to R^q, then f is measurable.
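The inverse-image identities f^{−1}(B^c) = (f^{−1}(B))^c and f^{−1}(B1 ∪ B2) = f^{−1}(B1) ∪ f^{−1}(B2), which make Definition 1.3 workable, can be checked mechanically on finite sets. A toy sketch (the particular spaces and map are arbitrary choices, not from the notes):

```python
def inv(f, omega, B):
    # inverse image f^{-1}(B) = {w in omega : f(w) in B}
    return {w for w in omega if f[w] in B}

omega = {1, 2, 3, 4, 5}
lam = {"a", "b", "c"}
f = {1: "a", 2: "a", 3: "b", 4: "c", 5: "b"}  # a map omega -> lam

B1, B2 = {"a"}, {"b"}
# f^{-1}(B^c) = (f^{-1}(B))^c
assert inv(f, omega, lam - B1) == omega - inv(f, omega, B1)
# f^{-1}(B1 union B2) = f^{-1}(B1) union f^{-1}(B2)
assert inv(f, omega, B1 | B2) == inv(f, omega, B1) | inv(f, omega, B2)
print("inverse images respect complements and unions")
```

These two facts are exactly what makes f^{−1}(G) a σ-field, i.e., makes σ(f) well defined.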
Distribution (law). Let (Ω, F, ν) be a measure space and f a measurable function from (Ω, F) to (Λ, G). The induced measure by f, denoted by ν ∘ f^{−1}, is a measure on G defined as

ν ∘ f^{−1}(B) = ν({ω : f(ω) ∈ B}) = ν(f^{−1}(B)), B ∈ G.

If ν = P is a probability measure and X is a random variable or a random vector, then P ∘ X^{−1} is called the law or the distribution of X and is denoted by P_X. The cdf of P_X is also called the cdf (or joint cdf) of X and is denoted by F_X. (Examples 1.3 and 1.4.)

Lecture 23 Sufficiency and Rao-Blackwell theorem; unbiasedness and invariance

Suppose that we have a sufficient statistic T(X) for P ∈ P. Intuitively, our decision rule should be a function of T. This is not true in general, but the following result indicates that it is true if randomized decision rules are allowed.

Proposition 2.2. Suppose that the action space A is a subset of R^k. Let T(X) be a sufficient statistic for P ∈ P and let δ0 be a decision rule. Then δ1(t, A) = E[δ0(X, A) | T = t], which is a randomized decision rule depending only on T, is equivalent to δ0, if R_{δ0}(P) < ∞ for any P ∈ P.

Proof. Note that δ1 is a decision rule, since δ1 does not depend on the unknown P by the sufficiency of T. Then

R_{δ1}(P) = E ∫_A L(P, a) dδ1(X, a) = E{E[∫_A L(P, a) dδ0(X, a) | T]} = E ∫_A L(P, a) dδ0(X, a) = R_{δ0}(P),

where the proof of the second equality is left to the reader.

Note that Proposition 2.2 does not imply that δ0 is inadmissible. If δ0 is a nonrandomized rule,

δ1(t, A) = E[I_A(δ0(X)) | T = t] = P(δ0(X) ∈ A | T = t)

is still a randomized rule, unless δ0(X) = h(T(X)) a.s. P for some Borel function h (Exercise 75). Hence, Proposition 2.2 does not apply to situations where randomized rules are not allowed. The following result tells us when nonrandomized rules are all we need and when decision rules that are not functions of sufficient statistics are inadmissible.

Theorem 2.5. Suppose that A is a convex subset of R^k and that, for any P ∈ P, L(P, a) is a convex function of a. (i) Let δ be a randomized rule satisfying ∫_A ‖a‖ dδ(x, a) < ∞ for any x ∈ X, and let T1(x) = ∫_A a dδ(x, a). Then L(P, T1(x)) ≤ L(P, δ, x) := ∫_A L(P, a) dδ(x, a), with strict inequality if L is
strictly convex in a, for any x ∈ X and P ∈ P. (ii) (Rao-Blackwell theorem) Let T be a sufficient statistic for P ∈ P, let T0 ∈ R^k be a nonrandomized rule satisfying E‖T0‖ < ∞, and let T1 = E[T0(X) | T]. Then R_{T1}(P) ≤ R_{T0}(P) for any P ∈ P. If L is strictly convex in a and T0 is not a function of T, then T0 is inadmissible.

The proof of Theorem 2.5 is an application of Jensen's inequality and is left to the reader.

The concept of admissibility helps us to eliminate some decision rules. However, usually there are still too many rules left after the elimination of some rules according to admissibility and sufficiency. Although one is typically interested in an S-optimal rule, frequently it does not exist if S is either too large or too small.

Example 2.22. Let X1, ..., Xn be i.i.d. random variables from a population P ∈ P, the family of populations having finite mean μ and variance σ². Consider the estimation of μ ∈ A = R under the squared error loss. It can be shown that if we let S be the class of all possible estimators, then there is no S-optimal rule (exercise). Next, let S1 be the class of all linear functions in X = (X1, ..., Xn), i.e., T(X) = Σ_{i=1}^n ci Xi with known ci ∈ R, i = 1, ..., n. Then

R_T(P) = σ² Σ_{i=1}^n ci² + μ² (Σ_{i=1}^n ci − 1)².   (1)

We now show that there does not exist T* = Σ_{i=1}^n ci* Xi such that R_{T*}(P) ≤ R_T(P) for any P ∈ P and T ∈ S1. If there were such a T*, then (c1*, ..., cn*) would be a minimum of the function of (c1, ..., cn) on the right-hand side of (1); the ci* would then all have to equal μ²/(σ² + nμ²), which depends on P. Hence T* is not a statistic. This shows that there is no S1-optimal rule. Consider now a subclass S2 ⊂ S1 with ci's satisfying Σ_{i=1}^n ci = 1. From (1), R_T(P) = σ² Σ_{i=1}^n ci² if T ∈ S2. Minimizing σ² Σ ci² subject to Σ ci = 1 leads to the optimal solution ci = n^{−1}. Thus, the sample mean X̄ is S2-optimal. There may not be any optimal rule if we consider a small class of decision rules: for example, if S3 contains all the rules in S2 except X̄, then one can show that there is no S3-optimal rule.

Example 2.23. Assume that the sample X has the binomial distribution Bi(θ, n) with an unknown θ ∈ (0, 1) and a fixed integer n > 1.
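The Rao-Blackwell theorem (Theorem 2.5(ii)) can be seen at work in a small seeded simulation (a sketch, not from the notes; the Poisson model and parameter values are arbitrary choices). For X1, ..., Xn i.i.d. Poisson(θ), T = ΣXi is sufficient, the crude unbiased estimator of e^{−θ} = P(X1 = 0) is T0 = I(X1 = 0), and conditioning gives T1 = E(T0|T) = (1 − 1/n)^T, which is still unbiased but has much smaller variance.

```python
import random

random.seed(7)

def poisson(mu):
    # Knuth's method; adequate for small mu (an illustrative choice)
    L, k, p = pow(2.718281828459045, -mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n, theta, reps = 10, 1.0, 20_000
t0_vals, t1_vals = [], []
for _ in range(reps):
    xs = [poisson(theta) for _ in range(n)]
    t = sum(xs)
    t0_vals.append(1.0 if xs[0] == 0 else 0.0)  # crude unbiased estimator
    t1_vals.append((1.0 - 1.0 / n) ** t)        # Rao-Blackwellized version

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

print(mean(t0_vals), mean(t1_vals))  # both near exp(-1)
print(var(t0_vals), var(t1_vals))    # variance of t1 is much smaller
```

Here E(T0 | T = t) = P(X1 = 0 | ΣXi = t) = (1 − 1/n)^t follows from the multinomial conditional distribution of (X1, ..., Xn) given T.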
Consider the hypothesis testing problem described in Example 2.20: H0: θ ∈ (0, θ0] versus H1: θ ∈ (θ0, 1), where θ0 ∈ (0, 1) is a fixed value. Suppose that we are only interested in the following class of nonrandomized decision rules: S = {Tj : j = 0, 1, ..., n − 1}, where Tj = I_{(j,∞)}(X). From Example 2.20, the risk function for Tj under the 0-1 loss is

R_{Tj}(θ) = P(X > j) I_{(0,θ0]}(θ) + P(X ≤ j) I_{(θ0,1)}(θ).

For any integers k and j, 0 ≤ k < j ≤ n − 1,

R_{Tj}(θ) − R_{Tk}(θ) = −P(k < X ≤ j) < 0 for 0 < θ ≤ θ0, and
R_{Tj}(θ) − R_{Tk}(θ) = P(k < X ≤ j) > 0 for θ0 < θ < 1.

Hence, neither Tj nor Tk is better than the other. This shows that every Tj is S-admissible, and thus there is no S-optimal rule.

In view of the fact that an optimal rule often does not exist, statisticians adopt the following two approaches to choose a decision rule. The first approach is to define a class S of decision rules that have some desirable properties (statistical and/or nonstatistical) and then try to find the best rule in S. In Example 2.22, for instance, any estimator T in S2 has the property that T is linear in X and E[T(X)] = μ. In a general estimation problem, we can use the following concept.

Definition 2.8 (Unbiasedness). In an estimation problem, the bias of an estimator T(X) of a real-valued parameter ϑ of the unknown population is defined to be b_T(P) = E[T(X)] − ϑ (denoted by b_T(θ) when P is in a parametric family indexed by θ). An estimator T(X) is said to be unbiased for ϑ if and only if b_T(P) = 0 for any P ∈ P.

Thus, S2 in Example 2.22 is the class of unbiased estimators linear in X. In Chapter 3 we discuss how to find an S-optimal estimator when S is the class of unbiased estimators or unbiased estimators linear in X. Another class of decision rules can be defined after we introduce the concept of invariance.
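The risk comparison in Example 2.23 is easy to tabulate exactly (a sketch; the values n = 5, θ0 = 0.5, and the θ points are arbitrary choices, not from the notes). Under 0-1 loss, R_{Tj}(θ) = P(X > j) for θ ≤ θ0 and P(X ≤ j) for θ > θ0, so the risk functions of different Tj cross and none dominates.

```python
from math import comb

def binom_cdf(j, n, theta):
    # P(X <= j) for X ~ Binomial(n, theta)
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(j + 1))

def risk(j, n, theta, theta0):
    # 0-1 loss risk of the test T_j = I(X > j) for H0: theta <= theta0
    return 1 - binom_cdf(j, n, theta) if theta <= theta0 else binom_cdf(j, n, theta)

n, theta0 = 5, 0.5
r2_null = risk(2, n, 0.3, theta0)  # T_2 under H0
r4_null = risk(4, n, 0.3, theta0)  # T_4 rejects less often: smaller H0 risk
r2_alt = risk(2, n, 0.8, theta0)
r4_alt = risk(4, n, 0.8, theta0)   # ... but larger H1 risk
print(r2_null, r4_null, r2_alt, r4_alt)
```

T_4 beats T_2 under H0 and loses under H1, exactly the crossing pattern that makes every Tj S-admissible.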
Definition 2.9. Let X be a sample from P ∈ P.
(i) A class G of one-to-one transformations of X is called a group if and only if gi ∈ G, i = 1, 2, implies g1 ∘ g2 ∈ G and g^{−1} ∈ G.
(ii) We say that P is invariant under G if and only if P_X ↦ P_{g(X)} is a one-to-one transformation from P onto P for each g ∈ G.
(iii) A decision problem is said to be invariant if and only if P is invariant under G and the loss L(P, a) is invariant in the sense that, for every g ∈ G and every a ∈ A, there exists a unique g(a) ∈ A such that L(P_X, a) = L(P_{g(X)}, g(a)). (Note that g(X) and g(a) are different functions in general.)
(iv) A decision rule T is said to be invariant if and only if, for every g ∈ G and every x ∈ X, T(g(x)) = g(T(x)).

Invariance means that our decision is not affected by one-to-one transformations of data. In a problem where the distribution of X is in a location-scale family P on R^k, we often consider location-scale transformations of data X of the form g(X) = AX + c, where c ∈ C ⊂ R^k and A ∈ T, a class of invertible k × k matrices. In §4.2 and §6.3 we discuss the problem of finding an S-optimal rule when S is a class of invariant decision rules.

Lecture 26 Asymptotic approach and consistency

Asymptotic approach. In decision theory and inference, a key to the success of finding a good decision rule or inference procedure is being able to find some moments and/or distributions of various statistics. There are many cases in which we are not able to find exactly the moments or distributions of given statistics, especially when the problem is complex. When the sample size n is large, we may approximate the moments and distributions of statistics that are impossible to derive, using the asymptotic tools discussed in §1.5. In an asymptotic analysis, we consider a sample X = (X1, ..., Xn) not for fixed n, but as a member of a sequence corresponding to n = n0, n0 + 1, ..., and obtain the limit of the distribution of an appropriately normalized statistic or variable T(X) as n → ∞. The limiting distribution and its moments are used as approximations to the distribution and moments of T(X) in the situation with a large but actually finite n. This leads to some asymptotic statistical procedures and asymptotic criteria for assessing their performance. The asymptotic approach is not only applied to the situation where no exact method is available, but is also used to provide an inference procedure simpler (e.g., in terms of computation) than that produced by the exact approach (the approach
considering a fixed n). In addition to providing more theoretical results and/or simpler inference procedures, the asymptotic approach requires less stringent mathematical assumptions than does the exact approach. The mathematical precision of the optimality results obtained in statistical decision theory tends to obscure the fact that these results are approximations, in view of the approximate nature of the assumed models and loss functions. As the sample size increases, the statistical properties become less dependent on the loss functions and models. A major weakness of the asymptotic approach is that typically no good estimates for the precision of the approximations are available, and therefore we cannot determine whether a particular n in a problem is large enough to safely apply the asymptotic results. To overcome this difficulty, asymptotic results are frequently used in combination with some numerical/empirical studies for selected values of n to examine the finite-sample performance of asymptotic procedures.

Consistency. A reasonable point estimator is expected to perform better, at least on the average, if more information about the unknown population is available. With a fixed model assumption and sampling plan, more data (larger sample size n) provide more information about the unknown population. Thus, it is distasteful to use a point estimator Tn which, if sampling were to continue indefinitely, could possibly have a nonzero estimation error, although the estimation error of Tn for a fixed n may never equal 0.

Definition 2.10 (Consistency of point estimators). Let X = (X1, ..., Xn) be a sample from P ∈ P and Tn(X) be a point estimator of ϑ for every n.
(i) Tn(X) is called consistent for ϑ if and only if Tn(X) →p ϑ w.r.t. any P ∈ P.
(ii) Let {an} be a sequence of positive constants diverging to ∞. Tn(X) is called an-consistent for ϑ if and only if an[Tn(X) − ϑ] = Op(1) w.r.t. any P ∈ P.
(iii) Tn(X) is called strongly consistent for ϑ if and only if Tn(X) →a.s. ϑ w.r.t. any P ∈ P.
(iv) Tn(X) is called L_r-consistent for ϑ if and only if Tn(X) →L_r ϑ w.r.t.
any P ∈ P, for some fixed r > 0.

Consistency is actually a concept relating to a sequence of estimators {Tn, n = n0, n0 + 1, ...}, but we usually just say "consistency of Tn" for simplicity. Each of the four types of consistency in Definition 2.10 describes the convergence of Tn(X) to ϑ in some sense as n → ∞. In statistics, consistency according to Definition 2.10(i), which is sometimes called weak consistency (since it is implied by any of the other three types), is the most useful concept of convergence of Tn to ϑ. L2-consistency is also called consistency in mse, which is the most useful type of L_r-consistency.

Example 2.33. Let X1, ..., Xn be i.i.d. from P ∈ P. If ϑ = μ, the mean of P, assumed to be finite, then by the SLLN (Theorem 1.13) the sample mean X̄ is strongly consistent for μ and therefore is also consistent for μ. If we further assume that the variance of P is finite, then X̄ is consistent in mse and is √n-consistent. With the finite variance assumption, the sample variance S² is strongly consistent for the variance of P, according to the SLLN. Consider estimators of the form Tn = Σ_{i=1}^n c_{ni} Xi, where {c_{ni}} is a double array of constants. If P has a finite variance, then Tn is consistent in mse if and only if Σ_{i=1}^n c_{ni} → 1 and Σ_{i=1}^n c_{ni}² → 0. If we only assume the existence of the mean of P, then Tn with c_{ni} = ci/n satisfying n^{−1} Σ_{i=1}^n ci → 1 and sup_i |ci| < ∞ is strongly consistent (Theorem 1.13(ii)).

One or a combination of the law of large numbers, the CLT, Slutsky's theorem (Theorem 1.11), and the continuous mapping theorem (Theorems 1.10 and 1.12) are typically applied to establish consistency of point estimators. In particular, Theorem 1.10 implies that if Tn is strongly consistent for ϑ and g is a continuous function, then g(Tn) is strongly consistent for g(ϑ). For example, in Example 2.33 the point estimator X̄² is strongly consistent for μ². To show that X̄² is √n-consistent under the assumption that P has a finite variance σ², we can use the identity √n(X̄² − μ²) = √n(X̄ − μ)(X̄ + μ) and the facts that X̄ is consistent for μ and X̄ + μ = Op(1); hence √n(X̄² − μ²) = Op(1).
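The consistency in mse of X̄ in Example 2.33 shows up clearly in a small seeded simulation (a sketch, not from the notes; mu, sigma, and the sample sizes are arbitrary choices): the empirical mse tracks the theoretical value σ²/n and shrinks as n grows.

```python
import random

random.seed(3)

def empirical_mse(n, reps=4000, mu=2.0, sigma=1.5):
    # empirical mean squared error of the sample mean for N(mu, sigma^2) data
    total = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        total += (xbar - mu) ** 2
    return total / reps

results = {n: empirical_mse(n) for n in (10, 40, 160)}
for n in (10, 40, 160):
    print(n, results[n], 1.5**2 / n)  # empirical vs theoretical sigma^2/n
```

Quadrupling n cuts the empirical mse by roughly a factor of 4, the √n-consistency rate of Definition 2.10(ii).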
X̄² may not be consistent in mse, since we do not assume that P has a finite fourth moment. Alternatively, we can use the fact that √n(X̄² − μ²) →d N(0, 4μ²σ²), by the CLT and Theorem 1.12, to show the √n-consistency of X̄².

The following example shows another way to establish consistency of some point estimators.

Example 2.34. Let X1, ..., Xn be i.i.d. from an unknown P with a continuous cdf F satisfying F(θ) = 1 for some θ ∈ R and F(x) < 1 for any x < θ. Consider the largest order statistic X_{(n)}. For any ε > 0, F(θ − ε) < 1, and

P(|X_{(n)} − θ| ≥ ε) = P(X_{(n)} ≤ θ − ε) = [F(θ − ε)]^n,

which implies, by Theorem 1.8(v), that X_{(n)} →a.s. θ, i.e., X_{(n)} is strongly consistent for θ. If we assume that F^{(i)}(θ−), the i-th order left-hand derivative of F at θ, exists and vanishes for any i ≤ m, and that F^{(m+1)}(θ−) exists and is nonzero, where m is a nonnegative integer, then

1 − F(X_{(n)}) = [(−1)^m F^{(m+1)}(θ−)/(m+1)!] (θ − X_{(n)})^{m+1} + o(|θ − X_{(n)}|^{m+1}) a.s.

This result and the fact that P(n[1 − F(X_{(n)})] ≥ s) = (1 − s/n)^n imply that (θ − X_{(n)})^{m+1} = Op(n^{−1}), i.e., X_{(n)} is n^{1/(m+1)}-consistent. If m = 0, then X_{(n)} is n-consistent, which is the most common situation. If m = 1, then X_{(n)} is √n-consistent. The limiting distribution of n^{1/(m+1)}(X_{(n)} − θ) can be derived as follows. Let

h_n(θ) = [(−1)^m F^{(m+1)}(θ−) n/(m+1)!]^{−1/(m+1)}.

For t ≤ 0, by Slutsky's theorem,

lim_n P((X_{(n)} − θ)/h_n(θ) ≤ t) = lim_n P(n[1 − F(X_{(n)})] ≥ (−t)^{m+1}) = lim_n (1 − (−t)^{m+1}/n)^n = e^{−(−t)^{m+1}}.

It can be seen from the previous examples that there are many consistent estimators. Like admissibility in statistical decision theory, consistency is a very essential requirement, in the sense that inconsistent estimators should not be used; but a consistent estimator is not necessarily good. Thus, consistency should be used together with one or a few more criteria.

We discuss a situation in which finding a consistent estimator is crucial. Suppose that an estimator Tn of ϑ satisfies

cn[Tn(X) − ϑ] →d σY,   (1)

where Y is a random variable with a known distribution, σ > 0 is an unknown parameter, and {cn} is a sequence of constants. For example, in Example 2.33, √n(X̄ − μ) →d N(0, σ²); in Example 2.34, (1) holds with cn = n^{1/(m+1)} and σ = [(−1)^m (m+1)!/F^{(m+1)}(θ−)]^{1/(m+1)}.
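Example 2.34 with F the Uniform(0, θ) cdf gives the common case m = 0: n(θ − X_{(n)})/θ converges in distribution to a standard exponential limit (here h_n(θ) = θ/n). A quick seeded simulation (a sketch, not from the notes; n, θ, and the replication count are arbitrary choices):

```python
import random

random.seed(11)

n, theta, reps = 200, 3.0, 5000
scaled = []
for _ in range(reps):
    xmax = max(random.uniform(0, theta) for _ in range(n))
    scaled.append(n * (theta - xmax) / theta)  # approx Exp(1) for large n

mean_scaled = sum(scaled) / reps
frac_le_1 = sum(1 for s in scaled if s <= 1.0) / reps
print(mean_scaled, frac_le_1)  # near 1 and near 1 - exp(-1)
```

In fact P(n(θ − X_{(n)})/θ > s) = (1 − s/n)^n exactly here, which is the displayed limit argument specialized to the uniform case.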
If a consistent estimator σ̂n of σ can be found, then, by Slutsky's theorem, cn[Tn(X) − ϑ]/σ̂n →d Y, and thus we may approximate the distribution of cn[Tn(X) − ϑ]/σ̂n by the known distribution of Y.

TA: Yuan Jiang. Email: jiangy@stat.wisc.edu. STAT 709 Discussion 4, September 25, 2007.

1. Conditional Expectation

Example 1. Let X be an integrable random variable defined on the probability space (Ω, F, P). Let A1 and A2 be two sub-σ-fields of F with A1 ⊂ A2. Show that E[E(X|A1)|A2] = E(X|A1) = E[E(X|A2)|A1] a.s.

Example 2. Let (X, Y) be a random vector having a Lebesgue pdf f. Suppose that E|X| < ∞ and Z = X + Y. Show that

E(X|Z) = ∫ x f(x, Z − x) dx / ∫ f(x, Z − x) dx a.s.

Example 3. Let X, Y, and Z be random variables having a positive joint Lebesgue pdf. Let f_{X|Y}(x|y) and f_{X|Y,Z}(x|y, z) be the conditional pdf's of X given Y and of X given (Y, Z), respectively.
(a) Show that, for almost all x, E[1/f_{X|Y}(x|Y)] = 1/f_X(x), where f_X is the marginal pdf of X.
(b) Show that, for almost all x and y, E[1/f_{X|Y,Z}(x|y, Z) | Y = y] ≤ 1/f_{X|Y}(x|y) (with the inequality as stated in the problem).

Example 4. Let X and Y be independent random variables on a probability space. Show that if E|X|^a < ∞ for some a ≥ 1 and E|Y| < ∞, then E|X + Y|^a ≥ E|X + E(Y)|^a.

Office: 1275A MSC. Phone: 262-1577.

Lecture 1 Measurable space, measure, and probability

Random experiment: uncertainty in outcomes. Ω: sample space (or outcome space), a set containing all possible outcomes.

Definition 1.1. Let F be a collection of subsets of a sample space Ω. F is called a σ-field (or σ-algebra) if and only if it has the following properties: (i) the empty set ∅ ∈ F; (ii) if A ∈ F, then the complement A^c ∈ F; (iii) if Ai ∈ F, i = 1, 2, ..., then their union ∪Ai ∈ F.

F is a set of sets. Two trivial examples: F contains ∅ and Ω only; F contains all subsets of Ω. Why do we need to consider other σ-fields? For example, F = {∅, A, A^c, Ω}, where A ⊂ Ω. Let C be a collection (set) of subsets of Ω. σ(C) is the smallest σ-field containing C, called the σ-field generated by C; σ(C) = C if C itself is a σ-field. If F is a σ-field on Ω and C ⊂ F, then σ(C) ⊂ F. σ({A}) = σ({A, A^c}) = σ({∅, A, A^c, Ω}) = {∅, A, A^c, Ω}. R^k: the k-dimensional Euclidean space (R^1 = R is the real line). B^k: the Borel σ-field on R^k, B^k = σ(O), where O is
the collection of all open sets. B_C = {C ∩ B : B ∈ B^k} is the Borel σ-field on C ∈ B^k.

Measure: length, area, volume. Definition 1.2. Let (Ω, F) be a measurable space. A set function ν defined on F is called a measure if and only if it has the following properties: (i) 0 ≤ ν(A) ≤ ∞ for any A ∈ F; (ii) ν(∅) = 0; (iii) if Ai ∈ F, i = 1, 2, ..., and the Ai's are disjoint (Ai ∩ Aj = ∅ for any i ≠ j), then ν(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ ν(Ai).

(Ω, F): a measurable space; (Ω, F, ν): a measure space. If ν(Ω) = 1, then ν is a probability measure (we usually use the notation P instead of ν). A measure ν may take ∞ as its value. Conventions: (1) for any x ∈ R, −∞ < x < ∞, x + ∞ = ∞, and x − ∞ = −∞; (2) ∞ + ∞ = ∞; (3) ∞ · a = ∞ for any a > 0; (4) ∞ − ∞ or ∞/∞ is not defined.

Examples. Counting measure: let Ω be a sample space, F the collection of all subsets, and ν(A) the number of elements in A ∈ F (ν(A) = ∞ if A contains infinitely many elements). Then ν is a measure on F, called the counting measure. Lebesgue measure: there is a unique measure m on (R, B) that satisfies m((a, b]) = b − a for every finite interval (a, b], −∞ < a ≤ b < ∞. This is called the Lebesgue measure. If we restrict m to the measurable space ([0, 1], B_{[0,1]}), then m is a probability measure.

Proposition 1.1. Let (Ω, F, ν) be a measure space. (i) (Monotonicity) If A ⊂ B, then ν(A) ≤ ν(B). (ii) (Subadditivity) For any sequence A1, A2, ..., ν(∪_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ ν(Ai). (iii) (Continuity) If A1 ⊂ A2 ⊂ A3 ⊂ ··· (or A1 ⊃ A2 ⊃ A3 ⊃ ··· and ν(A1) < ∞), then ν(lim_n An) = lim_n ν(An), where lim_n An = ∪_{i=1}^∞ Ai (or ∩_{i=1}^∞ Ai).

Let P be a probability measure. The cumulative distribution function (cdf) of P is defined to be F(x) = P((−∞, x]), x ∈ R.

Proposition 1.2. (i) Let F be a cdf on R. Then (a) F(−∞) = lim_{x→−∞} F(x) = 0; (b) F(∞) = lim_{x→∞} F(x) = 1; (c) F is nondecreasing, i.e., F(x) ≤ F(y) if x ≤ y; (d) F is right continuous, i.e., lim_{y→x, y>x} F(y) = F(x). (ii) Suppose that a real-valued function F on R satisfies (a)-(d) in part (i). Then F is the cdf of a unique probability measure on (R, B).

Lecture 36 The UMVUE and BLUE

Theorem 3.7. Consider model (1), X = Zβ + ε, with assumption A1: ε is distributed as N_n(0, σ² I_n) with an unknown σ² > 0. (i) The LSE l^T β̂ is the UMVUE of l^T β for any estimable l^T β. (ii) The UMVUE of σ² is σ̂² = (n − r)^{−1} ‖X − Zβ̂‖², where r is the rank of Z.
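Theorem 3.7(ii) can be spot-checked by simulation for simple linear regression, X_i = β0 + β1 z_i + ε_i, where Z has two columns (so r = 2) and the LSE has a closed form. This is a seeded sketch, not the notes' derivation; the design points and parameter values are arbitrary choices. The average of SSR/(n − r) over replications should be near σ².

```python
import random

random.seed(5)

def sigma2_hat(z, x):
    # simple linear regression LSE and sigma^2-hat = SSR/(n - 2)
    n = len(z)
    zbar, xbar = sum(z) / n, sum(x) / n
    b1 = (sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
          / sum((zi - zbar) ** 2 for zi in z))
    b0 = xbar - b1 * zbar
    ssr = sum((xi - b0 - b1 * zi) ** 2 for zi, xi in zip(z, x))
    return ssr / (n - 2)

z = [float(i) for i in range(12)]  # fixed design, n = 12, rank r = 2
beta0, beta1, sigma = 1.0, 0.5, 2.0
reps = 4000
avg = sum(sigma2_hat(z, [beta0 + beta1 * zi + random.gauss(0, sigma)
                         for zi in z]) for _ in range(reps)) / reps
print(avg, sigma**2)  # average of sigma^2-hat is near sigma^2 = 4
```

Dividing SSR by n rather than n − r would bias the estimate downward by the factor (n − r)/n, which is exactly the trace calculation in the proof below.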
Proof. (i) Let β̂ be an LSE of β. From the normal equation Z^T Zβ̂ = Z^T X, (X − Zβ̂)^T Z = X^T Z − β̂^T Z^T Z = X^T Z − X^T Z = 0, and hence, for any b,

‖X − Zb‖² = ‖(X − Zβ̂) + Z(β̂ − b)‖² = ‖X − Zβ̂‖² + ‖Z(β̂ − b)‖²;

in particular, ‖X‖² = ‖X − Zβ̂‖² + ‖Zβ̂‖². Using this result and assumption A1, we obtain the joint Lebesgue pdf of X:

(2πσ²)^{−n/2} exp{β^T Z^T x/σ² − [‖x − Zβ̂‖² + ‖Zβ̂‖² + ‖Zβ‖²]/(2σ²)}.

By Proposition 2.1 and the fact that Zβ̂ = Z(Z^T Z)^− Z^T X is a function of Z^T X, the statistic (Z^T X, ‖X − Zβ̂‖²) is complete and sufficient for θ = (β, σ²). Note that β̂ is a function of Z^T X and, hence, a function of the complete sufficient statistic. If l^T β is estimable, then l^T β̂ is unbiased for l^T β (Theorem 3.6) and, hence, l^T β̂ is the UMVUE of l^T β.

(ii) From ‖X − Zβ‖² = ‖X − Zβ̂‖² + ‖Z(β̂ − β)‖² and E(Zβ̂) = Zβ (Theorem 3.6),

E‖X − Zβ̂‖² = E(X − Zβ)^T (X − Zβ) − E(β̂ − β)^T Z^T Z (β̂ − β) = tr(Var(X)) − tr(Var(Zβ̂)) = σ² n − σ² tr((Z^T Z)^− Z^T Z).

Since each row of Z is in R(Z), Zβ̂ does not depend on the choice of the generalized inverse (Z^T Z)^− in β̂ = (Z^T Z)^− Z^T X (Theorem 3.6). Hence, we can evaluate tr((Z^T Z)^− Z^T Z) using a particular (Z^T Z)^−. From the theory of linear algebra, there exists a p × p matrix C with CC^T = I_p such that

C Z^T Z C^T = diag(Λ, 0),

where Λ is an r × r diagonal matrix whose diagonal elements are positive. Then a particular choice of (Z^T Z)^− is C^T diag(Λ^{−1}, 0) C, for which

(Z^T Z)^− Z^T Z = C^T diag(I_r, 0) C,

whose trace is r. Hence, σ̂² is the UMVUE of σ², since it is a function of the complete sufficient statistic and E[(n − r)^{−1} ‖X − Zβ̂‖²] = σ².

In general,

Var(l^T β̂) = l^T (Z^T Z)^− Z^T Var(ε) Z (Z^T Z)^− l.   (3)

If l ∈ R(Z) and Var(ε) = σ² I_n (assumption A2), then the use of the generalized inverse matrix in (3) leads to Var(l^T β̂) = σ² l^T (Z^T Z)^− l, which attains the Cramér-Rao lower bound under assumption A1 (Proposition 3.2).

The vector X − Zβ̂ is called the residual vector, and ‖X − Zβ̂‖² is called the sum of squared residuals, denoted by SSR; the estimator σ̂² is then equal to SSR/(n − r). Since X − Zβ̂ = [I_n − Z(Z^T Z)^− Z^T] X and l^T β̂ = l^T (Z^T Z)^− Z^T X are linear in X, they are normally distributed under assumption A1. Also, using the generalized inverse matrix, we obtain

[I_n − Z(Z^T Z)^− Z^T] Z (Z^T Z)^− l = 0,

which implies that σ̂² and l^T β̂ are
independent Exercise 58 in 16 for any estimable lT Furthermore7 ZZTZ ZT2 ZZTZ ZT ie7 ZZTZ ZT is a projection matrix and SSR XTIn i ZZTZ ZTX The rank of ZZTZ ZT is trZZTZ ZT r Similarly7 the rank of the projection matrix In 7 ZZTZ ZT is n 7 r From XTX XTZZTZ ZTX XTIn i ZZTZ ZTX and Theorem 15 Cochran7s theorem7 SSE02 has the chi square distribution xiirw with 6 Heran 7 ZZTZ ZTZB 0 2 Thus7 we have proved the following result Theorem 38 Cpnsider model 1 with assumption A1 For any estimable parameter NB the UMVUE7s VB and a are independent the distribution of VB is NZTBUZZTZTZ Z and n 7 ramp202 has the chi square distribution xii Example 315 In Examples 312 3147 UMVUE7s of estimable VB are the LSE7s ZTB7 under assumption A1 ln Example 3137 SSR Z Xij 7 xi i1 j1 in Example 3147 if 0 gt17 a b c SSR Z Z 09 i X i1 j1 k1 We now study properties of FIG and 72 under assumption A27 ie7 without the normality assumption on 8 From Theorem 36 and the proof of Theorem 37ii7 VB with an l E and a are still unbiased without the normality assumption In what sense are VB and 72 optimal beyond being unbiased We have the following result for the LSE VB Some discussion about a can be found7 for example7 in Rao 19737 p 228 Theorem 39 Consider model 1 with assumption A2 i A necessary and suf cient condition for the existence of a linear unbiased estimator of TB ie7 an unbiased estimator that is linear in X is l E ii Gauss Markov theorem lf l E 7ZZ7 then the LSE FIG is the best linear unbiased estimator BLUE of VB in the sense that it has the minimum variance in the class of linear unbiased estimators of lf Proof The suf ciency has been established in Theorem 36 Suppose now a linear function of X7 07X with c E R is unbiased for VB Then ZTB ECTX CTEX CTZB Since this equality holds for all 67 l Z707 ie7 l E ii Let l E RZ RZTZ A A Then l ZTZ for some C and VB CZTZW TZTX by ZTZb ZTX Let 07X be any linear unbiased estimator of lf From the proof of i7 Z70 l Then 30ICTZTX7 07X 7 CTZTX EXTZltCTX i 
EXTZCCTZTX UZtFZltCT BTZTZCCTZB i 02trZCCZT i BTZTZCCTZTZB UZCTZ 76 7 UZCTZ 7 76 0 Hence VarcTX VarcTX 7 TZTX TZTX VarcTX 7 TZTX VarKTZTX 200VltltTZTX7 07X 7 CZTX Vaer 7 CZTX Vamp 2 VarlTB Lecture 27 Asymptotic bias variance and mse Asymptotic bias Unbiasedness as a criterion for point estimators is discussed in 232 In some cases7 however7 there is no unbiased estimator Furthermore7 having a sligh 77 bias in some cases may not be a bad idea Let TnX be a point estimator of 19 for every ii If ET exists for every n and limpH00 ETn 7 19 0 for any P E 737 then Tn is said to be approccimatcly unbiased There are many reasonable point estimators whose expectations are not well de ned It is desirable to de ne a concept of asymptotic bias for point estimators whose expectations are not well de ned De nition 211 Let 7 17 27 be random variables and an be a sequence of positive numbers satisfying an 7 00 or an 7 a gt 0 If an n 7d g and El l lt 007 then E an is called an asymptotic cmpcctatioii of g ii Let Tn be a point estimator of 19 for every n An asymptotic expectation of Tn 7 197 if it exists7 is called an asymptotic bias of Tn and denoted by 5THP or 5T 19 if P is in a parametric family If limiH00 5TnP 0 for any P E 737 then Tn is said to be asymptotically unbiased Like the consistency7 the asymptotic expectation or bias is a concept relating to sequences a and Ewan or Ti and mum The exact bias on P is not necessarily the same as anP when both of them exist Proposition 23 shows that the asymptotic expectation de ned in De nition 211 is essentially unique Proposition 23 Let n be a sequence of random variables Suppose that both E an and Enbn are asymptotic expectations of n de ned according to De nition 211i Then7 one of the following three must hold a E E77 0 b E 71 07 E77 07 and bnan 7 0 or E1 0 En a 0 and anbn 7 0 e E6 a 0 En a 0 and E anE77bn 71 If Tn is a consistent estimator of 197 then Tn 19 0171 and7 by De nition 211ii7 Tn is asymptotically unbiased7 although Tn may 
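To make the distinction concrete, here is a small seeded Monte Carlo sketch (pure Python; the parameter values are illustrative): Tₙ = X̄² is biased for μ² at every n, with exact bias Var(X̄) = σ²/n, yet that bias vanishes as n grows, so Tₙ is asymptotically unbiased.

```python
import random

def bias_of_xbar_squared(mu, sigma, n, reps=20000, seed=1):
    """Monte Carlo estimate of E[Xbar^2] - mu^2 for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += xbar * xbar
    return total / reps - mu * mu

# Exact bias is Var(Xbar) = sigma^2 / n: nonzero for every n, but -> 0.
def exact_bias(sigma, n):
    return sigma * sigma / n
```

The Monte Carlo estimate agrees with σ²/n up to simulation noise; the exact computation shows the bias never vanishes for finite n.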
not be approximately unbiased. In Example 2.34, X̄ₙ has the asymptotic bias b̃(P) = hₙ(θ)EY.
When aₙ(Tₙ − θ) →_d Y with EY = 0 (e.g., Tₙ = X̄² and θ = μ² in Example 2.33), a more precise order of the asymptotic bias of Tₙ may be obtained for comparing different estimators in terms of their asymptotic biases. Suppose that there is a sequence of random variables {ηₙ} such that
aₙηₙ →_d Y and aₙ²(Tₙ − θ − ηₙ) →_d W,   (1)
where Y and W are random variables with finite means, EY = 0 and EW ≠ 0. Then we may define aₙ⁻² to be the order of b̃_Tₙ(P), or define EW/aₙ² to be the aₙ⁻²-order asymptotic bias of Tₙ. However, ηₙ in (1) may not be unique; some regularity conditions have to be imposed so that the order of the asymptotic bias of Tₙ can be uniquely defined.
We consider the case where X₁, ..., Xₙ are iid random k-vectors with finite Σ = Var(X₁). Let X̄ = n⁻¹ Σᵢ Xᵢ and Tₙ = g(X̄), where g is a function on Rᵏ that is second-order differentiable at μ = EX₁ ∈ Rᵏ. Consider Tₙ as an estimator of θ = g(μ). By Taylor's expansion,
Tₙ − θ = [∇g(μ)]ᵀ(X̄ − μ) + ½(X̄ − μ)ᵀ∇²g(μ)(X̄ − μ) + o_p(n⁻¹),
where ∇g is the k-vector of partial derivatives of g and ∇²g is the k × k matrix of second-order partial derivatives of g. By the CLT and Theorem 1.10(iii),
(n/2)(X̄ − μ)ᵀ∇²g(μ)(X̄ − μ) →_d ½ Zᵀ∇²g(μ)Z,
where Z ~ N_k(0, Σ). Thus
E[Zᵀ∇²g(μ)Z]/(2n) = tr(∇²g(μ)Σ)/(2n)   (2)
is the n⁻¹-order asymptotic bias of Tₙ = g(X̄), where tr(A) denotes the trace of the matrix A.
Example 2.35. Let X₁, ..., Xₙ be iid binary random variables with P(Xᵢ = 1) = p, where p ∈ (0, 1) is unknown. Consider first the estimation of θ = p(1 − p). Since Var(X̄) = p(1 − p)/n, the n⁻¹-order asymptotic bias of Tₙ = X̄(1 − X̄) according to (2) with g(x) = x(1 − x) is −p(1 − p)/n. On the other hand, a direct computation shows
E[X̄(1 − X̄)] = EX̄ − (EX̄)² − Var(X̄) = p − p² − p(1 − p)/n.
Hence the exact bias of Tₙ is the same as the n⁻¹-order asymptotic bias. Consider next the estimation of θ = p⁻¹. In this case, there is no unbiased estimator of p⁻¹ (Exercise 84 in §2.6). Let Tₙ = X̄⁻¹. Then the n⁻¹-order asymptotic bias of Tₙ according to (2) with g(x) = x⁻¹ is (1 − p)/(p²n). On the other hand, ETₙ = ∞ for every n.
Asymptotic variance and mse. Like the bias, the mse of an estimator Tₙ of θ, mse_Tₙ(P) = E(Tₙ − θ)², is not well defined if the second moment of Tₙ does not exist. We now define a version of asymptotic mean squared error (amse) and a measure for assessing different point estimators of a common parameter θ.
Definition 2.12. Let Tₙ be an estimator of θ for every n and {aₙ} be a sequence of positive numbers satisfying aₙ → ∞ or aₙ → a > 0. Assume that aₙ(Tₙ − θ) →_d Y with 0 < EY² < ∞.
(i) The asymptotic mean squared error of Tₙ, denoted by amse_Tₙ(P) (or amse_Tₙ(θ) if P is in a parametric family indexed by θ), is defined to be the asymptotic expectation of (Tₙ − θ)², i.e., amse_Tₙ(P) = EY²/aₙ². The asymptotic variance of Tₙ is defined to be σ²_Tₙ(P) = Var(Y)/aₙ².
(ii) Let Tₙ′ be another estimator of θ. The asymptotic relative efficiency of Tₙ′ with respect to Tₙ is defined to be e_{Tₙ′,Tₙ}(P) = amse_Tₙ(P)/amse_Tₙ′(P).
(iii) Tₙ is said to be asymptotically more efficient than Tₙ′ if and only if lim supₙ amse_Tₙ(P)/amse_Tₙ′(P) ≤ 1 for any P and < 1 for some P.
The amse and the asymptotic variance are the same if and only if EY = 0. By Proposition 2.3, the amse (or the asymptotic variance) of Tₙ is essentially unique and, therefore, the concept of asymptotic relative efficiency in Definition 2.12(ii)–(iii) is well defined. In Example 2.33, amse_{X̄²}(P) = σ²_{X̄²}(P) = 4μ²σ²/n. In Example 2.34, σ²_{X̄ₙ}(P) = hₙ(θ)²Var(Y) and amse_{X̄ₙ}(P) = hₙ(θ)²EY².
When both mse_Tₙ(P) and mse_Tₙ′(P) exist, one may compare Tₙ and Tₙ′ by evaluating the relative efficiency mse_Tₙ(P)/mse_Tₙ′(P). However, this comparison may be different from the one using the asymptotic relative efficiency in Definition 2.12(ii), since the mse and amse of an estimator may be different (Exercise 115 in §2.6). The following result shows that when the exact mse of Tₙ exists, it is no smaller than the amse of Tₙ; it also provides a condition under which the exact mse and the amse are the same.
Proposition 2.4. Let Tₙ be an estimator of θ for every n and {aₙ} be a sequence of positive numbers satisfying aₙ → ∞ or aₙ → a > 0. Suppose that aₙ(Tₙ − θ) →_d Y with 0 < EY² < ∞. Then
(i) EY² ≤ lim infₙ E[aₙ²(Tₙ − θ)²], and
(ii) EY² = limₙ→∞ E[aₙ²(Tₙ − θ)²] if
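The claim in Example 2.35 that the exact bias of X̄(1 − X̄) coincides with the n⁻¹-order asymptotic bias −p(1 − p)/n can be verified by direct enumeration over the Binomial(n, p) distribution (a pure-Python sketch; p and n are illustrative):

```python
from math import comb

def exact_expectation_T(p, n):
    """E[Xbar(1 - Xbar)] computed by summing over Binomial(n, p) outcomes."""
    s = 0.0
    for k in range(n + 1):
        xbar = k / n
        s += comb(n, k) * p**k * (1 - p)**(n - k) * xbar * (1 - xbar)
    return s

def exact_bias(p, n):
    return exact_expectation_T(p, n) - p * (1 - p)

def asymptotic_bias(p, n):
    # tr(grad^2 g(p) * Var(X1)) / (2n) with g(x) = x(1-x), g'' = -2
    return -p * (1 - p) / n
```

Here the two quantities agree exactly, not just to order n⁻¹, which is special to this kernel.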
and only if {aₙ²(Tₙ − θ)²} is uniformly integrable.
Proof. (i) By Theorem 1.10(iii), min{aₙ²(Tₙ − θ)², t} →_d min{Y², t} for any t > 0. Since min{aₙ²(Tₙ − θ)², t} is bounded by t,
limₙ E min{aₙ²(Tₙ − θ)², t} = E min{Y², t}
(Theorem 1.8(viii)). Then
EY² = lim_{t→∞} E min{Y², t} = lim_{t→∞} limₙ E min{aₙ²(Tₙ − θ)², t} ≤ lim infₙ E[aₙ²(Tₙ − θ)²],
where the inequality follows from the fact that E min{aₙ²(Tₙ − θ)², t} is nondecreasing in t for any fixed n.
(ii) The result follows from Theorem 1.8(viii).
Example 2.36. Let X₁, ..., Xₙ be iid from the Poisson distribution P(θ) with an unknown θ > 0. Consider the estimation of ϑ = P(Xᵢ = 0) = e^(−θ). Let T₁ₙ = Fₙ(0), where Fₙ is the empirical cdf. Then T₁ₙ is unbiased and has mse_{T₁ₙ}(θ) = e^(−θ)(1 − e^(−θ))/n. Also, √n(T₁ₙ − ϑ) →_d N(0, e^(−θ)(1 − e^(−θ))) by the CLT; thus, in this case, amse_{T₁ₙ}(θ) = mse_{T₁ₙ}(θ). Consider next T₂ₙ = e^(−X̄). Note that ET₂ₙ = e^{nθ(e^(−1/n) − 1)}, so T₂ₙ is biased for every n. Using Theorem 1.12 and the CLT, we can show that √n(T₂ₙ − ϑ) →_d N(0, e^(−2θ)θ). By Definition 2.12(i), amse_{T₂ₙ}(θ) = e^(−2θ)θ/n. Thus, the asymptotic relative efficiency of T₁ₙ with respect to T₂ₙ is
e_{T₁ₙ,T₂ₙ}(θ) = θ/(e^θ − 1),
which is always less than 1. This shows that T₂ₙ is asymptotically more efficient than T₁ₙ.
The result for T₂ₙ in Example 2.36 is a special case (with Uₙ = X̄) of the following general result.
Theorem 2.6. Let g be a function on Rᵏ that is differentiable at θ ∈ Rᵏ, and let Uₙ be a k-vector of statistics satisfying aₙ(Uₙ − θ) →_d Y for a random k-vector Y with 0 < E‖Y‖² < ∞ and a sequence of positive numbers {aₙ} satisfying aₙ → ∞. Let Tₙ = g(Uₙ) be an estimator of ϑ = g(θ). Then the amse and the asymptotic variance of Tₙ are, respectively, E{[∇g(θ)]ᵀY}²/aₙ² and [∇g(θ)]ᵀVar(Y)∇g(θ)/aₙ².

Lecture 35 The LSE and estimability
One of the most useful statistical models is
Xᵢ = βᵀZᵢ + εᵢ, i = 1, ..., n,   (1)
where Xᵢ is the ith observation, often called the ith response; β is a p-vector of unknown parameters (the main parameters of interest), p < n; Zᵢ is the ith value of a p-vector of explanatory variables (or covariates); and ε₁, ..., εₙ are random errors (not observed). The data are (X₁, Z₁), ..., (Xₙ, Zₙ). The Zᵢ's are nonrandom, or are the given values of a random p-vector, in which case our analysis
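The non-uniqueness of generalized inverses, and the invariance of estimable lᵀβ̂ across them, can be seen in a tiny hand-computed sketch (pure Python; the 3-observation data vector X is hypothetical). Here Z has two identical columns, so ZᵀZ = [[3,3],[3,3]] is singular and admits many generalized inverses:

```python
# Two different generalized inverses G of A = Z'Z = [[3,3],[3,3]]
# (each satisfies A G A = A):
G1 = [[1/3, 0], [0, 0]]
G2 = [[0, 0], [0, 1/3]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [1.0, 2.0, 3.0]
Z = [[1, 1], [1, 1], [1, 1]]
ZtX = [sum(Z[i][j] * X[i] for i in range(3)) for j in range(2)]  # = [6, 6]

beta1 = matvec(G1, ZtX)   # one LSE: [2, 0]
beta2 = matvec(G2, ZtX)   # another LSE: [0, 2]

est_l = [1, 1]   # l in R(Z): l'beta is estimable
non_l = [1, 0]   # l not in R(Z): l'beta-hat depends on the choice of G
```

Both β̂'s solve the normal equations, and lᵀβ̂ agrees for the estimable l but not for the non-estimable one, exactly as Theorem 3.6 predicts.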
such that
Z = Z*Q   (7)
and Z* is of rank r, where Q is a fixed r × p matrix. P is identifiable if we consider the reparameterization β̃ = Qβ; the new parameter β̃ ranges over Rʳ, of dimension r.
In many applications we are interested in estimating some linear functions of β, i.e., ϑ = lᵀβ for some l ∈ Rᵖ. From the previous discussion, however, estimation of lᵀβ is meaningless unless l = Qᵀc for some c ∈ Rʳ, so that lᵀβ = cᵀQβ = cᵀβ̃. The following result shows that lᵀβ is estimable if l = Qᵀc, which is also necessary for lᵀβ to be estimable under assumption A1.
Theorem 3.6. Assume model (2) with assumption A3.
(i) A necessary and sufficient condition for l ∈ Rᵖ being Qᵀc for some c ∈ Rʳ is l ∈ R(Z) = R(ZᵀZ), where Q is given by (7) and R(A) is the smallest linear subspace containing all rows of A.
(ii) If l ∈ R(Z), then the LSE lᵀβ̂ is unique and unbiased for lᵀβ.
(iii) If l ∉ R(Z) and assumption A1 holds, then lᵀβ is not estimable.
Proof. (i) Note that a ∈ R(A) if and only if a = Aᵀb for some vector b. If l = Qᵀc, then, using Zᵀ = QᵀZ*ᵀ,
l = Qᵀ Z*ᵀZ*(Z*ᵀZ*)⁻¹ c = Zᵀ[ Z*(Z*ᵀZ*)⁻¹ c ].
Hence l ∈ R(Z). If l ∈ R(Z), then l = Zᵀζ for some ζ and l = (Z*Q)ᵀζ = Qᵀc with c = Z*ᵀζ.
(ii) If l ∈ R(Z) = R(ZᵀZ), then l = ZᵀZζ for some ζ and, by (6),
E(lᵀβ̂) = E[ζᵀZᵀZ(ZᵀZ)⁻ZᵀX] = ζᵀZᵀZ(ZᵀZ)⁻ZᵀZβ = ζᵀZᵀZβ = lᵀβ.
If β̂* is any other LSE of β, then, by (4),
lᵀβ̂ − lᵀβ̂* = ζᵀZᵀZ(β̂ − β̂*) = ζᵀ(ZᵀX − ZᵀX) = 0.
(iii) Under assumption A1, if there is an estimator h(X, Z) unbiased for lᵀβ, then
lᵀβ = ∫ h(x, Z)(2πσ²)^(−n/2) exp{ −(2σ²)⁻¹‖x − Zβ‖² } dx.
Differentiating with respect to β and applying Theorem 2.1 lead to
l = σ⁻²Zᵀ ∫ (x − Zβ) h(x, Z)(2πσ²)^(−n/2) exp{ −(2σ²)⁻¹‖x − Zβ‖² } dx,
which implies l ∈ R(Z).
Example 3.12 (Simple linear regression). Let β = (β₀, β₁) ∈ R² and Zᵢ = (1, tᵢ), tᵢ ∈ R, i = 1, ..., n. Then model (1) (or (2)) is called a simple linear regression model. It turns out that ZᵀZ has first row (n, Σᵢtᵢ) and second row (Σᵢtᵢ, Σᵢtᵢ²). This matrix is invertible if and only if some tᵢ's are different. Thus, if some tᵢ's are different, then the unique unbiased LSE of lᵀβ for any l ∈ R² is lᵀ(ZᵀZ)⁻¹ZᵀX, which has the normal distribution if assumption A1 holds. The result can be easily extended to the case of polynomial regression of order p, in which β = (β₀, β₁, ..., β_{p−1}) and Zᵢ = (1, tᵢ, ..., tᵢ^{p−1}).
Example 3.13 (One-way ANOVA). Suppose that n = Σⱼ₌₁ᵐ nⱼ with m positive integers n₁, ..., nₘ, and that
Xᵢ = μⱼ + εᵢ, i = kⱼ₋₁ + 1, ..., kⱼ, j = 1, ..., m,
where k₀ = 0, kⱼ = n₁ + ··· + nⱼ, and β = (μ₁, ..., μₘ). Let Jₖ denote the k-vector of ones. Then the matrix Z in this case is a block diagonal matrix with J_{nⱼ} as the jth diagonal block. Consequently, ZᵀZ is an m × m diagonal matrix whose jth diagonal element is nⱼ. Thus ZᵀZ is invertible and the unique LSE of β is the m-vector whose jth component is nⱼ⁻¹ Σ_{i=kⱼ₋₁+1}^{kⱼ} Xᵢ, j = 1, ..., m, i.e., the vector of group sample means. Sometimes it is more convenient to use the following notation:
Xᵢⱼ = X_{kᵢ₋₁+j}, εᵢⱼ = ε_{kᵢ₋₁+j}, j = 1, ..., nᵢ, i = 1, ..., m,
and μᵢ = μ + αᵢ, i = 1, ..., m. Then our model becomes
Xᵢⱼ = μ + αᵢ + εᵢⱼ, j = 1, ..., nᵢ, i = 1, ..., m,   (8)
which is called a one-way analysis of variance (ANOVA) model. Under model (8), β = (μ, α₁, ..., αₘ) ∈ R^{m+1} and the matrix Z under model (8) is not of full rank. An LSE of β under model (8) is
β̂ = (X̄, X̄₁· − X̄, ..., X̄ₘ· − X̄),   (10)
where X̄ is the sample mean of all Xᵢⱼ's and X̄ᵢ· is the sample mean of the ith group {Xᵢⱼ, j = 1, ..., nᵢ}. The notation used in model (8) allows us to generalize the one-way ANOVA model to any s-way ANOVA model, with a positive integer s, under the so-called factorial experiments.
Example 3.14 (Two-way balanced ANOVA). Suppose that
Xᵢⱼₖ = μ + αᵢ + βⱼ + γᵢⱼ + εᵢⱼₖ, i = 1, ..., a, j = 1, ..., b, k = 1, ..., c,   (9)
where a, b, and c are some positive integers. Model (9) is called a two-way balanced ANOVA model. If we view model (9) as a special case of model (2), then the parameter vector β is
β = (μ, α₁, ..., α_a, β₁, ..., β_b, γ₁₁, ..., γ₁b, ..., γ_{a1}, ..., γ_{ab}).
One can obtain the matrix Z and show that it is n × p, where n = abc and p = 1 + a + b + ab, and is of rank ab < p. It can also be shown that an LSE of β is given by the right-hand side of (10) with μ, αᵢ, βⱼ, and γᵢⱼ replaced by μ̂ = X̄···, α̂ᵢ = X̄ᵢ·· − X̄···, β̂ⱼ = X̄·ⱼ· − X̄···, and γ̂ᵢⱼ = X̄ᵢⱼ· − X̄ᵢ·· − X̄·ⱼ· + X̄···, respectively, where a dot is used to denote averaging over the indicated subscript, e.g.,
X̄·ⱼ· = (ac)⁻¹ Σᵢ₌₁ᵃ Σₖ₌₁ᶜ Xᵢⱼₖ
with a fixed j.

Lecture 21 Complete statistics
A statistic V(X) is ancillary if its distribution does not depend on the population P; V(X) is first-order ancillary if E[V(X)] does not depend on P.
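A quick sketch of ancillarity in the location model (pure Python; the data are illustrative): when σ is known, the sample variance S² is invariant under the shift x ↦ x + c, so its distribution is free of μ. This is exactly the setting of Example 2.18, where Basu's theorem then yields independence of X̄ and S².

```python
from statistics import variance

# Shifting every observation by a constant leaves S^2 unchanged,
# so for X_i = mu + Z_i with Z_i ~ N(0, sigma^2), S^2 has the same
# distribution for every mu: S^2 is ancillary for mu.
z = [0.3, -1.2, 0.7, 2.1, -0.5]      # hypothetical "errors"
s2_at_mu0 = variance(z)               # sample at mu = 0
s2_at_mu5 = variance([x + 5.0 for x in z])   # same sample shifted to mu = 5
```

The equality is exact (not just distributional) for a fixed error vector, which is why ancillarity holds for every μ simultaneously.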
A trivial ancillary statistic is the constant statistic V(X) ≡ c ∈ R. If V(X) is a nontrivial ancillary statistic, then σ(V(X)) ⊂ σ(X) is a nontrivial σ-field that does not contain any information about P. Hence, if S(X) is a statistic and V(S(X)) is a nontrivial ancillary statistic, it indicates that σ(S(X)) contains a nontrivial σ-field that does not contain any information about P and, hence, the "data" S(X) may be further reduced. A sufficient statistic T appears to be most successful in reducing the data if no nonconstant function of T is ancillary, or even first-order ancillary.
Definition 2.6 (Completeness). A statistic T(X) is said to be complete for P ∈ P if and only if, for any Borel f, E[f(T)] = 0 for all P ∈ P implies f = 0 a.s. P. T is said to be boundedly complete if and only if the previous statement holds for any bounded Borel f.
A complete statistic is boundedly complete. If T is complete (or boundedly complete) and S = ψ(T) for a measurable ψ, then S is complete (or boundedly complete). Intuitively, a complete and sufficient statistic should be minimal sufficient (Exercise 48). A minimal sufficient statistic is not necessarily complete; for example, the minimal sufficient statistic (X₍₁₎, X₍ₙ₎) in Example 2.13 is not complete (Exercise 47).
Finding a complete and sufficient statistic.
Proposition 2.1. If P is in an exponential family of full rank with pdf's given by
f_η(x) = exp{ηᵀT(x) − ζ(η)} h(x),
then T(X) is complete and sufficient for η ∈ Ξ.
Proof. We have shown that T is sufficient. Suppose that there is a function f such that E[f(T)] = 0 for all η ∈ Ξ. By Theorem 2.1,
∫ f(t) exp{ηᵀt − ζ(η)} dλ = 0 for all η ∈ Ξ,
where λ is a measure on (Rᵖ, Bᵖ). Let η₀ be an interior point of Ξ. Then
∫ f₊(t) e^{ηᵀt} dλ = ∫ f₋(t) e^{ηᵀt} dλ for all η ∈ N(η₀),   (1)
where N(η₀) = {η ∈ Rᵖ : ‖η − η₀‖ < ε} for some ε > 0. In particular,
∫ f₊(t) e^{η₀ᵀt} dλ = ∫ f₋(t) e^{η₀ᵀt} dλ = c.
If c = 0, then f = 0 a.e. λ. If c > 0, then c⁻¹f₊(t)e^{η₀ᵀt} and c⁻¹f₋(t)e^{η₀ᵀt} are pdf's with respect to λ, and (1) implies that their mgf's are the same in a neighborhood of 0. By Theorem 1.6(ii), c⁻¹f₊(t)e^{η₀ᵀt} = c⁻¹f₋(t)e^{η₀ᵀt}, i.e., f = f₊ − f₋ = 0 a.e. λ. Hence T is complete.
Example 2.15. Suppose that X₁, ..., Xₙ are iid random variables having the N(μ, σ²) distribution, μ ∈ R, σ > 0. From Example 2.6, the joint pdf of X₁, ..., Xₙ is
(2π)^(−n/2) exp{η₁T₁ + η₂T₂ − nζ(η)},
where T₁ = Σᵢ Xᵢ, T₂ = −Σᵢ Xᵢ², and η = (η₁, η₂). Hence the family of distributions for X = (X₁, ..., Xₙ) is a natural exponential family of full rank (Ξ = R × (0, ∞)). By Proposition 2.1, T(X) = (T₁, T₂) is complete and sufficient for η. Since there is a one-to-one correspondence between η and θ = (μ, σ²), T is also complete and sufficient for θ. It can be shown that any one-to-one measurable function of a complete and sufficient statistic is also complete and sufficient (exercise). Thus, (X̄, S²) is complete and sufficient for θ, where X̄ and S² are the sample mean and sample variance, respectively.
Example 2.16. Let X₁, ..., Xₙ be iid random variables from P_θ, the uniform distribution U(0, θ), θ > 0. The largest order statistic, X₍ₙ₎, is complete and sufficient for θ ∈ (0, ∞). The sufficiency of X₍ₙ₎ follows from the fact that the joint Lebesgue pdf of X₁, ..., Xₙ is θ^(−n) I_{(0,θ)}(x₍ₙ₎). From Example 2.9, X₍ₙ₎ has the Lebesgue pdf n x^{n−1} θ^(−n) I_{(0,θ)}(x) on R. Let f be a Borel function on [0, ∞) such that E[f(X₍ₙ₎)] = 0 for all θ > 0. Then
∫₀^θ f(x) x^{n−1} dx = 0 for all θ > 0.
Let G(θ) be the left-hand side of the previous equation. Applying the result of differentiation of an integral (see, e.g., Royden (1968, §5.3)), we obtain that G′(θ) = f(θ)θ^{n−1} a.e. m₊, where m₊ is the Lebesgue measure on ([0, ∞), B_{[0,∞)}). Since G(θ) = 0 for all θ > 0, f(θ)θ^{n−1} = 0 a.e. m₊ and, hence, f(x) = 0 a.e. m₊. Therefore X₍ₙ₎ is complete and sufficient for θ ∈ (0, ∞).
Example 2.17. In Example 2.12 we showed that the order statistics T(X) = (X₍₁₎, ..., X₍ₙ₎) of iid random variables X₁, ..., Xₙ are sufficient for P ∈ P, where P is the family of distributions on R having Lebesgue pdf's. We now show that T(X) is also complete for P ∈ P. Let P₀ be the family of Lebesgue pdf's of the form
f(x) = C(θ₁, ..., θₙ) exp{−x^{2n} + θ₁x + θ₂x² + ··· + θₙxⁿ},
where θⱼ ∈ R and C(θ₁, ..., θₙ) is a normalizing constant such that ∫ f(x) dx = 1. Then P₀ ⊂ P, and P₀ is an exponential family of full rank. Note that the joint distribution of X = (X₁, ..., Xₙ) is also in an exponential family of full rank. Thus, by Proposition 2.1, U = (U₁, ..., Uₙ) is a complete statistic for P ∈ P₀, where
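As a consequence of the completeness of X₍ₙ₎ in Example 2.16, the unbiased estimator (1 + n⁻¹)X₍ₙ₎ is the UMVUE of θ (this reappears in Example 3.7 below). A seeded Monte Carlo sanity check of the unbiasedness, E[X₍ₙ₎] = nθ/(n + 1) (pure Python; θ and n are illustrative):

```python
import random

def mc_mean_max(theta, n, reps=20000, seed=7):
    """Monte Carlo estimate of E[X_(n)] for X_1, ..., X_n iid U(0, theta)."""
    rng = random.Random(seed)
    return sum(max(rng.uniform(0, theta) for _ in range(n))
               for _ in range(reps)) / reps

theta, n = 2.0, 5
umvue_mean = (1 + 1 / n) * mc_mean_max(theta, n)   # should be close to theta
```

Multiplying by (1 + 1/n) corrects the downward bias of the maximum exactly, since E[X₍ₙ₎] = nθ/(n + 1).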
Uⱼ = Σᵢ₌₁ⁿ Xᵢʲ. Since a statement that holds a.e. P₀ also holds a.e. P, U(X) is also complete for P ∈ P. The result follows if we can show that there is a one-to-one correspondence between T(X) and U(X). Let V₁ = Σᵢ Xᵢ, V₂ = Σ_{i<j} XᵢXⱼ, V₃ = Σ_{i<j<k} XᵢXⱼXₖ, ..., Vₙ = X₁···Xₙ. From the identities
Uₖ − V₁U_{k−1} + V₂U_{k−2} − ··· + (−1)^{k−1}V_{k−1}U₁ + (−1)ᵏ kVₖ = 0, k = 1, ..., n,
there is a one-to-one correspondence between U(X) and V(X) = (V₁, ..., Vₙ). From the identity
(t − X₁)···(t − Xₙ) = tⁿ − V₁t^{n−1} + V₂t^{n−2} − ··· + (−1)ⁿVₙ,
there is a one-to-one correspondence between V(X) and T(X). This completes the proof and, hence, T(X) is sufficient and complete for P ∈ P. In fact, both U(X) and V(X) are sufficient and complete for P ∈ P.
The relationship between an ancillary statistic and a complete and sufficient statistic is characterized in the following result.
Theorem 2.4 (Basu's theorem). Let V and T be two statistics of X from a population P ∈ P. If V is ancillary and T is boundedly complete and sufficient for P ∈ P, then V and T are independent with respect to any P ∈ P.
Proof. Let B be an event on the range of V. Since V is ancillary, P(V⁻¹(B)) is a constant. Since T is sufficient, E[I_B(V)|T] is a function of T (independent of P). Since
E{ E[I_B(V)|T] − P(V⁻¹(B)) } = 0 for all P ∈ P,
P(V⁻¹(B)|T) = E[I_B(V)|T] = P(V⁻¹(B)) a.s. P, by the bounded completeness of T. Let A be an event on the range of T. Then
P(T⁻¹(A) ∩ V⁻¹(B)) = E{ E[I_A(T)I_B(V)|T] } = E{ I_A(T)E[I_B(V)|T] } = E{ I_A(T)P(V⁻¹(B)) } = P(T⁻¹(A))P(V⁻¹(B)).
Hence T and V are independent with respect to any P ∈ P.
Basu's theorem is useful in proving the independence of two statistics.
Example 2.18. Suppose that X₁, ..., Xₙ are iid random variables having the N(μ, σ²) distribution, with μ ∈ R and a known σ > 0. It can easily be shown that the family {N(μ, σ²) : μ ∈ R} is an exponential family of full rank with natural parameter η = μ/σ². By Proposition 2.1, the sample mean X̄ is complete and sufficient for η (and μ). Let S² be the sample variance. Since S² = (n − 1)⁻¹ Σᵢ (Zᵢ − Z̄)², where Zᵢ = Xᵢ − μ is N(0, σ²) and Z̄ = n⁻¹ Σᵢ Zᵢ, S² is an ancillary statistic (σ² is known). By Basu's theorem, X̄ and S² are independent with respect to N(μ, σ²) with μ ∈ R. Since σ² is arbitrary, X̄ and S² are independent with respect to N(μ, σ²) for any μ ∈ R and σ² > 0.
Using the independence of X̄ and S², we now show that (n − 1)S²/σ² has the chi-square distribution χ²_{n−1}. Note that
Σᵢ (Xᵢ − μ)²/σ² = n(X̄ − μ)²/σ² + (n − 1)S²/σ².
From the properties of the normal distributions, n(X̄ − μ)²/σ² has the chi-square distribution χ²₁ with mgf (1 − 2t)^(−1/2), and Σᵢ (Xᵢ − μ)²/σ² has the chi-square distribution χ²ₙ with mgf (1 − 2t)^(−n/2), t < 1/2. By the independence of X̄ and S², the mgf of (n − 1)S²/σ² is
(1 − 2t)^(−n/2) / (1 − 2t)^(−1/2) = (1 − 2t)^(−(n−1)/2)
for t < 1/2. This is the mgf of the chi-square distribution χ²_{n−1} and, therefore, the result follows.

Lecture 33 U-statistics and their variances
Let X₁, ..., Xₙ be iid from an unknown population P in a nonparametric family P. If the vector of order statistics is sufficient and complete for P ∈ P, then a symmetric unbiased estimator of any estimable ϑ is the UMVUE of ϑ. In a large class of problems, parameters to be estimated are of the form
ϑ = E[h(X₁, ..., Xₘ)]
with a positive integer m and a Borel function h that is symmetric and satisfies E|h(X₁, ..., Xₘ)| < ∞ for any P ∈ P. It is easy to see that a symmetric unbiased estimator of ϑ is
Uₙ = C(n, m)⁻¹ Σ_c h(X_{i₁}, ..., X_{iₘ}),   (1)
where Σ_c denotes the summation over the C(n, m) combinations of m distinct elements {i₁, ..., iₘ} from {1, ..., n}.
Definition 3.2. The statistic Uₙ in (1) is called a U-statistic with kernel h of order m.
The use of U-statistics is an effective way of obtaining unbiased estimators. In nonparametric problems, U-statistics are often UMVUE's, whereas in parametric problems, U-statistics can be used as initial estimators to derive more efficient estimators. If m = 1, Uₙ in (1) is simply a type of sample mean; examples include the empirical cdf evaluated at a particular t and the sample moments n⁻¹ Σᵢ Xᵢᵏ for a positive integer k.
Consider the estimation of ϑ = μᵐ, where μ = EX₁ and m is a positive integer. Using h(x₁, ..., xₘ) = x₁···xₘ, we obtain the following U-statistic unbiased for ϑ = μᵐ:
Uₙ = C(n, m)⁻¹ Σ_c X_{i₁}···X_{iₘ}.   (2)
Consider the estimation of ϑ = σ² = Var(X₁). Since
σ² = [Var(X₁) + Var(X₂)]/2 = E[(X₁ − X₂)²/2],
we obtain the following U-statistic with kernel h(x₁, x₂) = (x₁ − x₂)²/2:
Uₙ = C(n, 2)⁻¹ Σ_{i<j} (Xᵢ − Xⱼ)²/2 = (n − 1)⁻¹ ( Σᵢ Xᵢ² − nX̄² ) = S²,
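The identity between the order-2 U-statistic with kernel (x₁ − x₂)²/2 and the sample variance S² can be checked directly (pure Python; the data vector is hypothetical):

```python
from itertools import combinations
from statistics import variance   # sample variance S^2 (divisor n - 1)

def u_stat(xs, h, m):
    """U-statistic with symmetric kernel h of order m: average of h over
    all combinations of m distinct observations."""
    combos = list(combinations(xs, m))
    return sum(h(*c) for c in combos) / len(combos)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var_kernel = lambda a, b: (a - b) ** 2 / 2
u_var = u_stat(xs, var_kernel, 2)   # equals S^2 exactly
```

The agreement is an algebraic identity, not an approximation, which is why S² is unbiased for σ² without any distributional assumption.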
the sample variance. In some cases, we would like to estimate ϑ = E|X₁ − X₂|, a measure of concentration. Using the kernel h(x₁, x₂) = |x₁ − x₂|, we obtain the following U-statistic unbiased for ϑ = E|X₁ − X₂|:
Uₙ = [2/(n(n − 1))] Σ_{i<j} |Xᵢ − Xⱼ|,
which is known as Gini's mean difference. Let ϑ = P(X₁ + X₂ ≤ 0). Using the kernel h(x₁, x₂) = I_{(−∞,0]}(x₁ + x₂), we obtain the following U-statistic unbiased for ϑ:
Uₙ = [2/(n(n − 1))] Σ_{i<j} I_{(−∞,0]}(Xᵢ + Xⱼ),
which is known as the one-sample Wilcoxon statistic.
If E[h(X₁, ..., Xₘ)]² < ∞, then the variance of Uₙ in (1) with kernel h has an explicit form. To derive Var(Uₙ) we need some notation. For k = 1, ..., m, let
hₖ(x₁, ..., xₖ) = E[h(X₁, ..., Xₘ) | X₁ = x₁, ..., Xₖ = xₖ] = E[h(x₁, ..., xₖ, X_{k+1}, ..., Xₘ)].
Note that hₘ = h. It can be shown that
hₖ(x₁, ..., xₖ) = E[h_{k+1}(x₁, ..., xₖ, X_{k+1})].   (3)
Define
h̃ₖ = hₖ − E[h(X₁, ..., Xₘ)], k = 1, ..., m,   (4)
and h̃ = h̃ₘ. Then, for any Uₙ defined by (1),
Uₙ − E(Uₙ) = C(n, m)⁻¹ Σ_c h̃(X_{i₁}, ..., X_{iₘ}).   (5)
Theorem 3.4 (Hoeffding's theorem). For a U-statistic Uₙ given by (1) with E[h(X₁, ..., Xₘ)]² < ∞,
Var(Uₙ) = C(n, m)⁻¹ Σₖ₌₁ᵐ C(m, k) C(n − m, m − k) ζₖ,
where ζₖ = Var(hₖ(X₁, ..., Xₖ)).
Proof. Consider two sets {i₁, ..., iₘ} and {j₁, ..., jₘ} of m distinct integers from {1, ..., n} with exactly k integers in common. The number of distinct choices of two such sets is C(n, m) C(m, k) C(n − m, m − k). By the symmetry of h̃ₘ and the independence of X₁, ..., Xₙ,
E[h̃(X_{i₁}, ..., X_{iₘ}) h̃(X_{j₁}, ..., X_{jₘ})] = ζₖ, k = 1, ..., m.
Then, by (5), the result follows.
Corollary 3.2. Under the condition of Theorem 3.4,
(i) Var(Uₙ) → 0 as n → ∞;
(ii) (n + 1)Var(U_{n+1}) ≤ nVar(Uₙ) for any n > m;
(iii) for any fixed m and k = 1, ..., m, if ζⱼ = 0 for j < k and ζₖ > 0, then
Var(Uₙ) = k! C(m, k)² ζₖ / nᵏ + O(1/n^{k+1}).
It follows from Corollary 3.2 that a U-statistic Uₙ, as an estimator of its mean, is consistent in mse under the finite second moment assumption on h. In fact, for any fixed m, if ζⱼ = 0 for j < k and ζₖ > 0, then the mse of Uₙ is of the order n^(−k) and, therefore, Uₙ is n^{k/2}-consistent.
Example 3.11. Consider first h(x₁, x₂) = x₁x₂, which leads to a U-statistic unbiased for μ² = (EX₁)². Note that h₁(x₁) = μx₁, h̃₁(x₁) = μ(x₁ − μ),
ζ₁ = E[h̃₁(X₁)]² = μ²Var(X₁) = μ²σ²,
h̃₂(x₁, x₂) = x₁x₂ − μ², and
ζ₂ = Var(X₁X₂) = E(X₁X₂)² − μ⁴ = (μ² + σ²)² − μ⁴.
By Theorem 3.4, for Uₙ = [2/(n(n − 1))] Σ_{i<j} XᵢXⱼ,
Var(Uₙ) = [2/(n(n − 1))] [ 2(n − 2)ζ₁ + ζ₂ ] = 4μ²σ²/n + 2σ⁴/[n(n − 1)].
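Hoeffding's formula can be sanity-checked against the closed form just derived for the kernel h(x₁, x₂) = x₁x₂ (pure Python; μ, σ², and n are illustrative):

```python
from math import comb

def hoeffding_var(n, m, zetas):
    """Var(U_n) = C(n,m)^(-1) * sum_k C(m,k) C(n-m, m-k) zeta_k (Theorem 3.4),
    with zetas = [zeta_1, ..., zeta_m]."""
    return sum(comb(m, k) * comb(n - m, m - k) * zetas[k - 1]
               for k in range(1, m + 1)) / comb(n, m)

mu, s2, n = 1.5, 2.0, 12
z1 = mu**2 * s2                       # zeta_1 = mu^2 * sigma^2
z2 = (mu**2 + s2)**2 - mu**4          # zeta_2 = Var(X1 X2)
closed_form = 4 * mu**2 * s2 / n + 2 * s2**2 / (n * (n - 1))
```

The two expressions agree term by term: C(2,1)C(n−2,1)/C(n,2) = 4(n−2)/(n(n−1)) multiplies ζ₁ and C(2,2)C(n−2,0)/C(n,2) = 2/(n(n−1)) multiplies ζ₂.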
Comparing Uₙ with X̄² − σ²/n in Example 3.10, which is the UMVUE under the normality and known-σ² assumption, we find that
Var(Uₙ) − Var(X̄² − σ²/n) = 2σ⁴/[n²(n − 1)].
Next, consider h(x₁, x₂) = I_{(−∞,0]}(x₁ + x₂), which leads to the one-sample Wilcoxon statistic. Note that h₁(x₁) = P(x₁ + X₂ ≤ 0) = F(−x₁), where F is the cdf of P. Then ζ₁ = Var(F(−X₁)). Let ϑ = E[h(X₁, X₂)]. Then ζ₂ = Var(h(X₁, X₂)) = ϑ(1 − ϑ). Hence, for Uₙ being the one-sample Wilcoxon statistic,
Var(Uₙ) = [2/(n(n − 1))] [ 2(n − 2)Var(F(−X₁)) + ϑ(1 − ϑ) ].
If F is continuous and symmetric about 0, then ζ₁ can be simplified:
ζ₁ = Var(F(−X₁)) = Var(1 − F(X₁)) = Var(F(X₁)) = 1/12,
since F(X₁) has the uniform distribution on (0, 1). Finally, consider h(x₁, x₂) = |x₁ − x₂|, which leads to Gini's mean difference. Note that
h₁(x₁) = E|x₁ − X₂| = ∫ |x₁ − y| dP(y),
and
ζ₁ = Var(h₁(X₁)) = ∫ [ ∫ |x − y| dP(y) ]² dP(x) − ϑ²,
where ϑ = E|X₁ − X₂|.

Lecture 38 Asymptotic properties of LSE's
We consider first the consistency of the LSE lᵀβ̂ with l ∈ R(Z), for every n.
Theorem 3.11. Consider model X = Zβ + ε (1) under assumption A3 (E(ε) = 0 and Var(ε) is an unknown matrix). Suppose that supₙ λ₊[Var(ε)] < ∞, where λ₊[A] is the largest eigenvalue of the matrix A, and that limₙ→∞ λ₊[(ZᵀZ)⁻] = 0. Then lᵀβ̂ is consistent in mse for any l ∈ R(Z).
Proof. The result follows from the fact that lᵀβ̂ is unbiased and
Var(lᵀβ̂) = lᵀ(ZᵀZ)⁻Zᵀ Var(ε) Z(ZᵀZ)⁻ l ≤ λ₊[Var(ε)] lᵀ(ZᵀZ)⁻ l.
Without the normality assumption on ε, the exact distribution of lᵀβ̂ is very hard to obtain. The asymptotic distribution of lᵀβ̂ is derived in the following result.
Theorem 3.12. Consider model (1) with assumption A3. Suppose that 0 < infₙ λ₋[Var(ε)], where λ₋[A] is the smallest eigenvalue of the matrix A, and that
limₙ→∞ max_{i≤n} Zᵢᵀ(ZᵀZ)⁻Zᵢ = 0.   (2)
Suppose further that n = Σⱼ₌₁ᵏ mⱼ for some integers k, mⱼ, j = 1, ..., k, with the mⱼ's bounded by a fixed integer m, ε = (ξ₁, ..., ξₖ), ξⱼ ∈ R^{mⱼ}, and the ξⱼ's are independent.
(i) If supᵢ E|εᵢ|^{2+δ} < ∞ for some δ > 0, then, for any l ∈ R(Z),
[lᵀβ̂ − lᵀβ]/√Var(lᵀβ̂) →_d N(0, 1).   (3)
(ii) Suppose that, whenever mᵢ = mⱼ, 1 ≤ i < j ≤ k, ξᵢ and ξⱼ have the same distribution. Then result (3) holds for any l ∈ R(Z).
Proof. Let l ∈ R(Z). Then lᵀ[(ZᵀZ)⁻ZᵀZ − I_p]β = 0 and, hence,
lᵀβ̂ − lᵀβ = lᵀ(ZᵀZ)⁻Zᵀε = Σⱼ₌₁ᵏ cⱼᵀξⱼ,
where cⱼ is the mⱼ-vector whose components are the corresponding components of lᵀ(ZᵀZ)⁻Zᵀ.
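For simple linear regression, the eigenvalue condition of Theorem 3.11 is easy to check by hand, since Var(β̂₁) = σ² n / det(ZᵀZ). A pure-Python sketch (the design points tᵢ = i/n on [0, 1) are illustrative) showing that the slope variance vanishes as the design fills in:

```python
def var_slope_hat(ts, sigma2):
    """Variance of the LSE slope in simple linear regression:
    sigma^2 times the (2,2) entry of (Z'Z)^{-1}, which is n / det(Z'Z)."""
    n = len(ts)
    st = sum(ts)
    st2 = sum(t * t for t in ts)
    det = n * st2 - st * st          # det(Z'Z); > 0 when some t_i differ
    return sigma2 * n / det

v_small = var_slope_hat([i / 100 for i in range(100)], 1.0)
v_large = var_slope_hat([i / 10000 for i in range(10000)], 1.0)
```

Here n⁻¹Σtᵢ → 1/2 and n⁻¹Σtᵢ² → 1/3 with 1/3 > (1/2)², so condition (b) of Lemma 3.3 below holds with aₙ = n and Var(β̂₁) ≈ 12σ²/n.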
Note that
Σⱼ₌₁ᵏ ‖cⱼ‖² = lᵀ(ZᵀZ)⁻ZᵀZ(ZᵀZ)⁻l = lᵀ(ZᵀZ)⁻l.   (4)
Also,
max_{j≤k} ‖cⱼ‖² ≤ m · lᵀ(ZᵀZ)⁻l · max_{i≤n} Zᵢᵀ(ZᵀZ)⁻Zᵢ,
which, together with (4) and condition (2), implies that
max_{j≤k} ‖cⱼ‖² / Σⱼ₌₁ᵏ ‖cⱼ‖² → 0.
The results then follow from Corollary 1.3.
Under the conditions of Theorem 3.12, Var(ε) is a block diagonal matrix with Var(ξⱼ) as the jth diagonal block, which includes the case of independent εᵢ's as a special case. Exercise 80 shows that condition (2) is almost a necessary condition for the consistency of the LSE. The following lemma tells us how to check condition (2).
Lemma 3.3. The following are sufficient conditions for (2):
(a) λ₊[(ZᵀZ)⁻] → 0 and Zₙᵀ(ZᵀZ)⁻Zₙ → 0, as n → ∞;
(b) there is an increasing sequence {aₙ} such that aₙ → ∞, aₙ₊₁/aₙ → 1, and ZᵀZ/aₙ converges to a positive definite matrix.
Proof. (a) Since ZᵀZ depends on n, we denote ZᵀZ by Aₙ and write hᵢ = Zᵢᵀ Aₙ⁻ Zᵢ. Let iₙ be an index such that h_{iₙ} = max_{i≤n} hᵢ. If limₙ→∞ iₙ = ∞, then
Z_{iₙ}ᵀ Aₙ⁻ Z_{iₙ} ≤ Z_{iₙ}ᵀ A_{iₙ}⁻ Z_{iₙ} → 0,
where the inequality follows from iₙ ≤ n and, thus, Aₙ − A_{iₙ} being nonnegative definite. If iₙ ≤ c for all n, then
limₙ h_{iₙ} ≤ limₙ λ₊[Aₙ⁻] max_{i≤c} ‖Zᵢ‖² = 0.
Therefore, along any subsequence, limₙ max_{i≤n} hᵢ = 0.
(b) Omitted.
If n⁻¹ Σᵢ tᵢ → c and n⁻¹ Σᵢ tᵢ² → d in the simple linear regression model (Example 3.12), where d > c² (so that the limiting matrix is positive definite), then condition (b) in Lemma 3.3 is satisfied with aₙ = n and, therefore, Theorem 3.12 applies. In the one-way ANOVA model (Example 3.13),
max_{i≤n} Zᵢᵀ(ZᵀZ)⁻Zᵢ = λ₊[(ZᵀZ)⁻] = 1/minⱼ nⱼ.
Hence the conditions related to Z in Theorem 3.12 are satisfied if and only if minⱼ nⱼ → ∞. Some similar conclusions can be drawn in the two-way ANOVA model (Example 3.14).
Functions of unbiased estimators. If the parameter to be estimated is ϑ = g(θ) with a vector-valued parameter θ, and Uₙ is a vector of unbiased estimators of the components of θ, then Tₙ = g(Uₙ) is often asymptotically unbiased for ϑ. Assume that g is differentiable and cₙ(Uₙ − θ) →_d Y. Then
amse_Tₙ(P) = E{[∇g(θ)]ᵀY}²/cₙ²
(Theorem 2.6). Hence, Tₙ has a good performance in terms of amse if Uₙ is optimal in terms of
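The plug-in idea of Example 3.22 below can be sketched end to end for a quadratic (p = 3): fit β̂ by the normal equations and estimate the extremizer ϑ = −β₁/(2β₂) by ϑ̂ = −β̂₁/(2β̂₂). Pure Python; the design points and "true" β are hypothetical, and the responses are taken noiseless so the recovery is exact:

```python
def solve3(A, b):
    """Gauss-Jordan elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

ts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
beta = (2.0, -3.0, 1.0)                                   # hypothetical truth
xs = [beta[0] + beta[1] * t + beta[2] * t * t for t in ts]  # noiseless responses

# Normal equations Z'Z b = Z'X with Z_i = (1, t_i, t_i^2):
Zt = [[1.0] * len(ts), ts, [t * t for t in ts]]
A = [[sum(Zt[i][k] * Zt[j][k] for k in range(len(ts))) for j in range(3)]
     for i in range(3)]
rhs = [sum(Zt[i][k] * xs[k] for k in range(len(ts))) for i in range(3)]
beta_hat = solve3(A, rhs)
t_hat = -beta_hat[1] / (2 * beta_hat[2])   # plug-in estimate of the minimizer
```

With noisy responses, ϑ̂ = g(β̂) would inherit its amse from the LSE via Theorem 2.6.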
Rt can be derived exercise Lecture 31 UMVUE a necessary and suf cient condition When a complete and suf cient statistic is not available7 it is usually very dif cult to derive a UMVUE In some cases7 the following result can be applied7 if we have enough knowledge about unbiased estimators of 0 Theorem 32 Let U be the set of all unbiased estimators of 0 with nite variances and T be an unbiased estimator of 19 with ET2 lt 00 i A necessary and suf cient condition for TX to be a UMVUE of 19 is that ETXUX 0for anyUEU andanyPEP ii Suppose that T hT7 where T is a suf cient statistic for P E 73 and h is a Borel function Let UT be the subset ofU consisting of Borel functions of T Then a necessary and suf cient condition for T to be a UMVUE of 19 is that ETXUX 0 for any U EMT and any P E 73 Proof Suppose that T is a UMVUE of 19 Then To T CU7 where U E U and c is a xed constant7 is also unbiased for 19 and7 thus7 VarTc 2 VarT c E 727 P E 737 which is the same as CZVarU QCCOVT7 U 2 0 c E 727 P E 73 This is impossible unless COVT7 U ETU 0 for any P E 73 Suppose now ETU 0 for any U E U and P E 73 Let To be another unbiased estimator of 19 with VarT0 lt 00 Then T 7 T0 6 U and7 hence7 ETT7T0 0 PEP7 which with the fact that ET ETO implies that VarT COVT7 To P E 73 Note that COVT7 T02 3 VarTVarT0 Hence VarT S VarT0 for any P E 73 ii lt suf ces to show that ETU 0 for any U 6 MT and P E 73 implies that ETU 0 foranyUEUandPEP Let U E M Then EUlT 6 MT and the result follows from the fact that T hT and ETU EETUlT EEhTUlT EhTEUlT Theorem 32 can be used to nd a UMVUE to check whether a particular estimator is a UMVUE and to show the nonexistence of any UMVUE If there is a suf cient statistic then by Rao Blackwell7s theorem we only need to focus on functions of the suf cient statistic and hence Theorem 32ii is more convenient to use As a consequence of Theorem 32 we have the following useful result Corollary 31 Let be a UMVUE of 197 j 1 k where k is a xed positive integer Then 21 chj is a 
(ii) Let T_1 and T_2 be two UMVUE's of ϑ. Then T_1 = T_2 a.s. P for any P ∈ P.

Example 3.7. Let X_1, ..., X_n be i.i.d. from the uniform distribution on the interval (0, θ). In Example 3.1, (1 + n^{-1})X_(n) is shown to be the UMVUE of θ when the parameter space is Θ = (0, ∞). Suppose now that Θ = [1, ∞). Then X_(n) is not complete, although it is still sufficient for θ. Thus Theorem 3.1 does not apply to X_(n). We now illustrate how to use Theorem 3.2(ii) to find a UMVUE of θ. Let U(X_(n)) be an unbiased estimator of 0. Since X_(n) has the Lebesgue pdf nθ^{-n} x^{n-1} I_{(0,θ)}(x),
0 = ∫_0^1 U(x) x^{n-1} dx + ∫_1^θ U(x) x^{n-1} dx for all θ ≥ 1.
This implies that U(x) = 0 a.e. Lebesgue measure on [1, ∞) and ∫_0^1 U(x) x^{n-1} dx = 0. Consider T = h(X_(n)). To have E(TU) = 0, we must have ∫_0^1 h(x) U(x) x^{n-1} dx = 0. Thus we may consider the following function:
h(x) = c, 0 ≤ x ≤ 1; h(x) = bx, x > 1,
where c and b are some constants. From the previous discussion, E[h(X_(n)) U(X_(n))] = 0 for all θ ≥ 1. Since E[h(X_(n))] = θ is required, we obtain that
θ = c P(X_(n) ≤ 1) + b E[X_(n) I_{(1,∞)}(X_(n))] = c θ^{-n} + b [n/(n + 1)] (θ − θ^{-n}).
Thus c = 1 and b = (n + 1)/n. The UMVUE of θ is then
h(X_(n)) = 1 if 0 ≤ X_(n) ≤ 1; h(X_(n)) = (1 + n^{-1}) X_(n) if X_(n) > 1.
This estimator is better than (1 + n^{-1}) X_(n), which is the UMVUE when Θ = (0, ∞) and does not make use of the information θ ≥ 1. In fact, h(X_(n)) is complete and sufficient for θ. It suffices to show that
g(X_(n)) = 1 if 0 ≤ X_(n) ≤ 1; g(X_(n)) = X_(n) if X_(n) > 1
is complete and sufficient for θ. The sufficiency follows from the fact that the joint pdf of X_1, ..., X_n is θ^{-n} I_{(0,θ)}(x_(n)). If E[f(g(X_(n)))] = 0 for all θ > 1, then
0 = f(1) θ^{-n} + θ^{-n} ∫_1^θ f(x) n x^{n-1} dx, i.e., 0 = f(1) + ∫_1^θ f(x) n x^{n-1} dx for all θ > 1.
Letting θ → 1 we obtain that f(1) = 0. Then 0 = ∫_1^θ f(x) n x^{n-1} dx for all θ > 1, which implies f = 0 a.e. for x > 1. Hence g(X_(n)) is complete.

Example 3.8. Let X be a sample of size 1 from the uniform distribution U(θ − 1/2, θ + 1/2), θ ∈ R. We now apply Theorem 3.2 to show that there is no UMVUE of ϑ = g(θ) for any nonconstant function g. Note that an unbiased estimator U(X) of 0 must satisfy
∫_{θ−1/2}^{θ+1/2} U(x) dx = 0 for all θ ∈ R.
Differentiating both sides of the previous equation and applying the result on differentiation of an integral lead to U(x) = U(x + 1) a.e. m, where m is the Lebesgue measure on R.
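As an aside, the estimator h(X_(n)) derived in Example 3.7 can be checked by simulation. The sketch below (sample size and θ are my own choices) verifies that both h(X_(n)) and the naive (1 + n^{-1})X_(n) are unbiased for a θ ∈ Θ = [1, ∞), while h(X_(n)) has the smaller variance:

```python
import random

def h_umvue(xmax, n):
    # UMVUE of theta on Theta = [1, inf): constant 1 below the known lower bound
    return 1.0 if xmax <= 1.0 else (1.0 + 1.0 / n) * xmax

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def simulate(theta=1.5, n=5, reps=200_000, seed=12345):
    rng = random.Random(seed)
    h_vals, naive_vals = [], []
    for _ in range(reps):
        xmax = max(rng.uniform(0.0, theta) for _ in range(n))
        h_vals.append(h_umvue(xmax, n))
        naive_vals.append((1.0 + 1.0 / n) * xmax)  # UMVUE only when Theta = (0, inf)
    return mean(h_vals), var(h_vals), mean(naive_vals), var(naive_vals)

mh, vh, mn, vn = simulate()
```

Both sample means land on θ = 1.5 up to Monte Carlo error, and the variance of h(X_(n)) is strictly smaller, as Corollary 3.1(ii)-style uniqueness and the UMVUE property predict.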
If T is a UMVUE of g(θ), then T(X)U(X) is unbiased for 0 and, hence,
T(x) U(x) = T(x + 1) U(x + 1) a.e. m,
where U(X) is any unbiased estimator of 0. Since this is true for all U, T(x) = T(x + 1) a.e. m. Since T is unbiased for g(θ),
g(θ) = ∫_{θ−1/2}^{θ+1/2} T(x) dx for all θ ∈ R.
Differentiating both sides of the previous equation and applying the result on differentiation of an integral, we obtain that
g′(θ) = T(θ + 1/2) − T(θ − 1/2) = 0 a.e. m,
which is impossible for a nonconstant g. Hence no UMVUE of g(θ) exists.

Lecture 40 V-statistics and the weighted LSE

Let X_1, ..., X_n be i.i.d. from P. For every U-statistic U_n, as an estimator of ϑ = E[h(X_1, ..., X_m)], there is a closely related V-statistic defined by
V_n = n^{-m} Σ_{i_1=1}^n ··· Σ_{i_m=1}^n h(X_{i_1}, ..., X_{i_m}).  (1)
As an estimator of ϑ, V_n is biased, but the bias is small asymptotically, as the following results show. For a fixed sample size n, V_n may be better than U_n in terms of their mse's.

Proposition 3.5. Let V_n be defined by (1).
(i) Assume that E|h(X_{i_1}, ..., X_{i_m})| < ∞ for all 1 ≤ i_1 ≤ ··· ≤ i_m ≤ m. Then the bias of V_n satisfies b_{V_n}(P) = O(n^{-1}).
(ii) Assume that E[h(X_{i_1}, ..., X_{i_m})]² < ∞ for all 1 ≤ i_1 ≤ ··· ≤ i_m ≤ m. Then the variance of V_n satisfies Var(V_n) = Var(U_n) + O(n^{-2}), where U_n is the U-statistic corresponding to V_n.

To study the asymptotic behavior of a V-statistic, we consider the following representation of V_n in (1):
V_n = ϑ + Σ_{j=1}^m (m choose j) V_{nj},
where
V_{nj} = n^{-j} Σ_{i_1=1}^n ··· Σ_{i_j=1}^n g_j(X_{i_1}, ..., X_{i_j})
is a "V-statistic" with
g_j(x_1, ..., x_j) = h_j(x_1, ..., x_j) − Σ_{i=1}^j ∫ h_j(x_1, ..., x_j) dP(x_i) + Σ_{i_1<i_2} ∫∫ h_j(x_1, ..., x_j) dP(x_{i_1}) dP(x_{i_2}) − ··· + (−1)^j ∫···∫ h_j(x_1, ..., x_j) dP(x_1)···dP(x_j)
and h_j(x_1, ..., x_j) = E[h(x_1, ..., x_j, X_{j+1}, ..., X_m)]. Using an argument similar to the proof of Theorem 3.4, we can show that
E(V_{nj}²) = O(n^{-j}), j = 1, ..., m,  (2)
provided that E[h(X_{i_1}, ..., X_{i_m})]² < ∞ for all 1 ≤ i_1 ≤ ··· ≤ i_m ≤ m. Thus,
V_n − ϑ = m V_{n1} + [m(m−1)/2] V_{n2} + O_p(n^{-3/2}),  (3)
which leads to the following result similar to Theorem 3.5.

Theorem 3.16. Let V_n be given by (1) with E[h(X_{i_1}, ..., X_{i_m})]² < ∞ for all 1 ≤ i_1 ≤ ··· ≤ i_m ≤ m.
(i) If ζ_1 = Var(h_1(X_1)) > 0, then √n (V_n − ϑ) →_d N(0, m² ζ_1).
(ii) If ζ_1 = 0 but ζ_2 = Var(h_2(X_1, X_2)) > 0, then
n (V_n − ϑ) →_d [m(m−1)/2] Σ_{j=1}^∞ λ_j χ²_{1j},
where the λ_j's and χ²_{1j}'s are the same as those in Theorem 3.5.

Theorem 3.16 shows that if ζ_1 > 0, then the amse's of U_n and V_n are the same.
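Definition (1) can be made concrete with the product kernel h(x₁, x₂) = x₁x₂, which estimates ϑ = μ². A small sketch comparing U_n (average over pairs i < j) with the corresponding V_n (average over all n² index pairs, which collapses to X̄²), together with the algebraic identities linking them:

```python
def u_stat(xs):
    # U-statistic with kernel h(x1, x2) = x1 * x2: average over unordered pairs i < j
    n = len(xs)
    s = sum(xs[i] * xs[j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))

def v_stat(xs):
    # corresponding V-statistic: average of the kernel over ALL n^2 pairs = xbar^2
    n = len(xs)
    return (sum(xs) / n) ** 2

xs = [0.5, -1.2, 2.0, 0.3, 1.1]   # arbitrary data for the check
n = len(xs)
s1 = sum(xs)
s2 = sum(x * x for x in xs)
# identities: n^2 V_n = S1^2 and n(n-1) U_n = S1^2 - S2, so n^2 V_n = S2 + n(n-1) U_n
```

The last identity shows where the O(n^{-1}) bias of V_n comes from: V_n mixes in the n "diagonal" terms X_i², whose expectation is E(X_1²), not μ².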
If ζ_1 = 0 but ζ_2 > 0, then an argument similar to that in the proof of Lemma 3.2 leads to
amse_{V_n}(P) = [m²(m−1)²/(4n²)] [ (Σ_{j=1}^∞ λ_j)² + 2 Σ_{j=1}^∞ λ_j² ]
and
amse_{U_n}(P) = [m²(m−1)²/(2n²)] Σ_{j=1}^∞ λ_j²
(see Lemma 3.2). Hence U_n is asymptotically more efficient than V_n, unless Σ_{j=1}^∞ λ_j = 0.

Example 3.28. Consider the estimation of μ², where μ = E(X_1). From the results in §3.2, the U-statistic U_n = [2/(n(n−1))] Σ_{i<j} X_i X_j is unbiased for μ². The corresponding V-statistic is simply V_n = X̄². If μ ≠ 0, then ζ_1 ≠ 0 and the asymptotic relative efficiency of V_n w.r.t. U_n is 1. If μ = 0, then n V_n →_d σ² χ²_1 and n U_n →_d σ² (χ²_1 − 1), where χ²_1 is a random variable having the chi-square distribution with one degree of freedom. Hence the asymptotic relative efficiency of V_n w.r.t. U_n is E(χ²_1 − 1)² / E(χ²_1)² = 2/3.

The weighted LSE

In the linear model
X = Zβ + ε,  (4)
the unbiased LSE of l^T β may be improved by a slightly biased estimator when V = Var(ε) is not σ² I_n and the LSE is not BLUE. Assume that Z is of full rank so that every l^T β is estimable. If V is known, then the BLUE of l^T β is l^T β̂, where
β̂ = (Z^T V^{-1} Z)^{-1} Z^T V^{-1} X  (5)
(see the discussion after the statement of assumption A3 in §3.3.1). If V is unknown and V̂ is an estimator of V, then an application of the substitution principle leads to the weighted least squares estimator
β̂_w = (Z^T V̂^{-1} Z)^{-1} Z^T V̂^{-1} X.  (6)
The weighted LSE is not linear in X and is not necessarily unbiased for β. If the distribution of ε is symmetric about 0 and V̂ remains unchanged when ε changes to −ε, then the distribution of β̂_w − β is symmetric about 0 and, if E(β̂_w) is well defined, β̂_w is unbiased for β. In such a case the LSE l^T β̂ may not be a UMVUE when ε is normal, since Var(l^T β̂_w) may be smaller than Var(l^T β̂).

Asymptotic properties of the weighted LSE depend on the asymptotic behavior of V̂. We say that V̂ is consistent for V if and only if
|| V̂^{-1} V − I_n ||_max →_p 0,  (7)
where ||A||_max = max_{i,j} |a_ij| for a matrix A whose (i,j)-th element is a_ij.

Theorem 3.17. Consider model (4) with a full rank Z. Let β̂ and β̂_w be defined by (5) and (6), respectively, with a V̂ consistent in the sense of (7). Assume the conditions in Theorem 3.12. Then
l^T (β̂_w − β) / a_n →_d N(0, 1),
where l ∈ R^p, l ≠ 0, and a_n² = Var(l^T β̂) = l^T (Z^T V^{-1} Z)^{-1} l.

Proof. Using the same argument as in the proof of Theorem 3.12, we obtain that l^T (β̂ − β) / a_n →_d N(0, 1).
By Slutsky's theorem, the result then follows from l^T β̂_w − l^T β̂ = o_p(a_n). Define
ξ_n = l^T (Z^T V^{-1} Z)^{-1} Z^T (V̂^{-1} − V^{-1}) ε
and
η_n = l^T [ (Z^T V̂^{-1} Z)^{-1} − (Z^T V^{-1} Z)^{-1} ] Z^T V̂^{-1} ε.
Then l^T β̂_w − l^T β̂ = ξ_n + η_n, and the result follows from ξ_n = o_p(a_n) and η_n = o_p(a_n); details are in the textbook.

Theorem 3.17 shows that, as long as V̂ is consistent in the sense of (7), the weighted LSE β̂_w is asymptotically as efficient as β̂, which is the BLUE if V is known. By Theorems 3.12 and 3.17, the asymptotic relative efficiency of the LSE l^T β̂ w.r.t. the weighted LSE l^T β̂_w is
l^T (Z^T V^{-1} Z)^{-1} l / [ l^T (Z^T Z)^{-1} Z^T V Z (Z^T Z)^{-1} l ],
which is always at most 1 and equals 1 if and only if l^T β̂ is a BLUE. Finding a consistent V̂ is possible when V has a certain type of structure.

Example 3.29. Consider model (4). Suppose that V = Var(ε) is a block diagonal matrix with the i-th diagonal block
σ² I_{m_i} + U_i Σ U_i^T,  (8)
where the m_i's are integers bounded by a fixed integer m, σ² > 0 is an unknown parameter, Σ is a q × q unknown nonnegative definite matrix, U_i is an m_i × q full rank matrix whose columns are in R(W_i), q < inf_i m_i, and W_i is the p × m_i matrix such that Z^T = (W_1, W_2, ..., W_k). Under (8), a consistent V̂ can be obtained if we can obtain consistent estimators of σ² and Σ. Let X = (Y_1, ..., Y_k), where Y_i is an m_i-vector, and let R_i be the matrix whose columns are linearly independent rows of W_i. Then
σ̂² = [1/(n − kq)] Σ_{i=1}^k Y_i^T [ I_{m_i} − R_i (R_i^T R_i)^{-1} R_i^T ] Y_i  (9)
is an unbiased estimator of σ². Assume that the Y_i's are independent and that sup_i E|ε_i|^{2+δ} < ∞ for some δ > 0. Then σ̂² is consistent for σ² (exercise). Let r_i = Y_i − W_i^T β̂ and
Σ̂ = (1/k) Σ_{i=1}^k [ (U_i^T U_i)^{-1} U_i^T r_i r_i^T U_i (U_i^T U_i)^{-1} − σ̂² (U_i^T U_i)^{-1} ].  (10)
It can be shown (exercise) that Σ̂ is consistent for Σ in the sense that ||Σ̂ − Σ||_max →_p 0 or, equivalently, ||Σ̂ − Σ|| →_p 0 (see Exercise 116).

Lecture 37 Robustness of LSE's

Consider model
X = Zβ + ε  (1)
under assumption A3: E(ε) = 0 and Var(ε) is an unknown matrix. An interesting question is under what conditions on Var(ε) the LSE of l^T β with l ∈ R(Z) is still the BLUE. If l^T β̂ is still the BLUE, then we say that l^T β̂, considered as a BLUE, is robust against violation of assumption A2.
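In the simplest special case of (5), a one-parameter regression through the origin with a known diagonal V, the BLUE weighting and the variance comparison with the ordinary LSE can be written out with scalars only (a sketch with made-up numbers; the closed forms below are the 1-dimensional instances of the displayed matrix formulas):

```python
def ols_slope(t, x):
    # ordinary LSE for the no-intercept model X_i = beta * t_i + eps_i
    return sum(ti * xi for ti, xi in zip(t, x)) / sum(ti * ti for ti in t)

def wls_slope(t, x, v):
    # weighted LSE with known Var(eps_i) = v_i: the BLUE (5) when V is known
    num = sum(ti * xi / vi for ti, xi, vi in zip(t, x, v))
    den = sum(ti * ti / vi for ti, vi in zip(t, v))
    return num / den

t = [1.0, 2.0, 3.0, 4.0]
v = [0.5, 1.0, 2.0, 4.0]          # unequal variances: assumption A2 violated
# exact variances of the two estimators under Var(eps_i) = v_i:
var_ols = sum(ti * ti * vi for ti, vi in zip(t, v)) / sum(ti * ti for ti in t) ** 2
var_wls = 1.0 / sum(ti * ti / vi for ti, vi in zip(t, v))
```

Both estimators reproduce the true slope exactly on noiseless data, and var_wls ≤ var_ols, the scalar version of the efficiency ratio displayed above.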
A statistical procedure having certain properties under an assumption is said to be robust against violation of the assumption if and only if the statistical procedure still has the same properties when the assumption is (slightly) violated. For example, the LSE of l^T β with l ∈ R(Z), as an unbiased estimator, is robust against violation of assumption A1 or A2, since the LSE is unbiased as long as E(ε) = 0, which can always be assumed without loss of generality. On the other hand, the LSE as a UMVUE may not be robust against violation of assumption A1.

Theorem 3.10. Consider model (1) with assumption A3. The following are equivalent.
(a) l^T β̂ is the BLUE of l^T β for any l ∈ R(Z).
(b) E(l^T β̂ η^T X) = 0 for any l ∈ R(Z) and any η such that E(η^T X) = 0.
(c) Z^T Var(ε) U = 0, where U is a matrix such that Z^T U = 0 and R(U^T) + R(Z^T) = R^n.
(d) Var(ε) = Z Λ_1 Z^T + U Λ_2 U^T for some Λ_1 and Λ_2.
(e) The matrix Z (Z^T Z)^− Z^T Var(ε) is symmetric.

Proof. We first show that (a) and (b) are equivalent, which is an analogue of Theorem 3.2(i). Suppose that (b) holds. Let l ∈ R(Z). If c^T X is unbiased for l^T β, then E(η^T X) = 0 with η = c − Z (Z^T Z)^− l, and hence
Var(c^T X) = Var(l^T β̂ + η^T X) = Var(l^T β̂) + Var(η^T X) ≥ Var(l^T β̂),
i.e., l^T β̂ is the BLUE of l^T β. Suppose now that there are l ∈ R(Z) and η such that E(η^T X) = 0 but δ = E(l^T β̂ η^T X) ≠ 0. Let c_t = t η + Z (Z^T Z)^− l. From the previous proof,
Var(c_t^T X) = t² Var(η^T X) + Var(l^T β̂) + 2tδ.
As long as δ ≠ 0, there exists a t such that Var(c_t^T X) < Var(l^T β̂). This shows that l^T β̂ cannot be a BLUE, and therefore (a) implies (b).

We next show that (b) implies (c). Suppose that (b) holds. Since l ∈ R(Z), l = Z^T γ for some γ. Let η = Us for a vector s. Then E(η^T X) = s^T U^T Zβ = 0 and, hence,
0 = E(l^T β̂ η^T X) = E[ γ^T Z (Z^T Z)^− Z^T X X^T η ] = γ^T Z (Z^T Z)^− Z^T Var(ε) U s.
Since this equality holds for all l ∈ R(Z), it holds for all γ and all s. Thus Z (Z^T Z)^− Z^T Var(ε) U = 0, which implies
Z^T Z (Z^T Z)^− Z^T Var(ε) U = Z^T Var(ε) U = 0,
since Z^T Z (Z^T Z)^− Z^T = Z^T. Thus (c) holds.

To show that (c) implies (d), we need the following fact from the theory of linear algebra: there exists a nonsingular matrix C such that Var(ε) = C C^T and C = (Z C_1, U C_2) for some matrices C_j, since R(U^T) + R(Z^T) = R^n. Let Λ_1 = C_1 C_1^T, Λ_2 = C_2 C_2^T, and Λ_3 = C_1 C_2^T. Then
Var(ε) = Z Λ_1 Z^T + U Λ_2 U^T + Z Λ_3 U^T + U Λ_3^T Z^T  (2)
and Z^T Var(ε) U = Z^T Z Λ_3 U^T U, which is 0 if (c) holds. Hence (c) implies
0 = Z (Z^T Z)^− Z^T Z Λ_3 U^T U (U^T U)^− U^T = Z Λ_3 U^T,
which together with (2) implies (d). If (d) holds, then Z (Z^T Z)^− Z^T Var(ε) = Z Λ_1 Z^T, which is symmetric.
Hence (d) implies (e). To complete the proof, we need to show that (e) implies (b), which is left as an exercise.

As a corollary of this theorem, the following result shows when the UMVUE's in model (1) with assumption A1 are robust against violation of Var(ε) = σ² I_n.

Corollary 3.3. Consider model (1) with a full rank Z, ε = N_n(0, Σ), and an unknown positive definite matrix Σ. Then l^T β̂ is a UMVUE of l^T β for any l ∈ R^p if and only if one of (b)-(e) in Theorem 3.10 holds.

Example 3.16. Consider model (1) with β replaced by a random vector β̃ that is independent of ε. Such a model is called a linear model with random coefficients. Suppose that Var(ε) = σ² I_n and E(β̃) = β. Then
X = Zβ + e,  (3)
where e = Z(β̃ − β) + ε satisfies E(e) = 0 and Var(e) = Z Var(β̃) Z^T + σ² I_n. Since
Z (Z^T Z)^− Z^T Var(e) = Z Var(β̃) Z^T + σ² Z (Z^T Z)^− Z^T
is symmetric, by Theorem 3.10 the LSE l^T β̂ under model (3) is the BLUE for any l^T β, l ∈ R(Z). If Z is of full rank and ε is normal, then, by Corollary 3.3, l^T β̂ is the UMVUE of l^T β for any l ∈ R^p.

Example 3.17 (Random effects models). Suppose that
X_ij = μ + A_i + e_ij, j = 1, ..., n_i, i = 1, ..., m,  (4)
where μ ∈ R is an unknown parameter, the A_i's are i.i.d. random variables having mean 0 and variance σ_a², the e_ij's are i.i.d. random errors with mean 0 and variance σ², and the A_i's and e_ij's are independent. Model (4) is called a one-way random effects model and the A_i's are unobserved random effects. Let ε_ij = A_i + e_ij. Then (4) is a special case of the general model (1) with
Var(ε) = σ_a² Σ + σ² I_n,
where Σ is a block diagonal matrix whose i-th block is J_{n_i} J_{n_i}^T and J_k is the k-vector of ones. Under this model, Z = J_n, n = Σ_{i=1}^m n_i, and Z (Z^T Z)^− Z^T = n^{-1} J_n J_n^T. Note that n^{-1} J_n J_n^T Σ is the matrix whose (i, j)-th block is n^{-1} n_j J_{n_i} J_{n_j}^T, which is symmetric if and only if n_1 = n_2 = ··· = n_m. Since J_n J_n^T Var(ε) is symmetric if and only if J_n J_n^T Σ is symmetric, a necessary and sufficient condition for the LSE of μ to be the BLUE is that all n_i's are the same. This condition is also necessary and sufficient for the LSE of μ to be the UMVUE when the ε_ij's are normal.

In some cases we are interested in some (but not all) linear functions of β.
For example, consider l^T β with l ∈ R(H), where H is an n × p matrix such that R(H) ⊂ R(Z).

Proposition 3.4. Consider model (1) with assumption A3. Suppose that H is a matrix such that R(H) ⊂ R(Z). A necessary and sufficient condition for the LSE l^T β̂ to be the BLUE of l^T β for any l ∈ R(H) is
H (Z^T Z)^− Z^T Var(ε) U = 0,
where U is the same as that in (c) of Theorem 3.10.

Example 3.18. Consider model (1) with assumption A3 and Z = (H_1, H_2), where H_1^T H_2 = 0. Suppose that under the reduced model X = H_1 β_1 + ε, l^T β̂_1 is the BLUE for any l^T β_1, l ∈ R(H_1), and that under the reduced model X = H_2 β_2 + ε, l^T β̂_2 is not a BLUE for some l^T β_2, l ∈ R(H_2), where β = (β_1, β_2) and the β̂_j's are LSE's under the reduced models. Let H = (H_1, 0) be n × p. Note that
H (Z^T Z)^− Z^T Var(ε) U = H_1 (H_1^T H_1)^− H_1^T Var(ε) U,
which is 0 by Theorem 3.10 for the U given in (c) of Theorem 3.10, whereas
Z (Z^T Z)^− Z^T Var(ε) U = H_2 (H_2^T H_2)^− H_2^T Var(ε) U,
which is not 0 by Theorem 3.10. This implies that some LSE l^T β̂ is not a BLUE of l^T β, but l^T β̂ is the BLUE of l^T β if l ∈ R(H).

Finally, we consider model (1) with Var(ε) a diagonal matrix whose i-th diagonal element is σ_i², i.e., the ε_i's are uncorrelated but have unequal variances. A straightforward calculation shows that condition (e) in Theorem 3.10 holds if and only if, for all i ≠ j, σ_i² ≠ σ_j² only when h_ij = 0, where h_ij is the (i,j)-th element of the projection matrix Z (Z^T Z)^− Z^T. Thus an LSE is not a BLUE in general, although it is still unbiased for estimable l^T β. Suppose that the unequal variances of the ε_i's are caused by small perturbations, i.e., ε_i = e_i + u_i, where Var(e_i) = σ², Var(u_i) = δ_i, and e_i and u_i are independent, so that σ_i² = σ² + δ_i. Then
Var(l^T β̂) = l^T (Z^T Z)^− Z^T Var(ε) Z (Z^T Z)^− l = σ² l^T (Z^T Z)^− l + Σ_{i=1}^n δ_i ( [Z (Z^T Z)^− l]_i )².
If δ_i = 0 for all i (no perturbations), then assumption A2 holds and l^T β̂ is the BLUE of any estimable l^T β, with Var(l^T β̂) = σ² l^T (Z^T Z)^− l. Suppose that 0 < δ_i ≤ σ² δ. Then
Var(l^T β̂) ≤ (1 + δ) σ² l^T (Z^T Z)^− l.
This indicates that the LSE is robust in the sense that its variance increases only slightly when there is a slight violation of the equal variance assumption (small δ).

Lecture 22 Decision rules, loss, and risk

Statistical decision theory: X, a sample from a population P ∈ P. Decision: an action we take after observing X. A: the set of allowable actions; (A, F_A): the action space; X: the range of X. Decision rule: a
measurable function (a statistic) T from (X, F_X) to (A, F_A). If X is observed, then we take the action T(X) ∈ A. Performance criterion: a loss function L(P, a) from P × A to [0, ∞), Borel for each P. If X = x is observed and our decision rule is T, then our "loss" is L(P, T(x)). It is difficult to compare L(P, T_1(X)) and L(P, T_2(X)) for two decision rules T_1 and T_2, since both of them are random.

Risk: the average (expected) loss, defined as
R_T(P) = E[L(P, T(X))] = ∫_X L(P, T(x)) dP_X(x).
If P is a parametric family indexed by θ, the loss and risk are denoted by L(θ, a) and R_T(θ). For decision rules T_1 and T_2, T_1 is as good as T_2 if and only if R_{T_1}(P) ≤ R_{T_2}(P) for any P ∈ P, and is better than T_2 if, in addition, R_{T_1}(P) < R_{T_2}(P) for at least one P ∈ P. Two decision rules T_1 and T_2 are equivalent if and only if R_{T_1}(P) = R_{T_2}(P) for all P ∈ P. Optimal rule: if T* is as good as any other rule in I, a class of allowable decision rules, then T* is I-optimal (or optimal, if I contains all possible rules).

Sometimes it is useful to consider randomized decision rules. Randomized decision rule: a function δ on X × F_A such that, for every A ∈ F_A, δ(·, A) is a Borel function and, for every x ∈ X, δ(x, ·) is a probability measure on (A, F_A). If X = x is observed, we have a distribution of actions: δ(x, ·). A nonrandomized decision rule T, as previously discussed, can be viewed as a special randomized decision rule with δ(x, {a}) = I_{{a}}(T(x)), a ∈ A, x ∈ X.

To choose an action in A when a randomized rule δ is used, we need to simulate a pseudo-random element of A according to δ(x, ·). Thus an alternative way to describe a randomized rule is to specify the method of simulating the action from A for each x ∈ X. For example, a randomized rule can be a discrete distribution δ(x, ·) assigning probability p_j(x) to a nonrandomized decision rule T_j(x), j = 1, 2, ..., in which case the rule δ can be equivalently defined as a rule taking value T_j(X) with probability p_j(X), i.e.,
T(X) = T_1(X) with probability p_1(X), ..., T(X) = T_k(X) with probability p_k(X).
The loss function for a randomized rule δ is defined as
L(P, δ, x) = ∫_A L(P, a) dδ(x, a),
which reduces to the loss function discussed earlier when δ is a nonrandomized rule.
The risk of a randomized rule δ is then
R_δ(P) = E[L(P, δ, X)] = ∫_X ∫_A L(P, a) dδ(x, a) dP_X(x).

Example 2.19. Let X = (X_1, ..., X_n) be a vector of i.i.d. measurements for a parameter θ ∈ R. Action space: (A, F_A) = (R, B). A common loss function in this problem is the squared error loss L(P, a) = (θ − a)², a ∈ A. Let T(X) = X̄, the sample mean. The loss for X̄ is (X̄ − θ)². If the population has mean μ and variance σ² < ∞, then
R_{X̄}(P) = E(θ − X̄)² = (θ − E X̄)² + E(E X̄ − X̄)² = (μ − θ)² + Var(X̄) = (μ − θ)² + σ²/n.
If θ = μ, then R_{X̄}(P) = σ²/n is an increasing function of the population variance σ² and a decreasing function of the sample size n.

Consider another decision rule, T_1(X) = (X_(1) + X_(n))/2. R_{T_1}(P) does not have a simple explicit form if there is no further assumption on the population P. Suppose that P is in a given family P. Then, for some P, X̄ (or T_1) is better than T_1 (or X̄) (exercise), whereas for some P, neither X̄ nor T_1 is better than the other.

Consider the randomized rule
T_2(X) = X̄ with probability p(X), T_2(X) = T_1(X) with probability 1 − p(X).
The loss for T_2(X) is (X̄ − θ)² p(X) + (T_1(X) − θ)² [1 − p(X)], and the risk of T_2 is
R_{T_2}(P) = E{ (X̄ − θ)² p(X) + (T_1(X) − θ)² [1 − p(X)] }.
In particular, if p(X) ≡ 1/2, then R_{T_2}(P) = [R_{X̄}(P) + R_{T_1}(P)]/2.

The problem in Example 2.19 is a special case of a general problem called estimation. In an estimation problem, a decision rule T is called an estimator. The following example describes another important type of problem, called hypothesis testing.

Example 2.20. Let P be a family of distributions, P_0 ⊂ P, and P_1 = {P ∈ P : P ∉ P_0}. A hypothesis testing problem can be formulated as that of deciding which of the following two statements is true:
H_0: P ∈ P_0 versus H_1: P ∈ P_1.  (1)
Here H_0 is called the null hypothesis and H_1 is called the alternative hypothesis. The action space for this problem contains only two elements, i.e., A = {0, 1}, where 0 is the action of accepting H_0 and 1 is the action of rejecting H_0. A decision rule is called a test. Since a test T(X) is a function from X to {0, 1}, T(X) must have the form I_C(X), where C ∈ F_X is called the rejection region or critical region for testing H_0 versus H_1. A simple loss function for this problem is the 0-1
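The risks in Example 2.19 can be approximated by Monte Carlo. The sketch below (the population U(−1, 1) with θ = 0 is my own choice) estimates R for X̄, for the midrange T₁, and for the randomized rule T₂ with p(X) ≡ 1/2; for this uniform population the midrange is the better of the two nonrandomized rules:

```python
import random

def risks(n=10, reps=100_000, seed=7):
    # population U(-1, 1), true theta = 0, squared error loss
    rng = random.Random(seed)
    loss_mean = loss_mid = loss_rand = 0.0
    for _ in range(reps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        mid = (min(xs) + max(xs)) / 2.0
        loss_mean += xbar ** 2
        loss_mid += mid ** 2
        # randomized rule: choose xbar or the midrange with probability 1/2 each
        loss_rand += (xbar if rng.random() < 0.5 else mid) ** 2
    return loss_mean / reps, loss_mid / reps, loss_rand / reps

r_mean, r_mid, r_rand = risks()
```

The estimates match the theory: R_{X̄} ≈ σ²/n = (1/3)/10, and the randomized rule's risk sits halfway between the two nonrandomized risks.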
loss: L(P, a) = 0 if a correct decision is made and 1 if an incorrect decision is made, i.e., L(P, j) = 0 for P ∈ P_j and L(P, j) = 1 otherwise, j = 0, 1. Under this loss, the risk is
R_T(P) = P(T(X) = 1) = P(X ∈ C) if P ∈ P_0, and R_T(P) = P(T(X) = 0) = P(X ∉ C) if P ∈ P_1.
See Figure 2.2 on page 127 of the textbook for an example of a graph of R_T(θ) for some T and P in a parametric family. The 0-1 loss implies that the losses for the two types of incorrect decisions (accepting H_0 when P ∈ P_1 and rejecting H_0 when P ∈ P_0) are the same. In some cases, one might assume unequal losses: L(P, j) = 0 for P ∈ P_j, L(P, 0) = c_0 when P ∈ P_1, and L(P, 1) = c_1 when P ∈ P_0.

Admissibility

Definition 2.7. Let I be a class of decision rules (randomized or nonrandomized). A decision rule T ∈ I is called I-admissible (or admissible, when I contains all possible rules) if and only if there does not exist any S ∈ I that is better than T (in terms of the risk).

If a decision rule T is inadmissible, then there exists a rule better than T; thus T should not be used in principle. However, an admissible decision rule is not necessarily good. For example, in an estimation problem the silly estimator T(X) ≡ a constant may be admissible. If T* is I-optimal, then it is I-admissible. If T* is I-optimal and T_0 is I-admissible, then T_0 is also I-optimal and is equivalent to T*. If there are two I-admissible rules that are not equivalent, then there does not exist any I-optimal rule.

Lecture 7 Moments, inequalities, mgf and chf

If E(X^k) is finite, where k is a positive integer, E(X^k) is called the k-th moment of X or P_X. If E|X|^a < ∞ for some real number a, E|X|^a is called the a-th absolute moment of X or P_X. If μ = E(X) and E(X − μ)^k are finite for a positive integer k, E(X − μ)^k is called the k-th central moment of X or P_X. Variance: Var(X) = E(X − EX)². For a random vector X = (X_1, ..., X_k), E(X) = (EX_1, ..., EX_k); for a random matrix M with elements M_ij, E(M) is the matrix whose elements are E(M_ij). Covariance matrix: Var(X) = E[(X − EX)(X − EX)^T]. The (i,j)-th element of Var(X), i ≠ j, is E[(X_i − EX_i)(X_j − EX_j)], which is called the covariance of X_i and X_j and is denoted by Cov(X_i, X_j). Var(X) is nonnegative definite, and [Cov(X_i, X_j)]² ≤ Var(X_i) Var(X_j) for all i, j. If Cov(X_i, X_j) = 0, then X_i and X_j are
uncorrelated. Independence implies uncorrelatedness, but not conversely. If Y = c^T X, where c ∈ R^k and X is a random k-vector, then E(Y) = c^T E(X) and Var(Y) = c^T Var(X) c.

Three useful inequalities:
Cauchy-Schwarz inequality: [E(XY)]² ≤ E(X²) E(Y²) for random variables X and Y.
Jensen's inequality: f(E(X)) ≤ E[f(X)] for a random vector X and convex function f.
Chebyshev's inequality: let X be a random variable and φ a nonnegative and nondecreasing function on [0, ∞) satisfying φ(−t) = φ(t). Then, for each constant t ≥ 0,
φ(t) P(|X| ≥ t) ≤ ∫_{|X| ≥ t} φ(X) dP ≤ E[φ(X)].

Example 1.18. If X is a nonconstant positive random variable with finite mean, then (EX)^{-1} < E(X^{-1}) and E(log X) < log(EX), since t^{-1} and −log t are convex functions on (0, ∞). Let f and g be positive integrable functions on a measure space with a σ-finite measure ν. If ∫ f dν ≥ ∫ g dν > 0, we want to show that
∫ f log(f/g) dν ≥ 0.
Let h = f / ∫ f dν. Then h is a pdf w.r.t. ν. Let Y = g/f be a random variable defined on the probability space with P being the probability measure with pdf h. By Jensen's inequality, E[log(g/f)] ≤ log E(g/f). Note that
log E(g/f) = log ∫ (g/f) h dν = log ( ∫ g dν / ∫ f dν ) ≤ 0
and
E[log(g/f)] = ∫ log(g/f) h dν = − ∫ f log(f/g) dν / ∫ f dν.
Hence ∫ f log(f/g) dν ≥ 0.

Moment generating and characteristic functions

Definition 1.5. Let X be a random k-vector. (i) The moment generating function (mgf) of X or P_X is defined as ψ_X(t) = E(e^{t^T X}), t ∈ R^k. (ii) The characteristic function (chf) of X or P_X is defined as φ_X(t) = E(e^{√−1 t^T X}) = E[cos(t^T X)] + √−1 E[sin(t^T X)], t ∈ R^k.

If the mgf is finite in a neighborhood of 0 ∈ R^k, then φ_X(t) can be obtained by replacing t in ψ_X(t) by √−1 t. If Y = A^T X + c, where A is a k × m matrix and c ∈ R^m, it follows from Definition 1.5 that ψ_Y(u) = e^{c^T u} ψ_X(Au) and φ_Y(u) = e^{√−1 c^T u} φ_X(Au), u ∈ R^m.

If X = (X_1, ..., X_k) has mgf ψ_X finite in a neighborhood of 0, then ψ_X has the power series expansion
ψ_X(t) = Σ μ_{r_1···r_k} t_1^{r_1} ··· t_k^{r_k} / (r_1! ··· r_k!), μ_{r_1···r_k} = E(X_1^{r_1} ··· X_k^{r_k}),
so all moments of X are finite and can be obtained by differentiating ψ_X; in the special case k = 1, E(X^r) = ψ_X^{(r)}(0). Consequently,
∂ψ_X(t)/∂t |_{t=0} = E(X) and ∂²ψ_X(t)/(∂t ∂t^T) |_{t=0} = E(X X^T).
If 0 < ψ_X(t) < ∞, then κ_X(t) = log ψ_X(t) is called the cumulant generating function of X or P_X.
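The three inequalities above, including the strict Jensen inequalities of Example 1.18, can be checked on a toy two-point distribution (the particular values are arbitrary):

```python
import math

xs = [1.0, 4.0]          # X takes each value with probability 1/2
p = [0.5, 0.5]

def E(f):
    # expectation of f(X) under the two-point distribution
    return sum(pi * f(x) for pi, x in zip(p, xs))

mean = E(lambda x: x)                  # EX = 2.5
inv_mean = E(lambda x: 1.0 / x)        # E(1/X): Jensen with convex t -> 1/t
log_mean = E(lambda x: math.log(x))    # E(log X): Jensen with convex t -> -log t
second = E(lambda x: x * x)            # EX^2, for Chebyshev with phi(t) = t^2
t = 3.0
tail = sum(pi for pi, x in zip(p, xs) if abs(x) >= t)   # P(|X| >= t)
```

Since X is nonconstant, both Jensen inequalities are strict, and the Chebyshev bound P(|X| ≥ t) ≤ E(X²)/t² holds with room to spare.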
If ψ_X is not finite but E|X_i|^{r_i} < ∞ for some nonnegative integers r_1, ..., r_k, then
∂^{r_1+···+r_k} φ_X(t) / (∂t_1^{r_1} ··· ∂t_k^{r_k}) |_{t=0} = (√−1)^{r_1+···+r_k} E(X_1^{r_1} ··· X_k^{r_k});
in particular,
∂φ_X(t)/∂t |_{t=0} = √−1 E(X) and ∂²φ_X(t)/(∂t ∂t^T) |_{t=0} = −E(X X^T).

Example: a random variable X that has finite E(X^k) for k = 1, 2, ..., but ψ_X(t) = ∞ for all t ≠ 0. Let P_n be the probability measure for N(0, n) with pdf f_n, n = 1, 2, .... Then P = Σ_{n=1}^∞ 2^{-n} P_n is a probability measure with Lebesgue pdf Σ_{n=1}^∞ 2^{-n} f_n (Exercise 35). Let X be a random variable having distribution P. It follows from Fubini's theorem that X has finite moments of any order: for even k,
E(X^k) = ∫ x^k dP = Σ_{n=1}^∞ 2^{-n} ∫ x^k dP_n = Σ_{n=1}^∞ 2^{-n} (k−1)(k−3) ··· 1 · n^{k/2} < ∞,
and E(X^k) = 0 for odd k. By Fubini's theorem,
ψ_X(t) = ∫ e^{tx} dP = Σ_{n=1}^∞ 2^{-n} ∫ e^{tx} dP_n = Σ_{n=1}^∞ 2^{-n} e^{nt²/2} = ∞, t ≠ 0.
Since the chf of N(0, n) is e^{−nt²/2},
φ_X(t) = Σ_{n=1}^∞ 2^{-n} ∫ e^{√−1 tx} dP_n = Σ_{n=1}^∞ 2^{-n} e^{−nt²/2} = e^{−t²/2} / (2 − e^{−t²/2})
(Fubini's theorem again). Hence the moments of X can be obtained by differentiating φ_X. For example, φ_X′(0) = 0 and φ_X″(0) = −2, which shows that E(X) = 0 and E(X²) = 2.

Theorem 1.6 (Uniqueness). Let X and Y be random k-vectors. (i) If φ_X(t) = φ_Y(t) for all t ∈ R^k, then P_X = P_Y. (ii) If ψ_X(t) = ψ_Y(t) < ∞ for all t in a neighborhood of 0, then P_X = P_Y.

Another useful result: for independent X and Y, ψ_{X+Y}(t) = ψ_X(t) ψ_Y(t) and φ_{X+Y}(t) = φ_X(t) φ_Y(t), t ∈ R^k.

Example 1.20. Let X_i, i = 1, ..., k, be independent random variables, and let X_i have the gamma distribution Γ(α_i, γ) (Table 1.2), i = 1, ..., k. From Table 1.2, X_i has the mgf ψ_{X_i}(t) = (1 − γt)^{−α_i}, t < γ^{-1}, i = 1, ..., k. Then the mgf of Y = X_1 + ··· + X_k is equal to ψ_Y(t) = (1 − γt)^{−(α_1+···+α_k)}, t < γ^{-1}. From Table 1.2, the gamma distribution Γ(α_1 + ··· + α_k, γ) has the mgf ψ_Y(t) and hence is the distribution of Y (by Theorem 1.6).

A random vector X is symmetric about 0 if and only if X and −X have the same distribution. Show that X is symmetric about 0 if and only if its chf φ_X is real-valued. If X and −X have the same distribution, then φ_X(t) = φ_{−X}(t). But φ_{−X}(t) = φ_X(−t). Note that sin(−t^T X) = −sin(t^T X) and cos(−t^T X) = cos(t^T X). Hence E[sin(t^T X)] = 0, and thus φ_X is real-valued. Conversely, if φ_X is real-valued, then φ_X(t) = E[cos(t^T X)] and φ_X(t) = φ_X(−t) = φ_{−X}(t). By Theorem 1.6, X and −X must have the same distribution.

Lecture 18 Exponential and location-scale families

Two important types of parametric families follow.

Definition 2.2 (Exponential families). A parametric family {P_θ : θ ∈ Θ} dominated by a σ-finite measure ν on (Ω, F) is called an exponential family
if and only if
dP_θ/dν(ω) = exp{ [η(θ)]^T T(ω) − ξ(θ) } h(ω), ω ∈ Ω,  (1)
where exp{x} = e^x, T is a random p-vector with a fixed positive integer p, η is a function from Θ to R^p, h is a nonnegative Borel function on (Ω, F), and
ξ(θ) = log ( ∫_Ω exp{ [η(θ)]^T T(ω) } h(ω) dν(ω) ).

In Definition 2.2, T and h are functions of ω only, whereas η and ξ are functions of θ only. The representation (1) of an exponential family is not unique: η̃(θ) = D η(θ) with a p × p nonsingular matrix D gives another representation, with T replaced by T̃ = (D^T)^{-1} T. A change of the measure that dominates the family also changes the representation: if we define λ(A) = ∫_A h dν for any A ∈ F, then we obtain an exponential family with densities
dP_θ/dλ(ω) = exp{ [η(θ)]^T T(ω) − ξ(θ) }.  (2)

In an exponential family, consider the reparameterization η = η(θ) and
f_η(ω) = exp{ η^T T(ω) − ζ(η) } h(ω), ω ∈ Ω,  (3)
where ζ(η) = log( ∫_Ω exp{η^T T(ω)} h(ω) dν(ω) ). This is the canonical form for the family (it is not unique). The new parameter η is called the natural parameter. The new parameter space Ξ = {η(θ) : θ ∈ Θ}, a subset of R^p, is called the natural parameter space. An exponential family in canonical form is called a natural exponential family. If there is an open set contained in the natural parameter space of an exponential family, then the family is said to be of full rank.

Example 2.6. The normal family {N(μ, σ²) : μ ∈ R, σ > 0} is an exponential family, since the Lebesgue pdf of N(μ, σ²) can be written as
(1/√(2π)) exp{ (μ/σ²) x − [1/(2σ²)] x² − μ²/(2σ²) − log σ }.
Hence T(x) = (x, −x²), η(θ) = (μ/σ², 1/(2σ²)) with θ = (μ, σ²), ξ(θ) = μ²/(2σ²) + log σ, and h(x) = 1/√(2π). Let η = (η_1, η_2). Then Ξ = R × (0, ∞), and we obtain a natural exponential family of full rank with
ζ(η) = η_1²/(4η_2) + log(1/√(2η_2)).
A subfamily of the previous normal family, {N(μ, μ²) : μ ∈ R, μ ≠ 0}, is also an exponential family, with natural parameter η = (1/μ, 1/(2μ²)) and natural parameter space Ξ = {(x, y) : y = x²/2, x ∈ R, x ≠ 0}. This exponential family is not of full rank.

For an exponential family, (2) implies that there is a nonzero measure λ such that dP_θ/dλ(ω) > 0 for all ω and θ. We can use this fact to show that a family of distributions is not an exponential family. Consider the family of uniform distributions, i.e., P_θ = U(0, θ) with an unknown θ ∈ (0, ∞).
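Example 2.6's canonical form can be verified numerically: reparameterize N(μ, σ²) to (η₁, η₂) and check that exp{ηᵀT(x) − ζ(η)}h(x) reproduces the usual normal pdf (a sketch; the particular μ and σ² are arbitrary):

```python
import math

def natural_params(mu, sigma2):
    # canonical reparameterization of N(mu, sigma^2) with T(x) = (x, -x^2)
    return mu / sigma2, 1.0 / (2.0 * sigma2)

def zeta(eta1, eta2):
    # log-normalizer: zeta(eta) = eta1^2 / (4 eta2) + log(1 / sqrt(2 eta2))
    return eta1 ** 2 / (4.0 * eta2) - 0.5 * math.log(2.0 * eta2)

def density_canonical(x, eta1, eta2):
    # f_eta(x) = exp{eta1 * x - eta2 * x^2 - zeta(eta)} * h(x), h(x) = 1/sqrt(2 pi)
    h = 1.0 / math.sqrt(2.0 * math.pi)
    return math.exp(eta1 * x - eta2 * x * x - zeta(eta1, eta2)) * h

def density_normal(x, mu, sigma2):
    # ordinary N(mu, sigma^2) Lebesgue pdf
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

mu, sigma2 = 1.3, 0.7
e1, e2 = natural_params(mu, sigma2)
```

Both density functions agree pointwise, confirming that ζ above is the correct log-normalizer for this parameterization.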
If {P_θ : θ ∈ (0, ∞)} were an exponential family, then (2) would hold with a nonzero measure λ. For any t > 0, there is a θ < t such that P_θ([t, ∞)) = 0, which with (2) implies that λ([t, ∞)) = 0. Also, for any t ≤ 0, P_θ((−∞, t]) = 0, which with (2) implies that λ((−∞, t]) = 0. Since t is arbitrary, λ ≡ 0. This contradiction implies that {P_θ : θ ∈ (0, ∞)} cannot be an exponential family.

Which of the parametric families from Tables 1.1 and 1.2 are exponential families?

An important exponential family containing multivariate discrete distributions:

Example 2.7 (The multinomial family). Consider an experiment having k + 1 possible outcomes, with p_i the probability of the i-th outcome, i = 0, 1, ..., k, Σ_{i=0}^k p_i = 1. In n independent trials of this experiment, let X_i be the number of trials resulting in the i-th outcome, i = 0, 1, ..., k. Then the joint pdf (w.r.t. counting measure) of (X_0, X_1, ..., X_k) is
f_θ(x_0, x_1, ..., x_k) = [ n! / (x_0! ··· x_k!) ] p_0^{x_0} ··· p_k^{x_k} I_B(x_0, ..., x_k),
where B = {(x_0, ..., x_k) : the x_i's are integers ≥ 0, Σ_{i=0}^k x_i = n} and θ = (p_0, p_1, ..., p_k). The distribution of (X_0, X_1, ..., X_k) is called the multinomial distribution, which is an extension of the binomial distribution. In fact, the marginal distribution of each X_i is the binomial distribution Bi(p_i, n). {f_θ : θ ∈ Θ} is the multinomial family, where Θ = {θ ∈ R^{k+1} : 0 < p_i < 1, Σ_{i=0}^k p_i = 1}. Let x = (x_0, x_1, ..., x_k), η = (log p_0, log p_1, ..., log p_k), and h(x) = [ n! / (x_0! ··· x_k!) ] I_B(x). Then
f_θ(x_0, x_1, ..., x_k) = exp{ η^T x } h(x), x ∈ R^{k+1}.  (5)
Hence the multinomial family is a natural exponential family with natural parameter η. However, representation (5) does not provide an exponential family of full rank, since there is no open set of R^{k+1} contained in the natural parameter space. A reparameterization leads to an exponential family with full rank. Using the facts that Σ_{i=0}^k X_i = n and Σ_{i=0}^k p_i = 1, we obtain that
f_θ(x_0, x_1, ..., x_k) = exp{ η_*^T x_* − ζ(η_*) } h(x), x ∈ R^{k+1},  (6)
where x_* = (x_1, ..., x_k), η_* = (log(p_1/p_0), ..., log(p_k/p_0)), and ζ(η_*) = −n log p_0. The natural parameter space is R^k. Hence the family of densities given by (6) is a natural exponential family of full rank.
Example 2.8. Let P_θ be the binomial distribution Bi(p, n) with an unknown p ∈ (0, 1), where n is a fixed positive integer. Then {P_θ : 0 < p < 1} is an exponential family, since the pdf of P_θ w.r.t. counting measure is
f_θ(x) = exp{ x log[p/(1−p)] + n log(1−p) } (n choose x) I_{{0,1,...,n}}(x)
(T(x) = x, η(θ) = log[p/(1−p)], ξ(θ) = −n log(1−p), and h(x) = (n choose x) I_{{0,1,...,n}}(x)). If we let η = log[p/(1−p)], then Ξ = R, and the family with pdf's
f_η(x) = exp{ xη − n log(1 + e^η) } (n choose x) I_{{0,1,...,n}}(x)
is a natural exponential family of full rank.

If X_1, ..., X_m are independent random vectors with pdf's in exponential families, then the pdf of (X_1, ..., X_m) is again in an exponential family. The following result summarizes some other useful properties of exponential families; its proof can be found in Lehmann (1986).

Theorem 2.1. Let P be a natural exponential family given by (3).
(i) Let T = (Y, U) and η = (ϑ, φ). Then Y has a pdf in a natural exponential family, exp{ϑ^T y − ζ(η)}, w.r.t. a σ-finite measure depending on φ.
(ii) The conditional distribution of Y given U = u has a pdf in a natural exponential family, w.r.t. a σ-finite measure depending on u.
(iii) If η_0 is an interior point of the natural parameter space, then the mgf ψ of P_{η_0} ∘ T^{-1} is finite in a neighborhood of 0 and is given by
ψ(t) = exp{ ζ(η_0 + t) − ζ(η_0) }.
Furthermore, if f is a Borel function satisfying ∫ |f| dP_{η_0} < ∞, then the function ∫ f(ω) exp{η^T T(ω)} h(ω) dν(ω) is infinitely often differentiable in a neighborhood of η_0, and the derivatives may be computed by differentiation under the integral sign.

Definition 2.3 (Location-scale families). Let P be a known probability measure on (R^k, B^k), V ⊂ R^k, and M_k a collection of k × k symmetric positive definite matrices. The family
{P_{(μ,Σ)} : μ ∈ V, Σ ∈ M_k}  (7)
is called a location-scale family (on R^k), where
P_{(μ,Σ)}(B) = P( Σ^{-1/2}(B − μ) ), B ∈ B^k,
Σ^{-1/2}(B − μ) = { Σ^{-1/2}(x − μ) : x ∈ B } ⊂ R^k, and Σ^{-1/2} is the inverse of the "square root" matrix Σ^{1/2} satisfying Σ^{1/2} Σ^{1/2} = Σ. The parameters μ and Σ^{1/2} are called the location and scale parameters, respectively.

The following are some important examples of location-scale families. The family {P_{(μ, I_k)} : μ ∈ R^k} is a location family, where I_k is the k × k identity matrix. The family {P_{(0, Σ)} : Σ ∈ M_k} is a scale family.
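In the k = 1 case of Definition 2.3, a family generated from a standard exponential P illustrates the cdf and pdf formulas F(Σ^{-1/2}(x − μ)) and Det(Σ^{-1/2}) f(Σ^{-1/2}(x − μ)) (a sketch; the choice E(a, θ) with a = 2, θ = 3 is arbitrary):

```python
import math

def base_pdf(x):
    # standard exponential E(0, 1): the known base measure P
    return math.exp(-x) if x >= 0 else 0.0

def base_cdf(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def ls_pdf(x, a, theta):
    # Lebesgue pdf of the family member P_(a, theta^2): (1/theta) f((x - a)/theta)
    return base_pdf((x - a) / theta) / theta

def ls_cdf(x, a, theta):
    # cdf F(Sigma^{-1/2}(x - mu)) in the scalar case
    return base_cdf((x - a) / theta)

a, theta = 2.0, 3.0     # E(a, theta): location a, scale theta
```

For instance, ls_pdf(5, 2, 3) = e^{-1}/3, the E(2, 3) density at x = 5, and ls_cdf(a, a, θ) = 0, since no mass lies below the location parameter.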
In some cases we consider a location-scale family of the form {P_{(μ, σ² I_k)} : μ ∈ R^k, σ > 0}. If X_1, ..., X_k are i.i.d. with a common distribution in the location-scale family {P_{(μ, σ²)} : μ ∈ R, σ > 0}, then the joint distribution of the vector (X_1, ..., X_k) is in the location-scale family {P_{(μ, σ² I_k)} : μ ∈ V, σ > 0} with V = {(x, ..., x) ∈ R^k : x ∈ R}.

A location-scale family can be generated as follows. Let X be a random k-vector having a distribution P. Then the distribution of Σ^{1/2} X + μ is P_{(μ,Σ)}. On the other hand, if X is a random k-vector whose distribution is in the location-scale family (7), then the distribution of DX + c is also in the same family, provided that Dμ + c ∈ V and D Σ D^T ∈ M_k.

Let F be the cdf of P. Then the cdf of P_{(μ,Σ)} is F(Σ^{-1/2}(x − μ)), x ∈ R^k. If F has a Lebesgue pdf f, then the Lebesgue pdf of P_{(μ,Σ)} is Det(Σ^{-1/2}) f(Σ^{-1/2}(x − μ)), x ∈ R^k (Proposition 1.8). Many families of distributions in Tables 1.1 and 1.2 are location, scale, or location-scale families. For example, the family of exponential distributions E(a, θ) is a location-scale family on R with location parameter a and scale parameter θ; the family of uniform distributions U(0, θ) is a scale family on R with scale parameter θ. The k-dimensional normal family is a location-scale family on R^k.

Lecture 4 Convergence theorems, change of variable, and Fubini's theorem

Let f_n, n = 1, 2, ..., be a sequence of Borel functions. Can we exchange the limit and integration, i.e., is
∫ lim_{n→∞} f_n dν = lim_{n→∞} ∫ f_n dν ?

Example 1.7. Consider (R, B) and the Lebesgue measure. Define f_n(x) = n I_{[0, 1/n]}(x), n = 1, 2, .... Then lim_{n→∞} f_n(x) = 0 for all x but x = 0. Since the Lebesgue measure of a single-point set is 0, lim_{n→∞} f_n(x) = 0 a.e. and ∫ lim_{n→∞} f_n dx = 0. On the other hand, ∫ f_n dx = 1 for every n, and hence lim_{n→∞} ∫ f_n dx = 1.

Sufficient conditions:

Theorem 1.1. Let f_1, f_2, ... be a sequence of Borel functions on (Ω, F, ν).
(i) (Fatou's lemma) If f_n ≥ 0, then ∫ liminf_n f_n dν ≤ liminf_n ∫ f_n dν.
(ii) (Dominated convergence theorem) If lim_{n→∞} f_n = f a.e. and there exists an integrable function g such that |f_n| ≤ g a.e., then ∫ lim_{n→∞} f_n dν = lim_{n→∞} ∫ f_n dν.
(iii) (Monotone convergence theorem) If 0 ≤ f_1 ≤ f_2 ≤ ··· and lim_{n→∞} f_n = f a.e., then ∫ lim_{n→∞} f_n dν = lim_{n→∞} ∫ f_n dν.
Proof: see the textbook.
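Example 1.7's failure of the limit-integral exchange can be seen numerically; a midpoint Riemann sum stands in for the Lebesgue integral:

```python
def f(n, x):
    # f_n = n * I_[0, 1/n]: pointwise limit 0 for every x > 0
    return n if 0.0 <= x <= 1.0 / n else 0.0

def integral(n, steps=100_000):
    # midpoint Riemann sum of f_n over [0, 1]
    h = 1.0 / steps
    return sum(f(n, (i + 0.5) * h) for i in range(steps)) * h
```

The integral stays at 1 for every n, even though f_n(x) → 0 at every fixed x > 0, so ∫ lim f_n = 0 ≠ 1 = lim ∫ f_n; no integrable dominating function g exists here, so the DCT does not apply.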
Note: (a) To apply each part of the theorem, you need to check the conditions. (b) If the conditions are not satisfied, you cannot apply the theorem, but this does not imply that you cannot exchange the limit and integration.

Example. Let f_n, n = 1, 2, ..., be Borel functions on Ω = (0, 1] with lim_n f_n(x) = 1 for every x. To apply the DCT, note that 0 ≤ f_n(x) ≤ 1; to apply the MCT, note that 0 ≤ f_n(x) ≤ f_{n+1}(x). Either way, lim_n ∫ f_n(x) dx = ∫ lim_n f_n(x) dx = ∫ dx = 1 (e.g., f_n(x) = x^{1/n}).

Example 1.8 (Interchange of differentiation and integration). Let (Ω, F, ν) be a measure space and, for any fixed θ ∈ R, let f(ω, θ) be a Borel function on Ω. Suppose that ∂f(ω, θ)/∂θ exists a.e. for θ ∈ (a, b) ⊂ R and that |∂f(ω, θ)/∂θ| ≤ g(ω) a.e., where g is an integrable function on Ω. Then, for each θ ∈ (a, b), ∂f(ω, θ)/∂θ is integrable and, by Theorem 1.1(ii),
d/dθ ∫_Ω f(ω, θ) dν = ∫_Ω [ ∂f(ω, θ)/∂θ ] dν.

Theorem 1.2 (Change of variables). Let f be measurable from (Ω, F, ν) to (Λ, G) and g be Borel on (Λ, G). Then
∫_Ω g ∘ f dν = ∫_Λ g d(ν ∘ f^{-1}),
i.e., if either integral exists, then so does the other, and the two are the same. For Riemann integrals this is the familiar substitution ∫ g(y) dy = ∫ g(f(x)) f′(x) dx with y = f(x). For a random variable X on (Ω, F, P), E(X) = ∫_Ω X dP = ∫_R x dP_X, where P_X = P ∘ X^{-1}. Let Y be a random vector from Ω to R^k and g be Borel from R^k to R. Then E[g(Y)] = ∫_Ω g(Y) dP = ∫_{R^k} g(y) dP_Y(y).

Example: Y = (X_1, X_2) and g(Y) = X_1 + X_2. On the one hand, E(X_1 + X_2) = E(X_1) + E(X_2) (why?) = ∫_R x dP_{X_1} + ∫_R x dP_{X_2}; we need to handle two integrals involving P_{X_1} and P_{X_2}. On the other hand, E(X_1 + X_2) = ∫_{R²} (x_1 + x_2) dP_{(X_1, X_2)}, which involves one integral w.r.t. P_{(X_1, X_2)}. Unless we have some knowledge about the joint cdf of (X_1, X_2), it is not easy to obtain P_{(X_1, X_2)}.

Iterated integration on a product space:

Theorem 1.3 (Fubini's theorem). Let ν_i be a σ-finite measure on (Ω_i, F_i), i = 1, 2, and let f be a Borel function on Ω_1 × Ω_2 whose integral w.r.t. ν_1 × ν_2 exists. Then
g(ω_2) = ∫_{Ω_1} f(ω_1, ω_2) dν_1
exists a.e. ν_2 and defines a Borel function on Ω_2 whose integral w.r.t. ν_2 exists, and
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(ν_1 × ν_2) = ∫_{Ω_2} [ ∫_{Ω_1} f(ω_1, ω_2) dν_1 ] dν_2.
Note: if f ≥ 0, then ∫ f d(ν_1 × ν_2) always exists. The extension to Π_{i=1}^k Ω_i is straightforward.

Fubini's theorem is very useful in (1) evaluating multi-dimensional integrals (exchanging the order of integration), (2) proving that a function is measurable, and (3) proving some results by relating a one-dimensional integral to a multi-dimensional integral.
Example (Exercise 4.7). Let X and Y be random variables such that the joint cdf of (X, Y) is F_X(x)F_Y(y), where F_X and F_Y are the marginal cdf's. Let Z = X + Y. Show that
F_Z(z) = ∫ F_X(z − y) dF_Y(y).
Note that
F_Z(z) = ∫∫ I_{{x+y≤z}} dF_X(x) dF_Y(y) = ∫ [ ∫ I_{(−∞, z−y]}(x) dF_X(x) ] dF_Y(y) = ∫ F_X(z − y) dF_Y(y),
where the second equality follows from Fubini's theorem.

Example 1.9. Let Ω_1 = Ω_2 = {0, 1, 2, ...} and ν_1 = ν_2 be the counting measure. A function f on Ω_1 × Ω_2 defines a double sequence. If ∫ f d(ν_1 × ν_2) exists, then
∫ f d(ν_1 × ν_2) = Σ_{i=0}^∞ Σ_{j=0}^∞ f(i, j) = Σ_{j=0}^∞ Σ_{i=0}^∞ f(i, j)
by Theorem 1.3 and Example 1.5. Thus, a double series can be summed in either order if it is well defined.

Proof of Fubini's theorem.

Lecture 3: Integration

Integration is a type of average.

Definition 1.4. (a) The integral of a nonnegative simple function φ(ω) = Σ_{i=1}^k a_i I_{A_i}(ω) w.r.t. ν is defined as
∫ φ dν = Σ_{i=1}^k a_i ν(A_i).
(b) Let f be a nonnegative Borel function and let S_f be the collection of all nonnegative simple functions satisfying φ(ω) ≤ f(ω) for any ω ∈ Ω. The integral of f w.r.t. ν is defined as
∫ f dν = sup{ ∫ φ dν : φ ∈ S_f }.
Hence, for any Borel function f ≥ 0, there exists a sequence of simple functions φ_1, φ_2, ... such that 0 ≤ φ_i ≤ f for all i and lim_{n→∞} ∫ φ_n dν = ∫ f dν.
(c) Let f be a Borel function, f_+(ω) = max(f(ω), 0) be the positive part of f, and f_−(ω) = max(−f(ω), 0) be the negative part of f. Note that f_+ and f_− are nonnegative Borel functions, f(ω) = f_+(ω) − f_−(ω), and |f(ω)| = f_+(ω) + f_−(ω). We say that ∫ f dν exists if and only if at least one of ∫ f_+ dν and ∫ f_− dν is finite, in which case
∫ f dν = ∫ f_+ dν − ∫ f_− dν.
When both ∫ f_+ dν and ∫ f_− dν are finite, we say that f is integrable.
Let A be a measurable set and I_A be its indicator function. The integral of f over A is defined as ∫_A f dν = ∫ I_A f dν. A Borel function f is integrable if and only if |f| is integrable.
For convenience, we define the integral of a measurable function f from (Ω, F, ν) to (R̄, B̄), where R̄ = R ∪ {−∞, ∞} and B̄ = σ(B ∪ {{∞}, {−∞}}). Let A_+ = {f = ∞} and A_− = {f = −∞}. If ν(A_+) = 0, we define ∫ f_+ dν to be ∫ I_{A_+^c} f_+ dν; otherwise ∫ f_+ dν = ∞. ∫ f_− dν is similarly defined. If at least one of ∫ f_+ dν and ∫ f_− dν is finite, then ∫ f dν = ∫ f_+ dν − ∫ f_− dν is well defined.
Notation for integrals: ∫ f dν = ∫_Ω f dν = ∫ f(ω) dν = ∫ f(ω) dν(ω) = ∫ f(ω) ν(dω).
In probability and statistics, ∫ X dP = EX = E(X) and is called the expectation or expected value of X. If F is the cdf of P on (R, B), ∫ f(x) dP = ∫ f(x) dF(x).
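Example 1.9 in miniature: for a summable double sequence, the two iterated sums agree. The sequence f(i, j) = 2^{−(i+j+2)} below is an illustrative choice (not from the notes); its double sum is 1, and the truncation level is chosen so the tail is below double precision.

```python
# Double-series sketch of Example 1.9: a well-defined double series can be
# summed in either order (Fubini with counting measures).

def f(i, j):
    return 2.0 ** (-(i + j + 2))   # sums to (sum_i 2^{-(i+1)})^2 = 1

N = 60  # truncation; the neglected tail is ~2^{-60}, below double precision
rows_first = sum(sum(f(i, j) for j in range(N)) for i in range(N))
cols_first = sum(sum(f(i, j) for i in range(N)) for j in range(N))
assert abs(rows_first - cols_first) < 1e-12
assert abs(rows_first - 1.0) < 1e-9
```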
Example 1.5. Let Ω be a countable set, F be all subsets of Ω, and ν be the counting measure. For any Borel function f,
∫ f dν = Σ_{ω∈Ω} f(ω).

Example 1.6. If Ω = R and ν is the Lebesgue measure, then the Lebesgue integral of f over an interval [a, b] is written as ∫_{[a,b]} f(x) dx = ∫_a^b f(x) dx, which agrees with the Riemann integral in calculus when the latter is well defined. However, there are functions for which the Lebesgue integrals are defined but not the Riemann integrals.

Properties. Proposition 1.5 (Linearity of integrals). Let (Ω, F, ν) be a measure space and f and g be Borel functions.
(i) If ∫ f dν exists and a ∈ R, then ∫ af dν exists and is equal to a ∫ f dν.
(ii) If both ∫ f dν and ∫ g dν exist and ∫ f dν + ∫ g dν is well defined, then ∫ (f + g) dν exists and is equal to ∫ f dν + ∫ g dν.

A statement holds a.e. ν (or simply a.e.) if it holds for all ω in N^c with ν(N) = 0. If ν is a probability, then a.e. may be replaced by a.s. (almost surely).

Proposition 1.6. Let (Ω, F, ν) be a measure space and f and g be Borel.
(i) If f ≤ g a.e., then ∫ f dν ≤ ∫ g dν, provided that the integrals exist.
(ii) If f ≥ 0 a.e. and ∫ f dν = 0, then f = 0 a.e.
Proof. (i) Exercise. (ii) Let A = {f > 0} and A_n = {f ≥ n^{-1}}, n = 1, 2, .... Then A_n ⊂ A for any n and lim_{n→∞} A_n = ∪_n A_n = A (why?). By Proposition 1.1(iii), lim_{n→∞} ν(A_n) = ν(A). Using part (i) and Proposition 1.5, we obtain that
n^{-1} ν(A_n) = ∫ n^{-1} I_{A_n} dν ≤ ∫ f I_{A_n} dν ≤ ∫ f dν = 0
for any n. Hence ν(A) = 0 and f = 0 a.e.
Consequences: |∫ f dν| ≤ ∫ |f| dν; if f ≥ 0 a.e., then ∫ f dν ≥ 0; if f = g a.e., then ∫ f dν = ∫ g dν.

TA: Yuan Jiang, Email: jiangy@stat.wisc.edu
STAT 709 Discussion 18, November 13, 2007

1. Asymptotic mse. Let T_n be an estimator of θ, and {a_n} be a sequence of positive real numbers such that a_n → ∞ or a_n → a > 0. Suppose that a_n(T_n − θ) →_d Y with 0 < EY² < ∞. Then the asymptotic bias of T_n is EY/a_n and
amse_{T_n}(P) = EY²/a_n².

Example 1. Let X_1, ..., X_n be iid from P with EX_1^4 < ∞ and unknown mean μ ∈ R and variance σ² > 0. Consider the estimation of θ = μ² with the following 3 estimators:
(1) T_{1n} = X̄²;
(2) T_{2n} = X̄² − S²/n;
(3) T_{3n} = max(0, T_{2n}),
where X̄ is the sample mean and S² is the sample variance.
(a) Show that the regular amse's of T_{jn}, j = 1, 2, 3, are the same if μ ≠ 0 but may be different when μ = 0.
(b) Which estimator has the smallest limiting regular amse when μ = 0?
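A quick simulation sketch of Discussion Example 1 (the N(0, 1) data, n = 20, the replication count, and the seed are illustrative choices, not from the handout): at μ = 0, the bias-corrected T_{2n} = X̄² − S²/n has a smaller mse than T_{1n} = X̄², in line with part (b); for normal data, E X̄⁴ = 3σ⁴/n² gives a reference value for mse of T_{1n}.

```python
# Simulation sketch: mse of T1 = Xbar^2 vs. T2 = Xbar^2 - S^2/n at mu = 0.
import random, statistics

random.seed(1)
n, reps = 20, 4000
se1 = se2 = 0.0
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]  # mu = 0, so theta = mu^2 = 0
    xbar = sum(x) / n
    s2 = statistics.variance(x)                     # unbiased sample variance S^2
    se1 += (xbar ** 2) ** 2                         # squared error of T1
    se2 += (xbar ** 2 - s2 / n) ** 2                # squared error of T2
mse1, mse2 = se1 / reps, se2 / reps
assert mse2 < mse1                                  # bias correction helps at mu = 0
assert abs(mse1 - 3.0 / n ** 2) < 0.002             # E Xbar^4 = 3 sigma^4 / n^2
```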
2. Asymptotic relative efficiency. Example 2. Let X_1, ..., X_n be iid random variables with EX_1 = μ, Var(X_1) = 1, and EX_1^4 < ∞. Let T_{1n} = n^{-1} Σ_{i=1}^n X_i² − 1 and T_{2n} = X̄² − n^{-1} be estimators of μ².
(a) Find the asymptotic relative efficiency of T_{1n} w.r.t. T_{2n}.
(b) Show that the asymptotic relative efficiency of T_{1n} w.r.t. T_{2n} is less than or equal to 1 if the distribution of X_1 − μ is symmetric about 0 and μ ≠ 0.
(c) Find a distribution P for which the asymptotic relative efficiency of T_{1n} w.r.t. T_{2n} is larger than 1.
Office: 1275A MSC, Phone: 262-1577

Lecture 12: Relationship among convergence modes and uniform integrability

Theorem 1.8. Let X, X_1, X_2, ... be random k-vectors.
(i) If X_n →_{a.s.} X, then X_n →_p X. The converse is not true.
(ii) If X_n →_{L_r} X for an r > 0, then X_n →_p X. The converse is not true.
(iii) If X_n →_p X, then X_n →_d X. The converse is not true.
(iv) (Skorohod's theorem) If X_n →_d X, then there are random vectors Y, Y_1, Y_2, ... defined on a common probability space such that P_Y = P_X, P_{Y_n} = P_{X_n}, n = 1, 2, ..., and Y_n →_{a.s.} Y.
(v) (A useful result: a conditional converse of (i)–(iii)) If, for every ε > 0, Σ_{n=1}^∞ P(‖X_n − X‖ ≥ ε) < ∞, then X_n →_{a.s.} X. (A conditional converse of (i): P(‖X_n − X‖ ≥ ε) tends to 0 fast enough.)
(vi) If X_n →_p X, then there is a subsequence {X_{n_j}, j = 1, 2, ...} such that X_{n_j} →_{a.s.} X as j → ∞. (A partial converse of (i).)
(vii) If X_n →_d X and P(X = c) = 1, where c ∈ R^k is a constant vector, then X_n →_p c. (A conditional converse of (iii).)
(viii) Suppose that X_n →_d X. Then, for any r > 0,
lim_{n→∞} E‖X_n‖^r = E‖X‖^r < ∞   (1)
if and only if {‖X_n‖^r} is uniformly integrable in the sense that
lim_{t→∞} sup_n E( ‖X_n‖^r I_{{‖X_n‖^r > t}} ) = 0.   (2)
(A conditional converse of (ii).)

Discussion on uniform integrability. If there is only one random vector X, then (2) is lim_{t→∞} E( ‖X‖^r I_{{‖X‖^r > t}} ) = 0, which is equivalent to the integrability of ‖X‖^r (dominated convergence theorem). A sufficient condition for uniform integrability: sup_n E‖X_n‖^{r+δ} < ∞ for a δ > 0. This is because, on {‖X_n‖^r > t}, ‖X_n‖^r = ‖X_n‖^{r+δ}/‖X_n‖^δ ≤ ‖X_n‖^{r+δ} t^{-δ/r}, so
sup_n E( ‖X_n‖^r I_{{‖X_n‖^r > t}} ) ≤ t^{-δ/r} sup_n E‖X_n‖^{r+δ} → 0
as t → ∞.
Exercises 117, 120.

Proof of Theorem 1.8. (i) The result follows from Lemma 1.4.
(ii) The result follows from Chebyshev's inequality with φ(t) = |t|^r.
(iii) Assume k = 1 (the general case is proved in the textbook). Let x be a continuity point of F_X and ε > 0 be given. Then
F_X(x − ε) = P(X ≤ x − ε) ≤ P(X_n ≤ x) + P(X ≤ x − ε, X_n > x) ≤ F_{X_n}(x) + P(|X_n − X| > ε).
Letting n → ∞, we obtain that F_X(x − ε) ≤ liminf_n F_{X_n}(x). Switching X_n and X in the previous argument, we can show that F_X(x + ε) ≥ limsup_n F_{X_n}(x). Since ε is arbitrary and F_X is continuous at x, F_X(x) = lim_{n→∞} F_{X_n}(x).
(iv) The proof of this part can be found in Billingsley (1986, pp. 399–402).
(v) Let A_n = {‖X_n − X‖ ≥ ε}. The result follows from Lemma 1.4, Lemma 1.5(i), and Proposition 1.1(iii).
(vi) X_n →_p X means lim_{n→∞} P(‖X_n − X‖ > ε) = 0 for every ε > 0; that is, for every ε > 0, P(‖X_n − X‖ > ε) < ε for n > m_ε (m_ε is an integer depending on ε). For every j = 1, 2, ..., there is a positive integer n_j such that P(‖X_{n_j} − X‖ > 2^{-j}) < 2^{-j}. For any ε > 0, there is a k_ε such that, for j ≥ k_ε, P(‖X_{n_j} − X‖ > ε) ≤ P(‖X_{n_j} − X‖ > 2^{-j}). Since Σ_{j=1}^∞ 2^{-j} < ∞, it follows from the result in (v) that X_{n_j} →_{a.s.} X as j → ∞.
(vii) The proof for this part is left as an exercise.
(viii) First, by part (iv), we may assume that X_n →_{a.s.} X (why?).
Proof that (2) implies (1): Note that the uniform integrability of {‖X_n‖^r} implies sup_n E‖X_n‖^r < ∞ (why?). By Fatou's lemma (Theorem 1.1(i)), E‖X‖^r ≤ liminf_n E‖X_n‖^r < ∞. Hence (1) follows if we can show that limsup_n E‖X_n‖^r ≤ E‖X‖^r. For any ε > 0 and t > 0, let A_n = {‖X_n − X‖^r < ε} and B_n = {‖X_n‖^r > t}. Then
E‖X_n‖^r = E(‖X_n‖^r I_{A_n∩B_n^c}) + E(‖X_n‖^r I_{A_n^c∩B_n^c}) + E(‖X_n‖^r I_{B_n}) ≤ E(‖X_n‖^r I_{A_n}) + t P(A_n^c) + E(‖X_n‖^r I_{B_n}).
For r ≤ 1, ‖X_n‖^r I_{A_n} ≤ (‖X_n − X‖^r + ‖X‖^r) I_{A_n} and
E(‖X_n‖^r I_{A_n}) ≤ E(‖X_n − X‖^r I_{A_n}) + E(‖X‖^r I_{A_n}) ≤ ε + E‖X‖^r.
For r > 1, an application of Minkowski's inequality leads to
[E(‖X_n‖^r I_{A_n})]^{1/r} ≤ [E(‖X_n − X‖^r I_{A_n})]^{1/r} + [E(‖X‖^r I_{A_n})]^{1/r} ≤ ε^{1/r} + (E‖X‖^r)^{1/r}.
In any case, since ε is arbitrary, limsup_n E(‖X_n‖^r I_{A_n}) ≤ E‖X‖^r. Since P(A_n^c) → 0 (because X_n →_{a.s.} X), this and the previously established inequality imply that
limsup_n E‖X_n‖^r ≤ E‖X‖^r + sup_n E( ‖X_n‖^r I_{{‖X_n‖^r > t}} ).
Since {‖X_n‖^r} is uniformly integrable, letting t → ∞ we obtain (1).
Proof that (1) implies (2): Let ξ_n = min(‖X_n‖^r, ‖X‖^r). Then ξ_n →_{a.s.} ‖X‖^r and ξ_n ≤ ‖X‖^r, which is integrable. By the
dominated convergence theorem, Eξ_n → E‖X‖^r; this and (1) imply E(‖X_n‖^r − ξ_n) → 0. From the definition of B_n = {‖X_n‖^r > t},
B_n ⊂ {‖X_n‖^r − ξ_n > t/2} ∪ {‖X‖^r > t/2}.
Since E‖X‖^r < ∞, it follows from the dominated convergence theorem that E( ‖X‖^r I_{{‖X‖^r > t/2}} ) → 0 as t → ∞. Hence
limsup_n E(‖X_n‖^r I_{B_n}) ≤ limsup_n E(‖X_n‖^r − ξ_n) + limsup_n E(ξ_n I_{B_n}) ≤ E( ‖X‖^r I_{{‖X‖^r > t/2}} ).
Letting t → ∞, it follows from the dominated convergence theorem that
lim_{t→∞} limsup_n E(‖X_n‖^r I_{B_n}) ≤ lim_{t→∞} E( ‖X‖^r I_{{‖X‖^r > t/2}} ) = 0.
This proves (2).

Lecture 5: Radon–Nikodym derivative

Let (Ω, F, ν) be a measure space and f be a nonnegative Borel function. Note that
λ(A) = ∫_A f dν, A ∈ F,
is a measure satisfying: ν(A) = 0 implies λ(A) = 0 (we say λ is absolutely continuous w.r.t. ν and write λ ≪ ν). Computing λ(A) can then be done through integration w.r.t. a well-known measure. Conversely, λ ≪ ν is also almost sufficient for λ to have this form.

Theorem 1.4 (Radon–Nikodym theorem). Let ν and λ be two measures on (Ω, F) and ν be σ-finite. If λ ≪ ν, then there exists a nonnegative Borel function f on Ω such that
λ(A) = ∫_A f dν, A ∈ F.
Furthermore, f is unique a.e. ν, i.e., if λ(A) = ∫_A g dν for any A ∈ F, then f = g a.e. ν. The function f is called the Radon–Nikodym derivative or density of λ w.r.t. ν and is denoted by dλ/dν.
Consequence: If f is Borel on (Ω, F) and ∫_A f dν = 0 for any A ∈ F, then f = 0 a.e. ν.
If ∫ f dν = 1 for an f ≥ 0 a.e. ν, then λ is a probability measure and f is called its probability density function (pdf) w.r.t. ν. For any probability measure P on (R^k, B^k) corresponding to a cdf F or a random vector X, if P has a pdf f w.r.t. a measure ν, then f is also called the pdf of F or X w.r.t. ν.

Example 1.10 (Discrete cdf and pdf). Let a_1 < a_2 < ... be a sequence of real numbers and let p_n, n = 1, 2, ..., be a sequence of positive numbers such that Σ_{n=1}^∞ p_n = 1. Then
F(x) = Σ_{i=1}^n p_i for a_n ≤ x < a_{n+1}, n = 1, 2, ..., and F(x) = 0 for −∞ < x < a_1,
is a stepwise cdf. It has a jump of size p_n at each a_n and is flat between a_n and a_{n+1}, n = 1, 2, .... Such a cdf is called a discrete cdf. The corresponding probability measure is
P(A) = Σ_{i: a_i ∈ A} p_i, A ∈ F,
where F = the set of all subsets (power set). Let ν be the counting measure on the power set. Then
P(A) = ∫_A f dν = Σ_{a_i ∈ A} f(a_i), A ⊂ Ω,
where f(a_i) = p_i, i = 1, 2, .... That is, f is the pdf of P or F w.r.t. ν. Hence, any
discrete cdf has a pdf w.r.t. counting measure. A pdf w.r.t. counting measure is called a discrete pdf.

Example 1.11. Let F be a cdf. Assume that F is differentiable in the usual sense in calculus. Let f be the derivative of F. From calculus, F(x) = ∫_{−∞}^x f(y) dy, x ∈ R. Let P be the probability measure corresponding to F. Then P(A) = ∫_A f dm for any A ∈ B, where m is the Lebesgue measure on R; f is the pdf of P or F w.r.t. Lebesgue measure. Here the Radon–Nikodym derivative is the same as the usual derivative in calculus.

A continuous cdf may not have a pdf w.r.t. Lebesgue measure. A necessary and sufficient condition for a cdf F having a pdf w.r.t. Lebesgue measure is that F is absolutely continuous in the sense that, for any ε > 0, there exists a δ > 0 such that for each finite collection of disjoint bounded open intervals (a_i, b_i), Σ (b_i − a_i) < δ implies Σ [F(b_i) − F(a_i)] < ε. Absolute continuity is weaker than differentiability, but is stronger than continuity. Note that every cdf is differentiable a.e. Lebesgue measure (Chung, 1974, Chapter 1). A pdf w.r.t. Lebesgue measure is called a Lebesgue pdf.

Proposition 1.7 (Calculus with Radon–Nikodym derivatives). Let ν be a σ-finite measure on a measure space (Ω, F). All other measures discussed in (i)–(iii) are defined on (Ω, F).
(i) If λ is a measure, λ ≪ ν, and f ≥ 0, then
∫ f dλ = ∫ f (dλ/dν) dν.
(Notice how the dν's "cancel" on the right-hand side.)
(ii) If λ_i, i = 1, 2, are measures and λ_i ≪ ν, then λ_1 + λ_2 ≪ ν and
d(λ_1 + λ_2)/dν = dλ_1/dν + dλ_2/dν a.e. ν.
(iii) (Chain rule) If τ is a measure, λ is a σ-finite measure, and τ ≪ λ ≪ ν, then
dτ/dν = (dτ/dλ)(dλ/dν) a.e. ν.
In particular, if λ ≪ ν and ν ≪ λ (in which case λ and ν are equivalent), then
dλ/dν = (dν/dλ)^{-1} a.e. ν or λ.
(iv) Let (Ω_i, F_i, ν_i) be a measure space and ν_i be σ-finite, i = 1, 2. Let λ_i be a σ-finite measure on (Ω_i, F_i) with λ_i ≪ ν_i, i = 1, 2. Then λ_1 × λ_2 ≪ ν_1 × ν_2 and
d(λ_1 × λ_2)/d(ν_1 × ν_2) (ω_1, ω_2) = (dλ_1/dν_1)(ω_1) (dλ_2/dν_2)(ω_2) a.e. ν_1 × ν_2.

Lecture 6: pdf and transformation

Example 1.12. Let X be a random variable on (Ω, F, P) whose cdf F_X has a Lebesgue pdf f_X and F_X(c) < 1, where c is a fixed constant. Let Y = min(X, c), i.e., Y is the smaller of X and c. Note that Y^{-1}((−∞, x]) = Ω if x ≥ c and
Y^{-1}((−∞, x]) = X^{-1}((−∞, x]) if x < c. Hence, Y is a random variable and the cdf of Y is
F_Y(x) = 1 if x ≥ c, and F_Y(x) = F_X(x) if x < c.
This cdf is discontinuous at c, since F_X(c) < 1. Thus, it does not have a Lebesgue pdf. It is not discrete either. Does P_Y, the probability measure corresponding to F_Y, have a pdf w.r.t. some measure? Define a probability measure on (R, B), called the point mass at c, by
δ_c(A) = 1 if c ∈ A, and δ_c(A) = 0 if c ∉ A, A ∈ B.
Then P_Y ≪ m + δ_c, where m is the Lebesgue measure, and the pdf of P_Y is
dP_Y/d(m + δ_c) (x) = 0 if x > c; = 1 − F_X(c) if x = c; = f_X(x) if x < c.

Example 1.14. Let X be a random variable with cdf F_X and Lebesgue pdf f_X, and let Y = X². Since Y^{-1}((−∞, x]) is empty if x < 0 and equals Y^{-1}([0, x]) = X^{-1}([−√x, √x]) if x ≥ 0, the cdf of Y is
F_Y(x) = P ∘ Y^{-1}((−∞, x]) = P ∘ X^{-1}([−√x, √x]) = F_X(√x) − F_X(−√x)
if x ≥ 0, and F_Y(x) = 0 if x < 0. Clearly, the Lebesgue pdf of F_Y is
f_Y(x) = [f_X(√x) + f_X(−√x)] I_{(0,∞)}(x) / (2√x).
In particular, if
f_X(x) = (1/√(2π)) e^{−x²/2},
which is the Lebesgue pdf of the standard normal distribution N(0, 1), then
f_Y(x) = (1/√(2πx)) e^{−x/2} I_{(0,∞)}(x),
which is the Lebesgue pdf of the chi-square distribution χ²_1 (Table 1.2). This is actually an important result in statistics.

Proposition 1.8. Let X be a random k-vector with a Lebesgue pdf f_X and let Y = g(X), where g is a Borel function from (R^k, B^k) to (R^k, B^k). Let A_1, ..., A_m be disjoint sets in B^k such that R^k − (A_1 ∪ ... ∪ A_m) has Lebesgue measure 0 and g on A_j is one-to-one with a nonvanishing Jacobian, i.e., the determinant Det(∂g(x)/∂x) ≠ 0 on A_j, j = 1, ..., m. Then Y has the following Lebesgue pdf:
f_Y(x) = Σ_{j=1}^m |Det(∂h_j(x)/∂x)| f_X(h_j(x)),
where h_j is the inverse function of g on A_j, j = 1, ..., m.
In Example 1.14, A_1 = (−∞, 0), A_2 = (0, ∞), g(x) = x², h_1(x) = −√x, h_2(x) = √x, and |dh_j(x)/dx| = 1/(2√x).

Example 1.15. Let X = (X_1, X_2) be a random 2-vector having a joint Lebesgue pdf f_X. Consider first the transformation g(x) = (x_1, x_1 + x_2). Using Proposition 1.8, one can show that the joint pdf of g(X) is
f_{g(X)}(x_1, y) = f_X(x_1, y − x_1),
where y = x_1 + x_2 (note that the Jacobian equals 1). The marginal pdf of Y = X_1 + X_2 is then
f_Y(y) = ∫ f_X(x_1, y − x_1) dx_1.
In particular, if X_1 and X_2 are independent, then
f_Y(y) = ∫ f_{X_1}(x_1) f_{X_2}(y − x_1) dx_1.
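The χ²_1 result in Example 1.14 can be checked by simulation. The sketch below (sample size and grid of checkpoints are illustrative choices) compares the empirical cdf of Y = X² for standard normal X against F_Y(y) = F_X(√y) − F_X(−√y) = 2Φ(√y) − 1, written via `math.erf`.

```python
# Monte Carlo sketch of Example 1.14: X ~ N(0,1) implies Y = X^2 ~ chi^2_1.
import math, random

def chi2_1_cdf(y):
    # F_Y(y) = 2*Phi(sqrt(y)) - 1 = erf(sqrt(y/2)) for y > 0
    return math.erf(math.sqrt(y / 2.0)) if y > 0 else 0.0

random.seed(42)
n = 100_000
y = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
for q in (0.5, 1.0, 2.0, 4.0):
    emp = sum(v <= q for v in y) / n          # empirical cdf at q
    assert abs(emp - chi2_1_cdf(q)) < 0.01    # matches the derived cdf
```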
Next, consider the transformation h(x_1, x_2) = (x_1/x_2, x_2), assuming that X_2 ≠ 0 a.s. Using Proposition 1.8, one can show that the joint pdf of h(X) is
f_{h(X)}(z, x_2) = |x_2| f_X(z x_2, x_2),
where z = x_1/x_2. The marginal pdf of Z = X_1/X_2 is
f_Z(z) = ∫ |x_2| f_X(z x_2, x_2) dx_2.
In particular, if X_1 and X_2 are independent, then
f_Z(z) = ∫ |x_2| f_{X_1}(z x_2) f_{X_2}(x_2) dx_2.

Example 1.16 (t-distribution and F-distribution). Let X_1 and X_2 be independent random variables having the chi-square distributions χ²_{n_1} and χ²_{n_2} (Table 1.2), respectively. The pdf of Z = X_1/X_2 is
f_Z(z) = [Γ((n_1 + n_2)/2) / (Γ(n_1/2) Γ(n_2/2))] z^{n_1/2 − 1} (1 + z)^{−(n_1+n_2)/2} I_{(0,∞)}(z).
Using Proposition 1.8, one can show that the pdf of Y = (X_1/n_1)/(X_2/n_2) = (n_2/n_1)Z is the pdf of the F-distribution F_{n_1,n_2} given in Table 1.2.
Let U_1 be a random variable having the standard normal distribution N(0, 1) and U_2 a random variable having the chi-square distribution χ²_n. Using the same argument, one can show that, if U_1 and U_2 are independent, then the distribution of T = U_1/√(U_2/n) is the t-distribution t_n given in Table 1.2.

Noncentral chi-square distribution. Let X_1, ..., X_n be independent random variables with X_i ~ N(μ_i, σ²), i = 1, ..., n. The distribution of Y = (X_1² + ... + X_n²)/σ² is called the noncentral chi-square distribution and denoted by χ²_n(δ), where δ = (μ_1² + ... + μ_n²)/σ² is the noncentrality parameter; χ²_n with δ = 0 is called a central chi-square distribution. It can be shown (exercise) that Y has the following Lebesgue pdf:
e^{−δ/2} Σ_{j=0}^∞ [(δ/2)^j / j!] f_{n+2j}(x),
where f_k(x) is the Lebesgue pdf of the chi-square distribution χ²_k. If Y_1, ..., Y_k are independent random variables and Y_i has the noncentral chi-square distribution χ²_{n_i}(δ_i), i = 1, ..., k, then Y = Y_1 + ... + Y_k has the noncentral chi-square distribution χ²_{n_1+...+n_k}(δ_1 + ... + δ_k).
Noncentral t-distribution and F-distribution: in discussion.

Theorem 1.5 (Cochran's theorem). Suppose that X ~ N_n(μ, I_n) and
X^T X = X^T A_1 X + ... + X^T A_k X,
where I_n is the n × n identity matrix and A_i is an n × n symmetric matrix with rank n_i, i = 1, ..., k. A necessary and sufficient condition that X^T A_i X has the noncentral chi-square distribution χ²_{n_i}(δ_i), i = 1, ..., k, and the X^T A_i X's are independent, is n = n_1 + ... + n_k, in which case δ_i = μ^T A_i μ and δ_1 + ... + δ_k = ‖μ‖².
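The t-distribution construction in Example 1.16 can be checked by simulation. In the sketch below (degrees of freedom, sample size, and seed are illustrative choices), U_2 ~ χ²_n is generated as a sum of n squared standard normals, and the known moments of t_n (mean 0, variance n/(n−2) for n > 2) are verified.

```python
# Simulation sketch: T = U1 / sqrt(U2/n) with independent U1 ~ N(0,1),
# U2 ~ chi^2_n has the t_n distribution; check symmetry and Var(T) = n/(n-2).
import random

random.seed(7)
n, reps = 5, 200_000
ts = []
for _ in range(reps):
    u1 = random.gauss(0.0, 1.0)
    u2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))  # chi^2_n draw
    ts.append(u1 / (u2 / n) ** 0.5)
mean = sum(ts) / reps
var = sum(t * t for t in ts) / reps - mean ** 2
assert abs(mean) < 0.02                  # t_n is symmetric about 0
assert abs(var - n / (n - 2)) < 0.1      # Var(t_5) = 5/3
```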
Lecture 24: Bayes rules, minimax rules, point estimators, and hypothesis tests

The second approach to finding a good decision rule is to consider some characteristic R_T of R_T(P), for a given decision rule T, and then minimize R_T over T ∈ ℑ. The following are two popular ways to carry out this idea. The first one is to consider an average of R_T(P) over P ∈ 𝒫:
r_T(Π) = ∫_𝒫 R_T(P) dΠ(P),
where Π is a known probability measure on (𝒫, F_𝒫) with an appropriate σ-field F_𝒫; r_T(Π) is called the Bayes risk of T w.r.t. Π. If T* ∈ ℑ and r_{T*}(Π) ≤ r_T(Π) for any T ∈ ℑ, then T* is called a ℑ-Bayes rule (or Bayes rule, when ℑ contains all possible rules) w.r.t. Π.
The second method is to consider the worst situation, i.e., sup_{P∈𝒫} R_T(P). If T* ∈ ℑ and
sup_{P∈𝒫} R_{T*}(P) ≤ sup_{P∈𝒫} R_T(P)
for any T ∈ ℑ, then T* is called a ℑ-minimax rule (or minimax rule, when ℑ contains all possible rules). Bayes and minimax rules are discussed in Chapter 4.

Example 2.25. We usually try to find a Bayes rule or a minimax rule in a parametric problem where P = P_θ for a θ ∈ Θ ⊂ R^k. Consider the special case of k = 1 and L(θ, t) = (θ − t)², the squared error loss. Note that
r_T(Π) = ∫_R E[(θ − T(X))²] dΠ(θ),
which is equivalent to E(θ̃ − T(X))², where θ̃ is a random variable having the distribution Π and, given θ̃ = θ, the conditional distribution of X is P_θ. Then the problem can be viewed as a prediction problem for θ̃ using functions of X. Using the result in Example 1.22, the best predictor is E(θ̃|X), which is the ℑ-Bayes rule w.r.t. Π with ℑ being the class of rules T(X) satisfying E[T(X)]² < ∞ for any θ.
As a more specific example, let X = (X_1, ..., X_n) with iid components having the N(μ, σ²) distribution with an unknown μ ∈ R and a known σ², and let Π be the N(μ_0, σ_0²) distribution with known μ_0 and σ_0². Then the conditional distribution of θ̃ given X = x is N(μ*(x), c²) with
μ*(x) = (σ² μ_0 + n σ_0² x̄) / (n σ_0² + σ²)  and  c² = σ_0² σ² / (n σ_0² + σ²).   (1)
The Bayes rule w.r.t. Π is E(θ̃|X) = μ*(X).
In this special case we can show that the sample mean X̄ is minimax. For any decision rule T,
sup_{θ∈R} R_T(θ) ≥ ∫_R R_T(θ) dΠ(θ) ≥ ∫_R R_{μ*}(θ) dΠ(θ) = E[θ̃ − μ*(X)]² = c²,
where μ*(X) is the Bayes rule given in (1) and c² is also given in (1).
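The posterior mean μ*(x) in (1) can be sanity-checked numerically. The sketch below (the numeric values of μ_0, σ_0², σ², and x̄ are illustrative choices) verifies two qualitative properties of (1): μ* shrinks x̄ toward the prior mean, and the prior washes out as n grows.

```python
# Sketch of the normal-normal Bayes rule in (1):
#   mu*(xbar) = (sigma^2*mu0 + n*sigma0^2*xbar) / (n*sigma0^2 + sigma^2)

def posterior_mean(xbar, n, sigma2, mu0, sigma02):
    return (sigma2 * mu0 + n * sigma02 * xbar) / (n * sigma02 + sigma2)

mu0, sigma02, sigma2 = 0.0, 1.0, 4.0   # illustrative prior/model variances
x = 2.5                                # an illustrative observed sample mean
m_small = posterior_mean(x, 1, sigma2, mu0, sigma02)
m_large = posterior_mean(x, 100_000, sigma2, mu0, sigma02)
assert mu0 < m_small < x               # shrinkage between prior mean and xbar
assert abs(m_large - x) < 1e-3         # with much data the prior washes out
```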
Since this result is true for any σ_0² > 0 and c² → σ²/n as σ_0² → ∞,
sup_{θ∈R} R_T(θ) ≥ σ²/n = sup_{θ∈R} R_{X̄}(θ),   (2)
where the equality holds because the risk of X̄ under the squared error loss is σ²/n, independent of θ = μ. Thus, X̄ is minimax.
A minimax rule in a general case may be difficult to obtain. It can be seen that if both μ and σ² are unknown in the previous discussion, then
sup_{θ∈R×(0,∞)} R_{X̄}(θ) = ∞,
where θ = (μ, σ²). Hence X̄ cannot be minimax unless (2) holds with X̄ replaced by any decision rule T, in which case minimaxity becomes meaningless.

Statistical inference: point estimators, hypothesis tests, and confidence sets.

Point estimators. Let T(X) be an estimator of θ ∈ R.
Bias: b_T(P) = E[T(X)] − θ.
Mean squared error (mse): mse_T(P) = E[T(X) − θ]² = [b_T(P)]² + Var(T(X)).
Bias and mse are two common criteria for the performance of point estimators.

Example 2.26. Let X_1, ..., X_n be iid from an unknown cdf F. Suppose that the parameter of interest is ϑ = 1 − F(t) for a fixed t > 0. If F is not in a parametric family, then a nonparametric estimator of F(t) is the empirical cdf
F_n(t) = (1/n) Σ_{i=1}^n I_{(−∞,t]}(X_i), t ∈ R.
Since I_{(−∞,t]}(X_1), ..., I_{(−∞,t]}(X_n) are iid binary random variables with P(I_{(−∞,t]}(X_i) = 1) = F(t), the random variable nF_n(t) has the binomial distribution Bi(F(t), n). Consequently, F_n(t) is an unbiased estimator of F(t) and
Var(F_n(t)) = mse_{F_n(t)}(P) = F(t)[1 − F(t)]/n.
Since any linear combination of unbiased estimators is unbiased for the same linear combination of the parameters (by the linearity of expectations), an unbiased estimator of ϑ is U(X) = 1 − F_n(t), which has the same variance and mse as F_n(t).
The estimator U(X) = 1 − F_n(t) can be improved in terms of the mse if there is further information about F. Suppose that F is the cdf of the exponential distribution E(0, θ) with an unknown θ > 0. Then ϑ = 1 − F(t) = e^{−t/θ}. The sample mean X̄ is sufficient for θ > 0. Since the squared error loss is strictly convex, an application of Theorem 2.5(ii) (Rao–Blackwell theorem) shows that the estimator T(X) = E[1 − F_n(t) | X̄], which is also unbiased, is better than U(X) in terms of the mse. Figure 2.1 shows graphs of the mse's of U(X) and T(X), as functions of θ, in the special case of n = 10, t = 2, and F(x) = (1 − e^{−x/θ}) I_{(0,∞)}(x).
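The binomial facts in Example 2.26 can be checked by simulation. Below, exponential E(0, 2) data (so F(t) = 1 − e^{−t/θ} with θ = 2), n = 25, t = 2, and the replication count are illustrative choices; the simulation confirms that F_n(t) is unbiased with variance F(t)[1 − F(t)]/n.

```python
# Simulation sketch of Example 2.26: n*Fn(t) ~ Bi(F(t), n).
import math, random

random.seed(3)
n, theta, t, reps = 25, 2.0, 2.0, 20_000
Ft = 1.0 - math.exp(-t / theta)               # true F(t)
vals = []
for _ in range(reps):
    x = [random.expovariate(1.0 / theta) for _ in range(n)]
    vals.append(sum(v <= t for v in x) / n)   # Fn(t) for this sample
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
assert abs(mean - Ft) < 0.005                 # Fn(t) is unbiased for F(t)
assert abs(var - Ft * (1 - Ft) / n) < 0.001   # Var(Fn(t)) = F(t)(1-F(t))/n
```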
Hypothesis tests. To test the hypotheses
H_0: P ∈ 𝒫_0 versus H_1: P ∈ 𝒫_1,
there are two types of statistical errors we may commit: rejecting H_0 when H_0 is true (called the type I error) and accepting H_0 when H_0 is wrong (called the type II error). A test T: a statistic from X to {0, 1}. The probabilities of making the two types of errors are
α_T(P) = P(T(X) = 1), P ∈ 𝒫_0,   (3)
and
1 − α_T(P) = P(T(X) = 0), P ∈ 𝒫_1,   (4)
which are denoted by α_T(θ) and 1 − α_T(θ) if P is in a parametric family indexed by θ. Note that these are risks of T under the 0-1 loss in statistical decision theory.
The error probabilities in (3) and (4) cannot be minimized simultaneously. Furthermore, these two error probabilities cannot be bounded simultaneously by a fixed α ∈ (0, 1) when we have a sample of a fixed size. A common approach to finding an optimal test is to assign a small bound α to one of the error probabilities, say α_T(P), P ∈ 𝒫_0, and then to attempt to minimize the other error probability 1 − α_T(P), P ∈ 𝒫_1, subject to
sup_{P∈𝒫_0} α_T(P) ≤ α.   (5)
The bound α is called the level of significance. The left-hand side of (5) is called the size of the test T. The level of significance should be positive; otherwise no test satisfies (5) except the silly test T(X) ≡ 0 a.s. P.

Example 2.28. Let X_1, ..., X_n be iid from the N(μ, σ²) distribution with an unknown μ ∈ R and a known σ². Consider the hypotheses H_0: μ ≤ μ_0 versus H_1: μ > μ_0, where μ_0 is a fixed constant. Since the sample mean X̄ is sufficient for μ ∈ R, it is reasonable to consider the following class of tests: T_c(X) = I_{(c,∞)}(X̄), i.e., H_0 is rejected (accepted) if X̄ > c (X̄ ≤ c), where c ∈ R is a fixed constant. Let Φ be the cdf of N(0, 1). Then, by the property of the normal distributions,
α_{T_c}(μ) = P(T_c(X) = 1) = 1 − Φ(√n (c − μ)/σ).
Figure 2.2 provides an example of a graph of the two types of error probabilities, with μ_0 = 0. Since Φ(t) is an increasing function of t,
sup_{P∈𝒫_0} α_{T_c}(μ) = 1 − Φ(√n (c − μ_0)/σ).
In fact, it is also true that
sup_{μ>μ_0} [1 − α_{T_c}(μ)] = Φ(√n (c − μ_0)/σ).
If we would like to use an α as the level of significance, then the most effective way is to choose a c_α (a test T_{c_α}) such that
α = sup_{P∈𝒫_0} α_{T_{c_α}}(μ),
in which case c_α must satisfy
1 − Φ(√n (c_α − μ_0)/σ) = α,
i.e., c_α = σ z_{1−α}/√n + μ_0, where z_a = Φ^{-1}(a).
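The size computation in Example 2.28 can be checked by simulation. Below, μ_0 = 0, σ = 1, n = 16, and the seed are illustrative choices; z_{0.95} ≈ 1.6449 is a standard normal quantile. Since X̄ ~ N(μ_0, σ²/n) exactly at the boundary μ = μ_0, X̄ is sampled directly rather than averaging n draws.

```python
# Simulation sketch of Example 2.28: size of the test "reject H0 iff Xbar > c_alpha".
import random

random.seed(11)
n, alpha, mu0, sigma = 16, 0.05, 0.0, 1.0
z = 1.6449                                  # z_{1-alpha} = Phi^{-1}(0.95)
c = sigma * z / n ** 0.5 + mu0              # c_alpha from the display above
reps = 400_000
# Xbar ~ N(mu0, sigma^2/n) exactly under mu = mu0, so sample it directly
rejections = sum(random.gauss(mu0, sigma / n ** 0.5) > c for _ in range(reps))
size = rejections / reps
assert abs(size - alpha) < 0.003            # size is attained at mu = mu0
```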
In Chapter 6, it is shown that, for any test T(X) satisfying (5),
1 − α_T(μ) ≥ 1 − α_{T_{c_α}}(μ), μ > μ_0.

STAT 709 Discussion 1, September 11, 2007

1. σ-fields and π-λ systems. Let Ω be an arbitrary set and 𝒫(Ω) be the power set of Ω.
(π-system) 𝒟 ⊂ 𝒫(Ω) is said to be a π-system if it is closed under finite intersection, i.e., if A, B ∈ 𝒟 implies A ∩ B ∈ 𝒟.
(λ-system) ℒ ⊂ 𝒫(Ω) is said to be a λ-system if it satisfies: (i) Ω ∈ ℒ; (ii) if A, B ∈ ℒ and A ⊂ B, then B − A ∈ ℒ; (iii) if A ∈ ℒ and A_n ⊂ A_{n+1} for all n with A_n ∈ ℒ, then ∪_{n=1}^∞ A_n ∈ ℒ.
(Dynkin's π-λ Theorem) If 𝒟 is a π-system and ℒ is a λ-system containing 𝒟, then ℒ ⊃ σ(𝒟).
Example 1. Let ν and λ be two measures on a σ-field F on Ω such that ν(A) = λ(A) for any A ∈ 𝒞, where 𝒞 ⊂ F is a π-system. Assume that there exist A_i ∈ 𝒞, i = 1, 2, ..., such that ∪_i A_i = Ω and ν(A_i) < ∞ for all i. Show that ν(A) = λ(A) for any A ∈ σ(𝒞), where σ(𝒞) is the smallest σ-field containing 𝒞.

2. Borel functions. Example 2. Let f be a Borel function on R². Let y_0 be a fixed point in R; define a function g from R to R as g(x) = f(x, y_0). Show that g is Borel. Is it true that f is Borel from R² to R if f(x, y) is Borel from R to R for any fixed x or any fixed y?

STAT 709 Discussion 17, November 8, 2007

1. Bayes rules and minimax rules. Example 1. Let X be a sample having a probability density f_j(x) with respect to a σ-finite measure ν, where j is unknown and j ∈ {1, ..., J} with a known integer J ≥ 2. Consider a decision problem in which the action space is {1, ..., J} and the loss function is
L(j, a) = 0 if a = j, and 1 if a ≠ j.
(a) Obtain the risk of a decision rule (which may be randomized).
(b) Let Π be a prior probability measure on {1, ..., J} with Π({j}) = π_j, j = 1, ..., J. Obtain the Bayes risk of a decision rule.
(c) Obtain a Bayes rule under the prior Π in (b).
(d) Assume that J = 2, π_1 = π_2 = 0.5, and f_j(x) = φ(x − μ_j), where φ(x) is the Lebesgue density of the standard normal distribution and μ_j, j = 1, 2, are known constants. Obtain the Bayes
rule in e Obtain a minimaX rule when J 2 2 Hypothesis Tests Example 2 Let X17 7X be iid random variables having the exponen tial distribution E0t97 6 E 07 00 Consider the hypothesis H06 60vs H16gt60 where 00 gt 0 is a xed constant For the testing rule TAX 1000X a Compute its size b Find a ca such that Tom has size a where 04 is a given level of signi cance c Find the p value for Tea Of ce 1275A MSC 1 Phone 262 1577 TA Yuan Jiang Email jiangy statwiscedu 3 Con dence sets Example 3 Let X17 7X be iid from the exponential distribution Ea76 with unknown 1 E R and 6 E 07 00 Let 04 E 071 be given a Using T1 X 21Xi 7 X007 construct a con dence interval for 6 with con dence coef cient 1 7 Oz and nd the expected interval length b Using T1X and T2X XOL7 construct a con dence interval for a with con dence coef cient 1 7 a c Using the method in Example 2327 construct a con dence set for the two dimensional pararneter 176 with con dence coef cient 1 7 a 4 Consistency Example 4 Let X17 7X U67 767 where 6 E R is an unknown parameter Show that M 2 is strongly consistent for 6 and also consistent in mean squared error Of ce 1275A M80 2 Phone 262 1577 Lecture 11 Convergence modes and stochastic orders 0 Chwk E 73 llCllr 321 WWIT 7 gt 0 lf 7 2 17 then Hell is the LT distance between 0 and c When 7 27 Hellg xCTC De nition 18 Let X7 X17X27 be random k vectors de ned on a probability space i We say that the sequence Xn converges to X almost surely as and write Xn at X if and only if limH00 Xn X as ii We say that Xn converges to X in probability and write Xn a1 X if and only if7 for every xed 6 gt 07 EXH gt e 0 iii We say that Xn converges to X in L or in rth moment and write Xn LT X if and only if ggngoEHX e XH o where r gt 0 is a xed constant iv Let F7 Fn7 n 1727 7 be cdf7s on 72k and P7 P 7 n 177 be their corresponding probability measures We say that converges to F weakly or Pn converges to P weakly and write F aw F or P aw P if and only if7 for each continuity point z of F7 We say 
that {X_n} converges to X in distribution (or in law) and write X_n →_d X if and only if F_{X_n} →_w F_X.
In (i)–(iii): how close is X_n to X as n → ∞? In (iv), F_{X_n} →_w F_X: X_n and X may not be close — they may even be defined on different spaces.

Example 1.26. Let θ_n = 1 + n^{-1} and X_n be a random variable having the exponential distribution E(0, θ_n) (Table 1.2), n = 1, 2, .... Let X be a random variable having the exponential distribution E(0, 1). For any x > 0, as n → ∞,
F_{X_n}(x) = 1 − e^{−x/θ_n} → 1 − e^{−x} = F_X(x).
Since F_{X_n}(x) ≡ 0 ≡ F_X(x) for x ≤ 0, we have shown that X_n →_d X.
Does X_n →_p X? We need further information about the random variables X_n and X. We consider two cases in which different answers can be obtained.
First, suppose that X_n ≡ θ_n X (then X_n has the given cdf). X_n − X = (θ_n − 1)X = n^{-1}X, which has the cdf (1 − e^{−nx}) I_{[0,∞)}(x). Then
P(|X_n − X| ≥ ε) = e^{−nε} → 0
for any ε > 0; in fact, by Theorem 1.8(v), X_n →_{a.s.} X. Since E|X_n − X|^p = n^{−p} EX^p < ∞ for any p > 0, X_n →_{L_p} X for any p > 0.
Next, suppose that X_n and X are independent random variables. Since the pdf's of X_n and −X are θ_n^{-1} e^{−x/θ_n} I_{(0,∞)}(x) and e^y I_{(−∞,0)}(y), respectively, we have
P(|X_n − X| ≤ ε) = ∫∫_{|x+y|≤ε} θ_n^{-1} e^{−x/θ_n} I_{(0,∞)}(x) e^y I_{(−∞,0)}(y) dx dy,
which converges, by the dominated convergence theorem, to
∫∫_{|x+y|≤ε} e^{−x} I_{(0,∞)}(x) e^y I_{(−∞,0)}(y) dx dy = 1 − e^{−ε}.
Thus, P(|X_n − X| ≥ ε) → e^{−ε} > 0 for any ε > 0 and, therefore, X_n →_p X does not hold.

Proposition 1.16 (Pólya's theorem). If F_n →_w F and F is continuous on R^k, then
lim_{n→∞} sup_x |F_n(x) − F(x)| = 0.

Lemma 1.4. For random k-vectors X, X_1, X_2, ... on a probability space, X_n →_{a.s.} X if and only if, for every ε > 0,
lim_{n→∞} P( ∪_{m=n}^∞ {‖X_m − X‖ > ε} ) = 0.   (1)
Proof. Let A_j = ∪_{n=1}^∞ ∩_{m=n}^∞ {‖X_m − X‖ ≤ j^{-1}}, j = 1, 2, .... Then ∩_{j=1}^∞ A_j = {ω : lim_{n→∞} X_n(ω) = X(ω)} (why?). By Proposition 1.1(iii),
P(A_j) = lim_{n→∞} P( ∩_{m=n}^∞ {‖X_m − X‖ ≤ j^{-1}} ) = 1 − lim_{n→∞} P( ∪_{m=n}^∞ {‖X_m − X‖ > j^{-1}} ).
(1) holds for every ε > 0 if and only if P(A_j) = 1 for every j, i.e., P(∩_{j=1}^∞ A_j) = 1, since P(∩_{j=1}^∞ A_j) ≥ 1 − Σ_j [1 − P(A_j)].

Lemma 1.5 (Borel–Cantelli lemma). Let {A_n} be a sequence of events in a probability space and limsup_n A_n = ∩_{n=1}^∞ ∪_{m=n}^∞ A_m.
(i) If Σ_{n=1}^∞ P(A_n) < ∞, then P(limsup_n A_n) = 0.
(ii) If A_1, A_2, ... are pairwise independent and Σ_{n=1}^∞ P(A_n) = ∞, then P(limsup_n A_n) = 1.
Proof. (i) By Proposition 1.1,
P(limsup_n A_n) = lim_{n→∞} P( ∪_{m=n}^∞ A_m ) ≤ lim_{n→∞} Σ_{m=n}^∞ P(A_m) = 0
if Σ_{n=1}^∞ P(A_n) < ∞.
(ii) We prove the
case of independent A_n's. Using 1 − t ≤ e^{−t},
P(limsup_n A_n) = lim_{n→∞} P( ∪_{m=n}^∞ A_m ) = 1 − lim_{n→∞} lim_{k→∞} P( ∩_{m=n}^{n+k} A_m^c ),
and
P( ∩_{m=n}^{n+k} A_m^c ) = ∏_{m=n}^{n+k} [1 − P(A_m)] ≤ ∏_{m=n}^{n+k} e^{−P(A_m)} = exp( −Σ_{m=n}^{n+k} P(A_m) ).
Letting k → ∞,
P( ∩_{m=n}^∞ A_m^c ) ≤ exp( −Σ_{m=n}^∞ P(A_m) ) = 0,
since Σ_n P(A_n) = ∞. See Chung (1974), pp. 76–78, for the case of pairwise independent A_n's.

The notions of O(·), o(·), and the stochastic O_p(·) and o_p(·). In calculus, two sequences of real numbers, {a_n} and {b_n}, satisfy a_n = O(b_n) if and only if |a_n| ≤ c|b_n| for all n and a constant c; and a_n = o(b_n) if and only if a_n/b_n → 0 as n → ∞.

Definition 1.9. Let X_1, X_2, ... be random vectors and Y_1, Y_2, ... be random variables defined on a common probability space.
(i) X_n = O(Y_n) a.s. if and only if P(‖X_n‖ = O(|Y_n|)) = 1.
(ii) X_n = o(Y_n) a.s. if and only if X_n/Y_n →_{a.s.} 0.
(iii) X_n = O_p(Y_n) if and only if, for any ε > 0, there is a constant C_ε > 0 such that sup_n P(‖X_n‖ ≥ C_ε|Y_n|) < ε.
(iv) X_n = o_p(Y_n) if and only if X_n/Y_n →_p 0.
Since a_n = O(1) means that {a_n} is bounded, {X_n} is said to be bounded in probability if X_n = O_p(1). X_n = o_p(Y_n) implies X_n = O_p(Y_n). X_n = O_p(Y_n) and Y_n = O_p(Z_n) implies X_n = O_p(Z_n), but X_n = O_p(Y_n) does not imply Y_n = O_p(X_n). If X_n = O_p(Z_n), then X_n Y_n = O_p(Y_n Z_n). If X_n = O_p(Z_n) and Y_n = O_p(Z_n), then X_n + Y_n = O_p(Z_n). The same conclusions can be obtained if O_p and o_p are replaced by O(·) a.s. and o(·) a.s., respectively. If X_n →_d X for a random variable X, then X_n = O_p(1). If E|X_n| = O(a_n), then X_n = O_p(a_n), where {a_n} ⊂ (0, ∞). If X_n →_{a.s.} X, then sup_n |X_n| = O_p(1). Note that X_n →_p X if and only if X_n − X = o_p(1).

Lecture 10: Markov chains

An important example of a dependent sequence of random variables in statistical applications. A sequence of random vectors {X_n, n = 1, 2, ...} is a Markov chain or Markov process if and only if
P(B | X_1, ..., X_n) = P(B | X_n) a.s., B ∈ σ(X_{n+1}), n = 2, 3, ....   (1)
X_{n+1} (tomorrow) is conditionally independent of (X_1, ..., X_{n−1}) (the past), given X_n (today). (X_1, ..., X_{n−1}) is not necessarily independent of (X_n, X_{n+1}). A sequence of independent random vectors forms a Markov chain.

Example 1.24 (First-order autoregressive processes). Let ε_1, ε_2, ... be independent random variables defined on a probability space, X_1 = ε_1, and
X_{n+1} = ρX_n + ε_{n+1}, n = 1, 2, ...,
where ρ is a constant in R. Then {X_n} is called a first-order autoregressive process. We now show that, for any B ∈ B and n = 1, 2, ...,
P(X_{n+1} ∈ B | X_1, ..., X_n) = P_{ε_{n+1}}(B − ρX_n) = P(X_{n+1} ∈ B | X_n)
a.s., where B − y = {x ∈ R : x + y ∈ B}, which implies that {X_n} is a Markov chain.
For any y ∈ R,
P_{ε_{n+1}}(B − y) = ∫ I_B(x + y) dP_{ε_{n+1}}(x)
and, by Fubini's theorem, P_{ε_{n+1}}(B − y) is Borel in y. Hence P_{ε_{n+1}}(B − ρX_n) is Borel w.r.t. σ(X_n) and, thus, is Borel w.r.t. σ(X_1, ..., X_n). Let B_j ∈ B, j = 1, ..., n, and A = ∩_{j=1}^n X_j^{-1}(B_j). Since ε_{n+1} = X_{n+1} − ρX_n and ε_{n+1} is independent of (X_1, ..., X_n), it follows from Theorem 1.2 and Fubini's theorem that
∫_A P_{ε_{n+1}}(B − ρX_n) dP = ∫_{{x_j ∈ B_j, j=1,...,n}} P_{ε_{n+1}}(B − ρx_n) dP_{(X_1,...,X_n)} = ∫_{{x_j ∈ B_j, j=1,...,n, x_{n+1} ∈ B}} dP_{(X_1,...,X_{n+1})} = P(A ∩ {X_{n+1} ∈ B}),
where the second equality uses X_{n+1} = ρx_n + ε_{n+1}. Using this and the argument at the end of the proof of Proposition 1.11, we obtain P(X_{n+1} ∈ B | X_1, ..., X_n) = P_{ε_{n+1}}(B − ρX_n) a.s. The proof for P_{ε_{n+1}}(B − ρX_n) = P(X_{n+1} ∈ B | X_n) a.s. is similar and simpler.

Characterizations of Markov chains. Proposition 1.12. A sequence of random vectors {X_n} is a Markov chain if and only if one of the following three conditions holds.
(a) For any n = 2, 3, ... and any integrable h(X_{n+1}) with a Borel function h,
E[h(X_{n+1}) | X_1, ..., X_n] = E[h(X_{n+1}) | X_n] a.s.
(b) For any n = 1, 2, ... and B ∈ σ(X_{n+1}, X_{n+2}, ...),
P(B | X_1, ..., X_n) = P(B | X_n) a.s.
(the past and the future are conditionally independent given the present).
(c) For any n = 2, 3, ..., A ∈ σ(X_1, ..., X_n), and B ∈ σ(X_{n+1}, X_{n+2}, ...),
P(A ∩ B | X_n) = P(A | X_n) P(B | X_n) a.s.

Proof. (i) It is clear that (a) implies (1). If h is a simple function, then (1) and Proposition 1.10(iii) imply (a). If h is nonnegative, then there are nonnegative simple functions h_1 ≤ h_2 ≤ ... ≤ h such that h_j → h; then (1), together with Proposition 1.10(iii) and the monotone convergence of conditional expectations, imply (a). Since h = h_+ − h_−, we conclude that (1) implies (a).
(ii) It is also clear that (b) implies (1). We now show that (1) implies (b). Note that σ(X_{n+1}, X_{n+2}, ...) = σ( ∪_{j=1}^∞ σ(X_{n+1}, ..., X_{n+j}) ) (Exercise 19). Hence, it suffices to show that P(B | X_1, ..., X_n) = P(B | X_n) a.s. for B ∈ σ(X_{n+1}, ..., X_{n+j}), for any j = 1, 2, .... We use induction. The result for j = 1 follows from (1). Suppose that the result holds for any B ∈ σ(X_{n+1}, ..., X_{n+j}). To show the result for any B ∈ σ(X_{n+1}, ..., X_{n+j+1}), it is enough (why?) to show that, for any B_1 ∈ σ(X_{n+j+1}) and any B_2 ∈ σ(X_{n+1}, ..., X_{n+j}),
P(B_1 ∩ B_2 | X_1, ..., X_n) = P(B_1 ∩ B_2 | X_n) a.s.
From the proof in (i), the induction assumption implies
E[h(X_{n+1}, ..., X_{n+j}) | X_1, ..., X_n] = E[h(X_{n+1}, ..., X_{n+j}) | X_n]   (2)
for any Borel function h.
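The first-order autoregressive process of Example 1.24 is easy to simulate. In the sketch below (ρ = 0.6, N(0, 1) innovations, and the path length are illustrative choices), the innovation X_{n+1} − ρX_n equals ε_{n+1}, which is independent of the past; its sample covariance with an earlier state is checked to be near 0, the empirical heart of the Markov property shown above.

```python
# Simulation sketch of Example 1.24: X_1 = eps_1, X_{n+1} = rho*X_n + eps_{n+1}.
import random

random.seed(5)
rho, N = 0.6, 100_000
x = [random.gauss(0.0, 1.0)]
for _ in range(N - 1):
    x.append(rho * x[-1] + random.gauss(0.0, 1.0))

# innovations eps_{n+1} = X_{n+1} - rho*X_n vs. the earlier state X_{n-1}
eps = [x[i + 1] - rho * x[i] for i in range(1, N - 1)]
past = [x[i - 1] for i in range(1, N - 1)]
m_e = sum(eps) / len(eps)
m_p = sum(past) / len(past)
cov = sum((a - m_e) * (b - m_p) for a, b in zip(eps, past)) / len(eps)
assert abs(cov) < 0.025   # innovations are uncorrelated with the past
```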
The result follows from
E(I_{B_1} I_{B_2} | X_1, ..., X_n) = E[ E(I_{B_1} I_{B_2} | X_1, ..., X_{n+j}) | X_1, ..., X_n ]
= E[ I_{B_2} E(I_{B_1} | X_1, ..., X_{n+j}) | X_1, ..., X_n ]
= E[ I_{B_2} E(I_{B_1} | X_{n+j}) | X_1, ..., X_n ]
= E[ I_{B_2} E(I_{B_1} | X_{n+j}) | X_n ]
= E[ I_{B_2} E(I_{B_1} | X_n, ..., X_{n+j}) | X_n ]
= E[ E(I_{B_1} I_{B_2} | X_n, ..., X_{n+j}) | X_n ]
= E(I_{B_1} I_{B_2} | X_n) a.s.,
where the first and last equalities follow from Proposition 1.10(v), the second and sixth equalities follow from Proposition 1.10(vi), the third and fifth equalities follow from (1), and the fourth equality follows from (2).
(iii) Let A ∈ σ(X_1, ..., X_n) and B ∈ σ(X_{n+1}, X_{n+2}, ...). If (b) holds, then
E(I_A I_B | X_n) = E[ E(I_A I_B | X_1, ..., X_n) | X_n ] = E[ I_A E(I_B | X_1, ..., X_n) | X_n ] = E[ I_A E(I_B | X_n) | X_n ] = E(I_A | X_n) E(I_B | X_n),
which is (c). Assume that (c) holds. Let A_1 ∈ σ(X_n), A_2 ∈ σ(X_1, ..., X_{n−1}), and B ∈ σ(X_{n+1}, X_{n+2}, ...). Then
∫_{A_1∩A_2} E(I_B | X_n) dP = ∫_{A_1} I_{A_2} E(I_B | X_n) dP = ∫_{A_1} E(I_{A_2} | X_n) E(I_B | X_n) dP = ∫_{A_1} E(I_{A_2} I_B | X_n) dP = ∫_{A_1∩A_2} I_B dP.
Since disjoint unions of events of the form A_1 ∩ A_2 as specified above generate σ(X_1, ..., X_n), this shows that E(I_B | X_n) = E(I_B | X_1, ..., X_n) a.s., which is (b).

Lecture 39: The method of moments

The method of moments is the oldest method of deriving point estimators. It almost always produces some asymptotically unbiased estimators, although they may not be the best estimators. Consider a parametric problem where X_1, ..., X_n are iid random variables from P_θ, θ ∈ Θ ⊂ R^k, and E|X_1|^k < ∞. Let μ_j = EX_1^j be the jth moment of P and let
μ̂_j = (1/n) Σ_{i=1}^n X_i^j
be the jth sample moment, which is an unbiased estimator of μ_j, j = 1, ..., k. Typically,
μ_j = h_j(θ), j = 1, ..., k,   (1)
function g If h 1 exists then 9 h l If g is continuous at M M1 Mk then is strongly consistent for 0 since i a Mj by the SLLN If g is differentiable at M and Elele lt 00 then is asymptotically normal by the CLT and Theorem 112 and amseiw n llV9MlTViV9M7 where VM is a k gtlt k matrix whose ijth element is M1 7 Li1739 Furthermore the n 1 order asymptotic bias of is lt2ngt1tr mom 1 Example 324 Let X17 7Xn be iid from a population P9 indexed by the parameter 6 M7027 where M EX1 E R and 02 VarX1 E 07 00 This includes cases such as the family of normal distributions7 double exponential distribu tions7 or logistic distributions Table 127 page 20 Since EX1 M and EX VarX1 EX12 02 M27 setting M1 M and M2 02 M2 we obtain the moment estimator X 209 42 X L152 i1 Note that X is unbiased7 but 7 1sz is not If Xi is normal7 then is suf cient and is nearly the same as an optimal estimator such as the UMVUE On the other hand7 if Xi is from a double exponential or logistic distribution7 then is not suf cient and can often be improved Consider now the estimation of 02 when we know that M 0 Obviously we cannot use the equation M1 M to solve the problem Using M2 M2 02 we obtain the moment estimator amp2 M2 714 221 X12 This is still a good estimator when Xi is normal7 but is not a function of suf cient statistic when Xi is from a double exponential distribution For the double exponential case one can argue that we should rst make a transformation Y1 and then obtain the moment estimator based on the transformed data The moment estimator of 02 based on the transformed data is Y2 71 1 21 XZW which is suf cient for 02 Note that this estimator can also be obtained based on absolute moment equations Example 325 Let X17 7Xn be iid from the uniform distribution on 1911927 foo lt 61 lt 62 lt 00 Note that EX1 91 932 and EX 0 63 01623 Setting M1 EX1 and M2 EXl2 and substituting 91 in the second equation by 2M1 7 02 the rst equation7 we obtain that 2 e 622 6 2i1 7 02m 3 which is the same as 92 302 
3012 Since 02 gt EXL7 we obtain that z m 3022 7 f X i82 and l 31 3032 Lil X V 730271 52 These estimators are not functions of the suf cient and complete statistic X17Xn Example 326 Let X17 7 Xn be iid from the binomial distribution Bz39 7k with unknown parameters k E 1727 and p E 01 Since EX1 kp and EXI2 kp1ip k2 2 we obtain the moment estimators Z3 31 i 2 1 1 TASZX and A 7 7 k iii 31 Hi z X1 35290 The estimator f is in the range of 01 But k may not be an integer A It can be improved by an estimator that is k rounded to the nearest positive integer Example 327 Suppose that X17 7 Xn are iid from the Pareto distribution Paa7 9 with unknown 1 gt 0 and 6 gt 2 Table 127 page 20 Note that EX1 at6 71 and EXI2 6126 i 2 From the moment equation7 g 2 A A 9712 Note that 611 7 1 Hence 1 6194 99 i 2 xii 12 Since 6 gt 27 there is a unique solution in the parameter space 1 z zi i1 1 X2SZ and Exercise 108 Let X17 7Xn be a random sample from the following discrete distribution 7 72176 7 7 6 PltX171gt7 279 1309427276 where 6 E 071 is unknown Notethat 21 6 26 2 EXl 7 H777 2 7 6 2 7 6 2 7 639 Hence7 a moment estimator of 6 is 6 21 7 X l7 where X is the sample mean Now that 21 7 0 46 4 46 i 202 i 4 WM 249 t 292 W7 9 21 Mil 1 91M 262 2l22 9l2 2 922 By the central limit theorem and 5 method7 V7167 7d N 0739 The method of moments can also be applied to nonparametric problems Consider7 for example7 the estimation of the central moments CjEX1M1j7 j277k Since I J t 0739 Z JP41 W47 the moment estimator of 07 is where 10 1 It can be shown exercise that which are sample central moments From the SLLN7 ofs are strongly consistent If Elele lt 007 then x 52762775k70k 7d Nk107D where the 2397jth element of the k 7 1 gtlt k 7 1 matrix D is Cij2 01410741 1Ci0j2 j 0014201 1j10i0102 Lecture 34 The projection method Since 73 is nonparametric7 the exact distribution of any U statistic is hard to derive We study asymptotic distributions of U statistics by using the method of projection De nition 33 Let Tn 
be a given statistic based on X_1,...,X_n. The projection of T_n on k_n random elements Y_1,...,Y_{k_n} is defined to be

  Ť_n = E(T_n) + Σ_{i=1}^{k_n} [ E(T_n | Y_i) − E(T_n) ].

Let ψ_n(X_i) = E(T_n | X_i). If T_n is symmetric as a function of X_1,...,X_n, then ψ_n(X_1),...,ψ_n(X_n) are iid with mean E[ψ_n(X_i)] = E[E(T_n | X_i)] = E(T_n). If E(T_n²) < ∞ and Var(ψ_n(X_1)) > 0, then

  [ Σ_{i=1}^n ψ_n(X_i) − n E(T_n) ] / √( n Var(ψ_n(X_1)) ) →_d N(0, 1)  (1)

by the CLT. Let Ť_n be the projection of T_n on X_1,...,X_n. Then

  Ť_n − E(T_n) = Σ_{i=1}^n [ ψ_n(X_i) − E(T_n) ].  (2)

If we can show that T_n − Ť_n has a negligible order of magnitude, then we can derive the asymptotic distribution of T_n by using (1), (2), and Slutsky's theorem. The order of magnitude of T_n − Ť_n can be obtained with the help of the following lemma.

Lemma 3.1. Let T_n be a symmetric statistic with Var(T_n) < ∞ for every n, and let Ť_n be the projection of T_n on X_1,...,X_n. Then E(Ť_n) = E(T_n) and

  E(T_n − Ť_n)² = Var(T_n) − Var(Ť_n).

Proof. Since E(Ť_n) = E(T_n),

  E(T_n − Ť_n)² = Var(T_n) + Var(Ť_n) − 2 Cov(T_n, Ť_n).

From Definition 3.3 with Y_i = X_i and k_n = n,

  Var(Ť_n) = n Var( E(T_n | X_i) ).

The result follows from

  Cov(T_n, Ť_n) = E(T_n Ť_n) − [E(T_n)]²
    = n E[T_n E(T_n | X_i)] − n[E(T_n)]²
    = n E{ E[T_n E(T_n | X_i) | X_i] } − n[E(T_n)]²
    = n E[E(T_n | X_i)]² − n[E(T_n)]²
    = n Var( E(T_n | X_i) ) = Var(Ť_n).

This method of deriving the asymptotic distribution of T_n is known as the method of projection and is particularly effective for U-statistics. For a U-statistic U_n, one can show (exercise) that

  Ǔ_n = E(U_n) + (m/n) Σ_{i=1}^n h̃_1(X_i),  (3)

where Ǔ_n is the projection of U_n on X_1,...,X_n and

  h̃_1(x) = h_1(x) − E[h(X_1,...,X_m)],  h_1(x) = E[h(x, X_2,...,X_m)].

Hence Var(Ǔ_n) = m²ζ_1/n and, by Corollary 3.2 and Lemma 3.1,

  E(U_n − Ǔ_n)² = O(n^{-2}).

If ζ_1 > 0, then (1) holds with ψ_n(X_i) = m h̃_1(X_i), which leads to the result in Theorem 3.5(i) stated later. If ζ_1 = 0, then h̃_1 ≡ 0 and we have to use another projection of U_n. Suppose that ζ_1 = ··· = ζ_{k−1} = 0 and ζ_k > 0 for an integer k > 1. Consider the projection Ũ_n of U_n on the random vectors (X_{i_1},...,X_{i_k}), 1 ≤ i_1 < ··· < i_k ≤ n. We can establish a result similar to that in Lemma 3.1 and show that E(U_n − Ũ_n)² = O(n^{-(k+1)}); see also Serfling (1980), §5.3.4. With these results, we obtain the following theorem.

Theorem 3.5. Let U_n be a U-statistic with E[h(X_1,...,X_m)]² < ∞.
(i) If ζ_1 > 0, then

  √n [U_n − E(U_n)] →_d N(0, m²ζ_1).

(ii) If ζ_1 = 0 but ζ_2 > 0, then

  n [U_n − E(U_n)] →_d (m(m−1)/2) Σ_{j=1}^∞ λ_j (χ²_{1j} − 1),  (4)

where the χ²_{1j}'s are iid random variables having the chi-square distribution χ²_1 and the λ_j's are some constants (which may depend on P) satisfying Σ_{j=1}^∞ λ_j² = ζ_2.

We have actually proved Theorem 3.5(i). A proof of Theorem 3.5(ii) is given in Serfling (1980), §5.5.2. One may derive results for the cases where ζ_2 = 0, but the case of either ζ_1 > 0 or ζ_2 > 0 is the most interesting in applications. If ζ_1 > 0, it follows from Theorem 3.5(i) and Corollary 3.2(iii) that

  amse_{U_n}(P) = m²ζ_1/n = Var(U_n) + o(n^{-1}).

By Proposition 2.4(ii), {n[U_n − E(U_n)]²} is uniformly integrable. If ζ_1 = 0 but ζ_2 > 0, it follows from Theorem 3.5(ii) that amse_{U_n}(P) = E(Y²)/n², where Y denotes the random variable on the right-hand side of (4). The following result provides the value of E(Y²).

Lemma 3.2. Let Y be the random variable on the right-hand side of (4). Then

  E(Y²) = m²(m−1)² ζ_2 / 2.

Proof. Define

  Y_k = (m(m−1)/2) Σ_{j=1}^k λ_j (χ²_{1j} − 1), k = 1, 2, ....

It can be shown (exercise) that {Y_k²} is uniformly integrable. Since Y_k →_d Y as k → ∞, lim_{k→∞} E(Y_k²) = E(Y²) (Theorem 1.8(viii)). Since the χ²_{1j}'s are independent chi-square random variables with E(χ²_{1j}) = 1 and Var(χ²_{1j}) = 2, E(Y_k) = 0 for any k and

  E(Y_k²) = (m(m−1)/2)² Var( Σ_{j=1}^k λ_j χ²_{1j} ) = (m²(m−1)²/4) · 2 Σ_{j=1}^k λ_j² → m²(m−1)² ζ_2 / 2.

It follows from Corollary 3.2(iii) and Lemma 3.2 that

  amse_{U_n}(P) = m²(m−1)² ζ_2 / (2n²) = Var(U_n) + o(n^{-2}).

Again, by Proposition 2.4(ii), the sequence {n²[U_n − E(U_n)]²} is uniformly integrable.

We now apply Theorem 3.5 to the U-statistics in Example 3.11. For U_n = (2/(n(n−1))) Σ_{i<j} X_i X_j, ζ_1 = μ²σ². Thus, if μ ≠ 0, the result in Theorem 3.5(i) holds with ζ_1 = μ²σ². If μ = 0, then ζ_1 = 0, ζ_2 = σ⁴ > 0, and Theorem 3.5(ii) applies. However, it is not convenient to use Theorem 3.5(ii) to find the limiting distribution of U_n. We may derive this limiting distribution using the following technique, which is further discussed in §3.5. By the CLT and Theorem 1.10, n X̄²/σ² →_d χ²_1 when μ = 0, where χ²_1 denotes a random variable having the chi-square distribution χ²_1. Note that

  n X̄² = n^{-1} Σ_{i=1}^n X_i² + (n−1) U_n.

By the SLLN, n^{-1} Σ_{i=1}^n X_i² → σ² a.s. An application of Slutsky's theorem leads to

  n U_n / σ² →_d χ²_1 − 1.

Since μ = 0, this implies that the right-hand side of (4) is σ²(χ²_1 − 1), i.e., λ_1 = σ² and λ_j = 0 when j > 1.

For the one-sample Wilcoxon statistic, ζ_1 = Var(F(−X_1)) > 0 unless F is degenerate. Similarly, for Gini's mean difference, ζ_1 > 0 unless F is degenerate. Hence Theorem 3.5(i) applies to these two cases.
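As a quick numerical companion (not part of the original notes), the sketch below computes the three order-2 U-statistics discussed above by brute force — the kernel h(x_1, x_2) = x_1 x_2 from Example 3.11, the one-sample Wilcoxon kernel I(x_1 + x_2 ≤ 0), and Gini's mean difference |x_1 − x_2| — and checks the algebraic identity n X̄² = n^{-1} Σ X_i² + (n−1) U_n that underlies the Slutsky argument. The helper name `u_stat`, the seed, and the simulated data are illustrative assumptions.

```python
# Illustrative sketch (not from the notes): brute-force order-2 U-statistics.
import numpy as np
from itertools import combinations

def u_stat(x, h):
    """U-statistic of order 2: average of h(x_i, x_j) over all pairs i < j."""
    pairs = list(combinations(x, 2))
    return sum(h(a, b) for a, b in pairs) / len(pairs)

rng = np.random.default_rng(0)   # arbitrary seed, simulated data
x = rng.normal(size=40)          # mu = 0: the degenerate case for h = x1*x2
n = len(x)

Un = u_stat(x, lambda a, b: a * b)                    # kernel from Example 3.11
wilcoxon = u_stat(x, lambda a, b: float(a + b <= 0))  # one-sample Wilcoxon
gini = u_stat(x, lambda a, b: abs(a - b))             # Gini's mean difference

# Identity behind n*U_n/sigma^2 -> chi^2_1 - 1:
#   n*xbar^2 = (1/n)*sum(x_i^2) + (n-1)*U_n   (exact, for any data)
assert abs(n * x.mean() ** 2 - ((x ** 2).sum() / n + (n - 1) * Un)) < 1e-9
```

The O(n²) pairwise loop is just the definition; for Gini's mean difference in practice one would use the standard sorted-data formula instead.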