INTRO STAT INFERENCE STA 525
This 8 page Class Notes was uploaded by Helga Torp Sr. on Friday, October 23, 2015. The Class Notes belongs to STA 525 at the University of Kentucky, taught by Staff in Fall. Since its upload, it has received 21 views.
Stat 5102 Notes: Fisher Information and Confidence Intervals Using Maximum Likelihood
Charles J. Geyer
March 7, 2003

1 One Parameter

1.1 Observed and Expected Fisher Information

Equations (7.8.9) and (7.8.10) in DeGroot and Schervish give two ways to calculate the Fisher information in a sample of size n. DeGroot and Schervish don't mention this, but the concept they denote by I_n(θ) is only one kind of Fisher information. To distinguish it from the other kind, I_n(θ) is called expected Fisher information. The other kind,

    J_n(θ) = −λ_n″(X | θ),    (1.1)

is called observed Fisher information. Note that the right-hand side of (1.1) is just the same as the right-hand side of (7.8.10) in DeGroot and Schervish, except there is no expectation.

It is not always possible to calculate expected Fisher information: sometimes you can't do the expectations in (7.8.9) and (7.8.10). But if you can evaluate the log likelihood, then you can calculate observed Fisher information. Even if you can't do the derivatives, you can approximate them by finite differences. From the definition of the derivative as a limit,

    λ′(x | θ) ≈ [λ(x | θ + ε) − λ(x | θ)] / ε

for small ε. Applying the same idea again gives an approximate second derivative:

    λ″(x | θ) ≈ [λ(x | θ + ε) − 2λ(x | θ) + λ(x | θ − ε)] / ε².    (1.2)

Since the last approximation involves no actual derivatives, it can be calculated whenever the log likelihood can be calculated. The formula is a bit messy for hand calculation; it's better to use calculus when possible. But this finite-difference approximation is well suited to computers. Many statistical computer packages don't know any calculus but can do finite differences just fine.

The relation between observed and expected Fisher information is what should by now be a familiar theme: consistent estimation. If we write out what observed Fisher information actually is, we get

    J_n(θ) = − Σ_{i=1}^n (∂²/∂θ²) log f(X_i | θ).    (1.3)

Since X_1, X_2, … are assumed to be independent and identically distributed, the terms of the sum on the right-hand side of (1.3) are also independent and identically distributed, and the law of large numbers says that their average (not their sum; we need to divide by n to get the average) converges to the expectation of one term, which by definition is I_1(θ); that is,

    (1/n) J_n(θ) →ᴾ I_1(θ).    (1.4)
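The finite-difference formula (1.2) is exactly what a computer package would use. As a concrete sketch (mine, not from the notes; the data values are made up), here it is in Python, applied to the log likelihood of a Normal(θ, 1) sample, for which the exact observed information is J_n(θ) = n for every θ:

```python
import math

def obs_fisher_info(loglik, theta, eps=1e-5):
    """Observed Fisher information J(theta): minus the central second
    difference (1.2) of the log likelihood."""
    second = (loglik(theta + eps) - 2.0 * loglik(theta) + loglik(theta - eps)) / eps**2
    return -second

# Made-up sample from Normal(theta, 1); for this model J_n(theta) = n exactly.
x = [1.2, -0.3, 0.5, 2.1, 0.0]

def loglik(theta):
    return sum(-0.5 * (xi - theta) ** 2 - 0.5 * math.log(2 * math.pi) for xi in x)

print(obs_fisher_info(loglik, 0.7))  # close to n = 5
```

Because the model's log likelihood is quadratic in θ, the central difference is exact up to rounding; for other models the choice of ε trades truncation error against rounding error.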
Or, in sloppy notation, J_n(θ) ≈ n I_1(θ) = I_n(θ). So, for large sample sizes, we can use J_n(θ) and I_n(θ) interchangeably.

1.2 Plug-In

Actually we need another use of plug-in. We don't know θ; otherwise we wouldn't be trying to estimate it. Hence we don't know either J_n(θ) or I_n(θ). We know the functions J_n and I_n, but we don't know the true value of the parameter θ at which we should evaluate them. However (an old theme again), we do have a consistent estimator

    θ̂_n →ᴾ θ,

which implies, by the continuous mapping theorem (Slutsky for a single sequence), under the additional assumption that I_n is a continuous function,

    (1/n) I_n(θ̂_n) →ᴾ I_1(θ).    (1.5a)

The analogous equation for observed Fisher information,

    (1/n) J_n(θ̂_n) →ᴾ I_1(θ),    (1.5b)

doesn't quite follow from Slutsky and continuity of J_n; it really requires that (1.4) be replaced by a so-called uniform law of large numbers, which is way beyond the scope of this course. However, in nice problems both (1.5a) and (1.5b) are true, and so both can be used in the plug-in theorem to estimate the asymptotic variance of the maximum likelihood estimator.

Using sloppy notation, either of the following approximations can be used to construct confidence intervals based on maximum likelihood estimators:

    θ̂_n ≈ Normal(θ, I_n(θ̂_n)⁻¹)    (1.7a)

and the analogous approximation for observed Fisher information,

    θ̂_n ≈ Normal(θ, J_n(θ̂_n)⁻¹).    (1.7b)

The formal, non-sloppy versions of (1.7a) and (1.7b) are

    I_n(θ̂_n)^(1/2) (θ̂_n − θ) →ᴰ Normal(0, 1)    (1.6a)

and

    J_n(θ̂_n)^(1/2) (θ̂_n − θ) →ᴰ Normal(0, 1).    (1.6b)

1.3 Confidence Intervals

The corresponding confidence intervals are

    θ̂_n ± c I_n(θ̂_n)^(−1/2),    (1.8a)

where c is the appropriate z critical value (for example, 1.96 for 95% confidence or 1.645 for 90% confidence), and the analogous interval for observed Fisher information,

    θ̂_n ± c J_n(θ̂_n)^(−1/2).    (1.8b)
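Interval (1.8b) is mechanical once the MLE and the observed information are in hand. A minimal sketch (mine; the data and function name are made up), continuing the Normal(θ, 1) example where the MLE is the sample mean and J_n(θ̂_n) = n:

```python
import math

def wald_interval(theta_hat, obs_info, c=1.96):
    """Confidence interval (1.8b): theta_hat +/- c * J_n(theta_hat)^(-1/2)."""
    half_width = c / math.sqrt(obs_info)
    return theta_hat - half_width, theta_hat + half_width

# Made-up Normal(theta, 1) sample: the MLE is the sample mean, and the
# observed Fisher information is J_n(theta_hat) = n.
x = [1.2, -0.3, 0.5, 2.1, 0.0]
n = len(x)
theta_hat = sum(x) / n
low, high = wald_interval(theta_hat, obs_info=n)
print(low, high)  # 0.7 +/- 1.96 / sqrt(5)
```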
Example 1.1 (Binomial). We redo the binomial distribution. The log likelihood is

    λ(p) = x log p + (n − x) log(1 − p),

and its two derivatives are

    λ′(p) = x/p − (n − x)/(1 − p)

and

    λ″(p) = − x/p² − (n − x)/(1 − p)².    (1.9)

We know from previous work with maximum likelihood that the MLE is p̂_n = x/n. Plugging p̂_n in for p in (1.9) and attaching a minus sign gives the observed Fisher information

    J_n(p̂_n) = x/p̂_n² + (n − x)/(1 − p̂_n)² = n/p̂_n + n/(1 − p̂_n) = n / [p̂_n(1 − p̂_n)].

The expected Fisher information calculation is very similar. Taking minus the expectation of (1.9), using E(X) = np, gives

    I_n(p) = n / [p(1 − p)],

and plugging in the consistent estimator p̂_n for p gives

    I_n(p̂_n) = n / [p̂_n(1 − p̂_n)].

So in this problem J_n(p̂_n) = I_n(p̂_n). Sometimes this happens; sometimes observed and expected information are different. In this problem either of the confidence intervals (1.8a) or (1.8b) turns out to be

    p̂_n ± c √[ p̂_n(1 − p̂_n) / n ],

where c is the z critical value; this is the usual plug-in confidence interval taught in elementary statistics courses.

In our binomial example we didn't really need the asymptotics of maximum likelihood to construct the confidence interval, since more elementary theory arrives at the same interval. But in complicated situations where there is no simple analytic expression for the MLE, there is no other way to get the asymptotic distribution except using Fisher information. An example is given on the web page http://www.stat.umn.edu/geyer/5102/examp/rlike.html which we looked at before, but which now has a new section "Fisher Information and Confidence Intervals".
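The algebra above is easy to check numerically. A short sketch (mine, with made-up counts x = 30, n = 100):

```python
import math

def binom_obs_info(x, n):
    """J_n(p_hat): minus (1.9) evaluated at the MLE p_hat = x / n."""
    p_hat = x / n
    return x / p_hat**2 + (n - x) / (1 - p_hat)**2

def binom_exp_info(p, n):
    """I_n(p) = n / (p (1 - p))."""
    return n / (p * (1 - p))

x, n = 30, 100               # made-up counts
p_hat = x / n
J = binom_obs_info(x, n)
I = binom_exp_info(p_hat, n)
print(J, I)                  # equal, as derived above

# Interval (1.8a) = (1.8b) = the elementary plug-in interval.
c = 1.96
half_width = c * math.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half_width, p_hat + half_width)
```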
2 Multiple Parameters

2.1 Observed and Expected Fisher Information Matrices

The story for maximum likelihood with multiple parameters is almost the same. If the parameter is a vector θ = (θ₁, …, θ_d), then instead of one first derivative we have a vector of first partial derivatives, sometimes called the gradient vector,

    ∇λ_n(X | θ) = ( ∂λ_n(X | θ)/∂θ₁, …, ∂λ_n(X | θ)/∂θ_d ),    (2.1)

and instead of one second derivative we have a matrix of second partial derivatives,

    ∇²λ_n(X | θ) = the d × d matrix with (i, j) element ∂²λ_n(X | θ)/∂θᵢ∂θⱼ.    (2.2)

As in the one-parameter case, we have identities derived by differentiation under the integral sign. The multiparameter analog of (7.8.5) in DeGroot and Schervish is

    E_θ{ ∇λ_n(X | θ) } = 0,    (2.3a)

a vector equation, which if you prefer can be written instead as d scalar equations,

    E_θ{ ∂λ_n(X | θ)/∂θᵢ } = 0,  i = 1, …, d.    (2.3b)

(Recall that the mean of a random vector Y = (Y₁, …, Y_d) is a vector μ = (μ₁, …, μ_d) whose components are the expectations of the components of the random vector; that is, when we write μ = E(Y) we mean the same thing as μᵢ = E(Yᵢ), i = 1, …, d.)

And the multiparameter analog of the equivalence of (7.8.9) and (7.8.10) in DeGroot and Schervish is

    Var_θ{ ∇λ_n(X | θ) } = − E_θ{ ∇²λ_n(X | θ) },    (2.4a)

a matrix equation, which if you prefer can be written instead as d² scalar equations,

    E_θ{ (∂λ_n(X | θ)/∂θᵢ)(∂λ_n(X | θ)/∂θⱼ) } = − E_θ{ ∂²λ_n(X | θ)/∂θᵢ∂θⱼ },  i, j = 1, …, d.    (2.4b)

(Recall that the variance of a random vector Y = (Y₁, …, Y_d) is a matrix M with components mᵢⱼ that are the covariances of the components of the random vector; that is, when we write M = Var(Y) we mean the same thing as mᵢⱼ = Cov(Yᵢ, Yⱼ), i, j = 1, …, d.)

We generally prefer the vector and matrix equations (2.3a) and (2.4a) because they are much simpler to read and write, although we have to admit that this concise notation hides a lot of details.

As in the one-parameter case, expected Fisher information is defined as either side of (2.4a):

    I_n(θ) = Var_θ{ ∇λ_n(X | θ) } = − E_θ{ ∇²λ_n(X | θ) }.    (2.5)

The difference between the one-parameter and many-parameter cases is that in the first the Fisher information is a scalar and in the second it is a matrix. Similarly, we define the observed Fisher information matrix to be the quantity we are taking the expectation of on the right-hand side of (2.4a):

    J_n(θ) = − ∇²λ_n(X | θ).    (2.6)

We also have the multiparameter analog of equation (7.8.12) in DeGroot and Schervish, I_n(θ) = n I₁(θ), and we often write I(θ) with no subscript instead of I₁(θ).

2.2 Plug-In

The multiparameter analogs of (1.5a) and (1.5b),

    (1/n) I_n(θ̂_n) →ᴾ I₁(θ)    (2.7a)

and

    (1/n) J_n(θ̂_n) →ᴾ I₁(θ),    (2.7b)

where θ̂_n is the MLE, hold in nice situations, and as in the one-parameter situation we will be vague about exactly what features of a statistical model make it nice for maximum likelihood theory. This allows us to use the natural plug-in estimators I_n(θ̂_n) and J_n(θ̂_n), which we can calculate, in place of I_n(θ) and J_n(θ), which we can't calculate because we don't know the true value of the parameter θ.
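Definition (2.6) is again computer-friendly: the Hessian can be approximated by finite differences, the multiparameter analog of (1.2). A sketch (mine; the four-point difference formula and the example data are my choices), using a Normal(m, v) sample, for which the observed information matrix at the MLE is diag(n/v̂, n/(2v̂²)):

```python
import math

def numeric_hessian(f, theta, eps=1e-4):
    """Hessian of f at theta by the four-point central difference
    H[i][j] ~ [f(+ei+ej) - f(+ei-ej) - f(-ei+ej) + f(-ei-ej)] / (4 eps^2)."""
    d = len(theta)
    def f_at(i, si, j, sj):
        t = list(theta)
        t[i] += si * eps
        t[j] += sj * eps
        return f(t)
    return [[(f_at(i, 1, j, 1) - f_at(i, 1, j, -1)
              - f_at(i, -1, j, 1) + f_at(i, -1, j, -1)) / (4 * eps**2)
             for j in range(d)] for i in range(d)]

def obs_info_matrix(loglik, theta, eps=1e-4):
    """Observed Fisher information matrix (2.6): minus the Hessian."""
    return [[-h for h in row] for row in numeric_hessian(loglik, theta, eps)]

# Made-up Normal(m, v) sample, evaluated at the MLE (sample mean and
# mean squared deviation); there J_n = diag(n / v_hat, n / (2 v_hat^2)).
x = [1.0, 2.0, 3.0, 4.0]
n = len(x)
m_hat = sum(x) / n
v_hat = sum((xi - m_hat) ** 2 for xi in x) / n

def loglik(theta):
    m, v = theta
    return sum(-0.5 * math.log(v) - (xi - m) ** 2 / (2 * v) for xi in x)

J = obs_info_matrix(loglik, [m_hat, v_hat])
print(J)  # approximately [[n / v_hat, 0], [0, n / (2 v_hat^2)]]
```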
2.3 Multivariate Convergence in Distribution

We didn't actually say what the convergence in probability in equations (2.7a) and (2.7b) means, but it is trivial. For any sequence of random vectors Y₁, Y₂, … and any constant vector a, the statement

    Y_n →ᴾ a

contains no more and no less mathematical content than the d convergence in probability statements

    Y_{ni} →ᴾ aᵢ,  i = 1, …, d,

where Y_n = (Y_{n1}, …, Y_{nd}) and a = (a₁, …, a_d).

The situation with convergence in distribution is quite different. The statement

    Y_n →ᴰ Y,    (2.8)

where Y is now a random vector, contains much more mathematical content than the d convergence in distribution statements

    Y_{ni} →ᴰ Yᵢ,  i = 1, …, d.    (2.9)

When we need to make the distinction, we refer to (2.8) as joint convergence in distribution and to (2.9) as marginal convergence in distribution.

The vector statement (2.8) can actually be defined in terms of scalar statements, but not just d such statements. The joint convergence in distribution statement (2.8) holds if and only if

    t′Y_n →ᴰ t′Y,  for all t ∈ ℝᵈ.

What this means is that we must check an infinite set of convergence in distribution statements: for every constant vector t we must have the scalar convergence in distribution t′Y_n →ᴰ t′Y. However, we don't actually check an infinite set of statements (that would be tough). We usually just use the central limit theorem. And the univariate CLT quite trivially implies the multivariate CLT.

Theorem 2.1 (Multivariate Central Limit Theorem). If X₁, X₂, … is a sequence of independent, identically distributed random vectors having mean vector μ and variance matrix M, and

    X̄_n = (1/n) Σ_{i=1}^n Xᵢ

is the sample mean for sample size n, then

    √n (X̄_n − μ) →ᴰ Normal(0, M).    (2.10)

The trivial proof goes as follows. Let Y be a random vector having the Normal(0, M) distribution, so (2.10) can be rewritten

    √n (X̄_n − μ) →ᴰ Y.    (2.11)

Then for any constant vector t the scalar random variables t′Xᵢ are independent and identically distributed with mean t′μ and variance t′Mt. Hence the univariate CLT says

    √n (t′X̄_n − t′μ) →ᴰ Normal(0, t′Mt),

which can be rewritten

    t′ √n (X̄_n − μ) →ᴰ Normal(0, t′Mt).    (2.12)

But the distribution of
t′Y is Normal(0, t′Mt), so (2.12) can be rewritten

    t′ √n (X̄_n − μ) →ᴰ t′Y.

Since this is true for arbitrary vectors t, this means (2.11) holds.

2.4 Asymptotics of Maximum Likelihood

With multivariate convergence theory in hand, we can now explain the asymptotics of multiparameter maximum likelihood. Actually it looks just like the uniparameter case: you just have to turn scalar quantities into vectors or matrices as appropriate. In nice situations (again being vague about what "nice" means), the multiparameter analogs of (1.7a) and (1.7b) are

    θ̂_n ≈ Normal(θ, I_n(θ̂_n)⁻¹)    (2.14a)

and

    θ̂_n ≈ Normal(θ, J_n(θ̂_n)⁻¹).    (2.14b)

The formal, non-sloppy versions of (2.14a) and (2.14b) are

    I_n(θ̂_n)^(1/2) (θ̂_n − θ) →ᴰ Normal(0, I)    (2.13a)

and

    J_n(θ̂_n)^(1/2) (θ̂_n − θ) →ᴰ Normal(0, I),    (2.13b)

where the limit is the standard multivariate normal distribution (here I is the d × d identity matrix, not Fisher information) and the superscript 1/2 denotes the symmetric square root. Any symmetric positive semidefinite matrix A has a spectral decomposition A = ODO′, where O is orthogonal and D is diagonal (meaning the off-diagonal elements are zero) and positive semidefinite (meaning the diagonal elements are nonnegative). For a diagonal matrix D the meaning of symmetric square root is simple: D^(1/2) is the diagonal matrix whose elements are the square roots of the corresponding elements of D. It is easily verified that D^(1/2) is symmetric and positive semidefinite and D = D^(1/2) D^(1/2). Then the symmetric square root of A is defined by A^(1/2) = O D^(1/2) O′. It is easily verified that A^(1/2) is also symmetric and positive semidefinite and A = A^(1/2) A^(1/2).

The only difference between these equations and the earlier ones is some boldface type. Here the MLE θ̂_n is a vector, because the parameter it estimates is a vector, and the Fisher information, either I_n(θ̂_n) or J_n(θ̂_n) as the case may be, is, as the terminology says, a matrix; this means the inverse operation denoted by the superscript −1 is a matrix inverse. Matrix inversion is hard when done by hand, but we will generally let a computer do it, so it's really not a big deal.
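In practice the symmetric square root is computed with an eigensolver that gives the spectral decomposition directly (for example numpy.linalg.eigh). For a dependency-free sketch (mine), a 2 × 2 symmetric positive semidefinite matrix also admits the closed form A^(1/2) = (A + s I)/t with s = √(det A) and t = √(tr A + 2s), which follows from the Cayley–Hamilton theorem and agrees with the spectral-decomposition definition above:

```python
import math

def sym_sqrt_2x2(a):
    """Symmetric square root of a symmetric positive semidefinite 2x2
    matrix A (not the zero matrix), via (A + s I) / t with s = sqrt(det A)
    and t = sqrt(tr A + 2 s); equivalent to O D^(1/2) O'."""
    (a11, a12), (a21, a22) = a
    s = math.sqrt(a11 * a22 - a12 * a21)
    t = math.sqrt(a11 + a22 + 2 * s)
    return [[(a11 + s) / t, a12 / t],
            [a21 / t, (a22 + s) / t]]

a = [[4.0, 1.0],
     [1.0, 3.0]]
root = sym_sqrt_2x2(a)
prod = [[sum(root[i][k] * root[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod)  # recovers a, up to rounding
```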
2.5 Confidence Intervals

Confidence intervals analogous to (1.8a) and (1.8b) are a bit tricky. When the parameter is a vector, it doesn't fit in an interval, which is a one-dimensional object. So there are two approaches.

- We can generalize our notion of confidence interval to multidimensional random sets, which are called confidence sets or confidence regions. Some theory courses cover this generalization, but I have never seen it actually applied in an actual application.

- What users actually do in multiparameter situations is to focus on confidence intervals for single parameters or for scalar functions of the parameters.

So we will concentrate on linear scalar functions of the parameters, of the form t′θ, which of course we estimate by t′θ̂_n. A special case of this is when t has all components zero except for t_j = 1. Then t′θ is just complicated notation for the j-th component θ_j, and similarly t′θ̂_n is just complicated notation for the j-th component θ̂_{nj}.

So, with that said, the confidence intervals for t′θ analogous to (1.8a) and (1.8b) are

    t′θ̂_n ± c √[ t′ I_n(θ̂_n)⁻¹ t ],    (2.15a)

where c is the appropriate z critical value (for example, 1.96 for 95% confidence or 1.645 for 90% confidence), and

    t′θ̂_n ± c √[ t′ J_n(θ̂_n)⁻¹ t ].    (2.15b)

When we specialize to t with only one nonzero component, t_j = 1, we get

    θ̂_{nj} ± c √[ ( I_n(θ̂_n)⁻¹ )_{jj} ]    (2.16)

and the similar interval with J_n replacing I_n.

It may not be obvious what the notation for the asymptotic variance (I_n(θ̂_n)⁻¹)_{jj} in (2.16) means, so we explain it in words. First you invert the Fisher information matrix, and then you take the (j, j) component of the inverse Fisher information matrix. This can be very different from taking the (j, j) component of the Fisher information matrix, which is a scalar, and inverting that.

Mostly the material in this section is for computer use. We won't even bother with a pencil-and-paper example. See the web page http://www.stat.umn.edu/geyer/5102/examp/rlike.html.
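The warning about (I_n(θ̂_n)⁻¹)_{jj} deserves a numeric illustration. A tiny sketch (mine; the information matrix entries are made up) with a 2 × 2 matrix whose off-diagonal entries make the two computations disagree badly:

```python
# Made-up 2x2 Fisher information matrix with correlated parameter estimates.
I_n = [[10.0, 6.0],
       [6.0, 4.0]]

det = I_n[0][0] * I_n[1][1] - I_n[0][1] * I_n[1][0]   # 40 - 36 = 4
I_inv = [[ I_n[1][1] / det, -I_n[0][1] / det],
         [-I_n[1][0] / det,  I_n[0][0] / det]]

# Asymptotic variance of the first component, as in (2.16):
print(I_inv[0][0])       # (I^-1)_{11} = 4 / 4 = 1.0
# NOT the same as inverting the (1, 1) component of I_n itself:
print(1.0 / I_n[0][0])   # 0.1
```

Inverting first and then taking the (j, j) element accounts for the correlation between the parameter estimates, which is why it gives a larger (and correct) asymptotic variance here.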