OVERVIEW OF MATHEMATICAL STATISTICS
OVERVIEW OF MATHEMATICAL STATISTICS STAT 431
This 14 page Class Notes was uploaded by Miss Sabina Grimes on Monday October 19, 2015. The Class Notes belongs to STAT 431 at Rice University taught by Staff in Fall. Since its upload, it has received 7 views. For similar materials see /class/225040/stat-431-rice-university in Statistics at Rice University.
Date Created: 10/19/15
Probability and Inference
What Is Probability? (Thompson)

1 Introduction

In this section we will be dealing with an operator, "probability," applied to collections of events (sets). In terms of the logical steps dealing with sets, we are implicitly going back to Socrates and Aristotle. But we will be using the notation of George Boole (1815-1864). The probabilistic arguments really go back into the realm of folklore, for human beings have dealt with probabilities, crudely or well, for millennia.

What exactly does the television weather forecaster mean when stating "There is a 20% probability that it will rain today"? Hopefully, she is making the following sort of analysis: "Based on all the information I have at this time, around 20% of the time that conditions are as they are today, it will rain."

But, it could be argued, surely the conditions are in place which will cause rain or not. The earth is pretty much a closed system. Consequently, the weatherwoman should be able to say of the next 24 hours, "It will rain" or "It will not rain." Suppose we wished to measure the amount of water which will be required to fill a one liter bottle. The answer is "one liter." Here there is no reason to invoke probabilities.

But, it could be argued, the weather is determined by all the factors which go to make the weather behave as it does. So, if we have the right model, then there is no reason to have probabilities in the weatherwoman's statement as to whether it will rain or not within the next day.

But that is precisely the problem. We do not have a very good model for predicting the weather. We have good models for many physical processes: for example, force really is equal to mass times acceleration, to a very close approximation, unless the object is moving close to the speed of light. But we do not have good models for many processes. For example, we do not know how to predict very well whether a stock, Blahblahbiotech (BBBT), will be, say, 10% higher one year from today. But even with the stock market there are useful models which
will enable us to say things like "There is a 50% chance that a year from today BBBT will have increased at least 10% in value." These models are by no means of Newtonian reliability, but they are getting better. Weather models are getting better too. Today, unlike 20 years ago, we have a much better chance of predicting how many hurricanes will hit the shoreline of the United States.

2 What Is Probability?

But what does it mean, this "20%" probability for rain within the next 24 hours? As we have noted above, it could well mean that conditions are such that, with the crude model available to the weather forecaster, she can say that it will rain in 20% of days with such conditions. The weather forecaster is making a statement about the reliability of her model in the present circumstances. Of course, whether it will rain or not is a physical certainty based on the conditions. But she has no model which will capture this certainty.

Probability, then, can be taken to be a measure of the quality of our models, of our knowledge and ignorance in a given situation. We can, in toy examples, make statements rather precisely. For example, if I toss a coin, it can be said to have a 50% probability of coming up heads. Again, however, if we really knew the precise conditions concerning the tossing of the coin, we could bring the result to a near certainty.

Over many years, a frequency interpretation of probability has become rather common, as in the meaning of the statement of the weather forecaster: "If the conditions of today were duplicated many times, in 20% of the cases it would rain." There are problems with such an interpretation. Suppose we ask the question whether the People's Republic of China (PRC) will launch a nuclear attack against the United States within the next five years. It is hard to consider such a unique event from a frequency standpoint. We could, of course, postulate a fiction of 10,000 parallel worlds just like this one. A 1% chance of the nuclear attack would imply a nuclear
attack in roughly 100 of the 10,000 parallel worlds. Such a fictional construction may be useful, but it is fictional nonetheless.

Another interpretation of probability can be made from the standpoint of betting. Let $A$ be the event that a PRC nuclear attack is launched against the USA during the next five years. Let $A^c$, that is, $A$ complement, be the event that $A$ does not happen. Suppose a Swiss gambler is willing to put up 100 Swiss francs in order to receive 10,000 Swiss francs in the event the PRC launches the attack and nothing if there is no attack. Then, it could be argued that $P(A)$ might be computed via

\[ 100 = P(A) \times 10{,}000 + P(A^c) \times 0. \tag{1} \]

A "fair game" would say that the expected value of the bet (the right hand side of (1)) should be equal to the amount one pays to play it. By such a rule, $P(A)$ is equal to .01, for

\[ 100 = .01 \times 10{,}000 + .99 \times 0. \tag{2} \]

The fact is that we can develop many definitions of what we mean by probability. However, there are certain common agreements concerning probability. For example, consider an election where there are $n$ candidates. Suppose the election gives the victory to the candidate with the most votes (plurality rule). Then $P(A_i)$ is the probability that the $i$th candidate wins the election. One of the candidates must win the election. The probability that two candidates can both win is zero. Therefore

1. $0 \le P(A_i) \le 1$.
2. $P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n) = 1$.
3. For any collection of the candidates, say $i, j$ with $i \ne j$, $P(A_i \cup A_j) = P(A_i) + P(A_j)$.

3 The 2000 Election

In the year 2000, three of the four leading candidates for the presidency of the United States had a residential address in a southern state (Buchanan, Bush and Gore). Two of the candidates (Gore and Nader) were liberals. Let us suppose that other third party candidates are dismissed as persons whose election in 2000 was beyond the realm of possibility. Let us suppose that we would like to find the probability of the winner being a southern liberal.

Table 1. Candidate Probabilities

Candidate   Southerner   Liberal   Prob. of Election
Buchanan    Yes          No        .0001
Bush        Yes          No        .5300
Gore        Yes          Yes       .4689
Nader       No           Yes       .0010

Table 1 is both a probability table and a truth table, i.e., it shows the probability of each candidate winning, and it also shows by "yes" and "no" answers (T and F) whether a candidate has the property of being a southerner and/or a liberal. In order to compute the probability of an event, we need to decompose the event into primitive events, i.e., events which cannot be decomposed further. In this case, we note that the event of a southern liberal is satisfied only by Gore. Thus, we have

\[ P(\text{Southerner} \cap \text{Liberal}) = P(\text{Gore}) = .4689. \tag{3} \]

On the other hand, suppose that we seek the probability that a southern nonliberal wins. The set of southern nonliberals winning includes two primitive events: Buchanan wins and Bush wins. Then

\[ P(\text{Southerner} \cap \text{NonLiberal}) = P(\text{Buchanan} \cup \text{Bush}) = P(\text{Buchanan}) + P(\text{Bush}) = .0001 + .5300 = .5301. \tag{4} \]

During the campaign, one poll showed

Table 2. Percentage Support in Poll

Candidate   Percentage in Poll
Buchanan     1
Bush        48
Gore        45
Nader        6

Based on this poll, can we compute the probability that a particular candidate will win? Some might wrongly look at the poll and suppose that Nader's chance of winning is 6%; after all, he seems to have 6% of the electorate behind him. In fact, given the profile in Table 2, the chance of Nader winning is essentially zero. A candidate wins the US presidential election by garnering a majority of the electoral vote. A candidate who gains a plurality in a state generally wins that state's electoral votes, whose number is obtained by summing the state's number of congresspersons plus two. Generally speaking, a profile like that in Table 2 will guarantee that only Bush and Gore have a chance of winning. Nader and Buchanan will probably gain a plurality in not one single state. If Nader stays in the race, then Bush probably wins, since Nader's votes are largely Democratic. If Nader drops out, then Gore probably wins. And of course, we are looking at a poll taken some time before the election. It would be a stretch to make a guess about the probability that Gore will win the
election. And, again, we note that we are going to have a hard time making a rigid frequency interpretation about this probability. There is only one US presidential election in 2000. On the other hand, there were presidential elections which have had similarities to the situation in 2000. And there were elections for governors and congressmen which are relevant, and the opinions of experts who study elections. To make a rigorous logical statement about the probability Gore will win is extremely difficult. But practicality will require that people attempt to make such statements. Part of living in a modern society requires a practical and instinctive grasp of probability. An acquisition of such a practical understanding is part of the motivation for this course.

4 Conditional Probability

Now let us examine briefly the Venn diagram in Figure 1. John Venn (1834-1923) gave us the Venn diagram as a means of graphical visualization of logical statements. In the above example there are four primitive events, all of which are described by whether $A$ is true or false and whether $B$ is true or false. The nonshaded area is not $B$, or $B$ complement, $B^c$. More generally, if we have a number of events, say 5 of them, then the number of primitive events will be $2 \times 2 \times 2 \times 2 \times 2 = 32$, since everything is characterized by whether an event happens or does not happen. So then, in the full generality, for $n$ events there are $2^n$ primitive events.

Figure 1. Venn Diagram

Another matter we will need to investigate is that of causality. We might ask the question: will there be a big turnout this afternoon at an outdoor political rally? Let us denote this event as $A$. The turnout of the rally is probably dependent on the weather this afternoon. Suppose, then, we consider another event: it starts to rain by one hour before the rally. Call this event $B$. Now we can say that

\[ P(A \cap B) = P(B)\,P(A \mid B). \tag{5} \]

Reading (5) partly in English and partly in mathematical symbols, we are saying: The Probability A happens and B
happens equals the Probability B happens multiplied by the Probability A happens given that B happens. Recalling what $A$ and $B$ represent, we are saying, completely in English this time: The probability there is a big turnout at the rally and that it rains is equal to the probability it rains multiplied by the probability there is a big turnout if it rains.

Stated in this way, it is clear that we are implying that rain can have an effect on the rally. If there were no effect of rain on the rally, then (5) would become simply

\[ P(A \cap B) = P(B)\,P(A). \tag{6} \]

In such a case we would say that $A$ and $B$ are stochastically independent and that the conditional probability $P(A \mid B)$ is simply equal to the marginal probability $P(A)$.

Returning to the situation where $A$ and $B$ are not independent of each other, let us note that the following is logically true:

\[ P(B \cap A) = P(A)\,P(B \mid A). \tag{7} \]

In English: The probability there is a big turnout at the rally and that it rains is equal to the probability there is a big turnout at the rally multiplied by the probability that it rains if there is a big turnout.

This might sound as though we were talking about the turnout having an effect on the climate. Really we are not. We are talking here about concurrence rather than causation. If there is a natural causal event and a natural effect event, then the kind of statements in (5) and (7) speak of concurrence, which may be causation but need not be. The weather may well affect the turnout at the rally (concurrence and causation), but the turnout at the rally does not affect the weather (concurrence only). All these matters can be written rather simply, but they require some contemplation before they become clear.

Returning to our original problem, we might write the probability of a big turnout at the rally as

\[ P(A) = P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c). \tag{8} \]

Reading (8) in English, we are saying: The Probability A happens equals the Probability B happens times the Probability A happens given that B happens, plus the Probability B does not happen times the Probability A
happens given that B does not happen.

An experienced political advisor gives us the information that the probability of a big turnout in the case of rain is 40%, but the probability of a big turnout in the case of no rain is 90%. Making a good guess about $P(A)$ is important, for the public relations team can then know whether to prepare to get the media to cover the event or not. The weather consultant informs us that the probability of rain in the afternoon is 20%. We wish to compute our best estimate as to whether the turnout will be big or not:

\[ P(A) = .20 \times .40 + .80 \times .90 = .08 + .72 = .80. \tag{9} \]

It would appear that the public relations team might well try to get the media to come to the rally.

5 Bayes' Theorem

Now it is reasonable to suppose that rain can have an effect on the size of an outdoor rally. It is less clear that the size of the outdoor rally can have an effect on the weather. However, let us suppose that some years have passed since the rally. Reading in a pile of newspaper clippings, a political scientist reads that the turnout at the rally was large. Nothing is found in the article about what the weather was. Can we make an educated guess as to whether it rained or not? We can try to do this using the equation

\[ P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c)}. \tag{10} \]

This equation is not particularly hard to write down; no fancy mathematical machinery is necessary. But it is one of the most important equations in science. Interestingly, it was not discovered by Newton or Descartes or Pascal or Gauss, great mathematicians all. Rather, it was discovered by an 18th century Presbyterian pastor, Thomas Bayes (1702-1761). The result bears his name: Bayes' Theorem. Let's prove it. First of all, returning to our Venn diagram in Figure 1, we note that it either rains or it does not, that is to say,

\[ B \cup B^c = \Omega. \tag{11} \]

$\Omega$ is the universal set, the set of all possibilities. Clearly, then,

\[ P(B) + P(B^c) = 1. \tag{12} \]

Rewriting (7), we have

\[ P(B \mid A) = \frac{P(B \cap A)}{P(A)}. \tag{13} \]

Next, we substitute the equivalent of $B \cap A$ from (5) into (13) to give

\[ P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(A)}. \tag{14} \]

Returning
to the Venn diagram, we note that

\[ A = A \cap \Omega = A \cap (B \cup B^c) = (A \cap B) \cup (A \cap B^c). \tag{15} \]

Combining (5) with (15), we have

\[ P(A) = P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c). \tag{16} \]

Substituting (16) into the denominator of (14), we have Bayes' Theorem:

\[ P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c)}. \tag{17} \]

Let us see how we might use it to answer the question about the probability it rained on the day of the big rally. Substituting for what we know and leaving symbols in for what we do not, we have

\[ P(B \mid A) = \frac{P(B)(.40)}{P(B)(.40) + P(B^c)(.90)}. \tag{18} \]

And here we see a paradox. In order to compute the posterior probability that it was raining on the afternoon of the big rally, given that there was a big rally that afternoon, we need to have a prior guess as to $P(B)$, that is to say, a guess prior to our reading in the old clipping that there had been a big afternoon rally that day. The philosophical implications are substantial. Bayes wants us to have a prior guess as to what the probability of rain on the afternoon in question was in the absence of the data at hand, namely that there was a big rally. This result was so troubling to Bayes that he never published his results. They were published for him, and in his (Bayes') name, by a friend after his death (a nice friend: many results get stolen from living people, not to mention dead ones).

Why was Bayes troubled by his theorem? Bayes lived during the Enlightenment. After Descartes, it was assumed that everything could be reasoned out "starting from zero." But Bayes' Theorem does not start from zero. It starts with guesses for the probability, which might simply be the prejudices of the person using his formula. Bayes' Theorem, then, was politically incorrect by the standards of his time. Then, as now, political correctness is highly damaging to human progress.

To use Bayes' Theorem, we need to be able to estimate $P(B)$. Of course, if we know $P(B)$, then we know $P(B^c) = 1 - P(B)$. What to do? Well, suppose the rally was in October. We look in an almanac and find that, for the area where the rally was held, it rained on 15% of the days. (We are actually looking for rain in the afternoon, but almanacs are usually not that detailed.) Then we have from (18)

\[ P(B \mid A) = \frac{.15 \times .40}{.15 \times .40 + .85 \times .90} = .0727. \tag{19} \]

(19) is the way to proceed when we have a reasonable estimate of the prior probability $P(B)$. And in very many cases we do have an estimate of $P(B)$. What troubled Bayes was the very idea that, in order to use a piece of data (such as the knowledge that on the day in question there was a big rally) to estimate the probability that it rained on that day, we need to have knowledge of the probability of rain prior to the use of the data. By Enlightenment standards, one should be able to "start from zero." A prior assumption such as $P(B)$ was sort of like bias or prejudice. Bayes came up with a politically correct way out of the dilemma, but the fix always troubled him. The fix is referred to as Bayes' Axiom: in the absence of prior information concerning $P(B)$, assume that $P(B) = P(B^c)$. When we take this step in (18), notice that $P(B)$ and $P(B^c)$ cancel from numerator and denominator, and we are left simply with

\[ P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c)} = \frac{P(A \mid B)}{P(A \mid B) + P(A \mid B^c)} = \frac{.40}{.40 + .90} = .3077. \tag{20} \]

The difference in the answers in (19) and (20) is substantial. To us today, it would appear that (19) is the way to go, and that the assumption of prior ignorance is not a good one to make unless we must. But, as a matter of fact, many of the statistical computations in use today are based on (20) rather than on (19). And, as it turns out, as we gain more and more data, our guesses as to the prior probabilities become less and less important.

Let us now consider a more practical use of Bayes' Theorem. Suppose a test is being given for a disease at a medical center. Historically, 5% of the patients tested for the disease at the center actually have the disease. In 1% of the cases when the patient has the disease, the test incorrectly gives the answer that the patient does not have the disease. Such an error is called a false negative. In 6% of the cases when the patient does not have the disease, the test incorrectly gives the
answer that the patient does have the disease. Such an error is called a false positive. Let us suppose that the patient tests positive for the disease. What is the posterior probability that the patient has the disease? We will use the notation $D^+$ to indicate that the patient has the disease, and $D^-$ that the patient does not have the disease. $T^+$ indicates the test is positive, and $T^-$ indicates the test is negative. Then we have

\[ P(D^+ \mid T^+) = \frac{P(T^+ \mid D^+)\,P(D^+)}{P(T^+ \mid D^+)\,P(D^+) + P(T^+ \mid D^-)\,P(D^-)} = \frac{.99 \times .05}{.99 \times .05 + .06 \times .95} = .4648. \tag{21} \]

Suppose that a physician finds that a patient has tested positive. It is still more likely than not (probability $1 - .4648 = .5352$) that the patient does not have the disease. So, it is likely the physician will tell the patient that it is quite likely the disease is not present, but that, to be on the safe side, the test should be repeated. Suppose this is done and the test is again positive. The new prior probability of the disease being present is now $P(D^+) = .4648$. So, we have

\[ P(D^+ \mid T^+) = \frac{P(T^+ \mid D^+)\,P(D^+)}{P(T^+ \mid D^+)\,P(D^+) + P(T^+ \mid D^-)\,P(D^-)} = \frac{.99 \times .4648}{.99 \times .4648 + .06 \times .5352} = .9348. \tag{22} \]

At this point, the physician advises the patient to enter the hospital for more detailed testing and treatment.

6 Multistate Version of Bayes' Theorem

Typically, there will be more than two possible states. Let us suppose there are $n$ states, $A_1, A_2, \ldots, A_n$. Let us suppose that $H$ is some piece of information (data). Consider the Venn diagram in Figure 2.

Figure 2. Inferential Venn Diagram (regions labeled $A_1, A_2, \ldots, A_n$)

Then it is clear, really, that (17) becomes

\[ P(A_1 \mid H) = \frac{P(A_1)\,P(H \mid A_1)}{\sum_{j=1}^{n} P(A_j)\,P(H \mid A_j)}. \tag{23} \]

As an example, consider the case where a mutual fund manager is deciding where to move 5% of the fund's assets currently in bonds. She wishes to decide whether it is better to invest in chips ($A_1$), Dow listed large cap companies ($A_2$), or utilities ($A_3$). She is inclined to add the chip sector to the fund, but is not certain. Her prior feelings are that she is twice as likely to be better invested in chips than in Dow stocks, and three times as likely to be better invested in chips than in utilities. This gives her prior
probabilities

\[ P(A_1) + \tfrac{1}{2}P(A_1) + \tfrac{1}{3}P(A_1) = 1. \tag{24} \]

Solving, we have, for the prior probabilities,

\[ P(A_1) = \frac{6}{11}, \qquad P(A_2) = \frac{3}{11}, \qquad P(A_3) = \frac{2}{11}. \tag{25} \]

She receives information $H$ that the prime interest rate is likely to rise by one half percent during the next quarter. She feels that

\[ P(H \mid A_1) = .10, \qquad P(H \mid A_2) = .40, \qquad P(H \mid A_3) = .50. \]

How, then, should she revise her estimates about the desirability of the various investments, given the new information on interest rate hikes? From (23), we have

\[ P(A_1 \mid H) = \frac{\frac{6}{11} \times .10}{\frac{6}{11} \times .10 + \frac{3}{11} \times .40 + \frac{2}{11} \times .50} = .2143, \tag{26} \]

\[ P(A_2 \mid H) = \frac{\frac{3}{11} \times .40}{\frac{6}{11} \times .10 + \frac{3}{11} \times .40 + \frac{2}{11} \times .50} = .4286, \tag{27} \]

\[ P(A_3 \mid H) = \frac{\frac{2}{11} \times .50}{\frac{6}{11} \times .10 + \frac{3}{11} \times .40 + \frac{2}{11} \times .50} = .3571. \tag{28} \]

Based on the new information, she reluctantly abandons her prior notion about investing in chips. The increase in interest rates will have a chilling effect on their R&D. She decides to go with a selection of large cap Dow stocks.
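
The multistate update in the fund manager example can be checked numerically. The following is a minimal Python sketch and is not part of the original notes: the function name `posterior` and all variable names are illustrative assumptions. It implements the multistate formula (23) for every state at once, applies it to the priors 6/11, 3/11, 2/11 with likelihoods .10, .40, .50, and then reuses the same function for the two-state rally example with the almanac prior of 15%.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(A_j | H) for every state j, following formula (23).

    priors[j] is P(A_j) and likelihoods[j] is P(H | A_j).
    (Hypothetical helper written for illustration only.)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of (23): P(A_j) * P(H | A_j), one per state.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator of (23): the total probability of the information H.
    total = sum(joint)
    if total == 0:
        raise ValueError("H has probability zero under these inputs")
    return [j / total for j in joint]


# Fund manager example: chips, Dow large caps, utilities.
fund_priors = [Fraction(6, 11), Fraction(3, 11), Fraction(2, 11)]
fund_likelihoods = [Fraction(1, 10), Fraction(4, 10), Fraction(5, 10)]
fund_posterior = posterior(fund_priors, fund_likelihoods)
print([round(float(p), 4) for p in fund_posterior])  # [0.2143, 0.4286, 0.3571]

# Two-state special case: rain (B) versus no rain, given a big turnout,
# with prior P(B) = .15 and likelihoods .40 and .90.
rain_posterior = posterior([0.15, 0.85], [0.40, 0.90])[0]
print(round(rain_posterior, 4))  # 0.0727
```

Exact fractions are used for the fund example so that the posteriors come out as 3/14, 3/7 and 5/14 without rounding error; plain floats would serve equally well for a quick check.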