Class Note for CMPSCI 585 at UMass(5)
Class Note for CMPSCI 585 at UMass(5)
Popular in Course
Popular in Department
This 48 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 19 views.
Reviews for Class Note for CMPSCI 585 at UMass(5)
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 02/06/15
Andrew McCallum UNIass Amherst Probability Lecture 7 Introduction to Natural Language Processing CMPSCI 585 Fall 2007 University of Massachusetts Amherst Andrew McCaIIum Today s Main Points Remember or learn about probability theory samples events tables counting Bayes Rule and its application A little calculus random variables Bernoulli and Multinomial distributions the work horses of Computational Linguistics Multinomial distributions from Shakespeare Probability Theory Probability theory deals with predicting how likely it is that something will happen Toss 3 coins how likely is it that all come up heads See phrase more lies ahead how likely is it that lies is noun See Nigerian minister of defense in email how likely is it that the email is spam See Le chien est noir how likely is it that the correct translation is The dog is black Probability and CompLing Probability is the backbone of modern computational linguistics because Language is ambiguous Need to integrate evidence Simple example which we will revisit later see the first word of a news article glacier What is the probability the language is French EngHsh Now see the second word melange Now what are the probabilities Experiments and Sample Spaces Experiment or trial repeatable process by which observations are made eg tossing 3 coins Observe basic outcome from sample space 9 set of all possible basic outcomes eg Andrew McCallum UMass Amherst one coin toss sample space 9 H T basic outcome H or T three coin tosses Q HHH HHT HTH TTT Partof speech of a word 9 CC1 CD2 CT3 WRB36 lottery tickets Q 107 next word in Shakespeare play Q size of vocabulary number of words in your PhD thesis 9 0 1 oo discrete countably infinite 39continuous uncountany infinite H H length of time of a sounds when I said sample Events and Event Spaces An event A is a set of basic outcomes Le a subset of the sample space 9 Intuitively a question you could ask about an outcome S2 HHH HHT HTH HTT THH THT TTH TTT eg basic outcome THH eg event has exactly 2 H s ATHH HHT HTH AQ is the certain event A is the impossible event For not A we write K A common event space F is the power set of the sample space 9 power set is written 29 lntuitively all possible questions you could ask about a basic outcome Probability A probability is a number between 0 and 1 0 indicates impossibility 1 indicates certainty A probability function P or probability distribution assigns probability mass to events in the event spaceF P F a 01 PQ 1 Countable additivity For disjoint events Aj in F Puj Aj 2 PAj We call PA the probability of event A Welldefined probability space consists of sample space 9 event space F probability function P Probability more intuitively Repeat an experiment many many times Let T number of times Count the number of basic outcomes that are a member of event A Let C this count The ratio CT will approach some unknown but constant value Call this constant the probability of event A write it PA Why is the probability this ratio of counts Stay tuned Maximum likelihood estimation at end Example Counting A coin is tossed 3 times What is the likelihood of 2 heads Experiment Toss a coin three times 9 HHH HHT HTH HTT THH THT TTH TTT Event basic outcome has exactly 2 H s A THH HTH HHT Run experiment 1000 times 3000 coin tosses Counted 373 outcomes with exactly 2 H s Estimated PA 3731000 0373 Example Uniform Distribution A fair coin is tossed 3 times What is the likelihood of 2 heads Experiment Toss a coin three times 9 HHH HHT HTH HTT THH THT TTH TTT Event basic outcome has exactly 2 H s A THH HTH HHT Assume a uniform distribution over outcomes Each basic outcome is equally likely PHHH PHHT PTTT PA A s2 38 0375 Probability again A probability is a number between 0 and 1 0 indicates impossibility 1 indicates certainty A probability function P or probability distribution distributes probability mass of 1 throughout the event spaceF P F a 01 PQ 1 Countable additivity For disjoint events Aj in F PLJj Aj EJ PAj The above are axioms of probability theory Immediate consequences PQ 0 l3A 1 PA A Q B gt PA s PB 2a 69 Pa 1 for a basic outcome Vocabulary Summary Experiment a repeatable process Sample a possible outcome Sample space all samples for an experiment Event a set of samples Probability assigns a probability to each sample distribution Uniform all samples are equiprobable distribution Collaborative Exercise You roll a fair die then roll it again What is the probability that you get the same number from both rolls Explain in terms of event spaces and basic outcomes Joint and Conditional Probability Joint probability of A and B PA m B is usually written PAB Conditional probability of A given B PAB PAB PB S2 Updated probability of an event given some evidence P A prior probability of A PAB posterior probability of A given evidence B Joint Probability Table What does it look like under the hood Pprecipitation temperature sun rain sleet snow 108 009 000 000 001 208 008 000 000 002 308 005 001 001 003 408 006 003 001 000 508 006 004 000 000 608 006 004 000 000 708 007 003 000 000 808 007 003 000 000 908 008 002 000 000 1005 008 002 000 000 it takes 40 numbers Andrew McCallum UMass Amherst What does it look like under the hood Conditional Probability Table Pprecipitation temperature Andrew McCallum UMass Amherst 105 205 305 405 505 605 70s 805 905 1005 SUh 09 08 05 06 06 06 07 07 08 08 rain 00 00 01 03 04 04 03 03 02 02 sleet 00 00 01 01 00 00 00 00 00 00 SHOW 01 02 03 00 00 00 00 00 00 00 it takes 40 numbers Two Useful Rules Multiplication Rule PAB F AIB PB equivalent to conditional probability definition from previous slide Total Probability Rule Sum Rule PA PAB PAB or more generally if B can take on n values PA 2i1n PABi from additivity axiom Bayes Rule PAB PBA since PA o B PB o A Therefore PAB PB PBA PA and thus PAB PBIAPA PB 39 Normalizing constant Bayes Rule lets you swap the order of the dependence between events calculate PAB in terms of PBIA Reverend Thomas Bayes Rumored to have been tutored by De Moivre Was elected a Fellow of the Royal Society in 1742 despite the fact that at that time he had no published works on mathematics Essay towards solving a problem in the doctrine of chances published in the Philosophical Transactions of the Royal Society 1702 1761 of London in 1764 Same year Mozart wrote his symphony 1 in Eflat Andrew McCallum UMass Amherst Independence Can we compute PAB from PA and PB Recall PAB PBA PA multiplication rule We are almost there How does PBA relate to PB PBA PB iff B and A are independent Examples Two coin tosses Color shirt I m wearing today what Bill Clinton had for breakfast Two events A B are independent from each other if PAB PA PB Equivalent to PB PBA if PA i 0 OthenNise they are dependent 1 0s 20s 30s 40s 50s 60s 70s 80s 90s 1 00s Joint Probability with Independence Independence means we need far fewer numbers Pprecipitation temperature SUl39l 009 008 005 006 006 006 007 007 008 008 rain 000 000 001 003 004 004 003 003 002 002 sleet 000 000 001 001 000 000 000 000 000 000 it takes 40 numbers Andrew McCallum UMass Amherst Sl39lOW 001 002 003 000 000 000 000 000 000 000 Pprecipitation Ptemperature 10s 20s 30s 40s 50s 60s 70s 80s 90s 100s sun rain sleet snow 05 03 005 015 01 01 01 01 01 01 01 01 01 01 it takes 14 numbers 03 rim 2 gt9 gt9 gt9 gtV u 2 gt9 gt9 gt9 gtV 2gt9 gt9 gt9 gt3 2 9805 a 2gtwv u 225 IE 03 rim 2 gt9 gt9 gt9 gtV u 2 gt9 gt9 gt9 gtV 2gtN gt9 gt9 gtV 2gt9 gt9 gtV 03 rim 2 gt9 gt9 gt9 gtV u 2 gt9 gt9 gt9 gtV 2gtN gt9 gt9 gtV 2gtw gt9 gtV 15 9232306 PEP m6 m Sqmcmzqm3183 mmo 05 Chain Rule If A1An are all independent from each other PA1 A2 A3 A4 An PA1 PA2 PA3 39F539ltAngt Example Two ways same answer A fair coin is tossed 3 times What is the likelihood of 3 heads Experiment Toss a coin three times 9 HHH HHT HTH HTT THH THT TTH TTT Event basic outcome has exactly 3 H s A HHH Chain rule PHHH PH PHH PHHH PH PH PH 123 18 Size of event spaces PHHH A Q 18 Collaborative Exercise Suppose one is interested in a rare syntactic construction parasitic gaps which occur on average once in 100000 sentences Peggy Linguist has developed a complicated pattern matcher that attempts to identify sentences with parasitic gaps It39s pretty goodbut its not perfect if a sentence has a parasitic gap it will say so with probability 095 if it doesn39t it will wrongly say so with probability 0005 Suppose the test says that a sentence contains a parasitic gap What is the probability that this is true Finding most likely posterior event for example P lies Noun more lies ahead F B Want to find most likely A given B but PB is sometimes a pain to calculate arg maxA PBAPA arg maxA PBAPA PB because B is constant while changing A Random Variables A random variable is a function X 2 a Q in general Q5R but more generally simply Q R makes it easier to talk about numerical values related to event space Random variable is discrete if Q is countable Example coin Q01 die Q16 Called an indicator variable or Bernoulli trial if Q E 01 Example Suppose event space comes from tossing two dice We can define a random variable X that is the sum of their faces X 2 a 212 Because a random variable has a numeric range we can often do math more easily by working with values of the random variable than directly with events Andrew McCallum UMass Amherst Probability Mass Function pXx PAX where AX a E Q Xax Often written just px when X is clear from context H Write X px for X is distributed according to px In English Probability mass function p maps some value x of random variable X to the probability random variable X taking value x equal to the probability of the event AX this event is the set of all basic outcomes a for which the random variable Xa is equal to x Example again Event space roll of two dice eg alt25gt Q36 Random variable X is the sum of the two faces pX4 PA4 A4 lt13gt lt22gt lt31gt PA4 336 Random variables will be used throughout the Introduction to Information Theory coming next class Andrew McCallum UMass Amherst Expected Value is a weighted average or mean of a random variable EX ZxEXQX 39 pX Example X value of one roll of a fair sixsided die EX 1234566 35 X sum of two rolls EX 7 lfY pYy is a random variable then any function gY defines a new random variable with expected value EgY Eye 9 9M 39 py For example let gY aYb then EgY a EY b EXY EX EY if X and Y are independent EXY EX EY Variance Variance written 02 Measures how consistent the value is over multiple trials How much on average the variable s value differs from the its mean VarX EXEX2 Standard deviation VarX 0 Joint and Conditional Probabilities with Random Variables Joint and Conditional Probability Rules Analogous to probability of events Joint probability pxy PXX Yy Marginal distribution px obtained from the joint pxy pX 2y pxy by the total probability rule Bayes Rule pxly pyIX px py Chain Rule pwxyz W pylz pxlyz pwlxyz Parameterized Distributions Common probability mass functions with same mathematical form just with different constants employed A family of functions called a distribution Different numbers that result in different members of the distribution called parameters I0ab Binomial Distribution A discrete distribution with two outcomes S2 0 1 hence bi nomial Make n experiments Toss a coin n times Interested in the probability that rof the n experiments yield 1 Careful It s not a uniform distribution q prob of H n 19R r m rjq 1qquotquot where n r n rr Pictures of Binomial Distribution binomial nq b1001 b1003 04 020 01 010 00 00 012345678910 012345678910 b1005 b1007 020 020 010 010 00 00 012345678910 012345678910 b1009 b10099 04 08 03 39 02 04 I 01 00 00 012345678910 012345678910 Andrew McCallum UMass Amneisi Multinomial Distribution A discrete distribution with m outcomes 9 012 m Make n experiments Examples Roll a msided die n times Assuming each word is independent from the next generate an nword sentence from a vocabulary of size m Interested in the probability of obtaining counts c 61 c2 cm from the n experiments l PC nQ HWQQ m i1 Unigram language model Parameter Estimation We have been assuming that P is given but most of the time it is unknown 80 we assume a parametric family of distributions and estimate its parameters by finding parameter values most likely to have generated the observed data evidence treating the parameter value as a random variable Not the only way of doing parameter estimation This is maximum likelihood parameter estimation Maximum Likelihood Parameter Estimation Example Binomial Toss a coin 100 times observe r heads Assume a binomial distribution Order doesn t matter successive flips are independent One parameter is q probability of flipping a head Binomial gives prnq We know r and n Find arg man prn q Maximum Likelihood Parameter Estimation Example Binomial Toss a coin 100 times observe r heads Assume a binomial distribution Order doesn t matter successive flips are independent One parameter is q probability of flipping a head Binomial gives prnq We know r and n Find arg man prn q Notes for board n likelihood pR r nq qr q quot r log likelihood L logpr nq oclogqr1 q quot rlogq n rlog1 q dL r n r r gtr1qn rq3q 99 q 1 9 Our familiar ratioofoounts is the maximum likelihood estimate Binomial Parameter Estimation Examples Make 1000 coin flips observe 300 Heads PHeads 3001000 Make 3 coin flips observe 2 Heads PHeads 23 Make 1 coin flips observe 1 Tail PHeads 0 Make 0 coin flips PHeads We have some priof belief about PHeads before we see any data After seeing some data we have a posteriof belief Maximum A Posteriori Parameter Estimation We ve been finding the parameters that maximize pdataparameters not the parameters that maximize pparametersdata parameters are random variables pqlm prlnn pqln prlnn pq prn constant And let pq 2 q1q Maximum A Posteriori Parameter Estimation Example Binomial posterior 190 naqpq qu q quot2q1 q r1 10g posterior L OC 10gq 1 q quot1r 110gq n r 110g1 q o7Lr1n r1Ar3911qnr1qq r1 ampq q l q n2 Bayesian Decision Theory We can use such techniques for choosing among models Which among several models best explains the data Likelihood Ratio Pmodel1 data Pdatamodel1 Pmodel1 Pmodel2 data Pdatamodel2 Pmodel2 back to our example French vs English pFrench glacier melange versus pEninsh glacier melange We have real data for Shakespeare s Hamlet Charles Dickens Oliver Twist pHamIet hand death pOIiver hand death Continuing Homework Assignment EC for whatever you hand in by tonight For next week More time to create your own grammar Modify parser to keep trace and print parse trees Try an additional grammar of your own creation and investigate ambiguities Work in small teams Document Classification with Probabilistic Language Modeling Temporal reasoning for I planning has long Testing been studied formally Document V We discuss the semantics of several planning Categories Machine Prog Lang Garbage Learning Flaming Semantics Collection Multimedia Training 39 data Neural networks quotPlannmng based on 1333 for quotMultimedia User and other machine With temporal the semantics Stron 1 ty ed streaming studies learning methods reasonmg of program languages 10quot Video for quot of GUI quot of Classi cation quot has been quot dependencequot Andrew McCallum UMass Amherst GUI A Probabilistic Approach to Classification Nai39ve Bayes Pick the most probable class given the evidence I I I IIIII I I a class like Planning I a document like language intelligence proof Bayes Rule Na39I39ve Bayes III IIII III l39l39Ill39l39 I 39II IIIquot I III39I I39ll I III IIIquot I IIII IIII III I I H I III the ith word in dlike proof Andrew McCallum UNIass Amherst
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'