# Introduction to Natural Language Processing CMPSCI 585

UMass

These 573 pages of class notes were uploaded by Roman McCullough on Friday, October 30, 2015. The notes are for CMPSCI 585 at the University of Massachusetts, taught by Staff in Fall. Since upload, they have received 17 views. For similar materials see /class/232275/cmpsci-585-university-of-massachusetts in Computer Science at the University of Massachusetts.

Date Created: 10/30/15

## Lecture 9: Noisy Channel, N-grams & Smoothing

Introduction to Natural Language Processing, CMPSCI 585, Fall 2007, University of Massachusetts Amherst. Andrew McCallum.

### Today's Main Points

- Application of the noisy channel model
- Markov models, including the definition of the Markov property
- Smoothing: Laplace, Lidstone's, held-out, Good-Turing

### Noisy Channel Model

A message W, drawn from a finite alphabet, passes through an encoder to produce X, the input to the channel; the channel p(y|x) produces the output Y; a decoder then attempts to reconstruct the message from the output, producing the estimate Ŵ. The encoder is optimized for throughput and accuracy: compression removes all redundancy, while accuracy comes from adding controlled redundancy. The channel capacity is the rate at which we can transmit information with arbitrarily low probability of error:

C = max_{p(X)} I(X; Y)

### Noisy Channel in NLP

In NLP applications we observe the channel output Y and attempt to reconstruct the underlying message:

Ŵ = argmax_W p(W | Y) = argmax_W p(Y | W) p(W)

### Probabilistic Language Modeling

A language model assigns a probability p(t) to a word sequence t = w1 w2 w3 w4 w5 w6. By the chain rule on joint and conditional probabilities,

p(t) = p(w1) p(w2 | w1) ... p(wn | w1, ..., w_{n-1}) = Π_{i=1}^{n} p(wi | w1, ..., w_{i-1})

where

p(wk | w1, ..., w_{k-1}) = p(w1, ..., wk) / p(w1, ..., w_{k-1})

The chain rule leads to a history-based model: we predict following things from past things.

### N-gram Models

N-gram models are the classic example of a statistical model of language. Each word is predicted according to a conditional distribution based on limited context, stored in a conditional probability table (CPT); for example, p(X | both): p(of | both) = 0.066, p(to | both) = 0.041, p(in | both) = 0.038. These are also known as Markov chains: sequences of random variables in which the future variable is determined by the present variable, but is independent of the way in which the present state arose from its predecessors.

A chain over W1, W2, W3, W4 with a first-order Markov model P(wt | w_{t-1}) is the simplest linear graphical model: words are random variables, and arrows are direct dependencies between them (CPTs). These simple engineering models have been amazingly successful.
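As a concrete illustration of the chain rule with a first-order Markov assumption, here is a minimal sketch in Python; the corpus, the function name, and the choice of MLE (relative frequency) estimates are illustrative, not from the slides:

```python
from collections import Counter

def bigram_prob(sentence, corpus):
    """P(sentence) under an MLE bigram model estimated from corpus.

    Chain rule with a first-order Markov assumption:
    p(w1..wn) ~= p(w1) * prod_i p(w_i | w_{i-1}),
    with p(w | v) = c(v, w) / c(v).
    """
    tokens = corpus.split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    words = sentence.split()
    # p(w1) estimated as unigram relative frequency
    p = unigrams[words[0]] / len(tokens)
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p
```

Note that any unseen bigram drives the whole product to zero, which is exactly the problem the smoothing methods in this lecture address.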
### nth-order Markov Models

First-order Markov assumption (bigram):

p(wk | w1, ..., w_{k-1}) ≈ p(wk | w_{k-1})

Similarly, the nth-order Markov assumption; most commonly the trigram (2nd order):

p(wk | w1, ..., w_{k-1}) ≈ p(wk | w_{k-2}, w_{k-1})

### Andrei Andreyevich Markov (1856-1922)

Graduate of Saint Petersburg University (1878), where he became a professor in 1886. Mathematician, teacher, political activist. In 1913, when the government celebrated the 300th anniversary of the House of Romanov, Markov organized a counter-celebration of the 200th anniversary of Bernoulli's discovery of the Law of Large Numbers. Markov was also interested in poetry, and he made studies of poetic style.

### Markov's Model

Markov took 20,000 characters from Pushkin's Eugene Onegin to see if the text could be approximated by a simple chain of characters. Transition probabilities between vowels and consonants:

| from \ to | vowel | consonant |
|---|---|---|
| vowel | 0.128 | 0.872 |
| consonant | 0.663 | 0.337 |

### Markov Approximations to English

Zero-order approximation, P(c):

XFOML RXKXRJFFUJ ZLPWCFWKCRJ FFJEWKCQSGHYD QPAAMKBZAACIBZLHJQD

First-order approximation, P(c|c):

OCRO HL RGWR NWIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBI IVA

Second-order approximation, P(c|c,c):

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE

(From Shannon's original paper.)

### Markov Approximations to English (cont.)

Third-order approximation, P(c|c,c,c):

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTABIN IS REGOACTIONA OF CRE

Markov random field with 1000 features (Della Pietra, Della Pietra & Lafferty 1997):

WAS REASER IN THERE TO WILL WAS BY HOMES THING BE RELOVERATED THER WHICH CONSISTS AT FORES ANDITING WITH PROVERAL THE CHESTRAING FOR HAVE TO INTRALLY OF QUT DIVERAL THIS OFFECT INATEVER THIFER CONTRANDED STATER

### Word-based Approximations

First-order approximation:

representing and speedily is an good apt or come can different natural here he the a in came the to of to expert gray come to furnishes the
line message had be

Second-order approximation:

the head and in frontal attack on an English writer that the character of this point is therefore another method for the letters that the time of who ever told the problem for an unexpected

Shannon's comment (1948): "It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage."

### N-gram Models in Practice

N-gram models are the core language model for the engineering task of better predicting the next word: speech recognition, OCR, context-sensitive spelling correction. Only recently have improvements over them been made for these tasks (Alshawi 96; Wu 97). But linguistically they are appallingly simple and naive.

Why might n-gram models not work? Relationships, say between subject and verb, can be arbitrarily distant and convoluted, as linguists love to point out:

"The man on the sidewalk, without pausing to look at what was happening down the street, and quite oblivious to the situation that was about to befall him, confidently strode into the center of the road."

Why do they work anyway? That kind of thing doesn't happen much. Collins (1997): 74% of dependencies in the Penn Treebank WSJ are with an adjacent word; 95% are with a word less than 5 words away, once one treats simple NPs as units.

### Evaluation of Language Models

The best evaluation of a probability model is task-based. As a substitute for evaluating one component, we standardly use the corpus per-word cross entropy of the model m:

H(w1, ..., wn) = -(1/n) log2 m(w1, ..., wn)

or perplexity (units: average number of choices, scaled so that a uniform distribution over k outcomes has perplexity k; high means unpredictable):

PP(w1, ..., wn) = 2^H = m(w1, ..., wn)^(-1/n)

### Parameter Estimation

The maximum likelihood estimate (MLE) is the relative frequency. It makes the training data as probable as possible, and it overfits:

p_ML(wk | w_{k-1}) = c(w_{k-1} wk) / c(w_{k-1})
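The per-word cross entropy and perplexity above can be computed directly from a model's per-word conditional probabilities; a minimal sketch (the function name is illustrative):

```python
import math

def perplexity(word_probs):
    """Per-word cross entropy H = -(1/n) * sum(log2 p(w_i | history)),
    and perplexity PP = 2**H: the average effective number of choices."""
    n = len(word_probs)
    H = -sum(math.log2(p) for p in word_probs) / n
    return 2 ** H
```

For a uniform distribution over 8 choices the perplexity is exactly 8, matching the "average number of choices" reading; a zero probability anywhere makes the perplexity infinite, which is why the MLE's zeros are so damaging.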
### Limitations of the Maximum Likelihood Estimator

- Problem: we are often infinitely surprised when an unseen word appears: P(unseen) = 0.
- Problem: this happens commonly.
- Probabilities of zero-count words are too low; probabilities of non-zero-count words are too high.
- Estimates for high-count words are fairly accurate; estimates for low-count words are unstable.

We need smoothing.

### Sparsity

How often does an everyday word like "kick" occur in a million words of text? About 10 times ("wrist": about 5), though it depends vastly on the genre, of course. Normally we want to know about something bigger than a single word, like how often "you kick a ball" occurs, or how often the dative alternation "he kicked the baby a toy" occurs. How often can we expect that to occur in 1 million words? Almost never. There's no data like more data (but it must be of the right domain).

### Severity of the Sparse Data Problem

| count | 2-grams | 3-grams |
|---|---|---|
| 1 | 8,045,024 | 53,737,350 |
| 2 | 2,065,469 | 9,229,958 |
| 3 | 970,434 | 3,654,791 |
| >4 | 3,413,290 | 8,728,789 |
| >0 | 14,494,217 | 75,349,888 |
| possible | 6.8 × 10^10 | 1.7 × 10^16 |

(Vocabulary size 260,741 words; 365M words of training data.)

### The Zero Problem

Necessarily some zeros: the trigram model has 1.7 × 10^16 parameters, but only a few hundred million words of training data. How should we distribute some probability mass over all possibilities in the model? In the optimal situation, even the least frequent trigram would occur several times, in order to distinguish its probability from other trigrams'; that optimal situation cannot happen, unfortunately (how much data would we need?). There are two kinds of zeros: p(w|h) = 0, or even p(h) = 0.

### Laplace Smoothing

P_Lap(w | h) = (c(h, w) + 1) / (c(h) + V)

where V is the vocabulary size (assume a fixed, closed vocabulary). This is the Bayesian maximum a posteriori estimator you get by assuming a uniform prior on multinomials, i.e. a Dirichlet prior.

### Dirichlet Distribution

A multinomial is a die: a distribution over a finite alphabet of outcomes. A Dirichlet is a "dice generator": a distribution over multinomials.

Dir(p; α1, ..., αm) = [Γ(Σ_j α_j) / Π_j Γ(α_j)] Π_j p_j^(α_j - 1)

It is a conjugate prior for multinomials. (The slide's plots of example Dirichlet densities for several parameter settings are omitted here.)
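The Laplace estimator above is one line of code; a sketch, with the slide's closed-vocabulary assumption:

```python
def laplace(count_hw, count_h, vocab_size):
    """P_Lap(w | h) = (c(h, w) + 1) / (c(h) + V): add one pseudo-count
    for every word in a fixed, closed vocabulary of size V."""
    return (count_hw + 1) / (count_h + vocab_size)
```

With the numbers used later on the slides (c(h, w) = 8, c(h) = 10, V = 10,000), this gives 9/10,010 ≈ 0.0009 where the MLE gives 0.8, previewing the "too much mass to unseens" problem.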
### Laplace Smoothing: Problem

It gives too much probability mass to unseens. It is not good for a large vocabulary with comparatively little data, i.e. NLP. E.g., with a 10,000-word vocabulary and 1,000,000 words of training data, suppose "comes across" occurs 10 times, and of those, 8 times the next word is "as":

- P_MLE(as | comes across) = 0.8
- P_Lap(as | comes across) = (8 + 1) / (10 + 10000) ≈ 0.0009

### Quick Fix: Lidstone's Law

Also known as Mitchell's (1997) m-estimate:

P_Lid(w | h) = (c(h, w) + λ) / (c(h) + λV),  0 < λ < 1

How much mass should be allocated to unseens? For Laplace smoothing in P(· | comes across), about 10,000/10,010 of the probability mass is given to unseen events. How do we know whether this is too much?

### Absolute Discounting

The idea is to discount the counts of seen things a little and reallocate this probability mass to unseens. By subtracting a fixed count, probability estimates for commonly seen things are scarcely affected, while probabilities of rare things are greatly affected. If the discount is around 0.6-0.75, then seeing something once is not so different from not having seen it at all. (Schematically: P_abs(w | h) = (c(h, w) - δ) / c(h) for seen n-grams, with the subtracted mass spread over the unseen ones.)

### Held-Out Estimator

How do you know how likely you are to see a new word type in the future, in a certain context? Examine some further text and find out: empirical held-out estimation (validation). Divide the data into two pots, training data and validation data. Let:

- N = number of (non-unique) bigram tokens in training
- N_r = number of unique bigrams with frequency r in training
- T_r = number of times that all bigrams appearing r times in the training data appeared in the validation data

Then for a bigram with training count r:

P_ho(w1 w2) = T_r / (N_r · N)

T_r / N_r is an improved estimate of r.

### Pots of Data for Estimating and Testing Models

A major error: testing on your training data. Overfitting: expecting future events to be too much like the events on which the model was trained, rather than allowing sufficiently for other possibilities. Divide the data into training data, validation data, and testing data (development test data and final test data). Don't report results on just one test set, but on the average of many, and report the variance; use a test of statistical significance.
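The held-out computation of T_r/N_r can be sketched in a few lines (the data structures and names are illustrative):

```python
from collections import Counter

def held_out_corrected_counts(train_bigrams, heldout_bigrams):
    """For each training frequency r, compute r* = T_r / N_r:
    N_r = number of bigram types with count r in training,
    T_r = total held-out count of those same types.
    T_r / N_r is an improved estimate of the true expected count."""
    c_train = Counter(train_bigrams)
    c_held = Counter(heldout_bigrams)
    N_r = Counter(c_train.values())
    T_r = Counter()
    for bg, r in c_train.items():
        T_r[r] += c_held[bg]
    return {r: T_r[r] / N_r[r] for r in N_r}
```

Dividing each corrected count by N (the number of training bigram tokens, as defined above) then yields the held-out probability estimate.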
### Held-Out Estimator with Cross-validation

Reshuffle the training data several times into pots of training and validation data, calculate P_ho(w1 w2) for each split, and then average them:

P_del(w1 w2) = (T_r^01 + T_r^10) / (N (N_r^01 + N_r^10)),  where r = c(w1 w2)

(the superscripts index which half was used for training and which for validation). The extreme case of cross-validation is leave-one-out: train on N − 1 of the words, validate on 1 word. How much probability mass would be reserved for the unseen words in this case?

### Good-Turing Smoothing: Derivation

The derivation reflects leave-one-out estimation. For each word token in the data, call it the validation set; the remaining data is the training set. Suppose the validation-set word has count r in the training set. See how often any word has r counts in the training set: how many different words have count r? This happens every time the word left out has r + 1 counts in the original data. So the total count mass of r-count words is assigned from the mass of (r + 1)-count words: N_{r+1} × (r + 1). Apply this to low counts only; it is not needed (and is harmful) for high-count words.

### Good-Turing Smoothing

All words with the same count get the same probability, as before. The count mass of words with r + 1 occurrences is assigned to words with r occurrences. The corrected frequency estimate r* for a word occurring r times is:

r* = (r + 1) N_{r+1} / N_r,  P_GT = r* / N

### Estimated Frequencies in AP Newswire (Church & Gale 1991)

| r = f_MLE | f_empirical | f_Lap | f_del | f_GT |
|---|---|---|---|---|
| 0 | 0.000027 | 0.000137 | 0.000037 | 0.000027 |
| 1 | 0.448 | 0.000274 | 0.396 | 0.446 |
| 2 | 1.25 | 0.000411 | 1.24 | 1.26 |
| 3 | 2.24 | 0.000548 | 2.23 | 2.24 |
| 4 | 3.23 | 0.000685 | 3.22 | 3.24 |
| 5 | 4.21 | 0.000822 | 4.22 | 4.22 |
| 6 | 5.23 | 0.000959 | 5.20 | 5.19 |
| 7 | 6.21 | 0.00109 | 6.21 | 6.21 |
| 8 | 7.21 | 0.00123 | 7.18 | 7.24 |

### Differentiating Based on History

So far the methods considered have all used nothing but the raw frequency of an n-gram: if c(the large) = c(the mauve) = 0, then P(large | the) = P(mauve | the). This doesn't seem right. We should also use the frequency of the (n−1)-gram: p(large) vs. p(mauve).

### Linear Interpolation

Estimate the probability of an n-gram from a weighted average of lower-order and higher-order n-grams:

P_li(wn | w_{n-2}, w_{n-1}) = λ1 P1(wn) + λ2 P2(wn | w_{n-1}) + λ3 P3(wn | w_{n-2}, w_{n-1})

where the λ's sum to 1. Set the λ's by hand, or from held-out data; they can be functions of equivalence-classed histories. Also known as shrinkage (Stein 1957). Works surprisingly well.
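A sketch of linear interpolation over MLE unigram, bigram, and trigram estimates; the fixed λ weights here are placeholders, since in practice they are tuned on held-out data:

```python
from collections import Counter

class InterpolatedLM:
    """Linear interpolation of unigram/bigram/trigram MLE estimates:
    P_li = l1*P1(w) + l2*P2(w|v) + l3*P3(w|u,v), lambdas summing to 1."""

    def __init__(self, tokens, lambdas=(0.2, 0.3, 0.5)):
        assert abs(sum(lambdas) - 1.0) < 1e-9
        self.lambdas = lambdas
        self.n = len(tokens)
        self.uni = Counter(tokens)
        self.bi = Counter(zip(tokens, tokens[1:]))
        self.tri = Counter(zip(tokens, tokens[1:], tokens[2:]))

    def prob(self, w, u=None, v=None):
        """P_li(w | u, v) for trigram context (u, v)."""
        l1, l2, l3 = self.lambdas
        p1 = self.uni[w] / self.n
        p2 = self.bi[(v, w)] / self.uni[v] if v and self.uni[v] else 0.0
        p3 = self.tri[(u, v, w)] / self.bi[(u, v)] if u and self.bi[(u, v)] else 0.0
        return l1 * p1 + l2 * p2 + l3 * p3
```

Because the unigram term backs off the higher orders, a word unseen in a particular trigram context still gets mass as long as it occurred anywhere in training.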
### Assigning Probability to the Unseen Event

At test time we may see a word u that wasn't seen at training time. What is P(u)? One solution: replace all singleton word tokens in the training data with a special token <UNK>.

### Smoothing: the Rest of the Story

Other methods:

- Backoff (Katz 1987): try the four-gram; if zero-count, try the trigram; if zero-count, try the bigram.
- Kneser-Ney (1995): backoff n-gram counts are not proportional to the frequency of the n-gram in the training data, but to the expectation of how often it should occur in a novel trigram, since one only uses the backoff estimate when the trigram is not found.
- Witten-Bell discounting.
- Smoothed maximum entropy models.

See Chen and Goodman (1998) for a survey. Progress in the field is often dominated not by the need to create fancier, more complex models, but by the need to do a good job of estimating parameters for the simpler models we already have.

### Statistical Language Modeling

Noam Chomsky: "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of the term." (1969)

Fred Jelinek: "Anytime a linguist leaves the group the speech recognition rate goes up." (While at the IBM speech group, 1988)

The real benefit comes from targeted enhancements and a sharp tool set of excellent estimation techniques.

### Distinctiveness of NLP as an ML Problem

- Most structure is hidden
- Relational, constraint-satisfaction nature
- Long pipelines with cascading errors
- Large and strange sparse discrete distributions
- Large scale
- Feature-driven, performance-driven

### HW4

As usual, your choice:

- Naive Bayes classifier: spam vs. ham; English vs. French vs. Spanish vs. Klingon
- Sliding-window part-of-speech tagger
- N-gram language model: train it and generate language; use it for spelling correction ("there" vs. "their"); accuracy evaluation

### HW4 Help

The result of running a classifier on a test set is one line per document:

filename true-class pred-class p(pred-class | doc)

Tallying predictions against true classes gives the confusion matrix:

| | true spam | true ham |
|---|---|---|
| pred spam | TP | FP |
| pred ham | FN | TN |

- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1 = harmonic mean of precision and recall

From the same output you can also plot a precision-recall curve and an accuracy-coverage curve.

### HW4 Help: Working with Log-Probabilities

p(c | d) ∝ p(c) Π_i p(w_i | c), so compute log p(c) + Σ_i log p(w_i | c). To get back to p(c | d): subtract a constant to make all values non-positive, then exp() and normalize.

## Lecture 7: Probability

Introduction to Natural Language Processing, CMPSCI 585, Fall 2007, University of Massachusetts Amherst. Andrew McCallum.

### Today's Main Points

- Remember (or learn about) probability theory: samples, events, tables, counting
- Bayes rule and its application
- A little calculus; random variables
- Bernoulli and multinomial distributions, the workhorses of computational linguistics
- Multinomial distributions from Shakespeare

### Probability Theory

Probability theory deals with predicting how likely it is that something will happen.

- Toss 3 coins: how likely is it that all come up heads?
- See the phrase "more lies ahead": how likely is it that "lies" is a noun?
- See "Nigerian minister of defense" in an email: how likely is it that the email is spam?
- See "Le chien est noir": how likely is it that the correct translation is "The dog is black"?

### Probability and CompLing

Probability is the backbone of modern computational linguistics because language is ambiguous and we need to integrate evidence. A simple example, which we will revisit later: we see the first word of a news article, "glacier". What is the probability that the language is French? English? Now we see the second word, "melange". Now what are the probabilities?

### Experiments and Sample Spaces

An experiment (or trial) is a repeatable process by which observations are made, e.g. tossing 3 coins. We observe a basic outcome from the sample space Ω, the set of all possible basic outcomes. Examples:

- One coin toss: Ω = {H, T}; basic outcome H or T
- Three coin tosses: Ω = {HHH, HHT, HTH, ..., TTT}
- Part of speech of a word: Ω = {CC(1), CD(2), DT(3), ..., WRB(36)}
- Lottery tickets: |Ω| = 10^7
- Next word in a Shakespeare play: |Ω| = size of the vocabulary
- Number of words in your PhD thesis: Ω = {0, 1, 2, ...} (discrete, countably infinite)
- Length in time of the sound when I said "sample": Ω = (0, ∞) (continuous, uncountably infinite)

### Events and Event Spaces

An event A is a set of basic outcomes, i.e. a subset of the sample space Ω. Intuitively, it is a question you could ask about an outcome. For Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}: THH is a basic outcome, and "has exactly 2 H's" is the event A = {THH, HHT, HTH}. A = Ω is the certain event; A = ∅ is the impossible event; for "not A" we write Ā. A common event space F is the power set of the sample space, written 2^Ω: intuitively, all possible questions you could ask about a basic outcome.

### Probability

A probability is a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty. A probability function P (or probability distribution) assigns probability mass to events in the event space F:

- P: F → [0, 1]
- P(Ω) = 1
- Countable additivity: for disjoint events Aj in F, P(∪j Aj) = Σj P(Aj)

We call P(A) the probability of event A. A well-defined probability
space consists of the sample space Ω, the event space F, and the probability function P.

### Probability, More Intuitively

Repeat an experiment many, many times; let T be the number of repetitions. Count the number of basic outcomes that are a member of event A; let C be this count. The ratio C/T will approach some unknown but constant value. Call this constant the probability of event A, and write it P(A). Why is the probability this ratio of counts? Stay tuned: maximum likelihood estimation, at the end.

### Example: Counting

A coin is tossed 3 times. What is the likelihood of 2 heads?

- Experiment: toss a coin three times; Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
- Event: the basic outcome has exactly 2 H's; A = {THH, HTH, HHT}
- Run the experiment 1000 times (3000 coin tosses); count 373 outcomes with exactly 2 H's
- Estimated P(A) = 373/1000 = 0.373

### Example: Uniform Distribution

A fair coin is tossed 3 times. What is the likelihood of 2 heads? Same experiment and event as above, but now assume a uniform distribution over outcomes: each basic outcome is equally likely, P(HHH) = P(HHT) = ... = P(TTT). Then

P(A) = |A| / |Ω| = 3/8 = 0.375

### Probability, Again

A probability function P distributes probability mass of 1 throughout the event space F: P: F → [0, 1]; P(Ω) = 1; countable additivity for disjoint events. These are the axioms of probability theory. Immediate consequences:

- P(∅) = 0
- P(Ā) = 1 − P(A)
- A ⊆ B ⇒ P(A) ≤ P(B)
- Σ_{a ∈ Ω} P(a) = 1, summing over basic outcomes

### Vocabulary Summary

- Experiment: a repeatable process
- Sample: a possible outcome
- Sample space: all samples for an experiment
- Event: a set of samples
- Probability distribution: assigns a probability to each sample
- Uniform distribution: all samples are equiprobable

### Collaborative Exercise

You roll a fair die, then roll it again. What is the probability that you get the same number from both rolls? Explain in terms of event spaces and basic outcomes.
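The uniform-distribution computation P(A) = |A|/|Ω| can be checked by brute-force enumeration of the sample space; a minimal sketch:

```python
from itertools import product

def p_exactly_k_heads(n, k):
    """Uniform distribution over basic outcomes: P(A) = |A| / |Omega|
    for the event 'exactly k heads in n fair coin tosses'."""
    omega = list(product("HT", repeat=n))          # all basic outcomes
    A = [w for w in omega if w.count("H") == k]    # the event, as a set of outcomes
    return len(A) / len(omega)
```

For 3 tosses and exactly 2 heads this reproduces the 3/8 = 0.375 from the slides.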
### Joint and Conditional Probability

The joint probability of A and B, P(A ∩ B), is usually written P(A, B). The conditional probability of A given B is

P(A | B) = P(A, B) / P(B)

the updated probability of an event given some evidence: P(A) is the prior probability of A, and P(A | B) is the posterior probability of A given evidence B.

### Joint Probability Table

What does it look like under the hood? P(precipitation, temperature) is a table with one entry per combination of precipitation value (sun, rain, sleet, snow) and temperature decade (10s through 100s), e.g. P(sun, 10s) = 0.09: it takes 40 numbers. (The slide's full table is abbreviated here.)

### Conditional Probability Table

Likewise, P(precipitation | temperature) is a table with one column per temperature decade, each column a distribution over precipitation: it also takes 40 numbers.

### Two Useful Rules

- Multiplication rule: P(A, B) = P(A | B) P(B) (equivalent to the conditional probability definition from the previous slide)
- Total probability rule (sum rule): P(A) = P(A, B) + P(A, B̄); or more generally, if B can take on n values, P(A) = Σ_{i=1..n} P(A, Bi) (from the additivity axiom)

### Bayes Rule

P(A, B) = P(B, A), so P(A | B) P(B) = P(B | A) P(A), and thus

P(A | B) = P(B | A) P(A) / P(B)

where P(B) is the normalizing constant. Bayes rule lets you swap the order of the dependence between events: calculate P(A | B) in terms of P(B | A).

### Reverend Thomas Bayes

Rumored to have been tutored by De Moivre. Was elected a Fellow of the Royal Society in 1742, despite the fact that at that time he had no published works on mathematics. His "Essay towards solving a problem in the doctrine of chances" was published in the Philosophical Transactions of the Royal Society of London in 1764, the same year Mozart wrote his Symphony No. 1 in E-flat.

### Independence

Can we compute P(A, B) from P(A) and P(B)? Recall P(A, B) = P(B | A) P(A) (multiplication rule). We are almost there: how does P(B | A) relate to P(B)? P(B | A) = P(B) iff B and A are independent. Examples: two coin tosses; the color of the shirt I'm wearing today and what Bill Clinton had for breakfast. Two events A, B are independent from each other if P(A, B) = P(A) P(B), equivalent to P(B) = P(B | A) if P(A) ≠ 0. Otherwise they are dependent.
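Bayes rule with the total-probability rule in the denominator fits in a few lines; a minimal sketch (the function and argument names are illustrative):

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) via Bayes rule, expanding the denominator with the
    total probability rule: P(B) = P(B|A)P(A) + P(B|~A)P(~A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b
```

This is the shape of computation needed for the parasitic-gap exercise below: plug in the base rate as P(A) and the detector's two error rates as the conditional likelihoods.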
### Joint Probability with Independence

Independence means we need far fewer numbers. The full joint P(precipitation, temperature) takes 40 numbers; under independence it factors as P(precipitation) P(temperature), which takes only 14 numbers (4 + 10). For example, P(precipitation): sun 0.5, rain 0.3, sleet 0.05, snow 0.15. (The slide's full factored tables are abbreviated here.)

### Chain Rule

If A1, ..., An are all independent from each other:

P(A1, A2, A3, A4, ..., An) = P(A1) P(A2) P(A3) ... P(An)

### Example: Two Ways, Same Answer

A fair coin is tossed 3 times. What is the likelihood of 3 heads? Experiment: toss a coin three times; Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}; event A = {HHH}.

- Chain rule: P(HHH) = P(H) P(H) P(H) = (1/2)^3 = 1/8
- Size of event spaces: P(HHH) = |A| / |Ω| = 1/8

### Collaborative Exercise

Suppose one is interested in a rare syntactic construction, parasitic gaps, which occur on average once in 100,000 sentences. Peggy Linguist has developed a complicated pattern matcher that attempts to identify sentences with parasitic gaps. It's pretty good, but it's not perfect: if a sentence has a parasitic gap, it will say so with probability 0.95; if it doesn't, it will wrongly say so with probability 0.005. Suppose the test says that a sentence contains a parasitic gap. What is the probability that this is true?

### Finding the Most Likely Posterior Event

For example, P(lies = Noun | "more lies ahead"). We want to find the most likely A given B, but P(B) is sometimes a pain to calculate:

argmax_A P(B | A) P(A) / P(B) = argmax_A P(B | A) P(A)

because P(B) is constant while A changes.

### Random Variables

A random variable is a function X: Ω → Q. In general Q ⊆ Rⁿ,
but more simply Q ⊆ R; this makes it easier to talk about numerical values related to the event space. A random variable is discrete if Q is countable: e.g. a coin, Q = {0, 1}, or a die, Q = {1, ..., 6}. It is called an indicator variable, or Bernoulli trial, if Q = {0, 1}.

Example: suppose the event space comes from tossing two dice. We can define a random variable X that is the sum of their faces: X: Ω → {2, ..., 12}. Because a random variable has a numeric range, we can often do math more easily by working with values of the random variable than directly with events.

### Probability Mass Function

p_X(x) = P(A_x),  where A_x = {ω ∈ Ω : X(ω) = x}

Often written just p(x) when X is clear from context. Write X ~ p(x) for "X is distributed according to p(x)". In English: the probability mass function p maps some value x of random variable X to the probability of random variable X taking value x, equal to the probability of the event A_x, the set of all basic outcomes ω for which X(ω) = x.

Example again: the event space is the roll of two dice, e.g. ω = ⟨2, 5⟩, with |Ω| = 36. Random variable X is the sum of the two faces: p_X(4) = P(A_4), where A_4 = {⟨1,3⟩, ⟨2,2⟩, ⟨3,1⟩}, so p_X(4) = 3/36. Random variables will be used throughout the introduction to information theory, coming next class.

### Expected Value

The expected value is a weighted average, or mean, of a random variable:

E[X] = Σ_{x ∈ X(Ω)} x · p(x)

Examples: for X the value of one roll of a fair six-sided die, E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5; for X the sum of two rolls, E[X] = 7. If Y ~ p_Y(y) is a random variable, then any function g(Y) defines a new random variable with expected value

E[g(Y)] = Σ_y g(y) · p_Y(y)

For example, if g(Y) = aY + b, then E[g(Y)] = a E[Y] + b. Also, E[X + Y] = E[X] + E[Y], and E[XY] = E[X] E[Y] if X and Y are independent.

### Variance

Variance, written σ², measures how consistent the value is over multiple trials: how much, on average, the variable's value differs from its mean.

Var(X) = E[(X − E[X])²]

The standard deviation is σ = sqrt(Var(X)).
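E[X] and Var(X) for a discrete random variable can be checked on the fair-die example from the slides; a minimal sketch:

```python
def expectation(pmf):
    """E[X] = sum_x x * p(x) for a discrete random variable
    given as a dict mapping value -> probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])**2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())
```

For a fair six-sided die this gives E[X] = 3.5 and Var(X) = 35/12 ≈ 2.92.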
### Joint and Conditional Probabilities with Random Variables

Analogous to probabilities of events:

- Joint probability: p(x, y) = P(X = x, Y = y)
- Marginal distribution: p(x) = Σ_y p(x, y), obtained from the joint by the total probability rule
- Bayes rule: p(x | y) = p(y | x) p(x) / p(y)
- Chain rule: p(w, x, y, z) = p(z) p(y | z) p(x | y, z) p(w | x, y, z)

### Parameterized Distributions

Common probability mass functions share the same mathematical form, just with different constants employed. A family of such functions is called a distribution; the different numbers that pick out different members of the family are called its parameters.

### Binomial Distribution

A discrete distribution with two outcomes, Ω = {0, 1} (hence "bi-nomial"). Make n experiments, e.g. toss a coin n times, and ask for the probability that r of the n experiments yield 1. Careful: it's not a uniform distribution. With q the probability of heads:

P(r | n, q) = C(n, r) q^r (1 − q)^(n − r),  where C(n, r) = n! / ((n − r)! r!)

(The slide shows plots of b(10, q) for q = 0.1, 0.3, 0.5, 0.7, 0.9, and 0.99.)

### Multinomial Distribution

A discrete distribution with m outcomes, Ω = {o1, o2, ..., om}. Make n experiments: e.g. roll an m-sided die n times, or, assuming each word is independent of the next, generate an n-word sentence from a vocabulary of size m. We are interested in the probability of obtaining counts c = (c1, c2, ..., cm) from the n experiments:

P(c | n, q) = (n! / Π_i c_i!) Π_{i=1}^{m} q_i^{c_i}

This is the unigram language model.

### Parameter Estimation

We have been assuming that P is given, but most of the time it is unknown. So we assume a parametric family of distributions and estimate its parameters by finding the parameter values most likely to have generated the observed data (the evidence), treating the parameter value as a random variable. This is maximum likelihood parameter estimation; it is not the only way of doing parameter estimation.

### Maximum Likelihood Parameter Estimation: Binomial Example

Toss a coin 100 times and observe r heads. Assume a binomial distribution: order doesn't matter, and successive flips are independent. The one parameter is q, the probability of flipping a head. The binomial gives P(r | n, q); we know r and n. Find argmax_q P(r | n, q).
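Before maximizing analytically, the binomial pmf and a numerical check of the maximization can be sketched directly (the grid search is only a sanity check):

```python
from math import comb

def binomial_pmf(r, n, q):
    """P(r | n, q) = C(n, r) * q**r * (1 - q)**(n - r)."""
    return comb(n, r) * q ** r * (1 - q) ** (n - r)

# Likelihood of r = 30 heads in n = 100 tosses, maximized over a grid of q:
q_hat = max((i / 1000 for i in range(1, 1000)),
            key=lambda q: binomial_pmf(30, 100, q))
```

The grid maximum lands at q = 0.3 = r/n, which is exactly what the analytic derivation yields.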
### Maximum Likelihood Parameter Estimation: Derivation

(Notes for the board.) The likelihood is p(r | n, q) ∝ q^r (1 − q)^(n − r). The log likelihood:

L = log p(r | n, q) ∝ log [q^r (1 − q)^(n − r)] = r log q + (n − r) log(1 − q)

∂L/∂q = r/q − (n − r)/(1 − q) = 0  ⇒  q = r/n

Our familiar ratio of counts is the maximum likelihood estimate.

### Binomial Parameter Estimation Examples

- Make 1000 coin flips, observe 300 heads: P(heads) = 300/1000
- Make 3 coin flips, observe 2 heads: P(heads) = 2/3?
- Make 1 coin flip, observe 1 tail: P(heads) = 0???
- Make 0 coin flips: P(heads) = ?

We have some prior belief about P(heads) before we see any data; after seeing some data, we have a posterior belief.

### Maximum A Posteriori Parameter Estimation

We've been finding the parameters that maximize p(data | parameters), not the parameters that maximize p(parameters | data). Parameters are random variables too:

p(q | r, n) = p(r | n, q) p(q | n) / p(r | n) = p(r | n, q) p(q) × constant

Let p(q) ∝ q(1 − q). Then the posterior is

p(q | r, n) ∝ p(r | n, q) p(q) ∝ q^(r+1) (1 − q)^(n−r+1)

log posterior L ∝ (r + 1) log q + (n − r + 1) log(1 − q)

∂L/∂q = (r + 1)/q − (n − r + 1)/(1 − q) = 0  ⇒  q = (r + 1)/(n + 2)

### Bayesian Decision Theory

We can use such techniques for choosing among models: which among several models best explains the data? The likelihood ratio:

P(model1 | data) / P(model2 | data) = [P(data | model1) P(model1)] / [P(data | model2) P(model2)]

Back to our example: p(French | "glacier", "melange") versus p(English | "glacier", "melange"). We have real data for Shakespeare's Hamlet and Charles Dickens's Oliver Twist: p(Hamlet | "hand", "death") vs. p(Oliver | "hand", "death").

### Continuing Homework Assignment

Extra credit for whatever you hand in by tonight. For next week: more time to create your own grammar; modify the parser to keep a trace and print parse trees; try an additional grammar of your own creation and investigate ambiguities; work in small teams.
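The two closed-form estimates (MLE q = r/n; MAP q = (r + 1)/(n + 2) under the prior p(q) ∝ q(1 − q)) fit side by side in a minimal sketch:

```python
def q_mle(r, n):
    """Maximum likelihood estimate for the binomial parameter: q = r / n."""
    return r / n

def q_map(r, n):
    """MAP estimate under the prior p(q) proportional to q(1 - q):
    q = (r + 1) / (n + 2). With no data (n = 0) it falls back to 1/2,
    and it never returns 0 after a single observed tail."""
    return (r + 1) / (n + 2)
```

Comparing the two on the slide's examples shows the MAP estimate's smoothing effect on small samples.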
### Document Classification with Probabilistic Language Modeling

The training data is a collection of documents labeled with categories such as Planning, Semantics, Multimedia, Machine Learning, Programming Languages, Garbage Collection, and Testing; sample snippets include "Temporal reasoning for planning has long been studied formally..." and "We discuss the semantics of several planning...". The task is to assign categories to new test documents. (The slide's illustration of labeled snippets is abbreviated here.)

### A Probabilistic Approach to Classification: Naive Bayes

Pick the most probable class given the evidence, where a class is something like Planning and a document is something like "language intelligence proof...". By Bayes rule, naive Bayes scores each class c for a document d as

P(c | d) ∝ P(c) Π_i P(w_i | c)

where w_i is the ith word in d (like "proof").

## Lecture 6: Chart Parsing

Introduction to Natural Language Processing, CMPSCI 585, Fall 2007, University of Massachusetts Amherst. Andrew McCallum (agglomeration of slides from Jason Eisner).

### Today's Main Points

- Hand back in-class exercise 2
- Apologies: HW1 was not completed by the grader; I'm giving you an extra day, so it is now due Friday
- Motivations and applications of parsing
- Dynamic programming for parsing: CYK
- Some hands-on practice
- Discuss programming assignment 3: implement CYK and build a grammar

### Programming Languages

```c
printf ("charset %s",
        (re_opcode_t) *(p - 1) == charset_not ? "^" : "");
assert (p + *p < pend);
for (c = 0; c < 256; c++)
  if (c / 8 < *p && (p[1 + (c / 8)] & (1 << (c % 8))))
    {
      /* Are we starting a range?  */
      if (last + 1 == c && !inrange)
        {
          putchar ('-');
          inrange = 1;
        }
      /* Have we broken a range?  */
      else if (last + 1 != c && inrange)
        {
          putchar (last);
          inrange = 0;
        }
      if (!inrange)
        putchar (c);
      last = c;
    }
```
Unlike the code above, natural language has no { } ( ) [ ] to indicate scope and precedence; there is lots of overloading, and arity varies; the grammar isn't known in advance; and a context-free grammar is not the best formalism.
The parsing problem: [Figure: test sentences are fed to a parser, which uses a grammar to produce trees; the output trees are scored for accuracy against the correct test trees.]
Applications of parsing (1/2):
Machine translation (Alshawi 1996, Wu 1997): an English tree is mapped to Chinese.
Speech synthesis from parses (Prevost 1996): "The government plans to raise income tax." vs. "The government plans to raise income tax the imagination."
Speech recognition using parsing (Chelba et al. 1998): "Put the file in the folder." vs. "Put the file and the folder."
Applications of parsing (2/2):
Grammar checking (Microsoft).
Indexing for information retrieval (Woods 1997): "washing a car with a hose" indexes to vehicle maintenance.
Information extraction (Hobbs 1996; Miller et al. 2000): NY Times archive, matched against patterns, into a database.
Parsing: State of the Art. Recent parsers are quite accurate, e.g. "A Maximum-Entropy-Inspired Parser" (Eugene Charniak, Proceedings of NAACL 2000) and "Three Generative Lexicalised Models for Statistical Parsing" (Michael Collins, Proceedings of ACL 1997). Most sentences are parsed correctly or with one error.
Last class: we defined a CFG and where it sits in the Chomsky hierarchy; talked about parsing as search through an exponential number of possible trees; gave examples of bottom-up and top-down search; and discussed problems: infinite loops with left-recursive rules, and much duplicated work from backtracking in the exponential space.
Dynamic Programming for Parsing. Given a CFG in Chomsky Normal Form and an input string, we want to search for valid parse trees. What are the intermediate subproblems? What would the dynamic programming table look like?
CKY algorithm (recognizer version). Input: a string of n words. Output: yes/no, since it's only a
recognizer Data structure n x n table rows labeled 0 to n1 columns labeled 1 to n cell ij lists possible constituents spanning words between i and CKY algorithm recognizer version fori 1 to n Add to i1i all partof speech categories for the ith word for width 2 to n for start O to nwidth Define end start width for mid start1 to end1 for every constituent X in startmid for every constituent Y in midend for all ways of combining X and Y if any Add the resulting constituent to startendquotrHt srrotal1eady there Andrew McCallum UMass Amherst time 1 flies 2 like 3 an 4 arrow 5 NP 3 Vst 3 0 NP 4 1 VP 4 P 2 v 3 Det 1 4 OWNI NI NOWH NP 9 time Vst a time NP 9 flies VP 9 flies P a like V a like Det a an N a arrow SeNPVP SeVstNP SeSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 Vst 3 SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP NP4 1 VP4 3 Det 1 OWNI LNI LNCDI I time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP NP4 1 VP4 3 Det 1 OLUNI INI INCDI I time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP NP4 1 VP4 3 Det 1 OLJIJI IIJI IIJCJI L time 1 flies 2 like 3 an 4 arrow 5 NP3 NP10 Vst3 s 8 S13 0 NP4 1 VP4 lt Det 1 OWNl LNl LNmI L SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like u an 4 arrow 5 NP3 NP10 Vst3 s 8 S13 0 NP4 1 VP4 Det 1 OWNI LNHNmH SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like u an 4 arrow 5 NP 3 NP 1o Vst 3 s 8 s 13 o NP 4 1 VP 4 P 2 v 3 Det 1 NP 1o 4 N 3 OWNI LNHNmH SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDdN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like u an 4 arrow 5 NP 3 NP 1o Vst 3 s 8 s 13 o NP 4 1 VP 4 P 2 v 3 Det 1 NP 1o 4 N 8 OWNI LNHNmH SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 1 
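The CKY pseudocode above translates almost line for line into code. A minimal recognizer sketch (the dict-based grammar encoding is my own convention; the rule weights shown in the worked table are omitted, so this answers only yes/no):

```python
# Minimal CKY recognizer for a grammar in Chomsky Normal Form.
# binary maps (B, C) -> set of parents A for rules A -> B C;
# lexical maps a word -> set of its preterminal (part-of-speech) categories.
def cky_recognize(words, binary, lexical, start="S"):
    n = len(words)
    # chart[i][j]: constituents spanning the words between positions i and j
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for width in range(2, n + 1):
        for begin in range(0, n - width + 1):
            end = begin + width
            for mid in range(begin + 1, end):
                for B in chart[begin][mid]:
                    for C in chart[mid][end]:
                        # add every A with a rule A -> B C, if not already there
                        chart[begin][end] |= binary.get((B, C), set())
    return start in chart[0][n]

# The "time flies like an arrow" grammar from the worked example:
binary = {("NP", "VP"): {"S"}, ("Vst", "NP"): {"S"}, ("S", "PP"): {"S"},
          ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"}, ("Det", "N"): {"NP"},
          ("NP", "PP"): {"NP"}, ("NP", "NP"): {"NP"}, ("P", "NP"): {"PP"}}
lexical = {"time": {"NP", "Vst"}, "flies": {"NP", "VP"},
           "like": {"P", "V"}, "an": {"Det"}, "arrow": {"N"}}
print(cky_recognize("time flies like an arrow".split(), binary, lexical))  # True
```

Extending each chart entry to store a best weight and backpointers, as in the worked table, turns this recognizer into a parser.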
VP 4 P PP 12 2 v 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 1 VP 4 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNl Nl le L SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 1 VP 4 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 NP 18 1 VP 4 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 NP 18 1 VP 4 s 21 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OLJIJI IIJI IIJCJI L SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 NP 18 1 VP 4 s 21 VP 18 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 0 NP 4 NP 18 1 VP 4 s 21 VP 18 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 13 0 NP 4 NP 18 1 VP 4 s 21 VP 18 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 0 NP 4 NP 18 1 VP 4 s 21 VP 18 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OLJIJI IIJI IIJCJI L SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 S 27 0 NP 4 NP 18 1 VP 4 s 21 VP 18 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNmI L SeNPVP SVstNP SSPP VPeVNP 
VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 S 27 0 NP 4 NP 18 1 VP 4 s 21 VP 18 P PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 S 27 0 NP 24 NP 4 NP 18 1 VP 4 s 21 VP 18 P 2 PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 S 27 0 NP 24 S 27 NP 4 NP 18 1 VP 4 s 21 VP 18 P 2 PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OLJIJI IIJI IIJCJI L SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 S 27 0 NP 24 S 27 S 22 NP 4 NP 18 1 VP 4 s 21 VP 18 P 2 PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 NP 24 Vst 3 S 8 S 22 S 13 S 27 0 NP 24 S 27 S 22 S 27 NP 4 NP 18 1 VP 4 s 21 VP 18 P 2 PP 12 2 v VP 16 3 Det 1 NP 10 4 N 8 OWNI LNI LNCDI I SeNPVP SVstNP SSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP Follow backpointers time 1 flies 2 like 3 an 4 arrow 5 NP 3 NP 10 Vst 3 S 8 S 13 24 22 27 24 27 22 27 NP4 1 VP4 18 21 18 12 16 Det 1 10 8 OWNI NI NOWH SeNPVP SeVstNP SeSPP VPeVNP VPeVPPP NPeDaN NP gtNPPP NP gtNPNP PPePNP time 1 flies 2 like 3 an 4 arrow 5 quotP VP NP 3 NP 1o NP 24 Vst 3 s 8 s 22 s 13 s 27 o NP 24 s 27 s 22 s 27 1 SeNP VP NP 4 NP 18 6 SeVst NP 1 VP 4 s 21 2 SeSPP VP 18 1 VPeVNP P 2 PP 12 2 VPeVP PP 2 v 5 VP 16 1NPeDetN 3 Det 1 NP 1o 2 NPeNP PP N 8 3 NPeNP NP 0 PP9 PNP time 1 flies 2 like 3 an 4 arrow 5 NP 3 Vst 3 NP 10 S 8 S 13 24 22 27 24 27 22 27 NP4 VP4 18 21 18 12 16 Det 1 10 OUONI LNI LNCDI L s NP VP VP PP SeNPVP SeVstNP SeSPP VPeVNP VPeVPPP NPeDetN NPe NP PP NPe NP NP PPePNP gt0 time 1 flies 2 like 3 an 4 
arrow 5 quotP VP NP 3 NP 1o NP 24 Vst 3 s 8 s 22 VP 7 s 13 s 27 P NP 0 NP 24 s 27 s 22 s 27 1 SeNP VP NP 4 NP 18 6 SeVst NP 1 VP 4 s 21 2 SeSPP VP 18 1 VPeVNP P 2 PP 12 2 VPeVP PP 2 V5 VP 16 1NPeDetN 3 Det 1 NP 1o 2 NPeNP PP N 8 3 NPeNP NP 0 PPe PNP gt0 time 1 flies 2 like 3 an 4 arrow 5 NP VP NP 3 NP 1o NP 24 S 13 S 27 P NP 0 NP 24 A S 27 Det N s 22 s 27 1 SeNP VP NP 4 NP 18 6 SeVst NP 1 VP 4 s 21 2 SeSPP VP 18 1 VPeVNP P2 PP 12 2VPeVPPP 2 V5 VP 16 1NPDetN 3 Det 1 NP 1o 2 NPeNP PP N 3 3 NPeNP NP 0 PPe PNP CMPSCI 585 Inclass Exercise 3 Name Student ID Fill in the CYK dynamic programming table to parse the sentence below In the bottom right corner draw the two parse trees she eats fish with chop sticks s a NP VP 0 1 2 3 4 5 NP 9 NP PP VP 9 V NP VP 9 VP PP O NP PP a P NP 1 2 3 4 Andrew McCallum UMass Amherst NP 9 she NP 9 fish NP a fork NP 9 chopsticks V a eats V a fish P a with CMPSCI 591 N Inclass Exercise 3 Name Student ID Fill in the CYK dynamic programming table to parse the sentence below In the bottom right corner draw the two parse trees she eats fish with chop sticks s a NP VP NP 9 she 0 1 2 3 4 5 NP 9 NP PP NP 9 fish VP 9 V NP NP a fork S NP VP VP 9 VP PP NP 9 chopsticks O NP 8 NPVP 8 PP a P NP V a eats NP VP V a fish P a with 1 V VNP VNP VP VP PP NP 2 V NP PP 3 P PNP NP 4 Andrew McCallum UMass Amherst Homework 3 Implement CYK Create a grammar Experiment with it Due Thursday Dynamic Programming Parsing 2 How about a dynamic programming solution for arbitrary CGF grammars Grammars not in Chomsky Normal Form Earley Parser 1970 Nice combination of dynamic programming incremental interpretation avoids infinite loops no restrictions on the form of the contextfree grammar A gt B C the D of causes no problems On3 worst case but faster for many grammars Uses left context and optionally right context to constrain search Overview of the Algorithm Finds constituents and partial constituents in input A e B C D E is partial only the first half of the A A x x BC DxE D At 
A s x s s s s B C D AA AA A ABCDE ABCDE Overview of the Algorithm Proceeds incrementally lefttoright Before it reads word 5 it has already built all hypotheses that are consistent with first 4 words Reads word 5 amp attaches it to immediately preceding hypotheses Might yield new constituents that are then attached to hypotheses immediately preceding them EgattachintherBCDEgivesAeBCDE Attaching E to that gives A a B C D E Now we have a complete A that we can attach to hypotheses immediately preceding the A etc Andrew McCallum UMass Amherst The Parse Table Columns 0 through n corresponding to the gaps between words Entries in column 5 look like 3 NP 9 NP PP but we ll omit the 9 etc to save space Built while processing word 5 Means that the input substring from 3 to 5 matches the initial NP portion of a NP gt NP PP rule Dot shows how much we ve matched as of column 5 Perfectly fine to have entries like 3 VP gt is it true that S Andrew McCallum UMass Amherst The Parse Table Entries in column 5 look like 3 NP 9 NP PP What will it mean that we have this entry Unknown right context Doesn t mean we ll necessarily be able to find a VP starting at column 5 to complete the S Known left context Does mean that some dotted rule back in column 3 is looking for an S that starts at 3 So if we actually do find a VP starting at column 5 allowing us to complete the S then we ll be able to attach the S to something And when that something is complete it too will have a customer to its left In short a topdown ie goaldirected parser it chooses to start building a constituent not because of the input but because that s what the left context needs In the spoon won t build spoon as a verb because there s no way to use a verb there So any hypothesis in column 5 could get used in the correct parse if words 15 are continued in just the right way by words 6n Andrew McCallum UMass Amherst Earley s Algorithm recognizer version Add ROOT gt S to column 0 For each j from O to n For each dotted rule 
in column j including those we add as we go look at what s after the dot If it s a word w SCAN lfw matches the input word between and j1 advance the dot and add the resulting rule to column j1 If it s a nonterminal X PREDICT Add all rules forX to the bottom of column j wth the dot at the start eg X gt YZ If there s nothing after the dot ATTACH We ve finished some constituent A that started in column ltj So for each rule in column j that has A after the dot Advance the dot and add the result to the bottom of column j Output yes just if last column has ROOT gt S NOTE Don t add an entry to a column if it s already there Andrew McCallum UMass Amherst Summary of the Algorithm Process all hypotheses one at a time in order Current hypothesis is shown in blue This may add to the end of the todo list or try to add i l again Process a hypothesis according to what follows the dot If a word scan input and see if it matches If a nonterminal predict ways to match it we ll predict blindly but could reduce of predictions by looking ahead k symbols in the input and only making predictions that are compatible with this limited right context If nothing then we have a complete constituent so attach it to all its customers AGmmmm S e NP VP NP 9 Papa NPe Det N N e caviar NPe NP PP N a spoon VP 9 V NP V 9 ate VP 9 VP PP P a with PPe P NP Det a the Det a a An Input Sentence Papa ate the caviar With a spoon initialize Remember this stands for O ROOT a S o OROOTS predict the kind of S we are looking for Remember this stands for O S a NP VP 0 OROOTS OS NP VP predict the kind of NP we are looking for actually we I look for 3 kinds any of the 3 will do 0 OROOTS OSNPVP 0NPDetN 0NPNPPP 0NPPapa predict the kind of Det we are looking for 2 kinds 0 OROOTS OSNPVP predict the kind of NP we re looking for but we were already looking for these so 0 Bet the don t add duplicate goals Note that this happened 0 Bet a when we were processing a leftrecursive rule 0 Papa 1 OROOTS OSNP VP 0NPDetN ONP NP PP ONP 
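The three operations (scan, predict, attach) can be sketched directly from the pseudocode above. A minimal recognizer (the item and grammar encodings are my own; a real implementation would index "customers" more cleverly):

```python
# Compact Earley recognizer. An item (lhs, rhs, dot, origin) in chart[j]
# means: rhs[:dot] matches the words from column origin to column j.
# Grammar encoding: nonterminal -> list of right-hand-side tuples; any
# symbol that is not a key of the grammar is treated as a terminal word.
def earley_recognize(words, grammar, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(col, item):               # don't add an entry already there
        if item not in chart[col]:
            chart[col].append(item)

    add(0, ("ROOT", (start,), 0, 0))
    for j in range(n + 1):
        i = 0
        while i < len(chart[j]):      # the column may grow as we process it
            lhs, rhs, dot, origin = chart[j][i]
            i += 1
            if dot < len(rhs) and rhs[dot] in grammar:       # PREDICT
                for prod in grammar[rhs[dot]]:
                    add(j, (rhs[dot], prod, 0, j))
            elif dot < len(rhs):                             # SCAN
                if j < n and words[j] == rhs[dot]:
                    add(j + 1, (lhs, rhs, dot + 1, origin))
            else:                                            # ATTACH
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(j, (l2, r2, d2 + 1, o2))
    return ("ROOT", (start,), 1, 0) in chart[n]

# The "Papa ate the caviar with a spoon" grammar from the trace:
grammar = {"S": [("NP", "VP")],
           "NP": [("Det", "N"), ("NP", "PP"), ("Papa",)],
           "VP": [("V", "NP"), ("VP", "PP")],
           "PP": [("P", "NP")],
           "N": [("caviar",), ("spoon",)],
           "V": [("ate",)], "P": [("with",)],
           "Det": [("the",), ("a",)]}
print(earley_recognize("Papa ate the caviar with a spoon".split(), grammar))  # True
```

Left-recursive rules such as NP -> NP PP cause no trouble here: the duplicate check stops PREDICT from looping, which is exactly why Earley's algorithm is okay where pure top-down parsing is not.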
Papa scan the desired word is in the input 0Detthe 0Det a 0 Papa 1 OROOTS 0 NP Papa OSNPVP 0NPDetN 0NPNPPP 0 NP Papa O Det the scan failure 0Deta 0 Papa 1 OROOTS 0 NP Papa OSNPVP 0NPDetN 0NPNPPP 0 NP Papa 0 Det the 0Deta scan failure 0 Papa 1 ONP Papa OROOTS OSNPVP 0NPDetN 0NPNPPP 0NPPapa 0Detthe 0Deta attach the newly created NP which starts at 0 to its customers incomplete constituents that end at O and have NP after the clot 0 Papa 1 0 ROOT S 0 NP Papa OSNPVP OSNPVP predict 0NPDetN 0NPNPPP ONP NP PP 0 NP Papa 0 Det the 0 Det a 0 Papa 1 OROOTS ONP Papa OSNPVP OSNPVP 0NPDetN 0NPNPPP 0NPNPPP 1VPVNP 0NPPapa 1VPVPPP 0Detthe 0Deta predict 0 Papa 1 OROOTS ONP Papa OSNPVP OSNPVP 0NPDetN 0NPNPPP 0NPNPPP 1VPVNP 0NPPapa 1VPVPPP 0Detthe 1PPPNP 0Deta predict 0 Papa 1 0 ROOT S 0 NP Papa OSNPVP OSNPVP 0NPDetN 0NPNPPP ONP NP PP 0 NP Papa predict 0Detthe 1PPPNP 0Det a 1Vate 0 Papa 1 OROOTS ONP Papa OSNPVP OSNPVP 0NPDetN 0NPNPPP 0NPNPPP 1VPVNP 0NPPapa 1VPVPPP 0Detthe 1PPPNP 0Deta 1Vate predict 0 Papa 1 ate 2 0 ROOT S 0 NP Papa OSNPVP OSNPVP 0NPDetN 0NPNPPP 0NPNPPP 1VPVNP 0NPPapa 1VPVPPP 0Detthe 1PPPNP 0 Det a 1 v ate scan success 1PWith 0 Papa 1 ate 2 OROOTS ONP Papa 1Vate OSNPVP OSNPVP 0NPDetN 0NPNPPP 0NPNPPP 1VPVNP 0NPPapa 1VPVPPP 0Detthe 1PPPNP 0Deta 1Vate 1Pwith scan fai lure 0 Papa 1 ate 2 0 ROOT S 0 NP Papa 1 V ate attach OSNPVP OSNPVP 0NPDetN 0NPNPPP 0NPNPPP 1VPVNP 0NPPapa 1VPVPPP 0Detthe 1PPPNP 0Deta 1Vate 1PWith 0 OROOTS OSNPVP 0NPDetN 0NPNPPP 0NPPapa 0 Detthe 0Deta Papa 2 1Vate 1VPVNP 1 ate 0NPPwa OSNPVP 0NPNPPP 1VPVNP 1VPVPPP 1PPPNP 1Vam 1 Pwnh predict 0 Papa 1 ate 2 OROOTS ONP Papa 1Vate OSNPVP OSNPVP 1VPVNP 0NPDetN 0NPNPPP 2NPDetN 0NPNPPP 1VPVNP 2NPNPPP 0NPPapa 1VPVP PP 2NPPapa 0Detthe 1PPPNP I 0Deta 1Vate 1PWith predict these next few steps should look familiar 0 Papa 1 ate 2 OROOTS ONP Papa 1Vate OSNPVP OSNPVP 1VPVNP 0NPDetN 0NPNPPP 0NPNP PP 1VPVNP predict 0NPPapa 1VPVPPP 0Detthe 1PPPNP 2Detthe 0Deta 1Vate 2Deta 1Pwith 0 Papa 1 ate 2 OROOTS ONP Papa 1Vate OSNPVP 
OSNPVP 1VPVNP 0NPDetN 0NPNPPP 2NPDetN 0NPNPPP 1VPVNP 2NPNPPP 0NPPapa 1VPVP PP 2NPPapa 0Detthe 1PPPNP 2Detthe 0Deta 1Vate 2Deta 1PWith scan this time we fail since Papa is not the next word 0 Papa 1 ate 2 the 3 OROOTS ONP Papa 1Vate i OSNPVP OSNPVP 1VPVNP 0NPDetN 0NPNPPP 2NPDetN 0NPNPPP 1VPVNP 2NPNPPP 0 NP Papa 1 VP VP PP 2 NP Papa 0 Det the 1 PP P NP 2 Det the scan success 0Deta 1Vate 2Deta 1PWith 0 Papa 1 ate 2 the 3 OROOTS ONP Papa 1Vate 2Detthe OSNPVP OSNPVP 1VPVNP 0NPDetN 0NPNPPP 2NPDetN 0NPNPPP 1VPVNP 2NPNPPP ONP Papa 1VPVP PP 2NP Papa 0Detthe 1PPPNP 2Detthe 0Deta 1Vate 2Deta 1PWith 0 Papa 1 ate 2 the 3 OROOTS 0NPPapa 1Vate 2Detthe OSNPVP OSNPVP 1VPvNP 0NPD N 0NPNPPP 2NPDaN 0NPNPPP 1VPVNP 2NPNPPP 0NPPapa 39IVPVPPP 2NPPapa ODame 1PPPNP 2Dame ODaa 1Vae 2Daa 1PMm 0 Papa 1 ate 2 the 3 OROOTS ONP Papa 1Vate 2Detthe OSNPVP OSNPVP 1VPVNP 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 0NPNPPP 1VPVNP 2NPNPPP ONP Papa 1VPVP PP 2NP Papa 0Detthe 1PPPNP 2Detthe 0Deta 1Vate 2Deta 1PWith 0 Papa 1 ate 2 the 3 caviar 4 OROOTS 0NPPapa 1Vate 2Detthe i OSNPVP OSNPVP 1VPVNP 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Nucaviar 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 0NPPapa 1VPVPPP 2NPPapa 0 Det the 1 PP P NP 2 Det the 0Deta 1Vate 2Deta 1PWith 0 Papa 1 ate 2 the 3 caviar 4 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 0NPPapa 1VPVPPP 2NPPapa 0 Det the 1 PP P NP 2 Det the 0Deta 1Vate 2Deta 1PWith 0 Papa 1 ate 2 the 3 caviar 4 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar OSNPVP OSNPVP 1VPvNP 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 0NPPapa 1VPVPPP 2NPPapa 0 Det the 1 PP P NP 2 Det the 0Deta 1Vate 2Deta 1PWith attach 0 Papa 1 ate 2 the 3 caviar 4 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 0NPPapa 1VPVPPP 2NPPapa 0 Det the 1 PP P NP 2 Det the 0Deta 1Vate 2Deta 1PWith attach again 0 Papa 1 ate 2 the 3 caviar 4 OROOTS 
ONP Papa 1Vate 2Detthe 3 Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP ONP Papa 1VP VP PP 2NP Papa 0Detthe 1PPPNP 2Detthe 0Deta 1Vate 2Deta 1PWith attach again 0 Papa 1 ate 2 the 3 caviar 4 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 1PWith 0 Papa 1 ate 2 the 3 caviar 4 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith attach again 0 Papa 1 ate 2 the 3 caviar 4 0 ROOT S 0 NP Papa 1 V ate 2 Det the 3 N caviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0 Det a 1 V ate 2 Det a 41 1F 1 4 1Pwith OROOTS 0 Papa 1 ate 2 the 3 caviar 4 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 0 Papa 1 ate 2 the 3 caviar 4 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with 5 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar i OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 4PWith 0 Papa 
1 ate 2 the 3 caviar 4 with 5 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar 4Pwith OSNPVP OSNPVP 1VPvNP 2NPDetN 2NPDetN gl 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with 5 0 ROOT S 0 NP Papa 1 V ate 2 Det the 3 N caviar 4 P with OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 4PPPNP 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with 5 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 4Pwith OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 4PPPNP 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 5NPPapa 0Detthe 1PPPNP 2Detthe 1VPVPPP 39 0Deta 1Vate 2Deta 4PPPNP 1Pwith 0 ROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with 5 0 ROOT S 0 NP Papa 1 V ate 2 Det the 3 N caviar 4 P with OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 4PPPNP 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 0 NP Papa 1 VP VP PP 2 NP Papa 0 S NP VP Niquot 0Detthe 1PPPNP 2Detthe 1VPVPPP 5Detthe 0Deta 1Vate 2Deta 4PPPNP 5Deta 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with 5 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar 4Pwith OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 4PPPNP 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVP PP 2NPPapa OSNPVP 5NPPapa 0Detthe 1PPPNP 2Detthe 1VPVPPP 5Detthe 0Deta 1Vate 2Deta 4PPPNP 5Deta 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with 5 OROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar 4Pwith OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 4PPPNP 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVP PP 2NPPapa OSNPVP 5NPPapa 0Detthe 1PPPNP 2Detthe 1VPVPPP 5Detthe 0Deta 1Vate 2Deta 
4PPPNP 5Deta 1Pwith OROOTS 4PWith 2 the 3 caviar 4 with 5 a 6 1Vate 2Detthe 3Ncaviar 4Pwith 1VPVNP 2NPDetN 2NPDetN 4PPPNP 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 2NPPapa OSNPVP 5NPPapa 2Detthe 1VPVPPP 5Detthe 2Deta 4PPPNP 5Deta OROOTS 4PWith 2 the 3 caviar 4 with 5 a 6 1Vate 2Detthe 3Ncaviar 4Pwith 5Deta 1VPvNP 2NPDetN 2NPDetN 4PPPNP 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 2NPPapa OSNPVP 5NPPapa 2Detthe 1VPVPPP 5Detthe 2Deta 4PPPNP 5Deta OROOTS 4PWith 2 the 3 caviar 4 with 5 a 1Vate 2Detthe 3Ncaviar 4Pwith 5Deta 1VPVNP 2NPDetN 2NPDetN 4PPPNP 5NPDetN 2NPDetN 3Ncaviar 1VPVNP 5NPDetN r 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 2NPPapa OSNPVP 5NPPapa 2Detthe 1VPVPPP 5Detthe 2Deta 4PPPNP 5Deta OROOTS 4PWith 2 the 3 caviar 4 with 5 a 1Vate 2Detthe 3Ncaviar 4Pwith 5Deta 1VPVNP 2NPDetN 2NPDetN 4PPPNP 5NPDetN 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 6Ncaviar 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 6Nspoon 2NPPapa OSNPVP 5NPPapa 2Detthe 1VPVPPP 5Detthe 2Deta 4PPPNP 5Deta OROOTS 4PWith 2 the 3 caviar 4 with 5 a 6 spoon 7 1Vate 2Detthe 3Ncaviar 4Pwith 5Deta a 1VPVNP 2NPDetN 2NPDetN 4PPPNP 5NPDetN 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 6Ncaviar 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 6Nspoon 2NPPapa OSNPVP 5NPPapa 2Detthe 1VPVPPP 5Detthe 2Deta 4PPPNP 5Deta OROOTS 4PWith 2 the 3 caviar 4 with 5 a 6 spoon 7 1Vate 2Detthe 3Ncaviar 4Pwith 5Deta 6Nspoon 1VPvNP 2NPDetN 2NPDetN 4PPPNP 5NPDetN Q 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 6Ncaviar 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 6Nspoon 2NPPapa OSNPVP 5NPPapa 2Detthe 1VPVPPP 5Detthe 2Deta 4PPPNP 5Deta OROOTS 4PWith 2 the 3 caviar 4 with 5 a 6 spoon 7 1 V ate 2 Det the 3 N caviar 4 P with 5 Det a 6 N spoon 1VPVNP 2NPDetN 2NPDetN 4PPPNP 5NPDetN 5NPDetN 2NPDetN 3Ncaviar 1VPVNP 5NPDetN 6Ncaviar E 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 6Nspoon 2NPPapa OSNPVP 5NPPapa 2 Det the 1 VP VP PP 5 Det the 2Deta 4PPPNP 5Deta 0 ROOT S 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with aspoon 7 0 ROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 
2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 0Detthe 1PPPNP 2Detthe 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 with aspoon 7 0 ROOTS ONP Papa 1Vate 2Detthe 3 Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 1Pwith OROOTS 4PWith 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith 0 ROOTS 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP OSNPVP 1VPVPPP 7Pwith Q 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 
1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP gtOSNPVP 5 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS ONP Papa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP P 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP OSNPVP 1VPVPPP 7PWith 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP OSNPVP 1VPVPPP 7Pwith i 0 Papa 1 ate 2 the 3 caviar 4 with a spoon 7 GROUPS 0NPPwa 1Vam 20ame 3NwWw quot6NsmmL OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nqmon 2NPNPPP 5NPNPPP 0NPPam 1VPVPPP 2NPPmm OSNPVP 2NPNPPP 00ame 1PPPNP 20ame 1VPVPPP 1VPVPPP 0 Det a 1 v ate 2 Det a 4 PP P NP 39 P P 1PMm OROOTS 1VPVNP 4PwM1 2NPNPPP OSNPVP 1VPVPPP 7PMm OROOTS 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 GROUPS 0NPPwa 1Vam 20ame 3NwWw quot6NsmmL OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nqmon 2NPNPPP 5NPNPPP 0NPPam 1VPVPPP 2NPPmm OSNPVP 2NPNPPP ODame 1PPPNP ZDame 1VPVPPP 1VPVPPP 0D a 1Vae 2D a 4PPPNP 7PPPNP 1PMm OROOTS 1VPVNP 4PwM1 2NPNPPP OSNPVP 1VPVPPP 7PMm OROOTS 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 
1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP OSNPVP 1VPVPPP 7Pwith 10ROOTS 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS 0NPPapa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP OSNPVP 1VPVPPP 7Pwith IIOROOTS Left Recursion Kills Pure TopDown Parsing VP Andrew McCallum UMass Amherst Andrew McCallum UMass Amherst Left Recursion Kills Pure TopDown Parsing VP VP PP Left Recursion Kills Pure TopDown Parsing VP VP PP VP PP Andrew McCallum UMass Amherst Left Recursion Kills Pure TopDown Parsing VP makes new hypotheses X PP ad infinitum before we ve VP pp seen the PPs at all VP PP hypotheses try to predict in advance how many PP s will arrive in input but Earley s Alg is Okay VP 1 VP gt VP PP vi PP in column I but Earley s Alg is Okay VP 1 VP 9 VP PP VP PP in column I VP V NP ate the caviar 1VPVNP in column 4 Andrew McCallum UMass Amherst but Earley s Alg is Okay VP 1 VP 9 VP PP VP PP in column I attach VP 1 VP 9 VP PP VP PP V NP ate the caviar in column 4 Andrew McCallum UMass Amherst but Earley s Alg is Okay VP 1 VP 9 VP PP VP PP in column I X 1VPeVPPP VP PP with a spoon V NP ate the caviar in column 7 Andrew McCallum UMass Amherst but Earley s Alg is Okay VP 1VPVP PP vi39 pp can be reused in column I X 1VPVPPP VP PP Awith a spoon V NP ate the caviar in column 7 but Earley s Alg is Okay VP 1VPeVP PP vi39 pp can be reused in column I attaChA 1 VP 9 VP PP VP PP VP PP with a spoon V NP ate the caviar in column 7 but Earley s Alg is Okay VP 1VPeVP PP vi39 pp can be reused in column I VP 1VPVPPP VP PP in his bed VP PP with a spoon V NP ate the caviar in column I 0 but Earley s Alg is Okay VP 1VPeVP PP vi39 pp can be reused again in column I VP 1VPVPPP VP PP in his bed VP PP with a spoon V NP ate the caviar in column I 
0 but Earley s Alg is Okay VP 1VPeVP PP vi39 pp can be reused again in columrzlgt VP 1 VP 9 VP PP VP PP VP PP in his bed VP PP with a spoon V NP ate the caviar in column I 0 col 1 lets us use it in a VP PP structure 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 GROUPS 0NPPwa 1Vam 20ame 3NwWw quot6NsmmL OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nqmon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 00ame 1PPPNP 20ame 1VPVPPP 1VPVPPP 0D a 1Vae 2D a 4PPPNP 7PPPNP 1PMm OROOTS 1VPVNP 4PwM1 2NPNPPP OSNPVP 1VPVPPP completed a VP in col 4 7pm OROOTS can reuse col 1 as often as we need 0 Papa 1 ate 2 the 3 caviar 4 withaspoon 7 OROOTS ONP Papa 1Vate 2Detthe 3Ncaviar 6Nspoon OSNPVP OSNPVP 1VPVNP 2NPDetN 2NPDetN 5NPDetN 0NPDetN 0NPNPPP 2NPDetN 3Ncaviar 1VPVNP 4PPPNP 0NPNPPP 1VPVNP 2NPNPPP 3Nspoon 2NPNPPP 5NPNPPP 0NPPapa 1VPVPPP 2NPPapa OSNPVP 2NPNPPP 0Detthe 1PPPNP 2Detthe 1VPVPPP 1VPVPPP 0Deta 1Vate 2Deta 4PPPNP 7PPPNP 1Pwith OROOTS 1VPVNP 4Pwith 2NPNPPP OSNPVP 1VPVPPP completed that VP VP PP in col 7 7pm col 1 would let us use itin a VP PP structure OROOTS How to change the parser into a recognizer What s the Complexity Andrew McCallum UM What s the Complexity How many state sets will there be Length of sentence n How big can the state sets get Size of grammar G times n How long does it take to build a state set Scan Constant time Predict Need to check for duplicates Complete Search previous state set also check for duplicates Gn2 Total On3 ass Amherst The pragmatics of questions and answers Christopher Potts U Mass Amherst Linguistics CMPSCI 585 November 6 2007 UMass Amherst Linguistics The birth of Gricean pragmatics In the early 19605 Chomsky showed us how to give compact general specifications of natural language syntax In the late 19605 philosopher and linguist H Paul Grice had the inspired idea to do the same for rational social interactions Rules and maxims Rules Maxims S i NP VP Quality 
Above all, be truthful.
Rules (syntax):
NP → N | PN
N → hippo
VP → V_S S
VP → V_trans NP
V_S → realize
Maxims:
Quality: Above all, be truthful.
Relevance: And be relevant.
Quantity: Within those bounds, be as informative as you can.
Manner: And do it as clearly and concisely as possible.
Syntactic rules are like physical laws: breaking them should lead to nonsense or falsification. Pragmatic rules (maxims) are like laws of the land: breaking them can have noteworthy consequences.
Pragmatic pressures: [Figure: the maxims (quality, relevance, quantity, manner) shown as competing pressures on an utterance.]
A probable breakthrough: Things are looking up. Researchers are making significant progress with decision-theoretic and game-theoretic models. The chief innovation: a shift in emphasis from truth conditions to probabilities.
SUBTLE: Situation Understanding Bot Through Language and Environment. [Figure: project architecture, including human-robot interaction, machine learning and probabilistic reasoning (McCallum, Pereira), a Tree Adjoining Grammar-based parser and TAG semantic analyzer (Joshi), computational pragmatics (Potts, McCallum), temporal representation, and parameterized action representation (Badler).] "We must move from robust sentence processing to robust utterance understanding." Participants: Mitch Marcus, Norm Badler, Aravind Joshi, George Pappas, Fernando Pereira (Penn); Andrew McCallum and me (UMass Amherst); Holly Yanco (UMass Lowell).
A screenshot: [Screenshot of the card-game interface: instructions to gather all the cards of a particular suit in one area, turns remaining for Player 1 and Player 2, a "Type text here" chat box, and directions to click a card to pick it up and drop it in the bottom-left area.]
Mention-all vs. mention-some: "Where can we buy supplies?"
Mention-all:
o Context We39re writing a comprehensive guide to the area 0 Resolvedness condition An exhaustive listing of the reasonable shopping places Mention some 0 Context We39re low on food and water 0 Resolvedness condition Mentioning the best closest safest etc place or a few good options Mention all vs mention some Who has a light Mention all vs mention some Who has a light Mention all o Context Speaker needs to ensure that no one in the group is going to get stopped by airport security 0 Resolvedness condition Listing of everyone who has a light Mention all vs mention some Who has a light Mention all o Context Speaker needs to ensure that no one in the group is going to get stopped by airport security 0 Resolvedness condition Listing of everyone who has a light Mention some o Context Speaker needs to light her cigar o Resolvedness condition Just name one friendly willing nearby person with a lighter Domain restrictions What cards do you have Domain restrictions What cards do you have7 Wide domain 0 Context Speaker dealt the cards and noticed that some were missing 0 Resolvedness condition List everything you re holding Domain restrictions What cards do you have7 Wide domain 0 Context Speaker dealt the cards and noticed that some were missing 0 Resolvedness condition List everything you re holding Narrowed domain 0 Context Speaker folds He wants to know what beat him 0 Resolvedness condition Just name the good cards if there are any Domain restrictions Example Pragth data Did you find anything Domain restrictions Example Pragbot data Context Pbyer2isoohngibr Player 2 Did you find anything J Player 1 yep h at the top exit Domain restrictions Example Pragbot data Context Player 2 is looking for Domain restrictions Example Pragbqt data Have you found anything Domain restrictions Example Pragbot data Context The players are trying to get six consecutive hearts but they are still in the process of deciding which six Player 2 i got 2H Player 1 I found a 3H Player 2 sweet Have 
you found anything no just 2H Player Player M Domain restrictions Example Pragbot data Context The players are trying to get six consecutive hearts but they are still in the process of deciding which six Player 2 i got 2H Player 1 I found a 3H Player 2 sweet Player 1 Have you found Player 2 l No suggestion that Player 2 didn39t see 3D KD etc A 100 v v v 6 O 2 Identification conditions Who is Arnold Schwarzenegger Identification conditions Who is Arnold Schwarzenegger The governor of California Identification conditions Who is Arnold Schwarzenegger o A famous bodybuilder Identification conditions Who is Arnold Schwarzenegger 0 The Terminator Identification conditions Who is Arnold Schwarzenegger That guy over there Identification conditions Who is Arnold Schwarzenegger 0 The governor of California 0 A famous bodybuilder 0 The Terminator 0 That guy over there Levels of specificity Who was at the meeting Levels of specificity Who was at the meeting General 0 Context The speaker wants to get a sense for what kind of meeting it was 0 Resolvedness condition Some linguists some computer scientists some mathematicians Levels of specificity Who was at the meeting General 0 Context The speaker wants to get a sense for what kind of meeting it was 0 Resolvedness condition Some linguists some computer scientists some mathematicians Specific o Context The speaker is checking against a list of likely attendees o Resolvedness condition Chris Aan Mitch Granularity Where are you from Granularity Where are you from 0 Massachusetts Granularity Where are you from o The US Granularity Where are you from I Planet earth Granularity Where are you from I Connectcut My birthplace Granularity Where are you from 0 UMass Amherst My university Granularity Where are you from 0 Belgium My ancestral home Granularity Where are you from7 0 Massachusetts 0 The U5 0 Planet earth 0 Connecticut My birthplace o UMass Amherst My university 0 Belgium My ancestral home Granularity Pragbot players can39t see 
each other but they must coordinate their movements As a result they ask many versions of Where are you They answer at different levels of granularity depending on o the task they were assigned 0 the current subtask o the nature of the environment 0 their current knowledge of each other39s whereabouts Granularity Example Pragbot data Player A Where are you Player B in the first box like region in the center 2nd row Player B I m sort of in the middle Player B still at the right bottom corner o Unattested In the pragbot worldquot redundant in context 0 Attested confusion In Northamptonquot at the start of the game players not yet fully engaged Polarity variation HOW high is a helicopter permitted to fly Polarity variation How high is a helicopter permitted to fly Lower bound 0 Context We need to avoid the powerlines while still getting close to the ground 0 Resolvedness condition The lowest safe height Polarity variation How high is a helicopter permitted to fly Lower bound 0 Context We need to avoid the powerlines while still getting close to the ground 0 Resolvedness condition The lowest safe height Upper bound 0 Context We need to be sure the atmosphere isn39t too thin o Resolvedness condition The highest safe height Gradable modifiers Is the door open Gradable modifiers Is the door open7 Maximal interpretation 0 Context We need to get a vehicle out of the building It barely fits through the doorway o Intended Is the door completely open Gradable modifiers Is the door open7 Maximal interpretation 0 Context We need to get a vehicle out of the building It barely fits through the doorway o Intended Is the door completely open Minimal interpretation 0 Context We need to secure the building 0 Intended Is the door even a little bit open Plurals Are the Windows are open Plurals Are the Windows are open7 Existential reading 0 Context We need to ensure the building is locked up 0 the Windows some of the windows Plurals Are the Windows are open7 Existential reading 0 
Context We need to ensure the building is locked up 0 the Windows some of the windows Universal reading 0 Context We are painting the sills 0 the Windows all the windows Pragmatically required over answering Ali calls a hotel Ali ls Lisa Simpson in Room 343 Clerk A She39s in room 400 implicit no Clerk B She checked out yesterday implicit no Clerk C No Pragmatically required over answering Example Pragbot data Can you take the inside of the cube The answers expected by the form yesquot noquot are infelicitous because their content is mutually known The players have freedom of movement Pragmatically required over answering Example Pragbot data Context The players are planning their search strategy Player 1 Can you take the inside of the cube Player 2 ok I take the inside Player 1 I ll look on the outside The answers expected by the form yesquot noquot are infelicitous because their content is mutually known The players have freedom of movement Pragmatically required over answering Example Worcester Cold Storage Fire Context Firemen lost inside a large maze like building with a floor plan that effectively changed by the minute as rooms filled with smoke and walls collapsed Car 3 Okay do we have fire coming up through the roof yet L lPl We have a lot of hot embers blowing through Inferred pragmatically No but A mere no would be disastrous here Devious exploitation of pragmatic inferencing From Bronston V United States Do you have any bank accounts in Swiss banks Mr Bronston No sir Have you ever The company had an account there for about six months P 9 in Zurich The truth Bronston once had a large personal Swiss account The issue The cooperative inference of the previous slide has gotten us into trouble here Partition semantics and answer values Groenendijk and Stokhof 1982 the semantic value of an interrogative is a partition of the information state W into equivalence classes based on the extension of the question predicate Answering 0 Fully congruent answers identify 
a single cell; partial answers overlap with more than one cell; over-answers identify a proper subset of one of the cells. Groenendijk 1999 is a dynamic logic of questions and their answers; ten Cate and Shan 2007 give it a proof theory and show that it is fruitfully thought of from a variety of perspectives; Vermeulen 2000 is in a sense an extension to genuinely strategic multi-agent settings. Polar questions: "Did Sam laugh?" relates worlds by v ≡ w iff (v ∈ ⟦laughed(sam)⟧ ⟺ w ∈ ⟦laughed(sam)⟧), partitioning W into the two cells ⟦laughed(sam)⟧ and W − ⟦laughed(sam)⟧. Answers: "Yes" identifies the first cell; "No" the second. Constituent questions: "Who laughed?" relates v ≡ w iff {x : laughed(v)(x)} = {x : laughed(w)(x)}, partitioning W by the extension of the question predicate. [Diagrams: partitions of a toy set of worlds, with the answers "Bart and Lisa", "Bart, Lisa, Maggie, and Burns", and "No one" each identifying a single cell.] Answer values: we
get a rough measure of the extent to which p answers Q by inspecting the cells in Q with which p has a non-empty intersection. Definition (Answer values): Ans(p, Q) = {q ∈ Q : p ∩ q ≠ ∅}. Example: Bart: "Did Sam laugh?" (cells ⟦laughed(sam)⟧ and W − ⟦laughed(sam)⟧). Lisa: "Yes" (|Ans| = 1). Lisa: "No" (|Ans| = 1). Lisa: "I heard some giggling" (|Ans| = 2). Over-informative answers: Ans values are a bit too blunt, since a strictly stronger answer p′ ⊂ p can receive exactly the same Ans value as p, for any question Q. Example: Bart: "Is Sam happy at his new job?" (cells ⟦happy(sam)⟧ and W − ⟦happy(sam)⟧). Lisa: "Yes" (|Ans| = 1). Lisa: "Yes, and he hasn't been to jail yet" (still |Ans| = 1). A preference ordering. Definition (Relevance; G&S, van Rooij): p >_Q q iff (i) Ans(p, Q) ⊂ Ans(q, Q), or (ii) Ans(p, Q) = Ans(q, Q) and q ⊂ p. Example: in the previous example, ⟦happy(sam)⟧ >_Q ⟦happy(sam)⟧ ∩ ⟦no-jail(sam)⟧: while their Ans values are the same, the first is a superset of the second. Example (Granularity): Q′ = "Which planet are you from?"; Q = "Which country are you from?" ≈ "Where are you from?"; Q″ = "Which city are you from?"
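The answer-value machinery can be made concrete in a small sketch: worlds are strings, a proposition is a set of worlds, and a question is a partition (a list of disjoint cells). The helper names (`ans`, `prefer`) and the four-world toy model are mine, not from the slides:

```python
# Sketch of Groenendijk & Stokhof-style answer values over a toy world set.
# A proposition is a set of worlds; a question is a partition of W.

def ans(p, Q):
    """Cells of Q that the proposition p overlaps (the Ans value)."""
    return [q for q in Q if p & q]

def prefer(p, q, Q):
    """True iff p >_Q q: p overlaps strictly fewer cells, or the same
    cells while p is strictly weaker (a strict superset of q)."""
    ap = frozenset(map(frozenset, ans(p, Q)))
    aq = frozenset(map(frozenset, ans(q, Q)))
    return ap < aq or (ap == aq and q < p)

W = {"w1", "w2", "w3", "w4"}           # toy information state
laughed = {"w1", "w2"}                 # worlds where Sam laughed
Q = [laughed, W - laughed]             # "Did Sam laugh?" as a partition

yes = laughed                          # "Yes"
giggling = {"w1", "w2", "w3"}          # "I heard some giggling"

print(len(ans(yes, Q)))                # 1 cell: a complete answer
print(len(ans(giggling, Q)))           # 2 cells: only a partial answer
print(prefer(yes, giggling, Q))        # True: fewer cells is better
```

Note how the ordering also penalizes over-informative answers: `prefer(yes, {"w1"}, Q)` holds because both pick out one cell but `yes` is the logically weaker proposition.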
Definition (Fine-grainedness; van Rooy 2004): a question Q is at least as fine-grained as a question Q′, written Q ⊑ Q′, iff ∀q ∈ Q ∃q′ ∈ Q′ : q ⊆ q′. If Q is more fine-grained than Q′, then an exhaustive answer to Q is more informative than an exhaustive answer to Q′. Partial answers: Ans values bring us part of the way towards understanding relevance implicatures. Example (Partial answer): Tom: "Which city does Barbara live in?" Jerry: "Barbara lives in Russia." |Ans| > 1; implicature: Jerry doesn't know which city. Example (Complete answer): Tom: "Which country does Barbara live in?" Jerry: "Barbara lives in Russia." |Ans| = 1; no implicature about Jerry's city-level knowledge. Over-answer implicatures: when speakers over-answer, they seem to communicate that the original QUD is not quite appropriate; they address their answers to a different QUD. Example: Bart: "Is Sam happy at his new job?" Lisa: "Yes, and he hasn't been to jail yet." Example: Sam: "Do you know what time it is?" Sue: "Yes, it's 4:15." Pragmatic principles. Let Q be the question under discussion (QUD). For speakers: answer Q with a proposition p such that p >_Q p′ for all p′ ∈ Dox_S. For hearers: let p be the speaker's answer; for all q such that q >_Q p, the speaker is not positioned to offer q felicitously: if q ⊂ p, then the speaker lacks sufficient evidence for q; if p ⊂ q, then the speaker is answering a different QUD, thereby implicating that Q is not the right QUD. Decision problems: as a first step towards satisfying our desires for this analysis, we'll inject some decision theory into our pragmatics, as a way of making sense primarily of answers. Definition (Decision problems): a decision problem is a
structure DP = ⟨W, S, P_S, A, U_S⟩, where W is a space of possible states of affairs; S is an agent; P_S is a subjective probability distribution for agent S; A is a set of actions that S can take; and U_S is a utility function for S, mapping action-world pairs to real numbers. Example: Schlepp the umbrella. W = {w1, ..., w10}; assume it rains in w1 through w7 and not in w8 through w10. P_S(w_i) = P_S(w_j) for all w_i, w_j ∈ W. A = {umbrella, no-umbrella}. U(umbrella, w_i) = 2 if it rains in w_i and −2 if it doesn't; U(no-umbrella, w_i) = −8 if it rains in w_i and 4 if it doesn't. S is deciding under uncertainty: what's his best move? Expected utilities take risk into account when measuring the usefulness of performing an action. Definition: for a decision problem DP = ⟨W, S, P_S, A, U_S⟩, the expected utility of an action a ∈ A is EU_DP(a) = Σ_{w ∈ W} P_S(w) · U_S(a, w). Solving decision problems. Definition (Utility value of a decision problem): UV(DP) = max_{a ∈ A} EU_DP(a). Definition (Solving a decision problem): the solution to DP is to choose a such that EU_DP(a) = UV(DP). Utility value of new information: incoming information might change the decision problem by changing the expected utilities. Definition (Conditional expected utility): EU_DP(a | p) = Σ_{w ∈ W} P(w | p) · U(a, w). Example: EU(no-umbrella) = −4.4, but EU(no-umbrella | {w8, w9, w10}) = 4 given no rain; EU(umbrella) = 0.8, but EU(umbrella | {w8, w9, w10}) = −2 given no rain. Changes to the utility value: the utility value of new information measures the extent to which it changes the utility value of the decision problem. Definition: UV_DP(p) = max_{a ∈ A} EU_DP(a | p) − UV(DP). Example: for the umbrella example, the utility value jumps from 0.8 to 4 when we learn that it will be sunny; thus UV_Schlepp({w8, w9, w10}) = 3.2.
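The Schlepp numbers can be checked in a minimal sketch, assuming the uniform distribution and the utility table from the slides; the helper names (`eu`, `uv`) are mine:

```python
# Sketch of the "Schlepp the umbrella" decision problem: uniform P over
# ten worlds, rain in w1-w7, and the slide's utility table.

RAINY = {f"w{i}" for i in range(1, 8)}          # it rains in w1..w7
W = {f"w{i}" for i in range(1, 11)}             # ten equiprobable worlds

def utility(action, w):
    rain = w in RAINY
    if action == "umbrella":
        return 2 if rain else -2
    return -8 if rain else 4                    # no umbrella

def eu(action, worlds):
    """Expected utility, optionally conditioned on a proposition
    (a subset of W) by restricting the uniform distribution."""
    return sum(utility(action, w) for w in worlds) / len(worlds)

def uv(worlds):
    """Utility value: expected utility of the best available action."""
    return max(eu(a, worlds) for a in ("umbrella", "no-umbrella"))

print(eu("umbrella", W))                        # 0.8
print(eu("no-umbrella", W))                     # -4.4
sunny = W - RAINY                               # learn: no rain
print(round(uv(sunny) - uv(W), 2))              # 3.2, the value of the news
```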
Action propositions. Definition: if DP = ⟨W, S, P_S, A, U_S⟩ is a decision problem and a is an action in A, then [a] = {w ∈ W : U_S(a, w) ≥ U_S(a′, w) for all a′ ∈ A}. If there is a unique optimal action in every world, then [A], the set of propositions [a] for a ∈ A, is a partition; but we won't impose this condition. Example (Visiting Barbara): W = {w1, ..., w4}; B lives in Moscow in w1, in Petersburg in w2, in Prague in w3, and in Budapest in w4. P_S(w_i) = P_S(w_j) for all w_i, w_j ∈ W. A = {a_X : X ∈ {Moscow, Petersburg, Prague, Budapest}}, with U(a_X, w) = 10 if Barbara lives in X in w, else U(a_X, w) = 0. Action propositions: [a_Moscow] = {w1}, [a_Petersburg] = {w2}, [a_Prague] = {w3}, [a_Budapest] = {w4}. Resolving the problem. Example: assume the decision problem that Bart's question gives rise to (or the one that gives rise to his question) is as before. Bart: "Where does Barbara live?" ({{w1}, {w2}, {w3}, {w4}}). Lisa: "Russia": does not resolve the decision problem. Lisa: "Petersburg" (likewise "Moscow" or "Prague"): resolves the decision problem. Example (Which country): same W and P_S, but A = {a_X : X ∈ {Russia, Czech Republic, Hungary}}, with U(a_X, w) = 10 if Barbara lives in X in w, else 0. Action propositions: [a_Russia] = {w1, w2}, [a_Czech] = {w3}, [a_Hungary] = {w4}.
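The Visiting Barbara problem can be sketched the same way. `action_proposition` computes [a], and `euv` computes the average improvement in the agent's best expected utility over the cells of a question, which is the quantity the expected-utility-value slides below are after; all names and the payoff of 10 follow the toy example, everything else is mine:

```python
# Sketch of the "Visiting Barbara" decision problem: four equiprobable
# worlds, one travel action per city, payoff 10 for the right city.

CITY = {"w1": "Moscow", "w2": "Petersburg", "w3": "Prague", "w4": "Budapest"}
W = frozenset(CITY)
ACTIONS = sorted(set(CITY.values()))

def utility(action, w):
    return 10.0 if CITY[w] == action else 0.0

def eu(action, worlds):
    return sum(utility(action, w) for w in worlds) / len(worlds)

def action_proposition(action):
    """[a]: the worlds where `action` is (weakly) optimal."""
    return {w for w in W
            if all(utility(action, w) >= utility(b, w) for b in ACTIONS)}

def euv(question):
    """Average improvement in the best expected utility across the cells
    of a question (a partition of W), relative to answering nothing."""
    prior = max(eu(a, W) for a in ACTIONS)
    return sum(len(q) / len(W) * (max(eu(a, q) for a in ACTIONS) - prior)
               for q in question)

print(action_proposition("Moscow"))             # {'w1'}
which_city = [{"w1"}, {"w2"}, {"w3"}, {"w4"}]
which_country = [{"w1", "w2"}, {"w3"}, {"w4"}]  # Russia, Czech Rep., Hungary
print(euv(which_city))                          # 7.5
print(euv(which_country))                       # 5.0
```

The city-level question scores higher because the answer "Russia" still leaves a coin flip between Moscow and Petersburg, matching the granularity discussion above.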
Mention-some questions. Example (Craige Roberts): "I'm working on a rush plumbing project and need some parts. Where can one buy faucet washers?" W = {w1, w2, w3}: parts at Moe's in w1 and w2; parts at Larry's in w2 and w3; P_S(w_i) = P_S(w_j) for all w_i, w_j ∈ W. Action propositions: [a_Moe's] = {w1, w2}, [a_Larry's] = {w2, w3}. Domain restrictions. Example (van Rooy 2003): [after a fold] "I'm curious about whether I did the right thing. What cards do you have?" A choice of domain leads to a choice of granularity for [A]; the order >_[A] can then home in on the optimal choice. Expected utility values of questions: EUV_DP(Q) gives the average utility of the members of Q. Definition: let DP = ⟨W, S, P_S, A, U_S⟩ be a decision problem; then EUV_DP(Q) = Σ_{q ∈ Q} P_S(q) · UV_DP(q). The following ordering is determined largely by EUV, but it resorts to the measure of granularity to resolve ties. Definition: Q >_DP Q′ iff EUV_DP(Q) > EUV_DP(Q′), or EUV_DP(Q) = EUV_DP(Q′) and Q ⊑ Q′. Pragmatic principles, decision-theoretically. Let DP_I and DP_J be a pair of decision problems ⟨W, S, P, A, U⟩ differing only in the agent (I or J). For the interrogator I: ask a question Q such that Q >_DP Q′ for all questions Q′ over W. For the witness J: answer Q with a proposition p such that p >_[A] p′ for all p′ ∈ Dox_J. In both cases, the speaker's behavior is shaped by the decision problem. Statistical Models of Semantics and Unsupervised Language Discovery: Lecture 18, Introduction to
Natural Language Processing, CMPSCI 585, Fall 2007. Andrew McCallum, Computer Science Department, University of Massachusetts Amherst, including slides from Chris Manning, Dan Klein, Rion Snow, and Patrick Pantel. Attachment ambiguity: where to attach a phrase in the parse tree? "I saw the man with the telescope." What does "with a telescope" modify? Is the problem AI-complete? Yes, but... Proposed simple structural factors: right association (Kimball 1973): low or near attachment, early closure of NP; minimal attachment (Frazier 1978; depends on grammar): high or distant attachment, late closure of NP. Attachment ambiguity: "The children ate the cake with a spoon." "The children ate the cake with frosting." "Joe included the package for Susan." "Joe carried the package for Susan." Ford, Bresnan, and Kaplan (1982): "It is quite evident, then, that the closure effects in these sentences are induced in some way by the choice of the lexical items." Lexical acquisition and semantic similarity: previous models give the same estimate to all unseen events. Unrealistic: we could hope to refine that based on semantic classes of words. Examples: "Susan ate the cake with a durian." "Susan had never eaten a fresh durian before." Although never seen, "eating pineapple" should be more likely than "eating holograms," because pineapple is similar to apples, and we have seen "eating apples." An application: selectional preferences. Most verbs prefer arguments of a particular type; such regularities are called selectional preferences or selectional restrictions. "Bill drove a Mustang" (a car, truck, jeep, ...). Selectional preference strength: how strongly does a verb constrain its direct objects ("see" versus "unknotted")? Measuring selectional preference strength: assume we are given a clustering of direct object nouns; Resnik (1993) uses WordNet. Selectional preference strength of a verb: S(v) = D(P(c|v) ‖ P(c)) = Σ_c P(c|v) log (P(c|v) / P(c)). Selectional association between a verb and a class: A(v, c) = (1 / S(v)) · P(c|v) log (P(c|v) / P(c)), the proportion that class c's summand contributes to the preference strength. For nouns in multiple classes, disambiguate as the most likely sense: A(v, n) = max_{c ∈ classes(n)} A(v, c).
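Resnik's two definitions can be sketched directly, using a small made-up table of four equiprobable noun classes (the same numbers the next slide uses for "eat" and "find"); the helper names `sps` and `assoc` are mine:

```python
# Sketch of Resnik-style selectional preference strength on made-up data:
# four equiprobable noun classes and P(c|v) for the verbs "eat" and "find".
from math import log2

P_c = {"people": 0.25, "furniture": 0.25, "food": 0.25, "action": 0.25}
P_c_given_v = {
    "eat":  {"people": 0.01, "furniture": 0.01, "food": 0.97, "action": 0.01},
    "find": {"people": 0.33, "furniture": 0.33, "food": 0.33, "action": 0.01},
}

def sps(v):
    """S(v): KL divergence between P(c|v) and the class prior P(c)."""
    return sum(p * log2(p / P_c[c]) for c, p in P_c_given_v[v].items())

def assoc(v, c):
    """A(v, c): class c's share of the preference strength S(v)."""
    p = P_c_given_v[v][c]
    return p * log2(p / P_c[c]) / sps(v)

print(round(sps("eat"), 2))               # 1.76: a picky verb
print(round(sps("find"), 2))              # 0.35: a much less picky verb
print(round(assoc("eat", "food"), 2))     # 1.08
print(round(assoc("find", "action"), 2))  # -0.13
```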
Selectional preference strength (made-up data): with P(c) = 0.25 for each of the classes people, furniture, food, action, and P(c|eat) = (0.01, 0.01, 0.97, 0.01), P(c|see) = (0.25, 0.25, 0.25, 0.25), P(c|find) = (0.33, 0.33, 0.33, 0.01), we get S(eat) = 1.76, S(see) = 0.00, S(find) = 0.35; A(eat, food) = 1.08; A(find, action) = −0.13. Selectional preference strength example (Resnik, Brown corpus; verb, noun, A(v, n), class): answer: request 4.49 (speech act), tragedy 3.88 (communication); find: label 1.10 (abstraction), fever 0.22 (psych. feature); hear: story 1.89 (communication), issue 1.89 (communication); remember: reply 1.31 (statement), smoke 0.20 (article of commerce); repeat: comment 1.23 (communication), journal 1.23 (communication); read: article 6.80 (writing), fashion 0.20 (activity); see: friend 5.79 (entity), method 0.01 (method); write: letter 7.26 (writing), market 0.00 (commerce). But how might we measure word similarity, for word classes? Vector spaces. [Document-by-word matrix A: rows are documents d1 through d6, columns are words (cosmonaut, astronaut, moon, car, truck); each cell counts the word's occurrences in the document.] [Word-by-word matrix B: rows and columns are the same words; each cell counts how often the two words co-occur.] [Modifier-by-head matrix C: rows are modifiers (Soviet, American, spacewalking, red, full, old), columns are the same head nouns; each cell counts modifier-head co-occurrences.] Similarity measures for binary vectors: matching coefficient |X ∩ Y|; Dice coefficient 2|X ∩ Y| / (|X| + |Y|); Jaccard coefficient |X ∩ Y| / |X ∪ Y|; Overlap coefficient |X ∩ Y| / min(|X|, |Y|); cosine |X ∩ Y| / √(|X| · |Y|). Cosine measure for real-valued vectors: cos(x, y) = Σ_{i=1}^n x_i y_i / (√(Σ_{i=1}^n x_i²) · √(Σ_{i=1}^n y_i²)); the cosine maps vectors onto the unit circle by dividing through by their lengths. Example of the cosine measure on a word-by-word matrix on NYT text (focus word: nearest neighbors): garlic: sauce, pepper, salt, cup; fallen: fell (.932), decline (.931), rise (.930), drop (.929); engineered: genetically, drugs, research, drug; Alfred: named, Robert, William; simple: something (.964), things (.963), you (.963), always (.962).
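The binary-vector measures and the real-valued cosine can be sketched in a few lines; the toy context sets and count vectors here are mine, not from the slides:

```python
# Sketch of set-based similarity measures (binary vectors as sets of
# contexts) and the cosine for real-valued co-occurrence vectors.
from math import sqrt

def jaccard(x, y):
    return len(x & y) / len(x | y)

def dice(x, y):
    return 2 * len(x & y) / (len(x) + len(y))

def overlap(x, y):
    return len(x & y) / min(len(x), len(y))

def cosine_sets(x, y):
    return len(x & y) / sqrt(len(x) * len(y))

def cosine(x, y):
    """Cosine for count vectors: dot product of the length-normalized
    vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

# Contexts in which two words were observed (binary vectors as sets).
ctx_car = {"drive", "park", "wheel", "road"}
ctx_truck = {"drive", "park", "road", "cargo"}

print(jaccard(ctx_car, ctx_truck))      # 0.6
print(dice(ctx_car, ctx_truck))         # 0.75
print(cosine_sets(ctx_car, ctx_truck))  # 0.75

# Count vectors over the same three context dimensions.
print(round(cosine([2, 0, 1], [1, 1, 1]), 3))
```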
Probabilistic (dis)similarity measures: KL divergence D(p ‖ q) = Σ_i p_i log (p_i / q_i); skew divergence D(p ‖ α q + (1 − α) p); Jensen-Shannon divergence (information radius, IRad) D(p ‖ (p + q)/2) + D(q ‖ (p + q)/2); L1 norm (Manhattan distance) Σ_i |p_i − q_i|. [Table (Lee): nearest neighbors of the word "company" under different measures; skew divergence (α = 0.99) and Jensen-Shannon find neighbors such as airline, business, bank, firm, industry, agency, organization, while Euclidean distance also admits poorer neighbors such as city, state, program, today, series.] Learning syntactic patterns for automatic hypernym discovery (Rion Snow, Daniel Jurafsky, and Andrew Y. Ng). It has long been a goal of AI to automatically acquire structured knowledge directly from text, e.g. in the form of a semantic network. [Figure: a small portion of the author's "semantic network," from Douglas Hofstadter's Gödel, Escher, Bach.] We aim to classify whether a noun pair (X, Y) participates in one of the following semantic relationships. Hypernymy (ancestor): Y ≥ X if "X is a kind of Y" (entity > organism > person). Coordinate terms (taxonomic sisters): X and Y possess a common hypernym, i.e. ∃Z such that "X and Y are both kinds of Z" (horse, dog, cat; lizard, alligator). Individual feature analysis: lexico-syntactic patterns such as "X and/or other Y", "Y such as X", "such Y as X", "Y including X", "Y, especially X", "Y like X", "Y called X", and "X, a Y" (appositive). [Plot: precision (log) vs. recall (log) for 69,592 classifiers, one per feature; classifier f classifies noun pair x as a hypernym pair iff f(x) > 0; in red, the patterns originally proposed in Hearst 1992.] "Oxygen is the most abundant element on the moon." Dependency graph; dependency paths from "oxygen" to "element" (MINIPAR): the core path -N:s:VBE, be, VBE:pred:N-, plus variants extending it with the determiner "the" (Det:det:N), the post-determiner "most" (PostDet:post:N), the modifier "abundant" (A:mod:N), and the prepositional modifier "on" (Prep:mod:N). Rediscovering Hearst's patterns: the classic surface patterns "Y such as X", "such Y as X", and "X and other Y" each correspond to particular dependency paths; proposed in Hearst 1992,
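Hearst-style patterns are often approximated with plain surface regular expressions. The following toy sketch is mine (the pattern set, sentences, and `extract_hypernyms` helper are invented, and far cruder than the paper's MINIPAR dependency-path features), but it shows the basic idea:

```python
# Surface-pattern sketch of Hearst-style hypernym extraction: a toy
# stand-in for dependency-path features, using regular expressions.
import re

NP = r"([A-Za-z]+)"
PATTERNS = [
    (re.compile(NP + r"s? such as " + NP), "hyper-first"),     # "Y such as X"
    (re.compile(NP + r"s?,? including " + NP), "hyper-first"), # "Y including X"
    (re.compile(NP + r" and other " + NP + "s?"), "hypo-first"),  # "X and other Y"
    (re.compile(NP + r"s? called " + NP), "hyper-first"),      # "Y called X"
]

def extract_hypernyms(sentence):
    """Return (hyponym, hypernym) pairs matched by any surface pattern."""
    pairs = []
    for pattern, order in PATTERNS:
        for m in pattern.finditer(sentence):
            a, b = m.group(1), m.group(2)
            pairs.append((b, a) if order == "hyper-first" else (a, b))
    return pairs

print(extract_hypernyms("He studied metals such as copper."))
print(extract_hypernyms("They kept horses and other animals."))
print(extract_hypernyms("It is a condition called efflorescence."))
```

Real systems parse the sentence and match paths in the dependency graph instead, which is what lets them generalize past fixed word orders.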
and used in Caraballo 2001 and others, but what about the rest of the syntactic pattern space? Example: using the "Y called X" pattern for hypernym acquisition. MINIPAR path rendered as "<hypernym> 'called' <hyponym>"; the pattern was not in the training set but was found by extension. Hyponym / hypernym / sentence fragment: efflorescence / condition, "and a condition called efflorescence"; ... / company, "The company, now called O'Neal Inc."; ... / ranch, "run a small ranch called the Hat Creek Outfit"; ... / problem, "irreversible problem called ..."; HIV-1 / virus, "infected by the AIDS virus called HIV-1"; ... / attraction, "sightseeing attraction called the Bateau ..."; ... / farm, "Israeli collective farm called Kibbutz ...". Types of noun pairs (with example pairs): NE: Person ("John F. Kennedy / president", "Marlin Fitzwater / spokesman"); NE: Place ("Diamond Bar / city", "France / place"); NE: Company ("American Can / company", "Simmons / company"); other, not a named entity ("Earthquake / disaster", "soybean / crop"). A better hypernym classifier. [Plot: precision vs. recall for hypernym classifiers on a WordNet-labeled dev set: logistic regression (buckets), logistic regression (binary), Hearst's patterns, and the conjunct-other pattern; 10-fold cross-validation on the WordNet-labeled data.] Conclusion: 70,000 features are more powerful than 6. VERBOCEAN: Mining the Web for Fine-Grained Semantic Verb Relations (Timothy Chklovski and Patrick Pantel). Why detect semantic relations between verbs? So that we can: understand the relationship when it's not stated ("Napoleon fought and won the battle"; "During the holidays people wrap and unwrap presents"; "Soldiers prefer to avoid getting wounded and killed"); use the relationship when summarizing across documents, e.g. same event, preceding event ("The board considered the offer of 33"; "The board accepted the offer of 31.85"; "The board okayed the offer of approximately 43"); determine if two people have similar views on an event ("I nudged him" / "He shoved me"). Hard to do manually. Why use the Web?
Motivating W m Intuition Small collections are tough Semantics is often implied Lenat Chklovski The Web s 1012 is a lot of words 80 Use small bits of more detailed text to help with mass of general text Patterns issued to a search engine and their correlation 0 Relevant Work whhmm Levin s classes similarity 3200 verbs in 191 classes PropBank 4659 framesets 14 framesets per verb VerbNet 191 coarsegrained groupings with overlap FrameNet MW WordNet I troponomy mommaX vaguinmy Emmaf Kass antonymy 225335 mics Mm entailment Fellhaum s 1993 antailmenthiemchy 39 cage a VerbOcean Webbased WWW Extraction of Verb Relations VerbOcean is a network of verb relations Currently over 3400 nodes with on average 13 relations per verb Detected relation types are similarity strength antonymy enablement temporal precedence happensbefore 7 Download fromjttgj39lgemanticsisieduoceanl Approach Three stages Identify pairs of highly associated verbs cooccurring on the Web with sufficient frequency using DIRT Lin and Pantel 2001 For each verb pair test patterns associated with each semantic relation Eg Temporal Precedence toX and then Yquot Xed and then Yed calculate a score for each possible semantic relation Compare the strengths of the individual semantic relations and output a consistent set as the nal output rip Team mostspeei eand then strongest relations Lexical Patterns maier SEMAND C REL1 TIUN Surface Patterns Example similarity 4 Li M quotShe heckled and taunted the comedian strength 8 3 2131st He nutjust harassed but terrurized her quot no just Xed but Yeti ed by Ying the enablemem 4 Xed by Ying or quotShe saved the ducument by slitking the button m X quot by Ying the amonymy 7 eiLh either X5 or Ys Xed but Yed quotThere 39s samelhing about Mary yuuwill either nve or hate er happensbefore 12 to X and then Y Xed and then Yed to X and later Y t X and subsequently Y Xed and subsequently Yed quotHe designed thepmtotype and then patented it D Lexical Patterns Match Refined to decrease 
capturing wrong parts of speech or incorrect semantic relations Xed by Ying the Xed by Ying or waved at by parking guard encouraged further by sailing lessons mfg VerbOcean Similarity httpsemantichSI eduocean Verbs that are similar or related eg boo heckle VerbOcean Strength INTIu hm Similar verbs that denote a more intense thorough comprehenswe or absolute action eg changeof state verbs that denote a more complete change shock 9 startle VerbOcean Antonymy Semantic opposition switching thematic roles associated with the verb buy sell stative verbs live die sibling verbs which share a parent walk run t39t t39 39t39 t h b f ve oppoel Ion an onymy appens e ore damage4e VerbOcean Enablement 1471 haw Holds between two verbs V1 and V2 when the pair can be glossed as V1 is accomplished by V2 assess LeMiew Appendix Sample relations extracted by our system SEMANTIC SEMANTIC SEMANTIC EXAMPLES EWPLES EXAMPLES LATION LATIDN LATION maxumzexenhance assessreview ha ens detain prosecute similarity producezwreate enablement accomplishquotcom pp enroll graduate before L J doublechek permi authorize 16 strength surprise startle startle shock autonymy Topic Models Unsupervised Models of Word Co occurrences A Probabilistic Approach 0 Define a probabilistic generative model for documents 0 EH E a a Learn the parameters of this model by fitting them to the data and a prior Ill Clustering words into topics with Latent Dirichlet Allocation e V 2 Blei Ng Jordan 2003 Generative Process Exam 3991 For each document 70 Iraq war 30 US election Sample a distribution over topics 0 For each word in doc Sample a topic 2 Iraq War Sample a word b b u from the topic w om mg Example topics induced from a large collection of text DISEASE WATER BACTERIA FISH DIS EAS ES S EA GERMS SWIM FEVER SWIMMIN CAUSE POOL CAUSED LIKE SPREAD SHELL VIRUSES HARK INFECTION TANK HELLS MICROORGANISMS SHARKS PERSON DIVING INFECTIOUS DOLPHINS SWAM CAUSING LONG SMALLPOX SEAL BODY DIVE INFECTIONS DOLPHIN CERTAIN UNDERWATER MIND 
STORY FIELD SCIENCE BALL JOB WORLD STORIES MAGNETIC STUDY GAME WORK DREAM TELL MAGNET SCIENTISTS TEAM JOBS DREAMS CHARACTER WIRE SCIENTIFIC FOOTBALL CAREER THOUGHT CHARACTERS NEEDLE KNOWLEDGE BASEBALL EXPERIENCE IMAGINATION AUTHOR CURRENT WORK PLAYERS EMPLOYMENT MOMENT READ COIL RESEARCH PLAY OPPORTUNITIES THOUGHTS TOLD POLES CHEMISTRY FIELD WORKING OWN SETTING IRON TECHNOLOGY PLAYER TRAINING REAL TALES COMPASS MAN BASKETBALL SKILLS LIFE PLOT LINES MATHEMATICS OAC REERS IMAGINE TELLING CORE BIOLOGY PLAYED POSITIONS SENSE SHORT ELECTRIC FIELD PLAYING FIND CONSCIOUSNES S FICTION DIRECTION PHYSICS HIT POSITION STRANGE ACTION FORCE LABORATORY TENNIS FIELD FEELING TRUE MAGNETS STUDIES TEAMS OCCUPATIONS WHOLE EVENTS BE WORLD GAMES REQUIRE BEING TELLS MAGNETISM SCIENTIST SPORTS OPPORTUNITY MIGHT TALE POLE STUDYING BAT HOPE NOVEL INDUCED SCIENCES TERRY ABLE Tennenbaum et al Example topics induced from a large collection of text DISEASE WATER BACTERIA FISH DIS EAS ES S EA GERMS SWIM FEVER SWIMMIN CAUSE POOL CAUSED LIKE SPREAD SHELL VIRUSES HARK INFECTION TANK HELLS MICROORGANISMS SHARKS PERSON DIVING INFECTIOUS DOLPHINS SWAM CAUSING LONG SMALLPOX SEAL BODY DIVE INFECTIONS DOLPHIN CERTAIN UNDERWATER MIND STORY FIELD SCIENCE BALL JOB WORLD STORIES MAGNETIC STUDY GAME WORK DREAM TELL MAGNET SCIENTISTS TEAM JOBS DREAMS CHARACTER WIRE SCIENTIFIC FOOTBALL CAREER THOUGHT CHARACTERS NEEDLE KNOWLEDGE BASEBALL EXPERIENCE IMAGINATION AUTHOR CURRENT WORK PLAYERS EMPLOYMENT MOMENT READ COIL RESEARCH PLAY OPPORTUNITIES THOUGHTS TOLD POLES CHEMISTRY FIELD WORKING OWN SETTING IRON TECHNOLOGY PLAYER TRAINING REAL TALES COMPASS MAN BASKETBALL SKILLS LIFE PLOT LINES MATHEMATICS OAC REERS IMAGINE TELLING CORE BIOLOGY PLAYED POSITIONS SENSE SHORT ELECTRIC FIELD PLAYING FIND CONSCIOUSNES S FICTION DIRECTION PHYSICS HIT POSITION STRANGE ACTION FORCE LABORATORY TENNIS FIELD FEELING TRUE MAGNETS STUDIES TEAMS OCCUPATIONS WHOLE EVENTS BE WORLD GAMES REQUIRE BEING TELLS MAGNETISM SCIENTIST SPORTS 
Collocations
An expression consisting of two or more words that corresponds to some conventional way of saying things. Characterized by limited compositionality: the meaning of the expression cannot be fully predicted from the meanings of its parts. Examples: dynamic programming, hidden Markov model, weapons of mass destruction, kick the bucket, hear it through the grapevine.

Topics Modeling Phrases
- Topics based only on unigrams are often difficult to interpret.
- Topic discovery itself is confused because important meaning distinctions are carried by phrases.
- Significant opportunity to provide improved language models to ASR, MT, IR, etc.

Topical N-gram Model [Wang, McCallum 2005]
LDA topic: algorithms, algorithm, genetic, problems, efficient.
Topical n-grams for the same theme: genetic algorithms, genetic algorithm, evolutionary computation, evolutionary algorithms, fitness function.

LDA topic: learning, optimal, reinforcement, state, problems, policy, dynamic, action, programming, actions, function, markov, methods, decision, rl, continuous, spaces, step, policies, planning.
Topical n-grams: reinforcement learning, optimal policy, dynamic programming, optimal control, function approximator, prioritized sweeping, finite-state controller, learning system, reinforcement learning (rl), function approximators, markov decision problems, markov decision processes, local search, state-action pair, markov decision process, belief states, stochastic policy, action selection, upright position, reinforcement learning methods.

Topic Comparison (reinforcement learning)
- Topical n-grams, unigram component: policy, action, states, actions, function, reward, control, agent, q-learning, optimal, goal, learning, space, step, environment, system, problem, steps, sutton, policies.

Topic Comparison (visual motion)
- LDA: motion, visual, field, position, figure, direction, fields, eye, location, retina, receptive, velocity, vision, moving, system, flow, edge, center, light, local.
- Topical n-grams: receptive field, spatial frequency, temporal frequency, visual motion, motion energy, tuning curves, horizontal cells,
motion detection, preferred direction, visual processing, area mt, visual cortex, light intensity, directional selectivity, high contrast, motion detectors, spatial phase, moving stimuli, decision strategy, visual stimuli.
- Topical n-grams, unigram component: motion, response, direction, cells, stimulus, figure, contrast, velocity, model, responses, stimuli, moving, cell, intensity, population, image, center, tuning, complex, directions.

Topic Comparison (speech recognition)
- LDA: word, system, recognition, hmm, speech, training, performance, phoneme, words, context, systems, frame, trained, speaker, sequence, speakers, mlp, frames, segmentation, models.
- Topical n-grams: speech recognition, training data, neural network, error rates, neural net, hidden markov model, feature vectors, continuous speech, training procedure, continuous speech recognition, gamma filter, hidden control, speech production, neural nets, input representation, output layers, training algorithm, test set, speech frames, speaker dependent.
- Topical n-grams, unigram component: speech, word, training, system, recognition, hmm, speaker, performance, phoneme, acoustic, words, context, systems, frame, trained, sequence, phonetic, speakers, mlp, hybrid.

Unsupervised learning of topic hierarchies [Blei, Griffiths, Jordan & Tenenbaum, NIPS 2003]
[Figure: a topic hierarchy learned from NIPS papers; recoverable node words include neurons, algorithm, learning, method, recognition, control, reinforcement, character, word, input, action, value, design, optimal, policy, training.]

Joint models of syntax and semantics [Griffiths, Steyvers, Blei & Tenenbaum, NIPS 2004]
Embed the topics model inside an nth-order Hidden Markov Model, with a document-specific distribution over topics. [Figure: graphical model, with example emissions such as network, images, image, kernel, output, objects, neural network.]

Semantic classes:
- nutrition: FOOD, FOODS, BODY, NUTRIENTS, DIET, FAT, SUGAR, ENERGY, MILK, EATING, FRUITS, VEGETABLES, WEIGHT, FATS, NEEDS, CARBOHYDRATES, VITAMINS, CALORIES, MINERALS
- geography: MAP, NORTH, EARTH, SOUTH, POLE, MAPS, EQUATOR, WEST, LINES, EAST, AUSTRALIA, GLOBE, POLES, HEMISPHERE, PLACES, LAND, WORLD, COMPASS, CONTINENTS
- medicine: DOCTOR, PATIENT, HEALTH, HOSPITAL, MEDICAL, CARE, PATIENTS, NURSE, DOCTORS, MEDICINE, PHYSICIAN, HOSPITALS, DR, SICK, ASSISTANT, EMERGENCY, PRACTICE
- books: BOOK, BOOKS, INFORMATION, REPORT, PAGE, TITLE, SUBJECT, PAGES, GUIDE, WORDS, MATERIAL, ARTICLES, WORD, FACTS, AUTHOR, REFERENCE, NOTE
- metals and minerals: GOLD, IRON, SILVER, COPPER, METALS, STEEL, CLAY, LEAD, ALUMINUM, MINERAL, MINE, STONE, MINERALS, MINING, MINERS, TIN
- psychology: BEHAVIOR, INDIVIDUAL, PERSONALITY, RESPONSE, EMOTIONAL, LEARNING, FEELINGS, PSYCHOLOGISTS, PSYCHOLOGICAL, EXPERIENCES, ENVIRONMENT, RESPONSES, BEHAVIORS, ATTITUDES, PSYCHOLOGY, PERSON
- biology: CELLS, CELL, ORGANISMS, ALGAE, BACTERIA, MICROSCOPE, MEMBRANE, ORGANISM, LIVING, FUNGI, GREEN, MOLDS

Syntactic classes:
- verbs of saying: SAID, ASKED, THOUGHT, TOLD, SAYS, ANSWERED, REPLIED, SHOUTED, EXPLAINED, LAUGHED, MEANT, WROTE, SHOWED, BELIEVED, WHISPERED
- determiners and possessives: THE, HIS, THEIR, YOUR, HER, ITS, MY, OUR, THIS, THESE, THAT, THOSE, EACH, MR, MRS, ALL
- comparatives and degree words: MORE, SUCH, LESS, MUCH, KNOWN, JUST, BETTER, RATHER, GREATER, HIGHER, LONGER, EXACTLY, SMALLER, BIGGER, FEWER, LOWER, ALMOST
- prepositions: ON, AT, INTO, FROM, WITH, THROUGH, OVER, AROUND, AGAINST, ACROSS, UPON, TOWARD, UNDER, ALONG, NEAR, BEHIND, OFF, ABOVE, DOWN, BEFORE
- adjectives: GOOD, SMALL, NEW, IMPORTANT, GREAT, LITTLE, LARGE, BIG, LONG, HIGH, DIFFERENT, SPECIAL, OLD, STRONG, YOUNG, COMMON, WHITE, SINGLE, CERTAIN
- quantifiers and numbers: ONE, SOME, MANY, TWO, EACH, ALL, MOST, ANY, THREE, EVERY, SEVERAL, FOUR, FIVE, BOTH, TEN, SIX, TWENTY, EIGHT
- pronouns: HE, YOU, THEY, I, SHE, WE, IT, PEOPLE, EVERYONE, OTHERS, SCIENTISTS, SOMEONE, WHO, NOBODY, SOMETHING, ANYONE, EVERYBODY
- light verbs: BE, MAKE, GET, HAVE, GO, TAKE, DO, FIND, USE, SEE, HELP, KEEP, GIVE, LOOK, COME, WORK, MOVE, LIVE, EAT, BECOME

Semantics vs. Syntax: corpus-specific factorization (NIPS)
[Figure: semantic topics vs. syntactic classes learned from NIPS papers; recoverable semantic words include image, state, membrane, network, gaussian, mixture, cell, neural, objects, likelihood, reinforcement, kernels; recoverable syntactic words include however, also, obtained, become, given, found, using, presented, described, generated, shown.]
Syntactic classes in PNAS (selected classes):
- prepositions: IN, FOR, ON, BETWEEN, DURING, AMONG, FROM, UNDER, WITHIN, THROUGHOUT, THROUGH, TOWARD, INTO, AT, INVOLVING, AFTER, ACROSS, AGAINST, ALONG
- verbs of being and appearing: ARE, WERE, WAS, IS, REMAIN, REMAINS, REMAINED, BECOME, BECAME, BEING, APPEARED, APPEAR
- noun-phrase modifiers: THE, THIS, ITS, THEIR, AN, EACH, ONE, ANY, INCREASED, EXOGENOUS, OUR, RECOMBINANT, ENDOGENOUS, TOTAL, PURIFIED, FULL, CHRONIC, ANOTHER, EXCESS
- reporting verbs: SUGGEST, INDICATE, SUGGESTING, SUGGESTS, SHOWED, REVEALED, SHOW, DEMONSTRATE, INDICATING, PROVIDE, SUPPORT, INDICATES, PROVIDES, INDICATED, DEMONSTRATED, SHOWS, REVEAL, DEMONSTRATES, SUGGESTED
- quantity nouns: LEVELS, LEVEL, TIME, VARIETY, CONCENTRATION, FAMILY, FREQUENCY, AMOUNTS, CLASS, AMOUNT, NUMBER, RATE, CONCENTRATIONS, RANGE, DOSE, SET, SERIES, RATES, VALUES, SITES
- research nouns: RESULTS, DATA, STUDY, EXPERIMENTS, HYPOTHESIS, ASSAYS, MICROSCOPY, WORK, FINDING, OBSERVATION, ANALYSIS, STUDIES, FINDINGS, OBSERVATIONS, ANALYSES, POSSIBILITY, PAPER, EVIDENCE, MUTAGENESIS, MEASUREMENTS

Semantic highlighting
Darker words are more likely to have been generated from the topic-based semantics module. [Figure: highlighted abstracts; recoverable phrases include network activity, excitability, integration, feed-forward, feedback, adaptive neural networks, convergence, doubly stochastic matrix, portfolio, risk level, time horizon, institutional, samples.]

Social Network Analysis with Links and Text
- Role Discovery
- Group Discovery
- Trend Discovery
- Community Discovery
- Impact Measurement

From LDA to Author-Recipient-Topic (ART)
Latent Dirichlet
Allocation (LDA) [Blei, Ng, Jordan 2003]

Inference and Estimation: Gibbs Sampling
- Easy to implement.
- Reasonably fast.
Resample each token's topic assignment from its conditional distribution:
P(z_i = t | z_-i, w) ∝ (n_{t,w_i} + β) / (n_t + Vβ) · (n_{d_i,t} + α)
where n_{t,w} counts word w assigned to topic t, n_t is topic t's total count, n_{d,t} counts tokens in document d assigned to topic t, and all counts exclude position i.

Enron Email Corpus
- 250k email messages, 23k people.
Example message — Date: Wed, 11 Apr 2001; From: debra.perlingiere@enron.com; To: steve.hooser@enron.com; Subject: Enron/TransAlta Contract dated Jan 1, 2001: "Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft. Are you OK with this? If so, the only version I have is the original draft without revisions. DP" (Debra Perlingiere, Enron North America Corp., Legal Department, 1400 Smith Street, EB 3885, Houston, Texas 77002)

Topics and prominent sender/recipient pairs discovered by ART:
- Topic 5 "Legal Contracts": section .0299, party .0265, language .0226, contract .0203, date .0155, enron .0151, parties .0149, notice .0126, days .0112, include .0111. Prominent pairs: M. Hain → J. Steffes .0549, J. Dasovich → R. Shapiro .0377, D. Hyvl → K. Ward .0362.
- Topic 17 "Document Review": attached .0742, agreement .0493, review .0340, questions .0257, draft .0245, letter .0239, comments .0207, copy .0165, revised .0161, document .0156. Prominent pairs: G. Nemec → B. Tycholiz .0737, G. Nemec → M. Whitt .0551, B. Tycholiz → G. Nemec .0325.
- Topic 27 "Time Scheduling": day .0419, friday .0418, morning .0369, monday .0282, office .0282, wednesday .0267, tuesday .0261, time .0218, good .0214, thursday .0191. Prominent pairs: J. Dasovich → R. Shapiro .0340, J. Dasovich → J. Steffes .0289, C. Clair → M. Taylor .0175.
- Topic 45 "Sports Pool": game .0170, draft .0156, week .0135, team .0135, eric .0130, make .0125, free .0107, year .0106, pick .0097, phillip .0095. Prominent pairs: E. Bass → M. Lenhart .3050, E. Bass → P. Love .0780, M. Motley → M. Grigsby .0522.

A second ART table covers Topics 34 "Operations", 37 "Power Market", 41 "Government Relations", 42 "Wireless"; top words begin: operations .0321, team .0234, office .0173; market .0567, power .0563, price .0280; state .0404, california .0367; blackberry .0726, net .0557;
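The conditional above is the standard collapsed-Gibbs update for plain LDA (ART's update has an analogous shape, with an author-recipient pair in place of the document). A self-contained sketch, not the course's implementation; the toy corpus and hyperparameter values are invented.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA.
    docs: list of documents, each a list of word ids.
    Each token's topic is resampled from
        P(z=t | rest) proportional to (n_tw + beta)/(n_t + V*beta) * (n_dt + alpha)
    with the token's own counts removed first."""
    rng = np.random.default_rng(seed)
    n_dt = np.zeros((len(docs), n_topics))   # document-topic counts
    n_tw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_t = np.zeros(n_topics)                 # topic totals
    z = [[int(rng.integers(n_topics)) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current assignment's counts.
                n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
                p = ((n_tw[:, w] + beta) / (n_t + vocab_size * beta)
                     * (n_dt[d] + alpha))
                t = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = t
                n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    return z, n_dt, n_tw

# Two tiny documents over disjoint vocabularies.
docs = [[0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 2, 3]]
z, n_dt, n_tw = gibbs_lda(docs, n_topics=2, vocab_size=4)
```

The count matrices always stay consistent with the assignments, which is the invariant that makes the collapsed sampler correct.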
(word lists continued)
- Topic 34 "Operations": list .0144, bob .0129, open .0126, meeting .0107, gas .0107, business .0106, houston .0096
- Topic 37 "Power Market": system .0206, prices .0182, high .0124, based .0120, buy .0117, customers .0110, costs .0106
- Topic 41 "Government Relations": power .0337, energy .0239, electricity .0203, davis .0183, utilities .0158, commission .0136, governor .0132, prices .0089
- Topic 42 "Wireless": www .0409, website .0375, report .0373, wireless .0364, handheld .0362, stan .0282, named .0260
Prominent pairs — Topic 34: S. Beck → L. Kitchen .2158, S. Beck → J. Lavorato .0826, S. Beck → S. White .0530. Topic 37: J. Dasovich → J. Steffes .1231, J. Dasovich → R. Shapiro .1133, M. Taylor → E. Sager .0218. Topic 41: J. Dasovich → R. Shapiro .3338, J. Dasovich → J. Steffes .2440, J. Dasovich → R. Sanders .1394. Topic 42: R. Haylett → T. Geaccone .1432, T. Geaccone → R. Haylett .0737, R. Haylett → D. Fossum .0420.
- Beck = Chief Operations Officer; Dasovich = Government Relations Executive; Shapiro = Vice President of Regulatory Affairs; Steffes = Vice President of Government Affairs.

Comparing Role Discovery
[Figure: three ways to compare two people — traditional SNA (connection-strength vectors), ART (distribution over authored topics, conditioned on recipient), Author-Topic (distribution over authored topics).]

Comparing Role Discovery: Tracy Geaconne vs. Dan McCarty
Traditional SNA: similar roles. ART: different roles. Author-Topic: different roles. (Geaconne was a secretary; McCarty a vice president.)

Comparing Role Discovery: Lynn Blair vs. Kimberly Watson
Traditional SNA: different roles. ART: very similar. Author-Topic: very different. (Blair: gas pipeline logistics; Watson: pipeline facilities planning.)

McCallum Email Corpus, 2004
- January–October 2004; 23k email messages; 825 people.
Example message — From: kate@cs.umass.edu; To: mccallum@cs.umass.edu; Subject: NIPS and ...; Date: June 14, 2004: "There is pertinent stuff on the first yellow folder that is completed, either travel or other things, so please sign that first folder anyway. Then here is the reminder of the things I'm still waiting for: NIPS registration receipt, CALO registration receipt. Thanks, Kate"

McCallum Email Blockstructure
Four most prominent topics in discussions with …:
- Topic 5 "Grant Proposals": proposal .0397, budget .0289, work .0245, year .0238, glenn .0225, nsf .0209, project .0188, sets .0157, support .0156
- Topic 31 "Meeting Setup": today .0512, tomorrow .0454, time .0413, meeting .0339, week .0255, meet .0233, morning .0228, monday .0208
- Topic 33 "ML Models": model .0479, models .0444, inference .0191, conditional .0181, methods .0144, number .0136, sequence .0126, learning .0126, graphical .0121, random .0121
- Topic 41 "Friendly Discourse": great .0516, good .0393, don .0223, sounds .0219, work .0196, wishes .0182, talk .0175, interesting .0168, time .0162, hear .0132
Prominent correspondents (partially recoverable): casutton, pereira, weinman, wellner, and lists such as icml04-webadmin, icml04-chairs, nips04-workflow.

Two most prominent topics in discussions with …:
- Topic 1: love .0305, house .0154, time .0124, great .0113, hope .0110, dinner .0096, saturday .0092, left .0092, ll .0090, visit .0081, evening .0081, stay .0078, bring .0077, weekend .0074, road .0071, sunday .0068, kids .0065, flight .0065
- Topic 2: today .0512, tomorrow .0454, time .0413, ll .0391,
meeting .0339, week .0255, talk .0246, meet .0233, morning .0228, monday .0208, back .0194, call .0164, free .0156, home .0140, won .0138, day .0131, hope .0130, leave .0130, office .0127, tuesday .0126

Role-Author-Recipient-Topic (RART) Models
Three variants (RART1, RART2, RART3), each adding a per-person role variable to the ART model.

Results with RART: people in Role 3 ("IT support") in academic email — gauthier (Linux sysadmin for CIIR group), irsystem (CIIR sysadmins mailing list), system (mailing list for dept. sysadmins), allan (professor, chair of computing committee), valerie (second Linux sysadmin), tech (mailing list for dept. hardware), steve (head of dept. IT support).
Roles for allan (James Allan): Role 3 (IT support), Role 2 (natural language researcher).
Roles for pereira (Fernando Pereira): Role 2 (natural language researcher), Role 4 (SRI CALO project participant), Role 6 (grant proposal writer), Role 10 (grant proposal coordinator), Role 8 (guests at McCallum's house).

ART: Roles but not Groups
[Figure: Enron TransWestern Division — the traditional SNA matrix is block-structured; the ART and Author-Topic matrices are not.]

Social Network Analysis with Links and Text
- Role Discovery
- Group Discovery
- Trend Discovery
- Community Discovery
- Impact Measurement

Groups and Topics
- Input: observed relations between people, plus attributes on those relations (text or categorical).
- Output: attributes clustered into topics; groups of people, varying depending on topic.

Adjacency Matrix Representing Relations
Student roster: Adams, Bennett, Carter, Davis, Edwards, Frederking.
Academic admiration: Acad(A,B), Acad(C,B), Acad(A,D), Acad(C,D), Acad(B,E), Acad(D,E), Acad(B,F), Acad(D,F), Acad(E,A), Acad(F,A), Acad(E,C), Acad(F,C).

Group Model: Partitioning Entities into Groups
Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]: Dirichlet prior over group membership; Binomial/Beta model for links between each pair of groups; S = number of entities, G = number of groups. Enhanced with an arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004].

Two Relations with Different Attributes — the same student roster observed under academic admiration and social admiration.
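The Acad(·,·) tuples above can be packed into the adjacency matrix mechanically. A small sketch; the helper name is ours, while the roster and relation pairs come from the slide.

```python
import numpy as np

def adjacency_matrix(entities, relations):
    """Binary adjacency matrix for directed relation pairs,
    e.g. Acad(A, B) = 'A academically admires B'."""
    index = {e: i for i, e in enumerate(entities)}
    m = np.zeros((len(entities), len(entities)), dtype=int)
    for src, dst in relations:
        m[index[src], index[dst]] = 1
    return m

students = ["Adams", "Bennett", "Carter", "Davis", "Edwards", "Frederking"]
acad = [("Adams", "Bennett"), ("Carter", "Bennett"), ("Adams", "Davis"),
        ("Carter", "Davis"), ("Bennett", "Edwards"), ("Davis", "Edwards"),
        ("Bennett", "Frederking"), ("Davis", "Frederking"),
        ("Edwards", "Adams"), ("Frederking", "Adams"),
        ("Edwards", "Carter"), ("Frederking", "Carter")]
A = adjacency_matrix(students, acad)
```

Reordering the rows and columns of such a matrix by group membership is what makes the blockstructure visible.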
Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking.
Academic admiration: Acad(A,B), Acad(C,B), Acad(A,D), Acad(C,D), Acad(B,E), Acad(D,E), Acad(B,F), Acad(D,F), Acad(E,A), Acad(F,A), Acad(E,C), Acad(F,C).
Social admiration: Soci(A,B), Soci(A,D), Soci(B,A), Soci(B,C), Soci(C,B), Soci(C,D), Soci(D,A), Soci(D,C), Soci(E,B), Soci(E,D), Soci(F,A), Soci(F,C).

The Group-Topic Model: Discovering Groups and Topics Simultaneously
Each relation carries attributes (e.g. the text of a resolution). The model jointly clusters the attributes into topics and, conditioned on the topic, partitions the entities into groups — so the group structure can differ from topic to topic. [Graphical model: Dirichlet/multinomial over topics for the attributes; Dirichlet over group membership; Binomial/Beta for links between group pairs.]

Dataset 1: US Senate
- 16 years of voting records in the US Senate (1989–2005).
- A senator may respond Yea or Nay to a resolution.
- 3423 resolutions with text attributes (index terms); 191 senators in total across the 16 years.

Example: S.543 — "An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes." Sponsor: Sen. Riegle, Donald W., Jr. (MI), introduced 3/5/1991; cosponsors: 2; latest major action: 12/19/1991, became Public Law No. 102-242. Index terms: banks and banking, accounting, administrative fees, cost control, credit, deposit insurance, depressed areas, and 110 other terms. Sample votes: Adams (D-WA) Nay, Akaka (D-HI) Yea, Bentsen (D-TX) Yea, Biden (D-DE) Yea, Bond (R-MO) Yea, Bradley (D-NJ) Nay, Conrad (D-ND) Nay.

Topics Discovered (US Senate)
Mixture of unigrams: Education (education, school, aid, children, drug, students, elementary, prevention); Energy (energy, power, water, nuclear, gas, petrol, research, pollution); Military/Foreign (government, military, foreign, tax, congress, aid, law, policy); Economic (federal, labor, insurance, aid, tax, business, employee, care).
Group-Topic model: Education (education, school, federal, aid, government, tax, energy, research); Foreign (foreign, trade, chemicals, tariff, congress, drugs, communicable, diseases); Economic (labor, insurance, tax, income, minimum, wage, business); Social Security + Medicare (social, security, insurance, medical, care, medicare, disability, assistance).

Groups Discovered (US Senate), from the topic Education + Domestic:
- Group 1: 73 Republicans.
- Group 2: 90 Democrats, plus Krueger (D-TX).
- Group 3 (moderate Republicans): Chafee (RI), Jeffords (VT),
Cohen (R-ME), Danforth (R-MO), Durenberger (R-MN), Hatfield (R-OR), Heinz (R-PA), Jeffords (R-VT), Kassebaum (R-KS), Packwood (R-OR), Specter (R-PA), Snowe (R-ME), Collins (R-ME).
- Group 4: Armstrong (R-CO), Garn (R-UT), Humphrey (R-NH), Fitzgerald (R-IL), Voinovich (R-OH), Coleman (R-MN).

Senators Who Change Coalitions the Most (dependent on topic), by group-switch index: Shelby (D-AL) 0.6182, Heflin (D-AL) 0.6049, Voinovich (R-OH) 0.6012, Johnston (D-LA) 0.5878, Armstrong (R-CO) 0.5747.
E.g. Senator Shelby (D-AL) votes with the Republicans on Economic, with the Democrats on Education + Domestic, and with a small group of maverick Republicans on Social Security + Medicaid.

Dataset 2: The UN General Assembly
- Voting records of the UN General Assembly, 1990–2003.
- A country may choose to vote Yes, No, or Abstain.
- 931 resolutions with text attributes (titles); 192 countries in total.
- Later experiments also use resolutions from 1960–2003.

Example: vote on Permanent Sovereignty of the Palestinian People (87th plenary meeting). The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against, with 6 abstentions. In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and 126 other countries. Against: Israel, Marshall Islands, United States. Abstaining: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.

Topics Discovered (UN)
Mixture of unigrams: Everything Nuclear (nuclear, weapons, use, implementation, countries); Human Rights (rights, human, palestine, situation, israel); Security in Middle East (occupied, israel, syria, security, calls).
Group-Topic model: Nuclear Non-proliferation (nuclear, states, united, weapons, nations); Nuclear Arms Race (nuclear, arms, prevention, race, space); Human Rights (rights, human, palestine, occupied, israel).

Groups Discovered (UN)
[Figure: country groupings per topic — Nuclear Arsenal, Human Rights, Nuclear Arms Race.]
The countries in each group are ordered by their 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members. [Recoverable country names include Brazil, Colombia, Peru, Venezuela, USA, Japan, UK, India, Indonesia, Thailand, Philippines, Malaysia, Azerbaijan, Uruguay, Kyrgyzstan, Turkmenistan, Cyprus, Yugoslavia.]

Groups and Topics: Trends over Time (UN)
[Table: for each time period, topic word lists (e.g. nuclear, weapons, disarmament, occupied, rights, israel, palestine, international) and the country groups aligned with each topic; recoverable groups include a USA/UK/Japan bloc, a Soviet/Eastern European bloc, and non-aligned countries such as India, Indonesia, Argentina, Brazil, Thailand, and the Philippines.]

Social Network Analysis with Links and Text
- Role Discovery
- Group Discovery
- Trend Discovery
- Community Discovery
- Impact Measurement

Want to Model Trends over Time
- A pattern may appear only briefly: capture its statistics in a focused way, and don't confuse it with patterns elsewhere in time.
- Is the prevalence of a topic growing or waning?
- How do roles and groups influence the shift over time?

Topics over Time (TOT) [Wang, McCallum KDD 2006]
Like LDA — Dirichlet prior over a per-document multinomial over topics, and a multinomial over words per topic — but each topic additionally has a Beta distribution over time, from which each token's timestamp is drawn.

State of the Union Addresses
- 208 addresses delivered between January 8, 1790 and January 29, 2002.
- To increase the number of documents, the addresses were split into paragraphs and treated as documents; one-line paragraphs were excluded; stopping was applied.
- 17,156 documents; 21,534 word types; 669,425 tokens.

Comparing TOT against LDA
[Figure: topics labeled "Mexican War", "Panama Canal", and "Cold War" with their time distributions — TOT's topics are sharply localized in time while LDA's are diffuse. Recoverable top words: Mexican War: states, mexico, government, united, country, texas; Cold War: world, security, soviet, united, nuclear, states.]
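TOT ties each topic to a Beta distribution over document timestamps normalized into (0, 1), and the paper re-estimates the Beta parameters by the method of moments between sampling iterations. A minimal sketch of that estimator; the timestamps below are invented for illustration.

```python
import math

def fit_beta(timestamps):
    """Method-of-moments fit of Beta(a, b) to timestamps in (0, 1),
    one fit per topic in Topics over Time."""
    n = len(timestamps)
    mean = sum(timestamps) / n
    var = sum((t - mean) ** 2 for t in timestamps) / n
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common   # shape parameters (a, b)

def beta_pdf(x, a, b):
    """Beta density; in TOT's sampler this weights topics by timestamp."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)

# Invented timestamps for a topic clustered early in the timeline,
# in the spirit of the "Mexican War" topic:
stamps = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22]
a, b = fit_beta(stamps)
```

The fitted density is sharply peaked near the cluster of timestamps, which is exactly the localization behavior the TOT-vs-LDA comparison illustrates.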
[Figure continues: the Mexican War topic (mexico, texas, territory, part, republic, military, state) is localized in the 1840s; the Panama Canal topic includes canal, isthmus, nicaragua; the Cold War topic (defense, security, strength, nuclear, weapons, arms, maintain, strong) peaks in the mid-20th century.]

Faculty Recruiting
[Figure: TOT vs. LDA on the McCallum email corpus — a "Faculty Recruiting" topic (faculty, david, lunch, schedule, candidate, talk) is sharply localized in the hiring season under TOT but diffuse under LDA; a code topic (java, directory, mallet, checked) is also shown.]

Topic Distributions Conditioned on Time
[Figure: topic mass shown as vertical height over time.]

Social Network Analysis with Links and Text
- Role Discovery
- Group Discovery
- Trend Discovery
- Community Discovery
- Impact Measurement

How do new links form in social networks?
1. Randomly — Poisson graph.
2. Pick someone popular — preferential attachment.
3. Pick someone with mutual friends — Adamic & Adar; Liben-Nowell & Kleinberg.
4. Pick someone from one of your communities — Mimno, Wallach & McCallum 2007.

A Community-based Generative Model for Text and Coauthorships
1. To generate a document, we first pick a community.
2. The community then determines the choice of authors and topics.
3. From topics we pick words.

The graphical model can answer various queries:
- P(author3 | author1, author2)
- P(author3 | author1, author2, text)
- P(community | authors)
- P(authors | community)
- P(text | community)
- P(text | authors)

Link Prediction
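Mechanism 3 above ("pick someone with mutual friends") is typically scored with the Adamic/Adar measure, which weights each mutual friend by the inverse log of their degree. A toy sketch; the graph is invented for illustration.

```python
import math

def adamic_adar(graph, u, v):
    """Adamic/Adar score for a candidate link (u, v): each mutual
    friend z contributes 1/log(degree(z)), so well-connected hubs
    count for less than selective mutual friends."""
    common = graph[u] & graph[v]
    return sum(1.0 / math.log(len(graph[z])) for z in common)

# Toy undirected coauthorship graph, stored as neighbor sets:
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
score = adamic_adar(graph, "b", "d")   # mutual friends: a and c
```

Pairs with no mutual friends score zero, which is one reason community-based models can predict links this measure cannot.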
Probability of NIPS 2004–06 Coauthorships
[Figure: predicted probability of held-out coauthorships under the competing link-formation models; preferential attachment performs much worse. Example predicted author sets: Ng/Koller/Parr/Abbeel; Jordan/Merzenich/Mel; Jordan/Jaakkola/Saul; Bach/Singh; Wainwright/Nguyen.]

Community-Author View
[Figure: example communities, each a set of authors with associated topic words — e.g. a reinforcement-learning community (learning, control, reinforcement, sutton, action, policy, reward, agent, q-learning, mdp, helicopter; authors such as Barto) and a variational-inference community (propagation, belief, tree, nodes, approximation, variational, field, boltzmann, jordan; authors such as Jordan, Jaakkola, Saul, Wainwright, Willsky). Other visible topics: sequence labeling (features, markov, sequence, conditional, label), spectral clustering (clustering, spectral, graph, matrix, cut, eigenvectors, normalized), theory (theorem, proof, bound), and optimization (problem, algorithm, optimization, methods, solution).]

Community-Author-Topic View
[Figure: communities decomposed into author sets plus several topics — e.g. a language/Bayesian community (words, model, word, documents, topic, mixture, prior, posterior, bayesian, likelihood; authors Griffiths, Singer, Blei, Goldwater, Jordan, Johnson)
and a belief-propagation community (propagation, belief, tree, nodes, variational, inference, message) alongside a motor-control community (control, motor, learning, arm, movement, feedback, cerebellum, vestibular; authors including Kawato, Jordan, Barto).]

Social Network Analysis with Links and Text
- Role Discovery
- Group Discovery
- Trend Discovery
- Community Discovery
- Impact Measurement

Our Data
- Over 16 million research papers, gathered as part of the Rexa.info portal.
- Cross-linked references and citations.

[Screenshot: the Rexa search engine at http://rexa.info — search over research papers, people, connections, authors, and grants; optional fields include abstract, body, title, author, venue, year, tag.]

Previous Systems
[Screenshot: ResearchIndex/CiteSeer, "Computer Science Research Paper Search Engine, made possible by Just Research and Justsystem", showing the paper "A Critical Evaluation of Commensurable Abduction Models for Semantic Interpretation" (Robert Wilensky, University of California; Thirteenth International Conference on Computational Linguistics, 1990).]
[Screenshots continue: the ResearchIndex abstract page for the abduction paper; Google Scholar results for "conditional random fields" (including Sha & Pereira's "Shallow parsing with conditional random fields" and McCallum's "Efficiently inducing features of conditional random fields"); and Rexa pages — search results for "table extraction", and the paper page for "Table extraction using conditional random fields" (David Pinto, Andrew McCallum, Xing Wei, W. Bruce Croft, SIGIR 2003). The recoverable abstract reads: "The ability to find tables and extract information from them is a necessary component of data mining, question answering, and information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content present difficulties for traditional language modeling techniques. This paper presents the use of conditional random fields (CRFs) for table extraction and compares them with hidden Markov models (HMMs)." The paper page also lists references (including Sha & Pereira, "Shallow Parsing with Conditional Random Fields", HLT-NAACL 2003) and citing papers. A final screenshot shows the Rexa author page for W. Bruce Croft, Distinguished Professor, Department of Computer Science, University of Massachusetts, Amherst, with his publications, citations, and co-authors.]
Send mas mm uumn mm Seanh cannnamemsmmuue absham unuy wsamnm venue year an wow u new so W Bru ce Croft ammmw mmgmsmed Professor Depanmem of Computer Smence Umverswy av Massachusetts BRUCE CROFT Amherst MA 010039264 ma m glEcs umasseau URL humc esumasseaupersumvemn mm Publlcallons 1 la 40m 253 lava was ilafians 5mm by dam euanons 2004 Ea ulnars x cm authors Cmng authms 1 la 40 12 Saned by date numaar name s 9 Rexa w Bruce Cmfx D34l3947CD76FD5595E492CIEBCEESWZY v mm YExaYnlnauthnr7w 539 Papevs f Aumors f Glam Searsh OpnnnaHmmahmhlde absham Hwy 0le mm venue veal an eulokm u new sea W Bru ce Croft Wasmlw iiungmsmed Professor Depanmem of Computer Smence Umverswy DY Massachusetts BRUCE CROFT Amherst MA 010039264 ma cmngl cs umasseau URL upllc csumasseaulperscnnevcmn mm Publlcallons1 la 40m 253 lava was enarinns Ceamncrs Y cued aumars l Cmng aumavs 1 la 40431368 50m by dale Y cuanons 59mm by da e numaer Y name 2004 Danam Memev w Emce Cm mnninrng rne language model w 3995 m 2004 20932002 2002 20472sz and inlerence nelwm approacnea Ia rerneval Im P700255 20W 20m 993 999 99 999 997 997 997 Mam 939 VD 40 Pages 735 20 m WH7 WHE W96 W96 W5 W5 W5 W5 W5 gtltaoyon9 L0 w Bruce Cmquot clusternasea retrieval u5il1g anguage madea SIGIH 2004 a Hanmm Andve39s CovvadarEmmanueL W Bruce MM Answer models for W92 W92 WH2 YHHY YHHY YHHY W90 W90 W7H gueslmn answer passage renew am 2004 m 39 3553quot gre mggoal03mg739m9 quot399quot WWW James a canequot 2001 1999 1997193519351335 Hawzheng ang W Bruce CYDYL Bnan N Levme mev R Lessev 794 7994 994 799 7997 7992 A MulnAgenuppraarn for Peepin eer Basednicnnanon Ellen M Vaamees 2002 200 2000 2000 1909 1994 nem evals rem AAMAS 2004 v Hannm 999 993 939 Donam Memev wcmr Lawenxo w Emce man formal quotmes Allaquot mfg fva quotWM 391quot WM MW S39G39F r 2 Howard a me 994 Wis m 1997 Wu m Stephen CvaneanawnsemJ v0 Zhou w Bruce man A framzwark lusun Zaael 2001 I996 1094 1094 1092 largeactive querekpallsian CIKM 2004 m mnmm Jam smug 1999 Ws 19341 93141 93141 
2003 EEmce Cmquot Language Made5 Iorlniarmalian Retrieval Mam Mma 995 994 994 999 7777 I 0 20030mamsl Emce man La ml 9quot 59 9 1990 1990 W7 W5 I993 2m 9 Rexa w Bruce Cmfx mm r2xamlnaumm use1394mmnasusaaszwnuaacammx v Rexa nfo 539 Papevs f Authors P mams Seanh Opnnnamemmwmud absham unuy wsamnm venue year an wow u new so run I Heal Ii Cornedan roar was w mm mm W Bruce Croft mamas WWW mmgmsmed Professor Depanmem of Computer Smence Umverswy av Massachusetts BRUCE CROFT Amherst MA 010039264 ma mngma umasseau URL humc es masseaupersumwmmn mm humans r 0 40 253 Ma was Marinas Ceamms x med aumms Dmng aumars 1 to 40m 50m by dam cmauons 1521 Suned by date numher name 2004 m a Cahuun 1999 Tuleranng aeeney by Prefekh gJava Objects mm rexaymnuapervm wasmmsunoccmmuammmsaaaz v 539 Papevs P Authors P mama E l Seanh cpnmyawnememmuueansnam unuy me aumm vwme yam an Wenemyaseaunaw u Devqu yeoe nu Tolerating Latencm by Prefetching Java Dbiec s Blendon Cahccn Kat awnmam u m a u exa as e ryn S McKm ey I 1 mm Gun VeGS naarCneseer To appear Worksmp an Hardware Supparr for Objects and MIcoarchlecmres fa DBL y Yahao39y MsNy Rexa Raw Java 1999 Em Ema hnk a WW I U Md ND E L W o new Ahslranr m vece yeavs process Elmex Ema qu or speed has become moveasmg Vas ev man mpmceedmgscahoanIESMD era memory speeo Onetechmquemv mpvavmg memory pe mmaneey an data pve39etcmng wmoh 5 sun 255M m away by mg 5 a Evenaon Cannon and Kathryn s MoKm evyquot r ased codes Dummy Haw me quotTmevamg La ency Dy Premean Java Dumasquot ave re amhers app ymg m paymey 3526 code mm papev we k y e To appea mksth an Hardwave Support My omens evama e a data preyemmng echmque caHed greedy pve ewhmg mr and Mmmawmecmves m Javaquot m eva mg atency m Java pmgrams In gveedy pyeremmng when a mop msmmmn Depanmem m Camputer Smence Umvevsmy m or vecuvswe memoa update an omen a we pyevemy omens m wmch Massachus ts a yeceye w escnbe Wee ano muepmcedura a gomhms rm yea 1999quot campulmg amen m pve39emh and we pvesen pvehmmary vesmts Expand a 
gay expenmema ve us a y memory a m Relelennes 11 Soned by date Manor s a phabe caHy omen My mg 39 y yavay a gamhms 5 y y mym Rom Gunndar 5 Sam strum JumpPainter accuvacy m am y techmques y y y Prelelchmg Ia Liam Data Structures ISCA 1999 L5 Hemansw Tnshm M cmhmm Mavk D Hm James H Laws mm H sachecanaciaua Swalure Layout PLDI 1999 5 wahznsw Shaw Rubm Da M chae James F KmaeeJamA S ankmm Donam r ans ey Kmm m a 99 2 ymf y m m quot7 Hamammham J Elm a Mass w aycyame mm w Bruce Brad Gamer Chanma Knnu symmy John I an M Ausm Cwquot Kathryn McKm ev 4755 Emach quotWasMme e gunammms 05 mm A5 pLQS 99 37 Wasmums m Su Jan Hessech an Iveworked Mumnews m y manualon Syszeme NSF EIA 1995 a y s em Hanan Virtual came Line A New recnmque 2 Improve ea he Exp rs v9 rm Dane nttp rexatntpgrant7 393 e 9 Rexa Kurase August 1 1995 use Research tnitastntttttte tnttastntctote to Suppnn ReseaKh on Networked Multimedt F smtmnaauazcasmwmaazmnsmt v F Papevs f Aotnors f Grants iamt uptttrrrat meta trrrttttue atstraet Dauy HHE aottrer yentra yeer My rsHDoRm u Dena son Information Systems enamel James F Kumse John A Slankcvic Donald F Towsley Klilhi Ramamvilham J Eliot E Moss W Richards Adrian W Bruce Cm Kath n McKinle NSF Gram EIAs9502 39Augusl1 1995 A Decemberzg 1999 w M 11 Scmed try dale t crtatrons t atpnapetrcauy rnrs awartt proyrttes support to eqmp a netwovked expenmentat rm yaearrmwnratare rs rartmswt testbed to enattte researcn m tne oeyetopmentot trre pperatrng system Emery DgtB21ggh Eemamm e Zorn Kathryn s McKWey no networkrng opteot manage ent and rrrtornnatron retneyat Compgsrng ngfrPerlannaIme Memory Allocatars FLDI components or tutore networkett mottrmettra rntormatron systems rne zoot r srtetrnrst p testbed wm oonsrst ot two shavedsmemo mottrpropessortaptttttes Evendan Canoon Katrrryn s Mckrnte Dala FinwAnaysrs or attacnett to yerat parauet mass storage lD aeyrces and a mg spe Somme Pretetonrng Llrlked Data matures in Java IEEE ATM network rne researcrrteam wru pe aeyetoprng seyerat key 
FACT 2001 r rtatrsnsr naroware and sottwareteennotogres needed to su rttotore SaHy stow Mark Hand ey thenara Paanye ng Wramer netwmketJ mottrmettra trrtormatron systems Specmc veseavch areas Equanarmam con man cantoi tar tumm appmatron rnetoae operatrn systems to networkrng opteet management antt SIGCOMM 2000 teen Hztltaust gt rntormatron retneyat SaHy Ftoyo Mark ktantttey Nenma Padhye Equauorrsased congesnnn Eanlrol tar UnicaslAppIcalmns mamas 2000 V Luzmm st Sopratrk anat naryya Don Towstey James F Kuvase sign an naysis a Loss innisatrnn FIIIEIS or Multisast Congestion control cwsm recnmpat Reptm m 9946 Depanment ot oom mev sprenoe Unryersrty ot nrtassaonosetts Amnerst 2000 to stratrtrrsr rltatnryn 5 Mt We onyrerr mam nuantrrying loop nest locality using 5 35 an tne pentct nennnrna s Tvans Corg n Syst yot t7 pages 235 1999tc artatrarrsr Evendon t oon katnryn s McKtMey roerarrng Latensyoy Preternnrng Java DDMr s To appear Worksnop on Hardwave Eu pon tor Opteets and MrcroaromteetttrestorJaya 199913 artetrsnrr thenora Paohyelames F Kurose Doneto rowste aateey Kmdh A rcpFnsnny Hale Aduswneumlocal or nirnunus Media Flaws aver sesi 5mm mums cmscr w y gt P Papevs 539 Aumors f Grams rnaenine learning AND velnfunement lea g Searen Op anaHremsmnmde abslvam bnuy rrneaunmr venue year lag auenesnwusemomnnu Dehuh W 523th anrnnp aumars usrnu uueru machme earmm AND quotremtovcemem earmmquot 1 Fllchard S Sutton emmr A Specrar Issue a4 Mapnrne Learmng pn Rernmrpernenc Learmng varurne a Two promerns wrm backpmpaganan and mner steepesmescem rearnrrrg procedures rpr Remarks Open Theoreuca auesuans m Hemmmemem Leammg earmr Bet Hesuns Ham annular 2 Thomas G Dlelterlch Drvrae and Conquer Memos ror Maernne Leammg Presraenuar Vuung Invesugamr Award Compmev and Inmrrnauon sprence Deverpp ana Prmype Methods var me Autamauc oarrprauun and Varraauun or Computer Mmers M Comp ex Sysrerns DWHhrsheh Learnmg Ngonmms rpr Shucmva Su Msed Leammg Unuers andmg and Sca rngrUp Macn rne Leam mg A gomhms 3 
Andrew G Earto Lyapumw Memms m nernmrperneru Leammg Asspcranve Search neMka39 a rerruorpernem rearnrng assamauve Newark Ir aesrrep me presem anarysrs or me smpnasnc neuranar dynarnrcs can vep aced by an anarysrs or W5 determrmsnc neurana dynam An approach to Leammg cprurm Surraces by Connecuomst Sys ems Rernmrcerneru eammg and us reranunsrnp m superwsed eammg 4 Manm A Hledmlller Learmng m comm aynarmc sysmms speers m eammg neurar can rm Hrgn puamy thermos a camm by rerrnurcerneru eammg a a case study Ka une arnsmnner Desrgn Prmcrpre km r cr 9 Iaglc exper Dance a umrereyam documems s cuedrdpras rrrrerrrar resurrs 3277 rrrmrmauarr 5123 rexr m emrrmema resurrs nous web Em rexr ms query arrguagereary web 52 waraIE seamMGEB wdecHQQB mm rma 257 querywarrguagerese sear as age l5 access 239 Rexa Imm People x Comedians En 023 web a 02527 za a 7 la a 034 expenmema resuhs AL mum rrrrage a 0 ex a quotrrrrprmanprr retrieval mpic hnpzr2xamlDhldu mp mp2 hml uprrprrar News ncm e epsrracr buuy39 HHe39 author39veuue yearns an Trends papers y c an papers crrauorrs ya 0 3H cues cavean sparse F Papers f Authors f Grants Adv nzzr Seakh M 9 u s wuss m5 0R aru Dzrwk rs 0R cnaudrssmmrsmp 9591rank 59mm Impact u wers ny 291 rank M11400 recem Tap papers Soned by pnausrrs broadesr rmpacr eavhest Gevam 5am crrrrs aupkrey farmWeighting Appmamas In Automatic Tm elrieval 57 arrarrarrsr Mymn ickner Harpreer s Sawrrrrey Jppaman J Asnrey erang Huang ayrarr Dom Mnmka Garkam Jrrrr Hamer Derrrs Lee Dvagunn Fetkowc Dawd sreers perer Vanker may bylmage and mm content m asrcsystam 251 mralmns Dougras R Cumng Jan 0 Federsen nayrd Fl Kavger Jarrrr w Tukey SmittenGamer A Clusterb35941 Approach ra Browsing Large rrer Eduavdo M rorr h ma rrnsms Farpursps s as 90 awn Images by earners using L alar 7mm andsnapa rm Hahn e A Pennand a Prpard s Sc am pnmpaarpamanmasad 1 V4 Rexa quotneural nttwnrks mpk r IEhnpr2xamlDhxdu3mprsmprd92hml f Papers f Ammrs f Grants Advunzzr Searzh m Qprrprrar rrerus ncmae eusrrecr muy 
[Further Rexa screenshots: topic pages for "information retrieval" and "neural networks", each listing characteristic phrases and top papers ranked by citations and topical impact (e.g., Salton and Buckley, "Term-Weighting Approaches in Automatic Text Retrieval"; Flickner et al., "Query by Image and Video Content: The QBIC System"; Cutting, Pedersen, Karger, and Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections"; Hornik, Stinchcombe, and White, "Multilayer Feedforward Networks are Universal Approximators").]

Topical Transfer: citation counts from one topic to another map the producers and consumers of ideas. [diagram: transfer counts among topics such as machine translation, information retrieval, information extraction, and natural language parsing]

Topical Bibliometric Impact Measures [Mann, Mimno, McCallum 2006]: topical citation counts, topical impact factors, topical longevity, topical precedence, topical diversity, topical transfer.

Topical Transfer: transfer from Digital Libraries to other topics (citing topic, citation count, paper title):
- Web Pages (31): Trawling the Web for Emerging Cyber Communities
- Computer Vision (14): On being Undigital with digital cameras: extending the dynamic range
- Video (12): Lessons learned from the creation and deployment of a terabyte digital video library
- Graphs (12): Trawling the Web for Emerging Cyber Communities
- Web Pages (11): WebBase: a repository of Web pages

Topical Diversity: papers that had the most influence across many other fields (diversity score, with citation count):
- 4.00 (618): A tutorial on hidden Markov models and selected applications in speech recognition
- 3.80 (138): The self-organizing map
- 3.77 (163): Hierarchical mixtures of experts and the EM algorithm
- 3.74 (65): Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework
- 3.74 (144): Knowledge Acquisition via Incremental Conceptual Clustering
- 3.73 (155): A Tutorial on Learning With Bayesian Networks
- 3.72 (244): Term-Weighting Approaches in Automatic Text Retrieval
- 3.71 (294): Finding Structure in Time
- 3.70 (173): An introduction to hidden Markov models
- (1339): Nearest neighbor pattern classification

Topical Diversity is the entropy of the topic distribution among the papers that cite a given paper. High-diversity topics include Simulated Annealing (4.59), Pattern Recognition (4.57), Probabilistic Modeling (4.55), and Finite Automata (4.55); low-diversity topics include Computer Vision (2.95), Speech Recognition (3.09), Graphs (3.21), Mobile Robots (3.31), Machine Translation (3.32), and Digital Libraries (3.77).

Topical Precedence: within a topic, what are the earliest papers that received more than n citations?
Speech recognition:
- Some experiments on the recognition of speech, with one and with two ears. E. Colin Cherry, 1953
- Spectrographic study of vowel reduction. B. Lindblom, 1963
- Automatic Lipreading to enhance speech recognition. Eric D. Petajan, 1965
- Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. B. Atal, 1974
- Automatic Recognition of Speakers from Their Voices.
B. Atal, 1976

Information retrieval:
- On Relevance, Probabilistic Indexing and Information Retrieval. Kuhns and Maron, 1960
- Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems. Cooper, 1968
- Relevance feedback in information retrieval. Rocchio, 1971
- Relevance feedback and the optimization of retrieval effectiveness. Salton, 1971
- New experiments in relevance feedback. Ide, 1971
- Automatic Indexing of a Sound Database Using Self-organizing Neural Nets. Feiten and Gunzel, 1982

Topical Transfer Through Time: can we predict which research topics will be hot at ICML next year, based on the hot topics in neighboring venues last year? (Neighborhood distances between venue pairs are learned.)

How do ideas progress through social networks? Hypothetical example: AdaBoost spreading among venues (SIGIR / information retrieval, ICCV / vision, NLP). [slide shown three times as an animation]

Topic Prediction Models:
- z_i^v: the proportion of topic z in venue v in year i
- Static model: predict next year's proportion at a venue from its own history, with a per-venue coefficient lambda_v
- Transfer model: predict it from the topic proportions of neighboring venues, with topic transfer coefficients beta
- Linear regression and ridge regression are used for coefficient training.

Preliminary results: [plot: mean squared prediction error of the static and transfer models across the venues used for prediction; smaller is better] The transfer model with ridge regression is a good predictor.

Topic Model Musings: three years ago Latent Dirichlet Allocation appeared as a complex innovation, but now these methods and mechanics are well understood. The innovation now is to understand data and modeling needs: how to structure a new model to capture them.
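The regression-based coefficient training mentioned above can be illustrated with a tiny one-feature ridge regression in plain Python. The venue histories here are made-up numbers, and the closed form b = sum(x*y) / (sum(x*x) + alpha) is just the one-dimensional case of ridge regression.

```python
def ridge_fit(xs, ys, alpha):
    # one-feature ridge regression through the origin:
    # minimize sum (y - b*x)^2 + alpha * b^2, giving b = sum(xy) / (sum(xx) + alpha)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# hypothetical history: topic proportion at a neighboring venue in year i (x)
# vs. proportion at the target venue in year i+1 (y)
x = [0.10, 0.20, 0.30, 0.40]
y = [0.05, 0.11, 0.14, 0.20]
print(ridge_fit(x, y, alpha=0.0))  # ordinary least squares
print(ridge_fit(x, y, alpha=0.1))  # shrunk toward zero by the ridge penalty
```

The ridge penalty alpha trades a little bias for stability, which matters when many venue pairs are fit from short histories.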
Classification & Information Theory (Lecture 8). Introduction to Natural Language Processing, CMPSCI 585, Fall 2007, University of Massachusetts Amherst. Andrew McCallum.

Today's main points:
- Automatically categorizing text.
- Parameter estimation and smoothing: a general recipe for a statistical computational-linguistics model.
- Building a spam filter.
- Information theory: what is information, and how can you measure it? Entropy, cross entropy, information gain.

Maximum Likelihood Parameter Estimation. Example: binomial. Toss a coin 100 times, observe r heads. Assume a binomial distribution: order doesn't matter, and successive flips are independent. The one parameter is q, the probability of flipping a head. The binomial gives p(r | n, q); we know r and n, so find argmax_q p(r | n, q).

Notes (for the board):
likelihood: p(R = r | n, q) ∝ q^r (1-q)^(n-r)
log likelihood: L = log p(r | n, q) ∝ r log q + (n-r) log(1-q)
dL/dq = r/q - (n-r)/(1-q) = 0  =>  q = r/n
Our familiar ratio of counts is the maximum likelihood estimate.

Binomial parameter estimation examples:
- Make 1000 coin flips, observe 300 heads: P(heads) = 300/1000.
- Make 3 coin flips, observe 2 heads: P(heads) = 2/3?
- Make 1 coin flip, observe 1 tail: P(heads) = 0???
- Make 0 coin flips: P(heads) = ?
We have some prior belief about P(heads) before we see any data; after seeing some data, we have a posterior belief.

Maximum A Posteriori Parameter Estimation. We've been finding the parameters that maximize p(data | parameters), not the parameters that maximize p(parameters | data); but parameters are random variables too:
p(q | r, n) = p(r | n, q) p(q | n) / p(r | n) = p(r | n, q) p(q) / constant
Let the prior be p(q) ∝ q(1-q).

MAP estimation, binomial example:
posterior ∝ p(r | n, q) p(q) ∝ q^(r+1) (1-q)^(n-r+1)
log posterior: L ∝ (r+1) log q + (n-r+1) log(1-q)
dL/dq = (r+1)/q - (n-r+1)/(1-q) = 0  =>  q = (r+1)/(n+2)
This is add-one (Laplace) smoothing.
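A quick numeric check of the two estimators derived above, in plain Python; the flip counts echo the slide's examples.

```python
def mle_estimate(heads, flips):
    # maximum likelihood: ratio of counts, q = r / n
    return heads / flips

def map_estimate(heads, flips):
    # MAP with prior p(q) proportional to q(1-q): q = (r + 1) / (n + 2)
    return (heads + 1) / (flips + 2)

# 1 flip, 1 tail: MLE says heads is impossible; MAP backs off toward 1/2
print(mle_estimate(0, 1))       # 0.0
print(map_estimate(0, 1))       # 0.333...
print(mle_estimate(300, 1000))  # 0.3
```

With lots of data the two estimates nearly agree; the prior only matters when counts are small.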
Bayesian Decision Theory. We can use such techniques for choosing among models: which among several models best explains the data? Likelihood ratio:
P(model1 | data) / P(model2 | data) = [ P(data | model1) P(model1) ] / [ P(data | model2) P(model2) ]

Back to our example: French vs. English, p(French | "glacier, melange...") versus p(English | "glacier, melange..."). We have real data for Jane Austen and William Shakespeare, e.g. p(Shakespeare | "stars... thou...").

[figure: testing document categories against training data]

Statistical spam filtering.
Real email: "Speaking at awards ceremony...", "Coming home for dinner...", "Free for a research meeting at 6pm...", "Computational Linguistics office hours...", "Are you free to meet with Dan Jurafsky today at 3pm? He wants to talk about computational methods for noun coreference."
Spam email: "Nigerian minister awards...", "Earn money at home today...", "FREE CASH! Just hours per day..."

Document classification by machine learning. [figure: training documents labeled with classes such as Planning ("Temporal reasoning for planning has long been studied formally. We discuss the semantics of several planning..."), Programming Languages, Semantics, Multimedia, GUI, and Garbage; a test document ("Neural networks and other machine learning methods of classification...") is assigned its most probable class, Machine Learning.]

Work out the naive Bayes formulation interactively on the board.

Recipe for solving an NLP task statistically:
1. Data: notation, representation.
2. Problem: write down the problem in notation.
3. Model: make some assumptions; define a parametric model.
4. Inference: how to search through the possible answers to find the best one.
5. Learning: how to estimate parameters.
6. Implementation: engineering considerations for an efficient implementation.

Engineering components of a naive Bayes document classifier:
- Split documents into training and testing sets.
- Cycle through all documents in each class.
- Tokenize the character stream into words.
- Count the occurrences of each word in each class.
- Estimate P(w|c) by a ratio of counts, plus 1 (a prior).
- For each test document, calculate P(c|d) for each class.
- Record the predicted and true classes, and keep accuracy statistics.

A probabilistic approach to classification: naive Bayes. Pick the most probable class given the evidence, c* = argmax_c P(c | d), where c is a class (like Planning) and d is a document (like "language intelligence proof..."). By Bayes' rule:
P(c_j | d) = P(c_j) P(d | c_j) / P(d) = P(c_j) Prod_{i=1..|d|} P(w_di | c_j) / Sigma_k P(c_k) Prod_i P(w_di | c_k)
where w_di is the i-th word in d (like "proof").

Parameter estimation in naive Bayes:
Estimate of P(c): P(c_j) = (1 + Count(d in c_j)) / (|C| + Sigma_k Count(d in c_k))
Estimate of P(w|c): P(w_i | c_j) = (1 + Sigma_{d_k in c_j} Count(w_i, d_k)) / (|V| + Sigma_{t=1..|V|} Sigma_{d_k in c_j} Count(w_t, d_k))

Information theory. What is information? Compare "The sun will come up tomorrow" with "Condi Rice was shot and killed this morning": the less expected a message is, the more information it carries.

Efficient encoding. I have an 8-sided die; how many bits do I need to tell you what face I just rolled? Now suppose my 8-sided die is unfair: P(1) = 1/2, P(2) = 1/8, P(3) = ... = P(8) = 1/16.

Entropy of a random variable: the average length of the message needed to transmit the outcome of the random variable. First used for data compression and for transmission rates over noisy channels.

Coding interpretation of entropy: given some distribution over events P(X), what is the average number of bits needed to encode a message (an event, string, or sequence)?
H(X) = - Sigma_x p(x) log2 p(x)
Notation: H(X) = H(p) = H_X(p) = H(p_X).

What is the entropy of a fair coin? Of a fair 32-sided die? Of an unfair coin that always comes up heads? Of an unfair 6-sided die that always comes up 1?

Upper and lower bounds: 0 <= H(X) <= log2 |X|. (Prove the lower bound.)

Entropy and expectation: recall E[g(X)] = Sigma_x g(x) p(x). Then H(X) = E[-log2 p(X)].

[plot: entropy of a coin, H vs. P(heads), rising from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and back to 0 at p = 1]

Entropy, intuitively: high entropy means chaos, fuzziness, the opposite of order. The term comes from physics, where entropy does not go down unless energy is used. Entropy is a measure of uncertainty.
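The entropy definition is easy to check numerically; the unfair die below is the one from the encoding example above.

```python
from math import log2

def entropy(dist):
    # H(X) = -sum_x p(x) * log2 p(x), skipping zero-probability outcomes
    return -sum(p * log2(p) for p in dist if p > 0)

fair_coin = [0.5, 0.5]
fair_die8 = [1 / 8] * 8
unfair_die8 = [1 / 2, 1 / 8] + [1 / 16] * 6  # the skewed 8-sided die

print(entropy(fair_coin))    # 1.0 bit
print(entropy(fair_die8))    # 3.0 bits
print(entropy(unfair_die8))  # 2.375 bits: shorter codes for faces 1 and 2 pay off
```

The skewed die needs fewer bits on average than the fair one, which is exactly the compression opportunity the slides describe.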
High entropy: a lot of uncertainty about the outcome (a uniform distribution over outcomes). Low entropy: high certainty about the outcome.

Claude Shannon (1916-2001), creator of information theory. His 1939 Master's thesis laid the foundation for implementing logic in digital circuits; "A Mathematical Theory of Communication" appeared in 1948.

Joint entropy and conditional entropy. Two random variables: X (space Omega), Y (space Psi). Joint entropy is no big deal: (X,Y) is considered a single event:
H(X,Y) = - Sigma_x Sigma_y p(x,y) log2 p(x,y)
Conditional entropy:
H(X|Y) = - Sigma_x Sigma_y p(x,y) log2 p(x|y)
Recall that H(X) = E[-log2 p(X)]; here the weighted average uses joint weights, which are not conditional. H(X|Y) is how much extra information you need to supply to transmit X, given that the other person already knows Y.

Conditional entropy, another way:
H(Y|X) = Sigma_x p(x) H(Y | X = x)
       = Sigma_x p(x) [ - Sigma_y p(y|x) log2 p(y|x) ]
       = - Sigma_x Sigma_y p(x,y) log2 p(y|x)

Chain rule for entropy. Since, like random variables, entropy is based on an expectation:
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Cross entropy. What happens when you use a code that is suboptimal for your event distribution? I created my code to be efficient for a fair 8-sided die, but the die is unfair and always gives 1 or 2, uniformly. How many bits on average for the optimal code? How many for the suboptimal code?
H(p, q) = - Sigma_x p(x) log2 q(x)

KL divergence: the average number of bits that are wasted by encoding events from distribution p using distribution q:
D(p || q) = H(p, q) - H(p) = - Sigma_x p(x) log2 q(x) + Sigma_x p(x) log2 p(x) = Sigma_x p(x) log2 [ p(x) / q(x) ]
A sort of "distance" between distributions p and q, but it is not symmetric and it does not satisfy the triangle inequality.

Mutual information. Recall that H(X) is the average number of bits needed to tell you which event occurred from distribution P(X). Now suppose I first tell you the event y in Y; H(X|Y) is the average number of bits then needed for X. By how many bits does knowledge of Y lower the entropy of X?
I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)
       = - Sigma_x p(x) log2 p(x) - Sigma_y p(y) log2 p(y) + Sigma_{x,y} p(x,y) log2 p(x,y)
       = Sigma_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
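A small sketch checking the KL-divergence and mutual-information formulas above; the first example is the slides' die that is really uniform over {1, 2} but encoded with a fair 8-sided-die code.

```python
from math import log2

def kl(p, q):
    # D(p || q) = sum_x p(x) * log2(p(x) / q(x)): bits wasted coding p with q's code
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)

def mutual_info(joint):
    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ), joint as a 2-D list
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

p = [0.5, 0.5, 0, 0, 0, 0, 0, 0]  # really uniform over faces 1 and 2
q = [1 / 8] * 8                    # fair 8-sided-die code
print(kl(p, q))  # 2.0 bits wasted per outcome (3-bit code vs. 1-bit entropy)

print(mutual_info([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: X and Y independent
print(mutual_info([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: Y determines X
```

The cross entropy here is H(p) + D(p||q) = 1 + 2 = 3 bits, matching the 3-bit fixed-length code.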
Mutual information is symmetric and non-negative. I(X;Y) = 0 when X and Y are independent, and it grows both with the degree of dependence and with the entropy of the variables. It is sometimes also called information gain, and it is used often in NLP: clustering words, word sense disambiguation, feature selection.

Pointwise mutual information. Previously we measured mutual information between two random variables; we can also measure mutual information between two events:
I(x, y) = log2 [ p(x,y) / (p(x) p(y)) ]

[slide: tasks and training objectives. Speech recognition, machine translation, OCR, and spell correction maximize the joint probability p(I,T); text categorization, parsing, and part-of-speech tagging maximize p(T|I); training is by maximum likelihood with smoothing (HMMs, PCFGs, n-grams, clusters), or max_theta p(theta | data) = max_theta p(theta) p(data | theta). Representations: sequences (n-grams, FSMs, FSTs, Viterbi, collocations), vectors (naive Bayes, clustering, word senses), trees (PCFGs, CKY, Earley, syntactic features, morphology).]

The probabilistic revolution: is it really a revolution? Critics say: "Log-probabilities are no more than scores in disguise. We're just adding stuff up, like the old corrupt regime did," admits a spokesperson. Note: just as in probability, bigger weights are better.
- n-grams: ... + log p(w7 | w5, w6) + log p(w8 | w6, w7) + ...
- PCFG: log p(NP VP | S) + log p(Papa | NP) + log p(VP PP | VP) + ...
- HMM tagging: ... + log p(t7 | t5, t6) + log p(w7 | t7) + ...
- Noisy channel: [log p(source)] + [log p(data | source)]
- Naive Bayes: log p(Class) + log p(feature1 | Class) + log p(feature2 | Class) + ...

We can regard any linguistic object as a collection of features (here a document is a collection of words, but it could have non-word features too); the weight of the object is the total weight of its features. Our weights have always been conditional log-probabilities (<= 0), but that is going to change in a few minutes. We can estimate the parameters automatically, the results are more meaningful, and results can be meaningfully combined: modularity.

Example spam features, with weights such as .5 and .02: contains "Buy..."; contains a dollar amount under $100; contains an imperative sentence; reading level = 8th grade; mentions money (use word classes and/or regexps to detect this).
- "Contains a dollar amount under $100": 50% of spam has this, 25x more likely than in ham.
- "Mentions money": 90% of spam has this, 9x more likely than in ham.
- For emails with both features, naive Bayes claims that .5 x .9 = 45% of spam has both, making them 25 x 9 = 225x more likely than in ham. In fact such emails are only 25x more likely: the features overlap, since a dollar amount under $100 already mentions money.
- We can adjust the scores (log-probability adjustments) to compensate for feature overlap, treating "mentions money" as already included.
Casting this as a log-linear model, the total feature weight gives log p(feats | spam); e.g., a total weight of 5.77 gives exp(5.77) = 320.5 before normalization. In general:
P(m | c) = (1/Z(lambda)) exp Sigma_i lambda_i f_i(m, c)
where m is the email message, lambda_i is the weight of feature i, and f_i(m, c) is 0 or 1 according to whether m has feature i (more generally, f_i may be the count or strength of a feature). Z(lambda) is a normalizing factor making Sigma_m P(m | spam) = 1; the sum is over all possible messages m, which is hard to compute.

Such features can reconstruct the old naive Bayes model exactly:
- feature "spam": weight log p(spam), a constant;
- feature "spam AND contains Buy": the old spam model's weight for "contains Buy";
- feature "ham": weight log p(ham), a constant;
- feature "ham AND contains Buy": the old ham model's weight for "contains Buy".

Which distribution should we pick among all those satisfying the feature-expectation constraints? The one with maximum entropy, - Sigma p log p (e.g., -(.51 log .51 + .0025 log .0025 + .29 log .29 + ...)), which is largest when the probabilities are evenly distributed.

Amazing theorem: the maximum-entropy distribution subject to feature-expectation constraints has exactly the exponential form P(m | c) = (1/Z(lambda)) exp Sigma_i lambda_i f_i(m, c). To solve the maxent problem we use Lagrange multipliers:
L = - Sigma_x p(x) log p(x) + Sigma_i lambda_i ( Sigma_x p(x) f_i(x) - E[f_i] ) + mu ( Sigma_x p(x) - 1 )
dL/dp(x) = -1 - log p(x) + Sigma_i lambda_i f_i(x) + mu = 0
=> p(x) = exp(mu - 1) exp( Sigma_i lambda_i f_i(x) ) = (1/Z) exp Sigma_i lambda_i f_i(x)
So feature constraints plus maximum entropy imply the exponential family.
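A minimal sketch of the conditional exponential (maxent) form above; the feature names and weights are invented for illustration, not taken from the slides.

```python
from math import exp

# toy weights for (feature, class) pairs; the values are made up
weights = {
    ("mentions_money", "spam"): 1.2,
    ("mentions_money", "ham"): -0.5,
    ("office_hours", "ham"): 2.0,
}

def p_class(doc_features, c, classes=("spam", "ham")):
    # P(c|d) = exp(sum_i w_i f_i(d,c)) / Z(d), where Z(d) sums over classes
    def score(cls):
        return exp(sum(weights.get((f, cls), 0.0) for f in doc_features))
    return score(c) / sum(score(cls) for cls in classes)

probs = {c: p_class(["mentions_money"], c) for c in ("spam", "ham")}
print(probs)  # "spam" gets the higher probability for this document
```

Note that normalization is over the two classes given the document, not over all possible messages, which is what makes the conditional model tractable.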
The maxent problem is convex, so the solution is unique. Geometrically, define two submanifolds on the probability simplex {p(x)}: E, the set of distributions whose feature expectations match a particular set of empirical expectations, and M, the set of exponential-family distributions over those features. They intersect at a single distribution p*, which is both the maxent solution and the maximum likelihood solution.

Exponential Model Likelihood. Maximum likelihood conditional models: given a model form, choose values of the parameters to maximize the conditional likelihood of the data:
log P(C | D, lambda) = Sigma_{(c,d) in (C,D)} log P(c | d, lambda)

Building a Maxent Model. Define features: indicator functions over data points. Features represent sets of data points that are distinctive enough to deserve model parameters; usually features are added incrementally to target errors. For any given feature weights we want to be able to calculate the data conditional likelihood and the derivative of the likelihood with respect to each feature weight (using expectations of each feature according to the model), and then find the optimum feature weights.

The Likelihood Value. The log conditional likelihood is a function of the iid data (C,D) and the parameters lambda:
log P(C | D, lambda) = log Prod_{(c,d)} P(c | d, lambda) = Sigma_{(c,d)} log P(c | d, lambda)
If there aren't many values of c, each term is easy to calculate:
P(c | d, lambda) = exp( Sigma_i lambda_i f_i(c,d) ) / Sigma_{c'} exp( Sigma_i lambda_i f_i(c',d) )
We can separate the log likelihood into two components:
log P(C | D, lambda) = Sigma_{(c,d)} Sigma_i lambda_i f_i(c,d) - Sigma_{(c,d)} log Sigma_{c'} exp( Sigma_i lambda_i f_i(c',d) ) = N(lambda) - M(lambda)
The derivative is the difference between the derivatives of the two components.

The Derivative, I: numerator.
dN/d lambda_i = d/d lambda_i Sigma_{(c,d)} Sigma_i lambda_i f_i(c,d) = Sigma_{(c,d)} f_i(c,d)
The derivative of the numerator is the empirical count of f_i.

The Derivative, II: denominator.
dM/d lambda_i = Sigma_{(c,d)} Sigma_{c'} [ exp( Sigma_i lambda_i f_i(c',d) ) / Sigma_{c''} exp( Sigma_i lambda_i f_i(c'',d) ) ] f_i(c',d) = Sigma_{(c,d)} Sigma_{c'} P(c' | d, lambda) f_i(c',d)
This is the predicted count of f_i under the model.

The Derivative, III.
d log P(C | D, lambda) / d lambda_i = actual count(f_i, C) - predicted count(f_i, lambda)
The optimum parameters are the ones for which each feature's predicted expectation equals its empirical expectation.
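The derivative result (actual count minus predicted count) can be checked on a three-document toy corpus; the feature names are invented, and at lambda = 0 the model predicts each class with probability 1/2.

```python
from math import exp, log

# tiny training set of (features, label); everything here is illustrative
data = [(("money",), "spam"), (("money",), "spam"), (("hello",), "ham")]
classes = ("spam", "ham")
w = {}  # all weights implicitly 0.0 at the start of training

def p(c, x):
    # conditional exponential model over the two classes
    score = lambda cls: exp(sum(w.get((f, cls), 0.0) for f in x))
    return score(c) / sum(score(cls) for cls in classes)

def log_likelihood():
    return sum(log(p(c, x)) for x, c in data)

def gradient(feat):
    # actual count of the (word, class) feature minus the model's predicted count
    actual = sum(1 for x, c in data if feat[0] in x and c == feat[1])
    predicted = sum(p(feat[1], x) for x, c in data if feat[0] in x)
    return actual - predicted

# at w = 0, predicted("money","spam") = 0.5 + 0.5 = 1.0, so the gradient is 2 - 1 = 1
print(gradient(("money", "spam")))  # 1.0: raise this weight
```

A positive gradient says the feature fires more often in the data than the model expects, so its weight should increase.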
the ones for which each feature39s predicted expectation equals its empirical expectation The optimum distribution is Always unique but parameters may not be unique Always exists if features counts are from actual data Features can have high model expectations predicted counts either because they have large weights or because they occur with other features which have large weights Summary We have a function to optimize D it i39 a 41 i logPC i DJ log Adam Zepollfxad We know the function s derivatives BlogPC DMD01 9116M 39 f i j predicted count 1 A Perfect situation for general optimization Part II Comparison to NaiveBayes NaiveBayes is another tool for classification We have a bunch of random variables a data features which we would like to use to predict another variable the class a u The NaiveBayes likelihood over classes is explog 35 210g PW l0 Pltc1P l c PCVU Pw 39639 Zexplogl c ZlogP lc explzicnltdcl NaiveBayes is just an exponential model Zexpzi fh d cv PCldJ n lt i N L a Comparison to NaiveBayes The primary differences between Naive Bayes and maxent models are Naive Bayes Trained to maximize joint likelihood of data and classes Features assumed to supply independent evidence Feature weights can be set independently Features must be of the conjunctive 1029 c c Maxent Trained to maximize the conditional likelihood of classes Features weights take feature dependence into account Feature weights must be mutually estimated Features need not be of the conjunctive form but usually are 39 max pgtdata max p0 data pgtpdatagt PO Smoothing Priors MAP What if we had a prior expectation that parameter values wouldn t be very large We could then balance evidence suggesting large parameters or infinite against our prior The evidence would never totally defeat the prior and parameters would be smoothed and kept finite We can do this explicitly by changing the optimization objective to maximum posterior likelihood log PCi D logPU logPC 1 DJ Posterior Prior Evidence Smoothing 
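This derivative — actual count minus predicted count — is simple to compute directly. A minimal numeric sketch (the two-class data set, feature function, and weights below are made-up toys, not the lecture's example):

```python
import math

# Tiny conditional maxent model: classes c, features f_i(c, d) given as a
# function returning the list of active feature indices for a (class, datum)
# pair. The gradient of the conditional log-likelihood wrt each weight is
# actual_count(f_i) - predicted_count(f_i), as derived above.

def p_cond(lambdas, feats, classes, d):
    """P(c|d) for an exponential model."""
    scores = {c: math.exp(sum(lambdas[i] for i in feats(c, d))) for c in classes}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def gradient(lambdas, data, feats, classes, n_feats):
    grad = [0.0] * n_feats
    for c_true, d in data:
        for i in feats(c_true, d):          # actual (empirical) counts
            grad[i] += 1.0
        probs = p_cond(lambdas, feats, classes, d)
        for c in classes:                   # predicted (model-expected) counts
            for i in feats(c, d):
                grad[i] -= probs[c]
    return grad
```

Feeding this gradient to any general optimizer (gradient ascent, conjugate gradient, L-BFGS) finds the unique optimum.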
Priors 262 no Gaussian or quadratic priors Intuition parameters shouldn t be large 2 252 Formalization prior expectation that each 4 10 parameter will be distributed according to 352 a gaussian with mean p and variance oz 6 1 1 2 o 2 They don t even capitalize my name anymore 1 M PZgalmexp 20 Penalizes parameters for drifting to far from their mean prior value usually p0 2521 works surprisingly well th Baily1 Z 2119 J39lLU JJ U I39J39ED U What is Information Extraction Information Extraction Coreference and Relation Extraction AS a family information Extraction Lecture 22 of techniques segmentation classification association clustering Introduction to Natural Language Processing October 14 2002 400 am PT a quotThat39s a super important shift for us in terms of code access fou quotdef Frm Qn 39mmrp Fnllndation CMPSCI 585 Spring 2004 For years WW QEQ Bill I Un versiiy olMassachusetfs Amherst 931g railed against the economic philosophy Mlcrosoft Corporatlon 3 a of open source software with Orwellian fervor g g g denouncing its communal licensing as a a a a quotcancerquot that stifled technological innovation Gates 3 5 2 t Today W claims to quotlovequot the open source concept by which software code is MlcrOSO 3 made public to encourage improvement and Gates 2 development by outside programmers 9315 g 3 himself says M will gladly disclose its L crown jewels the coveted code behind the Veg hte Windows operating system to select MICfOSO E customers 1 iii Andrew McCalum quotWe can be open source We love the concept a of shared sourcequot said 39 In Richard Stanman g 39 iii 5 rI It Bill Gates a E 3 gt H H rI in BERN m rgundgr of the Egg Winn countered saying IE in Context Main Points Createontoiogy Coreference How to cast as classification Cardie Measures of string similarity Cohen Scaling up McCallum et aI Spider Filter by relevance 39 IE Relation extraction V th augmented grammar Miller et al 2000 V th joint inference Roth amp Yih Load DB Train extraction models Query 
Search. Label training data. Data mine. Semi-supervised (Brin).

Coreference Resolution: Inside the Traditional Solution
AKA "record linkage", "database record deduplication", "citation matching", "object correspondence", "identity uncertainty".
News article with named-entity "mentions" tagged (number of entities N = 3): "Today Secretary of State Colin Powell met with . . ."
Pairwise affinity metric, e.g. between Mention 3 = "Mr Powell" and Mention 4 = "Powell": a weighted sum of indicator features such as — one word in common; "normalized" mentions are string-identical; capitalized word in common; > 50% character overlap; in same sentence; within two sentences; "Hobbs distance" < 3. OVERALL SCORE: 0.98 > threshold (0.5). Other mention pairs in the example include ("Condoleezza Rice", "Rice") and ("President Bush", "Bush").

Noun Phrase Coreference
Identify all noun phrases that refer to the same entity:
"Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment."
(Slide repeated with successive coreference chains highlighted.)

IE Example: Input Text
SAN SALVADOR 15 JAN 90 ACANEFE TEX39H ARMANDO CALDERON SOL PRESIDENT OF THE NATIONALIST REPUBLICAN ALLIANCE ARENA T RULING SALVADORAN PARTY TODAY CALLED FOR AN INVESTIGATION INTOANY POSSIBLE CONNECTION BETWEEN THE MILITARY PERSONNEL IMPLICATED IN THE ASSASS NATION OF JESUIT PRIESTS quotIT IS SOMETHING SO HORRENDOUS SO MONSTROUS THAT WE MUST INVESTIGATE THE POSSIB LITY THAT THE FMLN FARABUNDO MARTI NATIONAL LIBERATION FRONT STAGED THESE MURDERS TO DISCREDIT THE GOVERNMENTquot CALDERON SOL SA D SALVADORAN PRESIDENTALFREDO CRISTIANI IMPLICATED FOUR OFFICERS INCLUDING ONE COLONEL AND FIVE MEMBERS OF THE ARMED FORCES IN THEASSASSINATION OF SIX JESUIT PRIESTS AND TWO WOMEN ON 16 NOVEMBER ATTHE CENTRAL AMERICAN UNIVERSITY H IE Example Output Template 1 DATE 16 NOV 90 2 LOCATION EL SALVADOR CENTRAL AMERICAN UNIVERSITY 3 TYPE 4 STAGE OF EXECUTION ACCOMPLISHED 5 INCIDENT CATEGORY TERRORISTACT 6 PERP INDIV DUAL D quotFOUR OFFICERSquot quotONE quotFIVE MEMBERS OF THEARMED FORCESquot 7 PERP ORGANIZATION ID quotARM FORCESquot quotFM quot 8 PERP CONF DENCE EPORTED AS FACT ACCUSED BY GOVT 9 HUM TGT DESCRIPTION 10 HUM TGT TYPE 11 HUMTGT NUMBER 2 quotWOMENquot 12 EFFECT OF INCIDENT DEATH quotJESUITSquot DEATH quotWOMENquot IE Example Coreference SAN SALVADOR 15 JAN 90 ACANEFE TEXT ARMANDO CALDERON SOL PRESIDENT OFTHE NATIONALIST REPUBLICAN ALLIANCE ARENA THE RULING SALVADORAN PARTY TODAY CALLED FOR AN INVESTIGATION INTOANY POSSIBLE CONNECTION BETWEEN THE MILITARY PERSONNEL IMPLICATED IN THE ASSASS NATION OF JESIJIT PRIESTS quotIT IS SOMETHING SO HORRENDOUS SO MONSTROUS THAT WE MUST INVESTIGATE THE POSSIB LITY THAT THE FMLN FARABUNDO MARTI NATIONAL LIBERATION FRONT STAGED THESE MURDERS TO DISCREDIT THE GOVERNMENTquot CALDERON SOLSA D SALVADORAN PRESIDENTALFREDO CRISTIANI IMPLICATED FOUR OFFICERS INCLUDING ONE COLONEL AND FIVE MEMBERS OF THE ARMED FORCES IN THEASSASSINATION OF SIX JESUIT PRIESTS AND TWO WOMEN ON 16 NOVEMBER ATTHE CENTRAL AMERICAN UNIVERSITY Why It s Hard Many sources 
of information play a role head noun matches IBM executives the executives syntactic constraints John neiped himseifto 1 4 inn neiped mm to number and gender agreement discourse focus recency syntactic parallelism semantic class world knowledge Why It s Hard No single source is a completely reliable indicator number agreement he a sassination these murders Identifying each of these features automatically accurately and in context is hard Coreference resolution subsumes the problem of pronoun resolution A Machine Learning Approach Classi cation given a description of two noun phrases NP and NP classify the pair as coreferent or not coi eferent coref 2 coref 2 I Queen Elizabeth set about transforming her husband I t not coref Acne Bennett IQ i quot urinwii et ai I994 MvCannv amp Leiinei39t 1 49 5an et ai 200 I i Ig a nrdie 2702 A Machine Learning Approach Clustering coordinates pairwise coreference decisions Queen Elizabeth Queen Elizabeth I b ttr f se a on ans Orrmng Clustering 1 Algorithm n husband not coref a rPnuwned speech lhempm Machine Learning Issues Training data creation Instance representation Learning algorithm Clustering algorithm Supervised Inductive Learning Examples ofN39P pairs features class ML Algorithm novel Pair of NPS Concept description features progmm label Training Data Creation Creating training instances texts annotated with coreference information one instance IVLYMNPI NPj for each pair ofNPs assumption NP precedes NP 39 feature vector describes the W0 NPS arid context 39 class value mref pairs on the sarne coreference man no mref otherwise Instance Representation 25 features perinstance e lexical3 string mateningrurprunuunspmpernameseummun riuuris e grammaticali8 noun demonstrative tne tnis inde nite it is raining number gender animacy appusmve geurge tne Ringlpreuieate numinative a nurse isamammal bindinguristrairits simpleuritraririuairigconstraints spannaxirnainp e sernantic2 ame WuruNEtclass alias e positional 1 istanee oetweerithe 
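The pairwise-classification view can be sketched end to end. The mention tuples and the hand-set `score` function below are hypothetical stand-ins for the learned 25-feature RIPPER/C4.5 classifier, and only three toy features are shown:

```python
# Toy sketch of coreference as pairwise classification plus best-first
# clustering. Mentions are hypothetical (string, sentence_index) pairs.

def features(m1, m2):
    return {
        "string_match": m1[0].lower() == m2[0].lower(),
        "head_match": m1[0].split()[-1].lower() == m2[0].split()[-1].lower(),
        "sent_dist": m2[1] - m1[1],
    }

def score(m1, m2):
    """Stand-in for the learned classifier's confidence that m1, m2 corefer."""
    f = features(m1, m2)
    if f["string_match"]:
        return 0.9
    if f["head_match"] and f["sent_dist"] <= 2:
        return 0.7
    return 0.1

def best_first_cluster(mentions, threshold=0.5):
    """Each NP_j takes its highest-confidence preceding antecedent (if > 0.5)."""
    cluster = list(range(len(mentions)))     # cluster id per mention
    for j in range(1, len(mentions)):
        best, i = max((score(mentions[k], mentions[j]), k) for k in range(j))
        if best > threshold:
            cluster[j] = cluster[i]          # merge C_i and C_j
    return cluster
```

For example, `best_first_cluster([("the King", 1), ("Logue", 1), ("the King", 2)])` links the first and third mentions and leaves "Logue" in its own cluster.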
NPs interms urxur sentences 7 knowledgebase 1 naive pronoun resolution algorithm Learning Algorithm 0 RIPPER Cohen 1995 045 Quinlan 1994 rule learners input set oftraining instances output coreference classi er Learned classifier input test instance represents pair ofNPs outputcassi caion con dence of classi cation 22 Clustering Algorithm 0 Bestfirst singlelink clustering 7 Mark each NP as belonging to its own class NP E 7 Proceed through the NPS in lefttoright order For each NP NP create test instances imrNP NPJQ for all ofits preceding NPs NP Select as the antecedent for NPj the highestcon dence coreferent NP NPf according to the coreference classi er or none if all have below 5 con dence Merge Cj and Cj Evaluation MUG6 and MUO7 coreference data sets documents annotated wrt coreference 30 30 training texts dry run 30 20 test texts formal evaluation scoring program recall precision Fmeasure 2PRPR System output Problem 1 Baseline Results Coreference is a rare relation skewed class distributions 2 positive instances remove some negative instances Best iiiiuc System farthest antecedent 25 26 Problem 2 Problem 3 Coreference is a discourselevel problem Coreference is an equivalence relation different solutions for different types ofNPs loss oftransitivity Props names string matching and aliasing need to tighten the connection between classi ca ion and clustering r inclusion of hardquot positive training instances positive example selection selects easy positive training P quot e aimed quot155 Wlf the CIUSYe ing39leVe Go e e e we instancescf Harabagiu eta 2000 quot9 quot ction QueenElizabeth set about tmnsforming her i ll i nn coref coref King George VI into a viable monarch Logue Queen Elizabeth set about tmnsforming her husband ii the renowned speech therapist Was summoned to help not coref meleg overcome Comparison with Best MUC Systems Results MUC 6 MUC 7 P F R P F NEGSELECT lPOSSELECT NEG ELECT i POS ELECT 678 552 53 808 64 I w 780 538 NEGSELECTPOSSELEcTtRULESELECT 
km 769 595 542 763 634 59 72 65 am 688 618 634 763 693 595 561 E72 ElstNUCSYSTlM Ultimately large increase in Fmeasure due to gains in recall Supervised ML for NP Coreference Main Points gopogvpeergfggrtnance compared to other systems but lots ofroorn for c reference 7 Common nouns lt pronouns lt properriouris How to cast as classification Cardie r Tlgh eghc822tgneg erEinr ahsj gcatlon and clustering is possible Measures of String Cohen Edit lfi gl edi l if llgl iil Zl f gf m if ii gm Scaling up McCaIIum et all Wellner 2003 7 Need additional data sets release of ACE data from Penn s LDC ReIation extraction General problem reliance on manually annotated data V th augmented grammar Miller et al 2000 V th joint inference Roth amp Yih Semisupervised Brin Record linkage definition Record linkage terminology Record linkage determine if pairs of data The term recordMaggi is possibly 00 records describe the same entity referent With le nd record pairs that are co referent For DB eople data matching mergepurge Entities usually people or organizations or dupllcate deteCtlonl data Cleansmgl 39 extraction transfer and loading de duping Data records names addresses Job titles birth d For AlML people reference matching database quot39 hardening object consolidation Mam appllcatlons In NLP co referenceanaphora resolution Joining two heterogeneous relations 39 withing ILIL39El39li 1min Lllg39 Removing duplicates from a single relation Finding a technical paper 0 1995 The data integration problem Start with citation iii11ml uui rislilulmn lalMlalu t wmpIIhr aan lolmrl mum IllllV i l hll y al39 f39nllrUHIlA quot Expenencevnm aLeaming Personal Assistan a lit gf TM MitchellR CaruanaD Freitag J McDermott andD Zabowski Communications ofch ACM Vol 37No 7pp 8191uly1994 39s ltlilrurtlmxlli mmlr sririirv dollanlmnr s U min alt Find authors institution w INSPEC Ixsnncl Dl ll n39L39umpui Sci Find web host W NETFlND Z39ulilhrniul39nh nu Dir gn Find author s home page and Lquot 139 quotquot39 L 
39 hopefully the paper by browsing liNsmcc DPpl orCunIpul 39 Slniillml Uni LA USA String distance metrics overview Termbased eg TFIDF as in WHIRL Distance depends on set of words contained in both 5 and t Editdistance metrics Distance is shortest sequence of edit commands that transform 5 to t Pair HMM based metrics Probabilistic extension of edit distance Other metrics Jaccard Distance Jaccard Score M Di sun 7 String distance metrics termbased Termbased eg TFIDF as in WHIRL Distance between 5 and tbased on set of words appearing in both 5 and t Order of words is not relevant Eg Cohen Williamquot VWIiam Cohenquot and James Joyce Joyce Jamesquot Words are usually weighted so common words count less Eg Brown counts less than Zubinsky Analogousto FelligiSunter s Method1 String distance metrics termbased Advantages Exploits frequency information Ef ciency Finding t simtsgtkis sublinear Alternative word orderings ignored William Cohen vs 0 en illiam Disadvantages Sensitive to spelling errors Mllliam Cohon Sensitive to abbreviations Univ vs University Alternative word orderings ignored James Joyce vs oyce James City National Bank vs National City Bank String distance metrics Levenshtein Editdistance metrics Distance is shortest sequence of edit commands that transform 5 to t Simplest set of operations Copy character from s over to t Delete a character in 5 cost 1 Insert a character in tcost 1 Substitute one character for another cost 1 This is Levenshtein distance Levenshtein distance example distance V iam Cohen Willliam Cohon 5 WI LLgapl AMCOHEN lll alignment lll tWILLLIAMCOHON OPCCCCICCCCCCCSC 50500001111111122 Computing Levenshtein distance 1 Dij score of best alignment from svi to t1y Diljl ifsitj copy Di1j11 if silaj vubvtitute mm Di1j1 imer Dij11 delefe Computing Levenshtein distance 2 Dij score of best alignment from svi to t1y Di1j1 dsitj vubvfcopy min Di1j1 imert Dij11 delete simplify by letting dcd0 if cd 1 else also let Di0i for i inserts and D0jj Computing 
Levenshtein distance (3):

  D(i,j) = min of:
    D(i−1,j−1) + d(s_i,t_j)   (substitute/copy)
    D(i−1,j) + 1              (insert)
    D(i,j−1) + 1              (delete)

Computing Levenshtein distance (4): a trace indicates where the min value came from, and can be used to find edit operations and/or a best alignment (there may be more than one).

Needleman-Wunsch distance:

  D(i,j) = min of:
    D(i−1,j−1) + d(s_i,t_j)   (substitute/copy)
    D(i−1,j) + G              (insert)
    D(i,j−1) + G              (delete)

G = "gap cost"; d(c,d) is an arbitrary distance function on characters (e.g. related to typo frequencies, substitutability, etc.). Example: "William Cohen" vs. "Wukkuin Cigeb".

Smith-Waterman distance: instead of looking at each sequence in its entirety, this compares segments of all possible lengths and chooses whichever maximize the similarity measure. Thus it is a generalization of longest common subsequence. For every cell the algorithm calculates all possible paths leading to it; these paths can be of any length and can contain insertions and deletions.

  D(i,j) = max of:
    0                         (start over)
    D(i−1,j−1) − d(s_i,t_j)   (substitute/copy)
    D(i−1,j) − G              (insert)
    D(i,j−1) − G              (delete)

with G = 1, d(c,c) = −2, d(c,d) = +1.

Smith-Waterman distance in Monge & Elkan's WEBFIND (1996). Table 1, example NETFIND and INSPEC fields:
- NETFIND: "computer sci. department, university of california san diego" — INSPEC: "Dept. of Comput. Sci., California Univ., San Diego, La Jolla, CA, USA"
- NETFIND: "computer science department, stanford university, palo alto california" — INSPEC: "Dept. of Comput. Sci., Stanford Univ., CA, USA"

Used a standard version of Smith-Waterman with hand-tuned weights for inserts and character substitutions. Split large text fields by separators like commas, etc., and found minimal cost over all possible pairings of the subfields (since SW assigns a large cost to large transpositions). Result: competitive with plausible competitors.

Results, SW, from Monge & Elkan: (results figure)

Affine gap distances: Smith-Waterman fails on some pairs that seem quite similar:
  William W. Cohen
  William W. "Don't call me Dubya" Cohen
Intuitively,
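Both dynamic programs differ only in the cell update (global min-cost vs. local max-score floored at 0). A sketch, using unit edit costs for Levenshtein and illustrative Smith-Waterman scores (match +2, mismatch −1, gap −1, i.e. the slide's G = 1, d(c,c) = −2, d(c,d) = +1 folded into max-score form):

```python
def levenshtein(s, t):
    """Unit-cost edit distance via the D(i,j) recurrence above."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                            # i deletions
    for j in range(n + 1):
        D[0][j] = j                            # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + d,  # substitute/copy
                          D[i - 1][j] + 1,      # insert
                          D[i][j - 1] + 1)      # delete
    return D[m][n]

def smith_waterman(s, t, G=1, match=2, mismatch=-1):
    """Best local-alignment score: max over all cells, never below 0."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(0,                       # start over
                          D[i - 1][j - 1] + sub,   # substitute/copy
                          D[i - 1][j] - G,         # insert
                          D[i][j - 1] - G)         # delete
            best = max(best, D[i][j])
    return best
```

For instance, `levenshtein("William Cohen", "Willliam Cohon")` is 2 (one insertion, one substitution), while `smith_waterman` rewards the shared local segment regardless of what surrounds it.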
single long insertions are cheaper an a lot of short insertions Affine gap distances 2 Idea Current cost of a gap of 11 characters nG Make this cost A n1B where A is cost of opening a gap and B is cost of continuing a gap Affine gap distances 3 Di1j1 dsitj Dij max ISIljl dsitj ITI1j1 dsitj 13ij max Di1j A Best score in which vi 7 Isi391vj39 B is aligned with a ggv Best score in which tj 7 D l A Iil 7 max IrlgaJJJB B is aligned with a gap Affine gap distances as automata JawsB A mm A dam lB Generative version of affine gap automata BilenkoampMooney TechReport 02 HMM emits pairs ad in state M pairs 5 in state D and pairs d in state I For each state there is a multinomial distribution The HMM can tmined with from a sample of pairs ofmatched strings gt Estep is forwardbackward Mstep uses some ad hoe smoothing W Affine gap editdistance learning experiments results Bilenko amp Mooney Jenni hrmc Doll t om mruvuw l y l minimums wealth l mu lhr M mm minim Experimental method parse records into elds append a few key elds together sort by similarity pick a threshold T and call all pairs with distancev t lt T duplicates picking Tto maximize Fmeasure Affine gap editdistance learning experiments results Bilenko amp Mooney Affine gap editdistance learning experiments results Bilenko amp Mooney Mammal Precisionrecall for MAILING dataset duplicate detection 6 n Affine gap distances experiments from McCallumNigamUngar KDDZOOO Goal is to match data like this Fanlman Scott K Lehiere Chri correlation learning arcliitec Tourer A D editor A man s in Neural Information Processing Systems volume 2 pp 524532 San Mateo CA Morgan Kanrmann 39an 1989 The cascade Falilnnan SE and Leliiere The Cascade Correlation Learning Architecture NLPS 2 pp 524 532 Morgan 39 1990 Kaiiimann alilmann s E and Leliiere C 1989 The cascade correlarion learning architecture n Ices in Neural In formation Processing Systems Mpg 2 Denver Colorado Figure 2 Three sample citations to the same paper Affine gap distances 
experiments from McCallumNigamUngar KDDZOOO Handtuned edit distance Lower costs for affine gaps Even lower cost for affine gaps near a HMMbased normalization to group title author booktitle etc into fields Affine gap distances experiments String distance metrics outline Termbased eg TFIDF as in VVlllRL Distance depends on set of words contained in both 5 and t Editdistance metrics Distance is shortest sequence of edit commands that transform 5 to t Pair HMM based metrics Probabilistic extension of edit distance 39 Other metrics Jaro metric Jaro metric is apparently tuned for personal names 39v st de ne cto be common in st if it sic tjc and i jrltmrnnst2 De ne cdto be a transposition if cdare common and Cd appear in different orders in s and t Jarost average of commons commonI and 05transposirionscommon Variant weight errors early in string more heavily Easy to compute note edit distance is Ost NB Tnis l5 my interpretation ofWinkler s description Jaro metric L L A Li a i c n 1 a n i n i m i ii ii i 1 u n ii i i m n ii ii n a I n i i i n n i c n t lilirsrmvrur n vir lmn rrwlrir Harri er Tl ir mam lunar u wl ll it 4i ril lHl iiiltll ilk Hill in A1lllllquotr in be ill rtluiiriui iiil l i iiiii gmiini i llll r i ifquot Tw39 m winnioti ii i and a39 Iquot no ufrh e no nitranspnsitions rnr 5 and r Soundex metric Soundex is a coarse phonetic indexing scheme widely used in genea og Every Soundex code consists ofa letter and three num e eg 36 for Bender The letter is always the rst letter ofthe surname The numbers hash together the rest ofthe na Vowels are generally ignored eg Lee Lu gt LOOO Later later conson 39 ame are ignored Similarsounding letters eg B P F differen iated nor are doubled letters V are not There are lots of Soundex variants Ngram metric Idea split every string 5 into a set of all character n grams that appear in s for nltk Then use tenn based approaches eg COHEN gt 00HENC00HHEENCOHOHEHEN For n4 or 5 this is competitive on retrieval tasks It doesn t seem to 
be competitive with small values ofn on matching tasks but it s useful as a fast approximate matching scheme Main Points Coreference How to cast as classification Cardie Measures of string similarity Cohen Scaling up McCallum et al Relation extraction V th augmented grammar Miller et al 2000 V th joint inference Roth Semisupervised Brin Reference Matching ahlrnan Scott amp Lebiere Christian 1989 The cascadencorrelation learning architecture in Touretzky D eortor Advances rn Neural information Processing Systems volume 2 pp 524532 San Mateo CA Morgan Kaufmann n S E and Leblere C Tne Cascade Correlation Learning Architecture NlPS Vol 2 pp 524632 Morgan Kaufrnann1990 ahlrnan s E teenrne recurrent cascadencorrelationlearning architecture in Lipprnan R P Moody J E and Touretzky D s editors NlPS 31907205 The Citation Clustering Data Over 1000000 citations About 100000 unique papers About 100000 unique vocabulary words Over 1 trillion distance calculations The Canopies Approach Two distance metrics cheap amp expensive First Pass very inexpensive distance metric create overlapping canopies Second Pass expensive accurate distance metric canopies determine which distances calculated Illustrating Canopies Overlapping Canopies Creating canopies with two thresholds Put all points in D Loop Pick a point X 39om D Put points within Kloose ofX in canopy Remove points wi hin Ktight ofX 39om D 9 Using canopies with Greedy Agglomerative Clustering Calculate expensive distances between points in the same canopy All other distances default to infinity Sort nite distances and iteratively merge closest Computational Savings inexpensive metric ltlt expensive metric canopies per data point f small but gt 1 number of canopies 0 large complexity reduction The Experimental Dataset All citations for authors Michael Kearns Robert Schapire Yoav Freund 1916 citations 121 unique papers Similar dataset used for parameter tuning Inexpensive Distance Metric for Text Wordlevel matching TFIDF 
Inexpensive using an inverted index Extracting Fields using HMMs Fahlman SE and Lebiere C The Cascade Correlation Learning Architecture NIPS Vol 2 pp 524532 Morgan Kaufmann Author Faniman s E and Leoiere c Title The Cascade Correlation LeamingArchitecture Venue NiPS Year 1990 Expensive Distance Metric for Text String edit distance Compute with Dynamic Programming Costs for character 5 insertion deletion c substitution 0 t t do F alilman vs Falman Experimental Results F1 Minutes Canopies Complete Existing Add precision recall along side F1 Main Points Coreference How to cast as classification Cardie Measures of string similarity Cohen Scaling up McCallum et al Relation extraction With augmented grammar Miller et al 2000 V th joint inference Roth amp Yih Semisupervised Brin Information Extraction Named Entity Recognition I P ts soared a1 Boeing Co easily topping forecasts on Wall Street as their CEO Alan Mulally announced rst quaner results UT Pro ts soared at iCornpany Boeing Co easily topping forecats on iLooarion Wall Sues i as their CEO iPerson Alan Mulallyj announced rst quarter results Relationships between Entities INPUT Boeing is located in Seattle Alan Mulally is the CEO OUTPUT Relaiorship Companylocaion Relationship EmployerEmployee Co 39 Employer Boein Co rnpany oeing Location Semiei Employee Alan Mulally Main Points Summary Goal build a parser that reooyers syntactic structure named entities descriptors and relations Corefe rence Annotation mark entity boundaries descriptor boundaries links between HOW to mg as 93935339f39 a 9 Card39e entities and descriptors Measures of string similarity Cohen 0 Enriched parse trees given annotation and a parse tree form a new scallng Up Mccallum et al enriched parsetree The statistical model nonaterminals now include syntadic caegory Relat39on eXtraCt39on semantic label headword headtg Rule probabilities are estimaed Lsing o vmtn augmented grammar Miller et al 2000 similar methods to syntactic parsers With jomt 
inference Roth amp Yih Results precision 81 recall 64 in recovering relaions o Semisupervised Brin W 98 1 Association With Graphical Models 1 Association With Graphical Models Capture arbitrarydistance R hampw 2397quot Also capture long distance R hampw 2397quot dependencies among M M dependencies among M M predictions Random mam Predictwns gt Random variable E overthe class of overthe class of l relation between relation between enutytzandtl enutytzandtl mm W 1 assert anon mate 39 39 39 melt overthe class of overthe class of quotHD arm 1 eg over arm 1 eg over person iocatron ll person iocatron mm P Local language new Local language models contribute 6 models contribute evidenceto relation evidenceto relation classification a classification Local language not L Immune models contnbute Dependencies between classes models contribute Dep d ncies between classes meme emW of entities and relations meme emW of entities and relations classification Z classification Pllt xi 1 Pllt Xi e o e W inference with loopy belief propagation 39 inference with loopy belief propagation m 1 Association With Graphical Models Main points Roth ampYih 2002 Also capture long distance s dependencies amon MW Predic i quots39 Random variable Coreference overtheclassof relationbeMeen How to cast as classification Cardie l murder arm 1 eg over person locatton in C entity 2 and 1 I I I I b o533 quot39 Measures of string slmllarlty Cohen n Local language 39 Scaling up McCallum et al iocagnge models contribute eliidence to relation classificatton Relation extraction V th augmented grammar Miller et al 2000 modeiscontnbute Depe ncies between classes 0 V th joint inference Roth amp Yih mgglcceagre rmtv mm of entltles and relations S d B my emisuperVIse rin ii 0 lm nw inference with loopy belief propagation 0 Answer Following procedure 1 39 alu middle 2 For any group Set urlrpre x to be longest common pre x of the p39s URLs pre x to be the longest common pre x of the group u ix to be th e longest 
common suf x 3 For each group39s patmm calculaue rts specr crty as Sprl p Tllmlrldt HharmMllmttlltrrff where n is the number of examples in the group rl is the length of r in chamcmrs 4 1r specr crty exceeds some threshold include the patmm s Else It all pattems occur on same webpage reject the patmm 5 Else create new subgroups grouped by characters in the urls which m g nre erre subgroups The Overall Algorithm 1 Use the seed examples to label some data 2 Induce patterns from the labeled examples using method described on the preVious slide 3 Apply the patterns to data to get a new set of authortitle pairs 4 Return to step 2 and iterate DIPRE Inducing Patterns from Data The patterns found in the rst iteration www sff netlocusc ltLlgtltE aneE byenrhmlt dnsc1ty t h dolphrn textssfawardhtm author H nrlel o The 5 seeds produced 199 labeled rns nces grvrng the 3 patterns above Applyrng the three pattems gave 4047 new book rns nces 39 quotquot 3972 me e books This gave 105 patmms 24 applied to more than one URL Applied to 2 million URLS produced 9359 unique authortitle parrs 0 Manual innervention removed 242 bogus imms where the author was Conclusion 0 Final iteration mn over 156000 documents which con ined the word books induced 346 patterns 15257 authortjt1e pairs 1 Association using Parse Tree Simultaneously POS tag parse extract amp associate where l 1 s A l l l mitt This is alsoagreat example Fquot r rquot T f TM f of extraction using a tree madel um W r rt aa r we am on clnrlaim Miller et a 2mm lnemse space M parse eunstnutes In Include emrty and relatrun tans eh hmd cans ent caedmy mnm u mnslnuem eatean x m parent nude postal wnm mu mennumnnunnr rarerrm permum Pwmlemlmlpuwhj rtmmeperlmperlmnrvbdrard n lll lllVl 7 gr maximum likelihood smoothing EM if unsupervised Incomplete data max p6data max p6i data p6pdata0 really so altemative ngrams FSMs L store na39I39ve Bayes clustering word senses quot PCFGs CKY Earley clustering collocations really so altemative 
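Applying a DIPRE-style (prefix, middle, suffix) pattern to a page is plain string matching. A minimal sketch — the pattern and page snippet below are invented stand-ins for the induced `<li><b>…` patterns mentioned above, and the field order (author before title) is illustrative:

```python
import re

# A DIPRE-style pattern: occurrences look like  prefix AUTHOR middle TITLE suffix.
# Applying patterns to pages yields new (author, title) pairs, which then seed
# the next iteration of pattern induction.

def apply_pattern(text, prefix, middle, suffix):
    pat = re.compile(re.escape(prefix) + r"(.+?)" + re.escape(middle)
                     + r"(.+?)" + re.escape(suffix))
    return [(m.group(1).strip(), m.group(2).strip())
            for m in pat.finditer(text)]
```

Each iteration applies every induced pattern to the crawled pages and collects the union of extracted pairs.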
Probabilistic BEVOIIIIiOII III Really a BEVOIIIIiIIII critics Sail Logprobabilities no more than scores in disguise We re just adding stuff up like the old corrupt regime did admits spokesperson Note Just in probability bigger weights are better ngrams log pw7 wSw6 ogw8 W6 W7 PCFG log plP VP 5 log pPapa NP log pVP PP VP can EStimate paramemrs aUtomaticaIy log pt7 15 t6 log pw7 t7 NOiSY Channel log psource log pdata source M reSUIts are more meaninngI Na39l39ve Bayes ogCIass ogfeature1 Class ogfeature2 Class m results can be meaningfuny combined gt moduarityg F x gt 7 r 1 3 Contains 7 391 11 02 ex 5 Car wires 7 0n this 7 X more llkel than in ham ey independent Can adjust scores to compensate for feature overlap 10g p B Juse 0 0 ew 3 0 1 56 85 23 independent pfeats spam exp 577 3205 7 120 ta 3 m is the email message 0 xi is weight of feature i fimE01 according to whether m has feature i allow mun 120 is a normalizing factor making 739s 1 m1 log pfeats spam 577 pfeats spam exp 577 3205 But So instead t of this feature is log pspam a constant old spam model s weight for contains BUY 6 weight of this feature is log pling a constant V 39 old ling model s weight for contains Buy Question Question Question OUCH 0025 0025 0025 0025 0025 0025 0025 Entropy 051 log 051 0025 log 0025 029 log 029 Largest if probabilities are evenly distributed Amazing Theorem 0 To solve the maxent problem we use Lagrange multipliers L 7 21 10mm 7 2939 Zlilxl NR 7 nu 7 lt21 71 lug 1Xl 7 5 Mxl 7 I 11 X HHL in tit3W 26 1 Zcxp x IXlHJ 7 110011 ZWKS c So feature constraints l maxent implies exponential family 0 Problem is convex so solution is unique 0 D Z ix lupplxlll Z nix EMxx lug ZJgt nixZfx 7 Nina Zia r x UK Z Hwyxx a A39Zprximixi Iilx y 2 ZMXWWX Z MX meiixi 7 39 lngZUI x Derivative of log partition function is the expectation of the feature At ML estimate model exectations match emincal feature counts 139 Siilijwl In 111 j39I39J7rilvi mul me39l 1 Define two submanifolds on the probability 
simplex fx particular set of features j39x The second is N1 the set of They intersect at a single distribution 1 the maxent maximum likelihood quotii FlliIl Z llquot39jquotll7l I39JnlJED EH10 Z ZI39HW I39 39ltquot39U39J Classification amp Information Theory Lecture 5 Introduction to Natural Language Processing CMPSCI 585 Fall 2004 University of Massachusetts Amherst Aron Culotta Slides courtesy of Andrew McCallum Document Classification by Machine Learning JD Temporal reasoning for 7 planning has long Testing y been studied formally Document V We discuss the semantics of several planning C afegnries Machine L earning 39 39 1 Training 39 data Garbage quotNeural networks quotPlanmj g based on colleenm for quotMultimedia quotUser and other machine Wlth t mporal the semantics Strongly typed streaming studies learning methods reasom g of program languages u h Video for of GUI v hasbeen dependence Work out Na39ive Bayes formulation interactively on the board Recipe for Solving a NLP Task Statistically 1 Data Notation representation 2 Problem Write down the problem in notation 3 Model Make some assumptions de ne a parametric model 4 Inference How to search through possible answers to find the best one 5 Learning How to estimate parameters 6 Implementation Engineering considerations for an ef cient implementation Engineering Components of a Na39ive Bayes Document Classifier Split documents into training and testing Cycle through all documents in each class Tokenize the character stream into words Count occurrences of each word in each class Estimate Pwc by a ratio of counts 1 prior For each test document calculate Pcd for each class Record predicted and true class and keep accuracy statistics A Probabilistic Approach to Classification Na39ive Bayes Pick the most probable class given the evidence c argmaxcj Prcj Id Cl a class like Planning d a document like language intelligence proof Bayes Rule Na39ive Bayes Me i d com I an z Cii PM W Pfd 2PrcrfPrwd 10k Wd the i th word in 
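The engineering recipe above — tokenize, count word occurrences per class, estimate P(w|c) with add-one smoothing, classify by comparing log P(c|d) — can be sketched directly. The four-document corpus below is a made-up toy:

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (class, text). Returns priors and per-class word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for c, text in docs:
        class_docs[c] += 1
        for w in text.lower().split():
            word_counts[c][w] += 1
            vocab.add(w)
    return class_docs, word_counts, vocab

def classify(text, class_docs, word_counts, vocab):
    n_docs = sum(class_docs.values())
    best_c, best_lp = None, -math.inf
    for c in class_docs:
        lp = math.log(class_docs[c] / n_docs)            # log P(c)
        total = sum(word_counts[c].values())
        for w in text.lower().split():
            # Laplace (add-one) smoothed estimate of P(w|c)
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c
```

Working in log space keeps the small per-word probabilities from underflowing, exactly as the `log Pr(c_j) + Σ log Pr(w|c_j)` formulation above suggests.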
**Parameter Estimation in Naive Bayes**
Class priors, from document counts:

    P̂(cj) = Count(d ∈ cj) / Σ_k Count(d ∈ ck)

Word probabilities, from word counts with a +1 (Laplace) prior over the vocabulary V:

    P̂(wi | cj) = (1 + Σ_{dk ∈ cj} Count(wi, dk)) / (|V| + Σ_{i'=1}^{|V|} Σ_{dk ∈ cj} Count(w_{i'}, dk))

**Programming Assignment 2 Help**
- P(cj | d) ∝ P(cj) ∏_i P(w_{d_i} | cj) multiplies many small numbers and underflows, so work in log space:

    log P(cj | d) ∝ log P(cj) + Σ_i log P(w_{d_i} | cj)

- To get back to P(cj | d): subtract a constant (e.g. the largest log score) so that exp() is safe, then exponentiate and normalize.

**Common words in Tom Sawyer** (71,370 word tokens)

| Word | Freq | Use |
|---|---|---|
| the | 3332 | determiner (article) |
| and | 2972 | conjunction |
| a | 1775 | determiner |
| to | 1725 | preposition, verbal infinitive marker |
| of | 1440 | preposition |
| was | 1161 | auxiliary verb |
| it | 1027 | personal/expletive pronoun |
| in | 906 | preposition |
| that | 877 | complementizer, demonstrative |
| he | 877 | personal pronoun |
| I | 783 | personal pronoun |
| his | 772 | possessive pronoun |
| you | 686 | personal pronoun |
| Tom | 679 | proper noun |
| with | 642 | preposition |

**Frequencies of frequencies in Tom Sawyer** (71,370 word tokens; 8,018 word types)

| Word frequency | Number of word types |
|---|---|
| 1 | 3993 |
| 2 | 1292 |
| 3 | 664 |
| 4 | 410 |
| 5 | 243 |
| 6 | 199 |
| 7 | 172 |
| 8 | 131 |
| 9 | 82 |
| 10 | 91 |
| 11–50 | 540 |
| 51–100 | 99 |
| > 100 | 102 |

**Zipf's law (Tom Sawyer)**

| Word | Freq f | Rank r | f · r |
|---|---|---|---|
| the | 3332 | 1 | 3332 |
| and | 2972 | 2 | 5944 |
| a | 1775 | 3 | 5325 |
| he | 877 | 10 | 8770 |
| but | 420 | 20 | 8400 |
| be | 294 | 30 | 8820 |
| there | 222 | 40 | 8880 |
| one | 172 | 50 | 8600 |
| about | 158 | 60 | 9480 |
| more | 138 | 70 | 9660 |
| never | 124 | 80 | 9920 |
| Oh | 116 | 90 | 10440 |
| two | 104 | 100 | 10400 |

| Word | Freq f | Rank r | f · r |
|---|---|---|---|
| turned | 51 | 200 | 10200 |
| you'll | 30 | 300 | 9000 |
| name | 21 | 400 | 8400 |
| comes | 16 | 500 | 8000 |
| group | 13 | 600 | 7800 |
| lead | 11 | 700 | 7700 |
| friends | 10 | 800 | 8000 |
| begin | 9 | 900 | 8100 |
| family | 8 | 1000 | 8000 |
| brushed | 4 | 2000 | 8000 |
| sins | 2 | 3000 | 6000 |
| could | 2 | 4000 | 8000 |
| Applausive | 1 | 8000 | 8000 |

**Zipf's law:** f ∝ 1/r. In other words, there is a constant k such that f · r = k.

**Zipf's Law and the Brown Corpus:** [log-log plot of frequency against rank for the Brown corpus; figure not recoverable]

**What is Information?**
- "The sun will come up tomorrow." (unsurprising: little information)
- "Greenspan was shot and killed this morning." (surprising: much information)

**Efficient Encoding / Entropy of a Random Variable**
- I have an 8-sided die. How many bits do I need to tell you what face I just rolled?
- My 8-sided die is unfair: P(1) = 0.5, P(2) = 0.125, P(3) = ... = P(8) = 0.0625.
- Entropy: the average length of the message needed to transmit the outcome of the random variable.
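The f · r ≈ k pattern in the tables above is easy to check on any tokenized text. A small sketch; the five-word "corpus" is a toy stand-in, built so the products come out exactly constant, whereas real text is only roughly Zipfian.

```python
from collections import Counter

def zipf_table(tokens, ranks):
    """For each requested rank r, return (word, frequency f, r, f*r)."""
    by_rank = sorted(Counter(tokens).items(), key=lambda kv: -kv[1])
    return [(w, f, r, f * r) for r, (w, f) in enumerate(by_rank, start=1)
            if r in ranks]

# Toy corpus with a perfect 1/r frequency falloff.
tokens = ("the " * 12 + "and " * 6 + "a " * 4 + "he " * 3).split()
for word, f, r, k in zipf_table(tokens, ranks=(1, 2, 3, 4)):
    print(word, f, r, k)   # f * r is 12 in every row
```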
Entropy was first used in:
- data compression
- transmission rates over a noisy channel (the coding interpretation of entropy)

**Entropy and Expectation**
- Recall the expectation: E[X] = Σ_{x ∈ X} x · p(x)
- Given some distribution over events P(X), what is the average number of bits needed to encode a message (an event, a string, a sequence)?
- Entropy is the expectation of −log2 P(X):

    H(X) = E[−log2 P(X)] = −Σ_{x ∈ X} p(x) log2 p(x)

- Notation: H(X) = H(p(X)) = H(p) = H_X(p) = H(P_X)
- What is the entropy of a fair coin? Of a fair six-sided die?
- What is the entropy of an unfair coin that always comes up heads?
- What is the entropy of an unfair six-sided die that always rolls 1 or 2?
- Upper and lower bounds? (Prove the lower bound.)

**Entropy of a coin:** [plot of H as a function of P(heads): 0 bits at p = 0 and p = 1, peaking at 1 bit at p = 0.5; figure not recoverable]

**Entropy, intuitively**
- High entropy = chaos, fuzziness, the opposite of order. The term comes from physics: entropy does not go down unless energy is used.
- A measure of uncertainty:
  - high entropy = a lot of uncertainty about the outcome (a uniform distribution over outcomes)
  - low entropy = high certainty about the outcome

**Claude Shannon**
- 1916–2001, creator of information theory
- Laid the foundation for implementing logic in digital circuits as part of his Master's thesis (1937)
- "A Mathematical Theory of Communication", 1948

**Joint Entropy and Conditional Entropy**
Two random variables: X (over space X), Y (over space Y).

Joint entropy (no big deal: (X, Y) considered a single event):

    H(X, Y) = −Σ_{x ∈ X} Σ_{y ∈ Y} p(x, y) log2 p(x, y)

Conditional entropy:

    H(X | Y) = −Σ_{x ∈ X} Σ_{y ∈ Y} p(x, y) log2 p(x | y)

Recall that H(X) = E[−log2 p(X)]: this is a weighted average, and here the weights are not conditional. H(X | Y) is how much extra information you need to supply to transmit X, given that the other person already knows Y.

**Conditional entropy, another way**

    H(Y | X) = Σ_x p(x) H(Y | X = x)
             = Σ_x p(x) [ −Σ_y p(y | x) log2 p(y | x) ]
             = −Σ_x Σ_y p(x) p(y | x) log2 p(y | x)
             = −Σ_x Σ_y p(x, y) log2 p(y | x)

**Chain Rule for Entropy**
Since, like random variables, entropy is based on an expectation:

    H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)

**Cross Entropy**
- What happens when you use a code that is suboptimal for your event distribution?
- I created my code to be efficient for a fair 8-sided die, but the die is unfair and always gives 1 or 2, uniformly.
- How many bits on average for the optimal code?
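The die questions above can be answered numerically. A minimal sketch of the H(X) formula; the unfair die is the one from the earlier slide.

```python
import math

def entropy(probs):
    """H(X) = -sum_x p(x) log2 p(x), in bits; zero-probability outcomes contribute 0."""
    return 0.0 - sum(p * math.log2(p) for p in probs if p > 0)

fair_die = [1 / 8] * 8                      # fair 8-sided die
unfair_die = [0.5, 0.125] + [0.0625] * 6    # the slides' unfair 8-sided die
always_heads = [1.0]                        # coin that always comes up heads

print(entropy(fair_die))      # 3.0 bits: three fair coin flips' worth of surprise
print(entropy(unfair_die))    # 2.375 bits: skew lowers the entropy
print(entropy(always_heads))  # 0.0 bits: no uncertainty at all
```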
How many bits on average for the suboptimal code? The cross entropy of the true distribution p under the code distribution q is

    H(p, q) = −Σ_{x ∈ X} p(x) log2 q(x)

**KL Divergence**
What is the average number of bits that are wasted by encoding events from distribution p using distribution q?

    D(p ‖ q) = H(p, q) − H(p)
             = −Σ_x p(x) log2 q(x) + Σ_x p(x) log2 p(x)
             = Σ_x p(x) log2 ( p(x) / q(x) )

A measure of "distance" between distributions p and q, but it is not symmetric, and it does not satisfy the triangle inequality.

**Mutual Information**
- Recall: H(X) = the average bits for me to tell you which event occurred, drawn from distribution P(X).
- Now suppose I first tell you the event y ∈ Y. H(X | Y) = the average bits necessary to tell you which event occurred, drawn from P(X | Y).
- By how many bits does knowledge of Y lower the entropy of X?

    I(X; Y) = H(X) − H(X | Y)
            = H(X) + H(Y) − H(X, Y)
            = Σ_x p(x) log2 (1 / p(x)) + Σ_y p(y) log2 (1 / p(y)) − Σ_{x,y} p(x, y) log2 (1 / p(x, y))
            = Σ_{x,y} p(x, y) log2 ( p(x, y) / (p(x) p(y)) )

**Mutual Information (properties)**
- Symmetric, non-negative.
- A measure of independence: I(X; Y) = 0 when X and Y are independent, and I(X; Y) grows both with the degree of dependence and with the entropy of the variables.
- Sometimes also called "information gain".
- Used often in NLP: clustering words, word sense disambiguation, feature selection.

**Pointwise Mutual Information**
Previously we measured mutual information between two random variables; we can also measure mutual information between two events:

    I(x, y) = log2 ( p(x, y) / (p(x) p(y)) )

**Collocations**
Lecture 5, Introduction to Natural Language Processing
CMPSCI 585, Fall 2004
University of Massachusetts Amherst
Andrew McCallum

**Words and their meaning: some upcoming lectures**
- Word disambiguation: one word, multiple meanings
- Word clustering: multiple words, same meaning
- Collocations (this lecture): multiple words together, with a different meaning than the sum of their parts
- Simple measures on text, yielding interesting insights into language, meaning, and culture

**Today's Main Points**
- What is a collocation?
- Why do people care?
- Three ways of finding them automatically

**Collocations**
- An expression consisting of two or more words that correspond to some conventional way of saying things.
- Characterized by limited compositionality: an expression is compositional when its meaning can be predicted from the meaning of its parts, and a collocation's meaning largely cannot.
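The suboptimal-code question works out exactly in code. A small sketch using the slides' numbers: q is the fair-die code, while the die in fact only ever shows 1 or 2.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x): bits per event when p-events use a code built for q."""
    return 0.0 - sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D(p || q) = H(p, q) - H(p): average bits wasted by assuming q when the truth is p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = [1 / 8] * 8              # code optimized for a fair 8-sided die
p = [0.5, 0.5] + [0.0] * 6   # the die actually gives 1 or 2, uniformly

print(cross_entropy(p, p))   # 1.0 bit with the optimal code (this is just H(p))
print(cross_entropy(p, q))   # 3.0 bits with the suboptimal fair-die code
print(kl_divergence(p, q))   # 2.0 bits wasted per outcome
```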
- Examples: "strong tea", "rich in calcium", "weapons of mass destruction", "kick the bucket", "hear it through the grapevine".

**Collocations are important for**
- Terminology extraction: finding special phrases in technical domains
- Natural language generation: to make natural output
- Computational lexicography: to automatically identify phrases to be listed in a dictionary
- Parsing: to give preference to parses with natural collocations
- The study of social phenomena: like the reinforcement of cultural stereotypes through language (Stubbs 1996)

**Contextual Theory of Meaning**
- In contrast with "structural linguistics", which emphasizes abstractions and properties of sentences, the Contextual Theory of Meaning emphasizes the importance of context:
  - the context of the social setting (not an idealized speaker)
  - the context of the discourse (not a sentence in isolation)
  - the context of the surrounding words
- Firth: "a word is characterized by the company it keeps"
- Example (Halliday): strong tea / coffee / cigarettes, but powerful drugs / heroin / cocaine
- Important for idiomatically correct English, but also for the social implications of language use.

**Method 1: Frequency**
The most frequent bigrams in a large newswire corpus are almost all function-word pairs: "of the" (80,871), then "in the", "to the", "on the", "for the", "and the", "that the", "at the", "to be", "in a", "of a", "by the", "with the", "from the"; "New York" (11,428) and "he said" (10,007) are the first interesting entries.

**Method 1: Frequency with a POS Filter**
Keep only bigrams whose part-of-speech pattern could form a phrase: A N, N N, A A N, A N N, N A N, N N N, N P N. With the filter, the top bigrams are real collocations (counts and tag patterns as on the slide):

| Count | Bigram | Pattern |
|---|---|---|
| 11487 | New York | A N |
| 7261 | United States | A N |
| 5412 | Los Angeles | N N |
| 3301 | last year | A N |
| 3191 | Saudi Arabia | N N |
| 2699 | last week | A N |
| 2514 | vice president | A N |
| 2378 | Persian Gulf | A N |
| 2161 | San Francisco | N N |
| 2106 | President Bush | N N |
| 2001 | Middle East | A N |
| 1942 | Saddam Hussein | N N |
| 1867 | Soviet Union | A N |
| 1850 | White House | A N |
| 1633 | United Nations | A N |
| 1328 | oil prices | N N |
| 1210 | next year | A N |
| 1074 | chief executive | A N |
| 1073 | real estate | A N |

**Method 2: Mean and Variance**
Some collocations are not of adjacent words, but of words in a more flexible distance relationship:
- "she knocked on his door"
- "they knocked at the door"
- "100 women knocked on Donaldson's door"
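Method 1 is a few lines of code once a tagged corpus is available. A minimal sketch for the bigram case; the tiny tagged token list and the simplified one-letter tags are invented for illustration, and the pattern set is the A N / N N subset of the filter above.

```python
from collections import Counter

# Tag patterns (bigram case) that can form a phrase, per the POS filter above.
GOOD_PATTERNS = {("A", "N"), ("N", "N")}

def collocations_by_frequency(tagged_tokens, min_count=1):
    """Count adjacent (word1, word2) pairs whose POS pattern passes the filter."""
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1, t2) in GOOD_PATTERNS:
            counts[(w1, w2)] += 1
    return [(pair, c) for pair, c in counts.most_common() if c >= min_count]

# Hypothetical tagged text: (word, simplified POS tag) pairs.
tagged = [("troops", "N"), ("in", "P"), ("the", "D"), ("Persian", "A"),
          ("Gulf", "N"), ("near", "P"), ("the", "D"), ("Persian", "A"),
          ("Gulf", "N"), ("coast", "N")]
print(collocations_by_frequency(tagged))
# [(('Persian', 'Gulf'), 2), (('Gulf', 'coast'), 1)]
```

Without the filter, pairs like "in the" would swamp the list; the filter discards them before counting ever starts.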
- "a man knocked on the metal front door"

There is no constant distance relationship, but there is enough evidence that "knock" goes with "door" better than "hit", "punch", etc. do.

**Time-shifted bigrams** (window of 3): from "Stocks crash as rescue plan teeters", collect

    stocks crash, stocks as, stocks rescue,
    crash as, crash rescue, crash plan,
    as rescue, as plan, as teeters, ...

To ask about the relationship between "stocks" and "crash", gather many such pairs and calculate the mean and variance of their offset.

**Position histograms** [figures not recoverable; the statistics shown were]:
- "strong" versus "opposition": mean 1.15, deviation 0.67
- "strong" versus "support": mean 1.45, deviation 1.07
- "strong" versus "for": mean 1.12, deviation 2.15

A small deviation means a rigid positional relationship (a good collocation candidate); a large deviation, as with "for", means no real collocation.

**Method 2: Mean and Variance (example data)**

| d (deviation) | mean | count | word 1 | word 2 |
|---|---|---|---|---|
| 0.43 | 0.97 | 11657 | New | York |
| 0.48 | 1.83 | 24 | previous | games |
| 0.15 | 2.98 | 46 | minus | points |
| 0.49 | 3.87 | 131 | hundreds | dollars |
| 4.03 | 0.44 | 36 | editorial | Atlanta |
| 4.03 | 0.00 | 78 | ring | New |
| 3.96 | 0.19 | 119 | point | hundredth |
| 3.96 | 0.29 | 106 | subscribers | by |

**Method 3: Likelihood Ratios**
Determine which of two probabilistic models is more appropriate for the data:
- Hypothesis 1 (independence): P(w2 | w1) = p = P(w2 | ¬w1)
- Hypothesis 2 (dependence): P(w2 | w1) = p1 ≠ p2 = P(w2 | ¬w1)

Data: N = total count of all words, c1 = count of word1, c2 = count of word2, c12 = count of the bigram word1 word2. The maximum likelihood estimates are

    p = c2 / N,   p1 = c12 / c1,   p2 = (c2 − c12) / (N − c1)

and the likelihood ratio is

    log λ = log ( L(H1) / L(H2) )

where each likelihood is a product of binomial terms L(k; n, x) = x^k (1 − x)^(n − k); the statistic −2 log λ is reported below.

**Method 3: Likelihood Ratio example data**

| −2 log λ | C(w1) | C(w2) | C(w1 w2) | w1 w2 |
|---|---|---|---|---|
| 1291 | 12593 | 932 | 150 | most powerful |
| 99 | 379 | 932 | 10 | politically powerful |
| 82 | 932 | 934 | 10 | powerful computers |
| 80 | 932 | 3424 | 13 | powerful force |
| 57 | 932 | 291 | 6 | powerful symbol |
| 51 | 932 | 40 | 4 | powerful lobbies |
| 51 | 171 | 932 | 5 | economically powerful |
| 51 | 932 | 43 | 4 | powerful magnet |
| 50 | 4458 | 932 | 10 | less powerful |
| 50 | 6252 | 932 | 11 | very powerful |
| 49 | 932 | 2064 | 8 | powerful position |
| 48 | 932 | 591 | 6 | powerful machines |
| 47 | 932 | 2339 | 8 | powerful computer |
| 43 | 932 | 396 | 5 | powerful magnets |

**Collocation studies helping lexicography**
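Method 2 reduces to collecting signed offsets and computing two statistics. A minimal sketch; the three sentences are the slide's "knock ... door" examples, and the window size is an assumption.

```python
from statistics import mean, pstdev

def offsets(tokens, w1, w2, window=5):
    """Collect signed offsets pos(w2) - pos(w1) for co-occurrences within the window."""
    out = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] == w2:
                out.append(j - i)
    return out

sentences = ["she knocked on his door",
             "they knocked at the door",
             "a man knocked on the metal front door"]
offs = []
for s in sentences:
    offs += offsets(s.split(), "knocked", "door")
print(offs, mean(offs), pstdev(offs))   # offsets [3, 3, 5]: mean ~3.67, deviation ~0.94
```

A low deviation flags a rigid "knock ... door"-style relationship; for an unrelated pair the offsets scatter and the deviation grows, exactly as in the example table above.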
- We want to help dictionary-writers bring out the differences between "strong" and "powerful".
- Understand the meaning of a word by the company it keeps.
- Church and Hanks (1989), through statistical analysis, concluded that it is a matter of intrinsic vs. extrinsic: "strong support" from a demographic group means the group is committed but may not have the capability, while a "powerful supporter" is one who actually has the capability to change things.
- There are additional subtleties too; the same comparison helps us analyze cultural attitudes: "strong tea" versus "powerful drugs".

**Words occurring with "strong" versus "powerful"** (counts, as far as the scan is legible; several rows are unrecoverable):
- C(strong, w): support (50), safety (22), sale (21), opposition (19), showing (15), sense (15), defense (14), possibility (11), feeling (11), demand (11), challenge (11), supporter (10), signal (9), ...
- C(powerful, w): force (13), computer (10), man (7), position (5), men (5), military (5), country (5), weapon (5), people (5), Germany (5), senator (4), neighbor (4), ...

**Likelihood ratios across corpora from different times**
Apply the same test with Model 1 a model for the NYTimes in 1989 and Model 2 a model for 1990. The phrases with the most extreme relative-frequency ratios r pick out that year's news (ratio column partly illegible in the scan):

| r | phrase |
|---|---|
| 0.0024 | (illegible) |
| 0.0037 | Karim |
| 0.0037 | East Berliners |
| 0.0039 | Miss Manners |
| 0.0041 | 17 (the October 17 earthquake) |
| (illegible) | HUD |
| 0.0051 | East Germans |

The stories behind them: the Prague Spring (1989 coverage); the Muslim cleric Sheik Abdul-Karim Obeid, abducted; the disintegration of communist Eastern Europe; the scandal in HUD; the October 17 earthquake in San Francisco; Miss Manners, no longer carried by the NYTimes in 1990.
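The −2 log λ scores in the example table come straight from the four counts plus the corpus size N, which the slides do not show; the N below is illustrative only. A sketch of the computation, using binomial log-likelihoods for the two hypotheses:

```python
import math

def log_l(k, n, x):
    """Log of the binomial likelihood x^k (1-x)^(n-k); the choose term cancels in the ratio."""
    return k * math.log(x) + (n - k) * math.log(1 - x)

def llr(c1, c2, c12, n):
    """Dunning-style -2 log lambda for the bigram w1 w2.

    H1 (independence): P(w2 | w1) = P(w2 | not w1) = c2/n
    H2 (dependence):   P(w2 | w1) = c12/c1,  P(w2 | not w1) = (c2-c12)/(n-c1)
    """
    p, p1, p2 = c2 / n, c12 / c1, (c2 - c12) / (n - c1)
    log_lambda = (log_l(c12, c1, p) + log_l(c2 - c12, n - c1, p)
                  - log_l(c12, c1, p1) - log_l(c2 - c12, n - c1, p2))
    return -2 * log_lambda

# Counts in the style of the "most powerful" row; n (total words) is an assumption.
n = 14_000_000
print(round(llr(12593, 932, 150, n)))   # large: strong evidence of association
print(round(llr(12593, 932, 1, n)))     # near zero: about what independence predicts
```

Because H1 is nested inside H2, the statistic is non-negative, and it grows with the gap between the observed bigram count and the count independence would predict (here about 0.8 occurrences).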
