Introduction to Natural Language Processing (CMPSCI 585)
Probabilistic Parsing in Practice
Lecture 15, Introduction to Natural Language Processing, CMPSCI 585, Fall 2004
Andrew McCallum, UMass Amherst (including slides from Michael Collins, Chris Manning, Jason Eisner, and Mary Harper)

Today's Main Points
- Training data (treebanks)
- How to evaluate parsers
- Limitations of PCFGs; enhancements and alternatives: lexicalized PCFGs, structure sensitivity, left-corner parsing, faster parsing with beam search, dependency parsers
- Current state of the art

Treebanks: Labeling Data

Treebanks
- Penn Treebank
  - Trees are represented via bracketing
  - Fairly flat structures for noun phrases: (NP Arizona real estate loans)
  - Tagged with grammatical and semantic functions (-SBJ, -LOC, ...)
  - Uses empty nodes to indicate understood subjects and extraction gaps
- NP-over-NP structure: wrong according to linguistic theory, but consistent with the chunk (Abney 1991) view

Treebanks
- Pure grammar-induction approaches tend not to produce the parse trees that people want
- Solution: give some examples of the parse trees that we want, and make a learning tool learn a grammar from them
- A treebank is a collection of such example parses; the Penn Treebank is the most widely used

Example Penn Treebank bracketing:
  (S (NP-SBJ The move)
     (VP followed
         (NP (NP a round)
             (PP of (NP (NP similar increases)
                        (PP by (NP other lenders))
                        (PP against (NP Arizona real estate loans)))))
         (S-ADV (NP-SBJ *)
                (VP reflecting (NP (NP a continuing decline)
                                   (PP-LOC in (NP that market)))))))

Treebanks
- Many people have argued that it is better to have linguists construct treebanks than grammars, because it is easier to work out the correct parse of individual sentences than to determine what all the derivations of a grammar should be

Treebanking Issues
- Type of data (task dependent): newspapers, journals, novels, technical memos, dialogs, email
- Size: the more the better, but resource limited
- Parse representation: dependency vs. parse tree
- Attributes: what to encode? words, morphology, syntax, semantics
- Relevance and bookkeeping: date/time, who did what

Organizational Issues
- Team: 1 team leader (bookkeeping, hiring); 1 guidelines person; 1 linguistic-issues person; annotators; 1-2 technical staff (programming); 2 checking persons
- Double annotation if possible

Treebanking Plan
- The main points (after getting funding): planning; base guidelines development; annotation and guideline refinement; consistency checking and guideline finalization; packaging and distribution
- Time needed: on the order of 1-2 years per 1 million words, of which only about 1/3 is the annotation itself

Parser Evaluation

Evaluation
- The ultimate goal is to build systems for IE, QA, MT; people are rarely interested in syntactic analysis for its own sake
- Evaluate the system, or evaluate the parser? For simplicity, modularization, and convenience, compare parses from a parser with the result of hand-parsing a sentence (the gold standard)
- What is the objective criterion that we are trying to maximize?

Evaluation: Tree Accuracy (Exact Match)
- A very tough standard, but in many ways a sensible one to use

PARSEVAL Measures
- For some purposes, partially correct parses can be useful
- Originally designed for non-statistical parsers
- Evaluate the component pieces of a parse
- Measures: precision, recall, crossing brackets

Evaluation
- Labeled precision: how many brackets in the parse match those in the correct (gold standard) tree
- Labeled recall: how many of the brackets in the correct tree are in the parse
- Crossing brackets: average of how many constituents in one tree cross over constituent boundaries in the other tree
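To make these definitions concrete, here is a minimal illustrative sketch (not part of the original lecture, and not the official evalb scorer) of how labeled precision, labeled recall, and a crossing-bracket count could be computed when each parse is represented as a collection of labeled spans (label, start, end):

```python
# Minimal PARSEVAL sketch (illustrative only).
# A parse is a list of constituents (label, start, end), where start/end
# are word indices delimiting the spanned substring.

def parseval(gold, test):
    gold_set, test_set = set(gold), set(test)
    matched = len(gold_set & test_set)          # brackets identical in label and span
    precision = matched / len(test_set)
    recall = matched / len(gold_set)

    # A test bracket "crosses" a gold bracket if the spans overlap
    # without either containing the other.
    def crosses(a, b):
        (_, s1, e1), (_, s2, e2) = a, b
        return (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)

    crossing = sum(1 for t in test_set if any(crosses(t, g) for g in gold_set))
    return precision, recall, crossing

# Toy example over "workers dumped sacks into a bin" (spans are hypothetical).
gold = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 3), ("PP", 3, 6), ("NP", 4, 6)]
test = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 3, 6), ("NP", 4, 6)]
print(parseval(gold, test))   # (0.857..., 1.0, 0)
```

In the toy call, the only difference is one extra NP bracket from a low PP attachment, so precision drops only slightly, recall stays perfect, and no brackets cross; that mildness is exactly the weakness of PARSEVAL discussed next.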
Problems with PARSEVAL
- Even a vanilla PCFG performs quite well on these measures
- They measure success at the level of individual decisions; you must make many consecutive decisions correctly to be correct on the entire tree

Problems with PARSEVAL (2)
- The story behind this is the structure of the Penn Treebank: flat structures mean few brackets, hence low crossing-bracket counts; troublesome brackets are avoided, hence high precision and recall
- The errors in precision and recall can therefore be minimal: in some cases a wrong PP attachment penalizes precision, recall, and crossing-bracket accuracy only slightly
- On the other hand, attaching low instead of high can make every node in a right-branching tree wrong, seriously harming precision, recall, and crossing-bracket accuracy

Evaluation
- Do PARSEVAL measures succeed in real tasks? Many small parsing mistakes might not affect tasks of semantic interpretation
  - Bonnema 1996, 1997: tree accuracy of the parser vs. correct semantic interpretation
  - Hermjakob and Mooney 1997: English-to-German translation
- At the moment people feel PARSEVAL measures are adequate for comparing parsers

Lexicalized Parsing

Limitations of PCFGs
- PCFGs assume:
  - Place invariance
  - Context-freeness: P(rule) is independent of words outside the span (and of words with overlapping derivations)
  - Ancestor-freeness: P(rule) is independent of the nonterminals above
- Lack of sensitivity to lexical information
- Lack of sensitivity to structural frequencies

Lack of Lexical Dependency
- P(VP -> V NP NP) is independent of the particular verb involved, but the rule is much more likely with ditransitive verbs like "gave":
  "He gave the boy a ball."  vs.  "He ran to the store."

The Need for Lexical Dependency (probabilities dependent on lexical words)
- Problem 1: verb subcategorization. The VP expansion is independent of the choice of verb; however, including information about the actual words when making decisions about tree structure is necessary

Weakening the Independence Assumptions of PCFGs (probabilities dependent on lexical words)
- Problem 2: phrasal attachment. The lexical content of the phrases provides the information needed for the decision; the syntactic categories of the phrases provide very little
- A standard PCFG is worse than n-gram models

Another case of PP attachment ambiguity: "workers dumped sacks into a bin"
(a) VP attachment: (S (NP (NNS workers)) (VP (VP (VBD dumped) (NP (NNS sacks))) (PP (IN into) (NP (DT a) (NN bin)))))
(b) NP attachment: (S (NP (NNS workers)) (VP (VBD dumped) (NP (NP (NNS sacks)) (PP (IN into) (NP (DT a) (NN bin))))))

Rules used:
(a) S -> NP VP, NP -> NNS, VP -> VP PP, VP -> VBD NP, NP -> NNS, PP -> IN NP, NP -> DT NN, plus the lexical rules (NNS -> workers, VBD -> dumped, NNS -> sacks, IN -> into, DT -> a, NN -> bin)
(b) S -> NP VP, NP -> NNS, VP -> VBD NP, NP -> NP PP, NP -> NNS, PP -> IN NP, NP -> DT NN, plus the same lexical rules
The two parses differ in only one rule, so if P(NP -> NP PP | NP) > P(VP -> VP PP | VP) then (b) is more probable, else (a) is more probable; the words themselves play no role.

A Case of Coordination Ambiguity
- The two parses have identical rules, and therefore identical probability under any assignment of PCFG rule probabilities

Weakening the Independence Assumptions of PCFGs
- Solution: lexicalize the CFG, annotating each phrasal node with its head word
- Background idea: strong lexical dependencies hold between heads and their dependents

Heads in Context-Free Rules
- Add annotations specifying the head of each rule
- Notation: S = sentence, VP = verb phrase, NP = noun phrase, PP = prepositional phrase, DT = determiner, IN = preposition, Vi = intransitive verb, Vt = transitive verb, NN = noun

More About Heads
- Each context-free rule has one special child that is the head of the rule, e.g.:
  S -> NP VP (VP is the head)
  VP -> Vt NP (Vt is the head)
  NP -> DT NN NN (the final NN is the head)
- A core idea in linguistics (X-bar theory, Head-Driven Phrase Structure Grammar)
- Some intuitions: the head is the central sub-constituent of each rule, the semantic predicate in each rule

Rules Which Recover Heads (example rules for NPs)
- If the rule contains an NN, NNS, or NNP: choose the rightmost NN, NNS, or NNP
- Else, if the rule contains an NP: choose the leftmost NP
- Else, if the rule contains a JJ: choose the rightmost JJ
- Else, if the rule contains a CD: choose the rightmost CD
- Else: choose the rightmost child
- Examples: NP -> DT NNP NN; NP -> DT NN NNP; NP -> NP PP; NP -> DT JJ; NP -> DT
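These priority rules translate almost directly into code. A minimal sketch, assuming a rule is given simply as a list of its child category labels (it mirrors only the NP rules listed above, not a full head-percolation table):

```python
# Sketch of the NP head rules above: given the children of an NP rule,
# return the index of the head child.  Illustrative only.

def np_head_index(children):
    # 1. Rightmost NN, NNS, or NNP
    for i in range(len(children) - 1, -1, -1):
        if children[i] in ("NN", "NNS", "NNP"):
            return i
    # 2. Else leftmost NP
    for i, c in enumerate(children):
        if c == "NP":
            return i
    # 3. Else rightmost JJ
    for i in range(len(children) - 1, -1, -1):
        if children[i] == "JJ":
            return i
    # 4. Else rightmost CD
    for i in range(len(children) - 1, -1, -1):
        if children[i] == "CD":
            return i
    # 5. Else the rightmost child
    return len(children) - 1

print(np_head_index(["DT", "NNP", "NN"]))   # 2: rightmost NN/NNS/NNP
print(np_head_index(["NP", "PP"]))          # 0: leftmost NP
print(np_head_index(["DT", "JJ"]))          # 1: rightmost JJ
print(np_head_index(["DT"]))                # 0: rightmost child
```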
Adding Headwords to Trees
  (S(questioned) (NP(lawyer) (DT the) (NN lawyer))
                 (VP(questioned) (Vt questioned)
                                 (NP(witness) (DT the) (NN witness))))
- A constituent receives its headword from its head child: S receives its headword from the VP, the VP from the Vt, and each NP from its NN

Adding Headtags to Trees
  (S(questioned,Vt) (NP(lawyer,NN) (DT the) (NN lawyer))
                    (VP(questioned,Vt) (Vt questioned)
                                       (NP(witness,NN) (DT the) (NN witness))))
- Also propagate part-of-speech tags up the trees

Explosion of the Number of Rules
- New rules might look like VP(gave,V) -> V(gave) NP(man) NP(book)
- But this would be a massive explosion in the number of rules and parameters

Lexicalized Parsing with Smoothing

Lexicalized Parsing, Charniak (1997)
- A very simple, conservative model of a lexicalized PCFG
- Probabilistic conditioning is "top-down", but the actual computation is bottom-up
- Example: (S(rose) (NP(profits) (JJ corporate) (NNS profits)) (VP(rose) (V rose))), for "corporate profits rose"

Charniak (1997): generate the head, then the rule
- Given a parent constituent and a child category (e.g., S(rose) -> NP VP(rose), with h = profits, C = NP, ph = rose, pC = S):
  - first generate the head word of the child, with P(h | ph, C, pC), giving S(rose) -> NP(profits) VP(rose)
  - then generate the rule expanding the child, with P(r | h, C, pC), giving NP(profits) -> JJ NNS(profits)
- Here h = head word, C = (child) constituent, ph = parent head word, pC = parent head constituent

Smoothing in Charniak (1997)
  P(h | ph, C, pC) = lambda1 * P_MLE(h | ph, C, pC)
                   + lambda2 * P_MLE(h | C(ph), C, pC)
                   + lambda3 * P_MLE(h | C, pC)
                   + lambda4 * P_MLE(h | C)
- The lambda_i are functions of how much one would expect to see a certain occurrence, given the amount of training data (word counts, etc.)
- C(ph) is the semantic class of the parent headword
- Techniques like these for dealing with data sparseness are vital to successful model construction

Charniak (1997) smoothing example:
                          P(prft | rose, NP, S)    P(corp | prft, JJ, NP)
  P(h | ph, C, pC)                0                        0.245
  P(h | C(ph), C, pC)             0.00352                  0.0150
  P(h | C, pC)                    0.000627                 0.00533
  P(h | C)                        0.000557                 0.00418
- This allows utilization of rich, highly conditioned estimates, but smoothes when sufficient data is unavailable
- One can't just use MLEs: one commonly sees previously unseen events, which would otherwise have probability 0

Charniak (1997): rule probability with similar smoothing
  P(r | h, C, pC) = lambda1 * P_MLE(r | h, C, pC)
                  + lambda2 * P_MLE(r | h, C)
                  + lambda3 * P_MLE(r | C(h), C)
                  + lambda4 * P_MLE(r | C, pC)
                  + lambda5 * P_MLE(r | C)

Sparseness and the Penn Treebank
- The Penn Treebank (1 million words of parsed English from the WSJ) has been a key resource because of the widespread reliance on supervised learning
- But 1 million words is like nothing: 965,000 constituents, but only 66 WHADJP, of which only 6 aren't "how much" or "how many"; yet there is an infinite space of these ("how clever/original/incompetent at risk assessment and evaluation ...")
- Most of the probabilities that you would like to compute, you can't compute

Sparseness and the Penn Treebank
- Most intelligent processing depends on bilexical statistics: likelihoods of relationships between pairs of words
- These are extremely sparse, even on topics central to the WSJ: "stocks plummeted" 2 occurrences, "stocks stabilized" 1 occurrence, "stocks skyrocketed" 0 occurrences, "stocks discussed" 0 occurrences
- So far there has been very modest success augmenting the Penn Treebank with extra unannotated materials or using semantic classes or clusters (cf. Charniak 1997, Charniak 2000) as soon as there are more than tiny amounts of annotated training data
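A minimal sketch of this kind of linearly interpolated estimation for the head-word probability (the counts, the semantic-class function, and the fixed lambda weights below are toy stand-ins; Charniak sets the weights as a function of training-data counts and estimates everything from the treebank):

```python
# Sketch of interpolated backoff in the spirit of the Charniak (1997) head-word model:
# back off from the full conditioning context to progressively weaker ones.
# All counts, classes, and lambdas are invented toy values, not treebank statistics.

from collections import Counter

def mle(count_joint, count_context):
    return count_joint / count_context if count_context else 0.0

def smoothed_head_prob(h, ph, C, pC, counts, cls, lambdas=(0.4, 0.3, 0.2, 0.1)):
    """P(h | ph, C, pC) ~ l1*P(h|ph,C,pC) + l2*P(h|cls(ph),C,pC) + l3*P(h|C,pC) + l4*P(h|C)."""
    l1, l2, l3, l4 = lambdas
    return (l1 * mle(counts[(h, ph, C, pC)],      counts[(ph, C, pC)]) +
            l2 * mle(counts[(h, cls(ph), C, pC)], counts[(cls(ph), C, pC)]) +
            l3 * mle(counts[(h, C, pC)],          counts[(C, pC)]) +
            l4 * mle(counts[(h, C)],              counts[(C,)]))

# Toy counts: the fully conditioned event was never seen, but backoff levels were.
counts = Counter({
    ("profits", "rose", "NP", "S"): 0,        ("rose", "NP", "S"): 10,
    ("profits", "VERB-CLASS", "NP", "S"): 2,  ("VERB-CLASS", "NP", "S"): 100,
    ("profits", "NP", "S"): 5,                ("NP", "S"): 1000,
    ("profits", "NP"): 20,                    ("NP",): 20000,
})
cls = lambda w: "VERB-CLASS"   # stand-in for the semantic class of the head word
print(smoothed_head_prob("profits", "rose", "NP", "S", counts, cls))  # nonzero despite unseen full context
```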
Lexicalized Markov Grammars: "Out from the Head" (Collins 1997)
- Charniak (1997) expands each phrase-structure node in a single step
- This is good for capturing dependencies between child nodes, but it is bad because of data sparseness
- A pure dependency model (one child at a time) is worse
- One can do better with in-between models, such as generating the children as a Markov process on both sides of the head (Collins 1997; Charniak 2000)

Modeling Rule Productions as Markov Processes
- Step 1: generate the category of the head child
  S(told,VBD)  =>  S(told,VBD) -> VP(told,VBD)
  with probability P_h(VP | S, told, VBD)
- Step 2: generate the left modifiers in a Markov chain
  S(told,VBD) -> VP(told,VBD)
  =>  S(told,VBD) -> NP(Hillary,NNP) VP(told,VBD)
      multiplying in P_l(NP(Hillary,NNP) | S, VP, told, VBD, LEFT)
  =>  S(told,VBD) -> NP(yesterday,NN) NP(Hillary,NNP) VP(told,VBD)
      multiplying in P_l(NP(yesterday,NN) | S, VP, told, VBD, LEFT)
  =>  finally generate STOP to terminate the left chain,
      multiplying in P_l(STOP | S, VP, told, VBD, LEFT)
- Step 3: generate the right modifiers in a Markov chain
  Here there are no right modifiers, so generate STOP immediately,
  multiplying in P_r(STOP | S, VP, told, VBD, RIGHT)
- The probability of the full rule S(told,VBD) -> NP(yesterday,NN) NP(Hillary,NNP) VP(told,VBD) is the product of all of these terms

A Refinement: Adding a Distance Variable
- Delta = 1 if the position is adjacent to the head, e.g.
  P_l(NP(Hillary,NNP) | S, VP, told, VBD, LEFT, Delta)

Adding the Complement/Adjunct Distinction
- In the example, S -> NP NP VP: Hillary is the subject and yesterday is a temporal modifier, but nothing in the annotation distinguishes them
- Adding tags to make the distinction: S -> NP-C VP (subject/complement) vs. S -> NP VP (modifier/adjunct)
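To make the decomposition concrete, here is a small sketch of how the probability of one lexicalized rule expansion is assembled from the head-child choice plus left and right Markov chains terminated by STOP (the probability tables are invented toy numbers, and the distance and complement/adjunct features are omitted for brevity):

```python
# Sketch of a Collins-style decomposition of a lexicalized rule probability:
#   P(rule) = P_h(head child | parent, hw, ht)
#           * prod_i P_l(left_mod_i | parent, head child, hw, ht)  * P_l(STOP | ...)
#           * prod_j P_r(right_mod_j | parent, head child, hw, ht) * P_r(STOP | ...)
# All probability tables below are toy numbers for illustration only.

STOP = "STOP"

P_head  = {("S", "told", "VBD"): {"VP": 0.9}}
P_left  = {("S", "VP", "told", "VBD"): {"NP(Hillary,NNP)": 0.3,
                                        "NP(yesterday,NN)": 0.1,
                                        STOP: 0.5}}
P_right = {("S", "VP", "told", "VBD"): {STOP: 0.8}}

def rule_prob(parent, hw, ht, head_child, left_mods, right_mods):
    p = P_head[(parent, hw, ht)][head_child]
    ctx = (parent, head_child, hw, ht)
    for m in left_mods + [STOP]:     # left modifiers generated outward from the head, then STOP
        p *= P_left[ctx][m]
    for m in right_mods + [STOP]:    # right modifiers, then STOP
        p *= P_right[ctx][m]
    return p

# S(told,VBD) -> NP(yesterday,NN) NP(Hillary,NNP) VP(told,VBD)
print(rule_prob("S", "told", "VBD", "VP",
                left_mods=["NP(Hillary,NNP)", "NP(yesterday,NN)"],
                right_mods=[]))
# 0.9 * 0.3 * 0.1 * 0.5 * 0.8 = 0.0108
```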
Adding Dependency on Structure

Weakening the Independence Assumptions of PCFGs: probabilities dependent on structural context
- PCFGs are also deficient on purely structural grounds: they are really context independent
- Example trees: two bracketings of "(to) see if advertising works", one with the clause analyzed as an SBAR containing the sentence "advertising works"

Left-Corner Parsing
- A technique for using 1 word of lookahead in algorithms like Earley's (one can also do multiword lookahead, but it's harder)

Basic Earley's Algorithm
- Example chart column after scanning "Papa" and before "ate": NP -> Papa . and S -> NP . VP, and the predict step then adds VP -> . V NP, VP -> . VP PP, V -> . ate, NP -> . Det N, NP -> . NP PP, Det -> . the, Det -> . a, ...
- One-word lookahead would help: there is no point in looking for words or categories that can't start with "ate"

With the Left-Corner Filter
- Entries such as Det -> . the, Det -> . a, and PP -> . P NP won't get created here either
- We need to know that "ate" can't start a Det or a PP; that is, precompute for each word the set of all categories that it can start, and predict only those

Merging Right-Hand Sides
- A grammar might have rules X -> A G H P and X -> B G H P
- The chart could end up with both X -> A G . H P and X -> B G . H P in column 5, but these are now interchangeable: if one produces an X, then so will the other
- To avoid this redundancy, always use dotted rules of the form X -> ... G H P (record only what is still needed)

Merging Right-Hand Sides (2)
- Similarly, the grammar might have rules X -> A G H P and X -> A G H Q
- The chart could end up with both X -> A . G H P and X -> A . G H Q in column 5; these are not interchangeable, but we will be processing them in parallel for a while
- Solution: write the grammar as X -> A G H (P|Q)

Merging Right-Hand Sides (3)
- Combining the two previous cases, X -> A G H P, X -> A G H Q, X -> B G H P, X -> B G H Q becomes X -> (A|B) G H (P|Q)
- It is also often nice to write rules like NP -> (Det | epsilon) Adj* N
- These right-hand sides are regular expressions: build their minimal DFAs; automaton states then replace dotted rules such as X -> A G . H P
- Indeed, all rules for NP can be unioned into a single DFA: if a column of the Earley table contains one of its states whose outgoing arcs are Adj and N, we predict the start states of Adj and N; if we then find such an Adj or N, we follow its arc (attach) to reach the next state

Probabilistic Left-Corner Grammars
- Use richer probabilistic conditioning: rather than conditioning just on the parent, as in PCFGs (P(NP -> Det Adj N | NP)), condition on the left corner and the goal category (P(NP -> Det Adj N | Det, S))
- Allows left-to-right online parsing, which can hope to explain how people build partial interpretations online

Faster Parsing with Beam Search

Pruning for Speed
- Heuristically throw away constituents that probably won't make it into a complete parse, using probabilities to decide which ones; so probabilities are useful for speed as well as accuracy
- Both safe and unsafe methods exist:
  - Throw x away if p(x) < 10^-20, and lower this threshold if we don't get a parse
  - Throw x away if p(x) < p(y)/100 for some y that spans the same set of words
  - Throw x away if p(x)*q(x) is small, where q(x) is an estimate of the probability of all the rules needed to combine x with the other words in the sentence

Dependency Parsing

Phrase Structure Grammars and Dependency Grammars
- A phrase structure grammar describes the structure of sentences with a phrase structure tree
- Alternatively, a dependency grammar describes the structure with dependencies between words: one word is the head of the sentence, and all other words are either dependent on that word or dependent on some other word that connects to the headword through a sequence of dependencies
- Example: dependency arcs over "The old man ate the rice slowly"

Phrase Structure Grammars and Dependency Grammars (2)
- Two key advantages of dependency grammar:
  - Easy to use lexical information: disambiguation decisions are made directly with words, with no need to build a large superstructure and no need to worry about how to lexicalize a PS tree
  - Dependencies are one way of decomposing PS rules: the Penn Treebank contains lots of rare flat trees (sparse data), and one can get reasonable probabilistic estimates by decomposing them

Evaluation (current state of the art)
  Method                                             Recall   Precision
  PCFGs (Charniak 1997)                              70.6     74.8
  Decision trees (Magerman 1995)                     84.0     84.3
  Lexicalized with backoff (Charniak 1997)           86.7     86.6
  Lexicalized with Markov (Collins 1997, Model 1)    87.5     87.7
    + subcategorization (Collins 1997, Model 2)      88.1     88.3
  MaxEnt-inspired (Charniak 2000)                    90.1     90.1
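Returning to the pruning heuristics listed above, here is a minimal sketch of what per-cell relative-threshold pruning might look like in a chart parser (the dictionary-per-span chart representation and the absolute floor are made-up scaffolding; only the keep-unless-100-times-worse rule follows the slide):

```python
# Sketch of relative-threshold pruning for a chart parser: within each cell
# (all constituents covering the same span), discard any constituent whose
# inside probability is more than `ratio` times smaller than the best one,
# plus anything below an absolute floor.  Illustrative skeleton only.

def prune_cell(cell, ratio=100.0, floor=1e-20):
    """cell: dict mapping nonterminal label -> inside probability for one span."""
    if not cell:
        return cell
    best = max(cell.values())
    return {label: p for label, p in cell.items()
            if p >= best / ratio          # keep x unless p(x) < p(best)/100
            and p >= floor}               # and drop anything below the absolute floor

cell = {"NP": 3e-6, "S": 4e-8, "VP": 1e-12, "ADJP": 1e-25}
print(prune_cell(cell))   # keeps NP and S; drops VP (relative test) and ADJP (floor)
```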