Probabilistic Parsing in Practice
Lecture 15: Introduction to Natural Language Processing, CMPSCI 585, Fall 2004
Andrew McCallum (including slides from Michael Collins, Chris Manning, Jason Eisner, and Mary Harper)

Today's Main Points
- Training data: treebanks
- How to evaluate parsers
- Limitations of PCFGs; enhancements and alternatives: lexicalized PCFGs, structure sensitivity, left-corner parsing, faster parsing with beam search, dependency parsers
- Current state of the art

Treebanks: Labeling Data
- Penn Treebank
  - Trees are represented via bracketing
  - Fairly flat structures for noun phrases: (NP Arizona real estate loans)
  - Tagged with grammatical and semantic functions: -SBJ, -LOC, ...
  - Uses empty nodes to indicate understood subjects and extraction gaps
- NP-over-NP structure: wrong according to linguistic theory; based on the chunk (Abney 1991) view

Treebanks
- Pure grammar induction approaches tend not to produce the parse trees that people want
- Solution:
  - Give some examples of the parse trees that we want
  - Make a learning tool learn a grammar from them
- A treebank is a collection of such example parses; the Penn Treebank is the most widely used

Example Penn Treebank parse:

(S (NP-SBJ The move)
   (VP followed
       (NP (NP a round)
           (PP of
               (NP (NP similar increases)
                   (PP by (NP other lenders))
                   (PP against (NP Arizona real estate loans)))))
       (S-ADV (NP-SBJ *)
              (VP reflecting
                  (NP a continuing decline)
                  (PP-LOC in (NP that market))))))

Treebanks
- Many people have argued that it is better to have linguists constructing treebanks than grammars, because it is easier to mark up the correct parse of individual sentences than to work out all the rule interactions of a grammar in advance

Treebanking Issues
- Type of data: task dependent (newspapers, journals, novels, technical manuals, dialogs, email)
- Size: the more the better (resource-limited)
- Parse representation: dependency vs. parse tree
- Attributes: what to encode (words, morphology, syntax, semantics)
- Reference and bookkeeping: date, time, who did what

Organizational Issues
- Team: 1 team leader (bookkeeping, hiring); 1 guidelines person; 1 linguistic-issues person; 1-2 annotators; 1-2 technical staff/programmers; 2 checking persons
- Double annotation, if possible

Treebanking Plan
- The main points, after getting funding:
  - Planning
  - Basic guidelines development
  - Annotation and guidelines refinement
  - Consistency checking; guidelines finalization
  - Packaging and distribution
- Time needed: on the order of 1-2 years per million words, only part of which is the annotation itself

Parser Evaluation

Evaluation
- The ultimate goal is to build systems for IE, QA, MT; people are rarely interested in syntactic analysis for its own sake
- Evaluate the whole system, or evaluate the parser? For simplicity, modularization, and convenience: compare parses from the parser against hand parses of the same sentences (the gold standard)
- What is the objective criterion that we are trying to maximize?

Evaluation: Tree Accuracy
- Exact match: a very tough standard, but in many ways a sensible one to use

PARSEVAL Measures
- For some purposes, partially correct parses can be useful
- Originally designed for non-statistical parsers
- Evaluate the component pieces of a parse
- Measures: precision, recall, crossing brackets

Evaluation
- Labeled precision: how many brackets in the parse match those in the correct (gold standard) tree?
- Labeled recall: how many of the brackets in the correct tree are in the parse?
- Crossing brackets: average of how many constituents in one tree cross over constituent boundaries in the other tree

[Figure: brackets B1-B4 over words w1...w8, illustrating matching and crossing constituents]
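To make the bracket-based measures concrete, here is a minimal Python sketch (the span encoding and function names are my own, not from the lecture): each tree is reduced to a list of labeled (label, start, end) spans, and labeled precision, recall, and crossing brackets are computed from those.

```python
from collections import Counter

def parseval(gold, guess):
    """Labeled precision and recall over constituent brackets.
    gold, guess: lists of (label, start, end) spans, one per constituent."""
    g, p = Counter(gold), Counter(guess)
    matched = sum((g & p).values())   # brackets that agree on label and span
    return matched / sum(p.values()), matched / sum(g.values())

def crossing_brackets(gold, guess):
    """Count guessed constituents that cross some gold constituent:
    the spans overlap, but neither contains the other."""
    def crosses(a, b):
        (_, s1, e1), (_, s2, e2) = a, b
        return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1
    return sum(any(crosses(x, y) for y in gold) for x in guess)

# Tiny example: "workers dumped sacks into a bin" (word positions 0..6,
# end-exclusive). Gold attaches the PP to the VP; the guess attaches it low.
gold  = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("VP", 1, 3),
         ("NP", 2, 3), ("PP", 3, 6), ("NP", 4, 6)]
guess = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 6),
         ("NP", 2, 3), ("PP", 3, 6), ("NP", 4, 6)]
print(parseval(gold, guess))          # (6/7, 6/7)
print(crossing_brackets(gold, guess)) # 1: guessed NP(2,6) crosses gold VP(1,3)
```

Note how the single attachment mistake costs only one bracket in each direction, which is exactly the flatness effect discussed in the next slides.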
Problems with PARSEVAL
- Even a vanilla PCFG performs quite well on these measures
- They measure success at the level of individual decisions, but you must make many consecutive decisions correctly to be correct on the entire tree

Problems with PARSEVAL (2)
- The story behind the numbers is the structure of the Penn Treebank:
  - Flat structures → few brackets → few crossing brackets
  - Troublesome brackets are avoided → high precision/recall
- In some cases a wrong PP attachment penalizes precision, recall, and crossing-bracket accuracy only minimally; on the other hand, attaching low instead of high makes every node in the right-branching tree wrong: serious harm to precision, recall, and crossing-bracket accuracy

[Figure: example trees contrasting high and low PP attachment and the resulting bracketing errors]

Evaluation
- Do PARSEVAL measures succeed in real tasks?
- Many small parsing mistakes might not affect tasks of semantic interpretation:
  - Bonnema (1996, 1997): tree accuracy vs. the percentage of correct semantic interpretations
  - Hermjakob and Mooney (1997): English-to-German translation
- At the moment, people feel PARSEVAL measures are adequate for comparing parsers

Lexicalized Parsing

Limitations of PCFGs
- PCFGs assume:
  - Place invariance
  - Context-freeness: P(rule) is independent of the words outside the rule's span (and of words with overlapping derivations)
  - Ancestor-freeness: P(rule) is independent of the nonterminals above it
- Lack of sensitivity to lexical information
- Lack of sensitivity to structural frequencies

Lack of Lexical Dependency
- Context-freeness means that P(VP → V NP NP) is independent of the particular verb involved
- But the ditransitive expansion is much more likely with verbs like "gave":
  - He gave the boy a ball.
  - He ran to the store.

The Need for Lexical Dependency
- Problem 1: verb subcategorization
- In a PCFG the VP expansion is independent of the choice of verb; however:

  Expansion    come    take    think   want
  VP → V NP     1.1%   32.1%    0.2%   13.9%
  VP → V PP    34.5%    3.1%    7.1%    0.3%
  VP → V S      2.2%    1.3%    4.8%   70.8%

- Including information about the actual words when making decisions about tree structure is necessary

Weakening the Independence Assumptions of PCFGs
- Problem 2: phrasal attachment
- The lexical content of phrases provides information for the decision; the syntactic category of the phrases provides very little
- A standard PCFG is worse than n-gram models here

Another case of PP attachment ambiguity: "workers dumped sacks into a bin"

(a) VP attachment: [S [NP workers] [VP [VP dumped [NP sacks]] [PP into [NP a bin]]]]
(b) NP attachment: [S [NP workers] [VP dumped [NP [NP sacks] [PP into [NP a bin]]]]]

Rules for (a):    Rules for (b):
S  → NP VP        S  → NP VP
NP → NNS          NP → NNS
VP → VP PP        NP → NP PP
VP → VBD NP       VP → VBD NP
NP → NNS          NP → NNS
PP → IN NP        PP → IN NP
NP → DT NN        NP → DT NN
plus the same lexical rules in both: NNS → workers, VBD → dumped, NNS → sacks, IN → into, DT → a, NN → bin

- If P(NP → NP PP) > P(VP → VP PP) then (b) is more probable, else (a) is more probable: the attachment decision is completely independent of the words

A Case of Coordination Ambiguity
- "dogs in houses and cats":
  (a) [NP [NP dogs] [PP in [NP [NP houses] and [NP cats]]]]
  (b) [NP [NP [NP dogs] [PP in [NP houses]]] and [NP cats]]
- Here the two parses have identical rules, and therefore have identical probability under any assignment of PCFG rule probabilities

Weakening the Independence Assumptions of PCFGs
- Solution: lexicalize the CFG, annotating each phrasal node with its head word
- Background idea: there are strong lexical dependencies between heads and their dependents
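The independence problem can be seen directly in code. Below is a sketch with invented rule probabilities: tree probability is the product of rule probabilities, so the two parses of "workers dumped sacks into a bin" differ only by the factor P(VP → VP PP) vs. P(NP → NP PP), no matter what the words are.

```python
# Toy PCFG: the probabilities are invented for illustration only.
rule_prob = {
    ("S",  ("NP", "VP")):  1.0,
    ("NP", ("NNS",)):      0.5,
    ("NP", ("NP", "PP")):  0.2,
    ("NP", ("DT", "NN")):  0.3,
    ("VP", ("VBD", "NP")): 0.7,
    ("VP", ("VP", "PP")):  0.3,
    ("PP", ("IN", "NP")):  1.0,
}

def tree_prob(tree):
    """Product of rule probabilities; tree = (label, children...), with
    (tag, word) preterminals. Lexical rules are identical in the two
    parses, so they cancel and are scored as 1.0 here."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0
    p = rule_prob[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

pp = ("PP", ("IN", "into"), ("NP", ("DT", "a"), ("NN", "bin")))
high = ("S", ("NP", ("NNS", "workers")),                  # VP attachment
        ("VP", ("VP", ("VBD", "dumped"), ("NP", ("NNS", "sacks"))), pp))
low  = ("S", ("NP", ("NNS", "workers")),                  # NP attachment
        ("VP", ("VBD", "dumped"), ("NP", ("NP", ("NNS", "sacks")), pp)))
print(tree_prob(high), tree_prob(low))
# Ratio is exactly P(VP -> VP PP) / P(NP -> NP PP), independent of the words.
```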
Heads in Context-Free Rules
Add annotations specifying the head of each rule:

S  → NP VP           Vi → sleeps
VP → Vi              Vt → saw
VP → Vt NP           NN → man | woman | telescope
VP → VP PP           DT → the
NP → DT NN           IN → with | in
NP → NP PP
PP → IN NP

Note: S = sentence, VP = verb phrase, NP = noun phrase, PP = prepositional phrase, DT = determiner, Vi = intransitive verb, Vt = transitive verb, NN = noun, IN = preposition.

More About Heads
- Each context-free rule has one special child that is the head of the rule, e.g.:
  - S → NP VP (VP is the head)
  - VP → Vt NP (Vt is the head)
  - NP → DT NN NN (the final NN is the head)
- A core idea in linguistics: X-bar theory, Head-Driven Phrase Structure Grammar
- Some intuitions: the head is the central sub-constituent of each rule, the semantic predicate in each rule

Rules Which Recover Heads (example rules for NPs)
- If the rule contains an NN, NNS, or NNP: choose the rightmost NN, NNS, or NNP
- Else, if the rule contains an NP: choose the leftmost NP
- Else, if the rule contains a JJ: choose the rightmost JJ
- Else, if the rule contains a CD: choose the rightmost CD
- Else, choose the rightmost child

e.g. in NP → DT NNP NN and NP → DT NN NNP the rightmost noun is the head; in NP → NP PP the leftmost NP is; in NP → DT JJ the JJ is; in NP → DT the DT is.

Adding Headwords to Trees

(S(questioned)
  (NP(lawyer) (DT the) (NN lawyer))
  (VP(questioned) (Vt questioned)
     (NP(witness) (DT the) (NN witness))))

- A constituent receives its headword from its head child:
  - S → NP VP: S receives its headword from the VP
  - VP → Vt NP: VP receives its headword from the Vt
  - NP → DT NN: NP receives its headword from the NN

Adding Headtags to Trees
- Also propagate the part-of-speech tags up the trees: S(questioned, Vt), NP(lawyer, NN), VP(questioned, Vt), NP(witness, NN)

Explosion of the Number of Rules
- New rules might look like: VP(gave) → V(gave) NP(man) NP(book)
- But this would be a massive explosion in the number of rules and parameters

Lexicalized Parsing with Smoothing

Lexicalized parsing: Charniak (1997)
- A very simple, conservative model of lexicalized PCFG
- Probabilistic conditioning is "top-down", but the actual computation is bottom-up
- Example tree: S(rose) → NP(profits) VP(rose); NP(profits) → JJ(corporate) NNS(profits); "corporate profits rose"

Charniak (1997): generate the head, then the head constituent and rule
- For a constituent of category c with parent headword ph in parent category pc:
  - First generate the headword: P(h | ph, c, pc), e.g. h = profits for c = NP under S(rose)
  - Then generate the rule expanding the constituent: P(r | h, c, pc)
- (h = headword, c = head constituent; ph = parent headword, pc = parent head constituent)

Smoothing in Charniak (1997)

P̂(h | ph, c, pc) = λ1 P_MLE(h | ph, c, pc)
                  + λ2 P_MLE(h | C(ph), c, pc)
                  + λ3 P_MLE(h | c, pc)
                  + λ4 P_MLE(h | c)

- Each λi is a function of how much one would expect to see a certain occurrence, given the amount of training data (word counts, etc.)
- C(ph) is the semantic class of the parent headword
- Techniques like these for dealing with data sparseness are vital to successful model construction

Charniak (1997) smoothing example

                       P(prft | rose, NP, S)   P(corp | prft, JJ, NP)
P(h | ph, c, pc)       0                       0.245
P(h | C(ph), c, pc)    0.00352                 0.0150
P(h | c, pc)           0.000627                0.00533
P(h | c)               0.000557                0.00418

- This allows utilization of rich, highly conditioned estimates, but smoothes when sufficient data is unavailable
- One can't just use MLEs: one commonly sees previously unseen events, which would otherwise have probability 0

Charniak (1997): rule probability with similar smoothing

P̂(r | h, c, pc) = λ1 P(r | h, c, pc) + λ2 P(r | h, c) + λ3 P(r | C(h), c)
                  + λ4 P(r | c, pc) + λ5 P(r | c)
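Here is a minimal sketch of this style of interpolated backoff (the counts layout, the fixed λ values, and the toy semantic-class function are my own; in the actual model the λs are set from training-data statistics such as word counts):

```python
from collections import Counter

def smoothed_head_prob(h, ph, c, pc, counts, sem_class,
                       lambdas=(0.5, 0.25, 0.15, 0.1)):
    """P-hat(h | ph, c, pc) as a linear interpolation of MLE estimates
    from successively less specific conditioning contexts."""
    def mle(context):
        dist = counts.get(context, Counter())
        total = sum(dist.values())
        return dist[h] / total if total else 0.0

    l1, l2, l3, l4 = lambdas                            # must sum to 1
    return (l1 * mle(("ph,c,pc", ph, c, pc))            # parent headword
          + l2 * mle(("C(ph),c,pc", sem_class(ph), c, pc))  # its class
          + l3 * mle(("c,pc", c, pc))                   # drop the headword
          + l4 * mle(("c", c)))                         # category alone

# Tiny usage example with fabricated counts. As in the slide's example,
# the most specific estimate is 0, but the backoff levels rescue it.
counts = {
    ("ph,c,pc", "rose", "NP", "S"): Counter(),
    ("C(ph),c,pc", "MOVE-UP", "NP", "S"): Counter({"profits": 3, "stocks": 5}),
    ("c,pc", "NP", "S"): Counter({"profits": 2, "it": 40}),
    ("c", "NP"): Counter({"profits": 5, "it": 90}),
}
print(smoothed_head_prob("profits", "rose", "NP", "S",
                         counts, lambda w: "MOVE-UP"))
```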
Sparseness and the Penn Treebank
- The Penn Treebank (roughly 1 million words of parsed English WSJ) has been a key resource because of the widespread reliance on supervised learning
- But 1 million words is like nothing:
  - 965,000 constituents, but only 66 WHADJP, of which only 6 aren't "how much" or "how many"; yet there is an infinite space of these ("how clever/original/incompetent (at risk assessment and evaluation)...")
- Most of the probabilities that you would like to compute, you can't compute

Sparseness and the Penn Treebank (2)
- Most intelligent processing depends on bilexical statistics: likelihoods of relationships between pairs of words
- These are extremely sparse, even on topics central to the WSJ:
  - stocks plummeted: 2 occurrences
  - stocks stabilized: 1 occurrence
  - stocks skyrocketed: 0 occurrences
  - #stocks discussed: 0 occurrences
- So far there has been very modest success augmenting the Penn Treebank with extra unannotated materials, or using semantic classes or clusters (cf. Charniak 1997, Charniak 2000), as soon as there are more than tiny amounts of annotated training data

Lexicalized Markov out from head: Collins (1997)
- Charniak (1997) expands each local phrase-structure tree in a single step
- This is good for capturing dependencies between child nodes, but it is bad because of data sparseness
- A pure dependency, one-child-at-a-time model is worse; but one can do better with in-between models, such as generating the children as a Markov process on both sides of the head (Collins 1997; Charniak 2000)

Modeling Rule Productions as Markov Processes
- Step 1: generate the category of the head child:
  S(told, VBD) → VP(told, VBD), with probability P_h(VP | S, told, VBD)
- Step 2: generate the left modifiers in a Markov chain, moving outward from the head until STOP:
  × P_l(NP(Hillary, NNP) | S, VP, told, VBD, LEFT)
  × P_l(NP(yesterday, NN) | S, VP, told, VBD, LEFT)
  × P_l(STOP | S, VP, told, VBD, LEFT)
- Step 3: generate the right modifiers in a Markov chain, likewise:
  × P_r(STOP | S, VP, told, VBD, RIGHT)
- Altogether this generates S(told, VBD) → NP(yesterday, NN) NP(Hillary, NNP) VP(told, VBD)

A Refinement: Adding a Distance Variable
- Δ = 1 if the position being generated is adjacent to the head:
  P_l(NP(Hillary, NNP) | S, VP, told, VBD, LEFT, Δ = 1)

Adding the Complement/Adjunct Distinction
- In S(told, VBD) → NP(yesterday, NN) NP(Hillary, NNP) VP(told, VBD): "Hillary" is the subject and "yesterday" is a temporal modifier, but nothing in the rule distinguishes them
- Adding tags makes the distinction: S → NP-C VP for the subject vs. S → NP VP for a modifier

Adding Dependency on Structure
- Weakening the independence assumptions of PCFGs: probabilities also depend on structural context; PCFGs are deficient on purely structural grounds too
- Expansions are not really context-independent; e.g., how an NP expands depends strongly on whether it is a subject or an object:

  Expansion       % as Subj   % as Obj
  NP → PRP          13.7%       2.1%
  NP → NNP           3.5%       0.9%
  NP → DT NN         5.6%       4.6%
  NP → NN            1.4%       2.8%
  NP → NP SBAR       0.5%       2.6%
  NP → NP PP         5.6%      14.1%

- Another structural ambiguity: "advertising works" as (a) a single NP [NN advertising] [NNS works] vs. (b) a sentence [NP [NN advertising]] [VP [VBZ works]]
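A sketch of the head-outward scoring from the Collins-style model above, in Python (the distribution arguments are stand-ins for the model's smoothed estimates, and the conditioning here omits the distance variable and complement/adjunct tags just described):

```python
STOP = "STOP"

def markov_rule_prob(parent, hw, ht, head_child, left_mods, right_mods,
                     P_h, P_l, P_r):
    """Probability of one expansion under the head-outward Markov model:
      P_h(head child | parent, hw, ht)
      * product over left modifiers, outward from the head, then STOP
      * product over right modifiers, likewise.
    hw/ht: head word and tag; each modifier is (category, word, tag)."""
    p = P_h(head_child, parent, hw, ht)
    for mod in list(left_mods) + [STOP]:
        p *= P_l(mod, parent, head_child, hw, ht)
    for mod in list(right_mods) + [STOP]:
        p *= P_r(mod, parent, head_child, hw, ht)
    return p

# Usage for S(told, VBD) -> NP(yesterday, NN) NP(Hillary, NNP) VP(told, VBD),
# with dummy distributions standing in for the estimated ones:
P_h = lambda child, *ctx: 0.5
P_l = lambda mod, *ctx: 0.1 if mod != STOP else 0.6
P_r = lambda mod, *ctx: 0.1 if mod != STOP else 0.6
p = markov_rule_prob("S", "told", "VBD", "VP",
                     left_mods=[("NP", "Hillary", "NNP"),
                                ("NP", "yesterday", "NN")],
                     right_mods=[], P_h=P_h, P_l=P_l, P_r=P_r)
print(p)   # 0.5 * 0.1 * 0.1 * 0.6 * 0.6
```

The design point: because modifiers are generated one at a time around the head, the model never needs counts for whole flat rules, only for head-modifier pairs, which is exactly what the sparseness discussion above calls for.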
Left-Corner Parsing
- A technique for using 1 word of lookahead in algorithms like Earley's (one can also do multi-word lookahead, but it's harder)

Basic Earley's algorithm
- Example grammar: ROOT → S; S → NP VP; NP → Det N; NP → NP PP; NP → Papa; Det → the | a; VP → V NP; VP → VP PP; PP → P NP; V → ate | drank | snorted; P → with
- The chart alternates attach and predict steps: after scanning "Papa", the completed NP → Papa • attaches into S → NP • VP, which predicts VP → • V NP and VP → • VP PP, and in turn V → • ate, V → • drank, V → • snorted, PP → • P NP, NP → • Det N, Det → • the, Det → • a, ...

1-word lookahead would help
- With the next word "ate", plain Earley prediction still adds entries like Det → • the, Det → • a, and PP → • P NP
- There is no point in looking for words or constituents that can't start with "ate"

With a left-corner filter
- Entries such as Det → • the and PP → • P NP won't get created, because "ate" can't start a Det or a PP
- To know that "ate" can't start a PP, take the closure of all the categories that it does start

Merging right-hand sides
- The grammar might have rules X → A G H P and X → B G H P
- We could end up with both in the chart: 2: X → A • G H P and 2: X → B • G H P in column 5
- But these are now interchangeable: if one produces X, then so will the other
- To avoid this redundancy, we can always use dotted rules that record only what follows the dot: X → • G H P

Merging right-hand sides (2)
- Similarly, the grammar might have rules X → A G H P and X → A G H Q, giving chart entries 2: X → A • G H P and 2: X → A • G H Q
- These are not interchangeable, but we'll be processing them in parallel for a while
- Solution: write the grammar as X → A G H (P|Q)

Merging right-hand sides (3)
- Combining the two previous cases: X → A G H P, X → A G H Q, X → B G H P, X → B G H Q becomes X → (A|B) G H (P|Q)
- It is often nice to write rules like NP → (Det | ε) Adj* N
- These right-hand sides are regular expressions: build their minimal DFAs, and let automaton states replace dotted rules
- Indeed, all rules for NP can be unioned into a single DFA:
  - If column 5 of the Earley table contains the NP automaton in its start state, we predict the categories on the arcs leaving that state (e.g. Adj and N)
  - If we then scan or complete such an Adj or N, we follow its arc to attach, moving to the next automaton state

Probabilistic Left-Corner Grammars
- Use richer probabilistic conditioning: rather than conditioning just on the parent, as in a PCFG's P(NP → Det Adj N | NP), condition on the left corner and the goal category: P(NP → Det Adj N | Det, S)
- This allows left-to-right online parsing, which can hope to explain how people build partial interpretations online
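The left-corner closure is a small fixed-point computation. Here is a sketch (the grammar encoding and names are my own) that builds the table for the toy grammar above and uses it to filter Earley predictions:

```python
def left_corner_closure(grammar):
    """lc[X] = every category that can appear as the leftmost descendant
    of X, including X itself. grammar: dict LHS -> list of RHS tuples."""
    lc = {lhs: {lhs} for lhs in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                first = rhs[0]
                new = lc.get(first, {first})   # terminals: just themselves
                if not new <= lc[lhs]:
                    lc[lhs] |= new
                    changed = True
    return lc

grammar = {
    "ROOT": [("S",)],
    "S":  [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("Papa",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
lc = left_corner_closure(grammar)
word_tags = {"ate": {"V"}, "the": {"Det"}, "Papa": {"Papa"}}

def predict_ok(category, next_word):
    """Earley prediction filter: predict `category` only if one of the
    next word's possible categories is a left corner of it."""
    return any(t in lc[category] for t in word_tags[next_word])

print(predict_ok("VP", "ate"))   # True: V is a left corner of VP
print(predict_ok("PP", "ate"))   # False: "ate" can't start a PP
```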
Faster Parsing with Beam Search

Pruning for speed
- Heuristically throw away constituents that probably won't make it into a complete parse
- Use probabilities to decide which ones: so probabilities are useful for speed as well as accuracy
- Both safe and unsafe methods exist:
  - Throw x away if p(x) < 10^-200 (and lower this threshold if we don't get a parse)
  - Throw x away if p(x) < 100 · p(y) for some y that spans the same set of words
  - Throw x away if p(x) · q(x) is small, where q(x) is an estimate of the probability of all the rules needed to combine x with the other words in the sentence

Dependency Parsing

Phrase structure grammars and dependency grammars
- A phrase structure grammar describes the structure of sentences with phrase structure trees
- Alternatively, a dependency grammar describes the structure with dependencies between words:
  - One word is the head of the sentence
  - Every other word is either dependent on that word, or dependent on some other word that connects to the headword through a sequence of dependencies
- Example: "The old man ate the rice slowly."

Phrase structure grammars and dependency grammars (2)
Two key advantages of dependency grammars (see the sketch after the results table):
- It is easy to use lexical information:
  - Disambiguation decisions are made directly with words
  - There is no need to build a large superstructure, and no need to worry about how to lexicalize a PS tree
- Dependencies are one way of decomposing PS rules:
  - The Penn Treebank contains lots of rare flat trees → sparse data
  - We can get reasonable probabilistic estimates if we decompose them

Evaluation: state of the art

Method                                        Recall   Precision
PCFGs (Charniak 97)                            70.6      74.8
Decision trees (Magerman 95)                   84.0      84.3
Lexicalized with backoff (Charniak 97)         86.7      86.6
Lexicalized with Markov, M1 (Collins 97)       87.5      87.7
  + subcategorization, M2 (Collins 97)         88.1      88.3
MaxEnt-inspired (Charniak 2000)                90.1      90.1
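To connect the head rules from earlier in the lecture with the dependency view, here is a sketch that decomposes a phrase-structure tree into word-to-word dependencies; the miniature head table is a stand-in for a full set of head rules (e.g. Magerman's), and the tree encoding is my own.

```python
HEAD_TABLE = {                      # tiny stand-in for real head rules
    "S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNS", "NNP"], "PP": ["IN"],
}

def head_child_index(label, child_labels):
    """Pick the head child of a rule, in the spirit of the head rules
    shown earlier (e.g. for NP, prefer a rightmost noun)."""
    for cand in HEAD_TABLE.get(label, []):
        for i in reversed(range(len(child_labels))):   # rightmost match
            if child_labels[i] == cand:
                return i
    return len(child_labels) - 1                       # default: rightmost

def dependencies(tree):
    """Return (lexical head, list of (head, dependent) word pairs).
    tree = (label, children...), with (tag, word) preterminals."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0], []                         # preterminal
    heads, deps = [], []
    for child in children:
        h, d = dependencies(child)
        heads.append(h)
        deps.extend(d)
    hi = head_child_index(label, [c[0] for c in children])
    deps.extend((heads[hi], h) for i, h in enumerate(heads) if i != hi)
    return heads[hi], deps

tree = ("S",
        ("NP", ("NNS", "workers")),
        ("VP", ("VBD", "dumped"),
               ("NP", ("NNS", "sacks"))))
print(dependencies(tree))
# ('dumped', [('dumped', 'sacks'), ('dumped', 'workers')])
```

Each flat rule thus contributes only head-modifier pairs, which is the decomposition that makes the probability estimates less sparse.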