Introduction to Natural Language Processing (CMPSCI 585)
These 49-page class notes were uploaded by Roman McCullough on Friday, October 30, 2015. The notes belong to CMPSCI 585 at the University of Massachusetts, taught by Staff in Fall. Since upload, they have received 20 views. For similar materials see /class/232275/cmpsci-585-university-of-massachusetts in Computer Science at the University of Massachusetts.
Machine Translation
Lecture 17, Introduction to Natural Language Processing
CMPSCI 585, Fall 2007
Andrew McCallum (including slides from Michael Collins and Dan Klein)

The Challenges of Machine Translation

Differing Word Orders
- English word order is subject-verb-object; Japanese word order is subject-object-verb.
- English: IBM bought Lotus. / Japanese: IBM Lotus bought.
- English: Sources said that IBM bought Lotus yesterday. / Japanese: Sources yesterday IBM Lotus bought that said.

Syntactic Structure Is Not Preserved Across Translations
- The bottle floated into the cave.
- La botella entró a la cueva flotando. (the bottle entered the cave floating)

Syntactic Ambiguity Causes Problems
- John hit the dog with the stick.
- John golpeó el perro con el palo. (the stick is the instrument)
- John golpeó el perro que tenía el palo. (the dog had the stick)

The Best Translation May Not Be 1:1 (from Manning and Schuetze)
- English: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above-average growth rates.
- French: Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d'adeptes. En effet, notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment.
- Literal gloss: With regard to the mineral waters and the lemonades (soft drinks), they encounter still more users. Indeed, our survey makes stand out the sales clearly superior to those in 1987, for cola-based drinks especially.

Machine Translation Example
Italian source: Atlanta, preso il killer del palazzo di Giustizia. ATLANTA. La grande paura che per 26 ore ha attanagliato Atlanta è finita. Brian Nichols [...] si è consegnato alla polizia dopo aver cercato rifugio nell'alloggio di una donna in un complesso d'appartamenti alla periferia della città. Per tutto il giorno il centro della città, sede [...] dei Giochi 1996, cuore di una popolosa area metropolitana, era rimasto paralizzato.
Machine translation output: Atlanta, taken the killer of the palace of Justice. ATLANTA. The great fear that for 26 hours has gripped Atlanta is ended. Brian Nichols, the man who had killed three persons to palace of justice, [...] is delivered to the police after to have tried shelter in the lodging of one woman in a complex of apartments to the periphery of the city. For all the day the center of the city, center of the [...] of Giochi 1996, heart of one popolosa metropolitan area, was remained paralyzed.
History
- 1950s: Intensive research activity in MT.
- 1960s: Direct word-for-word replacement.
- 1966: ALPAC (NRC) report on MT. Conclusion: MT no longer worthy of serious scientific investigation.
- 1966-1975: Recovery period.
- 1975-1985: Resurgence (Europe, Japan).
- 1985-present: Gradual resurgence (US).
See http://ourworld.compuserve.com/homepages/WJHutchins/MTS93.htm

Levels of Transfer (the Vauquois triangle)
- At the base: source text is mapped to target text word by word (morphological analysis, morphological generation).
- One level up: syntactic structure (syntactic analysis, syntactic transfer, syntactic generation).
- Higher still: semantic structure (semantic analysis, semantic transfer, semantic generation).
- At the apex: an interlingua, reached from the source by semantic composition and left toward the target by semantic decomposition.

General Approaches
- Rule-based approaches: expert-system-like rewrite systems; interlingua methods (analyze and generate); lexicons come from humans. Can be very fast, and can accumulate a lot of knowledge over time (e.g. Systran).
- Statistical approaches: word-to-word translation; phrase-based translation; syntax-based translation (tree-to-tree, tree-to-string). Trained on parallel corpora. Usually noisy-channel, at least in spirit.

The Coding View
"One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: this is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."
Warren Weaver (1955:18), quoting a letter he wrote in 1947.

MT System Components
- Language model: the source model P(e).
- Translation model: the channel P(f|e); the English sentence e passes through the channel and is observed as the French sentence f.
- Decoder: given the observed f, find the best e:
    e* = argmax_e P(e|f) = argmax_e P(f|e) P(e)
- Why not simply model P(e|f)? The noisy-channel decomposition finds an English translation which is both fluent (more data for P(e)) and semantically faithful to the French source.
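The argmax above can be sketched with a brute-force decoder over an explicit candidate list. This is only a minimal illustration: the candidate sentences and both probability tables are toy assumptions, not the output of any real model.

```python
import math

# Toy noisy-channel scorer: e* = argmax_e P(f|e) * P(e).
# The candidate list and both probability tables are made-up
# illustrations, not trained models.

lm = {"it is not clear": 0.004, "it not is clear": 0.0001}    # P(e)
tm = {("ce n'est pas clair", "it is not clear"): 0.3,          # P(f|e)
      ("ce n'est pas clair", "it not is clear"): 0.4}

def decode(f, candidates):
    """Return the candidate e maximizing log P(f|e) + log P(e)."""
    return max(candidates,
               key=lambda e: math.log(tm[(f, e)]) + math.log(lm[e]))

best = decode("ce n'est pas clair", list(lm))
print(best)  # the fluent word order wins despite a lower channel score
```

Note how the language model overrules the channel here: the disfluent candidate has the higher P(f|e), but the fluent one wins the combined score, which is exactly the point of keeping P(e) as a separate factor.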
A Brief Introduction to Statistical MT
- Parallel corpora are available in several language pairs.
- Basic idea: use a parallel corpus as a training set of translation examples.
- Classic example: IBM work on French-English translation using the Canadian Hansards (1.7 million sentences of 30 words or less in length).

The Sentence Alignment Problem
- We might have 1,003 sentences in a sequence of English and 987 sentences in the corresponding sequence of French. Which English sentences correspond to which French sentences?
- There may be 1-1 alignments, but also 1-2, 2-1, 2-2, etc.
- Sentence alignment is clearly needed before we can train a translation model, and it is also useful for other multilingual problems.
- Two broad classes of methods we'll cover: methods based on sentence lengths alone, and methods based on lexical matches (or cognates).

Sentence Length Methods (Gale and Church, 1993)
- The method assumes paragraph alignment is known; sentence alignment is not.
- Define l_e = length of the English sentence in characters, and l_f = length of the French sentence in characters.
- Assumption: given length l_e, the length l_f has a Gaussian (normal) distribution with mean c * l_e and variance s^2 * l_e, for some constants c and s.
- Result: we have a cost cost(l_e, l_f) for any pair of lengths l_e and l_f.

Today
- The components of a simple MT system: you already know about the LM; word-alignment-based TMs (IBM Models 1 and 2, the HMM model); a simple decoder.
Not today: more complex word-level and phrase-level TMs; tree-to-tree and tree-to-string TMs; more sophisticated decoders.

A Word-Level TM
- What might a model of P(f|e) look like?
- How do we estimate it?
- What can go wrong here?

Alignments (IBM Model 1; Brown et al., 1993)
- A hidden vector called an alignment specifies which English (source) word is responsible for each French (target) word.

1-to-Many Alignments
- And_1 the_2 program_3 has_4 been_5 implemented_6
- Le_1 programme_2 a_3 été_4 mis_5 en_6 application_7
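Returning to the Gale and Church sentence-length method above: the cost for a candidate pair is just the negative log probability of the standardized length deviation. A minimal sketch follows; the constants C and S2 are assumed illustrative values in the spirit of the model, not figures taken from this lecture.

```python
import math

# Sketch of a Gale-Church style length cost: given the English length,
# the French length is modeled as Gaussian with mean C*l_e and variance
# S2*l_e. C and S2 below are assumed values for illustration only.

C, S2 = 1.06, 6.8

def length_cost(l_e, l_f):
    """-log prob that sentences of these lengths are mutual translations."""
    delta = (l_f - C * l_e) / math.sqrt(S2 * l_e)
    # two-tailed normal probability of a deviation at least |delta|
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(delta) / math.sqrt(2))))
    return -math.log(max(p, 1e-300))

# A well-matched length pair is cheaper than a badly mismatched one.
print(length_cost(100, 106) < length_cost(100, 180))  # True
```

In a full aligner this pairwise cost is combined with costs for 1-2, 2-1, and 2-2 merges and minimized with dynamic programming over each paragraph.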
Many-to-1 Alignments
- The_1 balance_2 was_3 the_4 territory_5 of_6 the_7 aboriginal_8 people_9
- Le_1 reste_2 appartenait_3 aux_4 autochtones_5

Many-to-Many Alignments
- The_1 poor_2 don't_3 have_4 any_5 money_6
- Les_1 pauvres_2 sont_3 démunis_4

Monotonic Translation
- Japan shaken by two new quakes
- Le Japon secoué par deux nouveaux séismes

Local Order Change
- Japan is at the junction of four tectonic plates
- Le Japon est au confluent de quatre plaques tectoniques

IBM Model 2
- Alignments tend to the diagonal (broadly, at least):
    P(f, a | e) = prod_j P(a_j = i | j, I, J) P(f_j | e_i)
- Other schemes for biasing alignments towards the diagonal: relative alignment; asymmetric distances; learning a multinomial over distances.

IBM Model 2, Alternative Model
- Treat P(a_j = i | j, I, J) as a simple dense table:
    P(f, a | e) = prod_j P(a_j = i | j, I, J) P(f_j | e_i)
- In other words, a simple multinomial over i for each (j, I, J), e.g. D(i=2 | j=1, I=6, J=7).
- How do we learn these parameters from pairs of sentences?
- Notation switch: l (= I) is the length of the English sentence; m (= J) is the length of the French sentence.

EM Training of Alignment and Translation Parameters

A Crucial Step in the EM Algorithm
- Say we have the following (e, f) pair:
    e = And the program has been implemented
    f = Le programme a été mis en application
- Given that f was generated according to Model 2, what is the probability that a_1 = 2? Formally:
    Prob(a_1 = 2 | f, e) = sum over alignments a with a_1 = 2 of P(a | f, e)

Step 2: Calculating the Expected Counts
- Calculate the translation counts:
    tcount(e, f) = sum over (k, i, j) with e_i^k = e and f_j^k = f of a(i | j, k)
  tcount(e, f) is the expected number of times that e is aligned with f in the corpus.
- Calculate the source counts:
    scount(e) = sum over (k, i) with e_i^k = e of sum_j a(i | j, k)
  scount(e) is the expected number of times that e is aligned with any French word in the corpus.
- Calculate the alignment counts:
    acount(i | j, l, m) = sum over sentence pairs k of lengths (l, m) of a(i | j, k)
  acount(i | j, l, m) is the expected number of times that e_i is aligned to f_j in English-French sentence pairs of lengths l and m respectively, and acount(j, l, m) is the number of times that we have sentences e and f of lengths l and m respectively.
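Because Model 2 makes each a_j independent given the sentence pair, a posterior such as Prob(a_1 = 2 | f, e) reduces to a single normalization over English positions rather than a sum over whole alignments. A minimal sketch, where the distortion table D and translation table t are toy assumptions:

```python
# E-step posterior for IBM Model 2: since the a_j are independent given
# (e, f), Prob(a_j = i | f, e) needs no sum over full alignments.
# The tables D (distortion) and t (translation) are toy assumptions
# for a 2-word English / 2-word French pair.

def align_posterior(i, j, e, f, D, t):
    """Prob(a_j = i | f, e) under IBM Model 2 (0-based positions)."""
    l, m = len(e), len(f)
    num = D[(i, j, l, m)] * t[(f[j], e[i])]
    den = sum(D[(k, j, l, m)] * t[(f[j], e[k])] for k in range(l))
    return num / den

e = ["the", "dog"]
f = ["le", "chien"]
D = {(i, j, 2, 2): 0.5 for i in range(2) for j in range(2)}  # flat distortion
t = {("le", "the"): 0.7, ("le", "dog"): 0.3,
     ("chien", "the"): 0.1, ("chien", "dog"): 0.9}

print(round(align_posterior(0, 0, e, f, D, t), 2))  # 0.7
```

These posteriors are exactly the a(i | j, k) quantities summed in the expected counts above.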
Some Examples of Training

Alignment probabilities a(i | j, k), where k indexes the sentence pairs and i, j are English and French positions:

  i  j  k   a(i|j,k)
  1  1  0   0.526423237959726
  2  1  0   0.473576762040274
  1  2  0   0.552517995605817
  2  2  0   0.447482004394183
  1  1  1   0.466532602066533
  2  1  1   0.533467397933467
  1  2  1   0.356364544422507
  2  2  1   0.643635455577493
  1  1  2   0.571950438336247
  2  1  2   0.428049561663753
  1  2  2   0.439081311724508
  2  2  2   0.560918688275492

Expected counts:

  e    f         tcount(e, f)
  the  le        0.99295584002626
  the  chien     0.552517995605817
  the  chat      0.356364544422507
  the  l'        0.571950438336247
  the  autobus   0.439081311724508
  dog  le        0.473576762040274
  dog  chien     0.447482004394183
  dog  chat      0
  dog  l'        0
  dog  autobus   0
  cat  le        0.533467397933467
  cat  chien     0
  cat  chat      0.643635455577493
  cat  l'        0
  cat  autobus   0
  bus  le        0
  bus  chien     0
  bus  chat      0
  bus  l'        0.428049561663753
  bus  autobus   0.560918688275492

Old and new parameters t(f | e):

  e    f         old    new
  the  le        0.23   0.34
  the  chien     0.20   0.19
  the  chat      0.11   0.12
  the  l'        0.25   0.20
  the  autobus   0.21   0.15
  dog  le        0.20   0.51
  dog  chien     0.16   0.49
  dog  chat      0.33   0
  dog  l'        0.12   0
  dog  autobus   0.18   0
  cat  le        0.26   0.45
  cat  chien     0.28   0
  cat  chat      0.19   0.55
  cat  l'        0.24   0
  cat  autobus   0.03   0
  bus  le        0.22   0
  bus  chien     0.05   0
  bus  chat      0.26   0
  bus  l'        0.19   0.43
  bus  autobus   0.27   0.57

Parameters over successive iterations:

  e    f         iteration 1 .. 6
  the  le        0.23  0.34  0.46  0.56  0.64  0.71
  the  chien     0.20  0.19  0.15  0.12  0.09  0.06
  the  chat      0.11  0.12  0.10  0.08  0.06  0.04
  the  l'        0.25  0.20  0.17  0.15  0.13  0.11
  the  autobus   0.21  0.15  0.12  0.10  0.08  0.07
  dog  le        0.20  0.51  0.46  0.39  0.33  0.28
  dog  chien     0.16  0.49  0.54  0.61  0.67  0.72
  dog  chat      0.33  0     0     0     0     0
  dog  l'        0.12  0     0     0     0     0
  dog  autobus   0.18  0     0     0     0     0
  cat  le        0.26  0.45  0.41  0.36  0.30  0.26
  cat  chien     0.28  0     0     0     0     0
  cat  chat      0.19  0.55  0.59  0.64  0.70  0.74
  cat  l'        0.24  0     0     0     0     0
  cat  autobus   0.03  0     0     0     0     0
  bus  le        0.22  0     0     0     0     0
  bus  chien     0.05  0     0     0     0     0
  bus  chat      0.26  0     0     0     0     0
  bus  l'        0.19  0.43  0.47  0.47  0.47  0.48
  bus  autobus   0.27  0.57  0.53  0.53  0.53  0.52

After 20 iterations:

  e    f         t(f | e)
  the  le        0.94
  the  chien     0
  the  chat      0
  the  l'        0.03
  the  autobus   0.02
  dog  le        0.06
  dog  chien     0.94
  dog  chat      0
  dog  l'        0
  dog  autobus   0
  cat  le        0.06
  cat  chien     0
  cat  chat      0.94
  cat  l'        0
  cat  autobus   0
  bus  le        0
  bus  chien     0
  bus  chat      0
  bus  l'        0.49
  bus  autobus   0.51
Evaluation of Machine Translation

Unigram Precision
- Unigram precision of a candidate translation: P = C / N, where N is the number of words in the candidate and C is the number of words in the candidate which appear in at least one reference translation.
- E.g., Candidate 1: "It is a guide to action which ensures that the military always obeys the commands of the party." P = 17/18: only "obeys" is missing from all reference translations.

Modified N-gram Precision
- We can generalize modified unigram precision to other n-grams.
- For example, for the two candidates above: Precision(bigram) = 10/17 for Candidate 1 and 1/13 for Candidate 2.

But Recall Isn't Useful in this Case
- The standard measure used in addition to precision is recall: R = C / N, where C is the number of n-grams in the candidate that are correct and N is the number of words in the references.
- Candidate 1: I always invariably perpetually do.
- Candidate 2: I always do.
- Reference 1: I always do.
- Reference 2: I invariably do.
- Reference 3: I perpetually do.
- Candidate 1 achieves high recall by packing in every reference's words, yet it is the worse translation, so recall rewards exactly the wrong behavior here.

The Final Score
- Corpus precision for any n-gram:
    p_n = (sum over candidates C, sum over n-grams in C of Count_clip(ngram)) / (sum over candidates C, sum over n-grams in C of Count(ngram))
  i.e., the number of correct n-grams in the candidates (after clipping), divided by the total number of n-grams in the candidates.
- The final score is then:
    BLEU = BP * (p_1 p_2 p_3 p_4)^(1/4)
  i.e., the brevity penalty BP multiplied by the geometric mean of the unigram, bigram, trigram, and four-gram precisions.

Decoding
- In these word-to-word models, finding the best alignments is easy, but finding the best translations is hard.
- Example pair: "it is not clear" / "CE N'EST PAS CLAIR".

Bag Generation (Decoding)
- Exact reconstruction:
    "Please give me your response as soon as possible." -> "Please give me your response as soon as possible."
- Reconstruction preserving meaning:
    "Now let me mention some of the disadvantages." -> "Let me mention some of the disadvantages now."
- Garbage reconstruction:
    "In our organization research has two missions." -> "In our missions research organization has two."

Bag Generation Is a TSP
- Imagine bag generation with a bigram LM: words are nodes, edge weights are P(w | w'), and valid sentences are Hamiltonian paths.
- Not the best news for word-based MT.
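Returning to the modified n-gram precision defined above: each candidate n-gram is counted only up to the maximum number of times it occurs in any single reference. A minimal sketch, using the standard "the the the ..." illustration of why clipping matters:

```python
from collections import Counter

# Sketch of BLEU-style modified (clipped) n-gram precision: a
# candidate n-gram counts only up to the most times it appears in
# any one reference translation.

def modified_precision(candidate, references, n):
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    max_ref = Counter()
    for ref in references:
        rc = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        for g, c in rc.items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

c1 = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(c1, refs, 1))  # 2/7: "the" clips at 2
```

Without clipping, the degenerate candidate would score a perfect 7/7 unigram precision; clipping drops it to 2/7, which is the whole point of the modification.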
Decoding Anyway
- Simplest possible decoder: enumerate sentences, score each with the TM and LM.
- Greedy decoding: assign each French word its most likely English translation, then improve with local operators:
  - Change a translation.
  - Insert a word into the English (zero-fertility French).
  - Remove a word from the English (null-generated French).
  - Swap two adjacent English words.
- Do hill-climbing (or annealing).
- You should be able to build a Model 1/2 translator now.
- More on word alignment and decoding next class.

Greedy Decoding (example)
Source: bien entendu, il parle de une belle victoire
- Initial gloss: NULL well heard it talks a great victory
- translateTwoWords(...): NULL we understand it talks about a great victory
- translateOneWord(...): NULL well understood he talks about a great victory
- translateTwoWords(...): NULL quite naturally he talks about a great victory