LING 581: Computational Linguistics
Class notes, San Diego State University (R. Malouf)
Homework

- For Wednesday 3/5:
  - Bambona vowels (pp. 153-161)
  - Esperanto adjectives (pp. 222-224)
  - Esperanto nouns and adjectives (pp. 222-226)

Rule ordering

- Extrinsic rule ordering is a key part of this kind of linguistic analysis.
- A simple fictional example: kaNpat -> kampat -> kammat
- Two rules, applied in order:
  1. N -> m / _ p   (a nasal assimilates to a following p)
  2. p -> m / m _   (p becomes m after m)

Rule ordering

Brazilian Portuguese text-to-phones (p. 145ff):

- c-cedilla and ss are always pronounced [s]
- c is pronounced [s] before i, e; ch is always pronounced [S]; otherwise c is pronounced [k]
- lh is pronounced [lh] (a palatal lateral); nh is pronounced [N] (a palatal nasal); otherwise h is silent
- rr is pronounced [R]; r is pronounced [R] word-initially, otherwise [r]
- e is pronounced [i] at the end of a word and at the end before s
- o is pronounced [u] at the end of a word and at the end before s

Rule ordering

- s is pronounced [z] between vowels
- z is pronounced [s] at the end of a word
- d is pronounced [j] before [i], regardless of the original letter
- t is pronounced [c] before [i], regardless of the original letter

Tokenization

- Tokenization is the dividing up of a sequence of characters into units, e.g. words and sentences:
    "Stop!" Mr. Smith shouted.
  becomes
    " Stop ! " Mr. Smith shouted .
- First step required before most corpus analysis.
- Often dismissed as uninteresting, but not as easy as it looks.

Words

- Generally, finding word boundaries is straightforward in English, using spaces and punctuation.
- Often contractions are treated as multiple words for tagging or parsing:
    can't -> ca n't
    they'll've -> they 'll 've
- Multiword expressions like "a little" or "in addition" can be treated as a single unit.

Sentences

- Identifying sentence boundaries is more difficult.
- Periods in English are ambiguous and may serve many functions:
  - end of sentence
  - abbreviations
  - decimal points
  - ellipses
- Other sentence-like units (lists, captions, etc.) aren't clearly marked.

Sentence breaks

- Simplest rule: every period is a sentence break. Clearly wrong for titles, abbreviations, decimal numbers.
- In the Brown corpus there are 52,511 sentences ending in a period or question mark, and 3,569 contain a non-terminal period. So the simple rule yields 93.20% correct (Grefenstette 1999).

Sentence breaks

Summary of Grefenstette's (1999) results:

    Method        Errors   Accuracy
    All periods     3869     93.20%
    Numbers         3340     93.64%
    Patterns        1229     97.66%
    Corpus           869     98.35%
    Dictionary       488     99.07%

- Moral 1: the more you know, the better you will do.
- Moral 2: perfect accuracy is usually impossible.
- Tokenization: see section 9.2.

Tokenization

- Tokenization is even more difficult for languages with no orthographic spaces, or for written text run together.
- The MaxMatch algorithm is a greedy search using a dictionary:
    thebigyellowhouse -> the big yellow house
    thetabledownthere -> theta bled own there  (not: the table down there)
- A better approach finds the most likely sequence of words (segue to n-gram models).
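A minimal sketch of the MaxMatch segmenter just described: at each position, greedily peel off the longest dictionary word, falling back to a single character. The toy lexicon and function name are illustrative, not from the course code.

    def max_match(text, dictionary):
        """Greedy longest-match segmentation."""
        words = []
        while text:
            for end in range(len(text), 0, -1):
                # Take the longest prefix in the dictionary; a lone
                # character is always accepted as a last resort.
                if text[:end] in dictionary or end == 1:
                    words.append(text[:end])
                    text = text[end:]
                    break
        return words

    lexicon = {"the", "theta", "table", "bled", "down", "own",
               "there", "big", "yellow", "house"}
    print(max_match("thebigyellowhouse", lexicon))  # ['the', 'big', 'yellow', 'house']
    print(max_match("thetabledownthere", lexicon))  # ['theta', 'bled', 'own', 'there']

The second example shows exactly the failure mode from the slide: greedy search commits to "theta" and never recovers "the table down there".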
Homework

- Read Jurafsky and Martin (1st or 2nd ed.), Chapter 5.
- Midterm exam:
  - Out Friday, April 11
  - Due Friday, April 18
  - Open book, open notes, but work on it by yourself.
- Download and read fsa3.py as soon as possible.

Tagging

- Part-of-speech tagging assigns a grammatical category to tokens in a corpus.
- Since words may potentially occur as more than one part of speech, tagging is a limited kind of disambiguation:

    The  representative  put   the  chairs  on    the  table  .
    DET  NOUN            VERB  DET  NOUN    PREP  DET  NOUN   PERIOD

- Tagging can be done by hand, automatically, or as a combination of the two.

Tagging

The Penn Treebank tagset:

    Tag   Description             Example         Tag   Description             Example
    CC    coordinating conj.      and, but, or    SYM   symbol                  +, %, &
    CD    cardinal number         one, three      TO    "to"                    to
    DT    determiner              a, the          UH    interjection            ah, oops
    EX    existential "there"     there           VB    verb, base form         eat
    FW    foreign word            mea culpa       VBD   verb, past tense        ate
    IN    preposition/sub-conj    of, in, by      VBG   verb, gerund            eating
    JJ    adjective               yellow          VBN   verb, past participle   eaten
    JJR   adjective, comparative  bigger          VBP   verb, non-3sg present   eat
    JJS   adjective, superlative  wildest         VBZ   verb, 3sg present       eats
    LS    list item marker        1, 2, One       WDT   wh-determiner           which, that
    MD    modal                   can, should     WP    wh-pronoun              what, who
    NN    noun, sing. or mass     llama           WP$   possessive wh-          whose
    NNS   noun, plural            llamas          WRB   wh-adverb               how, where
    NNP   proper noun, singular   IBM             $     dollar sign             $
    NNPS  proper noun, plural     Carolinas       #     pound sign              #
    PDT   predeterminer           all, both       "     left quote              ` or ``
    POS   possessive ending       's              "     right quote             ' or ''
    PRP   personal pronoun        I, you, he      (     left parenthesis        [, (, {, <
    PRP$  possessive pronoun      your, one's     )     right parenthesis       ], ), }, >
    RB    adverb                  quickly, never  ,     comma                   ,
    RBR   adverb, comparative     faster          .     sentence-final punc     . ! ?
    RBS   adverb, superlative     fastest         :     mid-sentence punc       : ; ... -
    RP    particle                up, off

Ambiguity

- Word types in the Brown corpus, by number of possible tags:

    Tags   Types
    1      35,340
    2       3,760
    3         264
    4          61
    5          12
    6           2
    7           1

- In the Brown corpus, 11.5% of the word types are ambiguous, but more than 40% of the word tokens are ambiguous.
- Many ambiguous words are easy to disambiguate (put).

Stochastic tagging

- Stochastic taggers take advantage of uneven tag probabilities.
- If we want to assign a sequence of tags t_1 ... t_n to a sequence of words w_1 ... w_n, we need to maximize P(t_1 ... t_n | w_1 ... w_n).
- We construct a probability model using some annotated data.
- Typically the annotated data available is divided into training, validation, and test data.

Stochastic tagging

- The simplest statistical tagger picks the most frequent tag for a given word. The probability of a tag given a word can be estimated from training data:

    P(tag|word) = (times word occurs with tag) / (times word occurs)

- Depending on the text and the tagset, this tagger can give as high as 91% word accuracy. Since it is so simple, it is usually considered the baseline. The human ceiling is about 98%.

Stochastic tagging

- For example: The representative put the chairs on the table.
- Out of 41,191 occurrences of "put" in the BNC, 41,145 are verbs and 46 are nouns. So:

    P(VBD|put) = 41145 / 41191 = 0.999
    P(NN|put)  =    46 / 41191 = 0.001
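A minimal sketch of this most-frequent-tag baseline; `train` is an assumed list of (word, tag) pairs from an annotated corpus, and the names are illustrative.

    from collections import Counter, defaultdict

    def train_baseline(train):
        counts = defaultdict(Counter)
        for word, tag in train:
            counts[word][tag] += 1
        # For each word, keep the tag it occurs with most often
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}

    train = [("put", "VBD"), ("put", "VBD"), ("put", "NN"), ("the", "DT")]
    best_tag = train_baseline(train)
    print(best_tag["put"])   # 'VBD' -- here P(VBD|put) = 2/3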
Rule-based tagging

- Problems with the baseline tagger:
  - always assigns a word the same tag, regardless of context
  - cannot generalize to words that were not in the training data
- Rule-based taggers use rules written by linguists to narrow down the choices.
- Tags for "that":
    adverb        that tall?
    pronoun       Look at that!
    determiner    Give me that one.
    conjunction   I knew that he would be here.

Rule-based tagging

- EngCG tagger (Karlsson et al. 1995; Voutilainen 1995).
- First stage: a two-level lexicon and morphological analysis adds features to words. All possible tags that are consistent with the morphology are included:

    Pavlov      PAVLOV N NOM SG PROPER
    had         HAVE V PAST VFIN SVO
                HAVE PCP2 SVO
    shown       SHOW PCP2 SVOO SVO SV
    that        ADV
                PRON DEM SG
                DET CENTRAL DEM SG
                CS
    salivation  N NOM SG

Rule-based tagging

- Some hard cases:
    Is that stupid?
    ... considers that stupid.
- Adverbial-that rule:

    ADVERBIAL-THAT RULE
    Given input "that":
      if (+1 A/ADV/QUANT)    ; if the next word is adj, adverb, or quantifier
         (+2 SENT-LIM)       ; and following that is a sentence boundary
         (NOT -1 SVOC/A)     ; and the previous word is not a verb like
                             ; "consider" which allows adjs as object complements
      then eliminate non-ADV tags
      else eliminate ADV tag

Transformation-based learning

- Transformation-based learning (TBL) is a non-statistical technique for increasing the context.
- TBL starts with a baseline tagger and generates error-correcting transformations, e.g. "replace VBD with NN after AT".
- On each iteration, pick the rule which reduces the error rate the most, and repeat until the error rate stops improving.
- Transformations can be conditioned on a wider range of contexts than n-grams without running into sparse data problems.

Homework

- Read chapters 1 and 2 in Jurafsky and Martin.

Symbolic NLP

- Good Old-Fashioned Artificial Intelligence addressed cognition as a symbol-manipulation problem:
  - explicit representations of knowledge (rules, databases, etc.)
  - procedures for working with knowledge (logic)
- Natural language is itself a symbolic system, so a symbolic approach to it seems reasonable.
- Symbolic NLP allows us to directly make use of work in theoretical linguistics and computer science: formal language theory.

Regular expressions

    a              match the letter a
    X*             match zero or more X's
    X+             match one or more X's
    X?             match zero or one X
    X{m,n}         match at least m but no more than n X's
    X|Y            match an X or a Y
    [a-z]          match any lowercase letter
    [A-Z]          match any uppercase letter
    [0-9]          match any digit
    [aeiouAEIOU]   match any vowel
    [^aeiouAEIOU]  match anything except a vowel
    [A-Za-z-]      match any letter or a hyphen
    .              match any one character

Regular expressions

- Metacharacters can be escaped using \.
- Some ordinary characters become metacharacters when escaped:
    \d    match a digit
    \s    match a whitespace character
    \b    match a word boundary
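A few quick checks of the operators in the tables above, using Python's re module (the example strings are illustrative):

    import re

    print(re.findall(r"[aeiou]+", "computational"))
    # runs of vowels: ['o', 'u', 'a', 'io', 'a']
    print(re.fullmatch(r"[0-9]{2,4}", "2008") is not None)
    # True: between two and four digits
    print(re.findall(r"\bwh[a-z]*", "who knows what this means"))
    # ['who', 'what']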
Regular languages

- Some basic notions: alphabet Σ, empty string ε, empty set ∅.
- A formal language L over an alphabet Σ is a set of finite strings of symbols drawn from Σ.

Regular languages

- The empty set ∅ is a regular language.
- For all a ∈ Σ ∪ {ε}, {a} is a regular language.
- If L1 and L2 are regular languages, then so are:
  - L1 · L2 = {xy | x ∈ L1, y ∈ L2}, the concatenation of L1 and L2
  - L1 ∪ L2, the union or disjunction of L1 and L2
  - L1*, the Kleene closure of L1: the set of all finite strings, including ε, whose characters are members of L1
- Nothing else is a regular language.

Regular languages

- Some theorems:
  - Any finite language is regular.
  - Regular languages are closed under intersection: if L1 and L2 are regular languages, then L1 ∩ L2 is a regular language.
  - Regular languages are also closed under difference, complementation, and reversal.
  - All regular expression operations except backreferences can be built up from concatenation, union, and Kleene closure.

Regular languages

- Some questions:
  - Is a language L a regular language?
  - Is a language L the empty language?
  - Are two languages L1 and L2 equivalent?
  - Recognition problem: is a string x a member of a language L?

Finite state automata

- A finite state automaton is an abstract machine that recognizes a class of strings.
- FSAs consist of states and transitions.
- There is one initial state (or start state) and one or more accepting states (or final states).

Finite state automata

- Beginning with the start state, an FSA reads symbols off a tape and takes the corresponding transitions. If the FSA is in a final (or accepting) state at the end of the tape, the string is accepted.
- We can also represent an FSA as a state transition table.

DFSAs

- A finite state automaton is the tuple (Q, Σ, q0, F, δ):
  - Q, a finite set of N states q0, q1, ..., qN
  - Σ, a finite input alphabet of symbols
  - q0, the start state
  - F, the set of final states, F ⊆ Q
  - δ(q, i), the partial transition function δ: Q × Σ -> Q
- A deterministic FSA (DFSA) can only be in one state at a time.
- In a complete FSA, the transition function δ is a total function. Any FSA can be made complete by adding a sink state.

DFSAs

- Recognition problem:

    function D-RECOGNIZE(tape, machine) returns accept or reject
      index <- beginning of tape
      current-state <- initial state of machine
      loop
        if end of input has been reached then
          if current-state is an accept state then return accept
          else return reject
        elsif transition-table[current-state, tape[index]] is empty then
          return reject
        else
          current-state <- transition-table[current-state, tape[index]]
          index <- index + 1
      end

NFSAs

- Non-deterministic finite state automata (NFSAs) generalize DFSAs:
  - the transition function yields a set of states
  - epsilon transitions
- More formally, the transition function is δ: Q × (Σ ∪ {ε}) -> 2^Q.

NFSAs

- Recognition for NFSAs is slightly more complicated: we need to distinguish search states from the nodes (states) of the FSA.
- Maintain an agenda of search states: partial solutions to be explored.

    function ND-RECOGNIZE(tape, machine) returns accept or reject
      agenda <- {(initial state of machine, beginning of tape)}
      current-search-state <- NEXT(agenda)
      loop
        if ACCEPT-STATE?(current-search-state) returns true then return accept
        else agenda <- agenda ∪ GENERATE-NEW-STATES(current-search-state)
        if agenda is empty then return reject
        else current-search-state <- NEXT(agenda)
      end

    function GENERATE-NEW-STATES(current-state) returns a set of search states
      current-node <- the node the current search state is in
      index <- the point on the tape the current search state is looking at
      return a list of search states from the transition table, as follows:
        (transition-table[current-node, ε], index)
        ∪ (transition-table[current-node, tape[index]], index + 1)

    function ACCEPT-STATE?(search-state) returns true or false
      current-node <- the node search-state is in
      index <- the point on the tape search-state is looking at
      if index is at the end of the tape and current-node is an accept
        state of machine then return true
      else return false

NFSAs

- Depending on how the agenda is maintained, we can perform a breadth-first or depth-first traversal of the search space:
  - A stack (last in, first out) gives us depth-first search.
  - A queue (first in, first out) gives us breadth-first search.
  - More complex heuristics can be used to guide search (e.g. best-first).

NFSAs

- NFSAs with ε transitions can be converted to epsilon-free form (ε-closure).

Finite state automata

- Some more theorems:
  - If a language is accepted by some NFSA, then it's also accepted by some DFSA.
  - If a language is accepted by a DFSA, then it's regular.
  - If a language is regular, then it's accepted by an NFSA.
- Together these show the equivalence of DFSAs, NFSAs, and regular languages.
- Knuth-Morris-Pratt (KMP) algorithm to find substring matches.
- Regular expressions for searching: Unix, perl, python, etc.
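The D-RECOGNIZE loop above is easy to express over a transition-table dictionary. A minimal sketch; the "baa+!" sheep-language machine is an assumed example, not the course's fsa3.py:

    def d_recognize(tape, table, start, final):
        state = start
        for symbol in tape:
            if (state, symbol) not in table:   # missing entry = reject
                return False
            state = table[(state, symbol)]
        return state in final

    # Transition table for the language baa+!
    table = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
    print(d_recognize("baaa!", table, start=0, final={4}))  # True
    print(d_recognize("ba!", table, start=0, final={4}))    # False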
Homework

- Read Jurafsky & Martin, Chapter 6, pp. 173-193.
- Google Summer of Code:
  - http://wiki.python.org/moin/SummerOfCode
  - http://code.google.com/soc/2008/

Markov models

- An HMM M = <Y, S, A, B> consists of:
  - an output alphabet Y = {y_1, ..., y_b}
  - a state space S = {s_1, ..., s_c} with a unique initial state s_0
  - a transition probability distribution A(s'|s)
  - an emission probability distribution B(y|s)
- HMMs are equivalent to weighted finite state automata with outputs on states.

Markov models

- We want to assign a probability to a sequence of output symbols.
- Given an HMM M and a state sequence S = s_0 ... s_t, the probability of an output sequence O = o_1 ... o_t is

    P(O|S,M) = Π_{i=1..t} P(o_i|s_i,M) = Π_{i=1..t} B(o_i|s_i)

Markov models

- Given an HMM M, the probability of a state sequence S = s_0 ... s_t is

    P(S|M) = Π_{i=1..t} P(s_i|s_{i-1}) = Π_{i=1..t} A(s_i|s_{i-1})

- In general, with HMMs we only see the output sequence; the state sequence is unknown.

Marginalization

- We know how to calculate P(O,S|M), but we want P(O|M).
- If we knew P(S|O,M), then we could use the chain rule. We can also get rid of S by marginalization:

    P(O|M) = Σ_S P(O,S|M)

- P(O|M) is the marginal probability of O given the model M.
- Marginalization is a common trick in Bayesian modeling.

Markov models

- Given an HMM M, the probability of an output sequence O = o_1 ... o_t is

    P(O|M) = Σ_S P(O,S|M)
           = Σ_S P(O|S,M) P(S|M)
           = Σ_S Π_{i=1..t} B(o_i|s_i) Π_{i=1..t} A(s_i|s_{i-1})
           = Σ_S Π_{i=1..t} B(o_i|s_i) A(s_i|s_{i-1})

Forward algorithm

- Conceptually simple, but computationally expensive: the number of multiplications grows exponentially in the length T of the sequence, since there are on the order of |S|^T possible state sequences.
- The problem is that there are a lot of possible state sequences. Fortunately, they share a lot of common sub-sequences (dynamic programming).
- We can represent partial state sequences as a trellis.

Forward algorithm

- We define α_s(i) as the probability of being in state s at position i:

    α_s(i) = P(o_1, ..., o_i, s_i = s | M)

- Base case: α_s(1) = 1 if s = s_0, and α_s(1) = 0 otherwise.
- We can recursively calculate:

    α_s(i) = Σ_{s'} A(s|s') B(o_i|s) α_{s'}(i-1)

- Finally, at the end:

    P(o_1, ..., o_T) = Σ_{s∈S} α_s(T)

[Figure 6.8: Visualizing the computation of a single element α_t(j) in the trellis by summing all the previous values α_{t-1}(i), weighted by their transition probabilities a_{ij}, and multiplying by the observation probability b_j(o_{t+1}). For many applications of HMMs, many of the transition probabilities are 0, so not all previous states will contribute to the forward probability of the current state. Hidden states are in circles, observations in squares. Start and end states are not shown.]

Forward algorithm

- We get the answer with only 2|S|^2 T multiplications.
- Partial sums could as well be computed right to left (the backward algorithm), or from the middle out:

    β_s(i) = P(o_{i+1}, ..., o_T | s_i = s, M)

- In general, for any position i:

    P(O|M) = Σ_{s∈S} α_s(i) β_s(i)

- This algorithm could be used, e.g., to identify which M is most likely to have produced an output sequence O.

Decoding

- A more common task is decoding: finding the most likely state sequence given an output sequence.
- Used for most noisy channel model applications: tagging, speech recognition, etc.
- The obvious brute-force and greedy algorithms fail miserably.
- An improvement uses a variant of the forward algorithm to find the most likely state at each position:

    ŝ_i = argmax_{s∈S} α_s(i) β_s(i)

  but this can yield strange sequences.

Viterbi algorithm

- A better method uses dynamic programming to efficiently find the optimal state sequence.
- We define:

    δ_s(i) = max_{s_1,...,s_{i-1}} P(s_1, ..., s_{i-1}, o_1, ..., o_{i-1}, s_i = s | M)

- This is the probability of the most likely path leading to state s at position i.
- At the first position, δ_s(1) = 1 if s = s_0, and δ_s(1) = 0 otherwise.

Viterbi algorithm

- Again, we proceed recursively:

    δ_s(i+1) = max_{s'} A(s|s') B(o_i|s) δ_{s'}(i)

- Since we want to know the identity of the best state sequence, and not just its probability, we also need to store a backpointer:

    ψ_s(i+1) = argmax_{s'} A(s|s') B(o_i|s) δ_{s'}(i)

- Finally, when we reach the end of the sequence, we can follow ψ backwards from the most likely final state.
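A compact sketch of both recursions. It uses an initial distribution pi in place of the designated start state s_0, and the toy hot/cold ice-cream numbers are illustrative assumptions, not the textbook's exact figures. A[s2][s1] stands for A(s2|s1) and B[s][o] for B(o|s).

    def forward(obs, states, A, B, pi):
        alpha = [{s: pi[s] * B[s][obs[0]] for s in states}]
        for o in obs[1:]:
            prev = alpha[-1]
            alpha.append({s: B[s][o] * sum(A[s][s1] * prev[s1] for s1 in states)
                          for s in states})
        return sum(alpha[-1].values())          # P(O|M)

    def viterbi(obs, states, A, B, pi):
        delta = {s: pi[s] * B[s][obs[0]] for s in states}
        back = []
        for o in obs[1:]:
            psi, new = {}, {}
            for s in states:
                best = max(states, key=lambda s1: A[s][s1] * delta[s1])
                psi[s] = best
                new[s] = A[s][best] * delta[best] * B[s][o]
            back.append(psi)
            delta = new
        # Follow backpointers from the most likely final state
        path = [max(delta, key=delta.get)]
        for psi in reversed(back):
            path.append(psi[path[-1]])
        return list(reversed(path))

    states = ["H", "C"]
    pi = {"H": 0.8, "C": 0.2}
    A = {"H": {"H": 0.7, "C": 0.4}, "C": {"H": 0.3, "C": 0.6}}
    B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
    print(forward([3, 1, 3], states, A, B, pi))   # ~0.0263 for this toy model
    print(viterbi([3, 1, 3], states, A, B, pi))   # ['H', 'H', 'H']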
Homework

- Read Jurafsky and Martin, 2nd ed., Chapter 22.

Beyond tagging

- Tagging is an instance of the more general problem of sequence labeling. Other interesting problems can be solved in the same way.
- Word sense disambiguation is a kind of semantic tagging:
    The representative put the chair on the table.
  - chair: piece of furniture, or committee leader?
  - put on the table: place on furniture, or make a proposal?
- WSD can be performed using the same methods as tagging, but is much more difficult.

Named entities

    [PER Wolff], currently a journalist in [LOC Argentina], played with [PER Del Bosque] in the final years of the seventies in [ORG Real Madrid].

- Labeled bracketing is a kind of tagging: B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, ...
- Challenges:
  - Many unknown words: Bayan Olgiy, Nambaryn Enkhbayar
  - Ambiguity: Washington can be a person, location, or organization.

Shallow parsing

- Chunkers divide a sentence up into grammatical units:

    The   representative  put  the   chairs  on  the   table
    AT    NN              VBD  AT    NNS     IN  AT    NN
    B-NP  I-NP            O    B-NP  I-NP    O   B-NP  I-NP

- Multiple chunk taggers can be chained to produce a shallow parser.
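A small sketch of decoding B/I/O chunk tags like those above into word spans; the function name is illustrative:

    def bio_to_chunks(words, tags):
        chunks, current = [], []
        for word, tag in zip(words, tags):
            if not tag.startswith("I") and current:  # B or O closes an open chunk
                chunks.append(current)
                current = []
            if tag.startswith(("B", "I")):
                current.append(word)
        if current:
            chunks.append(current)
        return chunks

    words = "The representative put the chairs on the table".split()
    tags = ["B-NP", "I-NP", "O", "B-NP", "I-NP", "O", "B-NP", "I-NP"]
    print(bio_to_chunks(words, tags))
    # [['The', 'representative'], ['the', 'chairs'], ['the', 'table']]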
Semantic role labeling

- A shallow parser can identify the main parts of a sentence and their connections to each other, allowing the meaning to be extracted: who did what to whom.

Information extraction

- A cascade of sequence labelers can be assembled into an information extraction system. Start with a relevant text:

    Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco.

- Named entity recognition, reference resolution, relation detection, event detection, temporal analysis, template filling.
- The result is a filled template:

    FARE-RAISE ATTEMPT
      LEAD AIRLINE:   UNITED AIRLINES
      AMOUNT:         $6
      EFFECTIVE DATE: 2006-10-26
      FOLLOWER:       AMERICAN AIRLINES

Information extraction

- The FASTUS system (SRI) uses POS taggers, named entity recognition, chunk taggers, etc. to guess the structure of sentences from news stories.
- Finite state transducers are used to map parts of sentences to news story templates.
- Filled-in templates can be used for information extraction, text summarization, question answering.

Information extraction

Levels of processing in FASTUS (Hobbs et al. 1997). Each level extracts a specific type of information, which is then passed on to the next higher level:

    No.  Step               Description
    1    Tokens             Transform an input stream of characters into a token sequence.
    2    Complex words      Recognize multi-word phrases, numbers, and proper names.
    3    Basic phrases      Segment sentences into noun groups, verb groups, and particles.
    4    Complex phrases    Identify complex noun groups and complex verb groups.
    5    Semantic patterns  Identify semantic entities and events, and insert them into templates.
    6    Merging            Merge references to the same entity or event from different parts of the text.

Information extraction

[Grammar fragment for recognizing performer organizations: rules such as Performer-Org -> pre-location loc-name Perf-Org-Suffix and Performer-Noun -> nationality | city, with nationality = Canadian | American | Mexican, city = San Francisco | London, Perf-Org-Suffix = symphony, etc.]

    Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and "metal wood" clubs a month.

The output of Stage 2 of the FASTUS basic-phrase extractor, which uses finite state rules of the sort described by Appelt and Israel (1997):

    Phrase type   Phrase
    Company       Bridgestone Sports Co.
    Verb Group    said
    Noun Group    Friday
    Noun Group    it
    Verb Group    had set up
    Noun Group    a joint venture
    Preposition   in
    Location      Taiwan
    Preposition   with
    Noun Group    a local concern
    Conjunction   and
    Noun Group    a Japanese trading house
    Verb Group    to produce
    Noun Group    golf clubs
    Verb Group    to be shipped
    Preposition   to
    Location      Japan

The five partial templates produced by Stage 5 of the FASTUS system (these templates will later be merged):

    1  RELATIONSHIP: TIE-UP
       ENTITIES: "Bridgestone Sports Co.", "a local concern", "a Japanese trading house"
    2  ACTIVITY: PRODUCTION
       PRODUCT: "golf clubs"
    3  RELATIONSHIP: TIE-UP
       JOINT VENTURE COMPANY: "Bridgestone Sports Taiwan Co."
    4  ACTIVITY: PRODUCTION
       COMPANY: "Bridgestone Sports Taiwan Co."
       START DATE: DURING January 1990
    5  ACTIVITY: PRODUCTION
       PRODUCT: "iron and 'metal wood' clubs"

Information extraction

    San Salvador, 19 Apr 89 (ACAN-EFE) -- [TEXT] Salvadoran President-elect Alfredo Cristiani condemned the terrorist killing of Attorney General Roberto Garcia Alvarado and accused the Farabundo Marti National Liberation Front (FMLN) of the crime. Garcia Alvarado, 56, was killed when a bomb placed by urban guerrillas on his vehicle exploded as it came to a halt at an intersection in downtown San Salvador. Vice President-elect Francisco Merino said that when the attorney general's car stopped at a light on a street in downtown San Salvador, an individual placed a bomb on the roof of the armored vehicle. According to the police and Garcia Alvarado's driver, who escaped unscathed, the attorney general was traveling with two bodyguards. One of them was injured.

Information extraction

- Database entry:

    Incident: Date                   19 Apr 89
    Incident: Location               El Salvador: San Salvador (CITY)
    Incident: Type                   Bombing
    Perpetrator: Individual ID       "urban guerrillas"
    Perpetrator: Organization ID     "FMLN"
    Perpetrator: Org. Confidence     Suspected or Accused by Authorities: "FMLN"
    Physical Target: Description     "vehicle"
    Physical Target: Effect          Some Damage: "vehicle"
    Human Target: Name               "Roberto Garcia Alvarado"
    Human Target: Description        "attorney general": "Roberto Garcia Alvarado"; "driver"; "bodyguards"
    Human Target: Effect             Death: "Roberto Garcia Alvarado"; No Injury: "driver"; Injury: "bodyguards"
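FASTUS itself used hand-built finite state transducers; as a toy illustration of mapping a sentence fragment onto a template, here is one pattern written as a Python regex. The pattern and field names are assumptions, not SRI's actual rules:

    import re

    fare_raise = re.compile(
        r"(?P<lead>[A-Z][\w ]+?) said \w+ it has increased fares "
        r"by \$(?P<amount>\d+)")
    m = fare_raise.search("United Airlines said Friday it has "
                          "increased fares by $6 per round trip")
    if m:
        print({"LEAD AIRLINE": m.group("lead"), "AMOUNT": m.group("amount")})
    # {'LEAD AIRLINE': 'United Airlines', 'AMOUNT': '6'}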
An Extensible Regular Expression Workbench

Rob Malouf
Department of Linguistics and Oriental Languages, San Diego State University
Education Center on Computational Science and Engineering
15 December 2003

Computational linguistics

- Computational linguists study automatic processing of human language.
- Descended from computer science and linguistics, plus philosophy, psychology, neuroscience, etc.
- As computational linguistics matures as a field, it has begun to influence related disciplines: bioinformatics, economics.
- Computational linguistics program: the master's program offers a number of courses for undergraduates: LING 365, LING 571, LING 581.

Computational linguistics

- A fundamental insight of linguistics is that language can be seen as a system of patterns.
- Even though language is infinite, the patterns that we find can be expressed using finite resources:
    a, aa, aaa, aaaa, aaaaa, ...
- Computational linguists find ways of representing these patterns that allow efficient storage and processing.

Regular expressions

- Regular expressions (REs) are one class of pattern description that is widely used in computational linguistics.
- Their origin is in the McCulloch-Pitts neuron (1943), formalized by Kleene (1951) and Rabin and Scott (1959).
- Phonological and morphological rules can be written as REs.
- Syntactic rules can be usefully approximated as REs.
- Equivalent to finite state automata.

Regular expressions

- The simplest REs are just concatenations of symbols:
    the big house
- Symbols can be combined with disjunction:
    (the|a|an) (big|white|old) (house|car)
- Arbitrarily long sequences can be built up using iteration:
    (the|a|an) (big|white|old)* (house|car)

Regular expressions

- One important use of REs is to extract relevant linguistic examples from a corpus.
- Suppose we want to find examples of the WXDY construction:
    What is he doing here at this hour?
    What are you doing here?
    What are they doing on Mars?
    What are you doing out of the block?
- An RE that matches these examples (roughly):
    [Ww]hat('s| is| are| was| were)\b.*\bdoing\b

Regular expressions

- Teaching students to read, write, and process REs is a core part of the computational linguistics curriculum.
- REs are built up using a very simple syntax, but get complicated in a hurry (e.g. a pattern for clock times like (0?[1-9]|1[0-2]):[0-5][0-9]).
- Some RE matching rules (backreferences, leftmost-longest match) can be unintuitive.
- Corpus searching tools use REs, but don't provide enough feedback.

RE Workbench

- Languages like perl and python allow more exploration, but require programming skills.
- The Regular Expression Workbench will provide students with a tool for learning about REs in a controlled environment:
    http://edcenter2.sdsu.edu/RegEXWB/inputpage.htm
- Next steps: polish the user interface, add online documentation, show partial matches.
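In the spirit of the workbench, here is the WXDY pattern from the earlier slide run with Python's re module. The regex is a hedged reconstruction of the slide's (the original was unreadable in the source), and the tiny corpus is illustrative:

    import re

    wxdy = re.compile(r"[Ww]hat(?:'s| is| are| was| were)\b.{0,40}\bdoing\b")
    corpus = ["What is he doing here at this hour?",
              "What are they doing on Mars?",
              "He asked what the plan was."]
    for line in corpus:
        if wxdy.search(line):
            print(line)   # prints the first two lines only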
Homework

- FSA programming assignment due Friday.
- Read:
  - Beesley and Karttunen, chapters 1 and 2
  - Jurafsky and Martin, chapter 3

Morphology

- Suppose we want to build a machine that can process words in English (or any other language):
    "But after all, is not life itself filled with wonders, mysteries, and unfathomabilities far beyond our discursive understanding?"
    unfathomabilities = un+fathom+able+ity+s
- Applications: spelling checkers, grammar checkers; information retrieval, information extraction; text to speech; adaptive learners' dictionaries; machine translation.

Morphology

- Turkish:
    uygarlastiramadiklarimizdanmissinizcasina
    "behaving as if you are among those whom we could not civilize"

Word lists

- Concatenation, union, determinization, and minimization are tools for efficiently representing word lists as FSAs.
- If we take the regular expression
    book | books | quick | quicker | quickly
  we get the minimal FSA.

Word lists

- Sometimes called a letter tree, this kind of FSA makes dictionaries easy to store and search:
    acrobat, acrobatic, acrobats, acronym, acronyms, acropolis, baseball, book, books, football, quick, quicker, quickly

English morphology

- Word lists aren't sufficient for morphological analysis.
- Inflectional morphology creates new variants of base words (lexemes, lemmas): book/books, walk/walks/walking/walked.
- Derivational morphology creates new lexemes: bookish, walker, walkable.
- Compounding combines lexemes: bookstore, bookbinder, bookmaker, dog walker, sleep walker, crab walker.

English morphology

- Noun lexicon:
    reg-noun       = fox, cat, dog, aardvark
    irreg-pl-noun  = geese, sheep, mice
    irreg-sg-noun  = goose, sheep, mouse
    plural         = -s
- Morphotactics:
    noun -> (reg-noun plural) | irreg-pl-noun | irreg-sg-noun

English morphology

- Regular verbs in English have four distinct forms:

    stem             walk     merge    try     map
    -s form          walks    merges   tries   maps
    -ing participle  walking  merging  trying  mapping
    past form        walked   merged   tried   mapped
    -ed participle   walked   merged   tried   mapped

- Irregular verbs have up to five:

    stem             eat     catch     cut
    -s form          eats    catches   cuts
    -ing participle  eating  catching  cutting
    past form        ate     caught    cut
    -ed participle   eaten   caught    cut

English morphology

- Verb lexicon:
    reg-verb-stem   = walk, roll, talk, plan, dance, bounce, impeach
    irreg-verb-stem = cut, speak, sing
    irreg-past-verb = caught, ate, eaten, sang, spoken, fed
    past = -ed;  past-part = -ed;  pres-part = -ing;  3sg = -s
- Verb morphotactics:
    verb -> irreg-past-verb
          | (reg-verb-stem (past | past-part))
          | ((reg-verb-stem | irreg-verb-stem) (pres-part | 3sg))

English verbs

- Regular verb endings: -s, -ed, -ing (and the bare stem).
- Concatenate the stem FSA and the endings FSA to get an FSA that recognizes regular inflected verbs.
- For irregular verbs, we just list all the forms: see/saw/seen/seeing, run/runs/ran/running, break/broke/breaking/breaks.
- Union the regular FSA with the irregular FSA.

Letter trees

- Verbs are either regular or irregular, and regular verbs consist of a verb stem plus a verb ending.
- If we have an FSA for regular verb stems and one for regular verb endings, then if we concatenate them we get a single FSA for regular verbs.
- If we have another FSA for irregular verbs, the union of that with the one for regular verbs gives us an FSA that recognizes all verbs.
- A letter tree is an FSA which allows efficient storage and retrieval of very large word lists.
- Letter trees, like all FSAs, can be combined via operations like concatenation and union, and the results can be determinized and minimized.

Orthographic rules

- So far, we can tell that fox+s must be a word, but:
  - we don't know that it is the plural of the noun fox
  - we would guess that it's spelled "foxs"
- The same problem comes up in speech: bats, bags, boxes.
- To handle these orthographic (or morphophonological) rules, we need another kind of finite state machine that accepts pairs of strings. Finite state transducers match up pairs of strings.

Finite state transducers

- Finite state transducers are like two-tape FSAs: each transition is labeled with an input symbol and an output symbol.
- An FST can be used to:
  - accept or reject a pair of strings
  - given an input, produce an output
  - given an output, produce an input

Finite state transducers

- Given an FST with transitions a:b, c:c, b:a, we get a regular relation: a relation between two regular languages.
- We can accept or reject pairs, or generate one member from the other:
    aaaccbbbb : bbbccaaaa
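A minimal sketch of the a:b, c:c, b:a transducer above, as a single-state machine run in "generate" and "accept a pair" modes; the function names are illustrative:

    PAIRS = {"a": "b", "b": "a", "c": "c"}   # input:output transition labels

    def fst_generate(inp):
        """Given an input string, produce the corresponding output."""
        return "".join(PAIRS[ch] for ch in inp)

    def fst_accept(inp, out):
        """Accept or reject a pair of strings."""
        return len(inp) == len(out) and all(
            PAIRS[i] == o for i, o in zip(inp, out))

    print(fst_generate("aaaccbbbb"))             # bbbccaaaa
    print(fst_accept("aaaccbbbb", "bbbccaaaa"))  # True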
Subset construction

- If a language is accepted by some NFSA, then it's also accepted by some DFSA.
- Proof by subset construction:
  - The NFSA transition function returns a set of states.
  - We can take each subset of the set of states as a state in a corresponding DFSA.

Subset construction

- NFSAs don't have greater expressive power than DFSAs, even though the recognition problem is harder.
- In the worst case, every subset of states in the NFSA becomes a state in the DFSA.
- Implementation details can make a big difference in the final size of the DFSA (van Noord 2000).
- Because of the possible exponential explosion, most practical FSA recognizers (e.g. PCRE) use NFSAs and backtracking. Others use a hybrid algorithm (grep, awk).
- Some references here: http://swtch.com/~rsc/regexp/
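A sketch of the subset construction under the representation assumed here (sets of next states in nested dicts); each reachable set of NFSA states becomes one DFSA state:

    def determinize(nfsa, start, finals, alphabet):
        dstart = frozenset([start])
        dtrans, dfinals = {}, set()
        agenda, seen = [dstart], {dstart}
        while agenda:
            dstate = agenda.pop()
            if dstate & finals:              # any NFSA final makes it final
                dfinals.add(dstate)
            for a in alphabet:
                nxt = frozenset(q2 for q in dstate
                                for q2 in nfsa.get(q, {}).get(a, ()))
                dtrans[(dstate, a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    agenda.append(nxt)
        return dstart, dtrans, dfinals

    # NFSA for (a|b)*b: state 0 loops on a and b, and can also jump to
    # the final state 1 on b
    nfsa = {0: {"a": {0}, "b": {0, 1}}}
    print(determinize(nfsa, 0, {1}, "ab"))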
Kleene's Theorem

- If a language is regular, then it's accepted by an NFSA.
  - Simple constructive proof by induction.
- If a language is accepted by an NFSA, then it's regular.
  - The proof is a little more complicated: find paths through the NFSA and convert them into separate regular expressions.
  - Doesn't come up as often in practical applications, but is very important, e.g., for proofs of closure properties.

Python implementation

- On the web at http://bulba.sdsu.edu/~malouf/ling581/fsa1.py
- Documentation at http://bulba.sdsu.edu/~malouf/ling581/index.html

Python implementation

- Interpreted vs. compiled
- Whitespace vs. punctuation
- Dynamic typing
- Supports many programming paradigms: structured, functional, object-oriented, etc.
- Rich inventory of data types: strings, lists, tuples, dictionaries, sets, etc.
- Very large standard library: http://docs.python.org/lib/lib.html
- Very, very large non-standard library

Python implementation

- Object-oriented design:
  - Polymorphism
  - Encapsulation
  - Inheritance
- Magic methods:
  - Constructor: __init__
  - Printer: __repr__
- Documentation: epydoc, http://epydoc.sourceforge.net/

Named entities

- Named entity recognition systems identify and classify proper names:

    People:                Turing is often considered to be the father of modern computer science.
    Organization:          The IPCC said it is likely that future tropical cyclones will become more intense.
    Location:              The Mt. Sanitas loop hike begins at the base of Sunshine Canyon.
    Geo-political entity:  Palo Alto is looking at raising the fees for parking.
    Facility:              Drivers were advised to consider either the Tappan Zee Bridge or the Lincoln Tunnel.
    Vehicles:              The updated Mini Cooper retains its charm and agility.

  (Named entity types, with examples.)

- Ambiguity:

    [PER Washington] was born into slavery on the farm of James Burroughs.
    [ORG Washington] went up 2 games to 1 in the four-game series.
    Blair arrived in [LOC Washington] for what may well be his last state visit.
    In June, [GPE Washington] passed a primary seatbelt law.
    The [FAC Washington] had proved to be a leaky ship, every passage I made...

  (Examples of type ambiguities in the use of the name Washington.)

Named entities

- Related problems: identifying temporal expressions and numerical expressions:

    Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has increased fares by [MONEY $6] per round trip on flights to some cities also served by lower-cost carriers. [ORG American Airlines], a unit of [ORG AMR Corp.], immediately matched the move, spokesman [PERS Tim Wagner] said. [ORG United], a unit of [ORG UAL Corp.], said the increase took effect [TIME Thursday] and applies to most routes where it competes against discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver] to [LOC San Francisco].

Named entities

- NER systems typically use a sequence labeling model:

    American/B-ORG Airlines/I-ORG ,/O a/O unit/O of/O AMR/B-ORG Corp./I-ORG ,/O
    immediately/O matched/O the/O move/O ,/O spokesman/O Tim/B-PERS Wagner/I-PERS said/O ./O

- Each word in context is represented as a set of features and is classified accordingly.
- The tagging HMMs we looked at used two features: current word and previous tag. TBL taggers expanded the feature space to include more context. Unknown word guessers use additional shape features: capitalization, suffixes.

Named entities

- Features for NER:

    Feature                  Explanation
    Lexical items            The token to be labeled
    Stemmed lexical items    Stemmed version of the target token
    Shape                    The orthographic pattern of the target word
    Character affixes        Character-level affixes of the target and surrounding words
    Part of speech           Part of speech of the word
    Base-phrase chunk label  Base-phrase chunk label
    Gazetteer or name list   Presence of the word in one or more named entity lists
    Predictive tokens        Presence of predictive words in the surrounding text
    Bag of words/N-grams     Words and/or N-grams occurring in the surrounding context

  (Features commonly used in training named entity recognition systems.)
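One possible implementation of the shape feature: collapse characters into classes, as in the examples tabulated below. The X/x/d collapsing scheme is a common convention, assumed here rather than taken from the course code:

    import re

    def shape(word):
        s = re.sub(r"[A-Z]", "X", word)
        s = re.sub(r"[a-z]", "x", s)
        return re.sub(r"[0-9]", "d", s)

    for w in ["cummings", "Washington", "IRA", "eBay", "A9", "H-P"]:
        print(w, "->", shape(w))
    # e.g. Washington -> Xxxxxxxxxx, eBay -> xXxx, A9 -> Xd, H-P -> X-X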
Named entities

- Selected shape features:

    Shape                               Example
    Lower                               cummings
    Capitalized                         Washington
    All caps                            IRA
    Mixed case                          eBay
    Capitalized character with period   H.
    Ends in digit                       A9
    Contains hyphen                     H-P

- Simple word-by-word feature encoding for NER:

    Word         POS   Chunk   Shape     Label
    American     NNP   B-NP    cap       B-ORG
    Airlines     NNPS  I-NP    cap       I-ORG
    ,            PUNC  O       punc      O
    a            DT    B-NP    lower     O
    unit         NN    I-NP    lower     O
    of           IN    B-PP    lower     O
    AMR          NNP   B-NP    upper     B-ORG
    Corp.        NNP   I-NP    cap-punc  I-ORG
    ,            PUNC  O       punc      O
    immediately  RB    B-ADVP  lower     O
    matched      VBD   B-VP    lower     O
    the          DT    B-NP    lower     O
    move         NN    I-NP    lower     O
    ,            PUNC  O       punc      O
    spokesman    NN    B-NP    lower     O
    Tim          NNP   I-NP    cap       B-PER
    Wagner       NNP   I-NP    cap       I-PER
    said         VBD   B-VP    lower     O
    .            PUNC  O       punc      O

Named entities

- Sequential classifier:

  [Figure 22.9: Named entity recognition as sequence labeling. The features available to the classifier during training and classification are those in the boxed area.]

- Independence assumptions
- Accuracy: precision, recall, F-score

Named entities

- Unknown terms will be an even bigger problem for NER.
- Semi-supervised classification:
  - First, use high-precision rules to tag unambiguous entity mentions.
  - Then search for substring matches of the previously detected names, using probabilistic string matching metrics.
  - Consult application-specific name lists to identify likely name entity mentions from the given domain.
  - Finally, apply probabilistic sequence labeling techniques that make use of the tags from previous stages as additional features.

Relation detection

- Given that we've identified the named entities in a text, what are their relations to each other?

    Relations                        Examples                      Types
    Affiliations: Personal           married to, mother of         PER -> PER
    Affiliations: Organizational     spokesman for, president of   PER -> ORG
    Affiliations: Artifactual        owns, invented, produces      (PER|ORG) -> ART
    Geospatial: Proximity            near, on outskirts of         LOC -> LOC
    Geospatial: Directional          southeast of                  LOC -> LOC
    Part-of: Organizational          a unit of, parent of          ORG -> ORG
    Part-of: Political               annexed, acquired             GPE -> GPE

  (Semantic relations with examples and the named entity types they involve.)

- Two phases: first identify pairs of related entities, then classify the relations.

Relation detection

- This approach makes relation detection into a classification problem. Each named entity pair is converted to a set of features:
  - Named entity features: entity types, head words, non-head words
  - Context features: unigrams and bigrams around entities, between entities, distance
  - Syntactic features: presence of particular syntactic constructions, dependency relations

Relation detection

  [Figure 22.14: An appositive construction expressing a part-of relation: "American Airlines, a unit of AMR Inc."]
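To make the classification setup concrete, here is an illustrative encoding of one candidate entity pair as a feature dictionary; the spans are (start, end) token indices, and all names are assumptions:

    def relation_features(tokens, span1, type1, span2, type2):
        i, j = span1[1], span2[0]        # end of entity 1, start of entity 2
        between = tokens[i:j]
        return {
            "e1_type": type1,
            "e2_type": type2,
            "e1_head": tokens[span1[1] - 1],
            "e2_head": tokens[span2[1] - 1],
            "between_bow": sorted(set(between)),
            "distance": j - i,
        }

    tokens = "American Airlines a unit of AMR Inc".split()
    print(relation_features(tokens, (0, 2), "ORG", (5, 7), "ORG"))
    # {'e1_type': 'ORG', 'e2_type': 'ORG', 'e1_head': 'Airlines',
    #  'e2_head': 'Inc', 'between_bow': ['a', 'of', 'unit'], 'distance': 3}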
Noisy channel model

- One of Shannon's original motivations for developing information theory was to improve communication in the face of noise.
- The noisy channel model:
  - A sender creates a message S.
  - The message is transmitted over a communication channel, which produces random changes to the message.
  - A receiver gets a message R.
- The challenge is to guess what the original message S was, based on R and our knowledge of source and noise distributions.
Noisy channel model

- We can use conditional entropy H(X|Y) to measure the amount of information conveyed by the communication channel.
- H(S|R) is a measure of how surprised you would be to find out that the message being sent is S, given that you received R.
- For an ideal telegraph, H(S|R) = 0. For an ideal cypher, H(S|R) = H(S). For most real communication channels, H(S|R) is somewhere in between.

Noisy channel model

- The noisy channel model was developed by Shannon to describe optimal error-correcting codes.
- We can use noisy channel models to solve general decoding problems: given R, guess at S.
- For stochastic NLP, we imagine the observed text is the output of a noisy channel. The challenge is to find a decoder which minimizes the average surprise in finding out what the underlying message was, given the observed text. Formally, this comes down to minimizing H(S|R).

Noisy channel model

- Decoding via Bayes' Theorem:

    Ŝ = argmax_S P(S|R)
      = argmax_S P(S) P(R|S) / P(R)
      = argmax_S P(S) P(R|S)

- We need a model of underlying message probabilities P(S), plus a noise model P(R|S).

Noisy channel model

- First used for translation by IBM's T.J. Watson research lab in the 1970s:

    English --> P(French|English) --> French

  with a language model P(English).
- Many different problems can be posed as a noisy channel model. Noisy channel models are applicable whenever we want to guess an unknown thing given a known thing.
- Since we need to calculate the probability of an unobserved sequence, we need to use a Hidden Markov Model.

Markov models

[Figure 6.1: Instead of using a special start state, we can use initial state probabilities; the figure in (b) shows sample probabilities.]

[Figure 6.2: A Hidden Markov Model relating numbers of ice creams eaten by Jason (the observations) to the weather (H or C, the hidden variables), with emission probabilities B1: P(1|HOT)=.2, P(2|HOT)=.4, P(3|HOT)=.4 and B2: P(1|COLD)=.5, P(2|COLD)=.4, P(3|COLD)=.1. For this example we are not using an end state, instead allowing both states to be final (accepting) states.]

Markov models

- HMMs are widely used for tagging, speech recognition, machine translation, etc.
- Useful whenever we want to label a sequence of elements: time series analysis, bioinformatics.
- Other uses, e.g., linear interpolation.

Transformation-based learning

- For example, a baseline tagger might include the probabilities P(NN|race) = .98, P(VB|race) = .02.
- This will incorrectly give us sequences like:
    is/VBZ expected/VBN to/TO race/NN tomorrow/NN
- But one of the rules we'll learn is: Change NN to VB when the previous tag is TO.

Transformation-based learning

- Rule templates (each begins with "Change tag a to tag b when ..."):
  - The preceding (following) word is tagged z.
  - The word two before (after) is tagged z.
  - One of the two preceding (following) words is tagged z.
  - One of the three preceding (following) words is tagged z.
  - The preceding word is tagged z and the following word is tagged w.
  - The preceding (following) word is tagged z and the word two before (after) is tagged w.

  (Brill's (1995) templates. The variables a, b, z, and w range over parts of speech.)
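A sketch of applying one learned transformation, the "Change NN to VB when the previous tag is TO" rule from the race example above; the function name is illustrative:

    def apply_rule(tags, from_tag, to_tag, prev_tag):
        out = list(tags)
        for i in range(1, len(out)):
            if out[i] == from_tag and out[i - 1] == prev_tag:
                out[i] = to_tag
        return out

    # is expected to race tomorrow
    tags = ["VBZ", "VBN", "TO", "NN", "NN"]
    print(apply_rule(tags, "NN", "VB", "TO"))
    # ['VBZ', 'VBN', 'TO', 'VB', 'NN'] -- race is retagged VB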
Transformation-based learning

- Learning algorithm:

    function TBL(corpus) returns transforms-queue
      INITIALIZE-WITH-MOST-LIKELY-TAGS(corpus)
      until end condition is met do
        templates <- GENERATE-POTENTIAL-RELEVANT-TEMPLATES
        best-transform <- GET-BEST-TRANSFORM(corpus, templates)
        APPLY-TRANSFORM(best-transform, corpus)
        ENQUEUE(best-transform.rule, transforms-queue)
      return transforms-queue

    function GET-BEST-TRANSFORM(corpus, templates) returns transform
      for each template in templates
        (instance, score) <- GET-BEST-INSTANCE(corpus, template)
        if score > best-transform.score then best-transform <- (instance, score)
      return best-transform

    function GET-BEST-INSTANCE(corpus, template) returns transform
      for from-tag from tag-1 to tag-n do
        for to-tag from tag-1 to tag-n do
          for pos from 1 to corpus-size do
            if correct-tag(pos) == to-tag && current-tag(pos) == from-tag then
              num-good-transforms(current-tag(pos-1))++
            elsif correct-tag(pos) == from-tag && current-tag(pos) == from-tag then
              num-bad-transforms(current-tag(pos-1))++
          best-Z <- ARGMAX-t(num-good-transforms(t) - num-bad-transforms(t))
          if num-good-transforms(best-Z) - num-bad-transforms(best-Z)
              > best-instance.score then
            best-instance <- "Change tag from from-tag to to-tag
                              if prev tag is best-Z"
      return best-instance

    procedure APPLY-TRANSFORM(transform, corpus)
      for pos from 1 to corpus-size do
        if current-tag(pos) == best-rule.from
            && current-tag(pos-1) == best-rule.prev then
          current-tag(pos) <- best-rule.to

Transformation-based learning

- Results (the first 20 non-lexicalized transformations from Brill 1995 begin):

    #  Change tags   Condition                          Example
    1  NN -> VB      Previous tag is TO                 to/TO race/NN -> VB
    2  VBP -> VB     One of the previous 3 tags is MD   might/MD vanish/VBP -> VB
    3  NN -> VB      One of the previous 2 tags is MD   might/MD not reply/NN -> VB
    4  VB -> NN      One of the previous 2 tags is DT
    5  VBD -> VBN    One of the previous 3 tags is VBZ

- Word accuracy is around 97%, which is very close to the human ceiling.
- The final transforms queue can be compiled into a finite state transducer for efficient application.

HMM taggers

- If we make a few simplifying assumptions, HMMs provide a tool for adding context:
  - A word's tag only depends on the previous word's tag (the Markov property).
- From the training data, we need to collect probabilities for tag bigrams, P(t2|t1), and we need P(w|t).

HMM taggers

    DET -> NOUN -> VERB -> DET -> NOUN -> PREP -> DET -> NOUN
    The    representative  put   the   chairs   on   the   table

- Transition probabilities such as P(NOUN|DET) link tags to tags; emission probabilities such as P(representative|NOUN) link tags to words.

HMM taggers

- Markov models can be extended to take more context into account: bigrams, trigrams, etc.
- Problems with Markov taggers:
  - unknown words
  - long-distance relations
  - independence assumptions
- More sophisticated statistical models are needed.
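A sketch of estimating the two distributions an HMM tagger needs, P(t2|t1) and P(w|t), by relative frequency; the tiny corpus is an illustrative assumption:

    from collections import Counter

    corpus = [("the", "DET"), ("representative", "NOUN"), ("put", "VERB"),
              ("the", "DET"), ("chairs", "NOUN")]
    tags = [t for _, t in corpus]
    tag_bigrams = Counter(zip(tags, tags[1:]))
    tag_counts = Counter(tags)
    word_tag = Counter(corpus)

    def p_tag(t2, t1):
        return tag_bigrams[(t1, t2)] / tag_counts[t1]

    def p_word(w, t):
        return word_tag[(w, t)] / tag_counts[t]

    print(p_tag("NOUN", "DET"))   # 1.0: both DETs are followed by NOUN
    print(p_word("the", "DET"))   # 1.0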
Cardano s method we need to know how many possible outcomes there are There are six outcomes for one throw and each of those could followed up by six possible second throws 36 How many times to we win There are six ways to win on the first throw and five ways to win on the second 5 6 1 1 So the odds are 2511 against us or about 231 Pref Cardano gambler39s folklore put the odds at 21 Probability 0 Gamblers tend to talk about chance outcomes in terms of odds the ratio of favorable outcomes to unfavorable outcomes 0 In other contexts it is usually easier to talk about probability the ratio of favorable outcomes to total possible outcomes f p 7 For impossibilities there are no favorable outcomes p 7 Vi 0 For certainties all outcomes are favorable p rt Random experiments 0 To make reasoning about more complicated games easier we want a mathematical model which abstracts away from some the details 0 A random experiment or trial is a observation where 0 All possible outcomes are known in advance 0 The actual outcome is not known in advance and 0 The experiment can be carried out repeatedly under identical conditions 0 For a random experiment the set of possible outcomes is the sample space 9 Random variables 0 A random variable X is a function from a sample space to the set of real numbers X 2 IR 0 For example if the experiment is one roll of a die the sample space is 0 For this sample set a random variable X might be the number of pips showing Another random variable Y might be the number of 6 s thrown Events 0 Suppose X is a random variable for an experiment with sample space 9 An event B is a statement about X that picks out a subset of 2 SEQ Xs EB For example if the random variableX is the number of pips on one roll then one interesting event isX 4 the set of outcomes SEQ Xs4 The set of all possible events the event space 9 is the power set of 2 ie the set of all possible subsets ofthe sample space Probability 0 A probability distribution is a function from an event to a real number between 0 and 1 inclusive P 9quot a 01 that satisfies the conditions 0 The probability of a certain event is one PQ 1 0 For any mutually exclusive events A and B PM U B PA PB Theorems 0 The probability of an event and its complement must add up to one PQ 7A 1 7PA 0 The probability of an impossible event is zero P O 0 In the case where all outcomes are equally likely the probability of any one outcome IS PSEQ1 9 Probability 0 Take a standard 52 card deck Then the probability of drawing any particular card is 1 PM 7 E 0 What s the probability of drawing an ace llll l PAQ orAuln orAQ orAO Ti 0 Sum Rule PM or B PA PB whenA and B are mutually exclusive Draw Poker 0 Drawing to an inside straight A o O 0 V 0 Four cards any queen would complete the straight and there 47 cards left in the deck So Pstraight 417 m 009 0 The odds are 434 or about 10751 against making the straight Draw Poker 0 Drawing to an inside straight ush 0 Four cards any queen would complete the straight 0 Nine cards any spade would complete the flush 0 One card queen of spades would complete the straight flush Draw Poker 0 The chance of hitting a straight or a flush is P straight U ush P straight P ush 7 Pstraight ush 4 9 1 E t W E 2 47 The odds are 3512 or about 2921 against making a hand 0 Sum Rule PA or B PA PB 7PA and B Probability 0 Three card monte 0 Guess which card is the red queen 1 P 7 queen 3 0 Ifthe house is paying 2 for your 1 then it s a fair game right Conditional probability 0 Often the sample space 9 is too large to be interesting or 
even known Instead we only care about a subset of possible outcomes that have some relevant property 0 The conditional probability of an event B given event A is HEW PA B lA Bl XE lA Bl Tm m w w 0 The joint probability PAB is the probability of events A andB occurring together PA F B Conditional probability 0 Someone you have just met has two children What is the probability that they are both girls 0 Someone you have just met has two children At least one of them is a girl What is the probability that they are both girls 0 Someone you have just met has two children The oldest is a girl What is the probability that they are both girls Noisy channel model 0 One of Shannon s original motivations for developing information theory was to improve commuication in the face of noise 0 The noisy channel model 0 sender creates a message S 0 message is transmitted over a communication channel which produces random changes to the message 0 receiver gets a message R 0 The challenge is to guess what the original message S was based on R and our knowledge of source and noise distributions Noisy channel model 0 We can use conditional entropy H X Y to measure the amount of information conveyed by the communication channel 0 HSR is a measure of how surprised you would to be to find out that the message being sent is S given that you received R For an ideal telegraph HS R 0 For an ideal cypher HS R HS O For most real communication channels HSR is somewhere in between Noisy channel model 0 Noisy channel model developed by Shannon to describe optimal error correcting codes 0 We can use noisy channel models to solve general decoding problems given R guess at S 0 For stochastic NLE we imagine the observed text is the output of a noisy channel The challenge is to find a decoder which minimizes the average surprise in finding out what the underlying message was given the observed text Formally this comes down to minimizing HSR Noisy channel model 0 Decoding via Bayes Theorem S argmaxPSR S ar maxP SPR S gs Pa argmaxPS PRS S 0 We need a model of underlying message probabilities PS plus a noise model PR S Noisy channel model First used for translation by IBM s TJ Watson research lab in 1970 s English PEngjsh PFrenchEnglish gt French Many different problems can be posed as a noisy channel mode Noisy channel models are applicable whenever we want to guess an unknown thing given a known thing Since we need to calculate the probability of an unobserved sequence we need to use a Hidden Markov Model Markov models 0 An HMM MltY S A B consists of 0 a output alphabet Y1 b 0 a state space S1 c with a unique initial state so 0 a transition probability distribution As ls 0 an emission probability distribution BQ ls 0 HMMs are equivalent to weighted nite state automata with outputs on states Markov models m 61 mesa of smg a special star state pmbabilmes The gure in b slums sample pmbabdmes Markov models r Hrdden Markov Model for relaung numbers of ice creams eaten by Jason the obsemuons to the weather H or c the hidden mnables For this example we are not using an endrstate rnstead auowrng both states 1 and 2 to be a nal aeeepnng state Markov models Markov models 0 HMMs are widely used for tagging speech recognition 0 We want to assign a probability to a sequence of output machine translation etc symbols 0 Useful whenever we want to label a sequence of 0 Given an HMM M and a state sequence S so elements time series analysis bioinformatics 0 the probability of an output sequence O01 0 is 0 Other uses eg linear interpolation t MODEM 
HP0ilSi7M39 i1 t HB0iSi i1