LING/C SC/PSYC 438/538, Lecture 20 (Sandiway Fong)

Administrivia
- Midterm: not yet ready.

Last Time
- Gentle introduction to probability.
- Important notions: sample space, events, rule of counting, weighted events, probability, conditional probability p(A|B).
- The importance of conditional probability (or expectation) in language: "Just then, the white rabbit ..." The expectation is p(rabbit|white) > p(the|white) (conditional), even though p(the) > p(rabbit) (unconditional).

Language Models and N-grams
- Given a word sequence, the chain rule tells us how to compute the probability of the sequence:
  p(w1 w2 w3 ... wn) = p(w1) p(w2|w1) p(w3|w1 w2) ... p(wn|w1 ... wn-2 wn-1)
- Bigram approximation (1st-order Markov model): look only at the previous word, not all the preceding words (Markov assumption: finite-length history):
  p(w1 w2 w3 ... wn) ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn-1)
- Trigram approximation (2nd-order Markov model): look only at the preceding two words:
  p(w1 w2 w3 ... wn) ≈ p(w1) p(w2|w1) p(w3|w1 w2) p(w4|w2 w3) ... p(wn|wn-2 wn-1)

Language Models and N-grams: estimating from corpora
- How to compute bigram probabilities:
  p(wn|wn-1) = f(wn-1 wn) / Σw f(wn-1 w), where w ranges over any word.
  Since Σw f(wn-1 w) = f(wn-1), the unigram frequency of wn-1, this is simply
  p(wn|wn-1) = f(wn-1 wn) / f(wn-1)   (relative frequency)
- Note: the technique of estimating "true" probabilities using a relative frequency measure over a training corpus is known as maximum likelihood estimation (MLE).

Typical practice: log probabilities
- Log-prob calculations are used in practice.
- Question: why sum negative logs of probabilities?
- Answer: log(ABC) = log A + log B + log C, so products become sums. Probabilities lie in the range (0, 1) and we want them to be nonzero, so their logs are negative (up to 0); taking the negative makes them positive.

Motivation for smoothing
- Smoothing: avoid zero probability estimates.
- Consider p(w1 w2 w3 ... wn) ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn-1). What happens when any individual probability component is zero? By the multiplication law, 0 * x = 0, so the whole product is zero: very brittle.
- Even in a very large corpus, many possible n-grams over the vocabulary space will have zero frequency, particularly so for larger n-grams.

Language Models and N-grams: an example
Figure 6.4: Bigram counts for seven of the words (out of 1,616 total word types) in the Berkeley Restaurant Project corpus of ~10,000 sentences.

            I     want   to    eat   Chinese  food  lunch
  I         8     1087   0     13    0        0     0
  want      3     0      786   0     6        8     6
  to        3     0      10    860   3        0     12
  eat       0     0      2     0     19       2     52
  Chinese   2     0      0     0     0        120   1
  food      19    0      17    0     0        0     0
  lunch     4     0      0     0     0        1     0

Unigram frequencies: I 3437, want 1215, to 3256, eat 938, Chinese 213, food 1506, lunch 459.

Figure 6.5: Bigram probabilities for the same seven words in the same corpus (e.g., p(want|I) = 1087/3437 ≈ .32). The matrix is sparse: most entries are zero.

- The zeros render the probabilities unusable; we'll need to add "fudge factors", i.e., do smoothing.

Smoothing and N-grams
- A sparse dataset means zeros are a problem.
- Zero probabilities are a problem: in the bigram model
  p(w1 w2 w3 ... wn) ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn-1),
  one zero and the whole product is zero.
- Zero frequencies are a problem: in the relative-frequency estimate
  p(wn|wn-1) = f(wn-1 wn) / f(wn-1),
  the bigram f(wn-1 wn) may simply not exist in the dataset.
- Smoothing refers to ways of assigning zero-probability n-grams a nonzero value. We'll look at two ways here (just one of them today).
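To make the unsmoothed model concrete, here is a minimal Python sketch (not from the slides; the toy corpus and function names are assumptions) that estimates MLE bigram probabilities and scores a sentence by summing negative log probabilities. A single unseen bigram drives the score to infinity, which is exactly the brittleness described above.

```python
import math
from collections import Counter

def train_bigram_mle(sentences):
    """Collect unigram and bigram counts; MLE uses relative frequency:
    p(wn | wn-1) = f(wn-1 wn) / f(wn-1)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def neg_logprob_mle(sentence, unigrams, bigrams):
    """Sum of -log p(wn | wn-1) over the sentence; returns inf as soon as
    any bigram probability is zero (one zero and the whole product is zero)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0
        if p == 0.0:
            return float("inf")
        total += -math.log(p)
    return total

# Toy illustration (an assumed two-sentence corpus, not the Berkeley Restaurant data).
corpus = ["I want to eat Chinese food", "I want to eat lunch"]
uni, bi = train_bigram_mle(corpus)
print(neg_logprob_mle("I want to eat Chinese food", uni, bi))  # finite score
print(neg_logprob_mle("I want Chinese lunch", uni, bi))        # inf: unseen bigram (want, chinese)
```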
Smoothing and N-grams: Add-One Smoothing
- Add 1 to all frequency counts. Simple, and no more zeros, but there are better methods.
- Unigram case (N = size of corpus):
  p(w) = f(w) / N                       before Add-One
  p(w) = (f(w) + 1) / (N + V)           Add-One probability
  f*(w) = (f(w) + 1) * N / (N + V)      Add-One adjusted count
  where V = number of distinct words in the corpus; the total probability mass stays at 1, and N/(N + V) is a normalization factor adjusting for the effective increase in corpus size caused by Add-One.
- Bigram case:
  p(wn|wn-1) = f(wn-1 wn) / f(wn-1)                before Add-One
  p(wn|wn-1) = (f(wn-1 wn) + 1) / (f(wn-1) + V)    after Add-One

- Applying Add-One to the Figure 6.4 bigram counts gives the adjusted counts below (Figure 6.8); each adjusted count is (f(wn-1 wn) + 1) * f(wn-1) / (f(wn-1) + V):

            I      want     to      eat     Chinese  food    lunch
  I         6.12   740.05   0.68    9.52    0.68     0.68    0.68
  want      1.72   0.43     337.76  0.43    3.00     3.86    3.00
  to        2.67   0.67     7.35    575.41  2.67     0.67    8.69
  eat       0.37   0.37     1.10    0.37    7.35     1.10    19.47
  Chinese   0.35   0.12     0.12    0.12    0.12     14.09   0.23
  food      9.65   0.48     8.68    0.48    0.48     0.48    0.48
  lunch     1.11   0.22     0.22    0.22    0.22     0.44    0.22

- Remark: the perturbation problem. Add-One causes large changes in some frequencies because of the relative size of V (= 1,616 here): f(want to) drops from 786 to about 338.
- Remark: similar changes show up in the probabilities.

  Bigram probabilities before smoothing (Figure 6.5):

            I        want     to       eat      Chinese  food     lunch
  I         .00233   .31626   .00000   .00378   .00000   .00000   .00000
  want      .00247   .00000   .64691   .00000   .00494   .00658   .00494
  to        .00092   .00000   .00307   .26413   .00092   .00000   .00369
  eat       .00000   .00000   .00213   .00000   .02026   .00213   .05544
  Chinese   .00939   .00000   .00000   .00000   .00000   .56338   .00469
  food      .01262   .00000   .01129   .00000   .00000   .00000   .00000
  lunch     .00871   .00000   .00000   .00000   .00000   .00218   .00000

  Bigram probabilities after Add-One smoothing:

            I        want     to       eat      Chinese  food     lunch
  I         .00178   .21532   .00020   .00277   .00020   .00020   .00020
  want      .00141   .00035   .27799   .00035   .00247   .00318   .00247
  to        .00082   .00021   .00226   .17672   .00082   .00021   .00267
  eat       .00039   .00039   .00117   .00039   .00783   .00117   .02075
  Chinese   .00164   .00055   .00055   .00055   .00055   .06616   .00109
  food      .00641   .00032   .00577   .00032   .00032   .00032   .00032
  lunch     .00241   .00048   .00048   .00048   .00048   .00096   .00048

Smoothing and N-grams: where does the probability mass go?
- Let's illustrate the probability-mass problem for the bigram case:
  p(wn|wn-1) = f(wn-1 wn) / f(wn-1)
  Suppose there are bigrams wn-1 w01, ..., wn-1 w0m that don't occur in the corpus:
  f(wn-1 w01) = 0, ..., f(wn-1 w0m) = 0.
- Add-One gives everyone +1: the seen bigram count becomes f(wn-1 wn) + 1, and f(wn-1 w01) = 1, ..., f(wn-1 w0m) = 1.
- This is a redistribution of probability mass:
  p(wn|wn-1) = (f(wn-1 wn) + 1) / (f(wn-1) + V)
  Seen bigrams give up some of their mass to the previously unseen ones.
- An Excel spreadsheet is available: addone.xls.
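A minimal sketch of the Add-One estimate, reusing `uni` and `bi` from the earlier sketch; the function name and the use of the observed type count as V are assumptions for illustration.

```python
def addone_bigram_prob(prev, cur, unigrams, bigrams):
    """Add-One (Laplace) smoothing: p(wn | wn-1) = (f(wn-1 wn) + 1) / (f(wn-1) + V)."""
    V = len(unigrams)  # number of distinct word types
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)

# Every bigram, seen or unseen, now gets a nonzero probability.
print(addone_bigram_prob("want", "to", uni, bi))       # seen bigram, slightly discounted
print(addone_bigram_prob("want", "chinese", uni, bi))  # unseen bigram, small but nonzero
```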
Smoothing and N-grams: Witten-Bell Smoothing
- Idea: equate zero-frequency items with frequency-one items; use the frequency of things seen once to estimate the frequency of things we haven't seen yet. Smaller impact than Add-One.
- Unigram case: a zero-frequency word (unigram) is an event that hasn't happened yet. Count the number of different word types T we've observed in the corpus; then for a word w with zero frequency,
  p(w) = T / (Z (N + T))
  where Z = number of zero-frequency words and N = size of the corpus.
- Bigram case:
  original (MLE):                   p(wn|wn-1) = f(wn-1 wn) / f(wn-1)
  zero bigrams, after Witten-Bell:  p(wn|wn-1) = T(wn-1) / (Z(wn-1) * (f(wn-1) + T(wn-1)))
  where
  T(wn-1) = number of seen bigram types beginning with wn-1
  Z(wn-1) = number of unseen bigrams beginning with wn-1
          = V - T(wn-1)   (possible bigrams beginning with wn-1, minus the ones we've seen)
  nonzero bigrams, after Witten-Bell:  p(wn|wn-1) = f(wn-1 wn) / (f(wn-1) + T(wn-1))
- The estimated count for a zero bigram is
  f*(wn-1 wn) = (T(wn-1) / Z(wn-1)) * f(wn-1) / (f(wn-1) + T(wn-1))
- Applying Witten-Bell to the Figure 6.4 counts gives the adjusted counts below (Figure 6.9). Remark: the changes are much smaller than with Add-One.

            I       want      to       eat      Chinese  food     lunch
  I         7.785   1057.763  0.061    12.650   0.061    0.061    0.061
  want      2.823   0.046     739.729  0.046    5.647    7.529    5.647
  to        2.885   0.084     9.616    826.982  2.885    0.084    11.539
  eat       0.073   0.073     1.766    0.073    16.782   1.766    45.928
  Chinese   1.828   0.011     0.011    0.011    0.011    109.700  0.914
  food      18.019  0.051     16.122   0.051    0.051    0.051    0.051
  lunch     3.643   0.026     0.026    0.026    0.026    0.911    0.026

- Witten-Bell smoothing implementation in Excel: spreadsheet wb.xls, with a one-line formula per cell and a second, rescaled sheet holding the adjusted counts.
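A rough sketch of the Witten-Bell bigram estimate described above, again reusing `uni` and `bi` from the first sketch; treating V as the number of observed word types is an assumption made for illustration.

```python
def witten_bell_bigram_prob(prev, cur, unigrams, bigrams):
    """Witten-Bell: seen bigrams get f(wn-1 wn) / (f(wn-1) + T(wn-1));
    the reserved mass is split evenly over the Z(wn-1) = V - T(wn-1)
    unseen bigram types starting with wn-1."""
    V = len(unigrams)                                # vocabulary size (assumed = observed types)
    T = sum(1 for (w1, _) in bigrams if w1 == prev)  # seen bigram types starting with prev
    Z = V - T                                        # unseen bigram types starting with prev
    if bigrams[(prev, cur)] > 0:
        return bigrams[(prev, cur)] / (unigrams[prev] + T)
    return T / (Z * (unigrams[prev] + T))            # assumes prev was seen (T, Z > 0)

print(witten_bell_bigram_prob("want", "to", uni, bi))       # seen bigram
print(witten_bell_bigram_prob("want", "chinese", uni, bi))  # unseen bigram, small but nonzero
```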
Language Models and N-grams: how good are they?
- N-gram models are technically easy to compute, in the sense that lots of training data are available. But just how good are these n-gram language models, and what can they show us about language?

Approximating Shakespeare (section 6.2)
- Generate random sentences using n-grams; train on the complete works of Shakespeare.
- Unigram (pick random, unconnected words):
  (a) To him swallowed confess hear both Which Of save on trail for are ay device and rote life have
  (b) Every enter now severally so let
  (c) Hill he late speaks or a more to leg less first you enter
  (d) Will rash been and by I the me loves gentle me not slavish page the and hour ill let
  (e) Are where exeunt and sighs have rise excellency took of Sleep knave we near vile like
- Bigram:
  (a) What means sir I confess she then all sorts he is trim captain
  (b) Why dost stand forth thy canopy forsooth he is this palpable hit the King Henry Live king Follow
  (c) What we hath got so she that I rest and sent to scold and nature bankrupt nor the first gentleman
  (d) Enter Menenius if it so many good direction found'st thou art a strong upon command of fear not a liberal largess given away Falstaff Exeunt
  (e) Thou whoreson chops Consumption catch your dearest friend well and I know where many mouths upon my undoing all but be how soon then we'll execute upon my love's bonds and we do you will
  (f) The world shall my lord
- Trigram:
  (a) Sweet prince Falstaff shall die Harry of Monmouth's grave
  (b) This shall forbid it should be branded if renown made it empty
  (c) What is't that cried
  (d) Indeed the duke and had a very good friend
  (e) Fly and will rid me these news of price Therefore the sadness of parting as they say 'tis done
  (f) The sweet How many then shall posthumus end his miseries
- Quadrigram:
  (a) King Henry What I will go seek the traitor Gloucester Exeunt some of the watch A great banquet serv'd in
  (b) Will you not tell me who I am
  (c) It cannot be but so
  (d) Indeed the short and the long Marry 'tis a noble Lepidus
  (e) They say all lovers swear more performance than they are wont to keep obliged faith unforfeited
  (f) Enter Leonato's brother Antonio and the rest but seek the weary beds of people sick

Remarks
- Dataset-size problem: the training set is small (884,647 words; 29,066 different word types; 29,066^2 = 844,832,356 possible bigrams). For the random sentence generator this means very limited choices for possible continuations, which means the program can't be very innovative for higher n.
- Possible application (aside): write-alike contests such as the Faux Faulkner and International Imitation Hemingway contests run by Hemispheres magazine (hemispheresmagazine.com/contests), i.e., generating text in the style of a particular author.

Language Models and N-grams: a consequence of smoothing
- One consequence of smoothing is that every possible concatenation (sequence) of words has a nonzero probability.

Colorless green ideas
- Examples:
  (1) colorless green ideas sleep furiously
  (2) furiously sleep ideas green colorless
- Chomsky (1957): "It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not."
- The idea: (1) is syntactically valid; (2) is word salad.

Statistical experiment (Pereira 2002)
- Use a bigram language model:
  p(w1 ... wn) = p(w1) * ∏(i = 2..n) p(wi | wi-1)
- Pereira: "Using this estimate for the probability of a string and an aggregate model with C = 16 trained on newspaper text using the expectation-maximization method (Dempster, Laird & Rubin 1977), we find that p(Colorless green ideas sleep furiously) / p(Furiously sleep ideas green colorless) ≈ 2 x 10^5. Thus a suitably constrained statistical model, even a very simple one, can meet Chomsky's particular challenge."
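As a hypothetical illustration of this kind of experiment (not Pereira's class-based aggregate model, and only meaningful with a large English training corpus), one could score both orderings under the Add-One smoothed bigram model sketched earlier:

```python
import math

def sentence_logprob(sentence, unigrams, bigrams):
    """Log probability of a sentence under the Add-One smoothed bigram model above."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log(addone_bigram_prob(prev, cur, unigrams, bigrams))
               for prev, cur in zip(tokens, tokens[1:]))

s1 = "colorless green ideas sleep furiously"
s2 = "furiously sleep ideas green colorless"
# With realistic training data the difference should be positive (s1 more probable);
# on the toy counts above it is uninformative.
print(sentence_logprob(s1, uni, bi) - sentence_logprob(s2, uni, bi))
```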
Interesting things to Google
- Example: "colorless green ideas sleep furiously"
- First hit: Chomsky's famous sentence "Colorless green ideas sleep furiously" is examined and is shown to be a specimen of irony rather than being meaningless.
- Another hit (compositional semantics): a green idea is, according to well-established usage of the word "green", one that is new and untried; a colorless idea is one without vividness, dull and unexciting; so it follows that a colorless green idea is a new, untried idea that is without vividness, dull and unexciting. To sleep is, among other things, to be in a state of dormancy or inactivity, or in a state of unconsciousness. To sleep furiously may seem a puzzling turn of phrase, but one reflects that the mind in sleep often indeed moves furiously, with ideas and images flickering in and out.
- Another hit (a story): "So this is our ranking system," said Chomsky. "As you can see, the highest rank is yellow." "And the new ideas?" "The green ones? Oh, the green ones don't get a color until they've had some seasoning. These ones, anyway, are still too angry. Even when they're asleep, they're furious. We've had to kick them out of the dormitories; they're just unmanageable." "So where are they?" "Look," said Chomsky, and pointed out of the window. There below, on the lawn, the colorless green ideas slept furiously.

More on N-grams
- Topics: (1) backoff and (2) deleted interpolation, both of which address how to degrade gracefully when we don't have evidence, and (3) n-grams and spelling correction.

Backoff
- Idea: a hierarchy of approximations, trigram > bigram > unigram; degrade gracefully.
- Given a word sequence fragment wn-2 wn-1 wn, use the preference rule:
  1. p(wn | wn-2 wn-1)    if f(wn-2 wn-1 wn) ≠ 0
  2. α1 * p(wn | wn-1)    if f(wn-1 wn) ≠ 0
  3. α2 * p(wn)           otherwise
- Note: α1 and α2 are fudge factors to ensure that the probabilities still sum to 1.
- Problem: if f(wn-2 wn-1 wn) = 0, we use one of the estimates from (2) or (3). Assume the backoff value is nonzero; then we are introducing a nonzero probability for p(wn | wn-2 wn-1), which is zero in the corpus. This adds probability mass to p(· | wn-2 wn-1) that was not in the original system, so we have to be careful to juggle the probabilities so that they still sum to 1.

Deleted Interpolation
- Fundamental idea: interpolate the estimates. The trigram equation is
  p(wn | wn-2 wn-1) = λ1 * p(wn | wn-2 wn-1) + λ2 * p(wn | wn-1) + λ3 * p(wn)
- Note: λ1, λ2 and λ3 are fudge factors to ensure that the probabilities still sum to 1.
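A rough sketch of both recipes under assumed weights (the λ and α values below are hard-coded for illustration; in practice they are tuned, for example on held-out data, so that the probabilities remain normalized):

```python
from collections import Counter

def train_ngrams(sentences):
    """Unigram, bigram and trigram counts with sentence-boundary padding."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent.lower().split() + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
        tri.update(zip(toks, toks[1:], toks[2:]))
    return uni, bi, tri

def interpolated_prob(w1, w2, w3, uni, bi, tri, lambdas=(0.6, 0.3, 0.1)):
    """Deleted interpolation: p(w3 | w1 w2) = l1*p_trigram + l2*p_bigram + l3*p_unigram.
    The lambdas must sum to 1; the fixed values here are purely illustrative."""
    l1, l2, l3 = lambdas
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    p_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p_uni = uni[w3] / sum(uni.values())
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

def backoff_prob(w1, w2, w3, uni, bi, tri, alpha1=0.4, alpha2=0.4):
    """Backoff preference rule: trigram if seen, else alpha1 * bigram if seen,
    else alpha2 * unigram. The alphas stand in for properly normalized weights."""
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha1 * bi[(w2, w3)] / uni[w2]
    return alpha2 * uni[w3] / sum(uni.values())

corpus = ["I want to eat Chinese food", "I want to eat lunch"]
uni, bi, tri = train_ngrams(corpus)
print(interpolated_prob("want", "to", "eat", uni, bi, tri))  # seen trigram
print(backoff_prob("to", "eat", "food", uni, bi, tri))       # unseen trigram and bigram: backs off to the unigram
```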