These 5-page class notes were uploaded by Magnus Kshlerin on Tuesday, October 20, 2015. The notes belong to LING581 at San Diego State University, taught by Staff in Fall. Since the upload, they have received 38 views. For similar materials see /class/225287/ling581-san-diego-state-university in Linguistics at San Diego State University.
Date Created: 10/20/15
Homework
- Read Jurafsky and Martin (1st or 2nd ed.), Chapter 5
- Midterm (sic) exam
  - Out Friday, April 11
  - Due Friday, April 18
  - Open book, open notes, but work on it by yourself

Tagging
- Part-of-speech tagging assigns a grammatical category to tokens in a corpus
- Since words may potentially occur as more than one part of speech, tagging is a limited kind of disambiguation

  The/DET representative/NOUN put/VERB the/DET chairs/NOUN on/PREP the/DET table/NOUN ./PERIOD

- Tagging can be done by hand, automatically, or as a combination of the two

Parts of speech
- Traditional grammar posits eight parts of speech, identified by a mix of semantic, functional, and distributional criteria
- The only reliable way to define parts of speech is by reference to the other parts of speech that they combine with
- Major distinction is between open class and closed class words, also called content words and function words
- Major open class categories: noun, verb, adjective, adverb
- Closed class categories: prepositions, determiners, particles, pronouns, conjunctions

Tagging
- Various tag sets

  Word      CLAWS5   Brown   Penn   ICE
  She       PNP      PPS     PRP    PRON(pers,sing)
  was       VBD      BEDZ    VBD    AUX(pass,past)
  told      VVN      VBN     VBN    V(ditr,edp)
  that      CJT      CS      IN     CONJUNC(subord)
  the       AT0      AT      DT     ART(def)
  journey   NN1      NN      NN     N(com,sing)
  might     VM0      MD      MD     AUX(modal,past)
  kill      VVI      VB      VB     V(montr,infin)
  her       PNP      PPO     PRP    PRON(poss,sing)
  .         PUN      .       .      PUNC(per)

Tagging
- Tag sets differ greatly in the number and kind of distinctions they make
  - Brown: 179
  - Penn Treebank: 45
  - CLAWS1: 132
  - CLAWS2: 166
  - CLAWS5: 65
  - London-Lund: 197
- Tagsets are language and application dependent

Tagging
- Penn tagset

  Tag    Description               Example
  CC     Coordin. conjunction      and, but, or
  CD     Cardinal number           one, two, three
  DT     Determiner                a, the
  EX     Existential "there"       there
  FW     Foreign word              mea culpa
  IN     Preposition/sub-conj      of, in, by
  JJ     Adjective                 yellow
  JJR    Adj., comparative         bigger
  JJS    Adj., superlative         wildest
  LS     List item marker          1, 2, One
  MD     Modal                     can, should
  NN     Noun, sing. or mass       llama
  NNS    Noun, plural              llamas
  NNP    Proper noun, singular     IBM
  NNPS   Proper noun, plural       Carolinas
  PDT    Predeterminer             all, both
  POS    Possessive ending         's
  PRP    Personal pronoun          I, you, he
  PRP$   Possessive pronoun        your, one's
  RB     Adverb                    quickly, never
  RBR    Adverb, comparative       faster
  RBS    Adverb, superlative       fastest
  RP     Particle                  up, off
  SYM    Symbol                    +, %, &
  TO     "to"                      to
  UH     Interjection              ah, oops
  VB     Verb, base form           eat
  VBD    Verb, past tense          ate
  VBG    Verb, gerund              eating
  VBN    Verb, past participle     eaten
  VBP    Verb, non-3sg pres        eat
  VBZ    Verb, 3sg pres            eats
  WDT    Wh-determiner             which, that
  WP     Wh-pronoun                what, who
  WP$    Possessive wh-            whose
  WRB    Wh-adverb                 how, where
  $      Dollar sign               $
  #      Pound sign                #
  ``     Left quote                ` or ``
  ''     Right quote               ' or ''
  (      Left parenthesis          [, (, {, <
  )      Right parenthesis         ], ), }, >
  ,      Comma                     ,
  .      Sentence-final punc       . ! ?
  :      Mid-sentence punc         : ; ... -- -

Tagging
- Examples

  The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
  There/EX are/VBP 70/CD children/NNS there/RB
  Although/IN preliminary/JJ findings/NNS were/VBD reported/VBN more/RBR than/IN a/DT year/NN ago/IN ,/, the/DT latest/JJS results/NNS appear/VBP in/IN today/NN 's/POS New/NNP England/NNP Journal/NNP of/IN Medicine/NNP ,/,
  Mrs./NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
  All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
  Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

Tagging
- Some distinctions cannot be made reliably, for example particle versus preposition:

  Pat worked through the problem.
  Pat worked the problem through.
  Pat walked through the door.
  *Pat walked the door through.

- More detailed tagsets can paradoxically be easier to apply
  - auxiliaries (be, do, have)
  - modals (can, will)
  - gerunds: Pat's constantly humming showtunes is annoying.

Tagging
- More challenges

  cotton/NN sweater/NN
  income-tax/JJ return/NN
  the/DT Gramm-Rudman/NP Act/NP
  Chinese/NN cooking/NN
  Pacific/NN waters/NNS
  They were married/VBN by the Justice of the Peace.
  At the time, she was already married/JJ.

Formal languages
- Chomsky hierarchy: the regular languages are contained in the context-free languages, which are contained in the context-sensitive languages
- Regular languages constitute a subset of possible formal languages
- A language is a regular language if and only if it can be described using a regular expression or an FSA
- It's easy to prove that a language is regular, harder to prove that it's not

Pumping Lemma
- The Pumping Lemma is a useful tool for showing that a language isn't regular
- The key intuition: any non-finite regular language must have a loop somewhere in its corresponding FSA

  [Figure: FSA with a path labeled x, a loop labeled y, and a path labeled z]

- The FSA accepts $xyz$, but must also accept $xz$, $xyyz$, $xyyyz$, etc., or $xy^nz$ in general

Pumping Lemma
- Pumping Lemma: Let $L$ be an infinite regular language. Then there are strings $x$, $y$, and $z$ such that $y \neq \epsilon$ and $xy^nz \in L$ for $n \geq 0$.
- Every infinite regular language has some substring $y$ that can be pumped
- Non-regular languages may also have strings that can be pumped: the pumping lemma gives a necessary but not sufficient condition for a language to be regular
- Example of a language the lemma shows to be non-regular: $a^nb^n$
- A condition that is both necessary and sufficient is given by the Myhill-Nerode theorem

Pumping Lemma
- Is English syntax a regular language? (Partee et al. 1990, Chapter 16)
- Center embedding:

  The cat likes tuna fish.
  The cat the dog chased likes tuna fish.
  The cat the dog the rat bit chased likes tuna fish.
  The cat the dog the rat the elephant admired bit chased likes tuna fish.

- These examples all have the form (the + noun)$^n$ (transitive verb)$^{n-1}$ likes tuna fish

Pumping Lemma
- Let $A$ be the set of noun phrases (the cat, the dog, ...) and $B$ the set of transitive verbs (chased, bit, ...)
- Take the intersection of the set of English sentences and the regular language $A^* B^*$ likes tuna fish
- This gives us the language $L = \{ x^n y^{n-1} \text{ likes tuna fish} \mid x \in A, y \in B \}$, which is not regular by the Pumping Lemma
- Since regular languages are closed under intersection, English must not be a regular language
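
The intersection argument can be made concrete with a small sketch. Everything named here is an assumption for illustration and does not come from the notes: the toy vocabularies `A` and `B`, and the helper functions `in_regular_superset`, `in_center_embedded`, and `pump`. The regular expression stands in for the regular language A* B* likes tuna fish, and the counting check stands in for the intersection with English (n noun phrases, n-1 verbs). Pumping a block of noun phrases keeps a string inside the regular language but breaks the count.

```python
import re

# Toy vocabularies (assumptions for illustration, not taken from the notes).
A = ["the cat", "the dog", "the rat", "the elephant"]  # noun phrases
B = ["chased", "bit", "admired"]                       # transitive verbs

NP = "(?:" + "|".join(re.escape(w) for w in A) + ")"
TV = "(?:" + "|".join(re.escape(w) for w in B) + ")"

# Regular language A* B* "likes tuna fish"; the two groups capture
# the run of noun phrases and the run of verbs.
PATTERN = re.compile(rf"((?:{NP} )*)((?:{TV} )*)likes tuna fish")


def in_regular_superset(sentence):
    """True if the sentence is in A* B* likes tuna fish."""
    return PATTERN.fullmatch(sentence) is not None


def in_center_embedded(sentence):
    """True if the sentence has n >= 1 noun phrases and exactly n-1 verbs."""
    match = PATTERN.fullmatch(sentence)
    if match is None:
        return False
    n_nouns = len(re.findall(NP, match.group(1)))
    n_verbs = len(re.findall(TV, match.group(2)))
    return n_nouns >= 1 and n_verbs == n_nouns - 1


def pump(x, y, z, n):
    """Build the string x y^n z."""
    return x + y * n + z


if __name__ == "__main__":
    x, y, z = "the cat ", "the dog ", "the rat bit chased likes tuna fish"
    for n in range(4):
        s = pump(x, y, z, n)
        print(n, in_regular_superset(s), in_center_embedded(s), s)
```

This checks only one way of splitting the sentence into x, y, and z. The actual proof has to show that every possible choice of a non-empty y fails, which is why the lemma is stated over all decompositions.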