LANG DIVERSIFICATION AND DEATH
This 51 page Class Notes was uploaded by Erich Mueller on Sunday September 6, 2015. The Class Notes belongs to LIN 392 at University of Texas at Austin taught by Staff in Fall. Since its upload, it has received 48 views. For similar materials see /class/181578/lin-392-university-of-texas-at-austin in Linguistics at University of Texas at Austin.
Date Created: 09/06/15
Introduction to Unix and Python
LIN 392: Working with Corpora, Spring 2009
January 19, 2009

A collection of Unix commands

• Changing your password: passwd
• Directories
  - print current: pwd
  - list contents: ls <path>
  - change directory: cd <path>
  - make/remove directory: mkdir <path>, rmdir <path>
  - special paths: . (current), .. (parent), ~ (home)
• Managing files
  - copying: cp <filename> <path>
  - moving: mv <filename> <path>
  - removing: rm <filename>
• Getting a shell on another computer: ssh <user>@<machine name>

First steps in Python

Python has a shell, which you can invoke from the Unix shell. Invoke the Python shell to be able to execute Python commands. In fact, there are two ways of getting a Python shell. A bit more basic:

    python

Or, with more advanced editing options:

    idle

The idle shell executes commands that you type in after the prompt >>>. Type in a command after the >>> to have it executed. In examples from the idle shell, everything after >>> is what you need to type into the shell. Lines that don't start with >>> show the answer that the Python shell gives. We start examples from the idle shell with IDLE.

IDLE
    >>> print "Hello world"
    Hello world
    >>> print 2
    2
    >>> print 2+3
    5
    >>> print 6*7
    42
    >>> print 2**3
    8

print is a command, "Hello world" is an argument or parameter. The command print causes something to be printed to the screen. Expressions in arguments are evaluated before the command is executed. 2+3 is an expression that evaluates to 5. 6*7 is an expression that evaluates to 42. 2**3 evaluates to 8, because ** is used to say "to the power of" in Python.

Escape characters in strings

IDLE
    >>> print "Hello world"
    Hello world
    >>> print "Hello \n world"
    Hello
     world
    >>> print "\tHello world"
            Hello world

• \n stands for newline
• \t stands for tabstop

Executing a whole program at once

• In your favorite editor, type the following lines and save the result (as ASCII) as first.py:

    print "Hello world"
    print "This is another string"

  Note that this is not an idle shell example, as you can see from the fact that the typed text does not start with IDLE. These lines are lines that you need to type; they are not responses by the idle shell. You only see >>> when you are interactively typing Python commands into the idle shell, not when you write programs that you want to process as a whole.
• Get a Unix shell (not a Python shell). Make sure you are in the directory where you stored first.py. Type:

    python first.py

So you can execute a sequence of Python commands, a Python program, that you have collected in a file by passing the file on to python in a Unix shell. This is useful if you have complex sequences of steps that you want to apply to more than one dataset, or when you have a program that you want to save because you may need it again later. We will see much more complex programs later.

Comments in Python

Whatever occurs in a line after a # is ignored by Python. If # is the first character in the line, the whole line is ignored. Comments are only useful when you write longer programs that you execute as a whole.

    # first.py
    # Katrin Erk, January 2009
    # This program prints "Hello world" to the
    # screen
    print "Hello world"   # another comment

Writing useful comments

• At the beginning of the program
  - Who wrote it? When?
  - What does it do?
• Within the program
  - Headings for different parts of the program, for example

        # Read corpus
        # Count # of occurrences of each word
        # Print number of occurrences to the screen

    Many people use sequences of #'s to section off parts of a program.
  - Explain anything that is not obvious
• Deactivate code ("commenting out")

        # print "Hello world"
        print "goodbye cruel world"

Data types

print 27 is a command consisting of a command name, print, and a parameter, 27. The parameter is a piece of data that is passed on to the print command. In this case, this piece of data happens to be an integer number, 27. In print 4*9, the parameter is also an integer, the
result of evaluating the expression 4*9 to 36. In print "Hello world", the parameter "Hello world" is a string.

If you apply + to numbers, you add them. If you apply + to strings, you concatenate them.

IDLE
    >>> print 2+3
    5
    >>> print "Hello " + "world"
    Hello world

• What can you do with numbers? Add, subtract, multiply, ...
• What can you do with strings? Concatenate, shorten, ...
• In general, what you can do with data depends on what kind of beast it is.
• Each piece of data has a type.

What happens if you don't pay attention to data types?

IDLE
    >>> print 2+3
    5
    >>> print "Hello " + "world"
    Hello world
    >>> print "Hello " + 2
    Traceback (most recent call last):
      File "<pyshell#35>", line 1, in <module>
        print "Hello " + 2
    TypeError: cannot concatenate 'str' and 'int' objects

This produces an error. Note: when there is an error, Python tries to tell you pretty precisely what is wrong. Reading the error message is often the fastest way of figuring out what went wrong.

You can view the type of a piece of data:

IDLE
    >>> print type(1)
    <type 'int'>
    >>> print type("Hello world")
    <type 'str'>

Variables

In Python you can pass pieces of data to a command name to be processed, as we did in print "Hello world". You can also store a piece of data in a variable:

    >>> name = "world"
    >>> print "Hello " + name
    Hello world
    >>> name = "Katrin"
    >>> print "Hello " + name
    Hello Katrin

Variables are like containers. Python evaluates a variable like it evaluates an expression, and uses whatever it finds in the variable.

Variable names

You choose the names for the variables you use. Ideally, choose names that will reflect the kind of content you put into the variable. This will help you understand your own programs when you re-read them. What can you choose the name of a variable to be?

• Variable names may contain letters, numbers, and the underscore.
• They must not start with a number.
• They must not be identical to one of the "reserved words" that Python has
already defined.

Storing data in variables

A command var = expression assigns to the variable var the result of evaluating expression.

IDLE
    >>> name = "hello"
    >>> print name
    hello
    >>> name = 3+4
    >>> print name
    7

As you can see, the expression 3+4 is evaluated before storing the result in name. We can also do this:

IDLE
    >>> a = 2
    >>> b = 3
    >>> var = a + b
    >>> print var
    5

What happens in var = a + b?

• The variable a is evaluated to 2.
• The variable b is evaluated to 3.
• The expression 2 + 3 is evaluated to 5.
• The result is stored in the variable var.

You can update a variable:

IDLE
    >>> var = 1
    >>> var = var + 1
    >>> print var
    2

We will use variable updating extensively later, when we use loops.

Variables have types, too:

IDLE
    >>> a = "hello"
    >>> b = 2
    >>> print a + b
    Traceback (most recent call last):
      File "<pyshell#41>", line 1, in <module>
        print a + b
    TypeError: cannot concatenate 'str' and 'int' objects

Functions

Sometimes you will want to apply the same piece of program code to different pieces of data. For example, a piece of code that transforms Fahrenheit to Celsius temperatures:

IDLE
    >>> print (67 - 32) * 5 / 9
    19
    >>> print (74 - 32) * 5 / 9
    23
    >>> print (100 - 32) * 5 / 9
    37

67 °F are 19 °C, 74 °F are 23 °C, 100 °F are 37 °C.

We can generalize a bit, using a variable. This will be helpful since we are now reusing the exact same line, and the idle shell makes this easy:

IDLE
    >>> temp = 67
    >>> print (temp - 32) * 5 / 9
    19
    >>> temp = 74
    >>> print (temp - 32) * 5 / 9
    23
    >>> temp = 100
    >>> print (temp - 32) * 5 / 9
    37

Defining a function

We can give a name to a piece of code:

IDLE
    >>> def fahrenheit_to_celsius(temp):
            print (temp - 32) * 5 / 9

We are assigning to the name fahrenheit_to_celsius a piece of code that will be executed whenever we call fahrenheit_to_celsius. We put a variable name in brackets behind fahrenheit_to_celsius. When we use our new function, we put a value into the brackets. Then this value gets substituted for temp, and the function is executed with it.

IDLE
    >>> fahrenheit_to_celsius(87)
    30
    >>> fahrenheit_to_celsius(92)
    33

This is a bit like functions in mathematics. They, too, are defined with arguments (parameters) in brackets:

    f(x) = 2x^2 - 1

Then f(3) is the result you get when you substitute 3 for x in the function definition: 17.

We could also have defined our Fahrenheit-to-Celsius function in a slightly more involved way:

IDLE
    >>> def fahrenheit_to_celsius(temp):
            print "Degrees Fahrenheit:", temp
            print "Degrees Celsius:", (temp - 32) * 5 / 9

    >>> fahrenheit_to_celsius(87)
    Degrees Fahrenheit: 87
    Degrees Celsius: 30
    >>> fahrenheit_to_celsius(92)
    Degrees Fahrenheit: 92
    Degrees Celsius: 33

There are several things to note here:

1. The print command is able to handle multiple parameters, separated by commas:

   IDLE
       >>> print 1, 2
       1 2

   It inserts a single blank space between the things it prints, and prints them all on the same line. Except, of course, when the string it gets to print contains a newline:

   IDLE
       >>> print 1, "\n", 2
       1
       2

2. In the print command in print "Degrees Celsius:", (temp - 32) * 5 / 9, the rather complex expression (temp - 32) * 5 / 9 is evaluated before it is handed on to print. This is not new, but it is a nice example of how complex expressions can be.

3. How did Python know when the function definition was ended? (It comprised two lines of code in this case.) Python knew by the indentation of the code:

       >>> def fahrenheit_to_celsius(temp):
               print "Degrees Fahrenheit:", temp
               print "Degrees Celsius:", (temp - 32) * 5 / 9

   The second and third line have the same indentation, by one tabstop. Python uses the indentation to figure out when a block of code ends. If you change the amount of whitespace, Python gets confused:

   IDLE
       >>> def fahrenheit_to_celsius(temp):
               print "Degrees Fahrenheit:", temp
             print "Degrees Celsius:", (temp - 32) * 5 / 9
       SyntaxError: invalid syntax

4. Note the general shape of the function definition:

       def functionname(parameters):
           indented code
           indented code
           indented code

   You will see the general shape

       <something>:
           indented code

   often in Python.

Functions that have values

We can assign the value
of an expression to a variable:

IDLE
    >>> var = 3 * 9
    >>> print var
    27

The function fahrenheit_to_celsius that we have defined above does not yield a value:

IDLE
    >>> def fahrenheit_to_celsius(temp):
            print (temp - 32) * 5 / 9

    >>> x = fahrenheit_to_celsius(87)
    30
    >>> x
    >>>

Or rather, it yields the special value None, which we will get to know later. We could define it in a different way, such that it yields a value:

IDLE
    >>> def new_fahrenheit_to_celsius(temp):
            return (temp - 32) * 5 / 9

Note that the new_fahrenheit_to_celsius function definition uses the reserved word return instead of issuing a print command. Now that we have used the return command, the value of the function becomes the value of the expression handed to return, in our case (temp - 32) * 5 / 9. This value can be printed, and it can be used as part of a larger expression:

IDLE
    >>> x = new_fahrenheit_to_celsius(87)
    >>> print x
    30
    >>> print new_fahrenheit_to_celsius(74) + 1
    24

Note that we cannot do this with the old fahrenheit_to_celsius function, which, as you remember, yields a value of None:

IDLE
    >>> print fahrenheit_to_celsius(74) + 1
    23
    Traceback (most recent call last):
      File "<pyshell#137>", line 1, in <module>
        print fahrenheit_to_celsius(74) + 1
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Python tries to evaluate fahrenheit_to_celsius(74), and in doing so executes the print command. Afterwards, it tries to evaluate fahrenheit_to_celsius(74) + 1 and runs into an error. As you can see, the data type of None is NoneType.

What does return do, exactly?

IDLE
    >>> def testingreturn(value):
            print "This is the first print statement", value
            return "xyz"
            print "This is the second print statement"

    >>> print testingreturn(3)
    This is the first print statement 3
    xyz

What does this mean? Where does the "xyz" come from? Why didn't Python print "This is the second print statement"?

Functions with more than one argument

If you want to write a function with more than one argument, separate the arguments with a comma:

IDLE
    >>> def concatstringpair(s1, s2):
            return s1 + s2

    >>> print concatstringpair("hello", " world")
    hello world

Expressions and commands

As you know, expressions get evaluated and can be assigned to variables. Commands are executed. Now, just to confuse you: if you type a bare expression, without any command, into the idle shell, it evaluates the expression and prints its value:

IDLE
    >>> print 6*7
    42
    >>> 6*7
    42

This should not make you confuse expressions and commands. It is just the default thing that the idle shell does with an expression. And it makes sense that this should be the default action of the shell on seeing an expression, because often you will construct a function to compute some value, for example the frequency of a particular pattern in a corpus, and what you will want to see when the function has finished executing is the result of evaluating the function.

Longer strings, and documenting functions

To make a string that runs over more than one line, use """ ... """. These are three double quotes at the beginning, and three double quotes at the end of the string.

IDLE
    >>> print """This is a long string.
    It goes on and on
    and on"""
    This is a long string.
    It goes on and on
    and on

In Python, these long strings are also used to document functions:

    def fahrenheit_to_celsius(ftemp):
        """This function converts temperatures
        in Fahrenheit to Celsius
        and returns the Celsius temperature
        as an integer"""
        return (ftemp - 32) * 5 / 9

A long string """ ... """ directly at the beginning of a function definition is ignored in execution. It serves as documentation of the function. It is good practice to document your functions.

Python: A bit of functional programming
LIN 392, Spring 2009, Working with Corpora
Katrin Erk

How can you sort a dictionary by its values?

Sorting dictionary keys is easy. Assume we have the following corpus frequency counts:

    >>> mycounts = {"the": 1034, "a": 1724, "bathtub": 2, "go": 37}
    >>> words = mycounts.keys()
    >>> words.sort()
    >>> words
    ['a', 'bathtub', 'go', 'the']

But especially in the
case of frequency counts, we may be more interested in sorting the dictionary by value, i.e. by frequencies. How do we do that?

How can you sort a dictionary by its values?

The idea for sorting a dictionary by its values is as follows: we still sort the keys, not alphabetically, but by their values in this particular dictionary.

    >>> mycounts = {"the": 1034, "a": 1724, "bathtub": 2, "go": 37}
    >>> words = mycounts.keys()
    >>> words.sort(key = lambda w: mycounts[w])
    >>> words
    ['bathtub', 'go', 'the', 'a']

Anonymous functions with lambda

What we have done on the previous slide is to pass on to sort a function that maps each item to a key before sorting it. lambda w: mycounts[w] is an anonymous function, that is, a function without a name. It has one parameter, w. A normal Python function with a name would be defined using

    def myfunc(w):
        return mycounts[w]

Lambda functions are different in that they

• don't get brackets around parameters
• return the value of the expression in their body without the keyword return. In fact, you cannot use return in a lambda function.

Anonymous functions with lambda

Here is another use of anonymous functions: the Python builtin function map takes a function and a list and applies the function to each item on the list. It returns the results, again in a list. Here is an example where we take a list of words, strip punctuation from the beginning and end of each word, and collect the result in a new list of words.

    import string
    f = open("/Users/katrinerk/Desktop/1epoe10.txt")
    text = f.read()
    wordswithpunct = text.split()
    words = map(lambda w: w.strip(string.punctuation), wordswithpunct)

Anonymous functions with lambda

• Can you use map to take a list of words and convert each of them to lowercase?
• Can you sort a list of strings by their last letter?

List comprehensions

    >>> mylist = [1, 2, 3, 4, 5]
    >>> [i+1 for i in mylist]
    [2, 3, 4, 5, 6]

The construction [i+1 for i in mylist] is a list comprehension. In natural language, it says: "I want the list that you get when you add one to each item on mylist." The general shape of a list
comprehension is

    [f(x) for x in somelistobject]

So: for each object x in a list somelistobject, compute some function of that x, and keep it in my new list.

List comprehensions

This gives us another way of stripping punctuation from the beginning and end of words:

    import string
    f = open("/Users/katrinerk/Desktop/1epoe10.txt")
    text = f.read()
    wordswithpunct = text.split()
    words = [w.strip(string.punctuation) for w in wordswithpunct]

Can you use a list comprehension to produce a list of all the words in 1epoe10.txt, lowercased?

List comprehensions with a condition

There is one additional thing you can do with a list comprehension: you can restrict the items from the original list that you use. Here is a simple example:

    >>> mylist = [1, 2, 3, 4, 5]
    >>> [i+1 for i in mylist if i % 2 == 0]
    [3, 5]

% is the "modulo" operator. So 3 % 2 is 1, and 4 % 2 is 0. This list comprehension says: "Take all items on mylist and store that item plus one, but only if that item is even." So the extended general shape of a list comprehension is

    [f(x) for x in somelistobject if condition(x)]

Tasks with list comprehensions

1. Can you use a list comprehension to produce a list of all words from 1epoe10.txt, lowercased and with all punctuation stripped?
2. Can you use a list comprehension to produce a list of all words from 1epoe10.txt, but retain only words of length >= 3? (Don't strip punctuation for this one.)
3. Now think about how to lowercase words and strip punctuation and retain only words of length >= 3. Watch out: you want to retain all words that have at least 3 letters after stripping punctuation. You may need more than one line for doing this.

Annotation: Quality testing and automation
LIN 392, Working with Corpora, Spring 2009
Katrin Erk

Quality testing and automation

• How good is a given annotation?
  - Is it correct?
  - Is it consistent?
  - How can you check this for thousands of sentences?
• The annotation manual may easily be 50 or 100 pages long. Annotation takes a lot of time.
  - SALSA: 20,000 sentences, about 4 years
  - Is there any way we can speed this up?

Overview
• Annotation quality testing
  - Inter-annotator agreement
  - Intra-annotator agreement
  - Automatic quality testing
  - The kappa measure
• Semi-automatic annotation
  - Automatic pre-annotation
  - Automatic selection of items to annotate: Active Learning

Annotation quality testing: the problem

• No annotation is error-free
• Problem of annotation consistency
  - Is the same phenomenon annotated the same way today and 6 months ago?
  - A change in annotation guidelines must lead to changes in old annotations
  - Are the guidelines clear enough? Will all annotators understand them the same way?
• Simple oversight

Inter-annotator agreement

• Two or more annotators annotate the same text
• How often do their analyses agree?
• Time-consuming, since time will be spent re-annotating the same text rather than annotating new text

Inter-annotator agreement

• SALSA:
  - Each lemma is annotated independently by two annotators
  - Adjudication: a third person looks at points where the two annotators disagree. The adjudicator chooses one of the two analyses, or substitutes a totally different one
  - Meta-adjudication: two adjudicators instead of one. Look at disagreements between adjudicators

Intra-annotator agreement

• How consistent is a single annotator?
• Re-annotate a text you have annotated a few months ago, assess disagreement with yourself

Automatically detecting annotation errors

• Turn annotation guidelines into rules
  - WSJ POS tagging manual: "Hyphenated nominal modifiers should always be tagged as adjectives"
  - POS tags for closed classes: no word that doesn't belong to the finite class may have the tag
• Dickinson and Meurers 2003: same context, different tag: potential error

Automatically detecting annotation errors

• Dickinson and Meurers 2003: error checking for POS tagging
• Variation n-gram: same context words, but one word (the variation nucleus) with a different tag, for example in
  "to ward off a hostile takeover attempt by two European shipping concerns"
• Long n-gram: probably an error (threshold: n >= 6)
• Variation at fringe of n-gram: probably
not an error
• Later generalized to syntactic analysis
• In general, not much work on automatic error checking for annotation

How to measure agreement between annotators

• Simplest measure: percentage of agreement
• But what does it mean? How good is 50% agreement?
  - Just 2 choices, e.g. distinguishing between the "celestial body" and "well-known person" senses of star: 50% is very bad
  - 40 choices, e.g. word senses of a high-frequency verb like go: 50% is not great, but not abysmal either

Chance agreement

• Imagine two annotators are assigning random tags
• Two tags, both chosen equally often: annotators will agree 50% of the time

      P(agree) = P(A1: tag1) * P(A2: tag1) + P(A1: tag2) * P(A2: tag2)
               = 0.5 * 0.5 + 0.5 * 0.5 = 0.5

• Two tags, one chosen 95% of the time:

      P(agree) = P(A1: tag1) * P(A2: tag1) + P(A1: tag2) * P(A2: tag2)
               = 0.95 * 0.95 + 0.05 * 0.05 = 0.905

Estimating chance agreement

• Two annotators, Ann and Bob; N labels
• Probability that
  - Ann chose label 1 AND Bob chose label 1, OR
  - Ann chose label 2 AND Bob chose label 2, OR
  - ...
  - Ann chose label N AND Bob chose label N
• For independent probabilities: AND is multiplication, OR is addition

The kappa measure

• Correcting for chance agreement (Carletta 1996, Computational Linguistics 22(2))
• Measure from content analysis:

      kappa = (P(A) - P(E)) / (1 - P(E))

  P(A): observed agreement; P(E): agreement expected by chance

What is a good kappa value?

• Krippendorff 1980:
  - kappa < 0.67: discard
  - kappa between 0.67 and 0.8: allows tentative conclusions
  - kappa of 0.8 or greater: allows definite conclusions
• Also depends on the task

Problems with kappa

• Skew through uneven classes
  - Suppose you have 2 labels, "discourse marker" and "no discourse marker"
  - The label "no discourse marker" will be much more likely
  - So, high chance agreement
  - This penalizes each disagreement between annotators more, and lowers kappa

Problems with kappa

• Kappa assumes that each item will get one label
• But: what if only some items get labels?
  - Semantic role assignment: not every syntactic constituent bears a role
  - Discourse analysis: not every syntactic constituent is a discourse marker or argument

Problems with kappa

• Kappa
assumes that each item will get one label
• But: what if items can have more than one label?
  - Vagueness and ambiguity in word sense assignment
  - Can we measure partial agreement?

Other approaches to ascertaining annotation quality

• OntoNotes: "the 90% solution"
  - Task: word sense annotation
  - Idea: measure inter-annotator agreement; if it is below 90%, re-define the sense labels, then re-annotate; repeat if necessary
  - What does that mean for the word sense labels that they are assigning?

Other approaches to ascertaining annotation quality

• Annotation as a psycholinguistic experiment
  - Have many people do the same task, at least 20 annotators per item
  - View disagreement between annotators as a graded label: e.g. if 60% of annotators assigned label A and 40% assigned label B, then the label is a mixed label, 60% A, 40% B
  - But is this valid? What if it's just the annotation manual that is bad and leads to disagreements?

Overview

• Annotation quality testing
  - Inter-annotator agreement
  - Intra-annotator agreement
  - Automatic quality testing
  - The kappa measure
• Semi-automatic annotation
  - Automatic pre-annotation
  - Automatic selection of items to annotate: Active Learning

Automatic annotation

• For word sense annotation: Word Sense Disambiguation system
  - input: a word in context, for example "The astronomer married the star"
  - output: a sense label for the target word, for example "well-known person"
• For syntactic annotation: parser
  - input: a sentence, for example "Fruit flies like a banana"
  - output: a syntactic tree

Automatic annotation: how does it work?

• Many systems use machine learning: software that learns from examples
  - It looks at some previously annotated samples (training data)
  - Then it applies what it has learned to new cases
• Learning:
  - generalizing over seen training items
  - so the system can treat new cases the same way as similar training items
• What does "similar" mean?
  - many ways of defining similarity
  - needed: some sort of formal representation of training and test items

Automatic pre-annotation

• Aim: speeding up annotation
• Problem: automatic annotation is more error-prone than manual annotation
• Solution:
  - Data is automatically annotated
  - A human annotator checks the automatic annotation and corrects errors

Automatic pre-annotation

• POS tagging (Thorsten Brants 2000): one human post-editor reduces the error rate from 3.3% to 1.2% (German corpus)
• Syntactic annotation in TIGER:
  - Interactive semi-automatic annotation
  - System proposes one constituent
  - Human confirms or corrects
  - System proposes next constituent

Active learning

• Software and human annotator annotate together
• Software figures out the items it is most uncertain about
• Those items it gives to the human to annotate
• The others it does automatically
• Also, it continually learns from what the human annotator does
• "Master and Apprentice" setting:
  - the apprentice (software) does easy tasks it can already do
  - for more complicated tasks, it asks the master (the human)
  - from observing the master, it learns to solve the more difficult cases too

Active learning

• Aim: reduce the amount of data that a human annotator has to label
• Use machine learning
• The more training data a machine learning system has, the better it works
• But often less, well-chosen training data is better than more random training data

Tools for searching annotated corpora
LIN 392, Spring 2009, Working with Corpora
Katrin Erk

Searching corpora: Corpus types and search tools

• Raw text
  - regular expressions: grep, regular expressions in Python
• POS-tagged corpora
  - cqp queries: IMS corpus workbench
• Treebanks
  - query languages that can talk about syntactic trees: TIGERSearch, tregex, Linguist's Search Engine

cqp

• Query language developed at IMS Stuttgart
• Search for words, POS tags, lemmas in tagged text
• Used in CWB (corpus workbench), XKWIC, web search forms

cqp

• Basic units: words, which may have tags like pos, lemma
• Simple search terms:

      [pos = "IN"]
      [word = "believe"]

• Multiple descriptions for the same unit: conjunction via &, disjunction via |

      [word = "walk" & pos = "NN"]
      [pos = "IN" | pos = "PP"]

• "water" is short for [word = "water"]

Note the difference between grep and cqp: in grep, all characters are equal, so grep does not know what a word is. In contrast, in cqp the basic unit is a word plus its tags.

OPUS

• http://urd.let.rug.nl/tiedeman/OPUS
• Queryable versions of several parallel corpora
• Query format: cqp
• Search form for Europarl: http://urd.let.rug.nl/tiedeman/OPUS/cwb/Europarl/frames-cqp.html
• "tnt" instead of "pos" for part of speech

cqp query elements: Regular expressions within the description of one word

• Variance in words, tags: regular expressions
• Bracket expressions: [word = "[Tt]he"]
• * for zero or more, . for an arbitrary letter: [word = "confus.*"]
• ? for "may be there or not": [word = "confused?"]

cqp query elements: and, or, not

• & conjunction: [word = "the" & pos = "DT"]
• | disjunction: [word = "the" | pos = "DT"]
• ( ) to group things that belong together
• ! negation: [word != "[tT]he" & pos = "DT"]
• What do these mean?
  - [word = "water" | pos = "NN"]
  - [word = "water" & pos = "NN"]
  - [word != "water" & pos = "NN"]

cqp query elements: consecutive words

• A sequence of words:

      "let" "the" [pos = "NN"] "out" "of" "the" "bag"

• Word sequences can be combined with | and grouped with ( ), like entries within a word:

      [pos = "NN"] | ([pos = "NN"] "of" [pos = "NN"])

• Quantifiers apply to whole words:

      [word = "the"]*     zero or more occurrences of "the"
      [pos = "DT"]?       zero or one determiner
      [pos = "NN"] ("of" [pos = "NN"])?

cqp query elements: any word

• [] means "any one word"
• "give" [] "up": give, then any word, then up
• Normal quantifiers apply: "give" []* "up"

cqp query elements: context

• "give" []* "up" within 7: within 7 words
• "let" []* "out" within s: within the same sentence

Tasks

• Search for variants of some idioms and collocations, for example
  - bite the bullet
  - blow sth. out of proportion
  - tip of the iceberg

Tasks

• A noun, followed by either "is" or "was", followed by a verb ending in "ed"
• "catch" or "caught", followed by a determiner, any number of adjectives and a noun; or a noun followed by "was" or "were", followed by "caught"

Tasks

• Non-core grammatical constructions:
  - With the Wicked Witch dead, the people rejoiced.
  - Has anyone heard of, let alone read, this blog?
  - What are your children doing trampling my flowers?
• How would you search for such constructions using cqp?

Searching corpora: Corpus types and search tools

• Raw text
  - regular expressions: grep, regular expressions in Python
• POS-tagged corpora
  - cqp queries: IMS corpus workbench
• Treebanks
  - query languages that can talk about syntactic trees: TIGERSearch, tregex, Linguist's Search Engine

The Linguist's Search Engine

• Developed by Philip Resnik and his group
• http://lse.umiacs.umd.edu:8080
• Search over syntactically parsed data

The Linguist's Search Engine

• How to search:
  - Make up a sentence showing the phenomenon
  - Take the syntactic tree for that sentence and generalize by cutting pieces away, until you have a pattern describing your phenomenon
  - Then search using that pattern
• You can save the results of your search
• You can also supply your own corpora

tregex: tool for matching patterns in trees

• Written by Galen Andrew and Roger Levy
• http://nlp.stanford.edu/software/tregex.shtml
• Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. Proceedings of LREC 2006. Available at http://ling.ucsd.edu/~rlevy/papers/levy_andrew_lrec2006.pdf
• Formats handled by the tool:
  - Penn Treebank
  - others as well

What is Tregex?
(from "The Wonderful World of Tregex", http://nlp.stanford.edu/javanlp/tutorials/The_Wonderful_World_of_Tregex.ppt)

• A Java utility in javanlp for identifying patterns in trees
• Like regular expressions for strings, based on tgrep syntax
• Simple example: NP < NN

      tregex.sh "NP < NN" filename

[Figure: parse tree of the sentence "The firm stopped using crocidolite in its cigarette filters", with nodes S, NP, VP, DT, NN, VBD, VBG, PP, PRP]

Syntax: Relations
(from "The Wonderful World of Tregex", nlp.stanford.edu/javanlp/tutorials)

Relationships between tree nodes can be specified. There are many different relations. Here are a few:

    Symbol       Description
    A < B        A is the parent of B
    A << B       A is an ancestor of B
    A $ B        A and B are sisters
    A $+ B       B is the next sister of A
    A <i B       B is the i-th child of A
    A <: B       B is the only child of A
    A <<# B      B is on the head path of A
    A <<- B      B is the rightmost descendant of A
    A .. B       A precedes B in a depth-first traversal of the tree
    A <+(C) B    A dominates B via an unbroken chain of Cs

tregex details

The following details on tregex will not be covered in class, but may come in handy if you are planning to use the tool.

Options

• -C: only count matches, don't print
• -w: print the whole matching tree, not just the matching subtree
• -f: print filename
• -i <filename>: read the search pattern from <filename> rather than the command line
• -s: print each match on one line, instead of multi-line pretty-printing
• -u: only print labels of matching nodes, not complete subtrees
• -t: print terminals only

tregex: Matching single nodes

• You can give the whole name of a node:
  - tregex.sh NP cf01.mrg
  - tregex.sh meeting cf01.mrg, or better: tregex.sh -w meeting cf01.mrg
• You can use regular expressions in describing a node name:
  - tregex.sh '/NP/' cf01.mrg: any node name that includes "NP"
  - Note: enclose regular expressions in /.../ for tregex

tregex: Matching single nodes

• Still using regular expressions in describing a node name:
  - tregex.sh '/.+NP/' cf01.mrg: node names containing "NP", preceded by at least one character
  - tregex.sh '/NP$/' cf01.mrg: node names ending in "NP"
  - tregex.sh '/[Tt]he/' cf01.mrg: node names containing "The" or "the"

tregex: relations between nodes

• tregex.sh 'NP < NNS' cf01.mrg: an NP node that is the parent of an NNS node
• Note: NP < NNS is in quotes; otherwise the Unix shell would interpret < as input redirection

More complex queries

• The concept of a head exists in tgrep, tgrep2, tregex
• Head: the first node mentioned in the query
• All relations that you state are relative to the head
• tregex.sh 'NP < NNS $ PP' cf01.mrg: an NP with an NNS child and a PP sister. It is the NP that has the PP sister, not the NNS

More complex queries

• Use brackets to introduce more heads
• tregex.sh 'NP < (NNS $ PP)' cf01.mrg: an NP with an NNS child, and that NNS child has a PP sister, so basically the same as NP <
NNS lt PP Examples 0 Verbs and particles 0 tregeXsh VP ltVB lt PR 7 cf01mrg VP with a verb and a particle child 0 tregeXsh VP lt1 VB lt2 PR 7 cf01mrg VP with first child a verb7 second child a particle 0 tregeXsh VP lt1 VB lt2 PRT cf01mrg VP with first child some kind of verb7 second child a particle 0 In this case7 all three give the same result Examples 0 An NP with 2 adjective children 0 NP ltlt ltlt doesn t do the trick may be same both times 0 ltlt 0 How do you construct a search pattern 0 Find an instance of the phenomenon you are looking for 0 Describe the pattern you see there 0 Query 0 Inspect the result If unsatisfactory change pattern Tregex ca n do more things Read them up if needed 0 Naming 0 Give a name to a node 0 So you can later refer to the node 0 Negation 0 You can negate a description iNP 0 You can negate a relationVP lltlt up 0 You can negate your Whole query Via option v 0 Sub queries can be combined using and and or Tregex can do more things Read them up if needed 0 An example of really complicated query o NP lt NN lt dog 3 VP ltlt barks gtVBZ 0 An NP with an NN child that has a child dog and the NP also has aVP sister headed by barks and the direct parent ofi barks isVBZ Tregexjava classes Include Tregex in your Java programs Object orientation in Python Bake your own data types LIN 392 Spring 2009 Working with Corpora Katrin Erk Defining your own Classes in Python We have talked about Object Orientation Objects have methods associated With them according to their data type For example all string objects have a method startswith that tests if a string starts with a given prefix You can define your own data types in Python called m and then you can make objects of your self defined classes Self defined classes can have associated 0 variables 0 and methods So you can use them as composite data structures with associated methods A first example of a new Class With variables only no methods Making a new class class MyPoint This class represents 
a Zedimensional poiIit x None integer xeaxis value y None integer yeaxis value This defines a constructor function MyPoint that you can use to make an object of the new class gtgtgt myvar Point gtgtgt myvarx 0 Once myvar is a variable of type MyPoint7 it has associated instance gtgtgt myvarx variables X and y that you can read and 0 write using the dot notation Making your own types Classes Different instances of the same data type can hold different values in their instance variables gtgtgt a MyPoint gtgtgt ax 0 gtgtgt b MyPoint gtgtgt bx 2 gtgtgt ax 0 gtgtgt bx 2 Making your own types classes Warning If you initialize a variable in the class definition to a mutahle data type all instances will point to the same Piece of data so you will not have different value for different instances of your data type gtgtgt class B 11 gtgtgt a 130 gtgtgt alappend1 gtgtgt al 1 gtgtgt 13 130 gtgtgt b1 1 Making and using your own types Class MyReCtangle We are defining a rectangle width None through 1ts Wldth7 length7 and length None upper left corner point The upperilefticorner None upper left corner we deflne as an a MyReCtangleo 1nstance of our MyPomt class We awidth 10 alength 35 aupperileft7corner MyPoint access its X and y coordinates using two dots one to access the upperleftcorner of object a7 aupperilefticomenX 3 and a second one to access the X lupperilefticorneny 91 and y coordinates of the upper left corner Defining a class with methods So far we have only defined classes with associated instance variables Now we add methods The first is a very simple method that prints out the coordinates of our point Class MyPoint X None y None defprintmeself print X selfX7 y selfy Note that the method definition looks almost like the function definitions you have seen before7 but it is Within the block of Class MyPoint Defining a class with methods Class MyPoint X None y None def printmeself print X selfX7 y selfy The parameter self7 of the MyPoint method printme is the object to 
which the method belongs So7 selfX is the X instance variable of the object to Which the printme method belongs
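
To make the role of self concrete, here is a minimal sketch (not taken from the slides). It assumes the MyPoint class from the notes and adds a helper method, describe(), which is hypothetical and introduced only so the result can be inspected as a string. The notes use the older print statement; the sketch uses the print() function form so that it runs on current Python versions.

```python
class MyPoint:
    """A 2-dimensional point, modeled on the class in the notes."""
    x = None  # integer, x-axis value
    y = None  # integer, y-axis value

    def describe(self):
        # Hypothetical helper (not in the notes): build the text
        # from the instance variables of whichever object it is called on.
        return "x %s y %s" % (self.x, self.y)

    def printme(self):
        # self is the object the method was called on
        print(self.describe())


a = MyPoint()
a.x = 0
b = MyPoint()
b.x = 2

a.printme()  # prints: x 0 y None
b.printme()  # prints: x 2 y None

# Calling a method on an object is shorthand for calling the function
# stored on the class with the object passed explicitly as self.
same = (b.describe() == MyPoint.describe(b))
print(same)  # prints: True
```

The same method gives different output for a and b because self refers to a different object in each call. Python fills in self automatically, so a.printme() is written without any argument.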