by: Kurtis Spencer I



About this Document

Fei Xia
Class Notes

These 57 pages of class notes were uploaded by Kurtis Spencer I on Wednesday, September 9, 2015, and have received 17 views since then. The notes are for LING 570 at the University of Washington, taught by Fei Xia in Fall. For similar materials, see /class/192174/ling-570-university-of-washington in Linguistics at the University of Washington.




Date Created: 09/09/15
Summary (LING 570, Fei Xia, Week 11, 12/05/07)

Outline:
- Main units
- Main techniques
- What's next

Main units:
- Unit 0: introduction and prerequisites (1 week). Quiz 1; Hw1; Hw2 (Part I). Course overview, tokenization, introduction to probability theory.
- Unit 1: finite-state machines (3 weeks). Quiz 2; Hw2-Hw4. Formal languages, formal grammars, regular expressions, FSAs, regular relations and FSTs, the morphological analyzer.
- Unit 2: LM (1.5 weeks). Hw5, Hw6. Language models: n-gram models. Smoothing and interpolation: Katz backoff, add-one, Good-Turing, etc.
- Unit 3: POS tagging and HMM (2 weeks). Quiz 3; Hw7-Hw9. HMM: two types of HMM, the relation with WFAs. POS tagging: n-gram models and others.
- Unit 4: classification and sequence labeling problems (2 weeks). Quiz 4; Hw10. Classification tasks: text classification, WSD. Sequence labeling problems: the IOB scheme; POS tagging, NE tagging, and chunking treated as tagging problems.
- Unit 5: IE and summary (1 week). IE: NE tagging, co-reference, relation detection.

Main techniques:
- Probability theory: the chain rule, Bayes' rule, conditional independence.
- Regular expressions, regular languages, and regular grammars.
- FSA, FST, morphological analysis. Simple FSTs in a pipeline can be very powerful.
- The concept of an LM; smoothing.
- HMM: n-gram models, the Viterbi algorithm, the Markov assumption.
- Classification and sequence labeling problems. Selecting features is very important. Many problems can be treated as classification or sequence labeling problems.
- Beam search.

Tools created:
- English tokenizer (Hw1, regexp)
- Morphological analyzer (Hw4, FST)
- LM (Hw5, Hw6)
- Tagger: unigram model (Hw3, FST); trigram model (Hw7-Hw9, HMM)
- Classifier (Hw10)
- Using existing packages: Carmel, Mallet

What's next:
- Other tasks → NLP 571 (winter). Ex: parsing, semantics, discourse.
- Supervised learning → NLP 572 (winter). Ex: MaxEnt, Naive Bayes, SVM.
- Unsupervised learning → NLP 575 (winter). Ex: EM, co-training.

Tentative plan for LING 572 (subject to change):
- Unit 0: introduction (1 week, by Dan Jinguji). Features, training/testing, classification algorithms.
- Unit 1:
simple algorithms (2 weeks): kNN, decision trees, Naive Bayes.
- Unit 2: more sophisticated algorithms (2.5 weeks): MaxEnt, SVM.
- Unit 3: the sequence labeling problem (2 weeks): TBL (if time permits), HMM and CRF.
- Unit 4: system combination (1 week).
- Other topics (1.5 weeks).

A head start with LING 572:
- Textbook: none.
- Tentative schedule: http://courses.washington.edu/ling572 (winter 08 syllabus).
- About 30% of the material will be changed from last year's schedule. Removing: Rocchio, decision lists, boosting, bagging, self-training and EM. Adding: CRF and SVM.

LING 575:
- Theme: unsupervised tagging and/or parsing. Not a survey course.
- Starting with EM, modify it to improve the performance.
- Goal: labeled F-measure 20 → 50.

LING 575 prerequisites:
- Supervised parsing: the CYK algorithm, the inside-outside algorithm.
- Good at C/C++ and debugging.
- Comfortable with math formulas and statistics.

Inner loop of the inside-outside algorithm, given an input sequence $w_1 \dots w_m$ and the current rule probabilities:

1. Calculate the inside probabilities.
   Base case: $\beta_j(k, k) = P(N^j \to w_k)$
   Recursive case: $\beta_j(p, q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)$
2. Calculate the outside probabilities.
   Base case: $\alpha_j(1, m) = 1$ if $j = 1$, $0$ otherwise.
   Recursive case: $\alpha_j(p, q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p, e)\, P(N^f \to N^j N^g)\, \beta_g(q+1, e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e, q)\, P(N^f \to N^g N^j)\, \beta_g(e, p-1)$
3. Collect the counts.
4. Normalize and update the parameters.

Introduction to Classification (LING 570, Fei Xia, Week 9, 11/19/07)

Outline:
- What is a classification problem?
- How to solve a classification problem
- Case study

What is a classification problem? An example: the text classification task.
- Task: given an article, predict its category.
- Categories: politics, sports, entertainment, travel, ...; or spam vs. not spam.
- What kind of information is useful for solving the problem?

The classification task:
- C is a finite set of labels (a.k.a. categories or classes).
- Given an x, decide its category y ∈ C.
- An instance
is a pair (x, y), where x is the thing to be labeled/classified and y ∈ C.
- Data: a set of instances. Labeled data: y is known. Unlabeled data: y is unknown.
- Training data vs. test data.

More examples:
- Spam filtering.
- Call centers.
- Sentiment detection: good vs. bad, or a 5-star system (1, 2, 3, 4, 5).

POS tagging:
- Task: given a sentence, predict the tag of each word in the sentence.
- Is it a classification problem? Categories: noun, verb, adjective, ...
- What information is useful?
- What are the differences between the text classification task and POS tagging? POS tagging is a sequence labeling problem.

Tokenization / word segmentation:
- Task: given a string, break it into words.
- Categories: NB (no break) and B (break); or B (beginning), I (inside), E (end).
- Ex: c1 c2 c3 c4 c5, segmented as c1c2 | c3c4c5.
  With NB/B: c1/NB c2/B c3/NB c4/NB c5/B
  With B/I/E: c1/B c2/E c3/B c4/I c5/E
- Relation to POS tagging?

How to solve a classification problem. Two stages:
- Training stage (learner): training data → classifier.
- Testing stage (decoder): test data + classifier → classification results.
- Others: a preprocessing stage, a postprocessing stage, evaluation.

How to represent x:
- The number of possible values for x can be infinite.
- Represent x as a feature vector: x = <v1, v2, ..., vn>, i.e., x = <f1=v1, f2=v2, ..., fn=vn>.
- What is a good feature?

An example. Task: text classification. Categories: sports, entertainment, living, politics, ...
- doc1: debate, immigration, Iraq
- doc2: suspension, Dolphins, receiver
- doc3: song, filmmakers, charts, rap

Training data: the attribute-value table, which is the input to the training stage:

         f1    f2    f3    ...   fK     Target
   x1    0     1     25    ...   1000   c2
   x2    25    0     0     ...   20     c1

A classifier:
- It is the output of the training stage.
- Narrow definition: f(x) = y, where x is the input and y ∈ C.
- More general definition: f(x) = {(c_i, score_i)}, c_i ∈ C.

Test stage:
- Input: test data and a classifier.
- Output: a decision matrix:

         x1     x2     x3
   c1    0.1    0.4    0
   c2    0.9    0      0
   c3    0      0.1    0.4
   c4    0      0.5    0.6

Evaluation. Comparing the gold labels with the system output gives TP, FP, FN, and TN:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F-score = 2PR / (P + R)
- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- F-score or accuracy? Why F-score?

An example. Gold vs. system: TP = 1, FP = 4, FN = 5, TN = 90.
- Accuracy = 91%
- Precision = 1/5
- Recall = 1/6
- F-score = 2 × (1/5) × (1/6) / (1/5 + 1/6) = 2/11 ≈ 0.18
(When the positive class is rare, accuracy looks high even for a poor system; hence F-score.)

Steps for solving a classification task:
- Prepare the data: convert the task into a classification problem (optional); split
the data into training/dev/test sets; convert the data into an attribute-value table.
- Training.
- Testing.
- Postprocessing (optional): convert the label sequence into something else.
- Evaluation.

Important subtasks for you:
- Convert the problem into a classification task.
- Convert the data into an attribute-value table: define the feature types, do feature selection, and convert each instance into a feature vector.
- Select a classification algorithm.

Classification algorithms (covered in LING 572):
- Decision tree (DT)
- k-nearest neighbor (kNN)
- Naive Bayes (NB)
- Maximum entropy (MaxEnt)
- Support vector machine (SVM)
- Conditional random field (CRF)

More about the attribute-value table:

         f1    f2    f3    ...   fK     Target
   x1    0     1     25    ...   1000   c2
   x2    25    0     0     ...   20     c1

Binary features vs. real-valued features:
- Some ML methods can use real-valued features; others cannot.
- Very often we convert real-valued features into binary ones. Ex: temp = 69.
  One threshold: IsTempBelow60 = 0.
  Multiple thresholds: TempBelow0 = 0, TempBet0And50 = 0, TempBet51And80 = 1, TempAbove81 = 0.

Feature templates vs. features:
- A feature template, e.g. CurWord, corresponds to many features: CurWord=Mary, CurWord=the, CurWord=book, CurWord=buy.
- curWord=book can be seen as a shorthand for: curWord=the 0, curWord=a 0, curWord=Mary 0, curWord=book 1.

An example: "Mary will come tomorrow"

         W-1     W0      W+1        W-1W0        W0W+1           Y
   x1    <s>     Mary    will       <s> Mary     Mary will       PN
   x2    Mary    will    come       Mary will    will come       V
   x3    will    come    tomorrow   will come    come tomorrow   V

This can be seen as a shorthand for a much bigger table.

The attribute-value table is a very sparse matrix. In practice it is often represented in a dense format. Ex:
  x1 = <f1 0, f2 0, f3 1, f4 0, f5 1, f6 0>
  → x1: f3 1, f5 1
  → x1: f3 f5

Case study I: the NE tagging task.
- Ex: "John visited New York last Friday" → person: John; location: New York; time: last Friday.
- Is it a classification problem?
- John/person-S visited New/location-B York/location-E last/time-B Friday/time-E
- What is x? What is y?

Case study II: identify tables in a document.
- What is x? What is y? What
features are useful?
- [The example slide shows a scanned document page, about enriching IGT data by bootstrapping NLP tools, which contains tables to be identified; the scanned text is not otherwise recoverable.]

Case study III: the co-reference task.
- Ex: "John called Mary on Monday. She was not at home. He left a message on her answer machine."
- What is x? What is y? What features are useful?

Summary. Important concepts:
- Instance: (x, y)
- Labeled vs. unlabeled data
- Training data vs. test data
- Training stage vs. test stage
- Learner vs. decoder
- Classifier
- Accuracy vs. precision, recall, and F-score

Summary (cont):
- Attribute-value table vs. decision matrix
- Feature vs. feature template
- Binary features vs. real-valued features
- The number of features can be huge
- Representation of the attribute-value table
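The word-segmentation example in the notes tags each character with B (beginning), I (inside), or E (end). Below is a minimal sketch of the encoding in both directions; the helper names encode_bie and decode_bie are mine, not from the course materials.

```python
def encode_bie(words):
    """Tag every character of each word with B/I/E.

    Single-character words get a bare B; this is an assumption, since the
    slides only show multi-character words."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("B")
        else:
            tags.extend(["B"] + ["I"] * (len(w) - 2) + ["E"])
    return tags


def decode_bie(chars, tags):
    """Rebuild the word list from characters and their B/I/E tags."""
    words, cur = [], ""
    for c, t in zip(chars, tags):
        if t == "B" and cur:      # a new word starts, so flush the previous one
            words.append(cur)
            cur = ""
        cur += c
        if t == "E":              # E closes the current word
            words.append(cur)
            cur = ""
    if cur:                       # flush a trailing word that has no E tag
        words.append(cur)
    return words


# The slides' example: c1 c2 c3 c4 c5 segmented as c1c2 | c3c4c5,
# written here with single letters a..e standing in for c1..c5.
print(encode_bie(["ab", "cde"]))                              # ['B', 'E', 'B', 'I', 'E']
print(decode_bie(list("abcde"), ["B", "E", "B", "I", "E"]))   # ['ab', 'cde']
```

Decoding back to words is exactly the optional postprocessing step listed above: the classifier outputs a label sequence, and the labels are then converted into a segmentation.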
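The evaluation example in the notes (TP = 1, FP = 4, FN = 5, TN = 90) can be checked with a few lines of code. This is an illustrative sketch; the function name prf is not from the course materials.

```python
def prf(tp, fp, fn, tn):
    """Precision, recall, F-score, and accuracy from 2x2 confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_score, accuracy


# The worked example from the slides: accuracy is high (0.91) while the
# F-score is only 2/11, which is why F-score is preferred for rare classes.
p, r, f, acc = prf(1, 4, 5, 90)
print(p, r, f, acc)
```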
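The notes describe curWord=book as shorthand for a full row of binary features, one per value the template can take, with only the nonzero entries kept in the sparse encoding. A small sketch of that expansion, using an invented four-word vocabulary; expand_template is my name, not a course API.

```python
def expand_template(template, value, vocab):
    """Expand one feature template into one binary feature per vocab entry."""
    return {f"{template}={w}": int(w == value) for w in vocab}


vocab = ["the", "a", "Mary", "book"]          # illustrative vocabulary
dense = expand_template("curWord", "book", vocab)
print(dense)   # {'curWord=the': 0, 'curWord=a': 0, 'curWord=Mary': 0, 'curWord=book': 1}

# Sparse format, as in the notes: keep only the features whose value is 1.
sparse = [name for name, v in dense.items() if v == 1]
print(sparse)  # ['curWord=book']
```

With a realistic vocabulary the dense row would have tens of thousands of zeros, which is why the attribute-value table is stored sparsely in practice.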
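The Viterbi algorithm is listed among the main techniques of the HMM unit. Below is a compact log-space sketch for a first-order HMM tagger; the toy states and probabilities are invented for illustration and are not taken from the course homeworks.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs (log-space DP)."""
    unk = 1e-12  # tiny floor for unseen emissions (a simplification, not smoothing)
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(obs[0], unk))
          for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best previous state for arriving in s at time t.
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s].get(obs[t], unk)))
            back[t][s] = prev
    # Follow the backpointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


# A toy two-tag model for "Mary will come" (all numbers invented).
states = ["PN", "V"]
start_p = {"PN": 0.6, "V": 0.4}
trans_p = {"PN": {"PN": 0.2, "V": 0.8}, "V": {"PN": 0.3, "V": 0.7}}
emit_p = {"PN": {"Mary": 0.9}, "V": {"will": 0.5, "come": 0.5}}
print(viterbi(["Mary", "will", "come"], states, start_p, trans_p, emit_p))
# ['PN', 'V', 'V']
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on longer sentences.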

