Advanced Language Technologies

by: Lacey Collier

Advanced Language Technologies CS 6740

These 9-page class notes were uploaded by Lacey Collier on Saturday, September 26, 2015. The notes belong to CS 6740 at Cornell University, taught by Staff in Fall. Since their upload, they have received 58 views. For similar materials see /class/214337/cs-6740-cornell-university in Computer Science at Cornell University.

Date Created: 09/26/15
CS6740/INFO6300: Sentiment Analysis

Today
- Background and motivation
- Sentiment categorization
  - Text categorization
- Next class: Product review mining
- Later in the semester: Opinion extraction and summarization

Subjectivity vs. Sentiment
Subjective sentences express private states, i.e., internal mental or emotional states: speculations, beliefs, emotions, evaluations, goals, opinions, judgments.
1. Jill said, "I hate Bill."
2. Jack thought he won the race.
3. Judy hoped her presentation would go well.

Subjectivity vs. Sentiment
Sentiment expressions are a type of subjective expression: expressions of positive and negative emotions, judgments, evaluations.
1. Jill said, "I hate Bill."
2. Jack thought he won the race.
3. Judy hoped her presentation would go well.

Information Retrieval
[Screenshot of a web search results page; text not recoverable.]

Question Answering
[Diagram; legible labels: answer, supporting text, text collection.]

Queries of a Subjective Nature
- How have business views towards global climate change varied over the past decade?
- How have consumers and businesses responded to the release of Gore's "An Inconvenient Truth"?
- What is the reaction in Asia to the Bush policy towards the Kyoto Protocol?
- Who were the first people to talk about bailout options for banks in the current economic crisis?
- What does Sarah Palin think about X?

[Slide title only partly legible: "... Blame Game".]

[Screenshot of a Reuters news article, "Netflix offers 1 mln prize for better movie picks". Legible gist: the DVD rental company Netflix announced a prize for the first person to develop software that improves the accuracy of its movie recommendation system. The Web-based system learns what kinds of films subscribers like by asking them to rate the films they watch, and uses these star ratings to predict how many stars a consumer would assign to each title in its library. The stock-quote sidebar is not recoverable.]

Early Work
- Learning semantic orientation of adjectives [Hatzivassiloglou & McKeown, ACL 1997]
  - Polarity: beautiful vs. ugly
- Effects of adjective polarity and gradability on sentence subjectivity [Hatzivassiloglou & Wiebe, COLING 2000]
  - Gradability: ugly
- No subjectivity sentence classifiers created or evaluated

CS6740/INFO6300: Sentiment Categorization
- Background and motivation
- Sentiment categorization
  - Text categorization
- Product review mining

Sentiment Categorization
Is the overall sentiment in the document positive or negative?
[Pang et al., EMNLP 2002; Turney, ACL 2002; Turney & Littman, TOIS 2003]

Text Classification (binary)
Example document:
  E.D. AND F. MAN TO BUY INTO HONG KONG FIRM
  The U.K.-based commodity house E.D. and F. Man Ltd and Singapore's Yeo Hiap Seng Ltd jointly announced that Man will buy a substantial stake in Yeo's 71.1 pct held unit, Yeo Hiap Seng Enterprises Ltd. Man will develop the locally listed soft drinks manufacturer into a securities and commodities brokerage arm and will rename the firm Man Pacific Holdings Ltd.
About a corporate acquisition?

Text Classification (multiclass)
Same example document; candidate categories: business, travel, music, sports.

Text Classification
Assign pieces of text to predefined categories based on content.
- Types of text
  - Documents (typical)
  - Paragraphs
  - Sentences
  - WWW sites
- Different types of categories
  - By topic
  - By function
  - By author
  - By style

Text Classification Applications
- Help-Desk Support: Who is an appropriate expert for a particular problem?
- Information Filtering Agent: Which news articles are interesting to a particular person?
- Relevance Feedback: What are other documents relevant for a particular query?
- Knowledge Management: Organizing a document database by semantic categories
- Focused Crawling: Find all the WWW pages on a particular topic

Why Learn Text Classifiers?
- Classifying documents by hand is costly and does not scale well (e.g., browse all WWW pages to filter out those about job announcements).
- Humans are not really good at constructing text classification rules. It is hard to write good queries.
- Sometimes there is no expert available (e.g., rules for routing email).
- Sometimes training data is cheap and plentiful (e.g., existing databases).

Learning Setting
[Diagram; legible labels: Real-World Process, Training Set, Learner, Classifier, New Documents.]

Learning Setting
Training examples are passed to an ML algorithm.
Goal: Find a classification rule h with low prediction error w.r.t. the selected loss function on new examples from the distribution P(X, Y).

Prediction Error and Loss Function
- Loss function: assigns the amount of penalty when making a mistake.
  - Zero/one loss: $\Delta(h(x), y) = 0$ if $h(x) = y$, and $1$ otherwise.
- Prediction error (also: generalization error or true error)
  - Probability of making an error on a new example drawn from the same distribution P(X, Y): $Err_P(h) = P(h(X) \neq Y)$
  - Equivalent: expected value of the loss function, $Err_P(h) = E_{(X,Y) \sim P}[\Delta(h(X), Y)]$

Bayes Rule for Text Classification
Want to compute $P(Y = y \mid X = x)$.
Use Bayes' theorem to get
$P(Y = y \mid X = x) = \frac{P(X = x \mid Y = y)\, P(Y = y)}{P(X = x)}$

Assumptions of NB Probability Estimations
- Words occur independently given the class.
- Each document is in exactly one class.
- Word probabilities do not depend on the document length.

Unigram Model for Text Categorization
What is the probability of seeing a document in class +1 vs. class -1?
Need to estimate $P(X = x \mid Y = +1)\, P(Y = +1)$ and $P(X = x \mid Y = -1)\, P(Y = -1)$.
Assume that words are drawn randomly from class-dependent lexicons (with replacement).
Result:
$P(X = x \mid Y = y) = \prod_{i=1}^{l_x} P(W = w_i \mid Y = y)$
where $l_x$ is the total number of words in the document x and $w_i$ is the i-th word in the document.

Naive Bayes Classifier for Text
- Train a separate model for each class.
- Prior probabilities P(Y).
- Classification rule: predict class +1 if
  $P(Y = +1) \prod_{i=1}^{l_x} P(W = w_i \mid Y = +1) > P(Y = -1) \prod_{i=1}^{l_x} P(W = w_i \mid Y = -1)$,
  else predict class -1.

Estimating the Parameters
Count frequencies in training data:
- n: number of training examples
- pos / neg: number of positive / negative training examples
- TF(w, y): number of times word w occurs in class y
- l_y: number of words occurring in documents in class y
Estimating P(Y): fraction of positive / negative examples in training data,
$P(Y = +1) = pos / n$ and $P(Y = -1) = neg / n$
Estimating P(W | Y): smoothing with the Laplace (add-1) estimate,
$P(W = w \mid Y = y) = \frac{TF(w, y) + 1}{l_y + \ldots}$  [constant in the denominator illegible in the source]

Pros and Cons for Naive Bayes
Pros:
- Explicit theoretical foundation
- Relatively effective
- Very simple
- Fast in training and classification
Cons:
- Multinomial model independence assumption clearly wrong for text
- Performs worse than other methods in practice
- On some datasets it really fails badly

Multi-Class vs. Multi-Label
- Cannot learn multi-label rules directly: most classifiers assume that each document is in exactly one class.
- Many classifiers can only learn binary classification rules.
- Most common solution (multi-label)
  - Learn one binary classifier for each label.
  - Attach all labels for which some classifier says positive.
- Most common solution (multi-class)
  - Learn one binary classifier for each label.
  - Put the example into the class with the highest probability (or some approximation thereof).

Performance Measures
- Precision/Recall Break-Even Point: intersection of the PR curve with the identity line.
- Macro-averaging: first compute the measure, then compute the average. Means average over tasks.
- Micro-averaging: first average the elements of the contingency table, then compute the measure. Means average over each individual classification decision.

Experimental Results

                  Reuters Newswire   WebKB Collection   Ohsumed MeSH
  Categories      90                 4                  20
  Training docs   9603               4183               10000
  Test docs       3299               226                10000
  Features        ~27000             ~38000             ~38000

  Microaveraged P/R      Reuters   WebKB   Ohsumed
  Naive Bayes            72.3      82.0    62.4
  Rocchio Algorithm      79.9      74.1    61.5
  C4.5 Decision Tree     79.4      79.1    56.7
  k-Nearest Neighbors    82.6      80.5    63.4
  SVM                    [values not recoverable]

Sentiment Categorization
- Applied standard text categorization algorithms
  - Features: bag of words
  - Classifier: machine learning algorithms (SVMs, naive Bayes, MaxEnt)
- Data
  - Reviews from the IMDb archive
  - 752 negative, 1301 positive
- Sentiment categorization appears to be harder than categorizing by topic, e.g., 82% accuracy for movie reviews.
[Pang, Lee & Vaithyanathan, EMNLP 2002]

What is the problem?
- This laptop is a great deal.
- A great deal of media attention surrounded the release of the new laptop model.
- If you think this laptop is a great deal, I've got a nice bridge for you to buy.
- The protagonist tries to protect her good name.
[Examples from Lillian Lee]

Sentiment Categorization
Hypothesis: sentence-level subjectivity decisions would help.
[Pang and Lee, ACL 2004]

Sentence-level Subjectivity Detector
Text categorization
- Training data
  - Subjective sentences: movie review snippets (www.rottentomatoes.com)
  - Objective sentences: movie plot summaries (www.imdb.com)
- Test sentence: output a subjectivity score
[Pang and Lee, ACL 2004]

Sentiment Categorization
[Diagram; legible labels: n-sentence review, subjective sentence extract, pos/neg.]
Not good enough.
[Pang and Lee, ACL 2004]

Minimum Cut Classification Algorithm
Based on [Blum & Chawla 2001; Boykov et al. 1999]
- Subjectivity constraints
  - Sentence-level subjectivity classifier score in [0, 1]
- Coherence constraints
  - Assumption: nearby sentences should share subjectivity status
  - Sentence-level proximity scores
[Pang and Lee, ACL 2004]

Sentence-level Subjectivity Detector
Minimum cuts in graphs
[Pang and Lee, ACL 2004]
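The Naive Bayes slides (unigram model, classification rule, parameter estimation) can be illustrated with a short sketch. This is not the course's code: the function names, the toy documents, the use of the vocabulary size |V| in the add-1 denominator, and the choice to skip words never seen in training are all assumptions made for illustration. Log probabilities are used so that the product over words does not underflow.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Estimate P(Y) and the counts TF(w, y), l_y from tokenized documents."""
    classes = sorted(set(labels))
    n = len(docs)
    # P(Y = y): fraction of training examples with label y
    prior = {y: sum(1 for lab in labels if lab == y) / n for y in classes}
    tf = {y: Counter() for y in classes}  # TF(w, y)
    for words, y in zip(docs, labels):
        tf[y].update(words)
    length = {y: sum(tf[y].values()) for y in classes}  # l_y
    vocab = {w for y in classes for w in tf[y]}
    return prior, tf, length, vocab


def word_log_prob(w, y, model):
    """Add-1 estimate of log P(W = w | Y = y).

    Assumption: the denominator adds the vocabulary size |V|,
    which makes the estimates sum to one over the vocabulary.
    """
    _, tf, length, vocab = model
    return math.log((tf[y][w] + 1) / (length[y] + len(vocab)))


def predict(words, model):
    """Return the class maximizing log P(Y = y) + sum_i log P(W = w_i | Y = y)."""
    prior, _, _, vocab = model
    scores = {}
    for y in prior:
        score = math.log(prior[y])
        for w in words:
            if w in vocab:  # assumption: ignore words unseen in training
                score += word_log_prob(w, y, model)
        scores[y] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: +1 = positive review, -1 = negative review
toy_docs = [["great", "movie"], ["great", "fun"],
            ["boring", "movie"], ["bad", "boring"]]
toy_labels = [+1, +1, -1, -1]
toy_model = train_naive_bayes(toy_docs, toy_labels)
print(predict(["great", "fun"], toy_model))
```

With two classes, taking the maximum score is the same as the slide's rule "predict +1 if the class +1 product is larger, else predict -1".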

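The difference between macro- and micro-averaging on the Performance Measures slide is easiest to see on numbers. The sketch below is illustrative only: the contingency tables are made up, and representing each table as a (tp, fp, fn, tn) tuple is an assumption. It pools the tables by summing their entries; for ratio measures such as precision and recall this gives the same value as averaging the entries first.

```python
def precision(table):
    """Precision from a contingency table (tp, fp, fn, tn)."""
    tp, fp, fn, tn = table
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(table):
    """Recall from a contingency table (tp, fp, fn, tn)."""
    tp, fp, fn, tn = table
    return tp / (tp + fn) if (tp + fn) else 0.0


def macro_average(tables, measure):
    """Compute the measure per task first, then average over tasks."""
    return sum(measure(t) for t in tables) / len(tables)


def micro_average(tables, measure):
    """Pool the contingency tables first, then compute the measure once."""
    pooled = tuple(sum(column) for column in zip(*tables))
    return measure(pooled)


# Hypothetical tables for one frequent and one rare category
example_tables = [(90, 10, 10, 890), (1, 9, 9, 981)]
print(macro_average(example_tables, precision))
print(micro_average(example_tables, precision))
```

In this made-up example the rare category pulls the macro-average down, because every task counts equally, while the micro-average is dominated by the frequent category, because every classification decision counts equally.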

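The minimum cut slide names two kinds of constraints, per-sentence subjectivity scores and pairwise coherence between nearby sentences. One way to combine them, sketched here as an assumption rather than as the exact construction used in the lecture, is a graph with a "subjective" source and an "objective" sink: each sentence gets a source edge with capacity equal to its score, a sink edge with capacity one minus its score, and proximity edges between sentence pairs. Sentences left on the source side of a minimum cut are labeled subjective. The function name, the edge weights, and the plain Edmonds-Karp max-flow routine are all illustrative choices.

```python
from collections import deque


def min_cut_subjective(scores, assoc):
    """Label sentences via a minimum s-t cut.

    scores: subjectivity score in [0, 1] for each sentence.
    assoc:  dict mapping (i, j) sentence pairs to a proximity weight.
    Returns the set of sentence indices on the source (subjective) side.
    """
    n = len(scores)
    s, t = n, n + 1
    # Residual capacity matrix over sentences plus source and sink
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(scores):
        cap[s][i] = p          # cost of calling sentence i objective
        cap[i][t] = 1.0 - p    # cost of calling sentence i subjective
    for (i, j), w in assoc.items():
        cap[i][j] += w         # cost of separating sentences i and j
        cap[j][i] += w

    def reachable_parents():
        """Breadth-first search over edges with remaining capacity."""
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_parents()
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path found by the search
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the source form its side of the cut
    return {i for i in range(n) if parent[i] != -1}


# Hypothetical scores: sentence 1 leans objective on its own (0.4)
print(min_cut_subjective([0.8, 0.4, 0.1], {(0, 1): 1.0, (1, 2): 0.1}))
print(min_cut_subjective([0.8, 0.4, 0.1], {}))
```

In the first call the strong proximity weight between sentences 0 and 1 pulls the borderline sentence 1 into the subjective set; in the second call, with no coherence edges, each sentence is decided by its own score.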