Advanced Language Technologies, CS 6740
These 9 pages of class notes were uploaded by Lacey Collier on Saturday, September 26, 2015. The notes belong to CS 6740 at Cornell University, taught by Staff in Fall.
Date Created: 09/26/15
CS6740/INFO6300: Sentiment Analysis

Today
  - Background and motivation
  - Sentiment categorization
      - Text categorization
Next class
  - Product review mining
Later in the semester
  - Opinion extraction and summarization

Subjectivity vs. Sentiment

Subjective sentences express private states, i.e., internal mental or emotional states: speculations, beliefs, emotions, evaluations, goals, opinions, judgments.
  1. Jill said, "I hate Bill."
  2. Jack thought he won the race.
  3. Judy hoped her presentation would go well.

Subjectivity vs. Sentiment

Sentiment expressions are a type of subjective expression: expressions of positive and negative emotions, judgments, evaluations.
  1. Jill said, "I hate Bill."
  2. Jack thought he won the race.
  3. Judy hoped her presentation would go well.

Information Retrieval

[Figure: screenshot of a Google web search results page; text not recoverable]

Question Answering

[Figure: diagram with labels "answer", "supporting text", "text collection"]

Queries of a Subjective Nature
  - How have business views towards global climate change varied over the past decade?
  - How have consumers and businesses responded to the release of Gore's "An Inconvenient Truth"?
  - What is the reaction in Asia to the Bush policy towards the Kyoto Protocol?
  - Who were the first people to talk about bailout options for banks in the current economic crisis?
  - What does Sarah Palin think about <x>?

WilburIrma Blame Game

[Figure: screenshot of a Reuters news article (Los Angeles), "Netflix offers $1 mln prize for better movie picks". Legible gist: Netflix will pay the prize to the first person to develop software that improves the accuracy of its movie recommendation system. The Web-based system learns what kinds of films subscribers like by asking them to rate the films they watch, recommends lists of similar titles unique to each user, and uses the star ratings to predict how many stars a consumer would assign to each title in its library. The stock-quote sidebar and remaining text are not recoverable.]

Early Work

  - Learning semantic orientation of adjectives [Hatzivassiloglou & McKeown, ACL 1997]
      - Polarity: beautiful vs. ugly
  - Effects of adjective polarity and gradability on sentence subjectivity [Hatzivassiloglou & Wiebe, COLING 2000]
      - Gradability: ugly
  - No subjectivity sentence classifiers created or evaluated

CS6740/INFO6300: Sentiment Categorization

  - Background and motivation
  - Sentiment categorization
      - Text categorization
  - Product review mining

Sentiment Categorization

Is the overall sentiment in the document positive or negative?
[Pang et al., EMNLP 2002; Turney, ACL 2002; Turney & Littman, TOIS 2003]

Text Classification (binary)

  E D AND F MAN TO BUY INTO HONG KONG FIRM
  The U.K.-based commodity house E D and F Man Ltd and Singapore's Yeo Hiap Seng Ltd jointly announced that Man will buy a substantial stake in Yeo's 71.1 pct held unit, Yeo Hiap Seng Enterprises Ltd. Man will develop the locally listed soft drinks manufacturer into a securities and commodities brokerage arm and will rename the firm Man Pacific Holdings Ltd.

About a corporate acquisition?

Text Classification (multiclass)

The same article is assigned to one of several categories: business, travel, music, sports.

Text Classification

Assign pieces of text to predefined categories based on content.

Types of text
  - Documents (typical)
  - Paragraphs
  - Sentences
  - WWW sites

Different types of categories
  - By topic
  - By function
  - By author
  - By style

Text Classification Applications

  - Help-Desk Support: Who is an appropriate expert for a particular problem?
  - Information Filtering Agent: Which news articles are interesting to a particular person?
  - Relevance Feedback: What are other documents relevant for a particular query?
  - Knowledge Management: Organizing a document database by semantic categories
  - Focused Crawling: Find all the WWW pages on a particular topic

Why Learn Text Classifiers?

  - Classifying documents by hand is costly and does not scale well (e.g., browse all WWW pages to filter out those about job announcements).
  - Humans are not really good at constructing text classification rules. It is hard to write good queries.
  - Sometimes there is no expert available (e.g., rules for routing email).
  - Sometimes training data is cheap and plentiful (e.g., existing databases).

Learning Setting

[Figure: diagram with labels "Real-World Process", "Training Set", "Learner", "Classifier", "New Documents"]

Learning Setting

  - Training examples are fed to an ML algorithm.
  - Goal: Find a classification rule $h$ with low prediction error (w.r.t. the selected loss function) on new examples from distribution $P(X, Y)$.

Prediction Error and Loss Function

  - Loss function: assigns the amount of penalty when making a mistake.
      - Zero-One loss
  - Prediction error (also: generalization error or true error)
      - Probability of making an error on a new example drawn from the same distribution $P(X, Y)$
      - Equivalent: expected value of the loss function

Bayes Rule for Text Classification

Want to compute $P(Y = y \mid X = x)$.
Use Bayes' theorem to get

  $P(Y = y \mid X = x) = \frac{P(X = x \mid Y = y)\, P(Y = y)}{P(X = x)}$

Assumptions of NB Probability Estimations

  - Words occur independently given the class.
  - Each document is in exactly one class.
  - Word probabilities do not depend on the document length.

Unigram Model for Text Categorization

What is the probability of seeing a document in class +1 vs. class -1?
  - Need to estimate $P(X = x \mid Y = +1)\, P(Y = +1)$ and $P(X = x \mid Y = -1)\, P(Y = -1)$.
  - Assume that words are drawn randomly from class-dependent lexicons (with replacement).
  - Result:

      $P(X = x \mid Y = y) = \prod_{i=1}^{|x|} P(W = w_i \mid Y = y)$

    where $|x|$ is the total number of words in the document $x$ and $w_i$ is the $i$-th word in the document.

Naive Bayes Classifier for Text

  - Train a separate model for each class.
  - Prior probabilities $P(Y)$.
  - Classification rule: predict class +1 if

      $P(Y = +1) \prod_{i=1}^{|x|} P(W = w_i \mid Y = +1) > P(Y = -1) \prod_{i=1}^{|x|} P(W = w_i \mid Y = -1)$

    else predict class -1.

Estimating the Parameters

Count frequencies in training data:
  - $n$: number of training examples
  - $pos$ / $neg$: number of positive / negative training examples
  - $TF(w, y)$: number of times word $w$ occurs in class $y$
  - $l_y$: number of words occurring in documents in class $y$

Estimating $P(Y)$: fraction of positive / negative examples in training data,

  $P(Y = +1) = \frac{pos}{n}, \quad P(Y = -1) = \frac{neg}{n}$

Estimating $P(W \mid Y)$: smoothing with the Laplace (add-1) estimate,

  $P(W = w \mid Y = y) = \frac{TF(w, y) + 1}{l_y + 2}$

Pros and Cons for Naive Bayes

Pros
  - Explicit theoretical foundation
  - Relatively effective
  - Very simple
  - Fast in training and classification
Cons
  - Multinomial model independence assumption clearly wrong for text
  - Performs worse than other methods in practice
  - On some datasets it really fails badly

Multi-Class vs. Multi-Label

Cannot learn multi-label rules directly
  - Most classifiers assume that each document is in exactly one class.
  - Many classifiers can only learn binary classification rules.
Most common solution, Multi-Label
  - Learn one binary classifier for each label.
  - Attach all labels for which some classifier says positive.
Most common solution, Multi-Class
  - Learn one binary classifier for each label.
  - Put the example into the class with the highest probability (or some approximation thereof).

Performance Measures

  - Precision/Recall Break-Even Point: intersection of the PR-curve with the identity line
  - Macro-averaging: first compute the measure, then compute the average. Means average over tasks.
  - Micro-averaging: first average the elements of the contingency table, then compute the measure. Means average over each individual classification decision.

Experimental Results

                        Reuters Newswire   WebKB Collection   Ohsumed MeSH
  categories            90                 4                  20
  training docs         9603               4183               10000
  test docs             3299               226                10000
  features              ~27000             ~38000             ~38000

  Microaveraged P/R     Reuters   WebKB   Ohsumed
  Naive Bayes           72.3      82.0    62.4
  Rocchio Algorithm     79.9      74.1    61.5
  C4.5 Decision Tree    79.4      79.1    56.7
  k-Nearest Neighbors   82.6      80.5    63.4
  SVM

CS6740/INFO6300: Sentiment Categorization

  - Background and motivation
  - Sentiment categorization
      - Text categorization
  - Product review mining

Sentiment Categorization

  - Applied standard text categorization algorithms
      - Features: bag of words
      - Classifier: machine learning algorithms (SVMs, naive Bayes, maxent)
  - Data
      - Reviews from the IMDb archive
      - 752 negative, 1301 positive
  - Sentiment categorization appears to be harder than categorizing by topic, e.g., 82% accuracy for movie reviews [Pang, Lee, Vaithyanathan, EMNLP 2002]

What is the problem?

  - This laptop is a great deal.
  - A great deal of media attention surrounded the release of the new laptop model.
  - If you think this laptop is a great deal, I've got a nice bridge for you to buy.
  - The protagonist tries to protect her good name.
[Examples from Lillian Lee]

Sentiment Categorization

Hypothesis: sentence-level subjectivity decisions would help.
[Pang and Lee, ACL 2004]

Sentence-level Subjectivity Detector

  - Text categorization
  - Training data
      - Subjective sentences: movie review snippets (Rotten Tomatoes)
      - Objective sentences: movie plot summaries (IMDb)
  - Test sentence: output a subjectivity score
[Pang and Lee, ACL 2004]

Sentiment Categorization

[Figure: pipeline from an n-sentence review, through extraction of the subjective sentences, to a pos/neg decision on the extract]

Not good enough.
[Pang and Lee, ACL 2004]

Minimum Cut Classification Algorithm

Based on Blum & Chawla 2001; Boykov et al. 1999.
  - Subjectivity constraints
      - Sentence-level subjectivity classifier score in [0, 1]
  - Coherence constraints
      - Assumption: nearby sentences should share subjectivity status
      - Sentence-level proximity scores
[Pang and Lee, ACL 2004]

Sentence-level Subjectivity Detector

Minimum cuts in graphs.
[Pang and Lee, ACL 2004]
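The Naive Bayes slides (unigram model, classification rule, parameter estimation) can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names and the toy data are hypothetical, log-probabilities are used to avoid underflow, and the smoothing denominator here is l_y plus the vocabulary size, a common add-one variant that is an assumption of this sketch and may differ from the denominator on the slide.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: list of class labels (+1 / -1)."""
    n = len(docs)
    classes = sorted(set(labels))
    # P(Y = y): fraction of training examples in class y
    prior = {y: sum(1 for lab in labels if lab == y) / n for y in classes}
    # TF(w, y): number of times word w occurs in documents of class y
    tf = {y: Counter() for y in classes}
    for tokens, y in zip(docs, labels):
        tf[y].update(tokens)
    # l_y: total number of word occurrences in class y
    length = {y: sum(tf[y].values()) for y in classes}
    vocab = {w for y in classes for w in tf[y]}
    return prior, tf, length, vocab

def log_score(tokens, y, model):
    """log P(Y = y) + sum_i log P(W = w_i | Y = y) with add-one smoothing."""
    prior, tf, length, vocab = model
    score = math.log(prior[y])
    for w in tokens:
        # Assumption of this sketch: denominator is l_y + |vocabulary|
        score += math.log((tf[y][w] + 1) / (length[y] + len(vocab)))
    return score

def predict(tokens, model):
    """Return the class with the highest (log) score."""
    return max(model[0], key=lambda y: log_score(tokens, y, model))

# Toy data, invented for illustration only
docs = [["great", "fun"], ["great", "acting"], ["boring", "plot"], ["boring", "slow"]]
labels = [1, 1, -1, -1]
model = train_naive_bayes(docs, labels)
print(predict(["great"], model), predict(["boring", "plot"], model))
```

Working in log space turns the product over words into a sum, which leaves the argmax unchanged while keeping long documents from underflowing to zero.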
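The difference between macro- and micro-averaging on the Performance Measures slide is easy to see numerically. The sketch below uses precision as the measure and two made-up contingency tables (the counts and function names are hypothetical): one large, easy category and one small, hard category.

```python
def precision(tp, fp):
    """Precision from true-positive and false-positive counts."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def macro_precision(tables):
    """Compute the measure per task first, then average over tasks."""
    return sum(precision(t["tp"], t["fp"]) for t in tables) / len(tables)

def micro_precision(tables):
    """Pool the contingency-table counts first, then compute the measure."""
    tp = sum(t["tp"] for t in tables)
    fp = sum(t["fp"] for t in tables)
    return precision(tp, fp)

# Invented counts: a frequent category classified well, a rare one classified badly
tables = [{"tp": 90, "fp": 10}, {"tp": 1, "fp": 9}]
macro = macro_precision(tables)
micro = micro_precision(tables)
print(macro, micro)
```

Macro-averaging weights every category equally, so the rare category pulls the score down; micro-averaging weights every individual decision equally, so the frequent category dominates. Summing the table entries instead of averaging them gives the same ratio.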
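The minimum-cut idea can be illustrated with a small graph. The construction below is an assumption of this sketch rather than something spelled out in the notes: each sentence is a node, the source connects to it with capacity equal to its subjectivity score, it connects to the sink with capacity one minus that score, and proximity scores become edges between sentences. A plain Edmonds-Karp max-flow stands in for whatever solver was actually used; all names and numbers are hypothetical.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns the nodes on the source side of a minimum cut.
    capacity: dict mapping node -> {neighbor: capacity}."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edges

    while True:
        # Breadth-first search for a shortest augmenting path
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            return set(parents)  # nodes still reachable from the source
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

def subjective_sentences(scores, proximity):
    """scores[i] in [0, 1]; proximity: {(i, k): weight}. Returns subjective indices."""
    cap = {"s": {}, "t": {}}
    for i, p in enumerate(scores):
        cap["s"][i] = p                  # cost of calling sentence i objective
        cap.setdefault(i, {})["t"] = 1.0 - p  # cost of calling it subjective
    for (i, k), w in proximity.items():
        cap[i][k] = cap[i].get(k, 0.0) + w    # cost of separating i and k
        cap[k][i] = cap[k].get(i, 0.0) + w
    side = min_cut_source_side(cap, "s", "t")
    return sorted(i for i in side if i != "s")

scores = [0.9, 0.4, 0.8]                                      # invented classifier scores
alone = subjective_sentences(scores, {})                      # no coherence constraints
coherent = subjective_sentences(scores, {(0, 1): 0.5, (1, 2): 0.5})
print(alone, coherent)
```

With no proximity edges each sentence is decided by its own score, so the middle sentence (0.4) is dropped. Adding proximity edges to its two confidently subjective neighbors makes separating it more expensive than keeping it, which is the coherence effect the slide describes.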