Class Notes for CMPSCI 585 at UMass (12)
These 42 pages of class notes were uploaded on Friday, February 6, 2015, for a Fall course at the University of Massachusetts.
Review: probabilistic NLP so far

Speech recognition, machine translation, OCR, and spell correction maximize p(I | T); text categorization, parsing, and part-of-speech tagging maximize p(T | I); either way, we maximize the joint probability p(I, T). For speech recognition and the other tasks above, we train HMMs, PCFGs, n-gram models, and clusters by maximum likelihood plus smoothing:

  max_theta p(data | theta), or with a prior, max_theta p(theta) p(data | theta)

Techniques so far, by data type:
- Sequences: n-grams, FSMs, FSTs, Viterbi, collocations
- Vectors: naive Bayes, clustering, word senses
- Trees: PCFGs, CKY, Earley, syntactic features
- Also: morphology, clustering, collocations

Really? An alternative view: weights. Note that, just as in probability, bigger weights are better. Every model above scores a structure by summing weights, which so far have been log-probabilities:
- n-grams: log p(w7 | w5, w6) + log p(w8 | w6, w7) + ...
- PCFG: log p(NP VP | S) + log p(Papa | NP) + log p(VP PP | VP) + ...
- HMM tagging: log p(t7 | t5, t6) + log p(w7 | t7) + ...
- Noisy channel: log p(source) + log p(data | source)
- Naive Bayes: log p(Class) + log p(feature1 | Class) + log p(feature2 | Class) + ...

We can regard any linguistic object as a collection of features (here a document is a collection of words, but it could have non-word features too). The weight of the object is the total weight of its features. Our weights have always been conditional log-probabilities (<= 0), but that is going to change in a few minutes. Advantages: we can estimate parameters automatically, our results are meaningful, and results can be meaningfully combined (modularity).

Spam detection example. Features of an email:
- Contains "Buy"
- Contains a dollar amount under $100
- Contains an imperative sentence
- Reading level = 8th grade
- Mentions money (use word classes and/or regexps to detect this)

Suppose 50% of spam contains a dollar amount under $100 (25x more likely than in ham), and 90% of spam mentions money (9x more likely than in ham). Naive Bayes claims 50% x 90% = 45% of spam has both features, and that such emails are 25 x 9 = 225x more likely than in ham. But the features overlap: a dollar amount under $100 already mentions money, so among the emails with both features the true ratio is only about 25x, not 225x. We can adjust the scores to compensate for feature overlap, since "mentions money" is already included in the dollar-amount feature's log-probability.
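The double-counting above can be sketched numerically. This is a minimal illustration, not the slides' code; the ham rates (0.02 and 0.10, giving the 25x and 9x ratios) are assumed for the example.

```python
import math

# Fraction of spam vs. ham emails containing each feature (illustrative numbers
# chosen to reproduce the 25x and 9x likelihood ratios from the notes).
p_spam = {"dollar_under_100": 0.50, "mentions_money": 0.90}
p_ham  = {"dollar_under_100": 0.02, "mentions_money": 0.10}

def naive_bayes_log_ratio(features):
    """Sum per-feature log likelihood ratios, assuming independence."""
    return sum(math.log(p_spam[f] / p_ham[f]) for f in features)

ratio = math.exp(naive_bayes_log_ratio(["dollar_under_100", "mentions_money"]))
print(round(ratio))  # 225: naive Bayes multiplies 25 * 9
# But a dollar amount under $100 already "mentions money", so for emails with
# both features the true ratio is only about 25x: the independence assumption
# double-counts the overlapping evidence.
```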
Once the features are not independent, the adjusted weights no longer need to be log-probabilities of anything; an email's score is still just the sum of its feature weights. For example, if the weights of an email's features total log p(feats | spam) = -5.77, then p(feats | spam) = exp(-5.77), about 1/320. This gives a log-linear (exponential) model:

  p(m | spam) = (1/Z(lambda)) exp SUM_i lambda_i f_i(m)

where m is the email message, lambda_i is the weight of feature i, and f_i(m) is 0 or 1 according to whether m has feature i. More generally, we allow f_i(m) to be the count or strength of the feature. 1/Z(lambda) is a normalizing factor making SUM_m p(m | spam) = 1, summed over all possible messages m (hard to compute!).

Old weights vs. new: we can recover the old models as a special case by using joint features of the message and the class:
- feature "c = spam": weight is log p(spam), a constant
- feature "c = spam and m contains Buy": the old spam model's weight for "contains Buy"
- feature "c = ham": weight is log p(ham), a constant
- feature "c = ham and m contains Buy": the old ham model's weight for "contains Buy"

In general:

  p(m, c) = (1/Z(lambda)) exp SUM_i lambda_i f_i(m, c)

Maximum entropy. The entropy of a distribution is H(p) = -SUM_x p(x) log p(x), e.g. -0.51 log 0.51 - 0.025 log 0.025 - 0.29 log 0.29 - ...; it is largest when the probabilities are evenly distributed. Amazing theorem: the maximum-entropy distribution subject to feature-expectation constraints has exactly this exponential form.

To solve the maxent problem we use Lagrange multipliers. Maximize H(p) subject to SUM_x p(x) f_i(x) = E~[f_i] for each feature i, and SUM_x p(x) = 1:

  L = -SUM_x p(x) log p(x) + SUM_i lambda_i (SUM_x p(x) f_i(x) - E~[f_i]) + mu (SUM_x p(x) - 1)

  dL/dp(x) = -log p(x) - 1 + SUM_i lambda_i f_i(x) + mu = 0

  =>  p(x) = exp(SUM_i lambda_i f_i(x) + mu - 1) = (1/Z(lambda)) exp SUM_i lambda_i f_i(x)

So feature constraints plus maximum entropy imply the exponential family. The problem is convex, so the solution is unique.

Geometric picture: define two submanifolds of the probability simplex Delta. The first, E, is the set of all exponential-family distributions based on a particular set of features f_i. The second, M, is the set of all distributions that satisfy the feature-expectation constraints. They intersect at a single distribution p*, which is both the maxent solution and the maximum-likelihood exponential model.

Maximum-Likelihood Conditional Models

Given a model form, choose values of the parameters to maximize the conditional likelihood of the data. With the exponential model form, for a data set (C, D):

  log P(C | D, lambda) = SUM_(c,d) log P(c | d, lambda)
                       = SUM_(c,d) log [ exp SUM_i lambda_i f_i(c, d) / SUM_c' exp SUM_i lambda_i f_i(c', d) ]

Building a Maxent Model

Define features: indicator functions over data points. Features represent sets of data points which are distinctive enough to deserve model parameters. Usually features are added incrementally to target errors. For any given feature weights we want to be able to calculate: the data conditional likelihood; the derivative of the likelihood with respect to each feature weight (using expectations of each feature according to the model); and then find the optimum feature weights (next part).

The Likelihood Value

The log conditional likelihood is a function of the i.i.d. data (C, D) and the parameters lambda:

  log P(C | D, lambda) = log PROD_(c,d) P(c | d, lambda) = SUM_(c,d) log P(c | d, lambda)

If there aren't many values of c, it is easy to calculate. We can separate it into two components:

  log P(C | D, lambda) = SUM_(c,d) SUM_i lambda_i f_i(c, d)  -  SUM_(c,d) log SUM_c' exp SUM_i lambda_i f_i(c', d)
                       = N(lambda) - M(lambda)

The derivative is the difference between the derivatives of the two components.

The Derivative I: Numerator

  dN(lambda)/dlambda_i = d/dlambda_i SUM_(c,d) SUM_j lambda_j f_j(c, d) = SUM_(c,d) f_i(c, d)

The derivative of the numerator is the empirical count of f_i in (C, D).

The Derivative II: Denominator

  dM(lambda)/dlambda_i = SUM_(c,d) (1 / SUM_c'' exp SUM_j lambda_j f_j(c'', d)) SUM_c' exp(SUM_j lambda_j f_j(c', d)) f_i(c', d)
                       = SUM_(c,d) SUM_c' P(c' | d, lambda) f_i(c', d)

which is the predicted count of f_i under the model.

The Derivative III

  d log P(C | D, lambda) / dlambda_i = actual count(f_i, C) - predicted count(f_i, lambda)

The optimum parameters are the ones for which each feature's predicted expectation equals its empirical expectation. The optimum distribution is always unique (but the parameters may not be), and it always exists if the feature counts come from actual data. Note that features can have high model expectations (predicted counts) either because they have large weights or because they occur with other features which have large weights.

Summary

We have a function to optimize:

  log P(C | D, lambda) = SUM_(c,d) log [ exp SUM_i lambda_i f_i(c, d) / SUM_c' exp SUM_i lambda_i f_i(c', d) ]

and we know the function's derivatives:

  d log P(C | D, lambda) / dlambda_i = actual count(f_i, C) - predicted count(f_i, lambda)

This is a perfect situation for general-purpose optimization.
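The training recipe above (ascend the gradient "actual counts minus predicted counts" of the convex conditional log-likelihood) can be sketched in Python. The toy data, the (class, word) conjunctive features, the learning rate, and the iteration count are all invented for illustration.

```python
import math

CLASSES = ["spam", "ham"]

def features(c, d):
    # Conjunctive indicator features: one (class, word) pair per word in d.
    return [(c, w) for w in d]

def cond_prob(lambdas, d):
    """P(c | d, lambda) = exp(sum_i lambda_i f_i(c,d)) / sum_c' exp(...)."""
    scores = {c: sum(lambdas.get(f, 0.0) for f in features(c, d)) for c in CLASSES}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

def gradient(lambdas, data):
    """d log P(C|D,lambda) / d lambda_i = actual count f_i - predicted count f_i."""
    grad = {}
    for c, d in data:
        probs = cond_prob(lambdas, d)
        for f in features(c, d):            # actual (empirical) counts
            grad[f] = grad.get(f, 0.0) + 1.0
        for c2 in CLASSES:                  # predicted (model-expected) counts
            for f in features(c2, d):
                grad[f] = grad.get(f, 0.0) - probs[c2]
    return grad

# Plain full-batch gradient ascent on the conditional log-likelihood.
data = [("spam", ["buy", "cheap"]), ("ham", ["meeting", "notes"])]
lambdas = {}
for _ in range(200):
    for f, g in gradient(lambdas, data).items():
        lambdas[f] = lambdas.get(f, 0.0) + 0.5 * g

print(cond_prob(lambdas, ["buy", "cheap"])["spam"])  # close to 1
```

In practice one would hand these derivatives to a general optimizer (conjugate gradient, L-BFGS) rather than hand-rolled gradient ascent, but the gradient being computed is the same.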
Part II: Comparison to Naive Bayes

Naive Bayes is another tool for classification: we have a bunch of random variables (data features) which we would like to use to predict another variable (the class). The naive Bayes likelihood over classes is

  P(c | d) = P(c) PROD_i P(phi_i | c) / SUM_c' P(c') PROD_i P(phi_i | c')

In fact, naive Bayes is just an exponential model:

  P(c | d) = exp[ log P(c) + SUM_i log P(phi_i | c) ] / SUM_c' exp[ log P(c') + SUM_i log P(phi_i | c') ]
           = exp SUM_i lambda_i f_i(c, d) / SUM_c' exp SUM_i lambda_i f_i(c', d)

The primary differences between naive Bayes and maxent models:
- Naive Bayes is trained to maximize the joint likelihood of data and classes; maxent is trained to maximize the conditional likelihood of classes.
- Naive Bayes assumes features supply independent evidence; maxent feature weights take feature dependence into account.
- Naive Bayes feature weights can be set independently; maxent feature weights must be mutually estimated.
- Naive Bayes features must be of the conjunctive form f_i(c, d) = phi_i(d) AND (c = c_i); maxent features need not be (but usually are).

Smoothing: Priors (MAP)

What if we had a prior expectation that parameter values wouldn't be very large? We could then balance evidence suggesting large (or even infinite) parameters against our prior. The evidence would never totally defeat the prior, and the parameters would be smoothed and kept finite. We can do this explicitly by changing the optimization objective to the maximum posterior likelihood:

  max_lambda P(lambda | C, D) = max_lambda P(lambda) P(C | D, lambda) / P(C | D)

  log P(lambda, C | D) = log P(lambda) + log P(C | D, lambda)
  (posterior = prior + evidence)

Gaussian (quadratic) priors. Intuition: parameters shouldn't be large. Formalization: prior expectation that each parameter will be distributed according to a Gaussian with mean mu and variance sigma^2:

  P(lambda_i) = (1 / (sigma sqrt(2 pi))) exp( -(lambda_i - mu)^2 / (2 sigma^2) )

This penalizes parameters for drifting too far from their mean prior value (usually mu = 0). Taking 2 sigma^2 = 1 works surprisingly well.
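The quadratic penalty and its gradient contribution can be sketched directly. This is an illustrative fragment, not the slides' code; it fixes mu = 0 and 2 sigma^2 = 1 as the notes suggest, so the MAP gradient for each weight is just (actual - predicted) - lambda_i / sigma^2.

```python
# Gaussian prior on maxent weights, mu = 0 and 2 * sigma^2 = 1 (sigma^2 = 0.5).
sigma2 = 0.5

def log_prior(lambdas):
    """log P(lambda) up to a constant: -(lambda_i)^2 / (2 sigma^2), summed."""
    return -sum(l * l for l in lambdas.values()) / (2 * sigma2)

def prior_gradient(lambdas):
    """d log P(lambda) / d lambda_i = -(lambda_i - mu) / sigma^2, with mu = 0."""
    return {f: -l / sigma2 for f, l in lambdas.items()}

# The penalty is quadratic: a weight of 3 costs 9x more than a weight of 1.
print(log_prior({"a": 1.0}), log_prior({"a": 3.0}))  # -1.0 -9.0
```

Adding `prior_gradient` to the likelihood gradient pulls every weight back toward 0 with a force proportional to its size, which is what keeps the weights of rare or perfectly-predictive features from running off to infinity.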