Class Note for CMPSCI 683 at UMass
Homework due Tuesday, November 29, at 5:00.

Victor Lesser, CMPSCI 683, Fall 2004.
Topics: continuation of decision trees; the version space algorithm; neural networks.

Properties of decision-tree learning:
- Searches a complete space of finite discrete-valued functions relative to the available attributes.
- Maintains only a single current hypothesis (decision tree).
- Performs no backtracking in its search.
- Uses all training examples at each step in the search to make statistically based decisions about how to refine the current hypothesis.

Inductive bias:
- Selects in favor of shorter trees over longer ones.
- Selects trees that place the attributes with the highest information gain closest to the root.

Overfitting: a hypothesis overfits the training examples if there is some other hypothesis that fits the training examples less well yet actually performs better over the entire distribution of instances. Causes:
- Noisy data: the tree is grown to explain the noise.
- Lack of examples: with only a small number of examples associated with a leaf, coincidental irregularities cause the construction of a more detailed tree than is warranted.

Handling missing attribute values:
- Add a new attribute value, "unknown".
- Estimate the missing value from other examples for which the attribute has a known value.
- Assign the value that is most common among the training examples at the parent node.
- Instantiate the example with all possible values of the missing attribute, but weight each resulting instance by the likelihood of the missing value taking that particular value, given the distribution of examples at the parent node; modify the decision-tree algorithm to take the weighting into account.

Avoiding overfitting:
- Stop growing the tree early, before it reaches the point where it perfectly classifies the training data.
- Post-prune the tree: use non-training instances and a statistical test to estimate whether pruning a particular node is likely to produce an improvement beyond the training set.

Handling many-valued attributes: the information gain measure gives an inappropriate indication of an attribute's usefulness because of the likelihood of singleton values (an attribute with a distinct value for every example maximizes gain but generalizes poorly). Need another measure: the gain ratio, i.e., gain divided by the intrinsic information content of the attribute.

Continuous-valued attributes: discretize. Preprocess to find the ranges that give the most useful information for classification purposes:
- Sort the instances by the value of the attribute (e.g., temperature).
- Identify adjacent examples that differ in their target classification.
- Generate a set of candidate thresholds midway between the corresponding examples.
- Use information gain to decide on the appropriate threshold, as sketched below.
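A minimal sketch of this threshold-selection procedure in Python; the entropy helper, the function names, and the toy temperature data are illustrative assumptions, not part of the original notes.

    import math
    from collections import Counter

    def entropy(labels):
        """Entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_threshold(values, labels):
        """Pick the candidate threshold (midway between adjacent examples that
        differ in classification) that maximizes information gain."""
        pairs = sorted(zip(values, labels))
        base = entropy([lab for _, lab in pairs])
        best_gain, best_t = -1.0, None
        for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
            if l1 == l2 or v1 == v2:
                continue                 # only class boundaries yield candidates
            t = (v1 + v2) / 2            # candidate threshold at the midpoint
            low = [lab for v, lab in pairs if v <= t]
            high = [lab for v, lab in pairs if v > t]
            remainder = (len(low) * entropy(low)
                         + len(high) * entropy(high)) / len(pairs)
            if base - remainder > best_gain:
                best_gain, best_t = base - remainder, t
        return best_t, best_gain

    # Toy data: temperatures with a yes/no target classification.
    temps = [40, 48, 60, 72, 80, 90]
    play = ['no', 'no', 'yes', 'yes', 'yes', 'no']
    print(best_threshold(temps, play))   # splits at 54.0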
A learned decision tree can be read as a logical sentence. For the restaurant example:

    ∀r WillWait(r) ⟺ Patrons(r, Some)
                     ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, French))
                     ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, Thai) ∧ FriSat(r))
                     ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, Burger))

Each example is itself a logical sentence, e.g. Alt(X1) ∧ ¬Bar(X1) ∧ ¬Fri(X1) ∧ ... ⟹ WillWait(X1). A decision tree is consistent with the data iff the corresponding KB is consistent.

Current-best-hypothesis learning: a simpler approach than the decision-tree algorithm, assuming complete consistency.
- Incrementally present examples and incrementally refine the hypothesis.
- Maintain a single hypothesis; adjust it as each new example arrives in order to maintain consistency.
- A new example can be consistent with the current hypothesis, or it can be a false negative (the hypothesis says it is negative, but in fact it is positive) or a false positive (the hypothesis says it is positive, but in fact it is negative).
- Repair by generalization (dropping conditions) for false negatives, and by specialization (adding conditions) for false positives.

[Figure (a)-(e): evolution of the extension of the current-best hypothesis, generalized to include each false negative and specialized to exclude each false positive.]

    function CURRENT-BEST-LEARNING(examples) returns a hypothesis
        H ← any hypothesis consistent with the first example in examples
        for each remaining example e in examples do
            if e is a false positive for H then
                H ← choose a specialization of H consistent with examples
            else if e is a false negative for H then
                H ← choose a generalization of H consistent with examples
            if no consistent specialization/generalization can be found then fail
        return H

The restaurant training examples (Alt = alternative nearby, Bar, Fri = Friday/Saturday, Hun = hungry, Pat = patrons, Price, Rain, Res = reservation, Type, Est = wait estimate):

    Example  Alt  Bar  Fri  Hun  Pat    Price  Rain  Res  Type     Est    WillWait
    X1       Yes  No   No   Yes  Some   $$$    No    Yes  French   0-10   Yes
    X2       Yes  No   No   Yes  Full   $      No    No   Thai     30-60  No
    X3       No   Yes  No   No   Some   $      No    No   Burger   0-10   Yes
    X4       Yes  No   Yes  Yes  Full   $      Yes   No   Thai     10-30  Yes
    X5       Yes  No   Yes  No   Full   $$$    No    Yes  French   >60    No
    X6       No   Yes  No   Yes  Some   $$     Yes   Yes  Italian  0-10   Yes
    X7       No   Yes  No   No   None   $      Yes   No   Burger   0-10   No
    X8       No   No   No   Yes  Some   $$     Yes   Yes  Thai     0-10   Yes
    X9       No   Yes  Yes  No   Full   $      Yes   No   Burger   >60    No
    X10      Yes  Yes  Yes  Yes  Full   $$$    No    Yes  Italian  10-30  No
    X11      No   No   No   No   None   $      No    No   Thai     0-10   No
    X12      Yes  Yes  Yes  Yes  Full   $      No    No   Burger   30-60  Yes

A trace of current-best learning on these examples:
- X1 is positive and Alternate(X1) is true, so the initial hypothesis is H1: ∀x WillWait(x) ⟺ Alternate(x).
- X2 is negative, a false positive for H1, so specialize: H2: ∀x WillWait(x) ⟺ Alternate(x) ∧ Patrons(x, Some).
- X3 is positive, a false negative for H2, so generalize: H3: ∀x WillWait(x) ⟺ Patrons(x, Some).
- X4 is positive, a false negative for H3, so generalize: H4: ∀x WillWait(x) ⟺ Patrons(x, Some) ∨ (Patrons(x, Full) ∧ FriSat(x)).
- Other consistent choices exist at this point, e.g. H4': ∀x WillWait(x) ⟺ Patrons(x, Some) ∨ (Patrons(x, Full) ∧ WaitEstimate(x, ...)), or H4'': ∀x WillWait(x) ⟺ ¬WaitEstimate(x, ...).

Problems with current-best-hypothesis search:
- Very large search space, and no good heuristics.
- Nondeterministic search; may need to backtrack.
- Updating and checking the hypothesis is expensive in the number of examples: every modified hypothesis must be re-evaluated against all examples presented so far.

The version space algorithm: a least-commitment approach. Keep all the hypotheses that are consistent with all the examples so far, so no backtracking is ever needed. The problem is how to represent the current set of remaining hypotheses (the version space) efficiently. Answer: boundary sets.
- S-set: the most specific consistent hypotheses. Every member of S is consistent with all observations so far, and there are no consistent hypotheses that are more specific.
- G-set: the most general consistent hypotheses. Every member of G is consistent with all observations so far, and there are no consistent hypotheses that are more general.

Candidate elimination (incremental). Initialize S and G to the sets of maximally specific and maximally general hypotheses that are consistent with the first observed positive training instance:
- G: the set of hypotheses that represent the disjunction of each single attribute-value pair.
- S: the single hypothesis that is the conjunction of the attribute-value pairs in the training instance (most specific).

For each subsequent instance i:

If i is negative:
- Remove from S the hypotheses that match i (false positives for S: too general).
- Make the hypotheses in G that match i more specific, only to the extent required so that they no longer match i (false positives for G: too general).
- Remove from G any element that is no longer more general than some member of S, and any element that is more specific than some other member of G.

If i is positive:
- Remove from G the hypotheses that do not match i (false negatives for G: too specific).
- Make the hypotheses in S that do not match i more general, only to the extent required so that they match i (false negatives for S: too specific; replace them by their immediate generalizations).
- Remove from S any element that is no longer more specific than some member of G, and any element that is more general than some other member of S.

A sketch of these update rules appears below.
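A minimal sketch of the candidate-elimination updates for purely conjunctive hypotheses over fixed attributes, where '?' matches any value. The representation, helper names, and the simplifications (noise-free data, a single S hypothesis guiding specialization) are assumptions of this sketch; the car data mirrors the example on the following slides.

    def matches(h, x):
        """h matches x when every non-'?' attribute of h agrees with x.
        Also serves as 'h is at least as general as x' for hypotheses."""
        return all(hv in ('?', xv) for hv, xv in zip(h, x))

    def generalize(s, x):
        """Minimal generalization of conjunctive s to cover positive x."""
        return tuple(sv if sv == xv else '?' for sv, xv in zip(s, x))

    def specializations(g, s, x):
        """Minimal specializations of g that exclude negative x, guided by
        the S-boundary hypothesis s (fill in one '?' at a time)."""
        return [g[:i] + (sv,) + g[i + 1:]
                for i, (gv, sv) in enumerate(zip(g, s))
                if gv == '?' and sv != '?' and sv != x[i]]

    def candidate_elimination(examples):
        """examples: list of (attribute tuple, is_positive); first is positive.
        Assumes noise-free data and a single conjunctive S hypothesis."""
        (x0, _), rest = examples[0], examples[1:]
        S = [tuple(x0)]
        G = [('?',) * len(x0)]
        for x, positive in rest:
            if positive:
                G = [g for g in G if matches(g, x)]        # drop too-specific g
                S = [generalize(s, x) for s in S]          # lift too-specific s
                S = [s for s in S if any(matches(g, s) for g in G)]
            else:
                S = [s for s in S if not matches(s, x)]    # drop too-general s
                G = [h for g in G for h in
                     ([g] if not matches(g, x) else specializations(g, S[0], x))]
                G = [g for g in G if all(matches(g, s) for s in S)]
                G = [g for g in G                          # keep only maximal g
                     if not any(g2 != g and matches(g2, g) for g2 in G)]
        return S, G

    data = [  # (origin, manufacturer, color, decade, type), positive?
        (('Japan', 'Honda',    'Blue',  '1980', 'Economy'), True),
        (('Japan', 'Toyota',   'Green', '1970', 'Sports'),  False),
        (('Japan', 'Toyota',   'Blue',  '1990', 'Economy'), True),
        (('USA',   'Chrysler', 'Blue',  '1980', 'Economy'), False),
        (('Japan', 'Honda',    'White', '1980', 'Economy'), True),
    ]
    S, G = candidate_elimination(data)
    print('S =', S)   # converges to ('Japan', '?', '?', '?', 'Economy')
    print('G =', G)   # the same single hypothesis: the version space converged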
Termination of candidate elimination:
- One hypothesis is left in the version space, indicating that it is the desired concept definition.
- The version space collapses, with either S or G becoming empty, indicating that there is no consistent hypothesis for the given training set.
- The algorithm runs out of examples. If more than one hypothesis is left, the result can still be used for classification: if all remaining hypotheses agree on an instance, classify it that way; otherwise, take a majority vote.

Example: cars described by the attributes (origin, manufacturer, color, decade, type), with '?' standing for "any value". Training sequence and boundary-set trace:

1. (Japan, Honda, Blue, 1980, Economy), positive:
   S = {(Japan, Honda, Blue, 1980, Economy)}
   G = {(?, ?, ?, ?, ?)}
2. (Japan, Toyota, Green, 1970, Sports), negative:
   S unchanged
   G = {(?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy)}
   Note: (Japan, ?, ?, ?, ?) is not included, since it does not exclude the negative example.
3. (Japan, Toyota, Blue, 1990, Economy), positive:
   S = {(Japan, ?, Blue, ?, Economy)}
   G = {(?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy)}
4. (USA, Chrysler, Blue, 1980, Economy), negative:
   S unchanged
   G = {(Japan, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy)}
5. (Japan, Honda, White, 1980, Economy), positive:
   S = G = {(Japan, ?, ?, ?, Economy)}; the version space has converged to a single hypothesis.

An elegant algorithm with limited applicability:
- It assumes there are no errors in the training examples: a single noisy example will remove the correct hypothesis from the version space as soon as that example is encountered as a false negative or false positive.
- It does not handle unlimited disjunction in the hypothesis space. Extensions allow limited forms of disjunction, e.g., a generalization hierarchy of more general predicates that represent disjunctions.

Computational learning theory. How do we know that a hypothesis h is close to the target function f if we don't know what f is? Sample complexity: can we decide how many examples we need to train on?

Underlying principle: an h that is seriously wrong will almost certainly be "found out" with high probability after a small number of examples; conversely, an h that is consistent with a large set of training examples is unlikely to be seriously wrong. This is the basis of Probably Approximately Correct (PAC) learning. Stationarity assumption: training and test data are drawn randomly from the same population of examples using the same distribution.

Compare the hypothesis h to the correct concept f:

    error(h) = P(h(x) ≠ f(x)), for an instance x drawn from the distribution

i.e., the probability of misclassifying an instance is the probability of the instance falling where h and f disagree. A hypothesis is good to the extent that it classifies instances correctly.
- h is approximately correct if error(h) ≤ ε, where ε is the accuracy parameter.
- h is probably approximately correct (PAC) if P(error(h) > ε) ≤ δ, where δ is the confidence parameter.

[Figure: the hypothesis space, showing the "ε-ball" of approximately correct hypotheses around the true function f.]

Deriving a sample-complexity bound, assuming the worst case that every h in H_bad (outside the ε-ball) has error greater than ε:
- P(a bad h agrees with M independent examples) ≤ (1 - ε)^M.
- For |H| hypotheses, P(H_bad contains a hypothesis consistent with all M examples) ≤ |H_bad| (1 - ε)^M ≤ |H| (1 - ε)^M.

To guarantee that h is PAC, require |H| (1 - ε)^M ≤ δ. Because ε, δ, and |H| are known, we can solve for M (Blumer et al.):

    M ≥ (1/ε) (ln(1/δ) + ln |H|)

Given at least this many examples, any h ∈ H consistent with all M examples is PAC. By computing |H| for various representations, we can determine the corresponding M, giving a bound on the sample complexity of PAC learning; a worked example follows.
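A quick numeric illustration of the bound above; the particular ε, δ, and hypothesis spaces are chosen for illustration and are not from the notes.

    import math

    def pac_sample_bound(eps, delta, ln_h_size):
        """Blumer et al. bound: M >= (1/eps) * (ln(1/delta) + ln|H|).
        Takes ln|H| directly so huge hypothesis spaces don't overflow."""
        return math.ceil((1 / eps) * (math.log(1 / delta) + ln_h_size))

    # Pure conjunctions over n Boolean attributes: each attribute appears
    # positively, negatively, or not at all, so |H| = 3^n and ln|H| = n ln 3.
    n = 10
    print(pac_sample_bound(0.1, 0.05, n * math.log(3)))          # 140 examples

    # Arbitrary Boolean functions of n attributes: |H| = 2^(2^n), so
    # ln|H| = 2^n ln 2 and the bound grows exponentially with n.
    print(pac_sample_bound(0.1, 0.05, (2 ** n) * math.log(2)))   # 7128 examples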
Applying the bound to arbitrary Boolean functions: with n attributes the hypothesis space has size |H| = 2^(2^n), so the sample complexity grows as 2^n. But the total number of distinct examples is also at most 2^n, so a PAC learning algorithm for this space can do no better than a lookup table. The problem arises from the worst-case complexity analysis and the sheer size of H, which do not necessarily reflect the average-case sample complexity. Can we reduce the size of H and still learn reasonable Boolean functions?

Decision lists: a series of tests, each a conjunction of literals; the first test that succeeds determines the output. For the restaurant example:

    Patrons(x, Some) → yes
    Patrons(x, Full) ∧ FriSat(x) → yes
    Nil → no

k-DL restricts each test to at most k literals. k-DL has more expressive power than a depth-k decision tree, yet is PAC-learnable in a reasonable number of examples for small k.

Neural networks. Some facts about the brain:
- Each neuron has a large number of connections to other neurons.
- A neuron fires when its inputs exceed a threshold; inputs are weighted and can have an excitatory or inhibitory effect.
- Individual firing is slow (about 0.01 second), but bandwidth is very high (10M bits/sec).
- The brain performs many tasks much faster than a computer: scene recognition takes about 1 second.
- Learning and graceful degradation.

Connectionist models: computational architectures and cognitive models that are neurally inspired. They are faithful to coarse neural constraints, but they are not neural models. Large numbers of simple neuron-like processing units are interconnected through weighted links. They do not compute by transmitting symbolically coded messages: the "program" resides in the structure of the interconnections, with massive parallelism and no centralized control.

Strengths:
- Ability to bring large numbers of interacting constraints to bear on problem solving (soft constraints).
- Noise resistance, error tolerance, graceful degradation.
- Ability to do complex multi-layer recognition with a large number of inputs and outputs quickly.
- Learning with generalization.
- Biological plausibility.
- Potential for speed of processing through fine-grained parallelism.

Applications:
- Automobile automatic guidance systems.
- Credit application evaluation, mortgage screening, real estate appraisal.
- Object recognition: faces, characters.
- Speech recognition and voice synthesis.
- Market forecasting, automatic bond trading.
- Robot control, process control.
- Breast cancer cell analysis.
- Oil and gas exploration.
- Image and data compression.

Units: each processing unit computes the weighted sum of the activations arriving over its input links, then applies a threshold (activation) function to produce the output sent over its output links (a sketch of such a unit follows below):

    a_i = g(in_i) = g(Σ_j W_j,i a_j)

[Figure: a unit with input links, an input function, an activation function g, and output links. Activation functions: (a) step function, (b) sign function, (c) sigmoid function.]

Next lecture: continuation of neural networks. A single-layer perceptron can represent only linearly separable functions, so XOR requires a multi-layer network.
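A minimal sketch of the unit model above: a weighted sum of inputs passed through a threshold function. The hand-picked AND weights are illustrative assumptions.

    import math

    def step(x):
        """Step activation: fire (output 1) when the weighted sum exceeds 0."""
        return 1 if x > 0 else 0

    def sigmoid(x):
        """Smooth, differentiable alternative to the hard step."""
        return 1 / (1 + math.exp(-x))

    def unit_output(weights, inputs, g=step):
        """a_i = g(sum_j W_j,i * a_j): weighted sum through activation g."""
        return g(sum(w * a for w, a in zip(weights, inputs)))

    # A unit computing logical AND of two inputs, with a bias input fixed at 1
    # and weights chosen by hand: it fires only when both inputs are 1.
    w_and = [-1.5, 1.0, 1.0]      # bias weight, then one weight per input
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, '->', unit_output(w_and, [1, a, b]))
    # No such single unit can compute XOR: its positive and negative cases
    # are not linearly separable, hence the need for multi-layer networks.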