CMPSCI 683, Fall 2004 — Victor Lesser

Learning

This lecture:
- The structure of a learning agent
- Basic problems: bias, Ockham's razor, expressiveness
- Decision-tree algorithms

What is learning?
Learning is change within a system that improves its performance. This admits a lot of different behaviors, but it identifies the basic preconditions of learning:
- Learning systems must be capable of change.
- Learning systems must do something differently as a result of the change.

Why is learning important?
- It is a viable alternative to problem solving.
- Learning can simplify the complexity of problem solving: procedural knowledge (inferencing, search) can be replaced with learned functions and policies.
- Learning increases the efficiency, robustness, survivability, and autonomy of a system; it is key to operating in open environments.
- A learning program can become better than its teacher.

Basic questions
- What changes as a result of learning?
- How does the system find out that change is needed?
- How does the system localize the problem, to find out what changes are necessary?
- What is the mechanism of change?

Types of feedback
- Supervised learning: the agent is told by a teacher what action is best in a specific situation.
- Reinforcement learning: the agent gets feedback about the consequences of a specific sequence of actions in a certain situation. This can also be thought of as supervised learning with a less informative feedback signal.
- Unsupervised learning: no feedback about actions; the agent learns to predict future percepts given its previous percepts. It cannot learn what to do unless it already has a utility function.

The structure of a learning agent
- The learning element modifies the performance element in response to feedback.
- The critic tells the learning element how well the agent is doing, against a fixed standard of performance.
- The problem generator suggests actions that will lead to new and informative experiences (related to the decision to acquire information).
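To make the division of labor concrete, here is a minimal runnable sketch of the four components interacting on a toy two-action task. The task and all class and method names are illustrative assumptions, not something given in the notes.

```python
# A minimal sketch of the learning-agent structure above, on a toy
# two-action task. All names here are illustrative, not from the notes.
import random

class PerformanceElement:          # maps situations to actions
    def __init__(self):
        self.value = {"a": 0.0, "b": 0.0}   # learned action-value estimates
    def choose_action(self):
        return max(self.value, key=self.value.get)

class Critic:                      # judges behavior against a fixed standard
    def evaluate(self, action):    # (here: the noisy reward of the action)
        return random.gauss(1.0 if action == "a" else 0.5, 0.1)

class LearningElement:             # modifies the performance element
    def update(self, perf, action, feedback, rate=0.1):
        perf.value[action] += rate * (feedback - perf.value[action])

class ProblemGenerator:            # suggests informative, exploratory actions
    def propose(self, perf):
        return random.choice(list(perf.value)) if random.random() < 0.1 else None

perf, critic, learner, gen = PerformanceElement(), Critic(), LearningElement(), ProblemGenerator()
for _ in range(1000):
    action = gen.propose(perf) or perf.choose_action()
    learner.update(perf, action, critic.evaluate(action))
print(perf.value)   # action "a" should end up with the higher estimate
```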
Designing the learning element
- Which components of the performance element are to be improved?
- What representation is used for those components?
- What feedback is available?
- What prior information is available?

Components of the performance element that can be learned
- A direct mapping from conditions on the current state to actions (e.g., the weighting of parameters of a multi-attribute decision process).
- A means to infer relevant properties of the world from the percept sequence.
- Information about the way the world evolves, to allow prediction of future events.
- Information about the results of possible actions the agent can take.
- Utility information indicating the desirability of world states.
- Action-value information indicating the desirability of particular actions in particular states.
- Goals that describe classes of states whose achievement maximizes the agent's utility.

Elements of a learning system
- The type of training instances: the beginning data for the learning task.
- The language used to represent knowledge: specific training instances must be translated into this representation language. (In some programs the training instances are in the same language as the internal knowledge base, and this step is unnecessary.)
- A set of operations on representations: typical operations generalize or specialize existing knowledge, combine units of knowledge, or otherwise modify the program's existing knowledge or the representation of the training instances.
- The concept space: the operations define a space of possible knowledge structures, which is searched to find the appropriate characterization of the training instances and similar problems.
- The learning algorithms and heuristics employed to search the concept space: the order of the search, and the use of heuristics to guide the search.

Learning the representation of a function
All learning can be seen as learning the representation of a function.
- Choice of representation: a trade-off between expressiveness and efficiency. Is what you want representable? Is what you want learnable (number of examples, cost of search)?
- Choice of training data: does it correctly reflect past experiences? Does it correctly predict future experiences?
- How do we judge the goodness of the learned function?

Possible representations
- numerical parameters
- decision trees
- formal grammars
- production rules
- logical theories
- graphs and networks
- frames and schemas
- computer programs (procedural encoding)

Other issues
- Importance of prior knowledge: prior knowledge can significantly speed up the learning process (EBL, explanation-based learning).
- Learning as a search process: finding the best function.
- Incremental process: online vs. offline learning.

Inductive learning
- Let an example be a pair (x, f(x)). Given a collection of examples of f, return a function h that approximates f. The function h is called a hypothesis.
- Feedback is the relation between x and h(x).
- f(x) may be only approximately correct: noise, missing components.

Bias
Many hypotheses h are approximately consistent with the training set (curve fitting). A preference for one hypothesis over another, beyond mere consistency, is called bias.

Ockham's razor
- Simple hypotheses that are consistent with the data are preferred.
- We want to maximize some metric of consistency and simplicity in choosing the most appropriate function.

Decision trees
- A restricted representation of logic sentences (Boolean functions).
- Running example: learn, from the conditions of the situation, whether to wait at a restaurant for a table.
- A classification decision tree takes as input a situation described by a set of attributes and returns a yes/no decision: it is a tree of property-value tests whose terminals (leaves) are decisions.
- The attributes of the restaurant domain: Alternate, Bar, Fri/Sat, Hungry, Raining, Reservation (Boolean); Price; Type (French, Italian, Thai, Burger); Patrons (None, Some, Full); WaitEstimate (0-10, 10-30, 30-60, >60).
- A decision tree can express any Boolean function of the input attributes. How do we choose between equally consistent trees?

Constructing a decision tree
Construct a root node that includes all the examples; then, for each node (a runnable sketch follows this list):
1. If there are both positive and negative examples, choose the best attribute to split them.
2. If all remaining examples are positive (or all negative), answer yes (or no).
3. If there are no examples for a case (nothing observed), choose a default based on the majority classification at the parent.
4. If there are no attributes left but there are both positive and negative examples, the selected features are not sufficient for classification, or there is error in the examples; use a majority vote.
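Here is that procedure as a short Python sketch. The dict-based example format, the WillWait target name, and the helper functions are assumptions for illustration; "best attribute" is taken to be the one that leaves the least expected information, which is equivalent to the largest information gain defined below.

```python
# A sketch of the recursive tree-building procedure above. Examples are
# dicts of attribute values plus a label under "WillWait"; the data
# format and helper names are illustrative assumptions.
import math
from collections import Counter

def entropy(examples, target="WillWait"):
    counts = Counter(e[target] for e in examples)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def choose_attribute(attrs, examples):
    # Best attribute = the split that leaves the least expected entropy,
    # i.e. the one with the greatest information gain.
    def remainder(a):
        return sum(
            len(sub) / len(examples) * entropy(sub)
            for v in {e[a] for e in examples}
            for sub in [[e for e in examples if e[a] == v]]
        )
    return min(attrs, key=remainder)

def majority(examples, target="WillWait"):
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def dtl(examples, attrs, default, target="WillWait"):
    if not examples:                    # case 3: no examples -> parent majority
        return default
    labels = {e[target] for e in examples}
    if len(labels) == 1:                # case 2: all positive or all negative
        return labels.pop()
    if not attrs:                       # case 4: attributes exhausted -> majority vote
        return majority(examples)
    best = choose_attribute(attrs, examples)   # case 1: split on best attribute
    tree = {best: {}}
    for v in {e[best] for e in examples}:
        sub = [e for e in examples if e[best] == v]
        tree[best][v] = dtl(sub, [a for a in attrs if a != best], majority(examples))
    return tree

data = [
    {"Patrons": "Some", "Hungry": "Yes", "WillWait": "Yes"},
    {"Patrons": "None", "Hungry": "No",  "WillWait": "No"},
    {"Patrons": "Full", "Hungry": "Yes", "WillWait": "Yes"},
    {"Patrons": "Full", "Hungry": "No",  "WillWait": "No"},
]
print(dtl(data, ["Patrons", "Hungry"], default="No"))
# -> {'Hungry': {'Yes': 'Yes', 'No': 'No'}} (key order may vary)
```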
The training set
(The slides include a table of 12 training examples, X1-X12, for the restaurant domain; the table itself does not survive in these notes. Six examples are positive and six are negative.)

Choosing a good attribute
A perfect attribute divides the examples into sets that are all positive or all negative. Splitting the 12 examples:
- Patrons = None: X7, X11 (all negative); Patrons = Some: X1, X3, X6, X8 (all positive); Patrons = Full: X4, X12 positive and X2, X5, X9, X10 negative.
- Type = French: X1 vs. X5; Type = Italian: X6 vs. X10; Type = Thai: X4, X8 vs. X2, X11; Type = Burger: X3, X12 vs. X7, X9 — every value is an even positive/negative mix.
Patrons is not perfect but is clearly more informative than Type; after splitting on Patrons, only the Full branch needs a further test (e.g., on Hungry).

Greedy construction
- The basic idea is to build the tree greedily: decisions, once made, are not revised (no search).
- Choose the "most significant attribute" to be the root; then split the dataset and recurse.
- Significance is defined using information theory, based on information gain (entropy).
- Finding the smallest decision tree is an intractable problem.

Expressiveness and limitations
- Any Boolean function can be written as a decision tree, e.g.
  ∀r WillWait(r) ⇔ Patrons(r, Full) ∧ WaitEstimate(r, 10-30) ∧ Hungry(r, No).
- A row of the truth table corresponds to a path in the decision tree: 2^n rows given n literals, hence 2^(2^n) distinct functions.
- A decision tree cannot represent tests that refer to two or more different objects, e.g.
  ∃r2 Nearby(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p).
  One could add a new Boolean attribute such as CheaperRestaurantNearby, but it is intractable to add all such attributes.
- Some truth tables cannot be represented compactly:
  - the parity function (returns 1 if and only if an even number of inputs are 1) needs an exponentially large decision tree;
  - likewise the majority function (returns 1 if more than half of its inputs are 1).

Information gain
The expected amount of information provided by an attribute is similar to the concept of the value of perfect information. The information content of a set of examples whose possible answers v_i occur with probabilities P(v_i) is

  I(P(v_1), ..., P(v_n)) = Σ_i −P(v_i) log2 P(v_i)

With p positive and n negative examples, this becomes

  I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2 (p/(p+n)) − (n/(p+n)) log2 (n/(p+n))

Example: with 12 cases, 6 positive and 6 negative, I(1/2, 1/2) = 1 bit.

An attribute A whose values split the examples into subsets with p_i positives and n_i negatives leaves, on average,

  Remainder(A) = Σ_i ((p_i + n_i)/(p + n)) · I(p_i/(p_i + n_i), n_i/(p_i + n_i))

bits still needed after testing A, so the gain is

  Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A)

For the restaurant examples:

  Gain(Patrons) = 1 − [ (2/12)·I(0,1) + (4/12)·I(1,0) + (6/12)·I(2/6, 4/6) ] ≈ 0.541 bits
  Gain(Type) = 1 − [ (2/12)·I(1/2,1/2) + (2/12)·I(1/2,1/2) + (4/12)·I(2/4,2/4) + (4/12)·I(2/4,2/4) ] = 0 bits

The partition on Patrons leaves the least impurity, so Patrons is chosen first.

(The slides work a second small example, comparing two attributes over a table of cases and concluding that "eyes" is the better attribute; that table does not survive extraction.)
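A few lines of Python to check the gain computations above; the helper names I and gain are mine, not from the notes.

```python
# Checking the Gain computations above; the helper names are illustrative.
import math

def I(*probs):
    # Information content in bits; the 0*log2(0) terms are taken as 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gain(p, n, branches):
    # branches: one (p_i, n_i) pair per value of the attribute
    remainder = sum(
        (pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
        for pi, ni in branches
    )
    return I(p / (p + n), n / (p + n)) - remainder

print(gain(6, 6, [(0, 2), (4, 0), (2, 4)]))             # Patrons: ~0.541 bits
print(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))     # Type: 0.0 bits
```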
Assessing the learned hypothesis
- How do we measure how close our hypothesis h is to f? Try h on a test set.
- Learning curve: measure the percentage of correct predictions on the test set as a function of the size of the training set.
(Figure: % correct on the test set vs. training set size, 0-100, for the decision-tree algorithm on 100 randomly generated examples in the restaurant domain; the graph summarizes 20 trials.)
- How correct is this? Can we even judge this idea? Not all attributes are used.
- How does the number of examples seen relate to the likelihood of correctness?

Overfitting
- Overfitting is finding meaningless regularities in the data: with enough attributes, you are likely to find one that captures some of the noise in your data.
- One solution is to prune the tree: collapse subtrees that provide only minor improvements, using information gain as the criterion.

Handling examples with missing data
- Add a new attribute value, "unknown"; or
- instantiate the example with all possible values of the missing attribute, but assign each instance a weight based on the likelihood of the missing value being that particular value, given the distribution of examples in the parent node; then modify the decision-tree algorithm to take the weighting into account.

Handling multi-valued and continuous attributes
- Multi-valued (large) attributes: the information-gain measure gives an inappropriate indication of an attribute's usefulness because of the likelihood of singleton values (e.g., an attribute like SSN puts every example in its own branch). Use the gain ratio instead: the gain divided by the intrinsic information content of the attribute.
- Continuous-valued attributes: discretize; preprocess to find out which ranges give the most useful information for classification purposes.
- Incremental construction.

Next lecture
- The version space algorithm
- Neural networks