Class Note for CMPSCI 683 at UMass(13)
Class Note for CMPSCI 683 at UMass(13)
Popular in Course
Popular in Department
This 12 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 12 views.
Reviews for Class Note for CMPSCI 683 at UMass(13)
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 02/06/15
Victor Lesser CMPSCI 683 Fall 2004 39 Example 1 You consider buying a program to manage your nances that costs 0100 There is a prior probability of 07 that the program is suitable in which case it will have a positive effect on your work worth 0500 There is a probability of 03 that the program is not suitable in which case it will have no effect 39 What is the value of knowing whether the program is suitable before buying it vimyuamim Z Expected utility given information o75oo1ooo3o Expected utility not given information 07rsoo1ooo3o1oo Value of information o75oo1ooo3oo7rsoo1ooo3o 100 28025030 vumamrm The general case We assume that exact evidence can be obtained about the value of some random variable El The agent39s current knowledge is E The value of the current best action a is de ned by EUdE maxA Z PResultyADoAE UResutA vmxemim l Vl th the information the value of the new Decision Trees best action will be EUuEEE maxA zPResuItA DoAEE UResultA Decision Networks But E is a random variable whose value is currently unknown so we must average over all possible values e1k using our current belief VPIE E Xy yJEEUwepl E E elk EUu E Markov Decision Processes MDPs vumcsiwzm 5 mnmsixmim a A decision tree is an explicit representation 7 of all the possible scenarios from a given mm mum state minurmangesPill Each path corresponds to decisions made by the agent actions taken possible observations state changes and a nal outcome node minurmangesPi 210K malur manges PE3 400K EUUIIM I 3 tI lEEIK 0 I7 65M 1W1 mill EU raise BMWHKlIE2iml hl ilaw ktF lZJK Similar to a game played against nature EuIMEl7 ZlEMl ilatml im vumcsiwzm 7 mnmsixmim x There are two candidate cars C1 and Ch each can be of good quality or bad quality 7 There are two possible tester T1 oh C1 costs 50 and T2 The chances that the cars are of good quality are 070 for C1 and 080 for CZ oh Clcosts 20 c1 costs 1500 500 below market value but if ltls of 39 T931 TillIquot con rm 990d quamy Wllh bad gnaw repa cost is 700 probability and Will con rm bad quality 599 m mm 05 With probability 065 CZ costs 1150 250 below market value but it it is of bad quality repalrcostls l50 Test T2 will con rm good quality with 7 250 gain or 100 gain probability 075 and will con rm bad quality Buyer must buy one of the cars and can perform at most with probability 070 one test What other information Vimmm v imimmm l DELisinri DnTest r iin lailstziiy c2 elsebiin T2 Vimmm ii it l2 Buyer knows car c1 is good quality 1 Traverse tne tree in a deptnfirst manner 70 Pc1good 7 tap Assign avalueto each lealnode based on the 100333 the aver 2 utility at each chance Buyer kmw car CI i5 9 quality node a 83 Pc1good 8 Ci Calculatethe maximum utility at each decision modemquot markingthe maximum branch Test 1 check qual39ty ofcar 0 Pt1 passlc1 good 8 2 Trace backtne marked brancnesi fromtne roct PUFPaSSICFbad 3935 node down to find tne desired optimai conditionai piani Test oftz check quality of car 01 Findingtnevaiue of perfect or imperfect PitfPBSSICfQOOd 75 information in a decision tree Ptpasslcbad 3 vumcsiwzm u mnmsixmlm u Case3 Case1 7 Pn2gnnanzram Pmiunmmzfaiipmunnd7 7 Pt2daii02gnnd Pu2gnnaPi2raii 7 Uiiiiiy20007is00720 400 7 25XB2Pt2iaii 7 Nnrrnaiize 234 1434 warmed Case2 7 50 7 mattedman pcitza0 7 0000140071150720 230 170cigm70 3 Case4 7 Utiiity 2000i 5007700720 7220 Expected Utility of Chance Node of 1amp2 7 7 x480 3x7220 270 7 PE2badQjaii 7 P0 aii02bad P02badPt2iaii 7 7x2 i4P0daii 7 M 7 0000140071150720715000 Expected Utility of Chance Node of 3amp4 2m xn 7 59 SOMiXBO 1685 vumcsiwzm is mnmsixmlm m What is the decision if Decide to do test 12 It comes out false Do you buy c1 or c2 Ecltest t2iail Expected Utility of Chance Node of 1amp2 270 A model of sequential decisionmaking developed in operations research in the 1950 s Allows reasoning about actions with uncertain outcomes MDPs have been adopted bythe Al community as a framework for Decisiontheoretic planning engeanetal lBB Reinforcement learning e g Bano et al lean Ec2test t2iail Expected Utility of Chance Node of 3amp4 168 5 vumcsinrzm ii minimum ix A policy is a Choice nlwriat action in Choose at each state 5 nite set of domain states An Optimal Policy isapolicy where you arealways choosing the action that maximizestne return Y utility oltne current state A nite set of actions gt 1 P la state transition function T I f 1 ra reward function can get reward at any point 1 50 initial state Actions succeed With probabilityO 8 and move at right angles The Markov assum pnm39 With probabilityO 1 remain in the same postion when Hall XMSM sa PM Xma there is a wall Actions incur a small cost 0 04 vumcsinrzm w minimum in Decision networks or in uence diagrams are an extension of belief networks that allow for reasoning about actions and utility 0000 The network represents information about the t agent s current state its possible actions the SOlU loll is a SlmPIe pain solution is m solution is a ossible outcome ofthose actions and their detaIHIHISIIE acycIIc glaph cycIIc glaph p Y Nondetelmlnlstlc Allows lol lnllnlte quotmy Based on 35110quot sequence 01 OUIEOIHES EEIIOH oi 1 Botalrtl dl quot Decision trees are not convenient for representing domain knowledge Ram Requires tremendous amount of storage t Multiple deoisions nodes expands liee i Dooiioalion oi knowledge aiono diiieienl oalns WeatherReport O 7 Joint Pmbablllty Dislnoolion VS Bayes Net Generate decision tree on the y from more b l Utlhty economical forms of knowledge Um re a Depthfirst expansion oitree for computing optimal deemquot Parameters PRaini PWeatnerReporthaini PWeatnerReportleaini UtilityRainiUmbrella vumcsiwzm 2 ooiinosmn oi Chance nodes ovals have CPTs conditional probability tables that depend on the states of the parent nodes chance or decision Decision nodes squares represent options available to the decision maker Utility nodes Diamonds orvalue nodes represent the overall utility based on the states of the parent nodes 1 The directed graph has no cycles 2 The utility nodes have no children 3 There is a directed path that contains all of the decision nodes 4 A CPT is attached to each chance node specifying PAparentsA 5 A real valued function over parentsU is attached to each utility node 25 27 Causal knowledge about how events influence each other in the domain Knowledge about what action sequences are feasible in any given set of circumstances Lays out possible temporal ordering of decisions Normative knowledge about how desirable the consequences are v mm m za Links into de sion nodes are called information linksquot and they Indicate that the state ofthe parent is known prior to the decision The directed path that goes through all the decision nodes de nes a temporal sequence of decisions It also partitions the chance var39ables into sets I0 is the vars observed before any decision is made I1 is the vars observed after the rst and before the second decision etc Iquot is the set of unobserved vars The noforgettingquot assumption is that the decision maker remembers all past observations and decisions Non MH RW Assumption v mm m 2a 1 Set the eVidehce variables forthe current state 2 For each possible value of the decision node a Set the decision node to that value b Calculate the posterior probabilities for the parent hodes ofthe utility hode c Calculate the expected utility forthe action 3 Return the action With the highest utility Similar to Cutset Conditioning of a Multiply Connected Belief Network mumimin1nmasonicquotinraniisinwniganonmgn constructinnnighl vmmm 22 mmicsinmm an Two months before the harvest of a wheat eld the farmer observes the state 0 of the crop and he observes whether it has been attacked by mildew M If there is an attack he will decide on a treatment 6 with fungicides There are five variables 0 fair f not too bad n average a good g M no no little I moderate m severe s H state on plus M rotten rbad b poor p 00 observation of Q imperfect information on Q 0M observation of M imperfect information on M QEQQ vmmm 3i mmicsinmm 2 39 Asingle decision nodeD may have links to some chance nod Aset oi utility quotnations U U over domains X X Goal nd the decision d that maximizes EUDI1 v Howtosowesuch piohlems using astandani aayesian netwoik package Need a more complex evaluation technique since generating a policy mmizsixmim at fTnotest a Buyiiauyzi fTdotestt1 a iiiipass Buyi eise ii ii daii Buyz a iiiipass BuyZ eise ii ii daii Buyi a Buyi 7 mm fTdotestt2 7 Same as above A POLICY IS A SEQUENTIAL SET OF DECISIONS EACH POTENTIALLY BASED ON THE OUTCOME OF PREVIOUS DECISIONS Basic idea Ross Snacntei Peiioiin a sequence oitiansioimations in the diagram that pieseive iiie nptimai pniicy and itsvaiue uniii oniy iiie UTILITV node remains r Snilato ideas Mtiandonuaion into polytiee Four basic vaiueutiiityEreserVing reductions Barren node removai Cnance node removai marginaiization Decision node removai maximization 39 Arc reversai Bayes ruie mnmsixmim ad nodes to 1 bu Let X represent a subset of nodes of interest in an influence diagram Let Xk represent a subset of evidence nodes We are interested in mpg 9 A node is barren if it has no successors and it is not a member anI 1er The elimination of barren nndes does not affect the value of Fwy 9 mmwm ax Nude i diremiy imked in mm nude v nodes connectedto lccv co ncltvgt imam V WWW and nail connected tnottov Assume null o 10 How mmwm W i Hamil Nude Removal A as Barman mum Imch mng l c a Rmnnval into Value Nude hyExpectntion x l g mw Given an in uence diagram containing an arc from i to j but no other directed path from i to j it is possible to to transform the diagram to one with an arc from to i lfj is deterministic then it becomes probabilistic vmlvalxmpm T wmx Axum m m vi mums r t u lunmsi lllt u l Arc Reversal i XPaAPa V VFAlvl39abl XPaBPlt1 AJ CNc CMCQ FQWONHI u PlA l BX 1Zl Pill lAYZl l tl l XYVI iB lX ZJ l arents of B u OWN arents PaAPaB parents of Awho are notp 97 mummy u Deci ion Example 1 Rmm XY Am 1 Remuval olx by Expulaliun in v 1 Removal oID by maximizmiuu hum V J g1 in you Lhe uplimal pulley furD gmquot y mm P x V gt 7 m V j 3amp9 vw mum LHJult m mmnm
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'