# Artificial Intelligence Prog CS 662

These 250-page class notes were uploaded by Michele Herzog on Thursday, October 29, 2015. The notes belong to CS 662 at the University of San Francisco, taught by Christopher Brooks in Fall 2015. For similar materials see /class/231232/cs-662-university-of-san-francisco in Computer Science at the University of San Francisco.


## Artificial Intelligence Programming: Utility and Decision Networks

Chris Brooks, Department of Computer Science, University of San Francisco

### Making Decisions

- Knowing the probability of events or outcomes is only half the battle.
- Agents are really interested in maximizing performance.
- Often performance can be captured by utility: a utility indicates the relative value of a state.

### Expected Utility

- An action might lead to one of five states s1, s2, s3, s4, s5.
- Our agent has a utility for each of these states, and a probability that each state will be reached.
- The expected utility of an action is EU(a) = Σ_{s∈S} P(s) U(s).
- The principle of maximum expected utility (MEU) says that an agent should choose the action that maximizes expected utility.

### Example

- Let's say there are two levers.
- Lever 1 costs $1 to pull. With p = 0.4 you win $2; with p = 0.6 you win nothing.
- Lever 2 costs $2 to pull. With p = 0.1 you win $10; with p = 0.9 you lose $1 on top of the charge to pull.
- Should you (a) pull lever 1, (b) pull lever 2, or (c) pull neither?

### Example

- EU(lever 1) = 0.4(1) + 0.6(−1) = −0.2
- EU(lever 2) = 0.1(8) + 0.9(−3) = −1.9
- EU(neither) = 0
- Pulling neither lever gives the maximum EU.
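The lever arithmetic above is easy to check mechanically. A minimal sketch, not from the slides: the `expected_utility` helper and the encoding of each action as a list of (probability, net payoff) pairs are my own.

```python
def expected_utility(outcomes):
    """EU of an action: sum of P(s) * U(s) over its possible outcomes."""
    return sum(p * u for p, u in outcomes)

# Net payoffs for the lever example, with pull charges already subtracted
# (this encoding is mine, not from the slides):
actions = {
    'lever 1': [(0.4, 1), (0.6, -1)],   # $1 charge; win $2 with p = 0.4
    'lever 2': [(0.1, 8), (0.9, -3)],   # $2 charge; win $10 with p = 0.1
    'neither': [(1.0, 0)],
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
```

Following MEU, the agent simply picks the key with the largest expected utility.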
### Rational Preferences

- In order for MEU to make sense, we need to specify some (hopefully reasonable) constraints on agent preferences.
- Orderability: we must be able to say that A is preferred to B, that B is preferred to A, or that they are equally preferable. We cannot have the case where A and B are incomparable.
- Transitivity: if A ≺ B and B ≺ C, then A ≺ C.
- Continuity: if A ≺ B ≺ C, then there is a lottery in which the agent is indifferent between getting B for sure and having a probability p of getting A and a 1 − p chance of getting C.
- Substitutability: if an agent is indifferent between two outcomes, those outcomes can be substituted for one another in lotteries.
- Monotonicity: if two lotteries A and B have the same outcomes and I prefer A to B, I should still prefer A if the probability of winning in A increases.
- Decomposability: compound lotteries can be decomposed into simpler lotteries.
- These preferences are for the most part quite reasonable, and allow an agent to avoid making foolish mistakes.

### Utility, Money, and Risk

- Utility comes from economics; money is often used as a substitute for utility.
- Preferences for money behave oddly when dealing with small or large amounts. For example, you will often take more chances with small amounts and be very conservative with very large amounts.
- This is called your risk profile.

### Example: Bernoulli Paradox

- You have the chance to play the following game: flip a coin repeatedly until it comes up heads.
- If it comes up heads on the nth trial, you win 2^n dollars.
- How much would you pay to play this game?

### Example: Bernoulli Paradox

- EU = P(n=1)U(n=1) + P(n=2)U(n=2) + ... = (1/2)(2) + (1/4)(4) + (1/8)(8) + ...
- EU is infinite, so we should be willing to pay any price to play.
- This assumes utility for money is linear.

### Risk

- Agents with a concave utility curve are called risk-averse: they prefer the sure thing to a higher-EU outcome with a chance of catastrophe.
- Agents with a convex utility curve are called risk-seeking.
- An agent may be risk-seeking for small amounts and risk-averse for large amounts.
- If the utility curve is linear, the agent is risk-neutral.
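The divergence in the Bernoulli (St. Petersburg) paradox can be seen numerically: each term P(n)U(n) = (1/2^n)(2^n) contributes exactly 1, so the partial sums grow without bound. A small sketch (the function name is mine):

```python
def st_petersburg_eu(max_n):
    """Partial expected value of the game: heads first appears on trial n
    with probability 2**-n and pays 2**n dollars, so each term is 1."""
    return sum((0.5 ** n) * (2 ** n) for n in range(1, max_n + 1))
```

The partial sum after `max_n` terms is exactly `max_n`, so the expected value diverges as more terms are included, which is why a linear utility for money breaks down here.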
### Multiattribute Utility Functions

- In many cases your agent will need to consider decisions with more than one variable: one path might be shorter but have a higher chance of accidents.
- If an agent must consider a vector X = ⟨x1, x2, ..., xn⟩ of attributes, we call this a multiattribute utility function.
- Sometimes we can compare multiattribute outcomes directly.

### Dominance

- If one outcome is superior to the other on all attributes, we say that it strictly dominates the other.
- What about nondeterministic cases? Example: driving to work takes between 20 and 50 minutes; taking BART takes 30 to 60 minutes. We say that driving stochastically dominates BART.
- If A1 stochastically dominates A2, then EU(A1) > EU(A2).
- This is useful in cases where we cannot give precise costs, but we know how costs change as a parameter varies.

### Dominance

(Figure: on the left, A is strictly dominated by B but not by C or D; on the right, A is strictly dominated by B but not by C. Assuming a uniform or Gaussian distribution, A is stochastically dominated by C.)

### Decision Networks

- Now that we have a mathematical foundation for decision making, we can add decisions into a Bayesian network.
- The resulting structure is called a decision network, or an influence diagram.
- Decision networks have three types of nodes:
  - Chance nodes: random variables, as in Bayesian networks.
  - Decision nodes: choices an agent can make.
  - Utility nodes: quantify the result of an agent's decision.

### Decision Networks

(Figure: a decision network for the airport-siting problem, with an Airport Site decision node.)

### Evaluating Decision Networks

- Evaluating a decision network is very much like evaluating a Bayesian network. Once a decision node's value is set, it acts just like a chance node in a Bayesian network.
- Algorithm:
  - Set evidence nodes as needed/available.
  - For each possible decision value, set the decision node to that value.
  - Calculate the posterior probabilities for the parents of the utility node, given that decision.
  - Select the action that generates the highest utility.

### Value of Information

- Decision networks assume that all information is available to the agent before it considers an action.
- More often, an agent will need to know what questions to ask. This is critically important in learning.
- Agents should ask questions that give them useful information, where "useful" means increasing expected utility.

### Example

- An agent can recommend to a prospecting company that they buy one of n plots of land.
- One of the plots has oil worth C dollars; the others are empty. Each block costs C/n dollars.
- Initially, the agent is indifferent between buying and not buying. (Why is that?)

### Example

- Suppose the agent can perform a survey on a block that will indicate whether that block contains oil. How much should the agent pay to perform that survey?
- P(oil in block) = 1/n; P(no oil in block) = (n − 1)/n.
- If oil is found, buy that block for C/n. Profit: C − C/n = (n − 1)C/n.
- If oil is not found, buy a different block. The probability of picking the oil block is now 1/(n − 1), so the expected profit is C/(n − 1) − C/n = C/(n(n − 1)).
- So the expected profit given the information is (1/n) · (n − 1)C/n + ((n − 1)/n) · C/(n(n − 1)) = C/n.
- So the company is willing to pay up to C/n, the value of one plot, for the test. This is the value of information.

### Value of Perfect Information

- Let's formalize this. We find the best action α in general with EU(α) = max_a Σ_{i∈outcomes} U(i) P(i | a).
- Let's say we acquire some new information E. Then we find the best action with EU(α_E | E) = max_a Σ_{i∈outcomes} U(i) P(i | a, E).
- The value of E is the difference between these two.
- However, before we do the test we don't know what E will be. We must average over all possible values of E.
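The oil-survey computation generalizes to any n and C. A sketch using exact rational arithmetic (the function name and encoding are mine) confirms the C/n result:

```python
from fractions import Fraction

def survey_value(n, C):
    """Expected value of surveying one of n blocks when one holds oil worth C."""
    C = Fraction(C)
    price = C / n                          # each block costs C/n
    eu_without = C / n - price             # no info: any block breaks even
    # oil found (p = 1/n): buy that block, profit C - C/n
    found = Fraction(1, n) * (C - price)
    # no oil (p = (n-1)/n): buy another block, where P(oil) is now 1/(n-1)
    not_found = Fraction(n - 1, n) * (C / (n - 1) - price)
    return found + not_found - eu_without
```

For every n ≥ 2, `survey_value(n, C)` simplifies to C/n, matching the slide's conclusion that the survey is worth the price of one block.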
### Value of Perfect Information

- VPI(E) = [Σ_{j∈values(E)} P(E = e_j) EU(α_{e_j} | e_j)] − EU(α)
- In words: consider the probability of each observation, along with the usefulness of that observation, to compute the expected information gain from this test.
- In general, information will be valuable to the extent that it changes the agent's decision.

### Value of Information in Learning

- Value of information can be very useful in learning problems; you saw an example of this with decision trees.
- More generally, an agent may have a task to perform, and it is able to perform it with some efficiency. Engaging in learning might improve its performance by an expected amount a. If the performance cost of doing this learning is less than a, the agent should choose to learn.
- More generally still, it might be able to choose among things to learn; this provides a quantitative way of making that decision.

## Artificial Intelligence Programming: Inference in FOL

Chris Brooks, Department of Computer Science, University of San Francisco

### Propositional Logic Review

- Recall that propositional logic concerns facts about the world that are true or false.
- These facts can be used to construct sentences using the logical connectives ∧, ∨, ¬, ⇒, ⇔. Example: P₁,₃ ∧ P₂,₃.
- We can convert these sentences to CNF and use resolution as an inference mechanism. Resolution is sound and complete.

### First-Order Logic Review

- First-order logic concerns objects and the relations between them, which form atomic sentences.
- Atomic sentences can be joined with logical connectives, and we can introduce variables and quantifiers to refer to indefinite objects.
- Example: Siblings(Bart, Lisa); ∀x likes(x, Marge) ⇒ likes(x, Homer).
### Inference in FOL

- Can we do the same sorts of inference with first-order logic that we do with propositional logic?
- Yes, with some extra details: we need to keep track of variable bindings (substitutions).
- As with propositional logic, we'll need to convert our knowledge base into a canonical form.
- In the simplest approach, we can convert our FOL sentences to propositional sentences. We do this by removing quantification. (We leave in predicates for readability, but remove all variables.)

### Removing Quantifiers

- Universal Instantiation: we can always substitute a ground term for a universally quantified variable. This will typically be a term that occurs in some other sentence of interest.
- ∀x LivesIn(x, Springfield) ⇒ knows(x, Homer)
- Since this is true for all x, we can substitute {x/Bart}: LivesIn(Bart, Springfield) ⇒ knows(Bart, Homer).
- The classical approach: make this substitution for every object in our domain.

### Removing Quantifiers

- Existential Instantiation: we can give a name to the object that satisfies a sentence.
- ∃x LivesIn(x, Springfield) ∧ knows(x, Homer)
- We know this must hold for at least one object. Let's call that object k: LivesIn(k, Springfield) ∧ knows(k, Homer).
- k is called a Skolem constant. k must be unused; it gives us a way of referring to an existential object.
- Once we've removed quantifiers, we can use propositional inference rules.

### Unification

- The key to unification is that we only want to make substitutions for those sentences that help us prove things.
- For example, if we know ∀x LivesIn(x, Springfield) ∧ WorksAt(x, PowerPlant) ⇒ knows(x, Homer), along with LivesIn(MrSmithers, Springfield) and WorksAt(MrSmithers, PowerPlant), we should be able to conclude knows(MrSmithers, Homer) directly.
- Substitution: {x/MrSmithers}.

### Unification

- Our inference process now becomes one of finding substitutions that will allow us to derive new sentences.
- The Unify algorithm takes two sentences and returns a set of substitutions that unifies them:
  - Unify(WorksAt(x, PowerPlant), WorksAt(Homer, PowerPlant)) produces {x/Homer}.
  - Unify(WorksAt(x, PowerPlant), WorksAt(Homer, y)) produces {x/Homer, y/PowerPlant}.
  - Unify(WorksAt(x, PowerPlant), WorksAt(FatherOf(Bart), y)) produces {x/FatherOf(Bart), y/PowerPlant}.
  - Unify(WorksAt(x, PowerPlant), WorksAt(Homer, x)) fails: x can't bind to both Homer and PowerPlant.
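The Unify examples above can be reproduced with a small recursive unifier. This is a sketch, not the course's implementation: by my own convention, variables are lowercase strings, constants are capitalized strings, compound terms are tuples, and the occurs check is omitted.

```python
def is_var(t):
    # variables are lowercase strings; constants are capitalized strings
    return isinstance(t, str) and t[0].islower()

def unify(x, y, s=None):
    """Return a substitution (dict) unifying x and y, or False on failure."""
    if s is None:
        s = {}
    if s is False:
        return False
    if x == y:
        return s
    if is_var(x):
        return unify_var(x, y, s)
    if is_var(y):
        return unify_var(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):          # unify argument lists pairwise
            s = unify(xi, yi, s)
            if s is False:
                return False
        return s
    return False

def unify_var(v, t, s):
    if v in s:                            # variable already bound: unify its binding
        return unify(s[v], t, s)
    if is_var(t) and t in s:
        return unify(v, s[t], s)
    return {**s, v: t}                    # occurs check omitted in this sketch
```

Running the slide's four examples reproduces the substitutions shown, with the last call failing because x would need two different bindings.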
### Propositionalization

- We can replace every existential sentence with a Skolemized version.
- For universally quantified sentences, substitute in every possible substitution.
- This will, in theory, allow us to use propositional inference rules.
- Problem: very inefficient. This was the state of the art until about 1960.
- Unification of variables is much more efficient.

### Generalized Modus Ponens

- This reasoning is a generalized form of Modus Ponens.
- Basic idea: let's say we have an implication of the form P1 ∧ P2 ∧ ... ∧ Pn ⇒ Q, sentences P1′, P2′, ..., Pn′, and a set of substitutions θ such that Subst(θ, Pi) = Subst(θ, Pi′) for each i.
- We can then apply the substitution and apply Modus Ponens to conclude Subst(θ, Q).
- This technique of using substitutions to pair up sentences for inference is called unification.

### Unification

- The last sentence in the earlier example fails only because we happened to use x in both sentences.
- We can replace x with a unique variable, say x1, in one sentence.
- This is called standardizing apart.

### Unification

- What if there is more than one substitution that can make two sentences look the same?
- Sibling(Bart, x) and Sibling(y, z) can produce {y/Bart, x/z} or {x/Bart, y/Bart, z/Bart}.
- The first unification is more general than the second: it makes fewer commitments.
- We want to find the most general unifier (MGU) when performing inference. This is the heuristic in our search.

### Forward Chaining

- Basic idea: begin with facts and rules (implications). Continually apply Modus Ponens until no new facts can be derived.
- Requires definite clauses: implications with positive clauses in the antecedent, and positive facts.

### Example

- The law says that it is a crime for an American to sell weapons to hostile nations. The country of Nono, an enemy of America, has some missiles. All of its missiles were sold to it by Colonel West, who is an American.
- Prove that West is a criminal.
- It is a crime for an American to sell weapons to hostile nations:
  1. American(x) ∧ Weapon(y) ∧ Hostile(z) ∧ Sells(x, y, z) ⇒ Criminal(x)
- Nono has missiles:
  2. ∃x Owns(Nono, x) ∧ Missile(x)
- Use Existential Instantiation to substitute M1 for x:
  3. Owns(Nono, M1)
  4. Missile(M1)

### Unification Algorithm

- To unify two sentences, proceed recursively.
- If either sentence is a single variable, find a unification that binds the variable to a term; else call unify on the first term, followed by the rest of each sentence.
- Example: Sibling(x, Bart) ∧ PlaysSax(x) and Sibling(Lisa, y). We can unify Sibling(x, Bart) and Sibling(Lisa, y) with {x/Lisa, y/Bart}.
- The process of finding a complete set of substitutions is a search process: the state is the list of substitutions, and the successor function generates potential unifications and new substitution lists.

### Forward Chaining Algorithm

```
while True:
    for rule in rules:
        if rule's antecedents unify with facts:
            add the (substituted) consequent to facts
    if no rules fired:
        return
```

### Example

- Prove that West is a criminal.
- All of Nono's missiles were sold to it by Colonel West:
  5. Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)
- Missiles are weapons:
  6. Missile(x) ⇒ Weapon(x)
- An enemy of America is a hostile nation:
  7. Enemy(x, America) ⇒ Hostile(x)
- West is American:
  8. American(West)
- Nono is an enemy of America:
  9. Enemy(Nono, America)
- We want to forward chain until we can show all the antecedents of 1.

### Example

- Algorithm: repeatedly fire all rules whose antecedents are satisfied.
- Rule 6 matches fact 4 with {x/M1}: add Weapon(M1).
- Rule 5 matches 3 and 4 with {x/M1}: add Sells(West, M1, Nono).
- Rule 7 matches 9 with {x/Nono}: add Hostile(Nono).
- Iterate again: now we can match rule 1 with {x/West, y/M1, z/Nono} and conclude Criminal(West).
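The forward-chaining run above can be reproduced with a naive implementation over definite clauses. This is a sketch, not the course's code: lowercase symbols are variables, facts are tuples, and the matcher tries every combination of facts against a rule's antecedents, in the exhaustive "rules finding facts" spirit the later slides criticize.

```python
import itertools

# Definite clauses for the Criminal(West) example: (antecedents, consequent).
# Lowercase symbols are variables; capitalized symbols are constants.
rules = [
    ([('American', 'x'), ('Weapon', 'y'), ('Hostile', 'z'), ('Sells', 'x', 'y', 'z')],
     ('Criminal', 'x')),
    ([('Missile', 'm'), ('Owns', 'Nono', 'm')], ('Sells', 'West', 'm', 'Nono')),
    ([('Missile', 'm')], ('Weapon', 'm')),
    ([('Enemy', 'e', 'America')], ('Hostile', 'e')),
]
facts = {('Owns', 'Nono', 'M1'), ('Missile', 'M1'),
         ('American', 'West'), ('Enemy', 'Nono', 'America')}

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def match(pattern, fact, s):
    """Extend substitution s so that pattern matches fact, or return None."""
    if len(pattern) != len(fact):
        return None
    s = dict(s)
    for p, f in zip(pattern, fact):
        p = s.get(p, p)            # apply any existing binding
        if is_var(p):
            s[p] = f
        elif p != f:
            return None
    return s

def forward_chain(rules, facts):
    """Apply Generalized Modus Ponens until no new facts can be derived."""
    facts = set(facts)
    while True:
        new = set()
        for antecedents, consequent in rules:
            # naive "rules finding facts": try every combination of facts
            for combo in itertools.product(facts, repeat=len(antecedents)):
                s = {}
                for a, f in zip(antecedents, combo):
                    s = match(a, f, s)
                    if s is None:
                        break
                if s is not None:
                    derived = tuple(s.get(t, t) for t in consequent)
                    if derived not in facts:
                        new.add(derived)
        if not new:                 # no rules fired: done
            return facts
        facts |= new

kb = forward_chain(rules, facts)
```

The derived knowledge base contains Weapon(M1), Sells(West, M1, Nono), Hostile(Nono), and finally Criminal(West), matching the trace on the slide.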
### Backward Chaining

- Basic idea: work backward from the goal to the facts that must be asserted for the goal to hold.
- Uses Modus Ponens in reverse, to focus search on finding clauses that can lead to a goal.
- Also uses definite clauses.
- Search proceeds in a depth-first manner.

### Analyzing Basic Forward Chaining

- Forward chaining is sound, since it uses Modus Ponens.
- Forward chaining is complete for definite clauses; it works much like BFS.
- This basic algorithm is not very efficient, though: finding all possible unifiers is expensive, every rule is rechecked on every iteration, and facts that do not lead to the goal are generated.

### Backward Chaining Algorithm

```
goals = [goal_to_prove]; substitution_list = []
while not empty(goals):
    goal = goals.dequeue()
    unify(goal, substitution_list)
    for sentence in KB:
        if unify(consequent(sentence), goal):
            goals.push(antecedents(sentence))
```

### Backward Chaining Example

- To prove: (1) Criminal(West).
- To prove 1, prove (2) American(West), (3) Weapon(y), (4) Hostile(z), (5) Sells(West, y, z).
- (2) unifies with the fact American(West).
- To prove (3) Weapon(y): the rule Missile(x) ⇒ Weapon(x) unifies with it, yielding the subgoal (6) Missile(M1) under {y/M1}. We can unify (6) with the existential fact ∃x Owns(Nono, x) ∧ Missile(x) using {x/M1}, which also adds Owns(Nono, M1) to the KB.
- To prove (4) Hostile(z): prove (7) Enemy(z, America), which unifies with Enemy(Nono, America) under {z/Nono}.
- To prove (5) Sells(West, M1, Nono): prove Missile(M1) and (8) Owns(Nono, M1). (8) resolves with the fact we added earlier, and Missile(M1) resolves with a fact already in the KB.
- The list of goals is empty, so we are done.

### Analyzing Backward Chaining

- Backward chaining uses depth-first search. This means that it suffers from repeated states; also, it is not complete.
- It can be very effective for query-based systems.
- Most backward chaining systems (esp. Prolog) give the programmer control over the search process, including backtracking.

### Resolution

- Recall resolution in propositional logic: from A ∨ C and ¬A ∨ B, conclude B ∨ C.
- Resolution in FOL works similarly, but requires that sentences be in CNF.

### Conversion to CNF

- The recipe for converting FOL sentences to CNF is similar to propositional logic:
  1. Eliminate implications.
  2. Move ¬ inwards.
  3. Standardize apart.
  4. Skolemize existential sentences.
  5. Drop universal quantifiers.
  6. Distribute ∨ over ∧.

### CNF Conversion Example

- Sentence: "Everyone who loves all animals is loved by someone."
- Translation: ∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]
- Eliminate implications: ∀x ¬[∀y ¬Animal(y) ∨ Loves(x, y)] ∨ [∃y Loves(y, x)]
- Move ¬ inwards: ∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
- Standardize apart: ∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃z Loves(z, x)]
- Skolemize: in this case we need Skolem functions rather than constants, since the existentials occur inside a universal: ∀x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)
- Drop universals: [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)
- Distribute ∨ over ∧: [Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)]

### Resolution Theorem Proving

- Resolution proofs work by inserting the negated form of the sentence to prove into the knowledge base, and then attempting to derive a contradiction.
- A set of support (SOS) is used to help guide search: these are facts that are likely to be helpful in the proof. This provides a heuristic.

### Resolution Algorithm

```
sos = [negated sentence to prove]; usable = all facts in KB
repeat:
    fact = sos.pop()
    resolve fact with each fact in usable
    simplify clauses; remove duplicates and tautologies
    if a clause has no literals: return "refutation found"
until sos is empty
```

### Resolution Example

1. ¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x)
2. ¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sells(West, x, Nono)
3. ¬Enemy(x, America) ∨ Hostile(x)
4. ¬Missile(x) ∨ Weapon(x)
5. Owns(Nono, M1)
6. Missile(M1)
7. American(West)
8. Enemy(Nono, America)
9. ¬Criminal(West)   (added to the set of support)

### Resolution Example

- Resolve 1 and 9. Add 10: ¬American(West) ∨ ¬Weapon(y) ∨ ¬Sells(West, y, z) ∨ ¬Hostile(z)
- Resolve 10 and 7. Add 11: ¬Weapon(y) ∨ ¬Sells(West, y, z) ∨ ¬Hostile(z)
- Resolve 11 and 4. Add 12: ¬Missile(y) ∨ ¬Sells(West, y, z) ∨ ¬Hostile(z)
- Resolve 12 and 6. Add 13: ¬Sells(West, M1, z) ∨ ¬Hostile(z)
- Resolve 13 and 2. Add 14: ¬Missile(M1) ∨ ¬Owns(Nono, M1) ∨ ¬Hostile(Nono)
- Resolve 14 and 6. Add 15: ¬Owns(Nono, M1) ∨ ¬Hostile(Nono)
- Resolve 15 and 5. Add 16: ¬Hostile(Nono)
- Resolve 16 and 3. Add 17: ¬Enemy(Nono, America)
- Resolve 17 and 8. Contradiction!

### Analyzing Resolution Theorem Proving

- Resolution is refutation-complete: if a set of sentences is unsatisfiable, resolution will discover a contradiction.
- It cannot always derive all consequences from a set of facts.
- It can produce nonconstructive proofs for existential goals: ∃x likes(x, Homer) will be proven, but without an answer for who x is.
- It can use full FOL, rather than just definite clauses.

### Efficient Forward Chaining

- Recall that basic forward chaining tries to match every rule with the asserted facts on every iteration. This is known as "rules finding facts." In practice, this is not very efficient.
- The knowledge base does not change drastically between iterations: a few facts are asserted or retracted at each step.
- Also, many rules have similar left-hand sides; can matches be combined?

### Rete

- Rete remembers past partial matches of rules and retains this information across iterations.
- Only new facts are tested against the left-hand side of a rule: "facts finding rules."
- Rete compiles a set of rules into a network, where nodes in the network represent tests or conditions on a rule's left-hand side.
- As facts match these nodes, activation propagates through the network.

## Artificial Intelligence Programming: More Inference, plus Ontologies

Chris Brooks, Department of Computer Science, University of San Francisco

### Previously on CS662

- We introduced propositional logic, along with three inference mechanisms: forward chaining, backward chaining, and resolution.
- But propositional logic has some serious representational problems, so we introduced first-order logic, which can say more complex things.
- Can we also perform inference? Our goal: apply propositional inference mechanisms to first-order logic.
### Inference in Propositional Logic

- We talked about three basic mechanisms for performing inference in propositional logic:
- Forward chaining: begin with the facts in your KB; apply inference rules until no more conclusions can be drawn.
- Backward chaining: begin with a fact (or its negation) that you wish to prove; find facts that will justify this fact; work backwards until you find facts in your KB that support or contradict the fact you wish to prove.
- Resolution: typically used with refutation.
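Refutation with propositional resolution can be sketched in a few lines. This is my own minimal encoding, not the course's code: a clause is a set of literal strings, with a `-` prefix marking negation; to prove Q from {P ⇒ Q, P}, add the negated goal ¬Q and look for the empty clause.

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of literals, '-P' = not P)."""
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('-') else '-' + lit
        if comp in c2:
            resolvents.append((c1 - {lit}) | (c2 - {comp}))
    return resolvents

def refutes(clauses):
    """Saturate by resolution; True iff the empty clause (contradiction) appears."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:               # empty clause: refutation found
                    return True
                new.add(r)
        if new <= clauses:              # nothing new: no contradiction derivable
            return False
        clauses |= new

# Prove Q from {P => Q, P}: the KB in CNF is {-P, Q} and {P};
# adding the negated goal {-Q} yields a contradiction.
```

The saturation loop terminates because only finitely many clauses can be built from a finite set of literals.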
## Artificial Intelligence Programming: More Neural Networks

Chris Brooks, Department of Computer Science, University of San Francisco

### Neural Networks

- A network is composed of layers of nodes: input, hidden, and output layers in feedforward nets.
- An input is applied to the input units; the resulting output is the value of the function the net computes.

### Perceptrons

- Single-layer networks (perceptrons) are the simplest form of neural network.
- Easy to understand, but computationally limited.
- Each input unit is directly connected to one or more output units.
output bias 01 0 0 1 0 1 n 1 1 a 7 1 0 1 1 n n n n o o o 0 U 1 U U quot39t39llquot 39htO Inna y a weig s 0 1 0 0 2477 Examp e 2478 Examp e 2479 Examp e 1npu1s expected Umpm M w2 w3 mm mm new we1gn1s mpms expected mm w1 w2 w3 mm mm new we1gn1s mpms expected mm M w2 w3 mm mm new we1gn1s U U U U U U U U U U 1 U U U U U U U U 1 U U U U U U U U 1 1 1 U U U U U U 1 U 1 U 1 1 1 U U U U U U 1 U 1 U 1 1 1 U U U U U U 1 1 1 U 1 1 1 U 1 U U1 U1 U U1 U 2 U1 1 1 U 1 U U1 U1 U U1 U 2 1 1 1 1 1 1 1 1 1 1 1 1 U1 U 2 U1 1 U1 U 2 U U 1 U U U 1 U U U 1 U 1 U 1 1 1 U 1 1 1 U 1 1 U U U U U U U U U U U U U 1 U U U 1 U U U 1 U U 2410 Examme 24711 Examme 24712 Examme 1npu1s Expected mm w1 w2 w3 aema1 umpm new we1gn1s mpms expected mm M w2 w3 amua1 umpm new we1gn1s mpms expected mm M w2 w3 aema1 umpm new we1gn1s U U U U U U 1 U U U U U U U U U 1 U U U U U U U 1 1 1 U U U U U U 1 U 1 U 1 1 1 U U U U U U 1 U 1 U 1 1 1 U U U U U U 1 1 1 U 1 U U1 U1 U U1 U2 U1 1 1 U 1 U U1 U1 U U1 U2 U1 1 1 U 1 U U1 U1 U U1 U2 1 1 1 1 U1 U2 U1 1 U1 U2 U1 1 1 1 1 U1 U2 U1 1 U1 U2 U1 1 1 1 1 U1 U2 U1 1 U1 U2 U U 1 U U1 U2 U1 U U1 U2 U1 U U 1 U U1 U2 U1 U U1 U2 U1 U U 1 U U1 U2 U1 U U1 U2 1 U 1 1 1 U 1 1 U1 U2 U1 1 U1 U2 U1 1 U 1 1 U1 U2 U1 1 U1 U2 U U U U U U U U U U U U U1 U 2 U1 U U1 U 2 U 1 U U U 1 U U U 1 U U 2413 Example inputs expededuulpul w1 w2 w3 actual nulpul new Weights 1 U U U U U U U 1 1 1 U U U U U U1 1 1 U 1 U U1 U1 U U1 U2 1 1 1 1 U1 U2 U1 1 U1 U2 U U 1 U U1 U2 U1 U U1 U2 1 U 1 1 U1 U2 U1 1 U1 U2 U U U U U1 U2 U1 U U1 U2 U 1 U U U1 U2 U1 1 U1 U1 39 At this point the network is trained properly In many cases we would need more iterations to converge on a solution 2446 Multilayer Networks 39 While perceptrons have the advantage ofa simple learning algorithm their computational limitations are a problem What ifwe add another hidden layer Computational power increases 39 one i en layer can represent any continuous unction i With two hidden layers can represent any function Problem How to find the 
correct weights for hidden nodes?

24-14 Gradient Descent and the Delta rule
- Recall that perceptrons are only able to perfectly learn linearly separable functions: this is their representational bias.
- In other cases, we can still learn as much as is possible. We'll interpret this to mean minimizing the sum of squared error: E = (1/2) Σ_{d∈D} (t_d − o_d)², for d in the training set.

24-15 Gradient Descent and the Delta rule
- We can visualize this as a hill-climbing search through a space of weights.
- Defining E in this way gives a parabolic space with a single global minimum.
- The delta rule adjusts the weights so that E is reduced at each step; by following the gradient in this space, we find the combination of weights that minimizes error.
- Use the unthresholded output: the real number computed by the weighted sum of inputs.

24-17 Multilayer Network Example
- (Figure: a feed-forward network with input units, a layer of hidden units, and output units; the diagram did not survive conversion.)

24-18 Backpropagation
- Backpropagation is an extension of the perceptron learning algorithm to deal with multiple layers of nodes.
- Nodes use a sigmoid activation function rather than the step function: g(input) = 1 / (1 + e^(−input)).
- Good news here: to compute g′(input), we just need g(input)(1 − g(input)).
- We will still "follow the gradient," where δ gives us the gradient.

24-19 Backpropagation Notation
- a_j: output of the jth hidden unit; o_i: output of the ith output unit; t_i: target output for the ith output unit on the current training example.
- Output error for output node i: t_i − o_i.
- Delta rule for output node i: δ_i = (t_i − o_i) g(input_i)(1 − g(input_i)).
- Weight updating, hidden→output: W_{j,i} ← W_{j,i} + η · a_j · δ_i.
- Same as the perceptron rule, except that we use δ_i rather than just the difference between expected and actual output.

24-20 Backpropagation
- Updating input→hidden weights. Idea: each hidden node is responsible for a fraction of the error in δ_i.
- Divide δ_i according to the strength of the connection between the hidden and output node.
- For each hidden node j: δ_j = g(input_j)(1 − g(input_j)) Σ_{i∈outputs} W_{j,i} δ_i.
- Update rule for input→hidden weights: W_{k,j} ← W_{k,j} + η · input_k · δ_j.

24-21 Backpropagation Algorithm
The whole algorithm can be summed up as:
  while not done:
    for d in training set:
      apply inputs of d, propagate forward
      for node i in output layer: δ_i = o_i (1 − o_i)(t_i − o_i)
      for each hidden node j: δ_j = a_j (1 − a_j) Σ_i W_{j,i} δ_i
      adjust each weight: W ← W + η × δ × input

24-22 Theory vs. Practice
- In the definition of backpropagation, a single update for all weights is computed for all data points at once: find the update that minimizes the total sum of squared error. Guaranteed to converge in this case.
- Problem: this is often computationally and space-intensive (it requires creating a matrix with one row for each data point and inverting it). In practice, updates are done incrementally instead.

24-23 Stopping conditions
- Unfortunately, incremental updating is not guaranteed to converge.
- Also, convergence can take a long time.
- Convergence: no change in weights.

24-24 Comments on Backpropagation
- Also works for multiple hidden layers.
- Backpropagation is only guaranteed to converge to a local minimum; it may not find the absolute best set of weights.
- Low initial weights can help with this: they make the network act more linearly (fewer minima).
- Can also use random restart: train multiple times with different initial weights.

24-25 Momentum
- Since backpropagation is a hill-climbing algorithm, it is susceptible to getting stuck in plateaus: areas where local weight changes don't produce an improvement in the error function.
- Idea: remember the direction you were going in, and by default keep going that way.
- A common extension to backpropagation is the addition of a momentum term.
- To consider the effect, imagine that our new delta is zero: we haven't made any improvement, but momentum will keep
the weights moving in the same direction.

24-26 Momentum
- Implementing momentum typically means remembering what update was done in the previous iteration.
- Our update rule becomes: Δw_{j,i}(n) = η a_j δ_i + α Δw_{j,i}(n − 1).
- Momentum carries the algorithm through minima and plateaus; mathematically, this means using the second derivative.
- Momentum gradually increases step size in areas where the gradient is unchanged. This speeds up convergence.

24-27 Design issues
- As with GAs, one difficulty with neural nets is determining how to encode your problem.
- Inputs must be 1 and 0, or else real-valued numbers. Same for outputs.
- Symbolic variables can be given binary encodings.
- More complex concepts may require care to represent correctly.

24-28 Design issues
- Like some of the other algorithms we've studied, neural nets have lots of parameters that must be tuned to get good performance: number of hidden nodes, number of hidden layers, learning rate, initial weights, momentum term, training regimen.
- These may require trial and error to determine. Alternatively, you could use a GA or simulated annealing to figure them out.

24-29 Radial Basis Function networks
- One problem with backpropagation is that every node contributes to the output of a solution.
- This means that all weights must be tuned in order to minimize global error: noise in one portion of the data can have an impact on the whole network.
- Also, training times are long.
- Radial Basis Function nets provide a solution to this.

24-30 Radial Basis Function networks
- Intuition: each node in the network will represent a portion of the input space; it is responsible for classifying examples that fall near it.
- Vanilla approach: for each training point (x_i, f(x_i)), create a node whose center is x_i.
- The output of this node for a new input x will be w_i φ(||x − x_i||), where w_i is the weight and φ is a basis function.

Artificial Intelligence Programming: Belief Networks
Chris Brooks, Department of Computer Science, University of San Francisco

18-2 Probability Review
- Probability allows us to represent a belief about a statement, or a likelihood that a statement is true.
- P(rain) = 0.6 means that we believe it is 60% likely that it is currently raining.
- Axioms:
  - 0 ≤ P(a) ≤ 1
  - P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  - Tautologies have P = 1.
  - Contradictions have P = 0.
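The inclusion–exclusion axiom above can be checked by brute-force enumeration over a small sample space. A minimal sketch in Python (the die example and event names are mine, not from the slides):

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

def prob(event):
    # Uniform distribution: |event| / |sample space|.
    return Fraction(len(event), len(outcomes))

# P(A or B) should equal P(A) + P(B) - P(A and B).
p_union = prob(A | B)
p_sum = prob(A) + prob(B) - prob(A & B)
print(p_union, p_sum)   # 2/3 2/3
```

Using `Fraction` keeps the arithmetic exact, so both sides match without floating-point slop.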
18-3 Conditional Probability
- Once we begin to make observations about the value of certain variables, our belief in other variables changes.
- Once we notice that it's cloudy, P(Rain) goes up: this is called conditional probability, written as P(Rain|Cloudy).
- P(a|b) = P(a ∧ b) / P(b), or P(a ∧ b) = P(a|b) P(b). This is called the product rule.

18-4 Conditional Probability
- Example: P(Cloudy) = 0.3, P(Rain) = 0.2
- P(cloudy ∧ rain) = 0.15, P(cloudy ∧ ¬rain) = 0.1, P(¬cloudy ∧ rain) = 0.1, P(¬cloudy ∧ ¬rain) = 0.65
- Initially, P(Rain) = 0.2. Once we see that it's cloudy, P(Rain|Cloudy) = 0.15 / 0.3 = 0.5.

18-5 Combinations of events
- The probability of A ∧ B is P(A|B)P(B). What if A and B are independent?
- Then P(A|B) is P(A), and P(A ∧ B) is P(A)P(B).
- Example: What is the probability of heads five times in a row? What is the probability of at least one head?

18-6 Bayes' Rule
- Often we want to know how a probability changes as a result of an observation.
- Recall the product rule: P(a ∧ b) = P(a|b)P(b), and P(a ∧ b) = P(b|a)P(a).
- We can set these equal to each other: P(a|b)P(b) = P(b|a)P(a). Dividing by P(a): P(b|a) = P(a|b)P(b) / P(a).
- This equality is known as Bayes' theorem (or rule, or law).

18-7 Monty Hall Problem
- From the game show "Let's Make a Deal."
- Pick one of three doors. A fabulous prize is behind one door; goats are behind the other 2 doors.
- Monty opens one of the doors you did not pick, showing a goat.
- Monty then offers you the chance to switch doors to the other unopened door.
- Should you switch?

18-8 Monty Hall Problem
- Problem clarification: the prize location is selected randomly; Monty always opens a door and allows contestants to switch; when Monty has a choice about which door to open, he chooses randomly.
- Variables: Prize P ∈ {p_A, p_B, p_C}; Choose C ∈ {c_A, c_B, c_C}; Monty M ∈ {m_A, m_B, m_C}.
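Before working through the algebra, the answer can be estimated by simulation under exactly the rules just stated. A sketch (the door labels and helper names are mine, not from the slides):

```python
import random

def play(switch, rng):
    doors = ['A', 'B', 'C']
    prize = rng.choice(doors)   # prize location selected randomly
    pick = rng.choice(doors)    # contestant's initial choice
    # Monty opens a door that is neither the pick nor the prize;
    # when he has a choice, he chooses randomly.
    monty = rng.choice([d for d in doors if d not in (pick, prize)])
    if switch:
        # Switch to the remaining unopened door.
        pick = [d for d in doors if d not in (pick, monty)][0]
    return pick == prize

rng = random.Random(0)
n = 100_000
stay_wins = sum(play(False, rng) for _ in range(n)) / n
switch_wins = sum(play(True, rng) for _ in range(n)) / n
print(stay_wins, switch_wins)   # roughly 1/3 and 2/3
```

With enough trials, staying wins about a third of the time and switching about two thirds, matching the derivation that follows.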
18-9 through 18-17 Monty Hall Problem
- Without loss of generality, assume we choose door A and Monty opens door B. We want P(p_A | c_A, m_B).
- By Bayes' rule: P(p_A | c_A, m_B) = P(m_B | c_A, p_A) P(p_A | c_A) / P(m_B | c_A).
- P(m_B | c_A, p_A) = 1/2, and P(p_A | c_A) = 1/3.
- P(m_B | c_A) = P(m_B | c_A, p_A)P(p_A) + P(m_B | c_A, p_B)P(p_B) + P(m_B | c_A, p_C)P(p_C), where P(p_A) = P(p_B) = P(p_C) = 1/3.
- P(m_B | c_A, p_B) = 0: Monty won't open the prize door.
- P(m_B | c_A, p_C) = 1: Monty has no choice.
- So P(m_B | c_A) = (1/2)(1/3) + 0 · (1/3) + 1 · (1/3) = 1/2.
- Therefore P(p_A | c_A, m_B) = (1/2)(1/3) / (1/2) = 1/3.
- Similarly, P(p_C | c_A, m_B) = P(m_B | c_A, p_C) P(p_C | c_A) / P(m_B | c_A) = 1 · (1/3) / (1/2) = 2/3.
- You should switch.

18-18 Inference with the Joint Probability Distribution
- Remember that we can list all combinations of events and use this to do inference: this is the joint distribution.

            Hum=High      Hum=High    Hum=Normal    Hum=Normal
            Sky=Overcast  Sky=Sunny   Sky=Overcast  Sky=Sunny
  Rain      0.1           0.05        0.15          0.05
  ¬Rain     0.2           0.15        0.1           0.2

- P(Rain) = P(Rain, Hum=High, Sky=Overcast) + P(Rain, Hum=Normal, Sky=Overcast) + P(Rain, Hum=High, Sky=Sunny) + P(Rain, Hum=Normal, Sky=Sunny)
- P(Rain) = 0.1 + 0.05 + 0.15 + 0.05 = 0.35

18-19 Probabilistic Inference
- Problems in working with the joint probability distribution: the table is exponentially large.
- We need a data structure that captures the fact that many variables do not influence each other. For example, the color of Bart's hat does not influence whether Homer is hungry.
- We call this structure a Bayesian network, or a belief network.
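The joint-distribution inference of slide 18-18 is just summation, and is short to write out directly. A sketch in Python (the dictionary encoding and variable names are mine):

```python
# Joint distribution from slide 18-18, keyed by (rain, humidity, sky).
joint = {
    (True,  'high',   'overcast'): 0.10,
    (True,  'high',   'sunny'):    0.05,
    (True,  'normal', 'overcast'): 0.15,
    (True,  'normal', 'sunny'):    0.05,
    (False, 'high',   'overcast'): 0.20,
    (False, 'high',   'sunny'):    0.15,
    (False, 'normal', 'overcast'): 0.10,
    (False, 'normal', 'sunny'):    0.20,
}

def prob(pred):
    # Marginalize: sum every joint entry the predicate accepts.
    return sum(p for event, p in joint.items() if pred(event))

p_rain = prob(lambda e: e[0])
print(round(p_rain, 2))            # 0.35, as on the slide

# Conditioning is just two marginalizations: P(Rain | Sky=overcast).
p_rain_overcast = (prob(lambda e: e[0] and e[2] == 'overcast')
                   / prob(lambda e: e[2] == 'overcast'))
print(round(p_rain_overcast, 3))   # ≈ 0.455
```

This works for any query, but the table has one entry per combination of variable values, which is exactly the exponential blowup that motivates Bayesian networks.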
Francisco p 22 1823 Summarizing uncertainty 39 Notice that we don t need to have nodes for all the reasons why Mary might not call 0 A probabilistic approach lets us summarize this information in M 39 This allows a small agent to deal with large worlds that have a large number of possibly uncertain outcomes 0 How would we handle this with logic Department of Computer Science University of San Francisco p 23 1824 Implicitly representing the full joint distribution Recall that the full joint distribution allows us to calculate the probability of any variable given all the others Independent events can be separated into separate tables These are the CPTs seen in the Bayesian network Therefore we can use this info to perform computations P1 x2 mm HP parentsi PA E B J M PJAPMAPA B EP BP E 090 070 0001 0999 0998 000062 Department of Computer Science University of San Francisco p 24 1825 Some examples What is the probability that Both Mary and John call given that the alarm sounded O PMA gtk PJA 90 gtk 70 063 What is the probability of a breakin given that we hear an alarm O PBA 095 0001 What is the probability of a breakin given that both John and Mary called 0 PBJM PBJMAAA EPBJMAEVPBJ MA uAA EPBJM IAE This last example shows a form of inference Department of Computer Science University of San Francisco p 25 1826 Constructing a Bayesian network 39 There are often several ways to construct a Bayesian network 39 The knowledge engineer needs to discover conditional independence relationships 39 Parents of a node should be those variables that directly influence its value 0 JohnCalls is influenced by Earthquake but not directly 0 John and Mary calling don t influence each other 39 Formally we believe that PMa7 yCallsJ0hnCallsAlarmEa7 thquake Burglary PMa7 yCallsAla7 m Department of Computer Science University of San Francisco p 26 1827 Compactness and Node Ordering 39 Bayesian networks allow for a more compact representation of the domain 0 Redundant information is removed 
Example Say we have 30 nodes each with 5 parents Each CPT will contain 25 32 numbers Total 960 39 Joint 230 entries nearly all redundant Department of Computer Science University of San Francisco p 27 1828 Building a network 39 Begin with root causes Then add the variables they directly influence Then add their direct children This is called a causal model 0 Reasoning in terms of cause and effect 0 Estimates are much easier to come up with this way 39 We could try to build from effect to cause but the network would be more complex and the CPTs hard to estimate PEB Department of Computer Science University of San Francisco p 28 1829 Conditional Independence in Bayesian networks 39 Recall that conditional independence means that two variables are independent of each other given the observation of a third variable Pa bc PacPbc 39 A node is conditionally independent of its nondescendants given its parents 0 Given Alarm JohnCalls is independent of Burglary and Earthquake A node is conditionally independent of all other nodes given its parents children and siblings the children s other parents 0 Burglary is independent of JohnCalls given Alarm and Earthquake Department of Computer Science University of San Francisco p 29 1830 Conditional Independence in Bayesian networks 1831 Conditional Independence in Bayesian networks 39 General rules 0 Two nodes are conditionally independent given their parents JohnCalls and MaryCalls given Alarm O A node is conditionally independent of nondescendents given its parents JohnCalls is conditionally independent of Burglary given Alarm O A node is conditionally independent of all other nodes given its parents children and its children s other parents Department of Computer Science University of San Francisco p 31 1832 Inference in Bayesian networks In most cases we ll want to use a Bayesian network to tell us posterior probabilities We observe a variable how do other variables change We can distinguish between query variables things we 
want to know about and evidence variables things we can observe There are also hidden variables which influence query variables but are not directly observable JohnCalls and MaryCalls are evidence Earthquake and Burglary are queries and Alarm is hidden Updating in Bayesian networks can be quite complex we ll skip over this 0 For more information see R amp N Department of Computer Science University of San Francisco p 32 1833 Enumeration In some cases reasoning in a Bayesian network is still very expensive For example we can rewrite the probability PXe aPX e or Z PX e y where y are hidden variables This is equivalent to summing within the joint probability distribution Consider PBurgla7 yJ0hnCalls MaryCalls ozPBu7 gla7 y JohnCalls MaryCalls This is 05 2E 2A PBE A J M For B true we must calculate PBJ M aZE ZAPBPEPAB7EPJAPMA Department of Computer Science University of San Francisco p 33 1834 Enumeration 39 Problem a network with n nodes will require 0n2quot computations We can simplify by bringing the priors outside the summation 39 aPB 2E PE EA PAB EPJAPMA 39 Still requires 02quot calculations Department of Computer Science University of San Francisco p 34 1835 Enumeration This tree shows the computations needed to determine PB M 0 Notice that there are lots of repeated computations in here PO39i a 90 POW Pji ea 05 05 W para 39 By eliminating repeated 70 01 I computations we can speed things up Pmina 01 Department of Computer Science University of San Francisco p 35 1836 Variable Elimination We can reduce the number of calculations by caching results and reusing them Evaluate the expression from righttoleft Summations are done only for nodes influenced by the variable being summed Consider PBlj m aPB 23 Pe 2a PalB 6 PUIa Pmla v V WWW B E A J The equation has been separated into factors M Once we compute PMa and PM a we cache those results in a matrix fM Similarly for PJa and PJ Ia stored in fj Department of Computer Science University of San Francisco p 36 1837 
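Before optimizing with variable elimination, the plain enumeration of slides 18-33/18-34 can be written out directly for the burglary network. A sketch (the slides only quote P(j|a)=0.9, P(m|a)=0.7, P(b)=0.001, P(¬e)=0.998, and P(a|b)=0.95; the remaining CPT values below are the usual Russell & Norvig numbers, an assumption on my part):

```python
# Burglary network CPTs (standard textbook values, assumed where the
# slides do not quote them).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a=true | b, e)
P_J = {True: 0.90, False: 0.05}                       # P(j=true | a)
P_M = {True: 0.70, False: 0.01}                       # P(m=true | a)

def p_b_given_jm():
    # P(B | j, m) = alpha * P(B) * sum_e P(e) * sum_a P(a|B,e) P(j|a) P(m|a)
    unnorm = {}
    for b in (True, False):
        total = 0.0
        for e in (True, False):
            for a in (True, False):
                p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
                total += P_E[e] * p_a * P_J[a] * P_M[a]
        unnorm[b] = P_B[b] * total
    alpha = 1 / (unnorm[True] + unnorm[False])   # normalize over B
    return alpha * unnorm[True]

print(round(p_b_given_jm(), 4))   # 0.2842
```

Even though both neighbors called, a burglary is still fairly unlikely; the nested loops also make the O(2^n) cost of naive enumeration visible.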
18-37 Variable Elimination
- P(A|B,E) will result in a 2 × 2 × 2 matrix, stored in f_A.
- We then need to compute the sum over the different possible values of A: this sum is Σ_a f_A(a, B, E) f_J(a) f_M(a).
- We can process E the same way.
- In essence, we're doing dynamic programming here, and exploiting the same memoization process.
- Complexity: for polytrees (one undirected path between any two nodes), linear in the number of nodes; for multiply-connected networks, exponential.

18-38 Scaling Up
- Many techniques have been developed to allow Bayesian networks to scale to hundreds of nodes.
- Clustering: nodes are joined together to make the network into a polytree. The CPT within the merged node grows, but the network structure is simplified.
- Approximating inference: Monte Carlo sampling is used to estimate conditional probabilities.

18-39 Monte Carlo sampling example
- Start at the top of the network and select a random sample for Cloudy; say it's true.
- Draw a random sample from its children, conditioned on Cloudy = true: P(Sprinkler | Cloudy = true) = <0.1, 0.9>; say we select false.
- P(Rain | Cloudy = true) = <0.8, 0.2>; say we select true.
- P(WetGrass | Sprinkler = false, Rain = true) = <0.9, 0.1>; say we select true.
- This gives us one sample for <Cloudy, Sprinkler, Rain, WetGrass>.
- As we increase the number of samples, this provides an estimate of P(Cloudy, Sprinkler, Rain, WetGrass). In this way, we choose the query we are interested in, and then sample the network enough times to determine the probability of that event occurring.

18-40 Applications of Bayesian Networks
- Diagnosis: widely used in Microsoft's products.
- Medical diagnosis; spam filtering.
- Expert-systems applications: plant control, monitoring.
- Robotic control.

Artificial Intelligence Programming: Local Search and Genetic Algorithms
Chris Brooks, Department of Computer Science, University of San Francisco

7-2 Overview
- Local search: when is it useful?
- Hill-climbing search
- Simulated annealing
- Genetic algorithms

7-3 Local Search
- So far, the algorithms we've looked at store the entire path from initial state to goal state. This leads to memory/space issues on large problems.
- For some problems, path information is essential (route finding, 8-puzzle): the solution is the sequence of actions to take. We know what the goal state is, but not how to reach it.

7-4 Local Search
- For other sorts of problems, we may not care what the sequence of actions is; CSPs fit this description. Finding the optimal (or a satisfactory) solution is what's important.
- Function optimization, protein folding, gene sequencing.
- The solution is an assignment of values to variables that maximizes some objective function. In these cases, we can safely discard at least some of the path information.

7-5 Local Search
- A search algorithm that discards path information is called a local search algorithm.
- Advantages: constant memory requirement; can find solutions in incredibly large spaces.
- Disadvantages: hard to guarantee optimality (we might only find a local optimum); lack of memory may cause us to revisit states or oscillate.

7-6 Search Landscape
- Local search is often useful for optimization problems: "find parameters such that f(...) is maximized/minimized."
- This is a search problem where the state space is the combination of value assignments to parameters.
- If there are n parameters, we can imagine an (n+1)-dimensional space, where the first n dimensions are the parameters of the function and the (n+1)th dimension is the objective function.
- We call this space a search landscape: optima are on hills; valleys are poor solutions. (Reverse this to minimize.)

7-7 / 7-8 Search Landscape
- (Figures: a one-dimensional landscape and a two-dimensional landscape; beyond 2 dimensions they're tough to draw. The figures did not survive conversion.)

7-10 Hill-climbing search
- The simplest form of local search is hill-climbing search.
- Very simple: at any point, look at
your successors (neighbors) and move in the direction of the greatest positive change.
- Very similar to greedy search: make the choice that myopically looks best.
- Very little memory required.
- Will get stuck in local optima; plateaus can cause the algorithm to wander aimlessly.

7-9 Search landscapes
- Landscapes turn out to be a very useful metaphor for local search algorithms.
- Lets us visualize "climbing" up a hill, and gives us a way of differentiating easy problems from hard problems.
- Easy: few peaks, smooth surfaces, no ridges/plateaus. Hard: many peaks, jagged or discontinuous surfaces, plateaus.

7-11 Local search in Calculus
- Find roots of an equation f(x) = 0, f differentiable.
- Guess an x_1, find f(x_1); use the tangent line to f at x_1 (slope f′(x_1)) to pick x_2 = x_1 − f(x_1)/f′(x_1); repeat.
- This is a hill-climbing search. Works great on smooth functions.

7-12 Improving hill-climbing
- Hill-climbing can be appealing: simple to code, requires little memory.
- How to make it better? Stochastic hill-climbing: pick randomly from the uphill moves, weighting the probability by degree of slope.

7-13 Improving hill-climbing
- Random-restart hill-climbing: run until an optimum is reached; randomly choose a new initial state; run again. After n iterations, keep the best solution.
- If we have a guess as to the number of optima, we can choose an n.

7-14 Simulated Annealing
- Hill-climbing's weakness is that it never moves "downhill." Like greedy search, it can't back up.
- Simulated annealing is an attempt to overcome this: bad actions are occasionally chosen to move out of a local optimum.

7-15 Simulated Annealing
- Based on analogies to crystal formation: when a metal cools, lattices form as molecules fit into place. By reheating and recooling, a harder metal is formed.
- Similarly, small steps away from the solution can help hill-climbing escape local optima. (Minimize the energy in the system.)

7-16 Simulated Annealing
  s = initial state
  while not done:
    c = successor(s), chosen randomly
    if c is better than s: s = c
    else: s = c with probability p(T)
    update T
- What is T? What is p?

7-17 Simulated Annealing
- We'd like to make a "mistake" early in the search, and more rarely later in the search.
- We'll use T to parameterize this; T stands for temperature.
- Two questions: how does T change over time, and what's the probability function with respect to T?

7-18 Cooling schedule
- The function for changing T is called a cooling schedule. The most commonly used schedules are:
  - Linear: T_new = T_old − dT
  - Proportional: T_new = c · T_old, with c < 1

7-19 Boltzmann distribution
- The probability of accepting a mistake is governed by a Boltzmann distribution.
- Let s be the current state, c the child considered, and o() the function to optimize: P(c) = exp(−|o(c) − o(s)| / T).
- Consider the boundary conditions: with T very high, the exponent is a fraction near 0, so P(c) is near 1. With T low, P(c) depends on |o(c) − o(s)|, and is typically small.
- Boltzmann gives us a way of weighting the probability of accepting a mistake by its quality.

7-20 Simulated Annealing
- Simulated annealing is complete and optimal, as long as T is lowered "slowly enough."
- Can be very effective in domains with many optima, and is a simple addition to a hill-climbing algorithm.
- Weakness: selecting a good cooling schedule. No problem knowledge is used in the search (a "weak method").

7-21 Genetic Algorithms
- Genetic algorithms can be thought of as a form of parallel hill-climbing.
- Basic idea: select some solutions at random; combine the best parts of the solutions to make new solutions.
- Successors are a function of two states, rather than one.

7-22 GA applications
- Function optimization, job-shop scheduling, factory scheduling/layout, circuit layout, molecular structure, etc.

7-23 GA terminology
- Chromosome: a solution, or state.
- Trait/gene: a parameter, or state variable.
- Population: a set of chromosomes, or solutions.

7-24 A Basic GA
  p = makeRandomPopulation()
  while not done:
    parent1, parent2 = select random solutions from p
    child1, child2 = crossover(parent1, parent2)
    mutate children with small probability
    place children in new population
  replace old population with new population

7-25 Analogies to Biology
- Keep in mind that this is not how biological evolution works. Biological evolution is much more complex: diploid chromosomes, phenotype/genotype, nonstationary objective functions.
- Biology is a nice metaphor; GAs must stand or fall on their own merits.

7-26 Encoding a Problem
- Encoding a problem for use by a GA can be quite challenging. Traditionally, GA problems are encoded as bitstrings.
- Example: 8-queens. For each column, we use 3 bits to encode the row of the queen: 100101110000101001010110 → (4,5,6,0,5,1,2,6).
- We begin by generating random bitstrings, then evaluating them according to a fitness function (the function to optimize). For 8-queens: the number of non-attacking pairs of queens (max 28).

7-27 Generating new solutions
- Our successor function will work by operating on two solutions: this is called crossover.
- Pick two solutions at random. First method, fitness-proportionate selection: sum all fitnesses; P(selection of s) = fitness(s) / total fitness.
- Pick a random point on the bitstrings (the locus). Merge the first part of b1 with the second part of b2, and vice versa, to produce two new bitstrings.

7-28 Crossover Example
- S1 = 100101110000101001010110 → (4,5,6,0,5,1,2,6)
- S2 = 001000101110111010110111 → (1,0,5,6,7,2,6,7)
- Pick locus 9:
- S3 = 100101110|110111010110111 → (4,5,6,6,7,2,6,7)
- S4 = 001000101|000101001010110 → (1,0,5,0,5,1,2,6)

7-29 Mutation
- Next, apply mutation: with probability m, for small m, randomly flip one bit in the solution.
- After generating a new population of the same size as the old population, discard the old population and start again.

7-30 So what is going on?
- Why would this work?
- Crossover: recombine pieces of partially successful solutions. Genes closer to each other are more likely to stay together in successive generations; this makes encoding important.
- Mutation: inject new solutions into the population. If a trait was missing from the initial population, crossover cannot generate it (unless we place the locus within a gene).
- The schema
theorem provides a formal proof of why GAs work.

7-31 Selection
- How should we select parents for reproduction?
- Use the best n percent? We want to avoid premature convergence: no genetic variation. Also, sometimes poor solutions have promising subparts.
- Purely random? No selection pressure.

7-32 Roulette Selection
- Roulette selection weights the probability of a chromosome being selected by its relative fitness: P_i = f_i / Σ_j f_j.
- This normalizes fitnesses: total relative fitnesses will sum to 1, so we can directly use these as probabilities.

7-33 Example
- Suppose we want to maximize f(x) = x² on [0, 31]. Let's assume integer values of x for the moment.
- Five bits are used to encode a solution; generate a random initial population.

7-34 through 7-36 Example
- Select parents with roulette selection; choose a random locus and cross over the two strings; apply mutation to the new population. (With a small population and a low mutation rate, mutations are unlikely.) Replace the old population with the new population.
- (The population tables on these slides did not survive conversion; the recoverable children include 01100, 11011, and 10000.)
- Average fitness has increased: 293 to 439. Maximum fitness has increased: 576 to 729.

7-37 What's really going on here?
- The subsolutions 11--- and ---11 are recombined to produce a much better solution.
- There's a correlation between strings and fitness: having a 1 in the first position is correlated with fitness. This shouldn't be shocking, considering how we encoded the input.
- We'll call a 1 in the first position a building block: GAs work by recombining smaller building blocks into larger building blocks.

7-38 Schemata
- We need a way to talk about strings that are similar to each other: add a "*" (don't-care) symbol to {0, 1}.
- A schema is a template that describes a set of strings using {0, 1, *}:
  - 111** matches 11100, 11101, 11110, 11111
  - 0*11* matches 00110, 00111, 01110, 01111
  - 0***1 matches 00001, 00011, 00101, 00111, 01001, 01011, 01101, 01111
- Premise: schemata are correlated with fitness. In many encodings, only some bits matter for a solution; schemata give us a way of describing all the important information in a string.

7-39 Schemata
- GAs actually process schemata, rather than strings.
- Crossover may or may not damage a schema: compare 11*** vs. 0***1.
- Short, highly fit, low-order schemata are more likely to survive. (Order: the number of fixed bits in a schema.)
- Building Block Hypothesis: GAs work by combining low-order schemata into higher-order schemata to produce progressively more fit solutions.

7-40 Propagation of schemata
- Suppose that there are m examples of a schema H in a population of size n at time t: m(H, t).
- Strings are selected according to relative fitness, p_i = f_i / Σf. At time t+1, we will have m(H, t+1) = m(H, t) · f(H) / f̄, where f(H) is the average fitness of the strings represented by this schema and f̄ is the population average.
- In other words, schemata grow or decay according to their fitness relative to the rest of the population: above-average schemata receive more samples; below-average schemata receive fewer samples.
- But how many more, or less?

7-41 Propagation of schemata
- Assume that schema H remains above average by an amount c. We can rewrite the schema difference equation as m(H, t+1) = m(H, t)(1 + c).
- Starting at t = 0, we get m(H, t) = m(H, 0)(1 + c)^t.
- This is a geometric progression (also the compound-interest formula): reproduction selects exponentially more (fewer) above- (below-) average schemata.

7-42 Schemata and crossover
- Selection is only half the story: schemata with a longer defining length are more likely to be damaged by crossover.
- P_survival ≥ 1 − p_c · δ(H) / (l − 1).
- We can combine this with the previous equation to produce m(H, t+1) ≥ m(H, t) · (f(H)/f̄) · [1 − p_c · δ(H)/(l − 1)].
- In words: short, above-average schemata are sampled at exponentially increasing rates. This is known as the schema theorem.

7-43 Schema in our example
- Consider schemata such as H1 = 1**** in our previous example.
- (The table on this slide, listing each schema's strings, fitnesses, and predicted vs. actual counts, did not survive conversion.) Parents 2 and 4 are both instances of H1; the schema theorem predicts how many instances of H1 appear among the children.

7-46 Elitism
- In practice,
discarding all solutions from a previous generation can slow down a GA.
- A bad draw from the RNG can destroy progress; you may need monotonic improvement.
- Elitism is the practice of keeping a fraction of the population from the previous generation.
- Use roulette selection to choose a fraction of the population to carry over without crossover.
7.44 Theory vs. Implementation
- The schema theorem shows us why GAs work.
- In practice, implementation details can make a big difference in the effectiveness of a GA.
- This includes algorithmic improvements and encoding choices.
7.47 Knowing when to stop
- In some cases, you can stop whenever your GA finds an acceptably good solution.
- In other cases it's less clear: how do we know we've found the best solution to TSP?
- Stop when the population has "converged": without mutation, eventually one solution will dominate the population.
- Or stop after "enough" iterations without improvement.
7.45 Tournament Selection
- Roulette selection is nice, but can be computationally expensive: every individual must be evaluated, requiring two iterations through the entire population.
- Tournament selection is a much less expensive selection mechanism.
- For each parent, choose two individuals at random; the one with higher fitness gets to reproduce.
7.48 Encoding
- The most difficult part of working with GAs is determining how to encode problem instances.
- The schema theorem tells us that short encodings are good.
- Parameters that are interrelated should be located near each other.
- N-queens: assume that each queen will go in one column; the problem is to find the right row for each queen.
- N rows requires log N bits per queen, so the entire string has length N log N.

Artificial Intelligence Programming: Intro to Python
Chris Brooks, Department of Computer Science, University of San Francisco

Invoking Python Programs
- There are four ways to run a Python program:
  - Interactively
  - From the command line
  - As a script
  - From another program
What is Python?
- Python is:
  - High-level
  - Object-oriented
  - Free, open-source
  - Dynamically typed
  - Has a large
collection of utility libraries
  - Garbage-collected
  - Mixable: works nicely as a "glue" language
  - Easy to learn/use
Python Implementations
- /usr/bin/python on all Unix machines and on OS X
- IDLE and PythonWin on Windows
- MacPython on Macintosh
- Also for Amiga, PalmOS, BeOS, etc.
- BoaConstructor is an open-source Python IDE
- Eclipse has a Python plugin (PyDev)
Some uses of Python
- Things that Python is good at:
  - System utilities: Python has hooks into standard OS-level services such as files, processes, pipes, etc.
  - GUIs: Python has interfaces to Tk, Qt, and WxPython.
  - Embedded integration: Python has hooks that allow it to call / be called by C and C++ code, and also COM interfaces.
  - Rapid interactive development: like Lisp, Python makes it easy to quickly put together a working program.
  - Scripting: Python has utilities for CGI, database access, HTTP, sockets, FTP, POP/SMTP.
Python Program Structure
- Programs consist of modules.
- A module corresponds to a source file.
- Modules contain blocks of statements.
- Statements create and process objects.
Basic Python
- Python has a nice set of built-in data types: numbers, strings, lists, dictionaries, tuples, files.
- Using built-in types makes your code easier to write/maintain, more portable, and more efficient.
Strings
- One of Python's strong suits is its ability to work with strings.
- Strings are denoted with double quotes (as in C) or single quotes:
  - s1 + s2: concatenation
  - s1 * 3: repetition
  - s1[i]: indexing
  - s1[i:j]: slicing
  - s1[-1]: last character
  - "%s parrot" % "dead": formatting
  - for char in s1: iteration
Numbers
- Numbers work like we'd expect.
- There are integers (equivalent to longs in C): 1, 31311, 4000
- There are long integers, which are of unlimited size: 31111L, 12345L
- There are floats (equivalent to doubles in C or Java): 1.23, 3.1e5
- There are octal and hexadecimal representations, as in C: 0155, 0x3af5
- There are complex numbers, as in Lisp, Matlab, etc.: 3+4j
Strings
- Strings are immutable sequences; to change them, we need to make a copy.
- Can't do s1[2] = "c".
- Must do s2 = s1[0:2] + "c" + s1[3:].
- As in
Java, making lots of copies can be very inefficient. If you need to do lots of concatenation, use join() instead.
- We'll return to efficiency issues throughout the semester.
Mathematical Operators
- Python has all the usual mathematical operators: +, -, *, /, %, <<, >>, abs(), &, |.
- This makes Python a very handy desk calculator.
- Operations are coerced to the most specific type: 3 * 4.0 / 2 will produce a float.
- Common error: since variables do not have declared types, be careful of rounding: 3/2 gives 1, not 1.5.
- Use ** or pow() for exponentiation.
Lists
- Python has a flexible and powerful list structure.
- Lists are mutable sequences: they can be changed in place.
- Denoted with square brackets: l1 = [1, 2, 3, 4]
- Can create nested sublists: l2 = [[1, 2], [3, 4], [5, 6, 7]]
- l1 + l2: concatenation
- l1 * 4: repetition
- l1[3], l1[1:3]: indexing and slices
- append, extend, sort, reverse: built-in methods
- range(): create a list of integers
Dictionaries
- A dictionary is Python's hash table, or associative list.
- Unordered collections of arbitrary objects.
- d1 = {}: a new hash table; d2 = {'spam': 2, 'eggs': 3}
- Can index by key: d2['spam']
- Keys can be any immutable object.
- Can have nested dictionaries: d3 = {'spam': 1, 'other': {'eggs': 2, 'spam': 3}}
  - d3['other']['spam']
- has_key(), keys(), values(), for k in d.keys()
- Typically you'll insert/delete with d3['spam'] = 'delicious' and del d3['spam']
Files
- outfile.write(S): write the string S into the file
- outfile.writelines(L): write the list of strings L into the file
- outfile.close(): this is also done by the garbage collector
Tuples
- Tuples are like immutable lists.
- Nice for dealing with enumerated types.
- Can be nested and indexed: t1 = (1, 2, 3); t2 = (1, (2, 3, 4), 5)
- Can index, slice, and take the length just like lists: t1[2], t1[1:2], len(t1)
- Tuples are mostly useful when you want a list of a predetermined size/length.
  - Also constant-time access to elements (fixed memory locations).
- Tuples are also very useful as keys for dictionaries.
Basic Python statements
- Python uses dynamic typing.
- No need to predefine variables.
- Variables are instantiated by assigning values to them.
- Referencing a variable before assignment is an error.
- You can assign
multiple variables simultaneously:
  spam = 4; eggs = 5
  spam, eggs = eggs, spam
Files
- Since it's a scripting language, Python has a lot of support for file I/O.
- Operators are not too different from C:
  - outfile = file(fname, 'w') or infile = file(fname, 'r')
  - 'r' is the default and can be left out.
  - S = infile.read(): read the entire file into the string S
  - S = infile.read(N): read N bytes into the string S
  - S = infile.readline(): read one line
  - L = infile.readlines(): read the whole file into a list of strings
- Unless the file is really huge, it's fastest to read it all in at once with read() or readlines().
Python variables
- Variables must begin with a letter or underscore and contain any number of letters, numbers, or underscores.
- No punctuation characters (e.g., $).
- Case matters.
- Can't use reserved words as variables.
Printing
- We've already seen the basic print:
  - print "hello world"
- To use a formatting string, do:
  - print "hello %s" % "bob"
  - print "%s %s" % ("hello", "world")
- To suppress the linefeed, include a trailing comma.
Syntax
- Indentation is used to delimit blocks.
- If you are going to be editing your code on multiple machines with different editors, you may want to configure your editor to use spaces instead of tabs.
- Statements can cross multiple lines if:
  - They're within delimiters (parentheses, brackets, or braces), or
  - You use a backslash.
Conditionals
- The general format for an if statement is:
  if test:
      statements
  elif test:
      statements
  else:
      statements
- Notice the colons after the conditionals.
- Compound statements consist of the colon followed by an indented block.
Iteration
- Python has the familiar while loop.
- One wrinkle: it has an optional else clause to execute if the test is false.
- Additional loop control:
  - break: exit the innermost loop without doing the else
  - continue: return to the beginning of the loop
  - pass: do nothing
Conditionals
- Logical tests return 1 for true and 0 for false.
- True and False are shorthand.
- and, or, not are available for compound tests.
For
- for is a generic iterator in Python.
- Lets you loop over any indexable data structure:
  - for item in [1, 2, 3, 4]:
  - for char in "This is a string":
  - for key in
dictionary:
- This provides a uniform, polymorphic interface to a set of different data structures.
- Note: it's much faster to use the polymorphic operator than to access manually; i.e., for item in list is faster than for i in range(len(list)), and for key in dictionary is faster than for key in dictionary.keys().

Artificial Intelligence Programming: Learning: Decision Trees
Chris Brooks, Department of Computer Science, University of San Francisco

16.0 Learning
- What does it mean for an agent to learn?
16.1 Learning
- What does it mean for an agent to learn?
  - The agent acquires new knowledge.
  - The agent changes its behavior.
  - The agent improves its performance measure on a given task.
16.2 Learning Agents
- Recall: previously we talked about learning agents.
(Figure: a learning agent's architecture, showing the performance element and learning element, a critic that compares sensor feedback against a performance standard, a problem generator that sets learning goals, and sensors/actuators connecting the agent to the environment.)
16.3 Learning Agents
- A learning agent has a performance element and a learning element.
- The performance element is what an agent uses to decide what to do. This is what we've studied up to now.
- The learning element is what allows the agent to modify the performance element.
  - This might mean adding or changing rules or facts, modifying a heuristic, or changing a successor function.
- In order to modify its behavior, an agent needs information telling it how well it is performing. This information is called feedback.
16.4 Types of Feedback
- There are essentially three categories of learning tasks, each of which provides different feedback.
- Supervised learning:
  - In this case, an external source (often called a teacher) provides the agent with labeled examples.
  - The agent sees
specific actions/cases, along with their classification.
16.5 Types of Feedback
- Unsupervised learning:
  - In this case, there is no teacher to provide examples.
  - The agent typically tries to find patterns in data.
- Reinforcement learning:
  - This is a particular version of learning in which the agent only receives a reward for taking an action.
  - It may not know how optimal a reward is, and will not know the best action to take.
16.6 Supervised Learning
- Supervised learning is one of the most common forms of learning.
- The agent is presented with a set of labeled data and must use this data to determine more general rules.
- Examples:
  - A list of patients and their characteristics: what factors are correlated with cancer?
  - What factors make someone a credit risk?
  - What are the best questions for classifying animals?
  - Whose face is in this picture?
- This process of learning general rules from specific facts is called induction.
16.7 Defining the Learning Problem
- We can phrase the learning problem as that of estimating a function f that tells us how to classify a set of inputs.
- An example is a set of inputs x and the corresponding f(x):
  <(Mammal, Eats Meat, Black Stripes, Tawny), Tiger>
- We can define the learning task as follows:
  - Given a collection of examples of f, find a function H that approximates f for our examples.
  - H is called a hypothesis.
16.8 Induction
- We would like H to generalize: this means that H will correctly classify unseen examples.
- If the hypothesis correctly classifies all of the training examples, we call it a consistent hypothesis.
- Goal: find a consistent hypothesis that also performs well on unseen examples.
- We can think of learning as search through a space of hypotheses.
Department of Computer Science, University of San Francisco
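The notion of a consistent hypothesis can be sketched in a few lines of Python. This is an illustration only: the attribute names, the tiny training set, and the candidate hypothesis below are made up, not taken from the lecture.

```python
# Sketch: checking whether a candidate hypothesis H is consistent with a
# training set, i.e., whether it classifies every labeled example correctly.
# Examples and hypothesis are illustrative, not from the lecture.

examples = [
    # (attributes, label)
    ({"mammal": True, "eats_meat": True, "black_stripes": True}, "tiger"),
    ({"mammal": True, "eats_meat": False, "black_stripes": False}, "deer"),
    ({"mammal": True, "eats_meat": True, "black_stripes": False}, "lion"),
]

def hypothesis(attrs):
    """A hand-written candidate H approximating the target function f."""
    if attrs["eats_meat"]:
        return "tiger" if attrs["black_stripes"] else "lion"
    return "deer"

def is_consistent(h, examples):
    """H is consistent iff it classifies all training examples correctly."""
    return all(h(attrs) == label for attrs, label in examples)

print(is_consistent(hypothesis, examples))  # True for this tiny set
```

Learning as search, in this framing, is the search through candidate functions like `hypothesis` for one that makes `is_consistent` true while also generalizing to unseen examples.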
16.9 Inductive Bias
- Notice that induction is not sound.
- In picking a hypothesis, we make an educated guess; the way in which we make this guess is called a bias.
- Examples:
  - Occam's razor
  - Most specific hypothesis
  - Most general hypothesis
  - Linear function
16.10 Observing Data
- Agents may have different means of observing examples of a hypothesis.
- A batch learning algorithm is presented with a large set of data all at once and selects a single hypothesis.
- An incremental learning algorithm receives examples one at a time and continually modifies its hypothesis.
  - Batch is typically more accurate, but incremental may fit better with the agent's environment.
- An active learning agent is able to choose examples; a passive learning agent has examples presented to it by an outside source.
  - Active learning is more powerful, but may not fit with the constraints of the domain.
16.11 Learning Decision Trees
- Decision trees are data structures that provide an agent with a means of classifying examples.
- At each node in the tree, an attribute is tested.
(Figure: an animal-classification tree with tests such as "Does it fly?", "Does it eat meat?", "Does it have black stripes?", "Does it have long legs?", and "Does it have a long neck?", with leaves such as Albatross and Tiger.)
16.12 Another Example
- R&N show a decision tree for determining whether to wait at a busy restaurant.
- The problem has the following inputs/attributes:
  - Alternative nearby
  - Has a bar
  - Day of week (Fri/Sat)
  - Hungriness
  - Crowd
  - Price
  - Raining
  - Reservation
  - Type of restaurant
  - Wait estimate
- Note that not all attributes are used.
Department of Computer Science, University of San Francisco
16.13 An example training set

Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       T    F    F    T    Some  $$$    F     T    French   0-10   T
X2       T    F    F    T    Full  $      F     F    Thai     30-60  F
X3       F    T    F    F    Some  $      F     F    Burger   0-10   T
X4       T    F    T
         T    Full  $      F     F    Thai     10-30  T
X5       T    F    T    F    Full  $$$    F     T    French   >60    F
X6       F    T    F    T    Some  $$     T     T    Italian  0-10   T
X7       F    T    F    F    None  $      T     F    Burger   0-10   F
X8       F    F    F    T    Some  $$     T     T    Thai     0-10   T
X9       F    T    T    F    Full  $      T     F    Burger   >60    F
X10      T    T    T    T    Full  $$$    F     T    Italian  10-30  F
X11      F    F    F    F    None  $      F     F    Thai     0-10   F
X12      T    T    T    T    Full  $      F     F    Burger   30-60  T

(The Price and WillWait columns are garbled/missing in the source; the values above follow the standard R&N table.)
16.14 Inducing a decision tree
- An example is a specific set of attributes along with a classification (WillWait or not).
- Examples where WillWait is true are called positive examples; examples where WillWait is false are called negative examples.
- The set of labeled examples is called the training set.
- We want to have a tree that:
  - Classifies the training set correctly
  - Accurately predicts unseen examples
  - Is as small as possible (Occam's razor)
- What if we construct a tree with one leaf for each example?
16.15 Choosing useful attributes
- Intuitively, we would like to test attributes that split the training set.
- Splitting on restaurant type is not useful: positive and negative examples are still clustered together.
16.16 Constructing a decision tree
- We can construct a decision tree recursively:
  1. If all examples are positive (or all negative), we are done.
  2. If there are no examples left, then we haven't seen an instance of this classification, so we use the majority classification of the parent.
  3. If there are no attributes left to test, then we have instances with the same description but different classifications:
     - Insufficient description, or
     - Noisy data (nondeterministic domain)
     - Use a majority vote.
  4. Else, pick the best attribute to split on and recursively construct subtrees.
16.17 An example tree
(Figure: the induced tree, rooted at Patrons, shown next to the hand-constructed tree. Notice that the induced tree is simpler, and discovers a relationship between Thai food and waiting.)
Department of Computer Science
University of San Francisco
16.18 Choosing an Attribute
- The key to constructing a compact and efficient decision tree is to effectively choose attributes to test.
- Intuition: we want to choose tests that will separate our data set into positive and negative examples.
- We want to measure the amount of information provided by a test.
- This is a mathematical concept that characterizes the number of bits needed to answer a question or provide a fact.
16.19 Choosing an Attribute
- Example: let's say I'm trying to predict the flip of a coin.
  - Before the flip, one bit (heads/tails) is enough to completely answer the question.
  - After I've seen the coin, additional bits contain no information, since they don't provide new knowledge.
16.20 Information Theory
- More formally, let's say there are n possible answers v1, ..., vn to a question, and each answer has probability P(vi) of occurring.
- The information content of the answer to the question is:
  I(P(v1), ..., P(vn)) = sum_{i=1..n} -P(vi) log2 P(vi)
- For a fair coin this is -(1/2) log2(1/2) - (1/2) log2(1/2) = 1.
- Questions with one highly likely answer will have low information content: if the coin comes up heads 99/100 of the time, the information content is about 0.08.
- Information content is also sometimes called entropy.
  - This is often used in compression and data-transfer algorithms.
16.21 Using Information Theory
- For decision trees, we want to know how valuable each possible test is, or how much information it yields.
- We can estimate the probabilities of possible answers from the training set.
- Usually a single test will not be enough to completely separate positive and negative examples.
- Instead, we need to think about how much better we'll be after asking a question. This is called the information gain.
16.22 Information Gain
- If a training set has p positive examples and n negative examples, the
information in a correct answer is:
  I(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
- We want to find the attribute that will come the closest to separating the positive and negative examples.
- We begin by computing the remainder: the information still in the data after we test attribute A.
- Say attribute A can take on v possible values. This test will create v new subsets of data, labeled E1, E2, ..., Ev.
- The remainder is the sum of the information in each of these subsets:
  Remainder(A) = sum_{i=1..v} ((pi + ni)/(p + n)) * I(pi/(pi + ni), ni/(pi + ni))
16.23 Information Gain
- Information gain can then be quantified as the difference between the original information (before the test) and the new information (after the test):
  Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)
- Heuristic: always choose the attribute with the largest information gain.
- Question: what kind of search is this?
16.24 Example

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

16.25 Example
- Which attribute is the best classifier?
- S: [9+, 5-], E(S) = 0.940
- Humidity: High gives [3+, 4-], E = 0.985; Normal gives [6+, 1-], E = 0.592
  Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
- Wind: Weak gives [6+, 2-], E = 0.811; Strong gives [3+, 3-], E = 1.00
  Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.0) = 0.048
16.26 Noise
- If there are two examples that have the same attributes but different classifications, a decision tree will be unable to classify them separately. We say that this data is noisy.
- Solution: use majority rule at the parent node.
Department
of Computer Science, University of San Francisco
16.27 Overfitting
- A common problem in learning algorithms occurs when there are random or anomalous patterns in the data.
  - For example, in the tennis problem, by chance it might turn out that we always play on Tuesdays.
- The phenomenon of learning quirks in the data is called overfitting.
- In decision trees, overfitting is dealt with through pruning: once the tree is generated, we evaluate the significance of each node.
16.28 Pruning
- Assume that the test provides no information (the null hypothesis).
- Does the data in the children look significantly different from this assumption? Use a chi-square test.
- If not, the node is removed and its examples are moved up to the parent.

Artificial Intelligence Programming: Markov Decision Processes
Chris Brooks, Department of Computer Science, University of San Francisco

Expected Utility
- Recall that the expected utility of an action is the utility of each possible outcome, weighted by the probability of that outcome occurring.
- More formally, from state s an agent may take actions a1, a2, ..., an.
- Each action ai can lead to states si1, si2, ..., sim with probabilities pi1, pi2, ..., pim:
  EU(ai) = sum_j pij * U(sij)
- We call the set of probabilities and associated states the state transition model.
- The agent should choose the action ai that maximizes EU.
23.2 Making Sequential Decisions
- Previously, we've talked about:
  - Making one-shot decisions in a deterministic environment.
  - Making sequential decisions in a deterministic environment: search, inference, planning.
  - Making one-shot decisions in a stochastic environment: probability and belief networks, expected utility.
- What about sequential decisions in a stochastic environment?
23.5 Markovian
environments
- We can extend this idea to sequential environments.
- Problem: how do we determine transition probabilities?
  - The probability of reaching state s' given action a might depend on previous actions that were taken.
  - Reasoning about long chains of probabilities can be complex and expensive.
- The Markov assumption says that state transition probabilities depend only on a finite number of parents.
- Simplest: a first-order Markov process, where state transition probabilities depend only on the previous state.
- This is what we'll focus on.
23.3 Sequential Decisions
- We've thought a little bit about this in terms of value of information.
- We can model this as a state-space problem.
- We can even use a minimax-style approach to determine the optimal actions to take.
23.6 Stationary Distributions
- We'll also assume a stationary distribution.
- This says that the probability of reaching a state s' given action a from state s with history H does not change. Different histories may produce different probabilities, but given identical histories, the state transitions will be the same.
- We'll also assume that the utility of a state does not change throughout the course of the problem.
- In other words, our model of the world does not change while we are solving the problem.
23.7 Solving sequential problems
- If we have to solve a sequential problem, the total utility will depend on a sequence of states s1, s2, ..., sn.
- Let's assign each state a reward R(si). The agent wants to maximize the sum of rewards.
- We call this formulation a Markov decision process (MDP). Formally, it has:
  - A discrete set of states and actions.
  - A transition model T(s, a, s') that indicates the probability of reaching state s' from s when taking action a.
  - A reward function R(s).
MDP Solutions
- Things to note:
- We've wrapped the goal formulation into the problem. Different goals will require different
policies.
- We are assuming a great deal of correct knowledge about the world: state transition models and reward functions.
- We'll touch on how to learn these without a model.
23.8 Example: grid problem
(Figure: the 4x3 grid world, with terminal states +1 and -1 and a START square.)
- The agent moves in the intended direction with probability 0.8, and at a right angle with probability 0.2 (0.1 each way).
- What should an agent do at each state to maximize reward?
Comparing policies
- We compare policies by the histories they produce. The policy with the highest expected utility is the optimal policy.
- Once an optimal policy is found, the agent can just look up the best action for any state.
MDP Solutions
- Since the environment is stochastic, a solution will not be an action sequence.
- Instead, we must specify what an agent should do in any reachable state.
- We call this specification a policy: "If you're below the goal, move up. If you're in the leftmost column, move right."
- We denote a policy with pi, and pi(s) indicates the policy's action for state s.
Example: grid problem
- As the cost of being in a nonterminal state changes, so does the optimal policy:
  - Very high cost: the agent tries to exit immediately, even through the bad exit.
  - Middle ground: the agent tries to avoid the bad exit.
  - Positive reward for nonterminals: the agent doesn't try to exit at all.
More on reward functions
- In solving an MDP, an agent must consider the value of future rewards.
- There are different types of problems to consider:
  - Horizon: does the world go on forever?
  - Finite horizon: after N actions, the world stops and no more reward can be earned.
  - Infinite horizon: the world goes on indefinitely, or we don't know when it stops.
  - Infinite horizon is simpler to deal with, as policies don't change over time.
- We also need to think about how to value future reward: $100 is worth more to me today than in a year.
- We model this by discounting future rewards: gamma is a discount factor, and gamma < 1.
- Discounting lets us deal sensibly with infinite-horizon problems; otherwise, EUs would approach infinity. Expected utilities will be finite if rewards are finite and bounded.
- The utility of a state sequence [s0, s1, s2, s3, ...] is R(s0) +
gamma*R(s1) + gamma^2*R(s2) + gamma^3*R(s3) + ...
- If gamma is large, we value future states; if gamma is low, we focus on near-term reward.
- In monetary terms, a discount factor of gamma is equivalent to an interest rate of (1/gamma) - 1.
- We can now describe the optimal policy pi* as:
  pi* = argmax_pi E[ sum_t gamma^t R(s_t) | pi ]
Value Iteration
- How do we find an optimal policy?
- We'll begin by calculating the expected utility of each state, and then selecting actions that maximize expected utility.
Utilities of States
- In a sequential problem, the utility of a state is the expected utility of all the state sequences that follow from it.
- This depends on the policy pi being executed.
- Essentially, U(s) is the expected utility of executing an optimal policy from state s.
Utilities of States
- The utility of a state is the immediate reward for that state, plus the expected discounted utility of the next state, assuming that the agent chooses the optimal action:
  U(s) = R(s) + gamma * max_a sum_{s'} T(s, a, s') U(s')
- This is called the Bellman equation.
- Example (utilities in the 4x3 grid):
  Row 3: 0.812  0.868  0.918  +1
  Row 2: 0.762  (wall) 0.660  -1
  Row 1: 0.705  0.655  0.611  0.388
- U(1,1) = -0.04 + gamma * max{ 0.8 U(1,2) + 0.1 U(2,1) + 0.1 U(1,1),   (Up)
                                0.9 U(1,1) + 0.1 U(1,2),                (Left)
                                0.9 U(1,1) + 0.1 U(2,1),                (Down)
                                0.8 U(2,1) + 0.1 U(1,2) + 0.1 U(1,1) }  (Right)
- Notice that utilities are highest for states close to the +1 exit.
Dynamic Programming
- The Bellman equation is the basis of dynamic programming.
- Since state utilities are defined in terms of other state utilities, in an acyclic transition graph you can solve these recursively, working backward from the final state to the initial states.
- You can't do this directly for transition graphs with loops; there is no closed-form solution.
Value Iteration
- Instead, we can use an iterative approach:
  - Give each state a random initial utility.
  - Calculate the new left-hand side for a state based on its neighbors' values.
  - Propagate this to update the right-hand side for other states.
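The iterative approach above can be sketched in a few lines of Python. The two-state MDP here (states, actions, transition probabilities, and rewards) is invented purely for illustration; it is not the lecture's grid world.

```python
# Sketch: value iteration on a tiny, made-up two-state MDP.
# T maps (state, action) to a list of (probability, next_state) pairs.

GAMMA = 0.9
STATES = ["a", "b"]
ACTIONS = ["stay", "go"]
R = {"a": 0.0, "b": 1.0}
T = {
    ("a", "stay"): [(1.0, "a")],
    ("a", "go"):   [(0.8, "b"), (0.2, "a")],
    ("b", "stay"): [(1.0, "b")],
    ("b", "go"):   [(1.0, "a")],
}

def value_iteration(eps=1e-6):
    U = {s: 0.0 for s in STATES}  # arbitrary initial utilities
    while True:
        new_U = {}
        for s in STATES:
            # Bellman update: U(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U(s')
            new_U[s] = R[s] + GAMMA * max(
                sum(p * U[s2] for p, s2 in T[(s, a)]) for a in ACTIONS
            )
        if max(abs(new_U[s] - U[s]) for s in STATES) < eps:
            return new_U
        U = new_U

U = value_iteration()
print(U["b"] > U["a"])  # the rewarding state ends up with the higher utility
```

In this toy model the fixed point is U(b) = 1/(1 - 0.9) = 10 (the agent stays in the rewarding state), illustrating how repeated Bellman updates converge rather than being solved in closed form.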
Update rule:
  U_{i+1}(s) <- R(s) + gamma * max_a sum_{s'} T(s, a, s') U_i(s')
- This is guaranteed to converge to the solutions of the Bellman equations.
Discussion
- Strengths of value iteration:
  - Guaranteed to converge to the correct solution.
  - Simple iterative algorithm.
- Weaknesses:
  - Can be slow to converge.
  - We don't really need full utility information, just what to do at each state.
Policy iteration
- Policy iteration helps address these weaknesses.
- It searches directly for optimal policies, rather than state utilities.
- Same idea: iteratively update policies.
- Two steps: given a policy, compute the utilities for each state; then compute a new policy based on these new utilities.
Policy iteration algorithm
  pi <- a random policy, indexed by state
  do:
    evaluate the utility of each state under pi
    for each s in states:
      pi(s) <- the action that maximizes expected utility for that state
  while some action changed
Learning a Policy
- So far we've assumed a great deal of knowledge: in particular, that a model of the world is known (the state transition model and the reward function).
- What if we don't have a model? All we know is that there are a set of states and a set of actions.
- We still want to learn an optimal policy.
Q-learning
- Learning a policy directly is difficult: our data is not of the form <state, action>. Instead, it's of the form s1, s2, s3, ...
- Instead, we'll learn a value function Q(s, a) that estimates the utility of taking action a in state s.
Learning the Q function
- To learn Q, we need to be able to estimate the value of taking an action in a state, even though our rewards are spread out over time. We can do this iteratively.
- Notice that U(s) = max_a Q(s, a).
- We can then rewrite our equation for Q as:
  Q(s, a) = R(s) + gamma * sum_{s'} T(s, a, s') max_{a'} Q(s', a')
Learning the Q function
- Let's denote our estimate of Q(s, a) as Q-hat(s, a).
- We'll keep a table listing each state-action pair and its estimated Q-value.
- The agent observes its state s, chooses an action a, then observes the reward r = R(s, a) that it receives and the new state s'. It then updates the Q-table according to the following formula:
  Q-hat(s, a) <- r + gamma * max_{a'} Q-hat(s', a')
- Since we don't know the transition function, it's also hard to
then observes the reward 739 Rsa that it receives and the new state s It then updates the Qtable according to the following formula mm TWaza0sa R function it39s also hard to 2327 Orlearriirig More precisely Qsa will repr state 5 then acting optimally aft ma Rea WazaZSiTsas Us The optimal policy is then to take the action with the highest Q value in each state esent the value oftaking a in er that t lfthe agent can learn Qsa it can take optimal actions even without knowing either the reward function or the transition function 2330 Learning the 0 function The agent uses the estimate on for s to estimate Q for s 39 Notice 39t need any knowledge of or the t ansition function to execute this c stateaction pairs that the agent doesn Qlearning is guaranteed to converge as long as Rewards are bounded c The agent selects s tateaction pairs in such a way that it each infinitely often This means that an agent must have a nonzero probability of selecting ea h a in each s as the sequence of approaches infinity Arti cial Intelligence Programming Agents and Environmenb Chns Evuuks Department at Computer Smence Umvevsily at San Fianmscu Intelligent Agents 1 The idea of developing intelligent agents has been a unifying one for AI R Previously subdisciplines were fairly balkanized a Learning knowledge representation search vision etc a Agent programs that may require several ofthese abilities Overview a What makes an agent a Defining an environment a Types ofagent programs What is an agent Q There are lots of potential definitions R agent is anything that can be viewed as perceiving its environment through sensors and acting on that environment through actuators a Woolridge An agent is a computer system that is situated in an environment and is capable of autonomous action Overview 0 What makes an agent a a Qualities of an agent a Autonomy a Adaptation a Goaldirected behavior 9 Has beliefs and intentions a Proactive a Situated within an environment Autonomy o Autonomy is a quality 
often attributed to agents.
- An autonomous agent is able to rely on its percepts and past experience to make decisions, rather than asking a human for help.
- This is a thorny area: most agents will not have complete autonomy.
- When might we not want an agent to have complete autonomy?
- Challenge: designing an agent that can reason about its own autonomy and know when to ask for help.
Agents and Environments
- One shift from other courses: we'll think explicitly about an agent's environment and how that affects execution.
- Percepts: information delivered to an agent's sensors (light, sound, EM waves, signals).
- Sensors: an agent's mechanisms for gathering data about its environment (eyes, ears, photoelectric cells).
- Actuators: an agent's mechanisms for affecting its environment (wheels, arms, radios, lights, etc.).
- Actions: actual changes to the environment (running, rolling).
Agent-oriented Programming
- We can also think of agents as a programming paradigm: the next logical step after objects.
  - "Objects do it for free; agents do it for money."
  - Objects are receivers of actions; agents are actors.
- It's less useful to think of "agent" as an objective label than as a subjective description.
- Agency is a useful abstraction for us as programmers: it allows us to think about a program at a higher level.
- Treat something as an agent if that helps to understand, predict, or explain its behavior (e.g., thermostats as agents).
Agent Programs
- We can describe our agent's behavior as a function F:
  action = F(current percept, percept history)
- F maps a percept sequence to an action.
- Actually implementing this function is the work of an agent program. That's what we'll spend most of our time on.
Usefulness of the agent metaphor
- Why bother with all this? We already know how to
write programs.
- Agents tend to be open-ended programs; it's difficult to specify all cases in advance.
- Instead, write programs that can work with a wide set of cases.
- Separate out the knowledge from the reasoning mechanism.
- It's helpful to talk about agents as if they were intelligent: "The robot wants to find the power supply"; "The server believes that the client has reset."
- This assigning of mental states to programs is called the intentional stance.

Example: Vacuum-cleaner World
- Robotic vacuum cleaners are actually on the market ($150 at Amazon).
- A reflex agent, mostly.

Example: Vacuum-cleaner World
- Let's start with a very simple approximation: two rooms, A and B. Each room can be either clean or dirty.
- This is the agent's environment.
- Sensors: dirt sensor, location.
- Actuators: vacuum, wheels.
- Percepts: Clean, Dirty.
- Actions: move left, move right, suck, do nothing.

Rationality
- Notice that this is a specification of an outcome, rather than of how an agent should behave.
- A rational action is one that tries to maximize an agent's performance measure, given its percepts and actions.
- R & N, on rational agents: "For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given whatever built-in knowledge the agent has."

Example: Vacuum-cleaner World
- In this simple world, we could list all the possible percept sequences and associated actions.
- This is known as a table-based (or lookup) agent.
- Question: how do we fill in the best action for each percept sequence?
- Great for simple worlds, but doesn't scale. We need a more compact representation for this table.

Rationality
- Expected vs. actual: we don't require that our agent be able to predict the future or predict unlikely events.
- Information gathering might also be a rational action; crossing the street without looking is irrational.
- Rational agents must be able to learn, except in
very simple, well-understood environments.
- Learning is defined as improving an agent's performance. This could mean reducing uncertainty or taking observations into account.

Rationality
- Roughly, rationality means "doing the right thing."
- More precision is needed: what is "the right thing"?
- We need a definition of success. Begin with a performance measure: a condition or state of the world we'd like the agent to achieve.
- "Both rooms are clean," perhaps with more criteria, such as minimizing time, power consumed, or number of actions taken.
- We might prefer a scalar measure, or a boolean condition.

Overview
- Defining an environment

Environments
- A criterion for being an agent is existing in an environment.
- Not necessarily physical; could be a software environment.
- R & N refer to the task environment, which consists of:
  - Performance measure
  - Environment
  - Actuators available to the agent
  - Sensors available to the agent

Deterministic / stochastic
- We can think of the world as going through state transitions:

    current state x agent actions -> new state

- If the state transition is unique, the world is deterministic.
- Does an action always produce the same result?
- Chess-playing: deterministic. Vacuum world: deterministic. Driving a car: stochastic.

Characteristics of the Environment
- Observability
- Deterministic vs. stochastic
- Episodic vs. sequential
- Static vs. dynamic
- Discrete vs. continuous
- Single-agent vs. multi-agent

Deterministic / stochastic
- Note: we're avoiding splitting hairs with quantum-mechanical questions here; it's possible that the world could be deterministic but appear stochastic due to its complexity.
- How does the world appear to the agent?

Observability
- The environment is fully observable if the agent's sensors always give it complete information about the relevant parts of the environment.
- Is there anything the agent needs to know that it cannot sense?
- Chess-playing: fully observable. (What about the other player?)
- Vacuum cleaner: partially observable; can't see
if there's dirt in the adjacent square.

Episodic vs. Sequential
- Episodic: each action is independent. The agent perceives, decides, acts, and starts again; the next decision does not depend on previous states.
- A spam-filtering agent is episodic.
- If the agent must take a series of actions to accomplish a task or achieve a goal, the environment is sequential; future decisions must be considered.
- Driving a car is sequential.

Static vs. Dynamic
- A static environment "holds still" while the agent is deciding on an action; the agent is not under time pressure to come to a decision.
- Spam-filtering is static. Chess-playing is static.
- A dynamic environment changes while the agent is deciding what action to take.
- Harder: the agent must act "quickly enough." Driving a car is dynamic.

Discrete vs. Continuous
- A continuous environment has continuously-changing variables.
- Steering angles in a car-driving environment; real-valued sensor readings.
- (We can split hairs about precision here; the point is whether or not there's a distinguishable change between two values.)
- Time is the element we'll often be concerned with.

Static vs. Dynamic
- Semi-dynamic: the environment doesn't change, but the performance measure changes over time.
- Taking a timed test; playing chess with a clock. There is still pressure to act quickly.

Single-agent vs. Multi-agent
- Single-agent: our agent is acting on its own; the world may still be stochastic. Spam-filtering is single-agent.
- Multi-agent: the actions/goals/strategies of other agents must be taken into account. Chess-playing; bidding in an auction.
- Issues: even though a world may have other agents, we may choose to treat it as single-agent and stochastic for complexity reasons (e.g., an agent controlling traffic signals).
- Cooperative vs. competitive.

Discrete vs. Continuous
- We can talk about discrete vs. continuous in terms of the agent's percepts, actions, or the possible states of the environment.
- If the possible values for any of these are a discrete set, then the environment is discrete with respect to that characteristic.
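The environment characteristics above can be recorded in a small data structure. This sketch follows the classifications the notes give for chess, driving, and spam filtering; the fields I fill in that the notes leave unstated (e.g., observability for taxi driving, determinism for spam filtering) are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskEnvironment:
    """The six environment characteristics listed in the notes."""
    observable: str      # "fully" or "partially"
    deterministic: bool
    episodic: bool
    static: bool
    discrete: bool
    multiagent: bool

# Chess: fully observable, deterministic, sequential, static, discrete, multi-agent.
chess = TaskEnvironment("fully", True, False, True, True, True)

# Taxi driving: stochastic, sequential, dynamic, continuous, multi-agent;
# "partially" observable is my assumption.
taxi_driving = TaskEnvironment("partially", False, False, False, False, True)

# Spam filtering: episodic, static, discrete, single-agent;
# determinism and full observability are my assumptions.
spam_filtering = TaskEnvironment("fully", True, True, True, True, False)
```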
- Discrete is not the same as finite: a spam-filtering environment is discrete even though there's a countably infinite number of possible mails.

Some examples
- Chess-playing, Monopoly-playing, slot-machine playing
- Robot getting me coffee in Harney
- Mars orbiter
- Web-crawling agent
- Conversational agent
- Medical diagnosis agent

Overview
- Types of agent programs

Table-driven agent
- Keep a dictionary that maps percept sequences to actions:

    class TableDrivenAgent(Agent):
        """This agent selects an action based on the percept sequence.
        It is practical only for tiny domains."""
        def __init__(self, table):
            # table is a dictionary of all {percept_sequence: action} pairs
            percepts = []
            def program(percept):
                percepts.append(percept)
                action = table.get(tuple(percepts))
                return action
            self.program = program

Types of agent programs
- Typically, we can't enumerate every possible percept sequence and action: too many possibilities.
- We as programmers may not know what to do for each sequence; we may not understand the problem well enough.
- Need to create a more compact representation.

Examples of table-driven agents
- Exception handlers; square-root/logarithm tables.
- Won't scale to AI environments:
  - Chess: 10^150 percept sequences, and this is an "easy" environment (deterministic, fully observable).
  - Taxi driving: 10^250,000,000,000 entries for one hour of driving.
- Also highly redundant and uninformative: many percept sequences may lead to the same action.
- The designer has no guidance as to how to fill in the table.

Types of agent programs
- Table-driven agent
- Reflex agent
- Model-based reflex agent
- Goal-based agent
- Utility-based agent
- Learning agent

Reflex agent
- Given the current percept, select an action. Ignore history.

    class ReflexVacuumAgent(Agent):
        """A reflex agent for the two-state vacuum environment."""
        def __init__(self):
            def program(percept):
                location, status = percept
                if status == 'Dirty':
                    return 'Suck'
                elif location == loc_A:
                    return 'Right'
                elif location == loc_B:
                    return 'Left'
            self.program = program

- This agent will only be rational if the best action can be chosen based only on the current percepts.

Model-based reflex agent
- A model-based reflex agent maintains an internal representation of the world; we'll refer to this as keeping state about the world.
- Actions are selected based on the model and the current percepts.

    class ModelBasedVacuumAgent(Agent):
        """An agent that keeps track of what locations are clean or dirty."""
        def __init__(self):
            model = {loc_A: None, loc_B: None}
            def program(percept):
                """Same as ReflexVacuumAgent, except if everything
                is clean, do nothing."""
                location, status = percept
                model[location] = status  # Update the model here
                if model[loc_A] == model[loc_B] == 'Clean':
                    return 'NoOp'
                elif status == 'Dirty':
                    return 'Suck'
                elif location == loc_A:
                    return 'Right'
                elif location == loc_B:
                    return 'Left'
            self.program = program

Reflex Agents
- Examples of reflex agents: thermostat agent; spam-filtering agent (sometimes).
- Insight: the table can be represented more compactly as a set of condition-action rules.
- Problems: writing rules that apply in all cases is hard, and often it's necessary to remember some of the past.

Model-based reflex agent
- Maintaining a representation of the environment is extremely useful:
  - Allows the agent to remember things.
  - Can anticipate future events.
  - Can make predictions about unseen parts of the environment.
- Still uses rules conditioned on the model and the sensors.
- Much of our time will be spent constructing and manipulating models.
- Examples of model-based agents: vacuum-cleaner agent with a map; spam-filtering agent (maybe); factory robots.

Types of models
- Attributes and values
- Probability distributions over variables
- Data structures: maps, graphs, finite state machines
- Facts about the world

Goal-based agent
- Knowing the current state of the environment is not always enough: the right action may also depend on what the agent is trying to accomplish.
- The right thing to do for a given percept sequence depends upon the agent's knowledge, its model, its current state and percepts, and what it is trying to achieve.
- Select actions that will help accomplish the agent's current goals.
- Search and planning are used to solve this problem; future lectures will look at using search to accomplish goals.
- Goal-based reasoning is very useful for sequential environments: chess-playing, taxi driving, spaceship piloting.

Utility-based agent
- There may be many action sequences that achieve a goal. Which one to pick?
- Utility is used to compare the relative desirability of action sequences.
- Maps states to real numbers; can be an estimate of cost, time, or the relative value of different outcomes.
- Utilities are very useful in dealing with partially observable or stochastic environments; goals may not be enough in high-complexity environments.
- Utilities are sometimes a controversial issue in AI.
  - Example: a taxi-driving agent. What is the utility of driving off the Bay Bridge?
  - Requires the designer to explicitly evaluate outcomes; qualitative vs. quantitative.
- Utilities are very useful in domains with probability and uncertainty: online trading, exploration in uncertain environments, gambling.
- Assumption: outcomes can be linearly ordered, with ties, on a consistent scale.

Artificial Intelligence Programming: Ontologies
Chris Brooks, Department of Computer Science, University of San Francisco

Knowledge Engineering
- The knowledge engineering process typically consists of the following steps (continued):
  3. Select a vocabulary of classes, functions, predicates, and constants. This vocabulary is called an ontology.
  4. Encode general knowledge about the domain: formally represent the knowledge gathered in step 2. This may require revisiting step 3.
  5. Encode specific queries or problems to be solved. This may require returning to steps 3 and 4.

Knowledge Engineering
- Logic provides one answer to the question of how to say things.
- It does not tell us what to say. Typically, this is the hard part.
- We want to give our agent enough knowledge to solve all of the problems we are interested in.
- The process of building a knowledge base is referred to as knowledge engineering.
- Many of the same principles apply as in software engineering.
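The reflex vacuum agent from these notes can be exercised in a tiny two-room simulation. The `Environment` machinery here is a self-contained sketch invented for illustration (it is not the course's or AIMA's `Agent`/`Environment` classes), with the reflex rules restated inline:

```python
loc_A, loc_B = 'A', 'B'

def reflex_vacuum_program(percept):
    """Reflex rules from the notes: suck if dirty, else move to the other room."""
    location, status = percept
    if status == 'Dirty':
        return 'Suck'
    elif location == loc_A:
        return 'Right'
    else:
        return 'Left'

def run(program, status, location=loc_A, steps=6):
    """Drive the agent program in a two-room world; return the action trace."""
    trace = []
    for _ in range(steps):
        action = program((location, status[location]))
        trace.append(action)
        if action == 'Suck':
            status[location] = 'Clean'
        elif action == 'Right':
            location = loc_B
        elif action == 'Left':
            location = loc_A
    return trace, status

trace, status = run(reflex_vacuum_program, {loc_A: 'Dirty', loc_B: 'Dirty'})
# Both rooms end up clean; afterward the reflex agent bounces between
# rooms forever, which is what motivates the model-based version's NoOp.
```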
Ontologies
- An ontology is a vocabulary that describes all of the objects of interest in the domain and the relations between them: all of the relevant knowledge about a problem we are interested in.
- Ontologies allow knowledge about a domain to be shared between agents, including humans.
- This can allow knowledge to be reused more easily.
- Allows an agent to perform inference with its current knowledge.

Knowledge Engineering
- The knowledge engineering process typically consists of the following steps:
  1. Determine the queries and types of facts that are available/allowed. For example, will the agent need to select actions or draw conclusions, or just answer questions from a human user? Will we ask existential queries, or just universal queries?
  2. Gather the relevant knowledge: interview experts and find out how the domain works.

Vocabulary
- An ontology consists of:
  - A set of concepts, or classes (e.g., Guitar, Guitarist, Musician).
  - Instances of these classes (e.g., GeorgeHarrison's guitar).
  - Relations between classes of objects, sometimes called slots or properties or roles (e.g., performs(Beatles, WhiteAlbum); containedOn(RockyRaccoon, WhiteAlbum)).
  - Restrictions on properties, sometimes called constraints (e.g., a song contained on an album is performed by whoever performs the album).

Ontologies vs. OO design
- Classes are a primary focus of ontology design.
- In many ways this looks like object-oriented design: we have classes and subclasses, and properties of classes that look like data members.
- However, properties have richer semantics than data members:
  - A property may attach to several classes at once, and have subproperties.
  - We can specify constraints on the values in a property's range.
  - Properties can exist without being assigned to a class.

Types of OWL
- OWL Lite
  - Does not allow cardinality (except 0 and 1).
  - Primarily classes and simple constraints/rules.
  - Efficient inference mechanisms exist.
- OWL DL
  - Complete and decidable, but more expressive.
  - Inference mechanisms exist.
  - This is what we'll be using.
- OWL Full
  - Allows metaclasses and richer reasoning
about classes.
  - No inference mechanisms exist for all of OWL Full.

Ontology tools
- A popular means for assisting in knowledge engineering is the use of GUI-based tools; details of logic and inference are hidden from the user.
- Protege is one of the more well-known ontology development tools.

Uses of OWL
- Organization and indexing of journals and scientific literature.
- The Human Genome Project, and bioinformatics in general.
- Data integration between DBs in an enterprise.
- Automated Web Service description and composition.
- ...and others.

OWL
- Web Ontology Language (OWL) is an XML-based language for representing ontologies, built on top of RDF.
- Encodes a description logic: a decidable subset of first-order logic.
- We'll be using a graphical tool (Protege) that uses OWL as its underlying representation.

The Pizza Tutorial
- The pizza tutorial provides a nice tour of the issues involved in creating an ontology in Protege with OWL.
- We begin by asking competency questions; these are questions we'd like our KB to be able to answer:
  - What toppings are on a Margherita pizza?
  - Is the Americana pizza vegetarian?
  - What can I put on my pizza to make it spicy?
  - What pizzas have Tomato on them?

Classes
- As with OO design, we can work top-down, bottom-up, or in a combination of the two.
- Let's start by making a Pizza class. We then make PizzaBase and PizzaTopping, and make these disjoint.

Domains and ranges
- We can specify that properties can only apply to certain types of objects:
  - Set the range of hasTopping to be PizzaTopping.
  - Set the domain of hasTopping to be Pizza.
  - Same for hasBase.

Classes
- Use the wizard to add subclasses of PizzaBase: DeepDishBase, ThinAndCrispyBase.
- Use the wizard to subclass PizzaTopping: MeatTopping, VeggieTopping, CheeseTopping, SeafoodTopping. Subclass these.

Property Restrictions
- The primary way that we write rules in Protege is through the use of property restrictions.
- We can use an existential restriction to specify that a Pizza must have a PizzaBase.

Properties
- Now we can start to create properties.
- Add a hasIngredient property. In Protege, properties can have subproperties: add subproperties hasTopping and hasBase.
- We can also add inverse properties.
- Properties can also be functional, inverse functional, symmetric, and transitive. Make hasBase functional.

Types of pizzas
- Add a subclass of Pizza called NamedPizza.
- Add a subclass of this, MargheritaPizza, with an existential restriction: has Tomato and Mozzarella toppings.
- Add AmericanaPizza: clone Margherita, add Pepperoni.
- Add AmericanaHotPizza: clone Americana, add Jalapenos.
- Add SohoPizza: Margherita plus Olives and Parmesan.

Reasoners
- As your ontology gets large, it can be difficult to maintain. Do you have the hierarchy right? Are there contradictions? Are there other relations that are entailed?
- A big advantage of using this sort of tool is the ability to perform this inference automatically.
- Tools for doing this are known as reasoners; we'll be using a particular reasoner known as RACER.

Universal restrictions
- We specified that a Margherita pizza must have Tomato and Mozzarella, but not that those were the only things it could have.
- To do this, we use universal restrictions.
- Create a VeggiePizza; under "necessary and sufficient," indicate that all toppings must be veggie or cheese. Run RACER again.

Reasoners
- We can use the reasoner to check consistency, classify the taxonomy, and infer class membership.
- Let's add an inconsistent class: InconsistentTopping, a subclass of Cheese with a restriction that it must be a Veggie. Run the reasoner.

Open World Reasoning
- Why are Margherita and Soho not subclassed as Veggie?
- Protege and OWL use open-world reasoning: we can't assume something is false just because it's not stated to be true. Margherita could have other toppings.
- Add a closure axiom to MargheritaPizza; use the widget to add closure axioms for Soho and Americana.

Necessary and Sufficient Conditions
- So far we've only specified necessary conditions. This is not enough to show that something must be a member of a particular class.
- Create a CheesePizza class; add a necessary and sufficient restriction: has a Cheese topping.
- This is now a definition: a CheesyPizza is a Pizza with a Cheese topping. Run RACER again.

Value Partitions
- Use the wizard to create a SpicinessValuePartition; subclass this to get Hot, Mild, Medium.
- Add a hasSpiciness property; use the property-restrictions wizard.

Artificial Intelligence Programming: Decision Trees
Chris Brooks, Department of Computer Science, University of San Francisco

Learning
- What does it mean for an agent to learn?
  - The agent acquires new knowledge.
  - The agent changes its behavior.
  - The agent improves its performance measure on a given task.

Learning Agents
- Recall that at the beginning of the semester we talked about learning agents.

[Figure: learning-agent architecture. A critic, judging against a performance standard, sends feedback to the learning element; the learning element makes changes to the performance element and sets learning goals for a problem generator; sensors and actuators connect the agent to its environment.]

- A learning agent has a performance element and a learning element.
- The performance element is what an agent uses to decide what to do. This is what we've studied up to now.
- The learning element is what allows the agent to modify the performance element. This might mean adding or changing rules or facts, modifying a heuristic, or changing a successor function.
- In order to modify its behavior, an agent needs information telling it how well it is performing. This information is called feedback.

Types of Learning Tasks
- There are essentially three categories of learning tasks, each of which provides different feedback.
- Supervised learning:
In this case, an external source (often called a teacher) provides the agent with labeled examples; the agent sees specific actions/cases along with their classification.
- Unsupervised learning: there is no teacher to provide examples. The agent typically tries to find a concept or pattern in data; statistical methods such as clustering fall into this category.
- Reinforcement learning: a particular version of learning in which the agent only receives a reward for taking an action. It may not know how optimal a reward is, and will not know the best action to take.

Supervised Learning
- Supervised learning is one of the most common forms of learning.
- The agent is presented with a set of labeled data and must use this data to determine more general rules. Examples:
  - A list of patients and characteristics: what factors are correlated with cancer?
  - What factors make someone a credit risk?
  - What are the best questions for classifying animals?
  - Whose face is in this picture?
- This process of learning general rules from specific facts is called induction.

Defining the Learning Problem
- We can phrase the learning problem as that of estimating a function f that tells us how to classify a set of inputs.
- An example is a set of inputs x and the corresponding f(x):

    <(Mammal, EatsMeat, BlackStripes, Tawny), Tiger>

- We can define the learning task as follows: given a collection of examples of f, find a function H that approximates f for our examples. H is called a hypothesis.

Induction
- We would like H to generalize: this means that H will correctly classify unseen examples.
- If the hypothesis can correctly classify all of the training examples, we call it a consistent hypothesis.
- Goal: find a
consistent hypothesis that also performs well on unseen examples.
- We can think of learning as search through a space of hypotheses.

Inductive Bias
- Notice that induction is not sound. In picking a hypothesis, we make an educated guess; the way in which we make this guess is called a bias.
- All learning algorithms have a bias; identifying it can help you understand the sorts of errors a learner will make.
- Examples: Occam's razor; most specific hypothesis; most general hypothesis; linear function.

Observing Data
- Agents may have different means of observing examples of a hypothesis.
- A batch learning algorithm is presented with a large set of data all at once and selects a single hypothesis.
- An incremental learning algorithm receives examples one at a time and continually modifies its hypothesis.
- Batch is typically more accurate, but incremental may fit better with the agent's environment.
- An active learning agent is able to choose examples; a passive learning agent has examples presented to it by an outside source.
- Active learning is more powerful, but may not fit with the constraints of the domain.

Online vs. Offline
- An offline learning algorithm is able to separate learning from execution: learning and performance are separate. Batch learning is easier; computational complexity is less of a factor.
- An online learning algorithm allows an agent to mix learning and execution: the agent takes actions, receives feedback, and updates its performance component. Incremental learning makes more sense; fast algorithms are a requirement.
- A note on algorithm speed: we will worry about both training time (how long it takes to construct a hypothesis) and classification time (how long it takes to classify a new instance).

Learning Decision Trees
- Decision trees are data structures that provide an agent
with a means of classifying examples. At each node in the tree, an attribute is tested.

[Figure: an example decision tree for classifying animals, testing attributes such as "Does it fly?", "Does it eat meat?", "Does it have long legs?", "Does it have black stripes?", and "Does it have a long neck?", with leaves such as Albatross and Tiger.]

Another Example
- R & N show a decision tree for determining whether to wait at a busy restaurant.
- The problem has the following inputs/attributes: alternative nearby, has a bar, day of week, hungriness, crowd, price, raining, reservation, type of restaurant, wait estimate.
- Note that not all attributes are used.

An example training set

    Ex   Alt Bar Fri Hun Pat   Rain Res Type    Est    Wait
    X1   T   F   F   T   Some  F    T   French  0-10   T
    X2   T   F   F   T   Full  F    F   Thai    30-60  F
    X3   F   T   F   F   Some  F    F   Burger  0-10   T
    X4   T   F   T   T   Full  F    F   Thai    10-30  T
    X5   T   F   T   F   Full  F    T   French  >60    F
    X6   F   T   F   T   Some  T    T   Italian 0-10   T
    X7   F   T   F   F   None  T    F   Burger  0-10   F
    X8   F   F   F   T   Some  T    T   Thai    0-10   T
    X9   F   T   T   F   Full  T    F   Burger  >60    F
    X10  T   T   T   T   Full  F    T   Italian 10-30  F
    X11  F   F   F   F   None  F    F   Thai    0-10   F
    X12  T   T   T   T   Full  F    F   Burger  30-60  T

Inducing a decision tree
- An example is a specific set of attributes along with a classification (Wait or not).
- Examples where Wait is true are called positive examples; examples where Wait is false are called negative examples.
- The set of labeled examples is called the training set.
- We want to have a tree that: classifies the training set correctly; accurately predicts unseen examples; and is as small as possible (Occam's razor).
- What if we construct a tree with one leaf for each example?

Choosing useful attributes
- Intuitively, we would like to test attributes that split the training set.
- Splitting on restaurant type is not useful: positive and negative examples
are still clustered together, with an equal number of yes and no answers in each subset.

Constructing a decision tree
- We can construct a decision tree recursively:
  1. If all examples are positive (or all negative), we are done.
  2. If there are no examples left, then we haven't seen an instance of this classification, so we use the majority classification of the parent.
  3. If there are no attributes left to test, then we have instances with the same description but different classifications (insufficient description, or noisy data in a nondeterministic domain): use a majority vote.
  4. Else, pick the best attribute to split on and recursively construct subtrees.

An example tree
[Figure: the induced tree, with Patrons tested at the root, next to the hand-constructed tree. Notice that the induced tree is simpler, and discovers a relationship between Thai food and waiting.]

Choosing an Attribute
- The key to constructing a compact and efficient decision tree is to effectively choose attributes to test.
- Intuition: we want to choose tests that will separate our data set into positive and negative examples.
- We want to measure the amount of information provided by a test. This is a mathematical concept that characterizes the number of bits needed to answer a question or provide a fact.
- Example: let's say I'm trying to predict the flip of a coin. Before the flip, one bit (heads/tails) is enough to completely answer the question. After I've seen the coin, additional bits contain no information, since they don't provide new knowledge.

Information Theory
- More formally, let's say there are n possible answers v1, v2, ..., vn to a question, and each answer has probability P(vi) of occurring.
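The four-step recursive construction just listed can be sketched as follows. This is an illustration rather than the course's code: examples are dicts with a boolean `'label'`, and `choose_attribute` is left pluggable (the information-gain heuristic developed in these notes is the usual choice):

```python
from collections import Counter

def majority(examples):
    """Majority classification, used in steps 2 and 3 above."""
    return Counter(e['label'] for e in examples).most_common(1)[0][0]

def build_tree(examples, attributes, parent_examples, choose_attribute):
    labels = {e['label'] for e in examples}
    if len(labels) == 1:                    # 1. all positive or all negative
        return labels.pop()
    if not examples:                        # 2. no examples left
        return majority(parent_examples)
    if not attributes:                      # 3. no attributes left to test
        return majority(examples)
    best = choose_attribute(examples, attributes)   # 4. split and recurse
    tree = {'test': best, 'branches': {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree['branches'][value] = build_tree(subset, rest, examples,
                                             choose_attribute)
    return tree

# Tiny invented data set: wait only when the restaurant has some patrons.
tree = build_tree(
    [{'Patrons': 'Some', 'label': True},
     {'Patrons': 'Full', 'label': False},
     {'Patrons': 'None', 'label': False}],
    ['Patrons'], [], lambda ex, attrs: attrs[0])
```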
The information content of the answer to the question is:

    I = - sum_i P(vi) * log2 P(vi)

- For a coin, this is -(1/2) log2(1/2) - (1/2) log2(1/2) = 1.
- Questions with one highly likely answer will have low information content: if the coin comes up heads 99/100 of the time, I is about 0.08.
- Information content is also sometimes called entropy. This is often used in compression and data-transfer algorithms.

Using Information Theory
- For decision trees, we want to know how valuable each possible test is, or how much information it yields.
- We can estimate the probabilities of possible answers from the training set.
- Usually a single test will not be enough to completely separate positive and negative examples. Instead, we need to think about how much better we'll be after asking a question. This is called the information gain.

Information Gain
- If a training set has p positive examples and n negative examples, the information in a correct answer is:

    I(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

- We want to find the attribute that will come the closest to separating the positive and negative examples.
- We begin by computing the remainder: this is the information still in the data after we test attribute A. Say attribute A can take on v possible values; this test will create v new subsets of data, labeled E1, E2, ..., Ev. The remainder is the sum of the information in each of these subsets:

    Remainder(A) = sum_{i=1..v} ((pi + ni) / (p + n)) * I(pi/(pi+ni), ni/(pi+ni))

- Information gain can then be quantified as the difference between the original information (before the test) and the new information (after the test):

    Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)

- Heuristic: always choose the attribute with the largest information gain.
- Question: what kind of search is this?

Example

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    D1   Sunny     Hot          High      Weak    No
    D2   Sunny     Hot          High      Strong  No
    D3   Overcast  Hot          High      Weak    Yes
    D4   Rain      Mild         High      Weak    Yes
    D5   Rain      Cool         Normal    Weak    Yes
    D6   Rain      Cool         Normal    Strong  No
    D7   Overcast  Cool         Normal    Strong  Yes
    D8   Sunny     Mild         High      Weak    No
    D9   Sunny     Cool         Normal    Weak    Yes
    D10  Rain      Mild         Normal    Weak    Yes
    D11  Sunny     Mild         Normal    Strong  Yes
    D12  Overcast  Mild         High      Strong  Yes
    D13  Overcast  Hot          Normal    Weak    Yes
    D14  Rain      Mild         High      Strong  No

Example
- Which attribute is the best classifier?
- S: [9+, 5-], E = 0.940.
- Humidity: High [3+, 4-], E = 0.985; Normal [6+, 1-], E = 0.592.

    Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151

- Wind: Weak [6+, 2-], E = 0.811; Strong [3+, 3-], E = 1.00.

    Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048

Noise
- If there are two examples that have the same attributes but different values, a decision tree will be unable to classify them separately. We say that this data is noisy.
- Solution: use majority rule at the parent node.

Overfitting
- A common problem in learning algorithms occurs when there are random or anomalous patterns in the data. For example, in the tennis problem, by chance it might turn out that we always play on Tuesdays.
- The phenomenon of learning quirks in the data is called overfitting.
- In decision trees, overfitting is dealt with through pruning: once the tree is generated, we evaluate the significance of each node.

Pruning
- Assume that the test provides no information (the null hypothesis). Does the data in the children look significantly different from this assumption? Use a chi-square test.
- If not, the node is removed and its examples are moved up to the parent.

Continuous-valued inputs
- Decision trees can also be extended to work with integer and continuous-valued inputs: discretize the range into, for example, < 70 and > 70.
- Challenge: what value yields the highest information gain? Use a hill-climbing search to find this value.
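The Humidity/Wind gain calculation above can be reproduced directly. This sketch hard-codes the positive/negative counts from the PlayTennis table rather than parsing it; note that exact arithmetic gives Gain(S, Humidity) ≈ 0.152 (the 0.151 on the slide comes from rounding the intermediate entropies to three places):

```python
import math

def entropy(p, n):
    """Information content I(p/(p+n), n/(p+n)), in bits, of a set
    with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            q = count / total
            result -= q * math.log2(q)
    return result

def gain(p, n, subsets):
    """Information gain of a test splitting (p, n) into (pi, ni) subsets."""
    remainder = sum((pi + ni) / (p + n) * entropy(pi, ni)
                    for pi, ni in subsets)
    return entropy(p, n) - remainder

# PlayTennis: 9 positive, 5 negative examples overall.
# Humidity: High -> [3+, 4-], Normal -> [6+, 1-]
# Wind:     Weak -> [6+, 2-], Strong -> [3+, 3-]
print(round(gain(9, 5, [(3, 4), (6, 1)]), 3))  # Humidity -> 0.152
print(round(gain(9, 5, [(6, 2), (3, 3)]), 3))  # Wind     -> 0.048
```

Humidity wins, matching the slide's conclusion that it is the better classifier of the two.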
Features of Decision Trees
- Work well with symbolic data.
- Use information gain to determine what questions are most effective at classifying (a greedy search).
- Produce a human-understandable hypothesis.
- Fixes are needed to deal with noise or missing data.
- Can represent all Boolean functions.

Artificial Intelligence Programming: Ontologies
Chris Brooks, Department of Computer Science, University of San Francisco

Knowledge Engineering
- Logic provides one answer to the question of how to say things.
- It does not tell us what to say. Typically, this is the hard part.
- We want to give our agent enough knowledge to solve all of the problems we are interested in.
- The process of building a knowledge base is referred to as knowledge engineering.
- Many of the same principles apply as in software engineering.

Knowledge Engineering
- The knowledge engineering process typically consists of the following steps:
  1. Determine the queries and types of facts that are available/allowed. For example, will the agent need to select actions or make conclusions, or just answer questions from a human user? Will we ask existential queries, or just universal queries?
  2. Gather the relevant knowledge: interview experts and find out how the domain works.
  3. Select a vocabulary of classes, functions, predicates, and constants. This vocabulary is called an ontology.
  4. Encode general knowledge about the domain: formally represent the knowledge gathered in step 2. This may require revisiting step 3.
  5. Encode specific queries or problems to be solved. This may require returning to steps 3 and 4.

Ontologies
- An ontology is a vocabulary that describes all of the objects of interest
in the domain and the relations between them 0 All of the relevant knowledge about a problem we are interested in Ontolgies allow knowledge about a domain to be shared between agents including humans This can allow knowledge to be reused more easily Allows an agent to perform inference with its current knowledge Department of Computer Science University of San Francisco p 41 165 Vocabulary An ontology consists of O A set of concepts or classes 0 Pr0f653390rB7 00k8 Vx Professorx gt USFEmployedx Instances of these classes 39 Relations between classes of objects sometimes called slots or properties or roles 0 SalaryBr00ks 500 NameB7 00k3 Chri3 39 Restrictions on slots sometimes called facets O Vy Professo x Salaryxy gt y lt 1000 Department of Computer Science University of San Francisco p 51 166 Ontologies vs 00 design 39 Classes are a primary focus of ontology design 39 In many ways this looks like objectoriented design 39 We have classes and subclasses and properties of classes that look like data members However slots have richer semantics than data members 39 A slot may attach to several classes at once 39 We can specify constraints on the values in a slotaAZs range Slots can exist without being assigned to a class Department of Computer Science University of San Francisco p 61 167 Graphical Representations 39 Logic is a very powerful representation language but it can be difficult for humans to work with 39 In addition the lack of structure between sentences in a KB can make inference difficult 39 A semantic network is a way to graphically represent and reason about some logical concepts Department of Computer Science University of San Francisco p 71 168 Semantic Net example 39 SisterO Mary John 39 FemaleMa7 y 39 VxFemaldx gt Personm 39 MaleJ0hn 39 VxMale gt Personm 39 VP6r80n gt Mamma x 39 VxPersoMx gt DefaultNumberOfLegs 2 39 legsJohn 1 Department of Computer Science University of San Francisco p 81 169 Inference in a Semantic Net 39 Inference becomes easy in a 
semantic net to answer questions about John we follow the labeled edges emanating from the John node 0 This is the same sort of inference done by modem 00 languages to resolve inheritance 39 Strengths Knowledge is easily visualized relationships between objects are clearer 0 Weaknesses only binary relations no negation disjunction or existential quantifiers 0 We can extend semantic nets to include these features but we lose the transparency 0 Instead use a semantic net to model classslot relations and FOL or something similar to model rules Department of Computer Science University of San Francisco p 91 1610 Protege 39 Protege is a Javabased graphical tool for constructing ontologies 39 Has a rich set of plugins for performing inference and visualizing data 39 Can export data in RDF and OWL for use with the Semantic Web Department of Computer Science University of San Francisco p 10 1611 Using Protege to build a simple ontology 39 Let s make a simple ontology to describe the USF C8 department We begin by asking competency questions these are questions weaAZd like our KB to be able to answer 0 Who is taking C8662 0 Which professors teach classes on Monday 0 How many students are taking both C8662 and C8601 0 What classes should one take before taking C8662 0 Which professors assign the most work Department of Computer Science University of San Francisco p 11 1612 Classes 39 As with 00 design we can work topdown bottomup or in a combination of the two LetaAZs start by making a Person class 39 We can then subclass that to make Professors and Students Students have GradStudent and UndergradStudent subclasses LetaAZs also make a separate class representing courses Department of Computer Science University of San Francisco p 12 1613 Slots Now we can start to fill in the slots 39 LetaAZs start with Student 0 Students have a Name an ID and some courses theyaAZre taking 0 LetaAZs also add a aAZyearaAZ slot to Undergrads this can be Freshman SophomoreJunior or Senior 39 Next 
we can do Professors O Professors have a name a salary and some courses they teach Wait Maybe name should be in person instead of student and professor O Thenvdo course course should have an aAZ39inverse slotaAZfor taughtby and enrolls Department of Computer Science University of San Francisco p 13 1614 Instances 39 Now we can begin to populate our knowledge base with instances of the classes weaAZve created 39 To begin add aAZBrooksaAZ as a Professor with salary 100000 I wish 39 We can create a course object on the fly to represent C8662 39 Next create a student use your own name Department of Computer Science University of San Francisco p 14 1615 Forms We can now enter instances but the default editor settings are not very helpful In particular seeing the symbol names for each instance is unhelpful Forms will let us change that Forms also let you customize or modify the UI that a domain expert will use to add instances Department of Computer Science University of San Francisco p 15 1616 Querying the KB 39 The query pane lets us create save and retrieve queries 39 We can specify either AND or OR 0 Try finding all students enrolled in C8662 Department of Computer Science University of San Francisco p 16 ALA L 1 1 Artificial Intelligence Programming Python Intro Chris Brooks Department of Computer Science University of San Francisco Department of Computer Science 7 University of Sari Francisco 7 p m 2b0 What is Python o Python is A Highlevel A Objectoriented A Free opensource A Dynamically typed A Has a large collection of utility libraries A garbagecollected A Mixable works nicely as a glue language A Easy to learnuse Department of Computer Science University of San Francisco p22 2b1 Some uses of Python o Things that Python is good at a System utilities Python has hooks into standard OS level services such as files processes pipes etc a GUls Python has interfaces to Tk and Qt A Embedded integration Python has hooks that allow it to callbe called by C and C code and 
also COM interfaces A Rapid Interactive Development Like Lisp Python makes it easy to quickly put together a working program 5 Scripting Python has utilities for CGI Database access HTTP Sockets FTP POPSMTP XML niversity of San Francisco p32 2b2 Invoking Python Programs 63 There are four ways to run a Python program A Interactiver A From the command line A As a script A From another program Department of Computer Science University of San Francisco p42 Gt 6 2b3 Python Implementations lusrbinpython on all unix machines IDLE and PythonWin on Windows Machthon on Macintosh Also for Amiga PaImOS BeOS etc Department of Computer Science University of San Francisco p52 2b4 Python Program Structure 63 Programs consist of modules A A module corresponds to a source file 6 Modules contain blocks of statements 6 Statements create and process objects Department of Computer Science University of San Francisco p62 2b5 Basic Python o Python has a nice set of builtin data types a Numbers 5 Strings A Lists a Dictionaries A Files o Using builtin types makes your code easier to writemaintain more portable and more efficient Department of Computer Science University of San Francisco p72 2b6 Numbers Numbers work like we d expect There are integers equivalent to longs in C 1 31311 4000 There are long integers which are of unlimited size 31111L12345D There are floats equivalent to doubles in C123 31e5 There are Octal and Hexadecimal representations as in C 0155 0x3af5 There are complex numbers as in Lisp Matlab etc 304j Department of Computer Science University of San Francisco p82 2b7 Mathematical Operators Python has all the usual mathematical operators or pow for exponentiation abs rand amp This makes Python a very handy desk calculator Operations are coerced to the most specific type a 3 40 2 will produce a float Department of Computer Science U niversity of San Francisco p92 2b8 Strings One of Python s strong suits is its ability to work with strings Strings are denoted with double 
quotes as in C s1 s2 concatenation s1 3 repitition s1i indexing s1ij slicing s11 last character a parrot dead formatting for char in s1 iteration Department of Computer Science University of San Francisco p102 2b9 Strings e Strings are immutable sequences to change them we need to make a copy a Can t do s13 c a Must do 52 s102 c s13 Department of Computer Science University of San Francisco p112 2b10 Lists Python has a flexible and powerful list structure Lists are mutable sequences can be changed in place Denoted with square brackets I1 1 234 Can create nested sublists l2 12 34 5 6 7 l1 l2 concatenation l1 4 repetition 135 13 15 slices append extend sort reverse built in Range create a list of integers Department of Computer Science University of San Francisco p122 2b11 Dictionaries A Dictionary is a Python hash table or associative list Unordered collections of arbitrary objects d1 new hashtable d2 spam 2 eggs 3 Can index by key d2 spam Can have nested Hashtables A d3 spam 1 other eggs 2 spam 3 A d3 other spam haskey keys Values for k in keys Typically you ll insertdelete with A d3 spam delicious A del d3 spam Department of Computer Science University of San Francisco p132 2b12 Tuples Tuples are like immutable lists Nice for dealing with enumerated types Can be nested and indexed t1 123 t2 1 2345 Can index slice length just like lists A t1 3 t112 t12 Tuples are mostly useful when you want to have a list of a predetermined sizelength Department of Computer Science University of San Francisco p142 2b13 Files Since it s a scripting language Python has a lot of support for file O Operators are not too different from C Outfile open fname w or infile open fname r S infileread read the entire file into the string 8 S infilereadN read N lines into the string 8 S inputreadine read one line 8 inputreadines read the whole file into a list of strings Department of Computer Science University of San Francisco p152 2b14 Files o outfilewriteS write the string 8 into the file 
outfilewriteinesL write the list of strings L into the file outfileclose this is also done by the garbage collector Department of Computer Science University of San Francisco p162 2b15 Basic Python statements e Python uses dynamic typing a No need to predefine variables 6 Variables are instantiated by assigning values to them a Referencing a variable before assignment is an error 6 You can assign multiple variables simultaneously spam 4 eggs 5 spam eggs eggs spam spam eggs 45 Department of Computer Science University of San Francisco p172 2b16 Python variables Variables must A begin with a letter or underscore a contain any number of letters numbers or underscores No etc Case matters Can t use reserved words as variables Department of Computer Science University of San Francisco p182 2b17 Printing o We ve already seen the basic print A print hello world e To use a formatting string do a print hello s bob a print quots s quothelloquot quotworldquot 6 To suppress the linefeed include a Department of Computer Science University of San Francisco p192 2b18 Conditionals o The general format for an if statement is if lttestlgt ltstatement1gt ltstatement2gt elif lttest2gt ltstatement3gt else ltstatementngt o Notice the colons after the conditionals Compound statements consist of the colon followed by an indented block Department of Computer Science University of San Francisco p202 2b19 Conditionals e Logical tests return 1 for true and O for false 65 True and False are shorthand and or not are available for compound tests Department of Computer Science University of San Francisco p212 2b20 Syntax e Tabs are used to delimit blocks Statements can cross multiple lines if 0 They re within delimiters a You use a backslash Department of Computer Science University of San Francisco p222 2b21 Iteration o Python has the familiar while loop 65 One wrinkle it has an optional else clause to execute if the test is false 9 Additional loop control A break Exit the innermost loop without 
doing an else A con nue A Return to the beginning of the loop A pass Do nothing Department of Computer Science University of San Francisco p232 2b22 For o For is a generic iterator in Python 65 Lets you loop over any indexable data structure A for item in 12345 a for char in This is a string a for key in dictionarykeys o This provides a uniform polymorphic interface to a set of different data structures Department of Computer Science University of San Francisco p242 2b23 Functions def is used to define a function return returns a value Functions maintain local scope Names are resolved locally then globally then builtin Department of Computer Science University of San Francisco p252 2b24 Functions Multiple values can be returned in a tuple We can also provide default arguments Functions can be called with keywords A myfuncspam1 eggs2 args can be used to catch arbitrary argument lists and store them in a tuple args can be used to catch arbitrary argument lists and store them in a dictionary Department of Computer Science University of San Francisco p262 Arti cial Intelligence Programming More Neural Networks Chris Brooks Department of Computer Science University of San Francisco Department of Computer Science University of San Francisco p 11 242 Neural networks 39 A network is composed of layers of nodes 0 input hidden output layers in feedfonNard nets 39 An input is applied to the input units 39 The resulting output is the value of the function the net computes Department of Computer Science University of San Francisco p 21 243 Perceptrons 0 Singlelayer networks perceptrons are the simplest form of NN 39 Easy to understand but computationally limited Each input unit is directly connected to one or more output units Units JV Units Department of Computer Science University of San Francisco p 31 244 Delta rule 39 The appeal of perceptrons is the ability to automatically learn their weights in a supervised fashion The weight updating rule is known as the delta rule 39 
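The delta rule can be sketched directly in code. Below is a minimal, illustrative implementation using the same setup as the majority-function example in these notes (learning rate 0.1, firing threshold 0.1, weights initialized to 0); the function names are my own.

```python
def fire(weights, x, theta=0.1):
    """Step activation: 1 if the weighted sum of inputs exceeds theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0

def train_perceptron(data, alpha=0.1, max_epochs=100):
    """Delta rule: w_i <- w_i + alpha * (t - o) * x_i, repeated
    until every training example is classified correctly."""
    weights = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        changed = False
        for x, t in data:
            o = fire(weights, x)
            if o != t:
                weights = [w + alpha * (t - o) * xi
                           for w, xi in zip(weights, x)]
                changed = True
        if not changed:      # converged: a full pass with no errors
            break
    return weights

# The three-input majority function (the slides' training set)
majority = [((1, 0, 0), 0), ((0, 1, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
            ((0, 0, 1), 0), ((1, 0, 1), 1), ((0, 0, 0), 0), ((0, 1, 0), 0)]

weights = train_perceptron(majority)
```

After two passes over the data this converges to weights of (0.1, 0.1, 0.1), which classify every example correctly.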
Δwᵢ = α Σ_{d∈D} (t_d − o_d) x_{i,d}

where D is the training set, t_d is the expected output, and o_d is the actual output.

### Example

Let's suppose we want to learn the majority function with three inputs.

- Firing function: oᵢ = 1 if Σⱼ wⱼxⱼ > θ, 0 otherwise
- Threshold θ = 0.1, learning rate α = 0.1, all weights initially 0

| Inputs | Output |
|--------|--------|
| 1 0 0  | 0 |
| 0 1 1  | 1 |
| 1 1 0  | 1 |
| 1 1 1  | 1 |
| 0 0 1  | 0 |
| 1 0 1  | 1 |
| 0 0 0  | 0 |
| 0 1 0  | 0 |

Applying the delta rule to each example in turn:

| inputs | expected output | (w1, w2, w3) | actual output | new weights |
|--------|----|------------------|---|------------------|
| 1 0 0  | 0  | (0, 0, 0)        | 0 | (0, 0, 0)        |
| 0 1 1  | 1  | (0, 0, 0)        | 0 | (0, 0.1, 0.1)    |
| 1 1 0  | 1  | (0, 0.1, 0.1)    | 0 | (0.1, 0.2, 0.1)  |
| 1 1 1  | 1  | (0.1, 0.2, 0.1)  | 1 | (0.1, 0.2, 0.1)  |
| 0 0 1  | 0  | (0.1, 0.2, 0.1)  | 0 | (0.1, 0.2, 0.1)  |
| 1 0 1  | 1  | (0.1, 0.2, 0.1)  | 1 | (0.1, 0.2, 0.1)  |
| 0 0 0  | 0  | (0.1, 0.2, 0.1)  | 0 | (0.1, 0.2, 0.1)  |
| 0 1 0  | 0  | (0.1, 0.2, 0.1)  | 1 | (0.1, 0.1, 0.1)  |

At this point the network is trained properly. In many cases we would need more iterations to converge on a solution.

### Gradient descent and the delta rule

- Recall that perceptrons are only able to perfectly learn linearly separable functions; this is their representational bias. In other cases we can still learn as much as is possible.
- We'll interpret this to mean minimizing the sum of squared error: E = ½ Σ_{d∈D} (t_d − o_d)²
- We can visualize this as a hill-climbing search through a space of weights. Defining E in this way gives a parabolic space with a single global minimum.
- The delta rule adjusts the weights so that E is reduced at each step. By following the gradient in this space, we find the combination of weights that minimizes error.
- Use the unthresholded output: the real number computed by the weighted sum of the inputs.

### Multilayer networks

- While perceptrons have the advantage of a simple learning algorithm, their computational limitations are a problem. What if we add another (hidden) layer?
- Computational power increases: with one hidden layer, a network can represent any continuous function; with two hidden layers, any function.
- Problem: how do we find the correct weights for hidden nodes?

### Multilayer network example

(Input units feed hidden units, which feed output units; each unit's activation aᵢ is passed along the weighted connections.)

### Backpropagation

- Backpropagation is an extension of the perceptron learning algorithm to deal with multiple layers of nodes.
- Nodes use a sigmoid activation function rather than the step function: g(in) = 1 / (1 + e^(−in)), with derivative g′(in) = g(in)(1 − g(in)).
- Good news here: to compute g′ we just need g. We will still follow the gradient, where g′ gives us the gradient.

Notation:

- aⱼ: output of the jth hidden unit
- oᵢ: output of the ith output unit
- tᵢ: expected output for the ith output unit

Update rules:

- Output error for output node i: tᵢ − oᵢ
- Delta for output node i: δᵢ = (tᵢ − oᵢ) · g(inᵢ) · (1 − g(inᵢ))
- Weight updating, hidden → output: W_{j,i} ← W_{j,i} + α · aⱼ · δᵢ
- This is the same as the perceptron rule, except that we use g′ rather than just the difference between expected and actual output.

For the input → hidden weights, the idea is that each hidden node is responsible for a fraction of the error δᵢ: divide δᵢ according to the strength of the connection between the hidden and output node.

- For each hidden node j: δⱼ = g(inⱼ)(1 − g(inⱼ)) Σ_{i∈outputs} W_{j,i} δᵢ
- Update rule for input → hidden weights: W_{k,j} ← W_{k,j} + α · input_k · δⱼ

### Backpropagation algorithm

The whole algorithm can be summed up as:

```
while not done:
    for d in training set:
        apply inputs of d, propagate forward
        for each node i in the output layer:
            delta_i = output_i * (1 - output_i) * (t_i - output_i)
        for each hidden node j:
            delta_j = output_j * (1 - output_j) * sum_i(W_ji * delta_i)
        adjust each weight: W_ji <- W_ji + alpha * delta * input_j
```
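The algorithm above can be turned into a small working sketch. The network below (one hidden layer, sigmoid units, incremental updates) follows the update rules in these notes; the bias input, layer sizes, AND training set, and all names are illustrative choices of mine, not from the slides.

```python
import math
import random

random.seed(0)  # deterministic illustration

def g(x):
    """Sigmoid activation: g(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

class BackpropNet:
    """n_in inputs (plus a bias input), n_hidden hidden units, 1 output."""
    def __init__(self, n_in=2, n_hidden=3, alpha=0.5):
        self.alpha = alpha
        self.w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hidden)]
        self.w_ho = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        self.x = list(x) + [1.0]                 # append constant bias input
        self.a = [g(sum(w * xi for w, xi in zip(ws, self.x)))
                  for ws in self.w_ih]           # hidden activations a_j
        self.o = g(sum(w * a for w, a in zip(self.w_ho, self.a)))
        return self.o

    def update(self, x, t):
        o = self.forward(x)
        # output delta: (t - o) * g'(in), where g'(in) = o * (1 - o)
        d_o = (t - o) * o * (1 - o)
        # each hidden node takes a share of the error, weighted by W_ji
        d_h = [a * (1 - a) * w * d_o for a, w in zip(self.a, self.w_ho)]
        # hidden -> output: W_ji <- W_ji + alpha * a_j * delta_i
        self.w_ho = [w + self.alpha * a * d_o
                     for w, a in zip(self.w_ho, self.a)]
        # input -> hidden: W_kj <- W_kj + alpha * input_k * delta_j
        self.w_ih = [[w + self.alpha * xi * dh for w, xi in zip(ws, self.x)]
                     for ws, dh in zip(self.w_ih, d_h)]

def sse(net, data):
    """Sum of squared error over a data set."""
    return sum((t - net.forward(x)) ** 2 for x, t in data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
net = BackpropNet()
err_before = sse(net, AND)
for _ in range(2000):
    for x, t in AND:
        net.update(x, t)
err_after = sse(net, AND)
```

Incremental training of this kind reduces the summed squared error over the training set, though (as the notes discuss below) it is not guaranteed to converge in general.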
### Theory vs. practice

- In the definition of backpropagation, a single update for all weights is computed for all data points at once: find the update that minimizes the total sum of squared error. Convergence is guaranteed in this case.
- Problem: this is often computationally and space-intensive. It requires creating a matrix with one row for each data point and inverting it.
- In practice, updates are done incrementally instead.

### Stopping conditions

- Unfortunately, incremental updating is not guaranteed to converge, and convergence can take a long time.
- When do we stop training? Options: a fixed number of iterations; total error below a set threshold; or convergence (no change in weights).

### Comments on backpropagation

- Backpropagation also works for multiple hidden layers.
- It is only guaranteed to converge to a local minimum: it may not find the absolute best set of weights.
- Low initial weights can help with this; they make the network act more linearly (fewer minima).
- We can also use random restart: train multiple times with different initial weights.

### Momentum

- Since backpropagation is a hill-climbing algorithm, it is susceptible to getting stuck in plateaus: areas where local weight changes don't produce an improvement in the error function.
- A common extension to backpropagation is the addition of a momentum term, which carries the algorithm through minima and plateaus.
- Idea: remember the direction you were going in and, by default, keep going that way. Mathematically, this means using the second derivative.
- Implementing momentum typically means remembering what update was done in the previous iteration. With momentum parameter μ, our update rule becomes:

  ΔW_{j,i}(n) = α · δᵢ · aⱼ + μ · ΔW_{j,i}(n − 1)

- To consider the effect, imagine that our new delta is zero (we haven't made any improvement): momentum will keep the weights moving in the same direction.
- Momentum also gradually increases the step size in areas where the gradient is unchanging, which speeds up convergence.

### Design issues

- As with GAs, one difficulty with neural nets is determining how to encode your problem. Inputs must be 1s and 0s, or else real-valued numbers; the same goes for outputs.
- Symbolic variables can be given binary encodings. More complex concepts may require care to represent correctly.
- Like some of the other algorithms we've studied, neural nets have a number of parameters that must be tuned to get good performance: the number of layers, the number of hidden units, the learning rate, the initial weights, the momentum term, and the training regimen.
- These may require trial and error to determine. Alternatively, you could use a GA or simulated annealing to figure them out.

### Radial basis function networks

- One problem with backpropagation is that every node contributes to the output of a solution. This means that all weights must be tuned in order to minimize global error, so noise in one portion of the data can have an impact on the entire output of the network. Training times are also long.
- Radial basis function (RBF) nets provide a solution to this. Intuition: each node in the network represents a portion of the input space and is responsible for classifying examples that fall near it.
- Vanilla approach: for each training point (xᵢ, f(xᵢ)), create a node whose center is xᵢ. The output of this node for a new input x is W · φ(‖x − xᵢ‖), where W is the weight and φ is a basis function (commonly a Gaussian, φ(r) = exp(−r²)).
- Each node has a zone of influence where it can classify nearby examples. Training due to misclassification only affects nodes that are near the misclassified example.
- Also, the network is single-layer, so the weights can be found by solving a matrix equation: ΦW = t, so W = Φ⁻¹t. Inverting a matrix is a much faster operation than training with backpropagation.

### Recurrent NNs

- So far we've talked only about feedforward networks: signals propagate in one direction, the output is immediately available, and the training algorithms are well understood.
- There has also been a great deal of work done on recurrent neural networks, in which at least some of the outputs are connected back to the inputs.
- A single-layer recurrent neural network looks a bit like an S-R latch.

### Hopfield networks

- A Hopfield network has no special input or output nodes: every node receives an input and produces an output, and every node is connected to every other node. Typically, threshold functions are used.
- The network does not immediately produce an output; instead, it oscillates. Under some easy-to-achieve conditions, the network will eventually stabilize.
- Weights are found using simulated annealing.
- Hopfield networks can be used to build an associative memory: a portion of a pattern is presented to the network, and the net recalls the entire pattern.
- Useful for letter recognition and for optimization problems; often used to model brain activity.

## Artificial Intelligence Programming: Agents and Environments

(Chris Brooks, Department of Computer Science, University of San Francisco)

### Overview

- What makes an agent?
- Defining an environment
- Types of agent programs

### Intelligent agents

- The idea of developing intelligent agents has been a unifying one for AI.
- Previously, subdisciplines were fairly balkanized: learning, knowledge representation, search, vision, etc. Agent programs may require several of these abilities.

### What is an agent?

There are lots of potential definitions:

- Russell & Norvig: "An agent is anything that can be viewed as perceiving its environment through sensors and acting on that environment through actuators."
- Wooldridge: "An agent is a computer system that is situated in an environment and is capable of autonomous action."

### Qualities of an agent

- Autonomy
- Adaptation
- Goal-directed behavior
- Has beliefs and intentions
- Proactive
- Situated within an environment

### Autonomy

- Autonomy is a quality often attributed to agents. An autonomous agent is able to rely on its percepts and past experience to make decisions, rather than asking a human for help.
- This is a thorny area: most agents will not have complete autonomy. When might we not want an agent to have complete autonomy?
- Challenge: designing an agent that can reason about its own autonomy and know when to ask for help.

### Agents and environments

One shift from other courses: we'll think explicitly about an agent's environment and how that affects execution.

- Percepts: information delivered to an agent's sensors (light, sound, EM waves, signals).
- Sensors: an agent's mechanisms for gathering data about its environment (eyes, ears, photoelectric cells, etc.).
- Actuators: an agent's mechanisms for affecting its environment (wheels, arms, radios, lights, etc.).
- Actions: actual changes to the environment (running, rolling, etc.).

### Agent-oriented programming

- We can also think of agents as a programming paradigm: the next logical step after objects. ("Objects do it for free; agents do it for money.") Objects are receivers of actions; agents are actors.
- It's less useful to think of "agent" as an objective label than as a subjective description. Agency is a useful abstraction for us as programmers: it allows us to think about a program at a higher level.
- Treat something as an agent if that helps to understand, predict, or explain its behavior (e.g., thermostats as agents).

### Agent programs

- We can describe our agent's behavior as a function F that maps a percept sequence to an action: action = F(current percept, percept history).
- Actually implementing this function is the work of an agent program. That's what we'll spend most of our time on.

### Usefulness of the agent metaphor

- Why bother with all this? We already know how to write programs.
- Agents tend to be open-ended programs: it's difficult to specify in advance what they should do in all cases.
- It's helpful to talk about them as if they were intelligent: "The robot wants to find the power supply." "The server believes that the client has reset."
- This assigning of mental states to programs is called the intentional stance.

### Example: vacuum-cleaner world

- Robotic vacuum cleaners are actually on the market ($150 at Amazon).
- A reflex agent, mostly.
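The agent-function view (action = F(percepts)) can be sketched for a reflex vacuum agent. This is a minimal, illustrative sketch of the standard two-square vacuum world from Russell & Norvig; the location names 'A'/'B' and action strings are my own conventions.

```python
def reflex_vacuum_agent(percept):
    """Map the current percept directly to an action (no internal state).

    percept is a (location, status) pair, e.g. ("A", "Dirty").
    """
    location, status = percept
    if status == "Dirty":
        return "Suck"
    # Square is clean: move to the other square
    return "Right" if location == "A" else "Left"
```

For example, `reflex_vacuum_agent(("A", "Dirty"))` returns `"Suck"`, while on a clean square the agent simply moves to the other square. Because the action depends only on the current percept, this is a reflex agent: it ignores the percept history entirely.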
