
# Intro CS 3600


These class notes were uploaded by Alayna Veum on Monday, November 2, 2015. The notes belong to CS 3600 at Georgia Institute of Technology - Main Campus, taught by Staff in Fall. Since upload, they have received 21 views. For similar materials see /class/234119/cs-3600-georgia-institute-of-technology-main-campus in Computer Science at Georgia Institute of Technology - Main Campus.


## Neural Networks (Chapter 19, Sections 1–5)

### Outline
- Brains
- Neural networks
- Perceptrons
- Multilayer perceptrons
- Applications of neural networks

### Brains
$10^{11}$ neurons of more than 20 types, $10^{14}$ synapses, 1 ms–10 ms cycle time. Signals are noisy "spike trains" of electrical potential.

*(Figure: a neuron, showing the axonal arborization, axon from another cell, synapse, dendrite, and cell body or soma.)*

### McCulloch–Pitts unit
Output is a "squashed" linear function of the inputs:

$a_i \leftarrow g(in_i) = g\Big(\sum_j W_{j,i}\, a_j\Big)$

*(Figure: a unit with fixed bias input $a_0 = -1$ and bias weight $W_{0,i}$; input links feed the input function $in_i$, the activation function $g$ produces the output $a_i$ on the output links.)*

### Activation functions
(a) is a step function or threshold function; (b) is a sigmoid function $1/(1+e^{-x})$. Changing the bias weight $W_{0,i}$ moves the threshold location.

### Implementing logical functions
- AND: $W_0 = 1.5$, $W_1 = W_2 = 1$
- OR: $W_0 = 0.5$, $W_1 = W_2 = 1$
- NOT: $W_0 = -0.5$, $W_1 = -1$

McCulloch and Pitts: every Boolean function can be implemented.

### Network structures
Feedforward networks: single-layer perceptrons, multilayer perceptrons. Feedforward networks implement functions, have no internal state.

Recurrent networks:
- Hopfield networks have symmetric weights ($W_{i,j} = W_{j,i}$), $g(x) = \mathrm{sign}(x)$, $a_i = \pm 1$; holographic associative memory
- Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets
- recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate, etc.

### Feedforward example
A feedforward network is a parameterized family of nonlinear functions:

$a_5 = g(W_{3,5}\,a_3 + W_{4,5}\,a_4) = g\big(W_{3,5}\,g(W_{1,3}a_1 + W_{2,3}a_2) + W_{4,5}\,g(W_{1,4}a_1 + W_{2,4}a_2)\big)$

### Perceptrons
*(Figure: a perceptron network with input units wired directly to output units, and a plot of the perceptron output over the input space.)*

### Expressiveness of perceptrons
Consider a perceptron with step function $g$ (Rosenblatt, 1957–1960).
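The AND/OR/NOT weights on the "Implementing logical functions" slide can be checked directly. A minimal sketch, assuming the step-function convention above (a fixed input $a_0 = -1$ carrying the bias weight); helper names like `mp_unit` are mine, not from the slides:

```python
def step(x):
    """Threshold activation: the unit fires iff the weighted input sum is >= 0."""
    return 1 if x >= 0 else 0

def mp_unit(weights, bias_weight, inputs):
    """McCulloch-Pitts unit: a = g(sum_j W_j a_j), with a fixed input
    a_0 = -1 carrying the bias weight W_0."""
    total = -1 * bias_weight + sum(w * a for w, a in zip(weights, inputs))
    return step(total)

# Weights from the slide: AND (W0=1.5), OR (W0=0.5), NOT (W0=-0.5, W1=-1)
AND = lambda a, b: mp_unit([1, 1], 1.5, [a, b])
OR  = lambda a, b: mp_unit([1, 1], 0.5, [a, b])
NOT = lambda a:    mp_unit([-1], -0.5, [a])
```

Each unit reproduces its truth table, illustrating the McCulloch–Pitts claim for these three Boolean functions.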
It can represent AND, OR, NOT, majority, etc. It represents a linear separator in input space:

$\sum_j W_j x_j > 0 \quad\text{or}\quad \mathbf{W}\cdot\mathbf{x} > 0$

*(Figure: (a) $x_1$ and $x_2$ and (b) $x_1$ or $x_2$ are linearly separable; (c) $x_1$ xor $x_2$ is not.)*

### Perceptron learning
Learn by adjusting weights to reduce error on the training set. The squared error for an example with input $\mathbf{x}$ and true output $y$ is

$E = \tfrac{1}{2} Err^2 = \tfrac{1}{2}\big(y - h_{\mathbf{W}}(\mathbf{x})\big)^2$

Perform optimization search by gradient descent:

$\dfrac{\partial E}{\partial W_j} = Err \times \dfrac{\partial Err}{\partial W_j} = Err \times \dfrac{\partial}{\partial W_j}\Big(y - g\big(\textstyle\sum_j W_j x_j\big)\Big) = -\,Err \times g'(in) \times x_j$

Simple weight update rule:

$W_j \leftarrow W_j + \alpha \times Err \times g'(in) \times x_j$

E.g., positive error ⇒ increase network output ⇒ increase weights on positive inputs, decrease on negative inputs.

### Perceptron learning contd.
The perceptron learning rule converges to a consistent function for any linearly separable data set.

*(Figure: two plots of proportion correct on test set vs. training set size; on one problem the perceptron outperforms the decision-tree method, on the other the decision tree outperforms the perceptron.)*

### Multilayer perceptrons
Layers are usually fully connected; numbers of hidden units typically chosen by hand.

*(Figure: input units, hidden units, output units.)*

### Expressiveness of MLPs
All continuous functions with 2 layers, all functions with 3 layers.

*(Figure: surface plots of $h_W(x_1, x_2)$.)*

### Backpropagation learning
Output layer: same as for single-layer perceptron,

$W_{j,i} \leftarrow W_{j,i} + \alpha \times a_j \times \Delta_i \quad\text{where}\quad \Delta_i = Err_i \times g'(in_i)$

Hidden layer: back-propagate the error from the output layer:

$\Delta_j = g'(in_j) \sum_i W_{j,i}\, \Delta_i$

Update rule for weights in the hidden layer:

$W_{k,j} \leftarrow W_{k,j} + \alpha \times a_k \times \Delta_j$

(Most neuroscientists deny that backpropagation occurs in the brain.)

### Backpropagation derivation
The squared error on a single example is defined as

$E = \tfrac{1}{2}\sum_i (y_i - a_i)^2$

where the sum is over the nodes in the output layer.

$\dfrac{\partial E}{\partial W_{j,i}} = -(y_i - a_i)\dfrac{\partial a_i}{\partial W_{j,i}} = -(y_i - a_i)\dfrac{\partial g(in_i)}{\partial W_{j,i}} = -(y_i - a_i)\,g'(in_i)\dfrac{\partial in_i}{\partial W_{j,i}} = -(y_i - a_i)\,g'(in_i)\dfrac{\partial}{\partial W_{j,i}}\Big(\sum_j W_{j,i}\,a_j\Big) = -(y_i - a_i)\,g'(in_i)\,a_j = -a_j \Delta_i$
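The perceptron update rule $W_j \leftarrow W_j + \alpha \times Err \times g'(in) \times x_j$ can be sketched on a small linearly separable problem (OR, which the slides list as representable). This is my illustration, not code from the notes; a sigmoid unit is trained by gradient descent and then thresholded at 0.5:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_perceptron(examples, alpha=0.5, epochs=2000, seed=0):
    """Gradient-descent training with W_j <- W_j + alpha*Err*g'(in)*x_j,
    where g is the sigmoid and Err = y - h_W(x). x[0] is a fixed bias input of 1."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(examples[0][0]))]
    for _ in range(epochs):
        for x, y in examples:
            out = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            err = y - out
            gprime = out * (1 - out)          # sigmoid derivative g'(in)
            w = [wj + alpha * err * gprime * xj for wj, xj in zip(w, x)]
    return w

# OR is linearly separable, so the rule converges to a consistent separator.
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_perceptron(data)
predict = lambda x: 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x))) > 0.5 else 0
```

After training, `predict` classifies all four OR examples correctly, matching the convergence claim for linearly separable data.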
### Backpropagation derivation contd.

$\dfrac{\partial E}{\partial W_{k,j}} = -\sum_i (y_i - a_i)\dfrac{\partial a_i}{\partial W_{k,j}} = -\sum_i \Delta_i \dfrac{\partial}{\partial W_{k,j}}\Big(\sum_j W_{j,i}\,a_j\Big) = -\sum_i \Delta_i\, W_{j,i}\dfrac{\partial a_j}{\partial W_{k,j}} = -\sum_i \Delta_i\, W_{j,i}\, g'(in_j)\dfrac{\partial in_j}{\partial W_{k,j}} = -\sum_i \Delta_i\, W_{j,i}\, g'(in_j)\, a_k = -a_k \Delta_j$

### Backpropagation learning contd.
At each epoch, sum the gradient updates for all examples and apply.

*(Figure: total error on training set vs. number of epochs, 0–400.)*

Usual problems with slow convergence, local minima.

*(Figure: proportion correct on test set vs. training set size; multilayer network vs. decision tree.)*

### Handwritten digit recognition
- 3-nearest-neighbor: 2.4% error
- 400–300–10 unit MLP: 1.6% error
- LeNet: 768–192–30–10 unit MLP: 0.9% error

## Bayesian Networks (Chapter 14, Sections 1–3)

### Outline
- Syntax
- Semantics
- Parameterized distributions

### Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.

Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (link ≈ "directly influences")
- a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

### Example
Topology of network encodes conditional independence assertions.

*(Figure: Weather stands alone; Cavity has children Toothache and Catch.)*

Weather is independent of the other variables. Toothache and Catch are conditionally independent given Cavity.

### Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.

Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

### Example contd.
*(Figure: the burglary network with its CPTs: $P(B) = .001$, $P(E) = .002$; $P(A \mid B, E) = .95, .94, .29, .001$; $P(J \mid A) = .90, .05$; $P(M \mid A) = .70, .01$.)*

### Compactness
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values. Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1-p$).

If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers, i.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.

For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).

### Global semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:

$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$

E.g.,

$P(j \wedge m \wedge a \wedge \neg b \wedge \neg e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b, \neg e)\,P(\neg b)\,P(\neg e) = 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$

### Local semantics
Local semantics: each node is conditionally independent of its nondescendants given its parents.

Theorem: local semantics ⇔ global semantics.

### Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

### Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables $X_1, \dots, X_n$
2. For $i = 1$ to $n$: add $X_i$ to the network; select parents from $X_1, \dots, X_{i-1}$ such that $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \dots, X_{i-1})$

This choice of parents guarantees the global semantics:

$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$ (chain rule) $= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)
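The global-semantics example above can be checked numerically. A sketch using the standard burglary-network CPT values shown in the slides (the variable names here are my own shorthand):

```python
# Burglary-network CPT entries (as in the slides):
P_b, P_e = 0.001, 0.002                 # P(Burglary), P(Earthquake)
P_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}  # P(Alarm | B, E)
P_j_a, P_m_a = 0.90, 0.70               # P(JohnCalls | a), P(MaryCalls | a)

# Global semantics: P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p = P_j_a * P_m_a * P_a_given[(False, False)] * (1 - P_b) * (1 - P_e)
```

The product comes out to about 0.000628, matching the slide's ≈ 0.00063.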
### Example
Suppose we choose the ordering $M, J, A, B, E$:

- $P(J \mid M) = P(J)$? No.
- $P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No.
- $P(B \mid A, J, M) = P(B \mid A)$? Yes. $P(B \mid A, J, M) = P(B)$? No.
- $P(E \mid B, A, J, M) = P(E \mid A)$? No. $P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes.

*(Figure: the network grown under this ordering, with MaryCalls and JohnCalls as roots.)*

### Example contd.
Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) Assessing conditional probabilities is hard in noncausal directions. The network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed.

### Example: Car diagnosis
Initial evidence: car won't start. Testable variables (green), "broken, so fix it" variables (orange); hidden variables (gray) ensure sparse structure, reduce parameters.

*(Figure: car diagnosis network.)*

### Example: Car insurance
*(Figure: car insurance network with outputs MedicalCost, LiabilityCost, PropertyCost.)*

### Compact conditional distributions
A CPT grows exponentially with the number of parents; a CPT becomes infinite with a continuous-valued parent or child. Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case: $X = f(Parents(X))$ for some function $f$.
E.g., Boolean functions: $NorthAmerican \Leftrightarrow Canadian \vee US \vee Mexican$.
E.g., numerical relationships among continuous variables: $\partial Level/\partial t$ = inflow + precipitation − outflow − evaporation.

### Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1. Parents $U_1, \dots, U_k$ include all causes (can add leak node)
2. Independent failure probability $q_i$ for each cause alone

$P(X \mid U_1 \dots U_j,\ \neg U_{j+1} \dots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$

| Cold | Flu | Malaria | P(Fever) | P(¬Fever) |
|---|---|---|---|---|
| F | F | F | 0.0 | 1.0 |
| F | F | T | 0.9 | 0.1 |
| F | T | F | 0.8 | 0.2 |
| F | T | T | 0.98 | 0.02 = 0.2 × 0.1 |
| T | F | F | 0.4 | 0.6 |
| T | F | T | 0.94 | 0.06 = 0.6 × 0.1 |
| T | T | F | 0.88 | 0.12 = 0.6 × 0.2 |
| T | T | T | 0.988 | 0.012 = 0.6 × 0.2 × 0.1 |
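The noisy-OR formula above reduces to one line of code. A sketch reproducing the fever table, with the failure probabilities $q_i$ taken from the example (the function name is mine):

```python
def noisy_or(q, active):
    """Noisy-OR: P(effect | active causes) = 1 - product of the per-cause
    failure probabilities q_i over the causes that are present."""
    p_fail = 1.0
    for cause in active:
        p_fail *= q[cause]
    return 1.0 - p_fail

# Failure probabilities from the fever example in the table above.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}
p_fever_all = noisy_or(q, ["cold", "flu", "malaria"])   # 1 - 0.6*0.2*0.1
```

With no active cause the effect has probability 0, and each extra cause multiplies the failure probability, exactly as in the table.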
The number of parameters is linear in the number of parents.

### Hybrid (discrete + continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost).

- Option 1: discretization — possibly large errors, large CPTs
- Option 2: finitely parameterized canonical families:
  1. Continuous variable, discrete + continuous parents (e.g., Cost)
  2. Discrete variable, continuous parents (e.g., Buys?)

### Continuous child variables
Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents. Most common is the linear Gaussian model, e.g.:

$P(Cost = c \mid Harvest = h, Subsidy = true) = N(a_t h + b_t, \sigma_t)(c)$

Mean Cost varies linearly with Harvest; variance is fixed. Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow.

### Continuous child variables contd.
An all-continuous network with LG distributions ⇒ the full joint distribution is a multivariate Gaussian. A discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

### Discrete variable with continuous parents
Probability of Buys given Cost should be a "soft" threshold. The probit distribution uses the integral of the Gaussian:

$\Phi(x) = \int_{-\infty}^{x} N(0,1)(t)\,dt, \qquad P(Buys = true \mid Cost = c) = \Phi\big((-c + \mu)/\sigma\big)$

### Why the probit?
1. It's sort of the right shape.
2. Can view as a hard threshold whose location is subject to noise.

### Discrete variable contd.
The sigmoid (or logit) distribution is also used in neural networks:

$P(Buys = true \mid Cost = c) = \dfrac{1}{1 + \exp\big(-2\,\frac{-c+\mu}{\sigma}\big)}$

The sigmoid has a similar shape to the probit but much longer tails.

## Inference in First-Order Logic (Chapter 9)

### Outline
- Reducing first-order inference to propositional inference
- Unification
- Generalized Modus Ponens
- Forward and backward chaining
- Logic programming
- Resolution

### A brief history of reasoning

| Date | Who | What |
|---|---|---|
| 450 BC | Stoics | propositional logic, inference (maybe) |
| 322 BC | Aristotle | "syllogisms" (inference rules), quantifiers |
| 1565 | Cardano | probability theory (propositional logic + uncertainty) |
| 1847 | Boole | propositional logic (again) |
| 1879 | Frege | first-order logic |
| 1922 | Wittgenstein | proof by truth tables |
| 1930 | Gödel | ∃ complete algorithm for FOL |
| 1930 | Herbrand | complete algorithm for FOL (reduce to propositional) |
| 1931 | Gödel | ¬∃ complete algorithm for arithmetic |
| 1960 | Davis/Putnam | "practical" algorithm for propositional logic |
| 1965 | Robinson | "practical" algorithm for FOL — resolution |

### Universal instantiation (UI)
Every instantiation of a universally quantified sentence is entailed by it:

$\dfrac{\forall v\ \alpha}{\mathrm{SUBST}(\{v/g\}, \alpha)}$

for any variable $v$ and ground term $g$. E.g., $\forall x\ King(x) \wedge Greedy(x) \Rightarrow Evil(x)$ yields:

    King(John) ∧ Greedy(John) ⇒ Evil(John)
    King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
    King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John))

### Existential instantiation (EI)
For any sentence $\alpha$, variable $v$, and constant symbol $k$ that does not appear elsewhere in the knowledge base:

$\dfrac{\exists v\ \alpha}{\mathrm{SUBST}(\{v/k\}, \alpha)}$

E.g., $\exists x\ Crown(x) \wedge OnHead(x, John)$ yields Crown(C₁) ∧ OnHead(C₁, John), provided C₁ is a new constant symbol, called a Skolem constant.

Another example: from $\exists x\ d(x^y)/dy = x^y$ we obtain $d(e^y)/dy = e^y$, provided $e$ is a new constant symbol.

### Existential instantiation contd.
UI can be applied several times to add new sentences; the new KB is logically equivalent to the old. EI can be applied once to replace the existential sentence; the new KB is not equivalent to the old, but is satisfiable iff the old KB was satisfiable.

### Reduction to propositional inference
Suppose the KB contains just the following:

    ∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
    King(John)
    Greedy(John)
    Brother(Richard, John)

Instantiating the universal sentence in all possible ways, we have:

    King(John) ∧ Greedy(John) ⇒ Evil(John)
    King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
    King(John), Greedy(John), Brother(Richard, John)
The new KB is propositionalized: proposition symbols are King(John), Greedy(John), Evil(John), King(Richard), etc.

### Reduction contd.
Claim: a ground sentence is entailed by the new KB iff entailed by the original KB. Claim: every FOL KB can be propositionalized so as to preserve entailment. Idea: propositionalize KB and query, apply resolution, return result. Problem: with function symbols, there are infinitely many ground terms, e.g., Father(Father(Father(John))).

Theorem (Herbrand, 1930): if a sentence $\alpha$ is entailed by an FOL KB, it is entailed by a finite subset of the propositional KB.

Idea: for $n = 0$ to $\infty$, create a propositional KB by instantiating with depth-$n$ terms; see if $\alpha$ is entailed by this KB. Problem: works if $\alpha$ is entailed, loops if $\alpha$ is not entailed.

Theorem (Turing 1936, Church 1936): entailment in FOL is semidecidable.

### Problems with propositionalization
Propositionalization seems to generate lots of irrelevant sentences. E.g., from

    ∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
    King(John)
    ∀y Greedy(y)
    Brother(Richard, John)

it seems obvious that Evil(John), but propositionalization produces lots of facts such as Greedy(Richard) that are irrelevant. With $p$ $k$-ary predicates and $n$ constants, there are $p \cdot n^k$ instantiations. With function symbols, it gets much, much worse!

### Unification
We can get the inference immediately if we can find a substitution $\theta$ such that King(x) and Greedy(x) match King(John) and Greedy(y): $\theta = \{x/John,\ y/John\}$ works.

UNIFY$(\alpha, \beta) = \theta$ if $\alpha\theta = \beta\theta$:

| p | q | θ |
|---|---|---|
| Knows(John, x) | Knows(John, Jane) | {x/Jane} |
| Knows(John, x) | Knows(y, OJ) | {x/OJ, y/John} |
| Knows(John, x) | Knows(y, Mother(y)) | {y/John, x/Mother(John)} |
| Knows(John, x) | Knows(x, OJ) | fail |

Standardizing apart eliminates overlap of variables, e.g., Knows(z₁₇, OJ).
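The table above can be reproduced by a small recursive unifier. A sketch of the common textbook formulation, not the exact algorithm from these notes; terms are tuples, variables are lowercase strings, and the occurs check is included:

```python
def is_var(t):
    """Variables are lowercase strings; constants/functors start uppercase."""
    return isinstance(t, str) and t[0].islower()

def walk(t, theta):
    """Follow variable bindings in the substitution theta."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def occurs(v, t, theta):
    t = walk(t, theta)
    if v == t:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, ti, theta) for ti in t)
    return False

def unify_var(v, t, theta):
    if occurs(v, t, theta):
        return None                      # occurs check: fail
    new = dict(theta)
    new[v] = t
    return new

def unify(x, y, theta=None):
    """Return a substitution dict with x*theta == y*theta, or None on failure."""
    if theta is None:
        theta = {}
    x, y = walk(x, theta), walk(y, theta)
    if x == y:
        return theta
    if is_var(x):
        return unify_var(x, y, theta)
    if is_var(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None
```

The fourth row of the table fails exactly as shown: binding x to John first makes Knows(John, x) and Knows(x, OJ) incompatible, which is why standardizing apart matters.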
### Generalized Modus Ponens (GMP)

$\dfrac{p_1',\ p_2',\ \dots,\ p_n',\ (p_1 \wedge p_2 \wedge \dots \wedge p_n \Rightarrow q)}{q\theta}$ where $p_i'\theta = p_i\theta$ for all $i$.

E.g., $p_1'$ is King(John), $p_1$ is King(x); $p_2'$ is Greedy(y), $p_2$ is Greedy(x); $\theta$ is {x/John, y/John}; $q$ is Evil(x); $q\theta$ is Evil(John).

GMP is used with a KB of definite clauses (exactly one positive literal). All variables are assumed universally quantified.

### Soundness of GMP
Need to show that $p_1', \dots, p_n', (p_1 \wedge \dots \wedge p_n \Rightarrow q) \models q\theta$, provided that $p_i'\theta = p_i\theta$ for all $i$.

Lemma: for any definite clause $p$, we have $p \models p\theta$ by UI.
1. $(p_1 \wedge \dots \wedge p_n \Rightarrow q) \models (p_1 \wedge \dots \wedge p_n \Rightarrow q)\theta = (p_1\theta \wedge \dots \wedge p_n\theta \Rightarrow q\theta)$
2. $p_1', \dots, p_n' \models p_1' \wedge \dots \wedge p_n' \models p_1'\theta \wedge \dots \wedge p_n'\theta$
3. From 1 and 2, $q\theta$ follows by ordinary Modus Ponens.

### Example knowledge base
The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American. Prove that Col. West is a criminal.
### Example knowledge base contd.
"... it is a crime for an American to sell weapons to hostile nations":

    American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)

"Nono ... has some missiles", i.e., ∃x Owns(Nono, x) ∧ Missile(x):

    Owns(Nono, M1) and Missile(M1)

"... all of its missiles were sold to it by Colonel West":

    ∀x Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)

Missiles are weapons:

    Missile(x) ⇒ Weapon(x)

An enemy of America counts as "hostile":

    Enemy(x, America) ⇒ Hostile(x)

"West, who is American ...": American(West). "The country Nono, an enemy of America ...": Enemy(Nono, America).

### Forward chaining algorithm

    function FOL-FC-ASK(KB, α) returns a substitution or false
      repeat until new is empty
        new ← {}
        for each sentence r in KB do
          (p1 ∧ ... ∧ pn ⇒ q) ← STANDARDIZE-APART(r)
          for each θ such that (p1 ∧ ... ∧ pn)θ = (p1' ∧ ... ∧ pn')θ
                               for some p1', ..., pn' in KB
            q' ← SUBST(θ, q)
            if q' is not a renaming of a sentence already in KB or new then
              add q' to new
              φ ← UNIFY(q', α)
              if φ is not fail then return φ
        add new to KB
      return false

### Forward chaining proof
*(Figure: three stages of the forward-chaining proof. From the facts American(West), Missile(M1), Owns(Nono, M1), Enemy(Nono, America), the first iteration derives Weapon(M1), Sells(West, M1, Nono), and Hostile(Nono); the second derives Criminal(West).)*
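The forward-chaining proof above can be sketched with the rules already instantiated for this problem; full FOL-FC-ASK would find the substitutions via unification, which this ground (propositional) sketch of mine skips:

```python
# Ground (pre-instantiated) Horn rules for the crime KB.
facts = {"American(West)", "Missile(M1)", "Owns(Nono,M1)", "Enemy(Nono,America)"}
rules = [
    ({"Missile(M1)"}, "Weapon(M1)"),
    ({"Missile(M1)", "Owns(Nono,M1)"}, "Sells(West,M1,Nono)"),
    ({"Enemy(Nono,America)"}, "Hostile(Nono)"),
    ({"American(West)", "Weapon(M1)", "Sells(West,M1,Nono)", "Hostile(Nono)"},
     "Criminal(West)"),
]

def forward_chain(facts, rules):
    """Repeat until fixpoint: fire every rule whose premises all hold,
    adding its conclusion to the fact set (cf. FOL-FC-ASK)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain(facts, rules)
```

Running it derives Weapon(M1), Sells(West, M1, Nono), Hostile(Nono), and finally Criminal(West), mirroring the proof tree in the figure.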
### Properties of forward chaining
Sound and complete for first-order definite clauses (proof similar to the propositional proof). Datalog = first-order definite clauses + no functions (e.g., the crime KB). FC terminates for Datalog in polynomially many iterations: at most $p \cdot n^k$ literals. May not terminate in general if $\alpha$ is not entailed. This is unavoidable: entailment with definite clauses is semidecidable.

### Efficiency of forward chaining
Simple observation: no need to match a rule on iteration $k$ if no premise was added on iteration $k-1$ ⇒ match each rule whose premise contains a newly added literal. Matching itself can be expensive. Database indexing allows O(1) retrieval of known facts, e.g., query Missile(x) retrieves Missile(M1). Matching conjunctive premises against known facts is NP-hard. Forward chaining is widely used in deductive databases.

### Hard matching example

    Diff(wa, nt) ∧ Diff(wa, sa) ∧ Diff(nt, q) ∧ Diff(nt, sa) ∧ Diff(q, nsw) ∧
    Diff(q, sa) ∧ Diff(nsw, v) ∧ Diff(nsw, sa) ∧ Diff(v, sa) ⇒ Colorable()
    Diff(Red, Blue), Diff(Red, Green), Diff(Green, Red),
    Diff(Green, Blue), Diff(Blue, Red), Diff(Blue, Green)

Colorable() is inferred iff the CSP has a solution. CSPs include 3SAT as a special case, hence matching is NP-hard.

### Backward chaining algorithm

    function FOL-BC-ASK(KB, goals, θ) returns a set of substitutions
      inputs: KB, a knowledge base
              goals, a list of conjuncts forming a query (θ already applied)
              θ, the current substitution, initially the empty substitution {}
      local variables: answers, a set of substitutions, initially empty
      if goals is empty then return {θ}
      q' ← SUBST(θ, FIRST(goals))
      for each sentence r in KB
          where STANDARDIZE-APART(r) = (p1 ∧ ... ∧ pn ⇒ q)
          and θ' ← UNIFY(q, q') succeeds
        new_goals ← [p1, ..., pn | REST(goals)]
        answers ← FOL-BC-ASK(KB, new_goals, COMPOSE(θ', θ)) ∪ answers
      return answers

### Backward chaining example
*(Figure: successive stages of the backward-chaining proof tree for Criminal(West), expanding the subgoals American(x), Weapon(y), Sells(x, y, z), Hostile(z) with bindings {x/West, y/M1, z/Nono}.)*

### Properties of backward chaining
Depth-first recursive proof search: space is linear in the size of the proof. Incomplete due to infinite loops ⇒ fix by checking the current goal against every goal on the stack. Inefficient due to repeated subgoals (both success and failure) ⇒ fix using caching of previous results (extra space!). Widely used (without improvements!) for logic programming.

### Logic programming
Sound bite: computation as inference on logical KBs.

| Logic programming | Ordinary programming |
|---|---|
| 1. Identify problem | Identify problem |
| 2. Assemble information | Assemble information |
| 3. Tea break | Figure out solution |
| 4. Encode information in KB | Program solution |
| 5. Encode problem instance as facts | Encode problem instance as data |
| 6. Ask queries | Apply program to data |
| 7. Find false facts | Debug procedural errors |

Should be easier to debug Capital(NewYork, US) than x := x + 2!

### Prolog systems
Basis: backward chaining with Horn clauses + bells & whistles. Widely used in Europe and Japan (basis of the 5th Generation project). Compilation techniques ⇒ approaching a billion LIPS.

Program = set of clauses = head :- literal1, ..., literaln.

    criminal(X) :- american(X), weapon(Y), sells(X,Y,Z), hostile(Z).

Efficient unification by open coding; efficient retrieval of matching clauses by direct linking; depth-first, left-to-right backward chaining; built-in predicates for arithmetic etc., e.g., X is Y*Z+3. Closed-world assumption ("negation as failure"): e.g., given alive(X) :- not dead(X), alive(joe) succeeds if dead(joe) fails.
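Negation as failure, as in the alive/dead clause above, differs from logical negation: the goal succeeds when the positive fact cannot be derived, not when its negation is proved. A minimal sketch of that closed-world reading in Python (the fact set and names are illustrative):

```python
# Closed-world sketch of: alive(X) :- not dead(X).
facts = {("dead", "fred")}

def dead(x):
    """A goal 'succeeds' iff the fact is present in the KB."""
    return ("dead", x) in facts

def alive(x):
    """Negation as failure: alive(x) succeeds iff dead(x) fails,
    i.e., absence of evidence is treated as falsity."""
    return not dead(x)
```

Here alive(joe) succeeds simply because dead(joe) is not in the KB, exactly the closed-world assumption the slide describes.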
### Prolog examples
Depth-first search from a start state X:

    dfs(X) :- goal(X).
    dfs(X) :- successor(X,S), dfs(S).

No need to loop over S: successor succeeds for each.

Appending two lists to produce a third:

    append([],Y,Y).
    append([X|L],Y,[X|Z]) :- append(L,Y,Z).

Query: append(A, B, [1,2])? Answers: A=[] B=[1,2]; A=[1] B=[2]; A=[1,2] B=[].

### Resolution: brief summary
Full first-order version:

$\dfrac{\ell_1 \vee \dots \vee \ell_k, \qquad m_1 \vee \dots \vee m_n}{(\ell_1 \vee \dots \vee \ell_{i-1} \vee \ell_{i+1} \vee \dots \vee \ell_k \vee m_1 \vee \dots \vee m_{j-1} \vee m_{j+1} \vee \dots \vee m_n)\theta}$

where $\mathrm{UNIFY}(\ell_i, \neg m_j) = \theta$. For example, ¬Rich(x) ∨ Unhappy(x) and Rich(Ken) give Unhappy(Ken) with θ = {x/Ken}. Apply resolution steps to CNF(KB ∧ ¬α); complete for FOL.

### Conversion to CNF
"Everyone who loves all animals is loved by someone":

    ∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]

1. Eliminate biconditionals and implications:

    ∀x [¬∀y ¬Animal(y) ∨ Loves(x, y)] ∨ [∃y Loves(y, x)]

2. Move ¬ inwards: ¬∀x p ≡ ∃x ¬p, ¬∃x p ≡ ∀x ¬p:

    ∀x [∃y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
    ∀x [∃y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
    ∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]

### Conversion to CNF contd.
3. Standardize variables: each quantifier should use a different one:

    ∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃z Loves(z, x)]

4. Skolemize: a more general form of existential instantiation. Each existential variable is replaced by a Skolem function of the enclosing universally quantified variables:

    ∀x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)

5. Drop universal quantifiers:

    [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)

6. Distribute ∧ over ∨:

    [Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)]

## Inference in Bayesian Networks (AIMA2e, Chapter 14.4–5)

### Outline
- Exact inference by enumeration
- Exact inference by variable elimination
- Approximate inference by stochastic simulation
- Approximate inference by Markov chain Monte Carlo

### Inference tasks
Simple queries: compute the posterior marginal $P(X_i \mid E = e)$, e.g., $P(NoGas \mid Gauge = empty, Lights = on, Starts = false)$.

Conjunctive queries: $P(X_i, X_j \mid E = e) = P(X_i \mid E = e)\,P(X_j \mid X_i, E = e)$.

Optimal decisions: decision networks include utility information; probabilistic inference is required for $P(outcome \mid action, evidence)$.

Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical? Explanation: why do I need a new starter motor?

### Inference by enumeration
A slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation. Simple query on the burglary network:

$P(B \mid j, m) = P(B, j, m)/P(j, m) = \alpha\,P(B, j, m) = \alpha \sum_e \sum_a P(B, e, a, j, m)$

Rewrite full joint entries using products of CPT entries:

$P(B \mid j, m) = \alpha \sum_e \sum_a P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a) = \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$

Recursive depth-first enumeration: $O(n)$ space, $O(d^n)$ time.

### Enumeration algorithm

    function ENUMERATION-ASK(X, e, bn) returns a distribution over X
      inputs: X, the query variable
              e, observed values for variables E
              bn, a Bayesian network with variables {X} ∪ E ∪ Y
      Q(X) ← a distribution over X, initially empty
      for each value xi of X do
        extend e with value xi for X
        Q(xi) ← ENUMERATE-ALL(VARS[bn], e)
      return NORMALIZE(Q(X))

    function ENUMERATE-ALL(vars, e) returns a real number
      if EMPTY?(vars) then return 1.0
      Y ← FIRST(vars)
      if Y has value y in e
        then return P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
        else return Σ_y P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e_y)
             where e_y is e extended with Y = y

### Evaluation tree
Enumeration is inefficient: repeated computation, e.g., it computes $P(j \mid a)P(m \mid a)$ for each value of $e$.

*(Figure: the evaluation tree, branching over values of $e$ and $a$.)*

### Inference by variable elimination
Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation:

$P(B \mid j, m) = \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
$= \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,f_J(a)\,f_M(a)$
$= \alpha\,P(B) \sum_e P(e) \sum_a f_A(a, b, e)\,f_J(a)\,f_M(a)$
$= \alpha\,P(B) \sum_e P(e)\,f_{\bar{A}JM}(b, e)$ (sum out $A$)
$= \alpha\,P(B)\,f_{\bar{E}\bar{A}JM}(b)$ (sum out $E$)
$= \alpha\,f_B(b) \times f_{\bar{E}\bar{A}JM}(b)$

### Variable elimination: basic operations
Summing out a variable from a product of factors: move any constant factors outside the summation, then add up submatrices in the pointwise product of the remaining factors:

$\sum_x f_1 \times \dots \times f_k = f_1 \times \dots \times f_i \sum_x f_{i+1} \times \dots \times f_k = f_1 \times \dots \times f_i \times f_{\bar{X}}$

assuming $f_1, \dots, f_i$ do not depend on $X$.
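The enumeration query above can be reproduced in a few lines, using the standard burglary-network CPT values from the slides. A sketch of ENUMERATE-ALL specialized to this network (function and variable names are mine); the exact answer is $P(B \mid j, m) = \langle 0.284, 0.716 \rangle$:

```python
def P(var, value, assigned):
    """CPT lookup for the burglary network (standard values from the slides)."""
    if var == "B":
        p = 0.001
    elif var == "E":
        p = 0.002
    elif var == "A":
        p = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}[(assigned["B"], assigned["E"])]
    elif var == "J":
        p = 0.90 if assigned["A"] else 0.05
    else:  # "M"
        p = 0.70 if assigned["A"] else 0.01
    return p if value else 1.0 - p

def enumerate_all(variables, evidence):
    """Depth-first sum over the remaining variables (cf. ENUMERATE-ALL)."""
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in evidence:
        return P(first, evidence[first], evidence) * enumerate_all(rest, evidence)
    return sum(P(first, v, evidence) * enumerate_all(rest, {**evidence, first: v})
               for v in (True, False))

def enumeration_ask(X, evidence):
    order = ["B", "E", "A", "J", "M"]        # a topological order
    q = {v: enumerate_all(order, {**evidence, X: v}) for v in (True, False)}
    return q[True] / (q[True] + q[False])    # normalize

p_burglary = enumeration_ask("B", {"J": True, "M": True})
```

The normalized result is about 0.284, the familiar posterior probability of a burglary given that both John and Mary call.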
Pointwise product of factors $f_1$ and $f_2$:

$f_1(x_1, \dots, x_j, y_1, \dots, y_k) \times f_2(y_1, \dots, y_k, z_1, \dots, z_l) = f(x_1, \dots, x_j, y_1, \dots, y_k, z_1, \dots, z_l)$

E.g., $f_1(a, b) \times f_2(b, c) = f(a, b, c)$.

### Variable elimination algorithm

    function ELIMINATION-ASK(X, e, bn) returns a distribution over X
      inputs: X, the query variable
              e, evidence specified as an event
              bn, a belief network specifying joint distribution P(X1, ..., Xn)
      factors ← []; vars ← REVERSE(VARS[bn])
      for each var in vars do
        factors ← [MAKE-FACTOR(var, e) | factors]
        if var is a hidden variable then factors ← SUM-OUT(var, factors)
      return NORMALIZE(POINTWISE-PRODUCT(factors))

### Irrelevant variables
Consider the query $P(JohnCalls \mid Burglary = true)$:

$P(J \mid b) = \alpha\,P(b) \sum_e P(e) \sum_a P(a \mid b, e)\,P(J \mid a) \sum_m P(m \mid a)$

The sum over $m$ is identically 1; $MaryCalls$ is irrelevant to the query.

Thm 1: $Y$ is irrelevant unless $Y \in Ancestors(\{X\} \cup E)$. Here $X = JohnCalls$, $E = \{Burglary\}$, and $Ancestors(\{X\} \cup E) = \{Alarm, Earthquake\}$, so $MaryCalls$ is irrelevant. (Compare this to backward chaining from the query in Horn clause KBs.)

### Irrelevant variables contd.
Defn: the moral graph of a Bayes net: marry all parents and drop arrows.
Defn: $A$ is m-separated from $B$ by $C$ iff separated by $C$ in the moral graph.
Thm 2: $Y$ is irrelevant if m-separated from $X$ by $E$.
For $P(JohnCalls \mid Alarm = true)$, both Burglary and Earthquake are irrelevant.

### Complexity of exact inference
Singly connected networks (or polytrees): any two nodes are connected by at most one (undirected) path; time and space cost of variable elimination are $O(d^k n)$.

Multiply connected networks: can reduce 3SAT to exact inference ⇒ NP-hard; equivalent to counting 3SAT models ⇒ #P-complete.

*(Figure: reduction of a 3-CNF formula to a Bayes net.)*

### Inference by stochastic simulation
Basic idea:
1. Draw $N$ samples from a sampling distribution $S$
2. Compute an approximate posterior probability $\hat{P}$
3. Show this converges to the true probability $P$

Outline:
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing with evidence
- Likelihood weighting: use evidence to weight samples
- Markov chain Monte Carlo (MCMC): sample from a stochastic process whose stationary distribution is the true posterior
### Sampling from an empty network

    function PRIOR-SAMPLE(bn) returns an event sampled from bn
      inputs: bn, a belief network specifying joint distribution P(X1, ..., Xn)
      x ← an event with n elements
      for i = 1 to n do
        xi ← a random sample from P(Xi | Parents(Xi))
      return x

### Example
*(Figure: the sprinkler network, sampled one variable at a time. CPTs: $P(C) = 0.5$; $P(S \mid c) = 0.1$, $P(S \mid \neg c) = 0.5$; $P(R \mid c) = 0.8$, $P(R \mid \neg c) = 0.2$; $P(W \mid s, r) = 0.99$, $P(W \mid s, \neg r) = 0.90$, $P(W \mid \neg s, r) = 0.90$, $P(W \mid \neg s, \neg r) = 0.01$.)*

### Sampling from an empty network contd.
Probability that PRIOR-SAMPLE generates a particular event:

$S_{PS}(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \dots, x_n)$

i.e., the true prior probability. E.g., $S_{PS}(t, f, t, t) = 0.5 \times 0.9 \times 0.8 \times 0.9 = 0.324 = P(t, f, t, t)$.

Let $N_{PS}(x_1, \dots, x_n)$ be the number of samples generated for event $x_1, \dots, x_n$. Then we have

$\lim_{N\to\infty} \hat{P}(x_1, \dots, x_n) = \lim_{N\to\infty} N_{PS}(x_1, \dots, x_n)/N = S_{PS}(x_1, \dots, x_n) = P(x_1, \dots, x_n)$

That is, estimates derived from PRIOR-SAMPLE are consistent. Shorthand: $\hat{P}(x_1, \dots, x_n) \approx P(x_1, \dots, x_n)$.

### Rejection sampling
$\hat{P}(X \mid e)$ is estimated from samples agreeing with $e$:

    function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X|e)
      local variables: N, a vector of counts over X, initially zero
      for j = 1 to N do
        x ← PRIOR-SAMPLE(bn)
        if x is consistent with e then
          N[x] ← N[x] + 1 where x is the value of X in x
      return NORMALIZE(N[X])

E.g., estimate $P(Rain \mid Sprinkler = true)$ using 100 samples. 27 samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false.

$\hat{P}(Rain \mid Sprinkler = true) = \mathrm{NORMALIZE}(\langle 8, 19 \rangle) = \langle 0.296, 0.704 \rangle$

Similar to a basic real-world empirical estimation procedure.

### Analysis of rejection sampling

$\hat{P}(X \mid e) = \alpha\,N_{PS}(X, e)$ (algorithm definition)
$= N_{PS}(X, e)/N_{PS}(e)$ (normalized by $N_{PS}(e)$)
$\approx P(X, e)/P(e)$ (property of PRIOR-SAMPLE)
$= P(X \mid e)$ (definition of conditional probability)

Hence rejection sampling returns consistent posterior estimates. Problem: hopelessly expensive if $P(e)$ is small; $P(e)$ drops off exponentially with the number of evidence variables!
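The rejection-sampling estimate of $P(Rain \mid Sprinkler = true)$ can be sketched directly on the sprinkler network, using the CPT numbers from the figure (the seed and sample count are my choices; the exact posterior is 0.30):

```python
import random

rng = random.Random(0)

def prior_sample():
    """PRIOR-SAMPLE on the sprinkler network (CPT numbers from the figure)."""
    c = rng.random() < 0.5
    s = rng.random() < (0.1 if c else 0.5)
    r = rng.random() < (0.8 if c else 0.2)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = rng.random() < p_w
    return c, s, r, w

def rejection_sample_rain_given_sprinkler(n=20000):
    """Estimate P(Rain | Sprinkler=true) by discarding samples with s=false."""
    counts = [0, 0]
    for _ in range(n):
        c, s, r, w = prior_sample()
        if s:                       # keep only samples consistent with evidence
            counts[r] += 1
    return counts[True] / (counts[True] + counts[False])

estimate = rejection_sample_rain_given_sprinkler()
```

Since $P(s) = 0.3$, about 70% of the samples are thrown away here, which is exactly the expense the analysis above warns about as evidence accumulates.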
PRIOR-SAMPLE)
           = P(X|e)                (defn of conditional probability)
Hence rejection sampling returns consistent posterior estimates.

Problem: hopelessly expensive if P(e) is small.
P(e) drops off exponentially with the number of evidence variables!

Likelihood weighting

Idea: fix evidence variables, sample only nonevidence variables, and weight each sample by the likelihood it accords the evidence.

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X|e)
   local variables: W, a vector of weighted counts over X, initially zero
   for j = 1 to N do
       x, w <- WEIGHTED-SAMPLE(bn, e)
       W[x] <- W[x] + w where x is the value of X in x
   return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
   x <- an event with n elements; w <- 1
   for i = 1 to n do
       if Xi has a value xi in e
           then w <- w x P(Xi = xi | Parents(Xi))
           else xi <- a random sample from P(Xi | Parents(Xi))
   return x, w

Likelihood weighting example

Estimate P(Rain | Sprinkler = true, WetGrass = true) with the same CPTs as above. Sampling Cloudy = true, then fixing Sprinkler = true, gives w = 1.0 x 0.1; sampling Rain = true, then fixing WetGrass = true, gives w = 1.0 x 0.1 x 0.99 = 0.099.

Likelihood weighting analysis

Sampling probability for WEIGHTED-SAMPLE is
S_WS(z, e) = prod_{i=1..l} P(zi | parents(Zi))
Note: pays attention to evidence in ancestors only
=> somewhere "in between" the prior and the posterior distribution.

Weight for a given sample z, e is
w(z, e) = prod_{i=1..m} P(ei | parents(Ei))

Weighted sampling probability is
S_WS(z, e) w(z, e) = prod_{i=1..l} P(zi | parents(Zi)) prod_{i=1..m} P(ei | parents(Ei)) = P(z, e)
by standard global semantics of the network.

Hence likelihood weighting returns consistent estimates, but performance still degrades with many evidence variables, because a few samples have nearly all the total weight.

Approximate inference using MCMC

"State" of network = current assignment to all variables.
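Before moving on, the likelihood-weighting scheme just analyzed can be sketched concretely. Below is a minimal implementation for the Cloudy/Sprinkler/Rain/WetGrass network from the slides; the function names and the hard-coded CPT layout are illustrative choices, not the AIMA code release.

```python
# Minimal sketch of likelihood weighting on the sprinkler network.
# CPTs are taken from the slides; names here are illustrative.
import random

def weighted_sample(evidence, rng):
    """Walk the variables in topological order: fix evidence variables
    (multiplying their likelihood into the weight), sample the rest."""
    w = 1.0
    x = dict(evidence)
    # Cloudy: P(c) = 0.5 either way
    if 'C' in x:
        w *= 0.5
    else:
        x['C'] = rng.random() < 0.5
    # Sprinkler: P(s|c) = 0.10, P(s|~c) = 0.50
    ps = 0.10 if x['C'] else 0.50
    if 'S' in x:
        w *= ps if x['S'] else 1 - ps
    else:
        x['S'] = rng.random() < ps
    # Rain: P(r|c) = 0.80, P(r|~c) = 0.20
    pr = 0.80 if x['C'] else 0.20
    if 'R' in x:
        w *= pr if x['R'] else 1 - pr
    else:
        x['R'] = rng.random() < pr
    # WetGrass: P(w|s,r) from the CPT
    pw = {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.01}[(x['S'], x['R'])]
    if 'W' in x:
        w *= pw if x['W'] else 1 - pw
    else:
        x['W'] = rng.random() < pw
    return x, w

def likelihood_weighting(query_var, evidence, n, seed=0):
    rng = random.Random(seed)
    counts = {True: 0.0, False: 0.0}
    for _ in range(n):
        x, w = weighted_sample(evidence, rng)
        counts[x[query_var]] += w
    return counts[True] / (counts[True] + counts[False])

# P(Rain | Sprinkler = true, WetGrass = true); the exact answer is ~0.320
estimate = likelihood_weighting('R', {'S': True, 'W': True}, 100_000)
```

Every sample is used (none are rejected); the weight corrects for the fact that the evidence values were forced rather than sampled.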
Generate next state by sampling one variable given its Markov blanket. Sample each variable in turn, keeping evidence fixed.

function MCMC-ASK(X, e, bn, N) returns an estimate of P(X|e)
   local variables: N[X], a vector of counts over X, initially zero
                    Z, the nonevidence variables in bn
                    x, the current state of the network, initially copied from e
   initialize x with random values for the variables in Z
   for j = 1 to N do
       N[x] <- N[x] + 1 where x is the value of X in x
       for each Zi in Z do
           sample the value of Zi in x from P(Zi | MB(Zi)), given the values of MB(Zi) in x
   return NORMALIZE(N[X])

Can also choose a variable to sample at random each time.

The Markov chain

With Sprinkler = true and WetGrass = true, there are four states. Wander about for a while, average what you see.

MCMC example contd.

Estimate P(Rain | Sprinkler = true, WetGrass = true).
Sample Cloudy or Rain given its Markov blanket, repeat. Count the number of times Rain is true and false in the samples.
E.g., visit 100 states: 31 have Rain = true, 69 have Rain = false.
P-hat(Rain | Sprinkler = true, WetGrass = true) = NORMALIZE(<31, 69>) = <0.31, 0.69>

Theorem: the chain approaches its stationary distribution, i.e., the long-run fraction of time spent in each state is exactly proportional to its posterior probability.

Markov blanket sampling

Markov blanket of Cloudy is Sprinkler and Rain; Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass.

Probability given the Markov blanket is calculated as follows:
P(xi' | MB(Xi)) = alpha P(xi' | parents(Xi)) prod_{Zj in Children(Xi)} P(zj | parents(Zj))

Easily implemented in message-passing parallel systems, brains.

Main computational problems:
1) Difficult to tell if convergence has been achieved
2) Can be wasteful if the Markov blanket is large: P(Xi | MB(Xi)) won't change much (law of large numbers)

BAYESIAN NETWORKS
CHAPTER 14.1-4.3

Outline
- Syntax
- Semantics
- Parameterized distributions

Bayesian networks

A simple graphical notation for conditional independence assertions and hence for compact specification of full joint
distributions.

Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (link ~ "directly influences")
- a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

Example

Topology of network encodes conditional independence assertions:
Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Example contd.

P(B) = .001; P(E) = .002
P(A|B,E): b,e .95; b,not-e .94; not-b,e .29; not-b,not-e .001
P(J|A): a .90; not-a .05
P(M|A): a .70; not-a .01

Compactness

A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p).

If each variable has no more than k parents, the complete network requires O(n 2^k) numbers, i.e., grows linearly with n, vs. O(2^n) for the full joint distribution.

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31).

Global semantics

Global semantics defines the full joint distribution as the product of the local conditional distributions:
P(x1, ..., xn) = prod_{i=1..n} P(xi | parents(Xi))
E.g., P(j ^ m ^ a ^ not-b ^ not-e)
    = P(j|a) P(m|a) P(a|not-b,not-e) P(not-b) P(not-e)
    = 0.9 x 0.7 x 0.001 x 0.999 x 0.998
    ~ 0.00063

Local
semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.
Theorem: Local semantics <=> global semantics

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X1, ..., Xn
2. For i = 1 to n:
   add Xi to the network
   select parents from X1, ..., Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, ..., Xi-1)

This choice of parents guarantees the global semantics:
P(X1, ..., Xn) = prod_{i=1..n} P(Xi | X1, ..., Xi-1)   (chain rule)
              = prod_{i=1..n} P(Xi | Parents(Xi))      (by construction)

Example

Suppose we choose the ordering M, J, A, B, E. (Built up over several slides:)
P(J|M) = P(J)? No
P(A|J,M) = P(A|J)? P(A|J,M) = P(A)? No
P(B|A,J,M) = P(B|A)? Yes
P(B|A,J,M) = P(B)? No
P(E|B,A,J,M) = P(E|A)? No
P(E|B,A,J,M) = P(E|A,B)? Yes

Example contd.

Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) Assessing conditional probabilities is hard in noncausal directions, and the network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

Example: Car diagnosis

Initial evidence: car won't start. Testable variables (green), "broken, so fix it" variables (orange); hidden variables (gray) ensure sparse structure, reduce parameters.

Example: Car insurance

(Network over variables such as GoodStudent, Age, SocioEcon, RiskAversion, ..., MedicalCost, LiabilityCost, PropertyCost.)

Compact conditional
distributions

CPT grows exponentially with the number of parents; a CPT becomes infinite with a continuous-valued parent or child.

Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f.
E.g., Boolean functions: NorthAmerican <=> Canadian v US v Mexican
E.g., numerical relationships among continuous variables:
dLevel/dt = inflow + precipitation - outflow - evaporation

Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:
1) Parents U1, ..., Uk include all causes (can add leak node)
2) Independent failure probability qi for each cause alone
=> P(X | U1, ..., Uj, not-Uj+1, ..., not-Uk) = 1 - prod_{i=1..j} qi

Cold Flu Malaria | P(Fever) | P(not-Fever)
 F    F    F     | 0.0      | 1.0
 F    F    T     | 0.9      | 0.1
 F    T    F     | 0.8      | 0.2
 F    T    T     | 0.98     | 0.02  = 0.2 x 0.1
 T    F    F     | 0.4      | 0.6
 T    F    T     | 0.94     | 0.06  = 0.6 x 0.1
 T    T    F     | 0.88     | 0.12  = 0.6 x 0.2
 T    T    T     | 0.988    | 0.012 = 0.6 x 0.2 x 0.1

Number of parameters linear in the number of parents.

Hybrid (discrete + continuous) networks

Discrete (Subsidy? and Buys?); continuous (Harvest and Cost).
Option 1: discretization -- possibly large errors, large CPTs
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete + continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

Continuous child variables

Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents.

Most common is the linear Gaussian model, e.g.:
P(Cost = c | Harvest = h, Subsidy? = true) = N(a_t h + b_t, sigma_t)(c)
Mean Cost varies linearly with Harvest; variance is fixed. Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow.

Continuous child variables contd.

An all-continuous network with LG distributions => the full joint distribution is a multivariate Gaussian.
A discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for
each combination of discrete variable values.

Discrete variable w/ continuous parents

Probability of Buys? given Cost should be a "soft" threshold.
Probit distribution uses the integral of the Gaussian:
Phi(x) = integral_{-inf..x} N(0,1)(t) dt
P(Buys? = true | Cost = c) = Phi((-c + mu)/sigma)

Why the probit?
1. It's sort of the right shape
2. Can be viewed as a hard threshold whose location is subject to noise

Discrete variable contd.

Sigmoid (or logit) distribution, also used in neural networks:
P(Buys? = true | Cost = c) = 1 / (1 + exp(-2(-c + mu)/sigma))
Sigmoid has a similar shape to the probit but much longer tails.

INTELLIGENT AGENTS
CHAPTER 2

Reminders
Assignment 0 (lisp refresher) due 1/28
Lisp/emacs/AIMA tutorial: 11-1 today and Monday, 271 Soda

Outline
- Agents and environments
- Rationality
- PEAS (Performance measure, Environment, Actuators, Sensors)
- Environment types
- Agent types

Agents and environments

Agents include humans, robots, softbots, thermostats, etc.
The agent function maps from percept histories to actions: f : P* -> A
The agent program runs on the physical architecture to produce f.

Vacuum-cleaner world

Percepts: location and contents, e.g., [A, Dirty]
Actions: Left, Right, Suck, NoOp

A vacuum-cleaner agent

Percept sequence -> Action:
[A, Clean] -> Right; [A, Dirty] -> Suck; [B, Clean] -> Left; [B, Dirty] -> Suck;
[A, Clean], [A, Clean] -> Right; [A, Clean], [A, Dirty] -> Suck; ...

What is the right function? Can it be implemented in a small agent program?

function REFLEX-VACUUM-AGENT([location, status]) returns an action
   if status = Dirty then return Suck
   else if location = A then return Right
   else if location = B then return Left

Rationality

Fixed performance measure evaluates the environment sequence:
- one point per square cleaned up in time T?
- one point per clean square per
time step, minus one per move?
- penalize for > k dirty squares?

A rational agent chooses whichever action maximizes the expected value of the performance measure given the percept sequence to date.

Rational =/= omniscient: percepts may not supply all relevant information
Rational =/= clairvoyant: action outcomes may not be as expected
Hence, rational =/= successful
Rational => exploration, learning, autonomy

PEAS

To design a rational agent, we must specify the task environment. Consider, e.g., the task of designing an automated taxi:
Performance measure: safety, destination, profits, legality, comfort, ...
Environment: US streets/freeways, traffic, pedestrians, weather, ...
Actuators: steering, accelerator, brake, horn, speaker/display, ...
Sensors: video, accelerometers, gauges, engine sensors, keyboard, GPS, ...

Internet shopping agent

Performance measure: price, quality, appropriateness, efficiency
Environment: current and future WWW sites, vendors, shippers
Actuators: display to user, follow URL, fill in form
Sensors: HTML pages (text, graphics, scripts)

Environment types

(Filled in attribute by attribute over several slides for:)
Solitaire | Backgammon | Internet shopping | Taxi
Observable:    Yes  | Yes  | No     | No
Deterministic: Yes  | No   | Partly | No
Episodic:      No   | No   | No     | No
Static:        Yes  | Semi | Semi   | No
Discrete:      Yes  | Yes  | Yes    | No
Single-agent:  Yes  | No   | Yes (except auctions) | No

The environment type largely determines the agent design. The real world is (of course) partially observable, stochastic, sequential, dynamic, continuous, multi-agent.

Agent types

Four basic types in order of increasing generality:
- simple reflex agents
- reflex agents with state
- goal-based agents
- utility-based agents
All these can be turned into learning agents.

Simple reflex agents

(Sensors report "what the world is like now"; condition-action rules determine "what action I should do now"; actuators act on the environment.)

Example

function REFLEX-VACUUM-AGENT([location, status]) returns an action
   if status = Dirty then return Suck
   else if location = A then return Right
   else if location = B then return Left

(setq joe (make-agent :name 'joe :body (make-agent-body)
                      :program (make-reflex-vacuum-agent-program)))

(defun make-reflex-vacuum-agent-program ()
  (lambda (percept)
    (let ((location (first percept)) (status (second percept)))
      (cond ((eq status 'dirty) 'Suck)
            ((eq location 'A) 'Right)
            ((eq location 'B) 'Left)))))

Reflex agents with state

(Internal state tracks "how the world evolves" and "what my actions do", combined with the current percept.)

Example

function REFLEX-VACUUM-AGENT([location, status]) returns an action
   static: last_A, last_B, numbers, initially infinity
   if status = Dirty then ...

(defun make-reflex-vacuum-agent-with-state-program ()
  (let ((last-A infinity) (last-B infinity))
    (lambda
(percept)
      (let ((location (first percept)) (status (second percept)))
        (incf last-A) (incf last-B)
        (cond ((eq status 'dirty)
               (if (eq location 'A) (setq last-A 0) (setq last-B 0))
               'Suck)
              ((eq location 'A) (if (> last-B 3) 'Right 'NoOp))
              ((eq location 'B) (if (> last-A 3) 'Left 'NoOp)))))))

Goal-based agents

(Add goals: the agent predicts "what it will be like if I do action A" and chooses actions that achieve the goals.)

Utility-based agents

(Add a utility function: "how happy will I be in such a state?" determines the choice among actions.)

Learning agents

(The performance element is augmented with a critic, a learning element, and a problem generator, judged against a performance standard.)

RATIONAL DECISIONS
CHAPTER 16

Outline
- Rational preferences
- Utilities
- Money
- Multiattribute utilities
- Decision networks
- Value of information

Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes.
Lottery L = [p, A; (1-p), B]
Notation:
A > B    A preferred to B
A ~ B    indifference between A and B
A >~ B   B not preferred to A

Rational preferences

Idea: preferences of a rational agent must obey constraints.
Rational preferences => behavior describable as maximization of expected utility.

Constraints:
Orderability: (A > B) v (B > A) v (A ~ B)
Transitivity: (A > B) ^ (B > C) => (A > C)
Continuity: A > B > C => exists p: [p, A; 1-p, C] ~ B
Substitutability: A ~ B => [p, A; 1-p, C] ~ [p, B; 1-p, C]
Monotonicity: A > B => (p >= q <=> [p, A; 1-p, B] >~ [q, A; 1-q, B])

Rational preferences contd.

Violating the constraints leads to self-evident irrationality. For example, an agent with intransitive preferences can be induced to give away all its money:
If B > C, then an agent who has C would pay (say) 1 cent to get B.
If A > B, then an agent who has B would pay (say) 1 cent to get A.
If
C > A, then an agent who has A would pay (say) 1 cent to get C.

Maximizing expected utility

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that
U(A) >= U(B) <=> A >~ B
U([p1, S1; ...; pn, Sn]) = sum_i pi U(Si)

MEU principle: Choose the action that maximizes expected utility.

Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities. E.g., a lookup table for perfect tic-tac-toe.

Utilities

Utilities map states to real numbers. Which numbers?
Standard approach to assessment of human utilities: compare a given state A to a standard lottery L_p that has
"best possible prize" u_top with probability p
"worst possible catastrophe" u_bot with probability (1 - p)
Adjust lottery probability p until A ~ L_p.
E.g., "pay $30" ~ [p, continue as before; 1-p, instant death].

Utility scales

Normalized utilities: u_top = 1.0, u_bot = 0.0
Micromorts: one-millionth chance of death; useful for Russian roulette, paying to reduce product risks, etc.
QALYs: quality-adjusted life years; useful for medical decisions involving substantial risk.
Note: behavior is invariant w.r.t. +ve linear transformation U'(x) = k1 U(x) + k2, where k1 > 0.
With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes.

Money

Money does not behave as a utility function. Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse. Utility curve: for what probability p am I indifferent between a prize x and a lottery [p, $M; 1-p, $0] for large M? Typical empirical data, extrapolated with risk-prone behavior for negative wealth.

Student group utility

For each x from $0 to about $9000, adjust p until half the class votes for the lottery [p, $10,000; 1-p, $0].

Decision networks

Add action nodes and utility nodes to belief networks to enable
rational decision making. (Example: airport siting, with chance nodes such as Litigation feeding a utility node.)

Algorithm:
For each value of the action node, compute the expected value of the utility node given the action and the evidence. Return the MEU action.

Multiattribute utility

How can we handle utility functions of many variables X1, ..., Xn? E.g., what is U(Deaths, Noise, Cost)? How can complex utility functions be assessed from preference behaviour?

Idea 1: identify conditions under which decisions can be made without complete identification of U(x1, ..., xn)
Idea 2: identify various types of independence in preferences and derive consequent canonical forms for U(x1, ..., xn)

Strict dominance

Typically define attributes such that U is monotonic in each. Strict dominance: choice B strictly dominates choice A iff for all i, Xi(B) >= Xi(A) (and hence U(B) >= U(A)). Strict dominance seldom holds in practice.

Stochastic dominance

Distribution p1 stochastically dominates distribution p2 iff
for all t: integral_{-inf..t} p1(x) dx <= integral_{-inf..t} p2(x) dx
If U is monotonic in x, then A1 with outcome distribution p1 stochastically dominates A2 with outcome distribution p2:
integral p1(x) U(x) dx >= integral p2(x) U(x) dx
Multiattribute case: stochastic dominance on all attributes => optimal.

Stochastic dominance contd.

Stochastic dominance can often be determined without exact distributions, using qualitative reasoning.
E.g., construction cost increases with distance from the city; S1 is closer to the city than S2 => S1 stochastically dominates S2 on cost.
E.g., injury increases with collision speed.
Can annotate belief networks with stochastic dominance information:
X -+-> Y (X positively influences Y) means that for every value z of Y's other parents Z,
for all x1, x2: x1 >= x2 => P(Y | x1, z) stochastically dominates P(Y | x2, z)

Label the arcs + or -
(Exercise, worked over several slides: label the arcs of the car-insurance network -- GoodStudent, MakeModel, ..., MedicalCost, LiabilityCost, PropertyCost -- with + or - according to the direction of influence.)

Preference structure: Deterministic

X1 and X2 preferentially independent of X3 iff preference between (x1, x2, x3) and (x1', x2', x3) does not depend on x3.
E.g., (Noise, Cost, Safety): (20,000 suffer, $4.6 billion, 0.06 deaths/mpm) vs. (70,000 suffer, $4.2 billion, 0.06 deaths/mpm)

Theorem (Leontief, 1947): if every pair of attributes is P.I. of its complement, then every subset of attributes is P.I. of its complement: mutual P.I.
Theorem (Debreu, 1960): mutual P.I. => there exists an additive value function
V(S) = sum_i Vi(Xi(S))
Hence assess n single-attribute functions; often a good approximation.

Preference structure: Stochastic

Need to consider preferences over lotteries: X is utility-independent of Y iff preferences over lotteries in X do not depend on y.
Mutual U.I.: each subset is U.I. of its complement => there exists a multiplicative utility function:
U = k1 U1 + k2 U2 + k3 U3 + k1 k2 U1 U2 + k2 k3 U2 U3 + k3 k1 U3 U1 + k1 k2 k3 U1 U2 U3
Routine procedures and software packages exist for generating preference tests to identify various canonical families of utility functions.

Value of information

Idea: compute the value of acquiring each possible piece of evidence. Can be done directly from the decision network.

Example: buying oil drilling rights.
Two blocks A and B, exactly one has oil, worth k. Prior probabilities 0.5 each, mutually exclusive. Current price of each block is k/2. Consultant offers accurate survey of A. Fair price?

Solution: compute expected value of information = expected value of best action given the information minus expected value of best action without information.

Survey may say "oil in
A" or "no oil in A", prob. 0.5 each (given!)
= [0.5 x value of "buy A" given "oil in A"] + [0.5 x value of "buy B" given "no oil in A"] - 0
= (0.5 x k/2) + (0.5 x k/2) - 0 = k/2

General formula

Current evidence E, current best action alpha. Possible action outcomes Si, potential new evidence Ej.
EU(alpha | E) = max_a sum_i U(Si) P(Si | E, a)
Suppose we knew Ej = ejk; then we would choose alpha_ejk s.t.
EU(alpha_ejk | E, Ej = ejk) = max_a sum_i U(Si) P(Si | E, a, Ej = ejk)
Ej is a random variable whose value is currently unknown
=> must compute the expected gain over all possible values:
VPI_E(Ej) = (sum_k P(Ej = ejk | E) EU(alpha_ejk | E, Ej = ejk)) - EU(alpha | E)
(VPI = value of perfect information)

Properties of VPI

Nonnegative -- in expectation, not post hoc: for all j, E: VPI_E(Ej) >= 0
Nonadditive -- consider, e.g., obtaining Ej twice: VPI_E(Ej, Ek) =/= VPI_E(Ej) + VPI_E(Ek)
Order-independent: VPI_E(Ej, Ek) = VPI_E(Ej) + VPI_{E,Ej}(Ek) = VPI_E(Ek) + VPI_{E,Ek}(Ej)

Note: when more than one piece of evidence can be gathered, maximizing VPI for each to select one is not always optimal => evidence-gathering becomes a sequential decision problem.

LEARNING FROM OBSERVATIONS
CHAPTER 18, SECTIONS 1-3

Outline
- Learning agents
- Inductive learning
- Decision tree learning
- Measuring learning performance

Learning

Learning is essential for unknown environments, i.e., when the designer lacks omniscience. Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down. Learning modifies the agent's decision mechanisms to improve performance.

Learning agents

(The performance element is augmented with a critic, a learning element, and a problem generator, judged against a performance standard.)

Learning element

Design of the learning element is dictated by
- what type of performance element is used
- which functional component is to be learned
- how that functional component is represented
- what kind of feedback is available

Example scenarios:
Performance element | Component         | Representation           | Feedback
Alpha-beta search   | Eval. fn.         | Weighted linear function | Win/loss
Logical agent       | Transition model  | Successor-state axioms   | Outcome
Utility-based agent | Transition model  | Dynamic Bayes net        | Outcome
Simple reflex agent | Percept-action fn.| Neural net               | Correct action

Supervised learning: correct answers for each instance
Reinforcement learning: occasional rewards

Inductive learning (a.k.a. Science)

Simplest form: learn a function from examples (tabula rasa). f is the target function; an example is a pair (x, f(x)).
Problem: find a hypothesis h such that h ~ f, given a training set of examples.

This is a highly simplified model of real learning:
- Ignores prior knowledge
- Assumes a deterministic, observable "environment"
- Assumes examples are given
- Assumes that the agent wants to learn f (why?)

Inductive learning method

Construct/adjust h to agree with f on the training set. (h is consistent if it agrees with f on all examples.) E.g., curve fitting: the slides show successively more complex hypotheses fit to the same data points.

Ockham's razor: maximize a combination of consistency and simplicity.
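The consistency/simplicity trade-off behind Ockham's razor can be illustrated with a tiny curve-fitting sketch: a Lagrange interpolating polynomial is consistent with every training example, but it is far less simple than a straight line. The data points and function names below are made up for illustration.

```python
# Lagrange interpolation: the unique degree-(n-1) polynomial through n points.
# It is perfectly consistent with the training set, but complex (high degree).
def lagrange(points):
    """Return h(x), the interpolating polynomial through all (x, y) points."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)  # Lagrange basis factor
            total += term
        return total
    return h

train = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0)]  # hypothetical data
h = lagrange(train)
# Consistency check: h agrees with f on every training example.
consistent = all(abs(h(x) - y) < 1e-9 for x, y in train)
```

A line fit to the same points would miss some of them, yet often generalizes better; Ockham's razor is about trading that residual error against the polynomial's complexity.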
Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous, etc.). E.g., situations where I will/won't wait for a table:

Example | Alt Bar Fri Hun Pat   Price Rain Res Type    Est   | WillWait
X1      |  T   F   F   T  Some   .     F    T  French  0-10  | T
X2      |  T   F   F   T  Full   .     F    F  Thai    30-60 | F
X3      |  F   T   F   F  Some   .     F    F  Burger  0-10  | T
X4      |  T   F   T   T  Full   .     F    F  Thai    10-30 | T
X5      |  T   F   T   F  Full   .     F    T  French  >60   | F
X6      |  F   T   F   T  Some   .     T    T  Italian 0-10  | T
X7      |  F   T   F   F  None   .     T    F  Burger  0-10  | F
X8      |  F   F   F   T  Some   .     T    T  Thai    0-10  | T
X9      |  F   T   T   F  Full   .     T    F  Burger  >60   | F
X10     |  T   T   T   T  Full   .     F    T  Italian 10-30 | F
X11     |  F   F   F   F  None   .     F    F  Thai    0-10  | F
X12     |  T   T   T   T  Full   .     F    F  Burger  30-60 | T

Classification of examples is positive (T) or negative (F).

Decision trees

One possible representation for hypotheses. E.g., here is the "true" tree for deciding whether to wait, with a root test on Patrons? and subtrees testing attributes such as Alternate, Reservation, and Fri/Sat.

Expressiveness

Decision trees can express any function of the input attributes. E.g., for Boolean functions, truth table row -> path to leaf:

A B | A xor B
F F | F
F T | T
T F | T
T T | F

Trivially, there is a consistent decision tree for any training set w/ one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples. Prefer to find more compact decision trees.

Hypothesis spaces

How many distinct decision trees with n Boolean
attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees.

How many purely conjunctive hypotheses (e.g., Hungry ^ not-Rain)? Each attribute can be in (positive), in (negative), or out => 3^n distinct conjunctive hypotheses.

A more expressive hypothesis space
- increases the chance that the target function can be expressed
- increases the number of hypotheses consistent w/ the training set => may get worse predictions

Decision tree learning

Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the "most significant" attribute as root of the (sub)tree.

function DTL(examples, attributes, default) returns a decision tree
   if examples is empty then return default
   else if all examples have the same classification then return the classification
   else if attributes is empty then return MODE(examples)
   else
       best <- CHOOSE-ATTRIBUTE(attributes, examples)
       tree <- a new decision tree with root test best
       for each value vi of best do
           examples_i <- {elements of examples with best = vi}
           subtree <- DTL(examples_i, attributes - best, MODE(examples))
           add a branch to tree with label vi and subtree subtree
       return tree

Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative". Patrons? (None/Some/Full) is a better choice than Type? (French/Italian/Thai/Burger): it gives information about the classification.
Information

Information answers questions. The more clueless I am about the answer initially, the more information is contained in the answer.
Scale: 1 bit = answer to a Boolean question with prior <0.5, 0.5>
Information in an answer when the prior is <P1, ..., Pn> is
H(<P1, ..., Pn>) = sum_{i=1..n} -Pi log2 Pi
(also called the entropy of the prior)

Information contd.

Suppose we have p positive and n negative examples at the root
=> H(<p/(p+n), n/(p+n)>) bits needed to classify a new example.
E.g., for the 12 restaurant examples, p = n = 6, so we need 1 bit.

An attribute splits the examples E into subsets Ei, each of which (we hope) needs less information to complete the classification. Let Ei have pi positive and ni negative examples
=> H(<pi/(pi+ni), ni/(pi+ni)>) bits needed to classify a new example
=> expected number of bits per example over all branches is
sum_i ((pi + ni)/(p + n)) H(<pi/(pi+ni), ni/(pi+ni)>)
For Patrons?, this is 0.459 bits; for Type this is (still) 1 bit
=> choose the attribute that minimizes the remaining information needed.

Example contd.

Decision tree learned from the 12 examples (rooted at Patrons?): substantially simpler than the "true" tree; a more complex hypothesis isn't justified by a small amount of data.

Performance measurement

How do we know that h ~ f? (Hume's Problem of Induction)
1) Use theorems of computational/statistical learning theory
2) Try h on a new test set of examples (use the same distribution over example space as the training set)
Learning curve = % correct on the test set as a function of training set size.
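The remainder computation above (0.459 bits for Patrons? vs. 1 bit for Type?) can be checked with a short script. The split counts come from the 12 restaurant examples; the helper names are ours.

```python
# Entropy and "remainder" (expected bits still needed after a split),
# checked on the restaurant attributes from the slides.
from math import log2

def entropy(ps):
    """Entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * log2(p) for p in ps if p > 0)

def remainder(splits):
    """splits = [(positives, negatives), ...], one pair per attribute value."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * entropy([p / (p + n), n / (p + n)])
               for p, n in splits if p + n > 0)

patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, Burger

r_patrons = remainder(patrons)  # ~0.459 bits
r_type = remainder(type_)       # 1 bit: splitting on Type gains nothing
```

The information gain of an attribute is the root entropy (here 1 bit, since p = n = 6) minus its remainder, so CHOOSE-ATTRIBUTE picks Patrons?.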
Classification practice problems (solutions). The first classification problem is linearly separable: you can draw a line with all of one class on one side and all of the other on the other side. Decision trees and boosting with decision stumps would work; Maximum Likelihood would work too — you could fit a Gaussian to each class. The second classification problem is harder: SVMs are an obvious choice for this problem, and in a polar representation the problem becomes separable.

VISION CHAPTER 24

Outline
◇ Perception generally
◇ Image formation
◇ Early vision
◇ 2D → 3D
◇ Object recognition

Perception generally
Stimulus (percept) S, World W: S = g(W)
E.g., g = "graphics." Can we do vision as inverse graphics, W = g⁻¹(S)?
Problem: massive ambiguity!

Better approaches
Bayesian inference of world configurations:
P(W|S) ∝ P(S|W) P(W)   — "graphics" × prior knowledge
Better still: no need to recover the exact scene! Just extract the information needed for navigation, manipulation, recognition/identification.

Vision subsystems
(figure: vision pipeline — low-level vision applies filters, edge detection, and segmentation to the image sequence, producing edges, disparity, optical flow, and regions; stereo and motion yield a depth map and contours; high-level vision does object recognition, tracking, and data association, producing objects, scenes, behaviors)
Vision requires combining multiple cues.

Image formation
P is a point in the scene, with coordinates (X, Y, Z); P′ is its image on the image plane, with coordinates (x, y).
x = fX/Z, y = fY/Z by similar triangles
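The similar-triangles projection above is easy to check numerically. A minimal sketch (the focal length and scene points are made-up illustration values, not from the notes):

```python
# Pinhole/perspective projection: a scene point P = (X, Y, Z) maps to
# image coordinates x = f*X/Z, y = f*Y/Z (by similar triangles).
def project(point, f):
    X, Y, Z = point
    if Z == 0:
        raise ValueError("point lies in the camera plane")
    return (f * X / Z, f * Y / Z)

# Illustration with made-up numbers: a point twice as far away and twice
# as big projects to the same image point, which is why scale/distance
# cannot be recovered from a single image.
f = 0.05                              # 50 mm focal length, in metres
near = project((1.0, 2.0, 4.0), f)
far  = project((2.0, 4.0, 8.0), f)    # same direction, twice the depth
print(near, far)
```

This is exactly the "massive ambiguity" point: g is many-to-one, so inverting it needs priors.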
Scale/distance is indeterminate!

Images (contd.)
I(x, y, t) is the intensity at (x, y) at time t.
CCD camera ≈ 1,000,000 pixels; human eyes ≈ 240,000,000 pixels, i.e., 0.25 terabits/sec!

Color vision
Intensity varies with frequency ⇒ infinite-dimensional signal. The human eye has three types of color-sensitive cells; each integrates the signal ⇒ 3-element vector intensity.

Edge detection
Edges in the image correspond to discontinuities in the scene:
1) depth  2) surface orientation  3) reflectance (surface markings)  4) illumination (shadows, etc.)

Edge detection (contd.)
1) Convolve the image with spatially oriented filters (possibly multi-scale):
E(x, y) = ∫∫ f(u, v) I(x − u, y − v) du dv
2) Label above-threshold pixels with edge orientation.
3) Infer "clean" line segments by combining edge pixels with the same orientation.

Cues from prior knowledge
Shape from X — what each cue assumes:
motion — rigid bodies, continuous motion
stereo — solid, contiguous, non-repeating bodies
texture — uniform texture
shading — uniform reflectance
contour — minimum curvature

Motion
(figure: optical flow between successive frames)

Stereo
(figure: left image, right image, perceived object)

Stereo depth resolution
Simple geometry: δZ = Z² δθ / b
Physiology: δθ ≈ 2.42 × 10⁻⁵ radians
b = 6 cm, Z = 30 cm ⇒ δZ ≈ 0.04 mm
Z = 30 m ⇒ δZ ≈ 40 cm
Large baseline b ⇒ better resolution.

Texture
Idea: assume the actual texture is uniform; compute the surface shape that would produce this distortion. A similar idea works for shading (assume uniform reflectance, etc.), but interreflections give a nonlocal computation of perceived intensity: hollows seem shallower than they really are.

Edge and vertex types
Assume a world of solid polyhedral objects with trihedral vertices.
(figure: vertex types and their legal edge-label combinations)
Vertex/edge labelling example: this is a CSP! Variables: edges; constraints: possible node configurations.

Object recognition
Simple idea: extract 3D shapes from the image, match against a shape "library."
Problems: extracting
curved surfaces from an image; representing the shape of an extracted object; representing shape and variability of library object classes; improper segmentation, occlusion; unknown illumination, shadows, markings, noise, complexity, etc.
Approaches: index into the library by measuring invariant properties of objects; alignment of image features with projected library object features; match the image against multiple stored views (aspects) of library objects; machine learning methods based on image statistics.

Handwritten digit recognition
(figure: sample NIST digit images)
3-nearest-neighbor: 2.4% error
400–300–10 unit MLP: 1.6% error
LeNet 768–192–30–10 unit MLP: 0.9% error

Shape-context matching
Basic idea: convert shape (a relational concept) into a fixed set of attributes using the spatial context of each of a fixed set of points on the surface of the shape.

Shape-context matching (contd.)
Each point is described by its local context histogram: the number of points falling into each log-polar grid bin.

Shape-context matching (contd.)
Determine the total distance between shapes as the sum of distances for corresponding points under the best matching. Simple nearest-neighbor learning gives a 0.63% error rate on NIST digit data.

RATIONAL DECISIONS CHAPTER 16

Outline
◇ Rational preferences
◇ Utilities
◇ Money
◇ Multiattribute utilities
◇ Decision networks
◇ Value of information

Preferences
An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes.
Lottery L = [p, A; (1 − p), B]
Notation:
A ≻ B — A preferred to B
A ∼ B — indifference between A and B
A ≳ B — B not preferred to A

Rational preferences
Idea: the preferences of a rational agent must obey constraints. Rational preferences ⇒ behavior describable as maximization of expected utility.
Constraints:
Orderability: (A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)
Transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Continuity: A ≻ B ≻ C ⇒ ∃p [p, A; 1 − p, C] ∼ B
Substitutability: A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C]
Monotonicity: A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ≳ [q, A; 1 − q, B])
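The transitivity constraint can be made concrete with a small simulation. A sketch (the trading loop, item names, and 1-cent fee are illustrative assumptions, not from the notes): an agent whose strict preferences form a cycle keeps paying for "upgrades" and goes broke, while a transitive agent stops trading.

```python
# Sketch of the "money pump" argument against intransitive preferences.
def money_pump(prefers, holding, cents=100):
    """prefers(x, y) -> True iff the agent strictly prefers x to y.
    Repeatedly offer a preferred item for a 1-cent fee; return the
    cents remaining when the agent refuses to trade (or goes broke)."""
    wealth = cents
    items = ["A", "B", "C"]
    for _ in range(cents):
        for other in items:
            if prefers(other, holding):
                holding = other
                wealth -= 1      # pays 1 cent for each "upgrade"
                break
        else:
            break                # no preferred swap: agent stops trading
    return wealth

# Intransitive cycle A > B, B > C, C > A: trades forever, goes broke.
cycle = {("A", "B"), ("B", "C"), ("C", "A")}
broke = money_pump(lambda x, y: (x, y) in cycle, "A")
# Transitive order A > B > C: trades at most twice, keeps its money.
order = {("A", "B"), ("B", "C"), ("A", "C")}
rich = money_pump(lambda x, y: (x, y) in order, "C")
print(broke, rich)
```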
Rational preferences (contd.)
Violating the constraints leads to self-evident irrationality. For example, an agent with intransitive preferences can be induced to give away all its money:
If B ≻ C, then an agent who has C would pay (say) 1 cent to get B;
if A ≻ B, then an agent who has B would pay (say) 1 cent to get A;
if C ≻ A, then an agent who has A would pay (say) 1 cent to get C — and round it goes.

Maximizing expected utility
Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that
U(A) ≥ U(B) ⇔ A ≳ B
U([p₁, S₁; … ; pₙ, Sₙ]) = Σᵢ pᵢ U(Sᵢ)
MEU principle: choose the action that maximizes expected utility.
Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities — e.g., a lookup table for perfect tic-tac-toe.

Utilities
Utilities map states to real numbers. Which numbers? Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lₚ that has the "best possible prize" u⊤ with probability p and the "worst possible catastrophe" u⊥ with probability (1 − p); adjust the lottery probability p until A ∼ Lₚ. E.g., "pay $30" ∼ a lottery between "continue as before" (probability p) and "instant death" (probability 1 − p).

Utility scales
Normalized utilities: u⊤ = 1.0, u⊥ = 0.0.
Micromorts: one-millionth chance of death; useful for Russian roulette, paying to reduce product risks, etc.
QALYs: quality-adjusted life years; useful for medical decisions involving substantial risk.
Note: behavior is invariant w.r.t. positive linear transformation U′(x) = k₁U(x) + k₂ where k₁ > 0.
With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes.

Money
Money does not behave as a utility function. Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse. Utility curve: for what probability p am I indifferent between a prize x and a lottery [p, $M; (1 − p), $0] for large M?
Typical empirical data, extrapolated with risk-prone behavior:
(figure: empirical utility curve for money, over roughly −$150,000 to $800,000)

Student group utility
For each x, adjust p until half the class votes for the lottery over $10,000.
(figure: class indifference probabilities p from 1.0 down to 0.0 against prizes from $0 to $10,000)

Decision networks
Add action nodes and utility nodes to belief networks to enable rational decision making (e.g., Airport Site, with chance nodes such as Litigation).
Algorithm: for each value of the action node, compute the expected value of the utility node given the action and evidence; return the MEU action.

Multiattribute utility
How can we handle utility functions of many variables X₁ … Xₙ? E.g., what is U(Deaths, Noise, Cost)? How can complex utility functions be assessed from preference behaviour?
Idea 1: identify conditions under which decisions can be made without complete identification of U(x₁, …, xₙ).
Idea 2: identify various types of independence in preferences and derive consequent canonical forms for U(x₁, …, xₙ).

Strict dominance
Typically define attributes such that U is monotonic in each. Strict dominance: choice B strictly dominates choice A iff ∀i Xᵢ(B) ≥ Xᵢ(A) (and hence U(B) ≥ U(A)).
(figures: dominance regions for deterministic and uncertain attributes)
Strict dominance seldom holds in practice.

Stochastic dominance
Distribution p₁ stochastically dominates distribution p₂ iff
∀t ∫₋∞ᵗ p₁(x) dx ≤ ∫₋∞ᵗ p₂(x) dx
If U is monotonic in x, then A₁ with outcome distribution p₁ stochastically dominates A₂ with outcome distribution p₂:
∫ p₁(x) U(x) dx ≥ ∫ p₂(x) U(x) dx
Multiattribute case: stochastic dominance on all attributes ⇒ optimal.

Stochastic dominance (contd.)
Stochastic dominance can often be determined without exact distributions, using qualitative reasoning. E.g., construction cost increases with distance from the city: S₁ is closer to the city than S₂ ⇒ S₁ stochastically dominates S₂ on
cost.
E.g., injury increases with collision speed.
Can annotate belief networks with stochastic dominance information: X →⁺ Y (X positively influences Y) means that for every value z of Y's other parents Z,
∀x₁, x₂  x₁ ≥ x₂ ⇒ P(Y | x₁, z) stochastically dominates P(Y | x₂, z)

Preference structure: Deterministic
X₁ and X₂ are preferentially independent of X₃ iff the preference between ⟨x₁, x₂, x₃⟩ and ⟨x₁′, x₂′, x₃⟩ does not depend on x₃.
E.g., ⟨Noise, Cost, Safety⟩: ⟨20,000 suffer, $4.6 billion, 0.06 deaths/mpm⟩ vs. ⟨70,000 suffer, $4.2 billion, 0.06 deaths/mpm⟩.
Theorem (Leontief, 1947): if every pair of attributes is P.I. of its complement, then every subset of attributes is P.I. of its complement: mutual P.I.
Theorem (Debreu, 1960): mutual P.I. ⇒ ∃ additive value function V(S) = Σᵢ Vᵢ(Xᵢ(S)).
Hence assess n single-attribute functions; often a good approximation.

Preference structure: Stochastic
Need to consider preferences over lotteries: X is utility-independent of Y iff preferences over lotteries in X do not depend on y.
Mutual U.I. (each subset is U.I. of its complement) ⇒ ∃ multiplicative utility function:
U = k₁U₁ + k₂U₂ + k₃U₃ + k₁k₂U₁U₂ + k₂k₃U₂U₃ + k₃k₁U₃U₁ + k₁k₂k₃U₁U₂U₃
Routine procedures and software packages exist for generating preference tests to identify various canonical families of utility functions.

Value of information
Idea: compute the value of acquiring each possible piece of evidence. Can be done directly from the decision network.
Example: buying oil drilling rights. Two blocks A and B, exactly one has oil, worth k. Prior probabilities 0.5 each, mutually exclusive. Current price of each block is k/2. A consultant offers an accurate survey of A. Fair price?
Solution: compute the expected value of information = expected value of the best action given the information, minus expected value of the best action without the information. The survey may say "oil in A" or "no oil in A", probability 0.5 each (given!):
0.5 × value of "buy A" given "oil in A" + 0.5 × value of "buy B" given "no oil in A" − 0
= 0.5 × k/2 + 0.5 × k/2 − 0 = k/2

General formula
Current evidence
E, current best action α. Possible action outcomes Sᵢ, potential new evidence Eⱼ.
EU(α | E) = maxₐ Σᵢ U(Sᵢ) P(Sᵢ | E, a)
Suppose we knew Eⱼ = eⱼₖ; then we would choose α_eⱼₖ s.t.
EU(α_eⱼₖ | E, Eⱼ = eⱼₖ) = maxₐ Σᵢ U(Sᵢ) P(Sᵢ | E, a, Eⱼ = eⱼₖ)
Eⱼ is a random variable whose value is currently unknown ⇒ must compute the expected gain over all possible values:
VPI_E(Eⱼ) = ( Σₖ P(Eⱼ = eⱼₖ | E) EU(α_eⱼₖ | E, Eⱼ = eⱼₖ) ) − EU(α | E)
(VPI = value of perfect information)

Properties of VPI
Nonnegative — in expectation, not post hoc: ∀j, E  VPI_E(Eⱼ) ≥ 0
Nonadditive — consider, e.g., obtaining Eⱼ twice: VPI_E(Eⱼ, Eₖ) ≠ VPI_E(Eⱼ) + VPI_E(Eₖ)
Order-independent: VPI_E(Eⱼ, Eₖ) = VPI_E(Eⱼ) + VPI_{E,Eⱼ}(Eₖ) = VPI_E(Eₖ) + VPI_{E,Eₖ}(Eⱼ)
Note: when more than one piece of evidence can be gathered, maximizing VPI for each to select one is not always optimal ⇒ evidence-gathering becomes a sequential decision problem.

LEARNING FROM OBSERVATIONS CHAPTER 18, SECTIONS 1–4

Outline
◇ Learning agents
◇ Inductive learning
◇ Decision tree learning
(Next lecture covers neural networks.)

Learning
Learning is essential for unknown environments, i.e., when the designer lacks omniscience. Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down. Learning modifies the agent's decision mechanisms to improve performance.

Learning agents
(figure: learning-agent architecture — a performance standard feeds a critic; the critic's feedback drives the learning element, which makes changes to the performance element and sets learning goals for a problem generator that proposes experiments; sensors and effectors connect the agent to the environment)

Learning element
Design of the learning element is dictated by:
◇ what type of performance element is used
◇ which functional component is to be learned
◇ how that functional component is represented
◇ what kind of feedback is available
Example scenarios:
Performance element | Component | Representation | Feedback
Alpha-beta search | Eval. fn. | Weighted linear function | Win/loss
Logical agent | Transition model | Successor-state axioms | Outcome
Utility-based agent | Transition model |
Dynamic Bayes net | Outcome
Simple reflex agent | Percept-action fn. | Neural net | Correct action
Supervised learning: correct answers for each instance. Reinforcement learning: occasional rewards.

Inductive learning (a.k.a. Science)
Simplest form: learn a function from examples (tabula rasa). f is the target function; an example is a pair (x, f(x)), e.g., a tic-tac-toe board position and its value.
Problem: find a hypothesis h such that h ≈ f, given a training set of examples.
This is a highly simplified model of real learning: it ignores prior knowledge; assumes a deterministic, observable environment; assumes examples are given; assumes that the agent wants to learn f (why?).

Inductive learning method
Construct/adjust h to agree with f on the training set. h is consistent if it agrees with f on all examples. E.g., curve fitting: (figures: the same data points fitted by a straight line, by polynomials of increasing degree, and by an exact interpolant).
Ockham's razor: maximize a combination of consistency and simplicity.

Attribute-based representations
Examples described by attribute values (Boolean, discrete, continuous, etc.). E.g., situations where I will/won't wait for a table. Classification of examples is positive (T) or negative (F).

Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | WillWait
X1 | T | F | F | T | Some | $$$ | F | T | French | 0–10 | T
X2 | T | F | F | T | Full | $ | F | F | Thai | 30–60 | F
X3 | F | T | F | F | Some | $ | F | F | Burger | 0–10 | T
X4 | T | F | T | T | Full | $ | F | F | Thai | 10–30 | T
X5 | T | F | T | F | Full | $$$ | F | T | French | >60 | F
X6 | F | T | F | T | Some | $$ | T | T | Italian | 0–10 | T
X7 | F | T | F | F | None | $ | T | F | Burger | 0–10 | F
X8 | F | F | F | T | Some | $$ | T | T | Thai | 0–10 | T
X9 | F | T | T | F | Full | $ | T | F | Burger | >60 | F
X10 | T | T | T | T | Full | $$$ | F | T | Italian | 10–30 | F
X11 | F | F | F | F | None | $ | F | F | Thai | 0–10 | F
X12 | T | T | T | T | Full | $ | F | F | Burger | 30–60 | T

Decision trees
One possible representation for hypotheses. E.g., here is the "true" tree for deciding whether to wait: (figure: root test Patrons? — None → No; Some → Yes; Full → further tests including WaitEstimate, Reservation, Fri/Sat)

Expressiveness
Decision trees can express any function of the input attributes. E.g., for Boolean functions: truth table row → path to leaf (e.g., A xor B). Trivially, there is a consistent decision tree for any training set with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples. Prefer to find more compact decision trees.

Hypothesis spaces
How many distinct decision trees with n
Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2ⁿ rows
= 2^(2ⁿ)
E.g., with 6 Boolean attributes there are 2⁶⁴ = 18,446,744,073,709,551,616 trees.
How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)? Each attribute can be in (positive), in (negative), or out ⇒ 3ⁿ distinct conjunctive hypotheses.
A more expressive hypothesis space increases the chance that the target function can be expressed, but also increases the number of hypotheses consistent with the training set ⇒ may get worse predictions.

Decision tree learning
Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree.

function DTL(examples, attributes, default) returns a decision tree
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MODE(examples)
  else
    best ← CHOOSE-ATTRIBUTE(attributes, examples)
    tree ← a new decision tree with root test best
    for each value vᵢ of best do
      examplesᵢ ← {elements of examples with best = vᵢ}
      subtree ← DTL(examplesᵢ, attributes − best, MODE(examples))
      add a branch to tree with label vᵢ and subtree subtree
    return tree

Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative."
(figure: the 12 examples split by Patrons? — None, Some, Full — vs. by Type? — French, Italian, Thai, Burger)
Patrons? is a better choice: it gives information about the classification.

Information
Information answers questions. The more clueless I am about the answer initially, the more information is contained in the
answer.
Scale: 1 bit = answer to a Boolean question with prior ⟨0.5, 0.5⟩.
Information in an answer when the prior is ⟨P₁, …, Pₙ⟩ is
H(⟨P₁, …, Pₙ⟩) = Σᵢ −Pᵢ log₂ Pᵢ
(also called the entropy of the prior).

Information (contd.)
Suppose we have p positive and n negative examples at the root ⇒ H(⟨p/(p+n), n/(p+n)⟩) bits needed to classify a new example. E.g., for the 12 restaurant examples, p = n = 6, so we need 1 bit.
An attribute splits the examples E into subsets Eᵢ, each of which (we hope) needs less information to complete the classification. Let Eᵢ have pᵢ positive and nᵢ negative examples ⇒ H(⟨pᵢ/(pᵢ+nᵢ), nᵢ/(pᵢ+nᵢ)⟩) bits needed to classify a new example ⇒ expected number of bits per example over all branches is
Σᵢ ((pᵢ + nᵢ)/(p + n)) H(⟨pᵢ/(pᵢ+nᵢ), nᵢ/(pᵢ+nᵢ)⟩)
For Patrons? this is 0.459 bits; for Type this is (still) 1 bit ⇒ choose the attribute that minimizes the remaining information needed.

Example (contd.)
Decision tree learned from the 12 examples: (figure: root Patrons?, with further tests including Hungry? and Type). Substantially simpler than the "true" tree — a more complex hypothesis isn't justified by a small amount of data.

Performance measurement
How do we know that h ≈ f? (Hume's Problem of Induction)
1) Use theorems of computational/statistical learning theory.
2) Try h on a new test set of examples (use the same distribution over example space as the training set).
Learning curve = % correct on the test set as a function of training set size.
(figure: learning curve rising from roughly 40% toward 100% as the training set grows to 100 examples)

Performance measurement (contd.)
The learning curve depends on:
— realizable (can express the target function) vs. non-realizable; non-realizability can be due to missing attributes or a restricted hypothesis class (e.g., thresholded linear function)
— redundant expressiveness (e.g., loads of irrelevant attributes)
(figure: % correct vs. number of examples for the realizable, redundant, and non-realizable cases)

PLANNING CHAPTER 11

Outline
◇ Search vs. planning
◇ STRIPS operators
◇ Partial-order planning

Search vs. planning
Consider the task: get milk, bananas, and a
cordless drill.
Standard search algorithms seem to fail miserably: (figure: a huge branching tree of mostly irrelevant actions — Talk to Parrot, Go To Pet Store, Buy a Dog, Go To School, Go To Class, Go To Supermarket, Buy Tuna Fish, Go To Sleep, …). An after-the-fact heuristic/goal test is inadequate.

Search vs. planning (contd.)
Planning systems do the following:
1) open up action and goal representation to allow selection
2) divide-and-conquer by subgoaling
3) relax the requirement for sequential construction of solutions

 | Search | Planning
States | Lisp data structures | Logical sentences
Actions | Lisp code | Preconditions/outcomes
Goal | Lisp code | Logical sentence (conjunction)
Plan | Sequence from s₀ | Constraints on actions

STRIPS operators
Tidily arranged action descriptions in a restricted language:
  ACTION: Buy(x)
  PRECONDITION: At(p), Sells(p, x)
  EFFECT: Have(x)
(Note: this abstracts away many important details!)
Restricted language ⇒ efficient algorithm: precondition = conjunction of positive literals; effect = conjunction of literals. A complete set of STRIPS operators can be translated into a set of successor-state axioms.

Partially ordered plans
Partially ordered collection of steps with:
— a Start step that has the initial state description as its effect
— a Finish step that has the goal description as its precondition
— causal links from the outcome of one step to the precondition of another
— temporal orderings between pairs of steps
Open condition = a precondition of a step not yet causally linked. A plan is complete iff every precondition is achieved. A precondition is achieved iff it is the effect of an earlier step and no possibly intervening step undoes it.

Example
(figures: the shopping plan built incrementally — Start asserts At(Home), Sells(HWS, Drill), Sells(SM, Milk), Sells(SM, Bananas); steps Go(HWS), Buy(Drill), Go(SM), Buy(Milk), Buy(Bananas), Go(Home); Finish requires Have(Milk) ∧ At(Home) ∧ Have(Bananas) ∧ Have(Drill))

Planning process
Operators on partial plans:
— add a link from an existing
action to an open condition
— add a step to fulfill an open condition
— order one step w.r.t. another to remove possible conflicts
Gradually move from incomplete/vague plans to complete, correct plans. Backtrack if an open condition is unachievable or if a conflict is unresolvable.

POP algorithm sketch
function POP(initial, goal, operators) returns plan
  plan ← MAKE-MINIMAL-PLAN(initial, goal)
  loop do
    if SOLUTION?(plan) then return plan
    S_need, c ← SELECT-SUBGOAL(plan)
    CHOOSE-OPERATOR(plan, operators, S_need, c)
    RESOLVE-THREATS(plan)
  end

function SELECT-SUBGOAL(plan) returns S_need, c
  pick a plan step S_need from STEPS(plan) with a precondition c that has not been achieved
  return S_need, c

POP algorithm (contd.)
procedure CHOOSE-OPERATOR(plan, operators, S_need, c)
  choose a step S_add from operators or STEPS(plan) that has c as an effect
  if there is no such step then fail
  add the causal link S_add →c S_need to LINKS(plan)
  add the ordering constraint S_add < S_need to ORDERINGS(plan)
  if S_add is a newly added step from operators then
    add S_add to STEPS(plan)
    add Start < S_add < Finish to ORDERINGS(plan)

procedure RESOLVE-THREATS(plan)
  for each S_threat that threatens a link Sᵢ →c Sⱼ in LINKS(plan) do
    choose either
      Demotion: add S_threat < Sᵢ to ORDERINGS(plan)
      Promotion: add Sⱼ < S_threat to ORDERINGS(plan)
    if not CONSISTENT(plan) then fail
  end

Clobbering and promotion/demotion
A clobberer is a potentially intervening step that destroys the condition achieved by a causal link. E.g., Go(Home) clobbers At(Supermarket):
Demotion: put the clobberer before Go(Supermarket)
Promotion: put the clobberer after Buy(Milk)

Properties of POP
Nondeterministic algorithm: backtracks at choice points on failure — the choice of S_add to achieve S_need and the choice of demotion or promotion for a clobberer; the selection of S_need is irrevocable.
POP is sound, complete, and systematic (no repetition). Extensions exist for disjunction, universals, negation, conditionals. Can be made efficient with good heuristics derived
from problem description Particularly good for problems with many loosely related subgoals Chapter 11 M Example Blocks world quotSussman anomalyquot problem C E A I H owgt Start State Goal State ClearX Onxz Cleary Clearx Onxz PutOnxy PutOnTabIeX Onxz Ceary OnXz Clearz OnXTabe Clearz Onx y several inequality constraints Chamer H 15 Example contd START OnCA OnA Tabe 05 OnB Tabe 00 OnAB OnB C FINISH NEE Chamer H 16 Example contd START E OnCA OnA Tabe 05 OnB Tabe 00 05 OnBz 010 PutOnBC OnAB OnB C FINISH NEE Chamer H 17 Example contd START E 1 OnCA OnA Tabe CIB OnB Tabe CC 39 PutOnAB clobbers CIB gt order after PutOnBC 39 39 CPB O BZ C CA OnAz CIB PutOnAB quot OnAB On C g Chamer H 18 LANGUAGE CHAPTER 22 Chapter 22 1 II Outline ltgt Communication ltgt Grammar ltgt Syntactic analysis ltgt Problems Chapter 22 II Communication Classical view pre 1953 language consists of sentences that are truefalse cf logic Modern view post1953 language is a form of action Wittgenstein 1953 Philosophical Investigations Austin 1962 How to Do Things with Words Searle 1969 Speech Acts Why Chapter 22 3 II Communication Classical view pre 1953 language consists of sentences that are truefalse cf logic Modern view post1953 language is a form of action Wittgenstein 1953 Philosophical Investigations Austin 1962 How to Do Things with Words Searle 1969 Speech Acts Why Chapter 22 4 II Communication Classical view pre 1953 language consists of sentences that are truefalse cf logic Modern view post1953 language is a form of action Wittgenstein 1953 Philosophical Investigations Austin 1962 How to Do Things with Words Searle 1969 Speech Acts Why Chapter 22 5 II Communication Classical view pre 1953 language consists of sentences that are truefalse cf logic Modern view post1953 language is a form of action Wittgenstein 1953 Philosophical Investigations Austin 1962 How to Do Things with Words Searle 1969 Speech Acts Why To change the actions of other agents Chapter 22 6 Speech acts SITUATION Speaker gt 
Utterance gt Hearer Speech acts achieve the speaker s goals Inform There s a pit in front of youquot Query Can you see the goldquot Command Pick it upquot Promise I ll share the gold with youquot Acknowledge OK Speech act planning requires knowledge of Situation Semantic and syntactic conventions Hearer s goals knowledge base and rationality Chapter 22 7 Stages in communication informing Intention S wants to inform H that P Generation 5 selects words W to express P Synthesis 5 utters words W Perception H perceives W Analysis H infers possible meanings P1 Pm Disambiguation H infers intended meaning Pi Incorporation H incorporates B into KB How could this go wrong Chapter 22 8 Stages in communication informing Intention S wants to inform H that P Generation 5 selects words W to express P Synthesis 5 utters words W Perception H perceives W Analysis H infers possible meanings P1 Pm Disambiguation H infers intended meaning Pi Incorporation H incorporates B into KB How could this go wrong lnsincerity 5 doesn t believe P Speech wreck ignition failure Ambiguous utterance Differing understanding of current situation Chapter 22 9 II Grammar Vervet monkeys antelopes etc use isolated symbols for sentences gt restricted set of communicable propositions no generative capacity Chomsky 1957 Syntactic Structures Grammar specifies the compositional structure of complex messages eg speech linear text linear music twodimensional A formal language is a set of strings of terminal symbols Each string in the language can be analyzedgenerated by the grammar The grammar is a set of rewrite rules eg S gtNP VP Article gtthel al anl Here S is the sentence symbol NP and VP are nonterminals Chapter 22 10 Grammar types Regular nonterminal gt terminal nonterminall S gtaS S gt Contextfree nonterminal gt anything S gt aSb Contextsensitive more nonterminals on righthand side ASB gt AAaBB Recursively enumerable no constraints Related to Post systems and Kleene systems of rewrite rules Natural 
languages: probably context-free, parsable in real time!

A Wumpus lexicon
Noun → stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | …
Verb → is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …
Adjective → right | left | east | south | back | smelly | …
Adverb → here | there | nearby | ahead | right | left | east | south | back | …
Pronoun → me | you | I | it | …
Name → John | Mary | Boston | UCB | PAJC | …
Article → the | a | an
Preposition → to | in | on | near
Conjunction → and | or | but
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Divided into closed classes (Pronoun, Article, Preposition, Conjunction, Digit) and open classes (Noun, Verb, Adjective, Adverb, Name).

Wumpus grammar
S → NP VP | S Conjunction S   (I feel a breeze; I feel a breeze and I smell a wumpus)
NP → Pronoun | Noun | Article Noun | Digit Digit | NP PP | NP RelClause   (I; pits; the wumpus; 3 4; the wumpus to the east; the wumpus that is smelly)
VP → Verb | VP NP | VP Adjective | VP PP | VP Adverb   (stinks; feel a breeze; is smelly; turn to the east; go ahead)
PP → Preposition NP   (to the east)
RelClause → that VP   (that is smelly)

Grammaticality judgements
A formal language L1 may differ from a natural language L2: false positives and false negatives. Adjusting L1 to agree with L2 is a learning problem!
"the gold grab the wumpus"  "I smell the wumpus the gold"  "I give the wumpus the gold"  "I donate the wumpus the gold"
Intersubjective agreement is somewhat reliable, independent of semantics! Real grammars run 10–500 pages, insufficient even for "proper" English.

Parse trees
Parse trees exhibit the grammatical structure of a
Parse trees
Exhibit the grammatical structure of a sentence, e.g., for "I shoot the wumpus":
  [S [NP [Pronoun I]] [VP [Verb shoot] [NP [Article the] [Noun wumpus]]]]

Syntax in NLP
Most view syntactic structure as an essential step towards meaning: "Mary hit John" ≠ "John hit Mary".
"And since I was not informed, as a matter of fact, since I did not know that there were excess funds until we, ourselves, in that checkup after the whole thing blew up, and that was, if you'll remember, that was the incident in which the attorney general came to me and told me that he had seen a memo that indicated that there were no more funds."
"Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?"

Context-free parsing
Bottom-up parsing works by replacing any substring that matches the RHS of a rule with the rule's LHS.
Efficient algorithms (e.g., chart parsing, Ch. 23) need O(n^3) time for context-free grammars, and run at several thousand words/sec for real grammars.
Context-free parsing ≡ Boolean matrix multiplication (Lee, 2002) ⇒ unlikely to find faster practical algorithms.

Logical grammars
BNF notation for grammars is too restrictive:
- difficult to add "side conditions" (number agreement, etc.)
- difficult to connect syntax to semantics
Idea: express grammar rules as logic:
  X → Y Z becomes Y(s1) ∧ Z(s2) ⇒ X(Append(s1, s2))
  X → word becomes X(["word"])
  X → Y | Z becomes Y(s) ⇒ X(s) and Z(s) ⇒ X(s)
Here X(s) means that string s can be interpreted as an X.

Logical grammars contd.
Now it's easy to augment the rules:
  NP(s1) ∧ EatsBreakfast(Ref(s1)) ∧ VP(s2) ⇒ NP(Append(s1, ["who"], s2))
  NP(s1) ∧ Number(s1, n) ∧ VP(s2) ∧ Number(s2, n) ⇒ S(Append(s1, s2))
Parsing is reduced to logical inference: ASK(KB, S([I, am, a, wumpus])).
Can add extra arguments to return the parse structure and semantics.
Generation simply requires a query with uninstantiated variables: ASK(KB, S(x)).
If we add arguments to nonterminals to construct sentence semantics, NLP generation can be done from a given logical sentence: ASK(KB, S(x, At(Robot, [1,1]))).

Real language
Real human languages provide many problems for NLP:
◇ ambiguity ◇ anaphora ◇ indexicality ◇ vagueness ◇ noncompositionality ◇ discourse structure ◇ metonymy ◇ metaphor

Ambiguity
"Squad helps dog bite victim"
"Helicopter powered by human flies"
"American pushes bottle up Germans"
"I ate spaghetti with meatballs | salad | abandon | a fork | a friend"
Ambiguity can be lexical (polysemy), syntactic, semantic, referential.

Indexicality
Indexical sentences refer to the utterance situation (place, time, S/H, etc.):
"I am over here"
"Why did you do that?"

Anaphora
Using pronouns to refer back to entities already introduced in the text:
"After Mary proposed to John, they found a preacher and got married."
"For the honeymoon, they went to Hawaii."
"Mary saw a ring through the window and asked John for it."
"Mary threw a rock at the window and broke it."

Metonymy
Using one noun phrase to stand for another:
"I've read Shakespeare."
"Chrysler announced record profits."
"The ham sandwich on Table 4 wants another beer."

Metaphor
"Non-literal" usage of words and phrases, often systematic:
"I've tried killing the process but it won't die. Its parent keeps it alive."
Noncompositionality
basketball shoes, baby shoes, alligator shoes, designer shoes, brake shoes
red book, red pen, red hair, red herring
small moon, large molecule
mere child, alleged murderer, real leather

NEURAL NETWORKS (Chapter 20, Section 5)

Outline
◇ Brains ◇ Neural networks ◇ Perceptrons ◇ Multilayer perceptrons ◇ Applications of neural networks

Brains
10^11 neurons of > 20 types, 10^14 synapses, 1 ms-10 ms cycle time.
Signals are noisy "spike trains" of electrical potential.
[Figure: neuron anatomy, showing axonal arborization, axon from another cell, synapse, dendrites, synapses, and cell body (soma).]

McCulloch-Pitts unit
Output is a "squashed" linear function of the inputs:
  a_i ← g(in_i) = g(Σ_j W_{j,i} a_j)
with bias weight W_{0,i} attached to a fixed input a_0 = -1.
A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do.

Activation functions
(a) is a step function or threshold function; (b) is a sigmoid function, 1/(1 + e^(-x)).
Changing the bias weight W_{0,i} moves the threshold location.

Implementing logical functions
AND: W_0 = 1.5, W_1 = W_2 = 1.  OR: W_0 = 0.5, W_1 = W_2 = 1.  NOT: W_0 = -0.5, W_1 = -1.
McCulloch and Pitts: every Boolean function can be implemented.

Network structures
Feed-forward networks: single-layer perceptrons, multilayer perceptrons. Feed-forward networks implement functions and have no internal state.
Recurrent networks:
- Hopfield networks have symmetric weights (W_{i,j} = W_{j,i}); g(x) = sign(x), a_i = ±1; holographic associative memory
- Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets
- recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate, etc.

Feed-forward example
  a_5 = g(W_{3,5}·a_3 + W_{4,5}·a_4)
      = g(W_{3,5}·g(W_{1,3}·a_1 + W_{2,3}·a_2) + W_{4,5}·g(W_{1,4}·a_1 + W_{2,4}·a_2))
A feed-forward network is a parameterized family of nonlinear functions; adjusting weights changes the function: do learning this way!

Single-layer perceptrons
[Figure: input units wired directly to output units, and the resulting perceptron output surface.]
Output units all operate separately, with no shared weights.
Adjusting weights moves the location, orientation, and steepness of the cliff.

Expressiveness of perceptrons
Consider a perceptron with g = step function (Rosenblatt, 1957-1960).
Can represent AND, OR, NOT, majority, etc., but not XOR.
Represents a linear separator in input space: Σ_j W_j x_j > 0, i.e., W·x > 0.
[Figure: (a) x1 AND x2 and (b) x1 OR x2 are linearly separable; (c) x1 XOR x2 is not.]
Minsky & Papert (1969) pricked the neural network balloon.
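The logic-gate weights above can be checked with a few lines of code. A minimal sketch of a McCulloch-Pitts threshold unit; the helper names are mine, not from the slides:

```python
def threshold_unit(weights, w0, inputs):
    """Step-activation unit: fires (1) iff sum_j W_j * a_j >= W_0 (the bias weight)."""
    return 1 if sum(w * a for w, a in zip(weights, inputs)) >= w0 else 0

# Weights from the slide: AND, OR, NOT each as a single threshold unit.
def AND(a1, a2): return threshold_unit([1, 1], 1.5, [a1, a2])
def OR(a1, a2):  return threshold_unit([1, 1], 0.5, [a1, a2])
def NOT(a1):     return threshold_unit([-1], -0.5, [a1])

table = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1)]   # (0,0), (0,1), (1,0), (1,1)
assert [AND(*p) for p in table] == [0, 0, 0, 1]
assert [OR(*p) for p in table] == [0, 1, 1, 1]
assert [NOT(a) for a in (0, 1)] == [1, 0]
```

No single unit of this form can implement XOR, which is the point of the expressiveness discussion.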
Perceptron learning
Learn by adjusting weights to reduce error on the training set.
The squared error for an example with input x and true output y is
  E = ½ Err² ≡ ½ (y - h_W(x))²
Perform optimization search by gradient descent:
  ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y - g(Σ_j W_j x_j)) = -Err × g′(in) × x_j
Simple weight update rule:
  W_j ← W_j + α × Err × g′(in) × x_j
E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on -ve inputs.

Perceptron learning contd.
The perceptron learning rule converges to a consistent function for any linearly separable data set.
[Plots: proportion correct on test set vs. training set size (0-100), perceptron vs. decision tree, on MAJORITY-of-11-inputs and on RESTAURANT data.]
Perceptron learns the majority function easily; DTL is hopeless. DTL learns the restaurant function easily; the perceptron cannot represent it.

Multilayer perceptrons
[Figure: input units, hidden units, output units.]
Layers are usually fully connected; numbers of hidden units typically chosen by hand.

Expressiveness of MLPs
All continuous functions with 2 layers, all functions with 3 layers.
Combine two opposite-facing threshold functions to make a ridge; combine two perpendicular ridges to make a bump; add bumps of various sizes and locations to fit any surface.
Proof requires exponentially many hidden units (cf. DTL proof).

Backpropagation learning
Output layer: same as for single-layer perceptron,
  W_{j,i} ← W_{j,i} + α × a_j × Δ_i, where Δ_i = Err_i × g′(in_i)
Hidden layer: back-propagate the error from the output layer:
  Δ_j = g′(in_j) Σ_i W_{j,i} Δ_i
Update rule for weights in the hidden layer:
  W_{k,j} ← W_{k,j} + α × a_k × Δ_j
Most neuroscientists deny that backpropagation occurs in the brain.
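The update rules above can be exercised end to end on XOR, the function a single perceptron cannot represent. A sketch assuming sigmoid units throughout; the network size (4 hidden units), learning rate, epoch count, and seed are arbitrary choices of mine, not from the slides:

```python
import math
import random

random.seed(0)

def g(x):
    """Sigmoid activation; note g'(in) = g(in) * (1 - g(in))."""
    return 1.0 / (1.0 + math.exp(-x))

H, ALPHA = 4, 0.5                      # hidden units, learning rate (arbitrary)
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # [w1, w2, bias]
w_out = [random.uniform(-1, 1) for _ in range(H + 1)]                  # per-hidden + bias

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]           # XOR

def forward(x):
    hidden = [g(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hid]
    return hidden, g(sum(wo * h for wo, h in zip(w_out, hidden)) + w_out[-1])

def mse():
    return sum((y - forward(x)[1]) ** 2 for x, y in data) / len(data)

err_before = mse()
for _ in range(10000):
    for x, y in data:
        hidden, out = forward(x)
        delta_o = (y - out) * out * (1 - out)        # Delta_i = Err_i * g'(in_i)
        # Delta_j = g'(in_j) * W_{j,i} * Delta_i (one output unit, so no sum over i)
        deltas = [h * (1 - h) * w_out[j] * delta_o for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            w_hid[j][0] += ALPHA * x[0] * deltas[j]  # W_{k,j} += alpha * a_k * Delta_j
            w_hid[j][1] += ALPHA * x[1] * deltas[j]
            w_hid[j][2] += ALPHA * deltas[j]         # bias input fixed at +1 here
            w_out[j] += ALPHA * h * delta_o          # W_{j,i} += alpha * a_j * Delta_i
        w_out[-1] += ALPHA * delta_o

assert mse() < err_before                            # training reduced the error
```

The bias convention here (+1 input) differs from the slides' a_0 = -1; only the learned sign of the bias weight changes. With these settings the squared error drops sharply, though exact convergence depends on the random initialization.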
Backpropagation derivation
The squared error on a single example is defined as
  E = ½ Σ_i (y_i - a_i)²,
where the sum is over the nodes in the output layer.
  ∂E/∂W_{j,i} = -(y_i - a_i) ∂a_i/∂W_{j,i} = -(y_i - a_i) ∂g(in_i)/∂W_{j,i}
    = -(y_i - a_i) g′(in_i) ∂in_i/∂W_{j,i} = -(y_i - a_i) g′(in_i) ∂(Σ_j W_{j,i} a_j)/∂W_{j,i}
    = -(y_i - a_i) g′(in_i) a_j = -a_j Δ_i

Backpropagation derivation contd.
  ∂E/∂W_{k,j} = -Σ_i (y_i - a_i) ∂a_i/∂W_{k,j} = -Σ_i (y_i - a_i) g′(in_i) ∂in_i/∂W_{k,j}
    = -Σ_i Δ_i ∂(Σ_j W_{j,i} a_j)/∂W_{k,j} = -Σ_i Δ_i W_{j,i} ∂a_j/∂W_{k,j}
    = -Σ_i Δ_i W_{j,i} g′(in_j) ∂in_j/∂W_{k,j} = -Σ_i Δ_i W_{j,i} g′(in_j) ∂(Σ_k W_{k,j} a_k)/∂W_{k,j}
    = -Σ_i Δ_i W_{j,i} g′(in_j) a_k = -a_k Δ_j

Backpropagation learning contd.
At each epoch, sum gradient updates for all examples and apply.
[Plot: training curve for 100 restaurant examples, total error on training set vs. number of epochs (0-400); finds exact fit.]
Typical problems: slow convergence, local minima.
[Plot: learning curve for an MLP with 4 hidden units vs. a decision tree, proportion correct on test set vs. training set size, RESTAURANT data.]
MLPs are quite good for complex pattern recognition tasks, but the resulting hypotheses cannot be understood easily.

Handwritten digit recognition
3-nearest-neighbor: 2.4% error. 400-300-10 unit MLP: 1.6% error. LeNet 768-192-30-10 unit MLP: 0.9% error. Current best (kernel machines, vision algorithms): ≈ 0.6% error.

ARTIFICIAL INTELLIGENCE (Chapter 1)

Outline
◇ What is AI? ◇ A brief history ◇ The state of the art

What is AI?
Systems that think like humans | Systems that think rationally
Systems that act like humans | Systems that act rationally

Acting humanly: the Turing test
Turing (1950), "Computing Machinery and Intelligence":
◇ "Can machines think?" → "Can machines behave intelligently?"
◇ Operational test for intelligent behavior: the Imitation Game (human interrogator)
◇ Predicted that by 2000, a machine might have a 30% chance of fooling a lay person for 5 minutes
◇ Anticipated all major arguments against AI in the following 50 years
◇ Suggested major components of AI: knowledge, reasoning, language understanding, learning
Problem: the Turing test is not reproducible, constructive, or amenable to mathematical analysis.

Thinking humanly: Cognitive Science
1960s "cognitive revolution": information-processing psychology replaced the prevailing orthodoxy of behaviorism.
Requires scientific theories of internal activities of the brain. What level of abstraction? "Knowledge" or "circuits"? How to validate? Requires
1) predicting and testing behavior of human subjects (top-down), or
2) direct identification from neurological data (bottom-up).
Both approaches (roughly, Cognitive Science and Cognitive Neuroscience) are now distinct from AI. Both share with AI the following characteristic: the available theories do not explain (or engender) anything resembling human-level general intelligence. Hence, all three fields share one principal direction!

Thinking rationally: Laws of Thought
Normative (or prescriptive) rather than descriptive.
Aristotle: what are correct arguments/thought processes?
Several Greek schools developed various forms of logic: notation and rules of derivation for thoughts; they may or may not have proceeded to the idea of mechanization.
Direct line through mathematics and philosophy to modern AI.
Problems: 1) Not all intelligent behavior is mediated by logical deliberation. 2) What is the purpose of thinking? What thoughts should I have, out of all the thoughts (logical or otherwise) that I could have?

Acting rationally
Rational behavior: doing the right thing.
The right thing: that which is expected to maximize goal achievement, given the available information.
Doesn't necessarily involve thinking (e.g., blinking reflex), but thinking should be in the service of rational action.
Aristotle (Nicomachean Ethics): "Every art and every inquiry, and similarly every action and pursuit, is thought to aim at some good."

Rational agents
An agent is an entity that perceives and acts. This course is about designing rational agents.
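The "perceives and acts" definition can be made concrete as a function over percept histories. An illustrative sketch only; the thermostat task, threshold, and names are invented:

```python
def thermostat_agent(percept_history):
    """Maps a full percept history (temperature readings) to an action."""
    return "heat_on" if percept_history[-1] < 18.0 else "heat_off"

assert thermostat_agent([21.0, 17.5]) == "heat_on"
assert thermostat_agent([15.0, 19.2]) == "heat_off"
```

This reflex agent ignores all but the last percept; the abstraction of an agent as a function over whole histories allows arbitrary dependence on everything perceived so far.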
Abstractly, an agent is a function from percept histories to actions: f : P* → A.
For any given class of environments and tasks, we seek the agent (or class of agents) with the best performance.
Caveat: computational limitations make perfect rationality unachievable ⇒ design the best program for given machine resources.

AI prehistory
- Philosophy: logic, methods of reasoning; mind as physical system; foundations of learning, language, rationality
- Mathematics: formal representation and proof; algorithms, computation, (un)decidability, (in)tractability; probability
- Psychology: adaptation; phenomena of perception and motor control; experimental techniques (psychophysics, etc.)
- Economics: formal theory of rational decisions
- Linguistics: knowledge representation; grammar
- Neuroscience: plastic physical substrate for mental activity
- Control theory: homeostatic systems, stability; simple optimal agent designs

Potted history of AI
1943 McCulloch & Pitts: Boolean circuit model of brain
1950 Turing's "Computing Machinery and Intelligence"
1952-69 "Look, Ma, no hands!"
1950s Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine
1956 Dartmouth meeting: "Artificial Intelligence" adopted
1965 Robinson's complete algorithm for logical reasoning
1966-74 AI discovers computational complexity; neural network research almost disappears
1969-79 Early development of knowledge-based systems
1980-88 Expert systems industry booms
1988-93 Expert systems industry busts: "AI Winter"
1985-95 Neural networks return to popularity
1988- Resurgence of probability; general increase in technical depth; "Nouvelle AI": ALife, GAs, soft computing
1995- Agents, agents, everywhere
2003 Human-level AI back on the agenda

State of the art
Which of the following can be done at present?
◇ Play a decent game of table tennis
◇ Drive safely along a curving mountain road
◇ Drive safely along Telegraph Avenue
◇ Buy a week's worth of groceries on the web
◇ Buy a week's worth of groceries at Berkeley Bowl
◇ Play a decent game of bridge
◇ Discover and prove a new mathematical theorem
◇ Design and execute a research program in molecular biology
◇ Write an intentionally funny story
◇ Give competent legal advice in a specialized area of law
◇ Translate spoken English into spoken Swedish in real time
◇ Converse successfully with another person for an hour
◇ Perform a complex surgical operation
◇ Unload any dishwasher and put everything away

Unintentionally funny stories
"One day Joe Bear was hungry. He asked his friend Irving Bird where some honey was. Irving told him there was a beehive in the oak tree. Joe threatened to hit Irving if he didn't tell him where some honey was. The End."
"Henry Squirrel was thirsty. He walked over to the river bank where his good friend Bill Bird was sitting. Henry slipped and fell in the river. Gravity drowned. The End."
"Once upon a time there was a dishonest fox and a vain crow. One day the crow was sitting in his tree, holding a piece of cheese in his mouth. He noticed that he was holding the piece of cheese. He became hungry, and swallowed the cheese. The fox walked over to the crow. The End."

LANGUAGE (Chapter 22)

Outline
◇ Communication ◇ Grammar ◇ Syntactic analysis ◇ Problems

Communication
Classical view (pre-1953): language consists of sentences that are true/false (cf. logic).
Modern view (post-1953): language is a form of action.
Wittgenstein (1953), Philosophical Investigations; Austin (1962), How to Do Things with Words; Searle (1969), Speech Acts.
Why? To change the actions of other agents.

Speech acts
Situation: Speaker → Utterance → Hearer.
Speech acts achieve the speaker's goals:
- Inform: "There's a pit in front of you"
- Query: "Can you see the gold?"
- Command: "Pick it up"
- Promise: "I'll share the gold with you"
- Acknowledge: "OK"
Speech act planning requires knowledge of the situation; of semantic and syntactic conventions; of the hearer's goals, knowledge base, and rationality.

Stages in communication (informing)
Intention: S wants to inform H that P
Generation: S selects words W to express P
Synthesis: S utters words W
Perception: H perceives W
Analysis: H infers possible meanings P1, ..., Pn
Disambiguation: H infers intended meaning Pi
Incorporation: H incorporates Pi into KB
How could this go wrong? Insincerity (S doesn't believe P); speech wreck ignition failure; ambiguous utterance; differing understanding of the current situation.

Grammar
Vervet monkeys, antelopes, etc. use isolated symbols for sentences ⇒ restricted set of communicable propositions, no generative capacity (Chomsky (1957), Syntactic Structures).
Grammar specifies the compositional structure of complex messages, e.g., speech (linear), text (linear), music (two-dimensional).
A formal language is a set of strings of terminal symbols.
Each string in the language can be analyzed/generated by the grammar.
The grammar is a set of rewrite rules, e.g.,
  S → NP VP
  Article → the | a | an
Here S is the sentence symbol; NP and VP are nonterminals.

Grammar types
- Regular: nonterminal → terminal [nonterminal]; e.g., S → a S, S → ε
- Context-free: nonterminal → anything; e.g., S → a S b
- Context-sensitive: more nonterminals on the right-hand side; e.g., A S B → A A a B B
- Recursively enumerable: no constraints
Related to Post systems and Kleene systems of rewrite rules. Natural languages are probably context-free, parsable in real time!

Wumpus lexicon
  Noun → stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | ...
  Verb → is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | ...
  Adjective → right | left | east | south | back | smelly | ...
  Adverb → here | there | nearby | ahead | right | left | east | south | back | ...
  Pronoun → me | you | I | it | ...
  Name → John | Mary | Boston | UCB | PAJC | ...
  Article → the | a | an
  Preposition → to | in | on | near | ...
  Conjunction → and | or | but
  Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Divided into closed classes (pronoun, article, preposition, conjunction, digit) and open classes (noun, verb, adjective, adverb).

Wumpus grammar
  S → NP VP (I + feel a breeze) | S Conjunction S (I feel a breeze + and + I smell a wumpus)
  NP → Pronoun (I) | Noun (pits) | Article Noun (the + wumpus) | Digit Digit (3 4) | NP PP (the wumpus + to the east) | NP RelClause (the wumpus + that is smelly)
  VP → Verb (stinks) | VP NP (feel + a breeze) | VP Adjective (is + smelly) | VP PP (turn + to the east) | VP Adverb (go + ahead)
  PP → Preposition NP (to + the east)
  RelClause → that VP (that + is smelly)

Grammaticality judgements
Formal language L1 may differ from natural language L2. [Venn diagram: strings in L1 but not in L2 are false positives; strings in L2 but not in L1 are false negatives.]
Adjusting L1 to agree with L2 is a learning problem!
  * "the gold grab the wumpus"
  * "I smell the wumpus the gold"
  "I give the wumpus the gold"
  * "I donate the wumpus the gold"
Intersubjective agreement is somewhat reliable, independent of semantics!
Real grammars run 10-500 pages, and are insufficient even for "proper" English.
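The wumpus grammar above is small enough to parse with a few lines of chart parsing (the O(n^3) algorithms mentioned earlier). A minimal CYK recognizer over a fragment of it; unit rules such as Pronoun → NP are pre-collapsed into the lexicon here, a simplification of mine to reach Chomsky normal form:

```python
# Fragment of the wumpus grammar, with unit chains (Pronoun -> NP, Verb -> VP)
# folded into the lexicon so that every remaining rule is binary.
LEXICON = {"I": {"NP"}, "feel": {"VP"}, "a": {"Article"}, "breeze": {"Noun"}}
RULES = [("S", "NP", "VP"), ("NP", "Article", "Noun"), ("VP", "VP", "NP")]

def cyk(words):
    """Return the set of nonterminals deriving the whole word sequence."""
    n = len(words)
    # chart[i][j] = nonterminals deriving words[i:j+1]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(LEXICON.get(w, ()))
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):             # split point
                for lhs, b, c in RULES:
                    if b in chart[i][k] and c in chart[k + 1][j]:
                        chart[i][j].add(lhs)
    return chart[0][n - 1]

assert "S" in cyk("I feel a breeze".split())
assert "S" not in cyk("feel I breeze a".split())
```

Chart cell [0][n-1] holds every nonterminal that derives the whole string; a real chart parser also stores back-pointers in each cell so the parse tree can be recovered.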
