
# Artificial Intelligence Programming (CS 662)

USF



These 110 pages of class notes were uploaded by Michele Herzog on Thursday, October 29, 2015. The notes belong to CS 662 at the University of San Francisco, taught by Christopher Brooks in Fall. Since upload, they have received 14 views. For similar materials see /class/231232/cs-662-university-of-san-francisco in Computer Science at the University of San Francisco.


Date Created: 10/29/15

## Artificial Intelligence Programming: Uninformed Search

Chris Brooks, Department of Computer Science, University of San Francisco

### Looking Ahead

- In many environments it can be quite difficult to build reflex agents that act effectively: they are unable to consider where they are trying to go.
- A goal-based agent is able to consider what it is trying to do and select actions that achieve that goal. The agent program uses percepts and a goal as input.
- We'll look at a particular type of goal-based agent called a problem-solving agent.

### Problem-solving agents

- A problem-solving agent tries to find a sequence of actions that will lead to a goal:
  - What series of moves will solve a Rubik's cube?
  - How do I drive from USF to the San Francisco airport?
  - How can I arrange components on a chip?
  - What sequence of actions will move a robot across a room?

### Example: Romania map

(Figure: simplified road map of Romania, showing cities such as Arad, Sibiu, Fagaras, Timisoara, and Bucharest, and the road costs between them.)

### Search

- The process of sequentially considering actions in order to find a sequence of actions that leads from start to goal is called search.
- A search algorithm returns an action sequence that is then executed by the agent; search typically happens offline.
- Note that this assumes the environment is static. The environment is also assumed to be discrete, and is usually considered to be deterministic.

### Some classic search problems

- Toy problems, useful to study as examples or to compare algorithms: the 8-puzzle, the vacuum world, Rubik's cube, N-queens.
- Real-world problems are typically messier, but the answer is actually interesting: route finding, the traveling salesman problem, VLSI layout, searching the Internet.

### State

- We'll often talk about the state an agent is in. This refers to the values of the relevant variables describing the environment and the agent:
  - Vacuum world: in(0,0), Clean
  - Romania: in(Bucharest)
  - Rubik's cube: the current arrangement of the cube
- The state is an abstraction of our problem: we focus only on the details relevant to the problem.

### Formulating a Search Problem

- Initial state
- Goal test
- Actions
- Successor function
- Path cost
- Solution

### Initial State

- The initial state is the state that the agent starts in.
  - Vacuum world: in(0,0), Clean
  - Romania: in(Arad)

### Actions

- Actions: what actions is the agent able to take?
  - Vacuum world: Left, Right, Up, Down, Suck, Noop
  - Romania: Go(city)

### Successor Function

- The successor function, for a given state, returns a set of (action, new state) pairs. This tells us, for a given state, what actions we're allowed to take and where they'll lead.
- In a deterministic world, each action is paired with a single state.
  - Vacuum world: in(0,0) -> {(Left, in(0,0)), (Right, in(0,1)), (Suck, in(0,0) with the cell Clean)}
  - Romania: in(Arad) -> {(Go(Timisoara), in(Timisoara)), (Go(Sibiu), in(Sibiu)), (Go(Zerind), in(Zerind))}
- In stochastic worlds, an action may be paired with many states.

### Goal Test

- The goal test determines whether a given state is a goal state. There may be a unique goal state, or many.
  - Vacuum world: every room clean
  - Chess: checkmate
  - Romania: in(Bucharest)

### State space

- The combination of problem states (arrangements of the variables of interest) and successor functions (ways to reach states) leads to the notion of a state space.
- This is a graph representing all the possible world states and the transitions between them.
- Finding a solution to a search problem is reduced to finding a path from the start state to the goal state.

(Figure: the state space for the simple two-cell vacuum-cleaner world.)

### Types of Solutions

- Depending on the problem, we might want different sorts of solutions: maybe the shortest or least-cost path, maybe just a path, maybe any path that meets the solution criteria (satisficing).
- We'll often talk about the size of these spaces as a measure of problem difficulty:
  - 8-puzzle: about 181,000 states (easy)
  - 15-puzzle: about 1.3 trillion states (pretty easy)
  - 24-puzzle: about 10^25 states (hard)
  - TSP with 20 cities: 20! = 2.43 x 10^18 states (hard)

### Path cost

- The path cost is the cost an agent must incur to go from the initial state to the currently examined state.
- Often this is the sum of the cost for each action; the cost of a single action is called the step cost.
- We'll assume that step costs are non-negative. (What if they could be negative?)

### Shortest-path graph problems

- You've probably seen other algorithms for solving path-finding problems on a graph: Dijkstra's algorithm, Prim's algorithm, max-flow, all-pairs shortest-path.
- These algorithms are quadratic or cubic in the number of vertices, yet we'll talk about search being exponential in the number of state variables. Is this a contradiction? (No: a state described by n variables can have exponentially many distinct states, i.e. graph vertices.)
- What other problem will traditional path-finding algorithms have with a TSP-sized problem?

### Searching the state space

- Most search problems are too large to hold in memory; we need to dynamically instantiate portions of the search space.
- We construct a search tree by starting at the initial state and repeatedly applying the successor function.
- Basic idea: from a state, consider what can be done; then consider what can be done from each of those states.
- Some questions we'll be interested in:
  - Are we guaranteed to find a solution?
  - Are we guaranteed to find the optimal solution?
  - How long will the search take?
  - How much space will it require?

### Example Search Tree

(Figure: the beginnings of a Romania search tree: (a) the initial state, (b) after expanding Arad, (c) after expanding Sibiu.)

### Search algorithms

- The basic search algorithm is surprisingly simple:

      fringe <- initialState
      do:
          select node from fringe
          if node is not goal:
              generate successors of node
              add successors to fringe

- We call this list of nodes generated but not yet expanded the fringe.
- Question: how do we select a node from the fringe? (Hint: we'll use a priority queue.)

### Uninformed Search

- The simplest sort of search algorithms are those that use no additional information beyond what is in the problem description. We call this uninformed search; sometimes these are called weak methods.
- If we have additional information about how promising a non-goal state is, we can perform heuristic search.

### Breadth-first search

- Breadth-first search (BFS) works by expanding a node, then expanding each of its children, then each of their children, etc.
- All nodes at depth n are visited before any node at depth n+1.
- We can implement BFS using a queue.

### Breadth-first search

- BFS Python-ish code:

      queue.enqueue(initialState)
      while not done:
          node = queue.dequeue()
          if goalTest(node):
              return node
          else:
              children = successor_fn(node)
              for child in children:
                  queue.enqueue(child)

### BFS example: Arad to Bucharest

1. Dequeue Arad; enqueue Sibiu, Timisoara, Zerind.
2. Dequeue and test Sibiu; enqueue Oradea, Fagaras, Rimnicu Vilcea.
3. Dequeue and test Timisoara; enqueue Lugoj.
4. ...

### Some subtle points

- How do we avoid revisiting Arad? Closed list: keep a list of expanded states.
- Do we want a closed list here? Our solution is a path, not a city.
- How do we avoid inserting Oradea twice? Open list: our queue is actually a list of generated but unexpanded states.
- Why don't we apply the goal test when we generate children? It's not really any different: nodes are visited and tested in the same order either way, and the same number of goal tests is performed.

### Analyzing BFS

- Completeness: is BFS guaranteed to find a solution? Yes. Assume the solution is at depth n; since all nodes at or above depth n are visited before anything at depth n+1, a solution will be found.
- Optimality: if there are multiple solutions, will BFS find the best one? BFS will find the shallowest solution in the search tree. If step costs are uniform this will be optimal; otherwise, not necessarily.
  - Arad -> Sibiu -> Fagaras -> Bucharest will be found first (distance 450).
  - Arad -> Sibiu -> Rimnicu Vilcea -> Pitesti -> Bucharest is shorter (distance 418).

### Analyzing BFS

- Time complexity: BFS requires O(b^(d+1)) running time, where b is the branching factor (average number of children) and d is the depth of the solution. BFS visits b + b^2 + b^3 + ... + b^d + (b^(d+1) - b) nodes.
- Space complexity: BFS must keep the whole search tree in memory, since we want to know the sequence of actions that gets to the goal. This is also O(b^(d+1)) nodes.

### Analyzing BFS

Assume b = 10, 1 KB per node, 10,000 nodes/second:

| Depth | Nodes   | Time          | Memory        |
|-------|---------|---------------|---------------|
| 2     | 1,100   | 0.11 seconds  | 1 megabyte    |
| 4     | 111,100 | 11 seconds    | 106 megabytes |
| 6     | 10^7    | 19 minutes    | 10 gigabytes  |
| 8     | 10^9    | 31 hours      | 1 terabyte    |
| 10    | 10^11   | 129 days      | 101 terabytes |
| 12    | 10^13   | 35 years      | 10 petabytes  |
| 14    | 10^15   | 3,523 years   | 1 exabyte     |

In general, the space requirements of BFS are a bigger problem than the time requirements.
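The Python-ish BFS sketch above can be made runnable. Here is a minimal version on a hand-coded fragment of the Romania map; the adjacency dict is an illustrative assumption (only part of the map, with distances omitted since BFS ignores step costs):

```python
from collections import deque

# A fragment of the Romania road map (adjacency only; an assumption for
# illustration, not the full map from the lecture).
romania = {
    'Arad': ['Sibiu', 'Timisoara', 'Zerind'],
    'Zerind': ['Arad', 'Oradea'],
    'Oradea': ['Zerind', 'Sibiu'],
    'Timisoara': ['Arad', 'Lugoj'],
    'Lugoj': ['Timisoara', 'Mehadia'],
    'Sibiu': ['Arad', 'Oradea', 'Fagaras', 'Rimnicu Vilcea'],
    'Fagaras': ['Sibiu', 'Bucharest'],
    'Rimnicu Vilcea': ['Sibiu', 'Pitesti'],
    'Pitesti': ['Rimnicu Vilcea', 'Bucharest'],
}

def bfs(start, goal_test, successors):
    """Breadth-first search: FIFO fringe, with the open/closed-list
    bookkeeping discussed in the slides to avoid duplicate insertion."""
    fringe = deque([(start, [start])])   # (state, path-so-far) pairs
    generated = {start}                  # states already placed on the fringe
    while fringe:
        state, path = fringe.popleft()
        if goal_test(state):             # goal test applied on expansion
            return path
        for child in successors(state):
            if child not in generated:   # e.g. avoids inserting Oradea twice
                generated.add(child)
                fringe.append((child, path + [child]))
    return None                          # no solution in the reachable space

path = bfs('Arad', lambda s: s == 'Bucharest', lambda s: romania.get(s, []))
# BFS returns the shallowest path: ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']
```

Swapping the deque for a priority queue ordered by accumulated path cost turns this same skeleton into the uniform-cost search described next.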
### Uniform-cost search

- Recall that BFS is non-optimal when step costs are non-uniform. We can correct this by expanding the shortest paths first.
- Add a path cost to expanded nodes, and use a priority queue to order them by increasing path cost.
- Uniform-cost search is guaranteed to find the shortest path; if step costs are uniform, it is identical to BFS.
- This is how Dijkstra's algorithm works.

### Depth-first Search

- Depth-first search (DFS) takes the opposite approach to search from BFS: always expand the deepest node.
- Expand a child, then expand its leftmost child, and so on.
- We can implement DFS using a stack.

### Depth-first Search

- DFS Python-ish code:

      stack.push(initialState)
      while not done:
          node = stack.pop()
          if goalTest(node):
              return node
          else:
              children = successor_fn(node)
              for child in children:
                  stack.push(child)

### DFS example: Arad to Bucharest

1. Pop Arad; push Sibiu, Timisoara, Zerind.
2. Pop and test Sibiu; push Oradea, Fagaras, Rimnicu Vilcea.
3. Pop and test Oradea.
4. Pop and test Fagaras; push Bucharest.
5. ...

### Analyzing DFS

- Completeness: no. We can potentially wander down an infinitely long path that does not lead to a solution.
- Optimality: no. We might find a solution at depth n under one child without ever seeing a shorter solution under another child. (What if we had popped Rimnicu Vilcea first?)
- Time requirements: O(b^m), where m is the maximum depth of the tree. Note that m may be much larger than d, the solution depth, and in some cases m may be infinite.

### Analyzing DFS

- Space requirements: O(bm). We only need to store the currently searched branch.
- This is DFS's strong point: in our previous table, searching to depth 12 would require 118 KB, rather than 10 petabytes for BFS.
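A runnable sketch of the DFS just described, again on an assumed fragment of the Romania map. Passing a depth limit turns it into the depth-limited search covered below, and a few extra lines on top give iterative deepening:

```python
# Illustrative fragment of the Romania map (an assumption, as before).
romania = {
    'Arad': ['Sibiu', 'Timisoara', 'Zerind'],
    'Zerind': ['Arad', 'Oradea'],
    'Oradea': ['Zerind', 'Sibiu'],
    'Timisoara': ['Arad', 'Lugoj'],
    'Lugoj': ['Timisoara', 'Mehadia'],
    'Sibiu': ['Arad', 'Oradea', 'Fagaras', 'Rimnicu Vilcea'],
    'Fagaras': ['Sibiu', 'Bucharest'],
    'Rimnicu Vilcea': ['Sibiu', 'Pitesti'],
    'Pitesti': ['Rimnicu Vilcea', 'Bucharest'],
}

def depth_limited_dfs(start, goal_test, successors, limit=None):
    """DFS with a LIFO stack; an optional depth limit makes it DLS."""
    stack = [(start, [start])]
    while stack:
        state, path = stack.pop()
        if goal_test(state):
            return path
        # Expand only while depth(node) = len(path) - 1 is below the limit.
        if limit is None or len(path) - 1 < limit:
            for child in successors(state):
                if child not in path:    # avoid cycles on the current branch
                    stack.append((child, path + [child]))
    return None

def ids(start, goal_test, successors, max_depth=50):
    """Iterative deepening: run DLS with limit 0, 1, 2, ... until a goal appears."""
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(start, goal_test, successors, limit)
        if result is not None:
            return result
    return None

succ = lambda s: romania.get(s, [])
dfs_path = depth_limited_dfs('Arad', lambda s: s == 'Bucharest', succ)
ids_path = ids('Arad', lambda s: s == 'Bucharest', succ)
# DFS happens to return a deep, non-optimal path here, while IDS returns
# the shallowest one, illustrating the optimality discussion above.
```

Note the space behavior matches the analysis: the stack only ever holds the siblings along the currently searched branch.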
### Reviewing

- A search problem consists of:
  - A description of the states
  - An initial state
  - A goal test
  - Actions to be taken
  - A successor function
  - A path cost

### First, a question

- Why are we looking at algorithms that perform an exhaustive search? Isn't there something faster?
- Many of the problems we're interested in are NP-complete: there is no known polynomial-time algorithm. Worse, many are also inapproximable.
- In the worst case, the best one can hope for is to enumerate all solutions.

### Avoiding Infinite Search

- There are several approaches to avoiding DFS's infinite search:
  - A closed list. This may not always help, and now we have to keep exponentially many nodes in memory.
  - Depth-limited search.
  - Iterative deepening DFS.

### Depth-limited Search

- Depth-limited search (DLS) works by giving DFS an upper limit l; search stops at this depth.
- This solves the problem of infinite search down one branch, but adds another potential problem: what if the solution is deeper than l?
- How do we pick a reasonable l? In the Romania problem we know there are 20 cities, so l = 19 is a reasonable choice. What about the 8-puzzle?

### Depth-limited Search

- DLS pseudocode:

      stack.push(initialState)
      while stack is not empty:
          node = stack.pop()
          if goalTest(node):
              return node
          if depth(node) < limit:
              children = successor_fn(node)
              for child in children:
                  stack.push(child)
      return None

### Iterative Deepening DFS (IDS)

- Expand on the idea of depth-limited search: do DLS with l = 1, then l = 2, then l = 3, etc.
- Eventually l = d, the depth of the goal. This means that IDS is complete.
- Drawback: some nodes are generated and expanded multiple times.

### Iterative Deepening DFS (IDS)

- Due to the exponential growth of the tree, this is not as much of a problem as we might think:
  - Level 1: b nodes, generated d times.
  - Level 2: b^2 nodes, generated d - 1 times.
  - ...
  - Level d: b^d nodes, generated once.
- Total running time: O(b^d), with slightly fewer nodes generated than BFS.
- IDS still has linear memory requirements.

### Iterative Deepening DFS (IDS)

- IDS pseudocode:

      d = 0
      while True:
          result = depth_limited_search(d)
          if result is a goal:
              return result
          else:
              d = d + 1

### Iterative Deepening DFS (IDS)

- IDS is actually similar to BFS in that all nodes at depth n are examined before any node at depth n+1 is examined.
- As with BFS, we can get optimality in non-uniform step-cost worlds by expanding according to path cost rather than depth. This is called iterative lengthening search: search all paths with cost less than p, then increase p by epsilon.
- In continuous worlds, what should epsilon be?

### Backtracking

- What happens when DFS and its cousins reach a failure state? They go up to the parent and try the next sibling.
- The assumption is that the most recently chosen action is the one that caused the failure. This is called chronological backtracking: undo the most recent thing you did.
- This can be a problem: the failure may be the result of an earlier decision. (Example: 4-queens.)

### Backtracking

- Constraints can help you limit the size of the search space.
- Intelligent backtracking tries to analyze the reason for the failure and unwind the search to that point. It can unwind to the most recent conflicting variable (backjumping).
- It can also do forward checking: is there a possible assignment of values to variables at this point?

### Summary

- Formalizing a search problem (an initial state, a goal test, actions to be taken, a successor function, and a path cost) leads to search through a state space using a search tree.

### Summary

- Algorithms:
  - Breadth-first search
  - Depth-first search
  - Uniform-cost search
  - Depth-limited search
  - Iterative deepening search

### Coming Attractions

- Heuristic search: speeding things up by evaluating the goodness of non-goal nodes.
- Greedy search, A* search, and developing heuristics.

### Example Problems

- The 8-puzzle:
  - What is a state in the 8-puzzle?
  - What is a solution?
  - What is the path cost?
  - What are the legal actions?
  - What is the successor function?

### Example Problems

- 8-puzzle: let's say the initial state is

      1 3 2
      B 6 4
      5 8 7

  and the goal state is

      B 1 2
      3 4 5
      6 7 8

  (B is the blank). What are the first steps of BFS, DFS, and IDS?

### Example Problems

- The traveling salesman problem:
  - What is a state in the TSP?
  - What is a solution?
  - What is the path cost?
  - What are the legal actions?
  - What is the successor function?

### Example Problems

- TSP on the reduced Romania map: start in Sibiu; visit S, F, RV, C, P, B.
- What are the first steps of BFS, DFS, and IDS?

### Towers of Hanoi

- Another toy problem. What are the problem characteristics?
- Initial state: all five disks (5 4 3 2 1) on the first peg. Goal state: all five disks (5 4 3 2 1) on the last peg.
- What are the first steps of BFS, DFS, and IDS?

## Artificial Intelligence Programming: Decision Making and Utility

Chris Brooks, Department of Computer Science, University of San Francisco

### Making decisions

- At this point, we know how to describe the probability of events occurring (or, in agent terms, of states being reached).
- Knowing the probabilities of events is only part of the battle: agents are really interested in maximizing performance, which means making the correct decisions as to what to do.
- Often performance can be captured by utility: utility indicates the relative value of a state.

### Types of decision-making problems

- Single-agent, deterministic, full information, episodic: we've done this with the reflex agent.
- Single-agent, deterministic, full information, sequential: we can use search here.
- Single-agent, stochastic, partial information, episodic: we can use utility and probability here.
- Single-agent, stochastic, partial information, sequential: we can extend our knowledge of probability and utility to a Markov decision process.
- Multiple-agent, deterministic, full information, episodic (or sequential): this is game theory.

### Expected Utility

- In episodic, stochastic worlds, we can use expected utility to select actions.
- An agent knows that an action can lead to one of a set S of states; the agent has a utility U(s) for each of these states, and a probability P(s) that each state will be reached.
- The expected utility of an action is EU(a) = sum over s in S of P(s) U(s).
- The principle of maximum expected utility (MEU) says that an agent should choose the action that maximizes expected utility.

### Example

- Let's say there are two levers:
  - Lever 1 costs $1 to pull. With p = 0.4 you win $2; with p = 0.6 you win nothing.
  - Lever 2 costs $2 to pull. With p = 0.1 you win $10; with p = 0.9 you lose $1 on top of the charge to pull.
- Should you (a) pull lever 1, (b) pull lever 2, or (c) pull neither?

### Example

- EU(lever 1) = 0.4(1) + 0.6(-1) = -0.2
- EU(lever 2) = 0.1(8) + 0.9(-3) = -1.9
- EU(neither) = 0
- Both levers have negative expected utility, so pulling neither gives the maximum EU.
- TV digression: this is the choice contestants are faced with on Deal or No Deal. The banker offers them a price slightly above the expected utility, and yet most contestants don't take it. Why?

### Example: Vegas

- The typical roulette wheel has 38 numbers: 1 through 36, plus 0 and 00. Numbers 1 through 36 are either red or black.
- The payoff for betting on a single number is 35:1. In other words, if the number you picked comes up, you win $35; otherwise you lose your $1.
- What is the expected utility of betting $1 on a single number?
  - EU = (1/38)(35) + (37/38)(-1) = -0.052
- What if you decide to spread the risk and bet $1 on each of two numbers?
  - EU = (2/38)(34) + (36/38)(-2) = -0.105
- The payoff for betting on a color is 1:1. In other words, if you bet red and a red number comes up, you win $1; otherwise you lose $1. What is the expected utility of betting on red?
  - EU = (18/38)(1) + (20/38)(-1) = -0.052

### Regarding Preferences

- In order for MEU to make sense, we need to specify some (hopefully reasonable) constraints on agent preferences.
- Orderability: we must be able to say that A is preferred to B, B is preferred to A, or they are equally preferable. We cannot have the case where A and B are incomparable.
- Transitivity: if A < B and B < C, then A < C.
- Continuity: if A < B < C, then there is some probability p at which the agent is indifferent between getting B for certain and a lottery giving A with probability p and C with probability 1 - p.

### Rational Preferences

- Monotonicity: if two lotteries A and B have the same outcomes and I prefer the better outcome, I should prefer the lottery that gives the better outcome with higher probability.
- Decomposability: utilities over a sequence of actions can be decomposed into utilities for atomic events.
- These preferences are, for the most part, quite reasonable, and allow an agent to avoid making foolish mistakes.

### Utility, Money, and Risk

- The concept of utility comes from economics; money is often used as a substitute for utility.
- Preferences for money behave oddly when dealing with small or large amounts: for example, people will often take more chances with small amounts and be very conservative with very large amounts.
- This is called your risk profile: convex (risk-seeking) or concave (risk-averse). Typically we say that we have a quasi-linear utility function for money.

### Gathering information

- When an agent has all the information it can get and just needs to select a single action, things are straightforward: find the action with the largest expected utility.
- What if an agent can choose to gather more information about the world? Now we have a sequential decision problem: should we just act, or gather information first? What questions should we ask?
- Agents should ask questions that give them useful information, where "useful" means increasing expected utility. Gathering information might be costly, either in time or money.

### Example

- An agent can recommend to a prospecting company that they buy one of n plots of land. One of the plots has oil worth C dollars; the others are empty. Each block costs C/n dollars.
- Initially, the agent is indifferent between buying and not buying. Why is that? (The expected value of a random block, C/n, exactly equals its price.)

### Example

- Suppose the agent can perform a survey on a block that will indicate whether that block contains oil. How much should the agent pay to perform that survey? P(oil) = 1/n, P(no oil) = (n-1)/n.
- If oil is found, buy that block for C/n. Profit: C - C/n = (n-1)C/n.
- If oil is not found, buy a different block. The probability of picking the oil block is now 1/(n-1), so the expected profit is C/(n-1) - C/n = C/(n(n-1)).
- The expected profit given the information is therefore

      (1/n) * (n-1)C/n + ((n-1)/n) * C/(n(n-1)) = (n-1)C/n^2 + C/n^2 = C/n

- So the company is willing to pay up to C/n, the price of a plot, for the test. This is the value of that information.

### Value of Perfect Information

- Let's formalize this. We find the best action alpha in general with

      EU(alpha) = max over a in Actions of: sum over i in Outcomes of U(i) P(i | a)

- Let's say we acquire some new information (evidence) E. Then we find the best action alpha_E with

      EU(alpha_E | E) = max over a in Actions of: sum over i in Outcomes of U(i) P(i | a, E)

- The value of E is the difference between these two.

### Value of Perfect Information

- However, before we do the test we don't know what E will be, so we must average over all possible values e_j of E:

      VPI(E) = [ sum over j of P(E = e_j) EU(alpha_{e_j} | E = e_j) ] - EU(alpha)

- In words: consider the possibility of each observation, along with the usefulness of that observation, to compute the expected information gain from the test.
- In general, information will be valuable to the extent that it changes the agent's decision.

### Example

- Imagine that you are on a game show and are given a choice of three possible doors to open.
  - If you open door number 1, you win $10.
  - If you open door number 2, you have a 50% chance of winning nothing and a 50% chance of winning $25.
  - If you open door number 3, you have a 20% chance of winning $20 and an 80% chance of winning $10.
- Which door should you choose?

### Example

- EU(door 1) = 10
- EU(door 2) = 0.5(0) + 0.5(25) = 12.5
- EU(door 3) = 0.2(20) + 0.8(10) = 12
- Door 2 is best.

### Example

- Now suppose that the host offers to tell you, honestly, what you'll get if you open door number 2. How much would you pay for this information?
- Note: you can change your mind about which door to open after the host gives you this information.

### Example

- There are two cases: either door 2 pays $0 or it pays $25.
  - If it pays $25, we should choose it. This happens 50% of the time.
  - If it pays $0, we should choose door 3, which pays 12 in expectation. This happens 50% of the time.
- So our expected utility will be 0.5(25) + 0.5(12) = 18.5. Our EU before we asked was 12.5, so the information is worth $6.

### Dealing with multiple agents

- The value of information shows how an agent can rationally look ahead in a partially observable world.
- Looking ahead is also valuable in multiagent environments: anticipate your opponent's moves, or his reaction to your moves, or his reaction to your reaction to his reaction...

### Multiagent Environments

- Competitive environments: chess, checkers, Go; auctions, online markets.
- Cooperative environments: agents on a soccer team; rovers surveying Mars.
- And in-between cases: two agents making deliveries.

### Game Theory

- Game theory is the branch of mathematics/economics that deals with interactions between multiple agents, and prescribes methods for determining optimal behavior.
- Distinctions:
  - Games of perfect information: agents have access to all knowledge about the world. We'd call this a fully observable environment. Examples: board games without randomness, rock-paper-scissors.
  - Games of imperfect information: an agent must make inferences about the world or its opponent. We'd call this a partially observable environment.
  - Games of chance: some auctions, most real-world interactions.

### Game Theory

- Game theory assumes:
  - Perfect rationality: agents will always do the right thing.
  - Unlimited computation: agents are always able to determine the right thing to do.
- As we've seen, these assumptions may not make sense in some cases. However, we'll still use some ideas from game theory to help decide what to do.
- So how does an agent determine what to do in a multiagent environment?

### Optimal Strategies

- We'll start with the simplest sort of game: two players, perfect information, zero-sum (one wins, one loses; this is also called perfectly competitive).
- We can define this game as a search problem.

### Optimal Strategies

- Two players: Max and Min.
- Initial state: the game's initial configuration.
- Successor function: returns all the legal moves and resulting configurations.
- Terminal test: determines when the game is over.
- Utility function: the amount won by each player. In zero-sum games the amounts are +1 (win), -1 (loss), 0 (draw).

### Game Trees

- We can then, in principle, draw the game tree for any game.

(Figure: the partial game tree for tic-tac-toe, with Max playing X. Utilities for terminal states, -1, 0, or +1, are from the perspective of player 1, Max.)

### Finding a strategy

- Max cannot just search forward and find the path that leads to a winning outcome: Max must take into account the fact that Min is also trying to win.

(Figure: an even simpler two-ply Max/Min search tree.)

### Single-stage games

- This approach is easiest to see with a single-stage game. Consider the Big Monkey / Little Monkey problem:
  - There is 10 Cal worth of fruit in the tree. Big Monkey (BM) spends 2 Cal getting the fruit; Little Monkey (LM) spends 0.
  - If both climb, BM gets 7, LM gets 3.
  - If BM climbs and LM waits, BM gets 6, LM gets 4.
  - If LM climbs and BM waits, BM gets 9, LM gets 1.
  - If neither climbs, no one gets fruit: both get 0.

### Single-stage games

- What if Big Monkey chooses first? Net payoffs (after climbing costs):

| BM    | LM    | BM payoff | LM payoff |
|-------|-------|-----------|-----------|
| wait  | wait  | 0         | 0         |
| wait  | climb | 9         | 1         |
| climb | wait  | 4         | 4         |
| climb | climb | 5         | 3         |

- If BM waits, LM will climb (1 > 0): BM gets 9.
- If BM climbs, LM will wait (4 > 3): BM gets 4.
- So BM will wait.

### Single-stage games

- This analysis only works if the monkeys choose sequentially. If they choose simultaneously, things can become more complicated: each must consider the best action for each opposing action.
- We'll focus on sequential games.
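The sequential Big Monkey / Little Monkey reasoning above is backward induction: the second mover best-responds, and the first mover anticipates that response. A minimal sketch in Python, using the net payoffs from the slide:

```python
# Net payoffs (BM, LM) after climbing costs, indexed by (bm_action, lm_action),
# taken from the lecture's Big Monkey / Little Monkey table.
payoff = {
    ('wait',  'wait'):  (0, 0),
    ('wait',  'climb'): (9, 1),
    ('climb', 'wait'):  (4, 4),
    ('climb', 'climb'): (5, 3),
}

ACTIONS = ('wait', 'climb')

def best_response_lm(bm_action):
    """Little Monkey moves second: pick the action maximizing LM's payoff."""
    return max(ACTIONS, key=lambda a: payoff[(bm_action, a)][1])

def backward_induction():
    """Big Monkey anticipates LM's best response and maximizes his own payoff."""
    return max(ACTIONS, key=lambda a: payoff[(a, best_response_lm(a))][0])

bm_choice = backward_induction()      # BM waits, as the slide concludes
lm_choice = best_response_lm(bm_choice)
```

As in the slide, the outcome is (wait, climb) with payoffs (9, 1); note this logic only applies when the moves really are sequential.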
### Dealing with multiple stages

- To deal with games where many actions are taken, begin at the leaves and compute their payoff
- The minimax value of a node is the payoff from that player's perspective, assuming both players play optimally from that point on
- Computed recursively
  - Begin with the terminal nodes, whose values are known
  - Minimax-value(node) =
    - max(children) if it's the Max player's turn
    - min(children) if it's the Min player's turn

### Example

[Figure: two-ply MAX/MIN tree with terminal utilities at the leaves]

- Terminal utilities are given
- Recurse up to the Min layer (or ply)
  - The Min player will choose actions b1, c1, d3
- Recurse again to the Max layer
  - The Max player will choose action a1

### Minimax search

- Perform a depth-first traversal of the search tree
- Once the values for a node's children have been computed, the value for that node can then be computed
- Retain only the best action sequence so far
- This will find the optimal strategy
- Problem:
  - Totally impractical for large games
  - Serves as a basis for more practical approaches

### Alpha-beta pruning

- Recall that pruning is the elimination of branches in the search tree without expansion
- You can rule out the possibility that the solution is down one branch
- Consider the previous example again, remembering that non-terminal node values are generated via depth-first search

### Alpha-beta pruning

- After traversing the left branch, we know that node B has a value of 3
  - The Min player will pick 3 over 12 and 8
  - More generally, the Min player will pick the lowest value
- As soon as we find a 2 down the center branch, we can prune the other children
  - Since Min would pick the smallest value down this branch, C can have a value of at most 2
  - Node B already has a value of 3, so Max would prefer B
- Still must traverse the third branch
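The minimax recursion and the alpha-beta refinement described above can be sketched over an explicit tree, where a node is either a number (a terminal utility for Max) or a list of children. The example tree uses the lecture's three branches, B = (3, 12, 8), C = (2, 4, 6), D = (14, 5, 2); the cutoff test in `alphabeta` prunes C's 4 and 6 exactly as in the walkthrough.

```python
import math

def minimax(node, is_max):
    """Plain minimax: leaves are numbers (utility for Max)."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning: stop expanding once alpha >= beta."""
    if isinstance(node, (int, float)):
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:       # Min already has a better option above
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:           # Max already has a better option above
            break
    return value

# Max to move; Min's values within each branch are 3, 2, 2, so Max plays a1.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True), alphabeta(tree, True))  # 3 3
```

Both functions return the same value; alpha-beta just visits fewer nodes, and (as the slides note) how many fewer depends heavily on the order in which children are listed.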
### Alpha-beta pruning

- Intuition behind alpha-beta: consider a node n; if the player considering n can make a better move at n or at one of its parents, then n will never be reached
- Call alpha the best Max value at a choice point
- Call beta the lowest Min value at a choice point

### Alpha-beta pruning

- Alpha-beta is very sensitive to the order in which nodes are enqueued
  - What if nodes B and C were swapped in our previous example?
- Heuristics can be used to reorder the tree and search promising branches first
- Problem: alpha-beta still needs to descend to a terminal node before states can be evaluated

### Evaluation Functions

- For large problems like chess, you need to be able to estimate the value of non-terminal states
- Construct a heuristic, known as an evaluation function, that estimates the terminal utility of that state
- How to build such a function?
  - It should order states according to the utility of their terminals
  - It should be fast
  - It should be strongly correlated with the actual chances of winning

### Evaluation Functions

- Evaluation functions introduce uncertainty
  - You'd like to have your function guess correctly most of the time
- Place states in categories or classes: states in a category win X% of the time
  - This is a classification problem
- Alternatively, construct a domain-specific value for a state
  - In chess, give pieces point values and assign weights to squares controlled
- How to get these numbers?
  - Ask an expert
  - Run minimax offline on different board configurations

### Cutting off search

- Evaluating non-terminal nodes is fine, but we still need to anticipate future moves
- When can we cut off search and still have a good idea of a branch's value?
- Approach 1: use a fixed depth of
lookahead
- Approach 2: use iterative deepening until our time to make a decision runs out
- Both of these approaches are vulnerable to the horizon problem
  - Things look good at this state, but will go horribly wrong next turn

### Quiescence

- We want to apply our evaluation function only to search paths that are quiescent
  - Aren't changing wildly
  - Don't stop in the middle of a sequence of captures, for example
- If our estimate of the state is changing drastically as we expand, we need to search further down that branch to get an accurate estimate
- We still may not avoid the horizon problem
  - There may be a move that is ultimately unavoidable but can be prolonged indefinitely
  - Search must be able to distinguish between delaying and avoiding an outcome

### Performance improvements

- May choose to focus search more deeply down moves that are clearly better than others
- Can also store visited states
  - Avoid repeated paths to the same state
  - Avoid symmetric states
- High-performance game-playing systems will typically have a large dictionary of precomputed endgames
  - Once a game reaches such a state, optimal moves are read from a hashtable

### Deep Blue

- Deep Blue is a great example of the combination of modern computing power and AI techniques
- 1997: Deep Blue defeats Garry Kasparov, clearly the best human chess player
  - This has been an AI goal since the 50s
- Deep Blue is a 32-node RS/6000 supercomputer
  - Software written in C
  - Uses alpha-beta search

### Deep Blue

- Specialized hardware for computing evaluation functions
- Evaluates 200,000,000 states/second
  - Evaluates 100-200 billion states in 3 minutes (the time allotted)
- Average branching factor in chess is 35; the average game is 50 moves
  - State space is around 10^70
- Large amount of
specialized domain knowledge from humans, used to construct accurate evaluation functions
- Maintains a database of opening moves used by grandmasters in previous games
- Also a large book of endgames

### Deep Blue

- This is nothing like how humans play chess
- Kasparov claims to consider 2-3 states per second
- Humans seem to use pattern-matching to recognize good moves
- Deep Blue is intelligent in the sense that it acts rationally, but not in the sense that it thinks or acts like a human

### Dealing with chance

- Extending minimax to deal with chance is straightforward
- Add a "chance" player between Max and Min who can affect the world
- Each player must now act to maximize expected payoff
- Evaluation becomes more difficult, and the search tree explodes
- There is also a distinction between random events, such as dice rolls, and deterministic but unknown variables, such as cards in an opponent's hand
  - In the second case, we can deduce information to help us eliminate uncertainty

### Other game-playing AI successes

- Checkers
  - By 1994, Chinook could defeat Dr. Marion Tinsley, who'd been world checkers champion for 40 years
- Othello
  - World champion defeated in 1997
  - It's generally acknowledged that Othello is small enough to play optimally
- Backgammon
  - TD-Gammon ranked among the top three players in the world
  - Doesn't use alpha-beta; uses shallow search with a very accurate evaluation function, learned by playing against itself

### Other game-playing AI successes

- Go
  - Still open: computers can beat amateurs, but not top professionals
  - Challenges: no hierarchy of pieces, hard to focus search on board sections, branching factor of 361
- Bridge
  - GIB competes with human world champions (12th out of 35)
  - Uses a set of scripts to plan bidding strategies
  - Learns
generalized rules

### AI and games

- Minimax and game-tree search are great for solving sequential-move games
- This works well for chess, checkers, etc., but is not very relevant to Halo
- What are the AI problems for modern gaming?

### AI and games

- Until recently, AI behavior in games was typically pretty primitive
  - Emphasis was on graphical rendering
  - Very few cycles left for AI
  - Assumption was that suspension of disbelief would be enough
- NPC behavior typically governed by simple rules or finite state machines
- Now graphics cards are powerful enough to provide an opportunity to incorporate AI into the game itself

### AI needs in gaming

- Modern gaming could use:
  - Adaptive behavior
  - Rich human-like interaction; NPCs with their own motivations/goals
  - A middle ground between scripted stories and completely open-ended worlds
  - Intelligent group/unit behavior
  - Better AI opponents
