Special Topics ECE 4823
Popular in Course
verified elite notetaker
Popular in ELECTRICAL AND COMPUTER ENGINEERING
This 0 page Class Notes was uploaded by Cassidy Effertz on Monday November 2, 2015. The Class Notes belongs to ECE 4823 at Georgia Institute of Technology - Main Campus taught by Jeff Shamma in Fall. Since its upload, it has received 11 views. For similar materials see /class/233874/ece-4823-georgia-institute-of-technology-main-campus in ELECTRICAL AND COMPUTER ENGINEERING at Georgia Institute of Technology - Main Campus.
Reviews for Special Topics
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/02/15
ESSENTIALS OF GAME THEORY Synthesis Lectures on Arti cial Intelligence and Machine Learning Editors Ronald Brachman7 Yaboo Rexemcb Tom Dietterich7 Oregon State University Intelligent Autonomous Robotics Peter Stone 2007 A Concise Introduction to Multiagent Systems and Distributed Arti cial Intelligence Nikos Vlassis 2007 Essentials of Game Theory A Concise Multidisciplinary Introduction Kevin LeytoneBrown and Yoav Shoham 2008 Copyright 2008 byMorgan BLClaypool All rights reserved No part of this publication may be reproduced stored in a retrieval system or transmitted in any form or by any meansielectronic mechanical photocopy recording or any other except for brief quotations in printed reviews without the prior permission ofthe publisher Essentials ofGame Theory Kevin Ley39toneBrown and Yoav Shoham wwwmorganclaypoolcom ISBN 9781598295931 paper ISBN 9781598295948 ebook D01 102200S00108ED1V01Y200802A1M003 A Publication in the Morgan 81 Claypool Publishers series SYNTHESIS LECTURES ON AR TIFI CML INTELLIGENCE AND W CHINE LE RNING 3 Lecture 3 Series Editor Ronald Brachman Yahool Research and Tom Dietterich Oregon State University Library of Congress CatalogingeinePublication Data Series ISSN 19394608 print Series ISSN 19394616 electronic ESSENTIALS OF GAME THEORY A Concise Multidisciplinary Introduction Kevin Leyton Brown University39of British Columbia Vancouver BC Canada httpcstubc caNkevinlb Yoav Shoham Stanford University Palo Alto CA USA httpcststanfordteduNshoham MORGAN CLAYPOOL PUBLISHERS ABSTRACT Game theory is the mathematical study of interaction among independent selfeinterested agents The audience for game theory has grown dramatically in recent years and now spans disciplines as diverse as political science biology psychology economics linguistics sociology and computer scienceiamong others What has been missing is a relatively short introduction to the eld covering the common basis that anyone with a professional interest in game theory is likely to require Such a text would minimize notation ruthlessly focus on essentials and yet not sacri ce rigor This Synthesis Lecture aims to ll this gap by providing a concise and accessible introduction to the eld It covers the main classes of games their representations and the main concepts used to analyze them This introduction is just what a growing multidisciplinary audience needs it is concise aue thoritative up to date and clear on the important conceptual issues Robert Stalnaker MIT Linguistics and Philosophy 1 wish I d had a comprehensive clear and rigorous introduction to the essentials of game theory in under one hundred pages when l was starting out David Parkes Harvard University Computer Science Beside being concise and rigorous Essentials of Game Theory is also quite comprehensive It includes the formulations used in most applications in engineering and the social sciences and illustrates the concepts with relevant examples Robert Wilson Stanford University Graduate School of Business Best short introduction to game theory I have seen lwish it was available when I started being interested in the eld Silvio Micali MIT Computer Science Although written by computer scientists this book serves as a sophisticated introduction to the main concepts and results of game theory from which other scientists including social scientists can greatly bene t In eighty pages Essentials of Game Theory formally de nes key concepts illustrated with apt examples in both cooperative and noncooperative game theory Steven Brams New York University Political Science This book will appeal to readers who do not necessarily hail from economics and who want a quick grasp of the fascinating eld of game theory The main categories of games are introduced in a lucid way and the relevant concepts are clearly de ned with the underlying intuitions always provided Krzysztof Apt University of Amsterdam Institute for Logic Language and Computation To a large extent modern behavioral ecology and behavioral economics are studied in the framework of game theory Students and faculty alike will nd this concise rigorous and clear introduction to the main ideas in game theory immensely valuable Marcus Feldman Stanford University Biology This unique book is today the best short technical introduction to game theory Accessible to a broad audience it will prove invaluable in arti cial intelligence more generally in computer science and indeed beyond Moshe Tennenholtz Technion Industrial Engineering and Management Excerpted from a mucheanticipated crossedisciplinary book on multiagent systems this terse incisive and transparent book is the ideal introduction to the key concepts and methods of game theory for researchers in several elds including arti cial intelligence networking and algorithms Vijay VaZirani Georgia Institute of Technology Computer Science The authors admirably achieve their aim of providing a scientist or engineer with the essentials of game theory in a text that is rigorous readable and concise Frank Kelly University of Cambridge Statistical Laboratory KEI WORDS Game theory multiagent systems competition coordination Prisoner s Dilemma zeroesum games Nash equilibrium extensive form repeated games stochastic games Bayesian games coalitional games T0 my parents Anne andDwvid Leytan Bm39wn KLB T0 myparem s Leila andevis Stein YS wit7 mac7 law and tlmnksfar all tlmz yau 1mm taught us Contents Credits and Acknowledgments xiii Preface xv Games in Normal Form 1 11 Example The TCP User s Game 2 12 De nition of Games in Normal Form 3 13 More Examples ofNormaleForm Games 4 131 Prisoner s Dilemma 4 132 Commonepayoff Games 4 133 Zeroesum Games 5 134 Battle ofthe Sexes 7 14 Strategies in Normaleform Games 7 Analyzing Games From Optimality To Equilibrium 9 21 Pareto optimality 9 22 De ning Best Response and Nash Equilibrium 10 23 Finding Nash Equilibria 11 Further Solution Concepts for Normal Form Games 15 31 Maxmin and Minmax Strategies 15 32 Minimax Regret 18 33 Removal of Dominated Strategies 20 34 Rationalizability 23 35 Correlated Equilibrium 24 36 TremblingeHand Perfect Equilibrium 26 37 eeNash Equilibrium 27 38 Evolutionarin Stable Strategies 28 Games With Sequential Actions The Perfect information Extensive Form 31 41 De nition 31 42 Strategies and Equilibria 32 43 SubgameePerfect Equilibrium 44 Backward Induction 38 xii CONTENTS 5 Generalizing the Extensive Form Imperfect Information Games 41 51 De nition 41 52 Strategies and Equilibria 42 53 Sequential Equilibrium 45 Repeated and Stochastic Games 49 61 Finiter Repeated Games 49 62 In nitely Repeated Games 50 63 Stochastic Games 53 631 De nition 53 632 Strategies and Equilibria 54 Uncertainty About Payoffs Bayesian Games 71 De nition 711 Information Sets 59 712 Extensive Form with Chance Moves 60 713 Epistemic Types 61 72 Strategies and Equilibria 61 73 Computing Equilibria 64 74 Exrpwz Equilibria 67 Coalitional Game Theory 81 Coalitional Games with Transferable Utility 69 82 Classes of Coalitional Games 70 83 Analyzing Coalitional Games 72 831 The Shapley Value 73 832 The Core 75 History and References 79 References 83 Index 85 xiii Credits and Acknowledgments We thank the many colleagues including past and present graduate students who made substantial contributions Sam leong deserves special mention Chapter 8 coalitional games is based entirely on writing by him and he was also closely involved in the editing of this chapter Other colleagues either supplied material or provided useful council These include Felix Brandt Vince Conitzer Yossi Feinberg Jeff Fletcher Nando de Freitas Ben Galin Geoff Gordon Teg Grenager Albert Xin Jiang David Poole Peter Stone David Thompson Bob Wilson and Erik Zawadzki We also thank David Thompson for his assistance with the production of this book particularly with the index and bibliography Of course none of our colleagues are to be held accountable for any errors or other shortcomings We claim sole credit for those We thank Morgan amp Claypool and particularly our editor Mike Morgan for publishing Exxmz iaZx ofGame Tyeary and indeed for suggesting the project in the rst place This booklet weaves together excerpts from our much longer book Multiagem Syxz emxAZgaritbmic Gamer Tyeorez ic and Logical Foundatiam published by Cambridge University Press We thank CUP and particularly our editor Lauren Cowles for not only agreeing to but indeed supporting the publication of this booklet We are fortunate to be working with such stellar forwardelooking editors and publishers A great many additional colleagues contributed to the full Multiagent Systems book and we thank them there Since the project has been in the works in one way or another since 2001 it is possibleiindeed likelyithat we have failed to thank some people We apologize deeply in advance Last and certainly not least we thank our families for supporting us through this timeeconsuming project We dedicate this book to them with love Preface Game theory is the mathematical study of interaction among independent selfeinterested agents It is studied primarily by mnr i and i i i being its main initial application area So what business do two computer scientists have publishing a text on game theory The origin of this booklet is our much longer book Multiagent Syxz emx Algmiz bmic GamerTbearez ic and Logical Foundatiam which covers diverse theories relevant to the broad area of Multiagent Systems within Arti cial Intelligence Al and other areas of computer science Like many other disciplines computer scienceiand in particular Alihave been profoundly in uenced by game theory with much back and forth between the elds taking place in recent years And so it is not surprising to nd that Multiagent Syxz emx contains a fair bit of material on game theory That material can be crudely divided into two kinds basics and more advanced material relevant to Al and computer science This booklet weaves together the material of the rst kind Many textbooks on game theory exist some of them superb The serious student of game theory cannot afford to neglect those and in the nal chapter we provide some references But the audience for game theory has grown dramatically in recent years spanning disciplines as diverse as political science biology psychology linguistics sociologyiand indeed computer scienceiamong many others What has been missing is a relatively short introduction to the eld covering the common basis that any one interested in game theory is likely to require Such a text would minimize notation ruthlessly focus on essentials and yet not sacri ce rigor This booklet aims to ll this gap It is the book we wish we had had when we rst ventured into the eld We should clarifywhatwe mean by essentials We cover the main classes of games their representations and the main concepts used to analyze them soecalled solution concepts We cannot imagine any consumer of game theory who will not require a solid grounding in each of these topics We discuss them in suf cient depth to provide this grounding though of course much more can be said about each of them This leaves out many topics in game theory that are key in certain applications but not in all Some examples are computational aspects of games and computationally motivated representations learning in games and mechanism design in particular auction theory By omitting these topics we do not mean to suggest that they are unimportant only that they will not be equally relevant to everyone who nds use for xvi PREFACE game theory The reader of this booklet will likely be grounded in a particular discipline and will thus need to augment his or her reading with material essential to that discipline This book makes an appropriate text for an advanced undergraduate course or a game theory unit in a graduate course The book s Web site http wwwgtessentialsorg contains additional resources for both students and instructors A nal word on pronouns and gender We use male pronouns to refer to agents throughout the book We debated this between us7 not being happy with any of the alternatives In the end we reluctantly settled on the standard male convention rather than the reverse female convention or the grammaticallyedubious they We urge the reader not to read patriarchal intentions into our choice CHAPTER 1 Games in Normal Form Game theory studies what happens when selfeinterested agents interact What does it mean to say that agents are selfeinterested It does not necessarily mean that they want to cause harm to each other7 or even that they care only about themselves Tnstead7 it means that each agent has his own description of which states of the world he hkeswhich can include good things happening to other agents and that he acts in an attempt to bring about these states of the world The dominant approach to modeling an agent s interests is utiZiz y tyeary This theoretical approach quanti es an agent s degree of preference across a set of available alternatives7 and describes how these preferences change when an agent faces uncertainty about which alternative he will receive Speci cally7 a utiZityfumtim is a mapping from states of the world to real numbers These numbers are interpreted as measures of an agent s level of happiness in the given states When the agent is uncertain about which state of the world he faces7 his utility is de ned as the expected value of his utility function with respect to the appropriate probability distribution over states When agents have utility functions acting optimally in an uncertain environment is conceptually straightforwardiat least as long as the outcomes and their probabilities are known to the agent and can be succinctly represented However7 things can get considerably more complicated when the world contains two or more utilityemaXimiZing agents whose ac tions can affect each other s utilities To study such settings7 we must turn to noncooperative game theory The term noncooperative could be misleading7 since it may suggest that the theory applies exclusively to situations in which the interests of different agents con ict This is not the case7 although it is fair to say that the theory is most interesting in such situations By the same token7 in Chapter 8 we will see that coalition game tyemy also known as cooperative game tyeary does not apply only in situations in which agents interests align with each other The essential difference between the two branches is that in noncooperative game theory the basic modeling unit is the individual including his beliefs7 preferences7 and possible actions while in coalitional game theory the basic modeling unit is the group We will return to that later in Chapter 8 but for now let us proceed with the individualistic approach 2 ESSENTIALS OF GAME THEORY D O 74 73 73 FIGURE 11 The TCP user s aka the Prisoner s Dilemma 1 1 EXAMPLE THE TCP USER S GAME Let us begin with a simpler example to provide some intuition about the type of phenomena we would like to study Imagine that you and another colleague are the only people using the internet Internet traf c is governed by the TCP protocol One feature of TCP is the bathf mechanism if the rates at which you and your colleague send information packets into the network causes congestion you each back off and reduce the rate for a while until the congestion subsides This is how a correct implementation works A defective one however will not back off when congestion occurs You have two possible strategies C for using a correct implementation and D for using a defective one If both you and your colleague adopt C then your average packet delay is 1 ms Ifyou both adopt D the delay is 3 ms because of additional overhead at the network router Finally if one of you adopts D and the other adopts C then the D adopter will experience no delay at all but the C adopter will experience a delay of4 ms These consequences are shown in Figure 11 Your options are the two rows and your colleague s options are the columns In each cell the rst number represents your payoff or the negative of your delay and the second number represents your colleague s payoff1 Given these options what should you adopt C or D Does it depend on what you think your colleague will do Furthermore from the perspective of the network operator what kind of behavior can he expect from the two users Will any two users behave the same when presented with this scenario Will the behavior change if the network operator allows the users to communicate with each other before making a decision Under what changes to the delays would the users decisions still be the same How would the users behave if they have the opportunity to face this same decision with the same counterpart multiple times Do answers to these questions depend on how rational the agents are and how they view each other s rationality 1A more standard name for this game is the Prisoner s Dilemma we return to this in Section 131 GAMES IN NORMAL FORM 3 Game theory gives answers to many of these questions It tells us that any rational user when presented with this scenario once will adopt Diregardless of what the other user does It tells us that allowing the users to communicate beforehand will not change the outcome It tells us that for perfectly rational agents the decision will remain the same even if they play multiple times however if the number of times that the agents will play this is in nite or even uncertain we may see them adopt C 12 DEFINITION OF GAMES IN NORMAL FORM The normal form also known as the strategic or matrix form is the most familiar representation of strategic interactions in game theory A game written in this way amounts to a representation of every player s utility for every state of the world in the special case where states of the world depend only on the players combined actions Consideration of this special case may seem uninteresting However it turns out that settings inwhich the state of the world also depends on randomness in the environmenticalled Bayesian games and introduced in Chapter 7ican be reduced to much larger normaleform games Indeed there also exist normaleform reductions for other game representations such as games that involve an element of time extensiveeform games introduced in Chapter 4 Because most other representations of interest can be reduced to it the normaleform representation is arguably the most fundamental in game theory De nition 121 Normal form game A nite nipeixm normaleform game i3 it tupZe IV A it wyeie n Nix a niz e 361 afnpiizyen indexed by i I A A1 gtlt X A wyeie 14 i3 a niz e 36f of actions immiiizble to pizzer i Eileb vector it a1 an 6 Air eaZZedim action pro le I it m u wyeie u A gt E ii a ieaii wzluea utility or payoff function for piayer i A natural way to represent games is via an nedimensional matrix We already saw a twoe dimensional example in Figure 11 In general each row corresponds to a possible action for player 1 each column corresponds to a possible action for player 2 and each cell corresponds to one possible outcome Each player s utility for an outcome is written in the cell corresponding to that outcome with player 1 s utility listed rst 13 MORE EXAMPLES OF NORMALFORM GAMES 131 Prisoner s Dilemma Previously we saw an example of a game in normal form namely the Prisoner s or the TCP user s Dilemma However it turns out that the precise payoff numbers play a limited role The 4 ESSENTIALS OF GAME THEORY C D C aa bc D cb dd FIGURE 12 Any gt a gt d gt 17 de ne an instance ofPrisoner s Dilemma essence of the Prisoner s Dilemma example would not change if the 4 was replaced by 5 or iflOO was added to each ofthe numbers2 In its most general form the Prisoner s Dilemma is any normaleform game shown in Figure 12 in which E gt a gt d gt 53 Incidentally the name Prisoner s Dilemma for this famous gameetheoretic situation derives from the original story accompanying the numbers The players of the game are two prisoners suspected of a crime rather than two network users The prisoners are taken to separate interrogation rooms and each can either confess y to the crime or deny it or alternatively cooperate or defect If the payoff are all nonpositive their absolute values can be interpreted as the length ofjail term each ofprisoner will get in each scenario 132 Commonpayoff Games There are some restricted classes of normaleform games that deserve special mention The rst is the class ofcammmepayafgamm These are games in which for every action pro le all players have the same payoff De nition 131 Common payoffgame A commonepayoffgame i3 agamc in wyicbfm aZZ actianpm kx a 6 A1 gtlt X A andanypair afagcm x i j it is tbs me tmf 2111 tila Commonepayoff games are also called pure coordination gar153 or team gamer In such games the agents have no con icting interests their sole challenge is to coordinate on an action that is maximally bene cial to all As an example imagine two drivers driving towards each other in a country having no traf c rules and who must independently decide whether to drive on the left or on the right If the drivers choose the same side left or right they have some high utility and otherwise they have a low utility The game matrix is shown in Figure 13 2More generally under standard utility theory games are are insensitive to any positive af ne transformation of the payoffs This means that one can replace each payoffx by ax 5 for any xed real numbers a gt 0 and 5 3Under some de nitions there is the further requirement that a gt m which guarantees that the outcome C C 2 maximizes the sum of the agents utilities GAMES IN NORMAL FORM 5 Left Right Left 11 00 Right 0 0 11 FIGURE 13 Coordination game 133 Zerosum Games At the other end of the spectrum from pure coordination games he Zemrmm gamex which bearing in mind the comment we made earlier about positive af ne transformations are more properly called cami n mm gamex Unlike commonipayoff games7 constantisum games are meaningful primarily in the context of twoiplayer though not necessarily twoistrategy games De nition 132 Constant sum game A fwarphzyer norma farm game it constantisum tyere exixz x a camz ant 6 web tymffar eaeb xfmz egy pm Ze a 6 A1 X A it i3 tbe eme tmz 21101 1121 2 e For convenience7 when we talk of constantisum games going forward we will always assume that c 07 that is7 that we have a zeroisum game If commonipayoff games represent situations of pure coordination7 zeroisum games represent situations of pure competition one player s gain must come at the expense of the other player The reason zeroisum games are most meaningful for two agents is that if you allow more agents7 any game can be turned into a zeroisum game by adding a dummy player whose actions do not impact the payoffs to the other agents7 and whose own payoffs are chosen to make the sum of payoffs in each outcome zero A classical example ofa zeroisum game is the game ome ebing Penniex In this game7 each of the two players has a penny7 and independently chooses to display either heads or tails The two players then compare their pennies If they are the same then player 1 pockets both7 and otherwise player 2 pockets them The payoff matrix is shown in Figure 14 The popular children s game of Rock7 Paper7 Scissors7 also known as Rochambeau7 provides a threeistrategy generalization of the matchingipennies game The payoff matrix of this zeroisum game is shown in Figure 15 In this game7 each of the two players can choose either rock7 paper7 or scissors If both players choose the same action7 there is no winner and the utilities are zero Otherwise7 each of the actions wins over one of the other actions and loses to the other remaining action 6 ESSENTIALS OF GAME THEORY Heads Tails Heads 1 71 71 1 Tails 71 1 1 71 FIGURE 14 Matching Pennies game 134 Battle of the Sexes In general games tend to include elements of both coordination and competition Prisoner s Dilemma does although in a rather paradoxical way Here is another welliknown game that includes both elements In this game called Battle afth 559653 a husband and wife wish to go to the movies and they can select among two movies Lethal Weapon LW and Wondrous Love VVL They much prefer to go together rather than to separate movies but while the wife player 1 prefers LW the husband player 2 prefers WL The payoffmatrix is shown in Figure 16 We will return to this game shortly 14 STRATEGIES IN NORMALFORM GAMES We have so far de ned the actions available to each player in a game but not yet his set of 3tmfegiex or his available choices Certainly one kind of strategy is to select a single action and play it We call such a strategy a pure xtmz egy and we will use the notation we have already developed for actions to represent it We call a choice of pure strategy for each agent a purerx ategy pro le Rock Paper Scissors Rock 00 711 171 Paper 171 00 711 Scissors 711 1 71 0 0 FIGURE 15 Rock Paper Scissors game GAMES IN NORMAL FORM 7 Husband LW WL LW 2 1 0 0 Wife WL 0 O 1 2 FIGURE 16 Battle of the Sexes game Players could also follow another less obvious type of strategy randomizing over the set of available actions according to some probability distribution Such a strategy is called a mixed strategy Although it may not be immediately obvious why a player should introduce randomness into his choice of action in fact in a multiagent setting the role of mixed strategies is critical We de ne a mixed strategy for a normaleform game as follows De nition 141 Mixed strategy L51 N A 11 65 11 norma fmm gam5 andfm any 351 XZ51 l39lX 65 1 65 351 of 11 p706116iZi1 y di31 7i6111 im3 07157 X T6571 1 65 351 of mixed strategies for pkg57 i 13 S I39M4 De nition 142 Mixed strategy pro le T65 351 of mixedestrategy pro les 13 3imply 1 65 011715311171 product of 1 65 individmZ mix5a r31 m1 5gy 351 3 51 X X S By 311 we denote the probability that an action 11 will be played under mixed strategy 3 The subset of actions that are assigned positive probability by the mixed strategy 3 is called the 3uppar1 of3 De nition 143 Support T65 support of11 mixm 31 7111 5gy 3 for 11 p613 i 13 1 65 351 of p105 31 7111 5gi53 11l311 gt 0 Note that a pure strategy is a special case of a mixed strategy in which the support is a single action At the other end of the spectrum we have fuZZy mixm 31 7111 5gi53 A strategy is fully mixed if it has full support ie if it assigns every action a nonzero probability We have not yet de ned the payoffs of players given a particular strategy pro le since the payoff matrix de nes those directly only for the special case of pureestrategy pro les But the generalization to mixed strategies is straightforward and relies on the basic notion of decision theoryi5xp551 5d u1 iZi1 y lntuitively we rst calculate the probability of reaching each outcome 8 ESSENTIALS OF GAME THEORY given the strategy pro le and then we calculate the average of the payoffs of the outcomes weighted by the probabilities of each outcome Formally we de ne the expected utility as follows overloading notation we use u for both utility and expected utility De nition 144 Expected utility of a mixed strategy Given a norma farm game IV A 11 Me expected utiZiz y uifarplayer i aftbe mixedrxz mtegypm le 3 31 in i3 de ned m 113 Z i 1 5j 3939 1391 46A C H A P T E R 2 Analyzing Games From Optimality To Equilibrium Now that we have de ned what games in normal form are and what strategies are available to players in them the question is how to reason about such games In singleeagent decision theory the key notion is that of an aptimalxtmz egy that is a strategy that maximizes the agent s expected payoff for a given environment in which the agent operates The situation in the singleeagent case can be fraught with uncertainty since the environment might be stochastic partially observable and spring all kinds of surprises on the agent However the situation is even more complex in a multiagent setting In this case the environment includesior in many cases we discuss consists entirely ofiother agents all of whom are also hoping to maximize their payoffs Thus the notion of an optimal strategy for a given agent is not meaningful the best strategy depends on the choices of others Game theorists deal with this problem by identifying certain subsets of outcomes called relation concepts that are interesting in one sense or another In this section we describe two of the most fundamental solution concepts Pareto optimality and Nash equilibrium 2 PARETO OPTIMALITY First let us investigate the extent to which a notion of optimality can be meaningful in games From the point of view of an outside observer can some outcomes of a game be said to be better than others This question is complicated because we have no way of saying that one agent s interests are more important than another s For example it might be tempting to say that we should prefer outcomes in which the sum of agents utilities is higher However as remarked in Footnote 2 earlier we can apply any positive af ne transformation to an agent s utility function and obtain another valid utility function For example we could multiply all of player is payoffs by l000this could clearly change which outcome maximized the sum of agents utilities Thus our problem is to nd away of saying that some outcomes are better than others even when we only know agents utility functions up to a positive af ne transformation Imagine 10 ESSENTIALS OF GAME THEORY that each agent s utility is a monetary payment that you will receive but that each payment comes in a different currency and you do not know anything about the exchange rates Which outcomes should you prefer Observe that while it is not usually possible to identify the best outcome there are situations in which you can be sure that one outcome is better than another For example it is better to get 10 units of currency A and 3 units of currency B than to get 9 units of currency A and 3 units of currency B regardless of the exchange rate We formalize this intuition in the following de nition De nition 211 Pareto domination Strategy profie 3 Pareto dominates xtrategypra te 3 faraZZi 6 IV 193 2 a3 andtbere exixtx mmej e Nfar wyieb le3 gt le3 In other words in a Paretoedominated strategy pro le some player can be made better off without making any other player worse off Observe that we de ne Pareto domination over strategy pro les not just action pro les Pareto domination gives us a partial ordering over strategy pro les Thus in answer to our question before we cannot generally identify a single best outcome instead we may have a set of noncomparable optima De nition 212 Pareto optimality Strategy profie 3 i3 Pareto optimal or strictly Pareto ef cient tyere a aex not exixt anatber xtrategypra te 3 e S tyat Pareto dominatex 3 We can easily draw several conclusions about Pareto optimal strategy pro les First every game must have at least one such optimum and there must always exist at least one such optimum in which all players adopt pure strategies Second some games will have mule tiple optima For example in zeroesum games a strategy pro les are strictly Pareto ef cient Finally in commonepayoff games all Pareto optimal strategy pro les have the same payoffs 22 DEFINING BEST RESPONSE AND NASH EQUILIBRIUM Now we will look at games from an individual agent s point of view rather than from the vantage point of an outside observer This will lead us to the most in uential solution concept in game theory the Naxb equilibrium Our rst observation is that if an agent knew how the others were going to play his stratee gic problem would become simple Speci cally he would be left with the singleeagent problem of choosing a utilityemaximizing action Formally de ne 3 31 31 341 3 a strategy pro le 3 without agent i s strategy Thus we can write 3 3 3 If the agents other than i whom we denote i were to commit to play 3 a utilityemaximizing agent i would face the problem of determining his best response ANALYZING GAME52FROM OPTIMALITYTO EQUILIBRIUNI 11 De nition 221 Best response PZayai 139 best response to tba 31 iaz agy pm Za 3 13 a mixaa 3tiaz agy 33 e S 3aab tyaz a3 3 Z a3i 3fai aZZ3 az agia3 3 6 5 The best response is not necessarily unique Indeed except in the extreme case in which there is a unique best response that is a pure strategy the number of best responses is always in nite When the support ofa best response 3 includes two or more actions the agent must be indifferent among themiotherwise the agent would prefer to reduce the probability of playing at least one of the actions to zero But thus any mixture of these actions must also be a best response not only the particular mixture in 3quot Similarly if there are two pure strategies that are individually best responses any mixture of the two is necessarily also a best response Of course in general an agent will not know what strategies the other players will adopt Thus the notion of best response is not a solution conceptiit does not identify an interesting set of outcomes in this general case However we can leverage the idea of best response to de ne what is arguably the most central notion in noncooperative game theory the Nash equilibrium De nition 222 Nash equilibrium A 3 az agypio Za 3 31 3 13 a Nash equilibrium iffai aZZ again i 3 13 a ban 7a3poi13a to 3 lntuitively a Nash equilibrium is a 31 abla strategy pro le no agent would want to change his strategy if he knew what strategies the other agents were following We can divide Nash equilibria into two categories strict and weak depending on whether or not every agent s strategy constitutes a ain39gaa best response to the other agents strategies De nition 223 Strict Nash A3 az agypm Za 3 31 3 13 a sza7 a again i ai ta foi all3tiaz agia33 75 3 a3 3 gt Zli 3 De nition 224 Weak Nash A3 afagypm Za3 31 3 13 a iffai aZZ again i ai ta foi all3tiaz agia33 75 3 a3 37 Z a3 37 ai ta 3 i3 nota3z 7in Na3b aqaiZiMiam lntuitively weak Nash equilibria are less stable than strict Nash equilibria because in the former case at least one player has a best response to the other players strategies that is not his equilibrium strategy Mixedestrategy Nash equilibria are necessarily always weak while pureestrategy Nash equilibria can be either strict or weak depending on the game 23 FINDING NASH EQUILIBRIA Consider again the Battle of the Sexes game We immediately see that it has two pureestrategy Nash equilibria depicted in Figure 21 We can check that these are Nash equilibria by con rming that whenever one of the players plays the given pure strategy the other player would only lose by deviating 12 ESSENTIALS OF GAME THEORY LW WL LW 21 00 WL 00 1 2 FIGURE 21 Pureestrategy Nash equilibria in the Battle of the Sexes game Are these the only Nash equilibria The answer is no although they are indeed the only pureestrategy equilibria there is also another mixedestrategy equilibrium In general it is tricky to compute a games mixedestrategy equilibria This is a weighty topic lying outside the scope of this booklet but see for example Chapter 4 of Shoham and LeytoneBrown 2008 However we will show here that this computational problem is easy when we know or can guess the mppan of the equilibrium strategies particularly so in this small game Let us now guess that both players randomize and let us assume that husband s strategy is to play LW with probability p and WL with probability 1 p Then if the wife the row player also mixes between her two actions she must be indifferent between them given the husband s strategy Otherwise she would be better off switching to a pure strategy according to which she only played the better of her actions Then we can write the following equations UwifeLW UwafeWVL 2p01 p0p11 p P We get the result that in order to make the wife indifferent between her actions the husband must choose LW with probability 13 and WL with probability 23 Of course since the husband plays a mixed strategy he must also be indifferent between his actions By a similar calculation it can be shown that to make the husband indifferent the wife must choose LW with probability 23 and WL with probability 13 Now we can con rm that we have indeed found an equilibrium since both players play in a way that makes the other indifferent between their actions they are both best responding to each other Like all mixedestrategy equilibria this is aweak Nash equilibrium The expected payoffofboth agents is 23 in this equilibrium which means that each of the pureestrategy equilibria Paretoedominates the mixedestrategy equilibrium ANALYZING GAMESFROM OPTIMALITYTO EQUILIBRIUlVl 13 Heads Tails Heads 1 71 71 1 Tails 71 1 1 71 FIGURE 22 The Matching Pennies game Earlier we mentioned brie y that mixed strategies play an important role The previous example may not make it obvious but now consider again the Matching Pennies game reproi duced in Figure 22 It is not hard to see that no pure strategy could be part ofan equilibrium in this game of pure competition Therefore likewise there can be no strict Nash equilibrium in this game But using the aforementioned procedure the reader can verify that again there exists a mixedistrategy equilibrium in this case each player chooses one of the two available actions with probability 1 2 We have now seen two examples in which we managed to nd Nash equilibria three equilibria for Battle of the Sexes one equilibrium for Matching Pennies Did we just luck out Here there is some good newsiit was not just luck Theorem 231 Nash 1951 EMUgame witb a m39z e number oprajen and action pro ex but at emf one Nmb equi brium The proof of this result is somewhat involved and we do not discuss it here except to mention that it is typically achieved by appealing to a xedrpainf tyearem from mathematics such as those due to Kakutani and Brouwer a detailed proof appears for example in Chapter 3 of Shoham and LeytoniBrown 2008 Nash s theorem depends critically on the availability of mixed strategies to the agents Many games such as Matching Pennies have only mixedistrategy equilibria However what does it mean to say that an agent plays a mixedistrategy Nash equilibrium Do players really sample probability distributions in their heads Some people have argued that they really do One welliknown motivating example for mixed strategies involves soccer speci cally a kicker and a goalie getting ready for a penalty kick The kicker can kick to the left or the right and the goalie can jump to the left or the right The kicker scores ifand only ifhe kicks to one side and the goalie jumps to the other this is thus best modeled as Matching Pennies Any pure strategy on the part ofeither player invites a winning best response on the part ofthe other player It is only by kicking or jumping in either direction with equal probability goes the argument that the opponent cannot exploit your strategy 14 ESSENTIALS OF GAME THEORY Of course this argument is not uncontroversial In particular it can be argued that the strategies of each player are deterministic but each player has uncertainty regarding the other player s strategy This is indeed a second possible interpretation of mixed strategies the mixed strategy of player i is everyone else s assessment of how likely i is to play each pure strategy In equilibrium is mixed strategy has the further property that every action in its support is a best response to player is beliefs about the other agents strategies Finally there are two interpretations that are related to learning in multiagent systems In one interpretation the game is actually played many times repeatedly and the probability of a pure strategy is the fraction of the time it is played in the limit its soecalled empirical equemy In the other interpretation not only is the game played repeatedly but each time it involves two different agents selected at random from a large population In this interpretation each agent in the population plays a pure strategy and the probability of a pure strategy represents the fraction of agents playing that strategy CHAPTER 3 Further Solution Concepts for NormalForm Games As described earlier at the beginning of Chapter 2 we reason about multiplayer games using relation cancepz x principles according to which we identify interesting subsets of the outcomes of a game While the most important solution concept is the Nash equilibrium there are also a large number ofothers only some ofwhich we will discuss here Some ofthese concepts are more restrictive than the Nash equilibrium some less so and some noncomparable In subsequent chapters we will introduce some additional solution concepts that are only applicable to game representations other than the normal form 31 MAXMIN AND MINMAX STRATEGIES The maxmin xtmz egy ofplayer i in an neplayer generalesum game is a not necessarily unique and in general mixed strategy that maximizes i s worstecase payoff in the situation where all the other players happen to play the strategies which cause the greatest harm to i The maxmin thm or xecmity 57150 of the game for player i is that minimum amount of payoff guaranteed by a maxmin strategy De nition 311 Maxmin Tbs maxmin strategyfm ay i 13 arg maxh minL 296 37 and tbs maxmin ValuefarpZayeri i3 maxh mian 193 3 Although the maxmin strategy is a concept that makes sense in simultaneousemove games it can be understood through the following temporal intuition The maxmin strategy is is best choice when rst 139 must commit to a possibly mixed strategy and then the remaining agents i observe this strategy but not is action choice and choose their own strategies to minimize is expected payoff In the Battle of the Sexes game Figure 16 the maxmin Value for either player is 2 3 and requires the maximizing agent to play a mixed strategy Do you see why While it may not seem reasonable to assume that the other agents would be solely interested in minimizing i s utility it is the case that i plays a maxmin strategy and the other 16 ESSENTIALS OF GAME THEORY agents play arbitrarily i will still receive an expected payoff of at least his maxmin value This means that the maxmin strategy is a sensible choice for a conservative agent who wants to maximize his expected utility without having to make any assumptions about the other agents such as that they will act rationally according to their own interests or that they will draw their action choices from some known distributions The minmax xtmz egy and minmax value play a dual role to their maxmin counterparts In twoeplayer games the minmax strategy for player i against player i is a strategy that keeps the maximum payoff of i at a minimum and the minmax value of player i is that minimum This is useful when we want to consider the amount that one player can punish another without regard for his own payoff Such punishment can arise in repeated games as we will see in Section 6 The formal de nitions follow De nition 312 Minmax two player In a tworplnyergnme tbe minmax strategy for phler i ngninxt phler i ii argminh maxL ux3 and pizzer i x minmax value 13 min max 71313 37139 In neplayer games with n gt 2 de ning player is minmax strategy against player is a bit more complicated This is because i will not usually be able to guarantee that achieves minimal payoff by acting unilaterally However if we assume that all the players other than choose to gang up on fiand that they are able to coordinate appropriately when there is more than one strategy pro le that would yield the same minimal payoff for jithen we can de ne minmax strategies for the neplayer case De nition 313 Minmax n player In on nepher game tbe minmax strategy for phler i ngninxt zzer 75 i ii iii component of tbe mixedrxtmtegy pro Ze Lj in tbe exprem39on arg mint max le39339 31 wyere denotex tbe 36f of nyen otber tmn 143 before tbe minmax valueforpZnyerj i3 mint max ujJj Lj As before we can give intuition for the minmax value through a temporal setting Imagine that the agents i must commit to a possibly mixed strategy pro le to which i can then play a best response Player i receives his minmax value ifplayers i choose their strategies in order to minimize is expected utility after he plays his best response In twoeplayer games a player s minmax value is always equal to his maxmin value For games with more than two players a weaker condition holds a player s maxmin value is always less than or equal to his minmax value Can you explain why this is Since neither an agent s maxmin strategy nor his minmax strategy depend on the strategies that the other agents actually choose the maxmin and minmax strategies give rise to solution concepts in a straightforward way We will call a mixedestrategy pro le 3 31 32 a maxmin xtmz egypro le ofa given game ifxl is a maxmin strategy for player 1 32 is a maxmin FURTHER SOLUTION CONCEPTS FOR NORMALFORM GAMES 17 strategy for player 2 and so on In twoeplayer games we can also de ne minmax xtraz egypm lex analogously 1n twoeplayer zeroesum games there is a very tight connection between minmax and maxmin strategy pro les Furthermore these solution concepts are also linked to the Nash equilibrium Theorem 314 Minimax theorem von Neumann 1928 In any niz e twor ayer 2670 3am game in any Naxb equilibriaml eaeb pZayer receivex a payojfz bat i3 eqaal to 501 lyix maxmin vaZae and lyix minmax valae Why is the minmax theorem important It demonstrates that maxmin strategies mine max strategies and Nash equilibria coincide in twoeplayer zeroesum games In particular Theorem 314 allows us to conclude that in twoeplayer zeroesum games 1 Each player s maxmin value is equal to his minmax value By convention the maxmin value for player 1 is called the value aftbe game 2 For both players the set of maxmin strategies coincides with the set of minmax strategies and 3 Any maxmin strategy pro le or equivalently minmax strategy pro le is a Nash equilibrium Furthermore these are all the Nash equilibria Consequently all Nash equilibria have the same payoff vector namely those in which player 1 gets the value of the game For example in the Matching Pennies game in Figure 14 the value of the game is 0 The unique Nash equilibrium consists of both players randomizing between heads and tails with equal probability which is both the maxmin strategy and the minmax strategy for each player Nash equilibria in zeroesum games can be viewed graphically as a saddle in a highe dimensional space At a saddle point any deviation of the agent lowers his utility and increases the utility of the other agent It is easy to visualize in the simple case in which each agent has two pure strategies In this case the space of strategy pro les can be viewed as all points on the square between 00 and 11 with each point in the square describing the mixed strategy of each agent The payoff to player 1 and thus the negative of the payoff to player 2 is indeed a saddleeshaped threeedimensional surface above this square Figure 31 left gives a pictorial example illustrating player 1 s expected utility in Matching Pennies as a function of both players probabilities of playing heads Figure 31 right adds a plane at z 0 to make it 1The attentive reader might wonder how a theorem from 1928 can use the term Nash equilibrium when Nash s work was published in 1950 Von Neumann used different terminology and proved the theorem in a different way however the given presentation is probably clearer in the context of modern game theory 18 ESSENTIALS OF GAME THEORY 1 N 1 Player 1393 player 139s expected 0 expected 0 utility utility O5 g I o5 H O i i i 39 I 39 n i I I I i n I I I I 39 39 39 39 39 39 39 39 39 I 39 39 i r 025 39i player 139s 1 i 39 025 Player 1393 03975 1 i 025 Prheads 0 player 28 Prheads 0 player 28 Prheads Prheads FIGURE 31 The saddle point in Matching Pennies with and without a plane at z O easier to see that it is an equilibrium for both players to play heads 50 of the time and that zero is both the maxmin value and the minmaX value for both players 32 MINIMAX REGRET We argued earlier that agents might play maxmin strategies in order to achieve good payoffs in the worst case even in a game that is not zero sum However consider a setting in which the other agent is not believed to be malicious but is instead believed to be entirely unpredictable Crucially in this section we do not approach the problem as Bayesians saying that agent i s beliefs can be described by a probability distribution instead we use a preBayesian model in which i does not know such a distribution and indeed has no beliefs about it In such a setting it can also make sense for agents to care about minimizing their worstcase 033 rather than simply maximizing their worstcase payoff Consider the game in Figure 32 Interpret the payoff numbers as pertaining to agent 1 only and let 6 be an arbitrarily small positive constant For this example it does not matter what agent 2 s payoffs a 5 c and a7 are and we can even imagine that agent 1 does not know these values Indeed this could be one reason why player 1 would be unable to form beliefs about how player 2 would play even if he were to believe that player 2 was rational Let us imagine that agent 1 wants to determine a strategy to follow that makes sense despite his uncertainty about player 2 First agent 1 might play his maxmin or safety level strategy In this game it is easy to see that player 1 s maxmin strategy is to play B this is because player 2 s minmaX strategy is to play R and B is a best response to R If player 1 does not believe that player 2 is malicious however he might instead reason as follows If player 2 were to play R then it would not matter very much how player 1 plays the most he could lose by playing the wrong way would be 6 On the other hand if player 2 FURTHER SOLUTION CONCEPTS FORNORMAL FORM GAMES 19 L R 1007 a 17gb B 2 c 1 FIGURE 32 A game for contrasting maxmin with minimax regret The numbers refer only to player 1 s payoffs e is an arbitrarily small positive constant Player 2 s payoffs are the arbitrary and possibly unknown constants a 17 c and a were to play L then player 1 s action would be very signi cant if player 1 were to make the wrong choice here then his utility would be decreased by 98 Thus player 1 might choose to play T in order to minimize his worstecase loss Observe that this is the opposite of what he would choose if he followed his maxmin strategy Let us now formalize this idea We begin with the notion of regret De nition 321 Regret An agent i 3 regret for pZaying an action a tbe otber agentx adopt action profie a if de ned a3 max ME 70 Mr Li ax eA Tn words7 this is the amount that i loses by playing a rather than playing his best response to a Of course7 i does not know what actions the other players will take however7 he can consider those actions that would give him the highest regret for playing a De nition 322 Max regret An agent i x maximum regretforplaying an action a i3 a e nea 13 max max w an m a 471647 ale4 This is the amount that i loses by playing a rather than playing his best response to a7 if the other agents chose the ac that makes this loss as large as possible Finally7 i can choose his action in order to minimize this worstecase regret De nition 323 Minimax regret Minimax regret actionx for agenti are de ned a3 arg min max max iga7 aci aa7 aci 416A 11 614 416A 20 ESSENTIALS OF GAME THEORY Thus an agent s minimax regret action is an action that yields the smallest maximum regret Minimax regret can be extended to a solution concept in the natural way by identifying action pro les that consist of minimax regret actions for each player Note that we can safely restrict ourselves to actions rather than mixed strategies in the de nitions above ie maximizing over the sets A and AL instead of S and S7 because of the linearity of expectation We leave the proof of this fact as an exercise 33 REMOVAL OF DOMINATED STRATEGIES We rst de ne what it means for one strategy to dominate another lntuitively one strategy dominates another for a player i if the rst strategy yields i a greater payoff than the second strategy for any strategy pro le of the remaining players2 There are however three gradations of dominance which are captured in the following de nition De nition 331 Domination Let 3 ana 3 be two xtiategiex ofptayer i and Si tbe 361 ofaZZ xtiategypio lex oftbe remaining ptayerx Tyen J 3 strictly dominates if iffoi a3 6 S7 it i3 tbe eaxe tyat 296 37 gt 29 6f 37 2 3 weakly dominates 3 ffoi attxi 6 S9 it i3 tbe eaxe tyat 296 37 Z 296 37 and for at eaxt one 3 6 S9 it it tbe eaxe tyat 296 39 gt 296 39 3 3 very weakly dominates 3 iffoi a3 6 S7 it i3 tbe eaxe tyat 296 37 2 296 37 If one strategy dominates all others we say that it is strongly weakly or very weakly a ominant De nition 332 Dominant strategy A strategy i3 strictly resp weakly very weakly dome inantfor an agent it xtiietty weakty very weakly a ominatex any otber xtiategyfoi tyat agent It is obvious that a strategy pro le 61 3n inwhich every 3 is dominant for player i whether strictly weakly or very weakly is a Nash equilibrium Such a strategy pro le forms what is called an eqaiiioiiam in dominant xtiategiex with the appropriate modi er 6tiietZy etc An equilibrium in strictly dominant strategies is necessarily the unique Nash equilibrium For example consider again the Prisoner s Dilemma game For each player the strategy D is strictly dominant and indeed D D is the unique Nash equilibrium Indeed we can now explain the dilemma which is particularly troubling about the Prisoner s Dilemma game the outcome reached in the unique equilibrium which is an equilibrium in strictly dominant strategies is also the only outcome that is not Pareto optimal 2Note that here we consider strategy domination from one individual player s point of view thus this notion is unrelated to the concept of Pareto domination discussed earlier FURTHER SOLUTION CONCEPTS FOR NORMAL FORM GAMES 21 L O R D 01 41 00 FIGURE 33 A game with dominated strategies Games with dominant strategies play an important role in game theory7 especially in games handcrafted by experts This is true in particular in meemnixm dexign an area of game theory not covered in this booklet I Iowever7 dominant strategies are rare in naturally occurring games More common are dominated strategies De nition 333 Dominated strategy A xtmz egy 3 ix strictly weakly very weakly domie natedfor an agem i Jame atber xtmz egy 3 xtrie y weakly very weakly dominatex 3 Let us focus for the moment on strictly dominated strategies Intuitively7 all strictly dominated pure strategies can be ignored7 since they can never be best responses to any moves by the other players There are several subtleties7 however First7 once a pure strategy is eliminated7 another strategy that was not dominated can become dominated And so this process of elimination can be continued Second7 a pure strategy may be dominated by a mixture of other pure strategies without being dominated by any of them independently To see this7 consider the game in Figure 33 Column R can be eliminated7 since it is dominated by7 for example7 column L We are left with the reduced game in Figure 34 In this game M is dominated by neither U nor D7 but it is dominated by the mixed strategy that selects either U or D with equal probability Note7 however7 that it was not dominated before the elimination of the R column And so we are left with the maximally reduced game in Figure 35 This yields us a solution concept the set of all strategy pro les that assign zero probability to playing any action that would be removed through iterated removal of strictly dominated strategies Note that this is a much weaker solution concept than Nash equilibriumithe set of strategy pro les will include all the Nash eclluilibria7 but it will include many other 22 ESSENTIALS OF GAME THEORY D 01 41 FIGURE 34 The game from Figure 33 after removing the dominated strategy R mixed strategies as well In some games7 it will be equal to S7 the set of all possible mixed strategies Since iterated removal of strictly dominated strategies preserves Nash equilibria7 we can use this technique to computational advantage In the previous example7 rather than computing the Nash equilibria in the original 3 X 3 game7 we can now compute them in this 2 X 2 game7 applying the technique described earlier In some cases the procedure ends with a single cell y this is the case7 for example7 with the Prisoner s Dilemma game In this case we say that the game is mlwzble by iterated elimination Clearly7 in any nite game7 iterated elimination ends after a nite number of iterations One might worry that7 in general7 the order of elimination might affect the nal outcome It turns out that this elimination order does not matter when we remove xtrithy dominated strategies This is called a Claim371303357 property However7 the elimination order can make a difference to the nal reduced game if we remove weakly or very weakly dominated strategies Which avor of domination should we concern ourselves with In fact7 each avor has advantages and disadvantages7 which is why we present all of them here Strict domination D 01 41 FIGURE 35 The game from Figure 34 after removing the dominated strategy M FURTHER SOLUTION CONCEPTS FOR NORMALFORM GAMES 23 leads to betterebehaved iterated elimination it yields a reduced game which is independent of the elimination order and iterated elimination is more computationally manageable There is also a further related advantage that we will defer to Section 34 Weak domination can yield smaller reduced games but under iterated elimination the reduced game can depend on the elimination order Very weak domination can yield even smaller reduced games but again these reduced games depend on elimination order Furthermore very weak domination does not impose a strict order on strategies when two strategies are equivalent each very weakly dominates the other For this reason this last form of domination is generally considered the least important 34 RATIONALIZABILITY A strategy is mtimaliza e if a perfectly rational player could justi ably play it against one or more perfectly rational opponents lnformally a strategy pro le for player i is rationaliZable if it is a best response to some beliefs that i could have about the strategies that the other players will take The wrinkle however is that i cannot have arbitrary beliefs about the other players actionsihis beliefs must take into account his knowledge of tyeir rationality which incorporates their knowledge of lyix rationality their knowledge of his knowledge of their rationality and so on in an in nite regress A rationaliZable strategy pro le is a strategy pro le that consists only of rationaliZable strategies For example in the Matching Pennies game given in Figure 14 the pure strategy 65115173 is rationalizable for the row player First the strategy 735111175 is a best response to the pure strategy lymch by the column player Second believing that the column player would also play 6511ch is consistent with the column player s rationality the column player could believe that the row player would play 17111 to which the column player s best response is beads It would be rational for the column player to believe that the row player would play 11173 because the column player could believe that the row player believed that the column player would play 171173 to which 111in is a best response Arguing in the same way we can make our way up the chain of beliefs However not every strategy can be justi ed in this way For example considering the Prisoner s Dilemma game given in Figure 11 the strategy C is not rationaliZable for the row player because C is not a best response to any strategy that the column player could play Similarly consider the game from Figure 33 M is not a rationaliZable strategy for the row player although it it a best response to a strategy ofthe column player s R there do not exist any beliefs that the column player could hold about the row player s strategy to which Rwould be a best response Because of the in nite regress the formal de nition of rationaliZability is somewhat involved however it turns out that there are some intuitive things that we can say about rae tionalizable strategies First Nash equilibrium strategies are always rationaliZable thus the set 24 ESSENTIALS OF GAME THEORY of rationaliZable strategies and strategy pro les is always nonempty Second7 in twoeplayer games rationaliZable strategies have a simple characterization they are those strategies that survive the iterated elimination of strictly dominated strategies In neplayer games there exist strategies which survive iterated removal of dominated strategies but are not rationalizable In this more general case7 rationaliZable strategies are those strategies which survive iterae tive removal of strategies that are never a best response to any strategy pro le by the other players We now de ne rationalizability more formally First we will de ne an in nite sequence of possibly mixed strategies S7 S S127 for each player i Let S S thus7 for each agent i7 the rst element in the sequence is the set ofall is mixed strategies Let CH S denote the convex hull of a set S the smallest convex set containing all the elements of S Now we de ne Sf as the set of all strategies 3 e Sfil for which there exists some 3 e 111 CH Sffl such that for all 3 e Sfil ux Li Z uxf Li That is7 a strategy belongs to Sf ifthere is some strategy 3 for the other players in response to which 3 is at least as good as any other strategy from Sfil The convex hull operation allows i to best respond to uncertain beliefs about which strategies from Sffl another player will adopt CHSf 1 is used instead of Haj 1 the set of all probability distributions over S11517 because the latter would allow consideration of mixed strategies that are dominated by some pure strategies for Player i could not believe that would play such a strategy because such a belief would be inconsistent with is knowledge of j s rationality Now we de ne the set of rationaliZable strategies for player i as the intersection of the sets S7 S S127 De nition 341 Iquot 39 quot 039 Tbs i 4 quot strategies for piayei i are 020 5 35 CORRELATED EQUILIBRIUM The caireim m equilibrium is a solution concept which generalizes the Nash equilibrium Some people feel that this is the most fundamental solution concept of all3 In a standard game7 each player mixes his pure strategies independently For example7 consider again the Battle of the Sexes game reproduced here as Figure 36 and its mixede strategy equilibrium As we saw in Section 23 this game s unique mixedestrategy equilibrium yields each player an expected payoff of 2 3 But now imagine that the two players can observe the result 3One Nobeleprizeewinning game theorist Re Myerson has gone so far as to say that if there is intelligent life on other planets in a majority of them they would have discovered correlated equilibrium before Nash equilibrium FURTHER SOLUTION CONCEPTS FOR NORMAL FORM GAMES 25 LW WL LW 21 00 WL 00 12 FIGURE 36 Battle of the Sexes game of a fair coin flip and can condition their strategies based on that outcome They can now adopt strategies from a richer set for example they could choose VVL if heads LW if tails Indeed this pair forms an equilibrium in this richer strategy space given that one player plays the strategy the other player only loses by adopting another Furthermore the expected payoff to each player in this soecalled correlated equilibrium is 5 gtk 2 5 gtk 1 15 Thus both agents receive higher utility than they do under the mixedestrategy equilibrium in the uncorrelated case which had expected payoff of 2 3 for both agents and the outcome is fairer than either of the pureestrategy equilibria in the sense that the worsteoff player achieves higher expected utility Correlating devices can thus be quite useful The aforementioned example had both players observe the exact outcome of the coin ip but the general setting does not require this Generally the setting includes some random variable the external event with a commonly known probability distribution and a private signal to each player about the instantiation of the random variable A player s signal can be correlated with the random variables value and with the signals received by other players without uniquely identifying any of them Standard games can be viewed as the degenerate case in which the signals of the different agents are probabilistically independent To model this formally consider 11 random variables with a joint distribution over these variables Imagine that nature chooses according to this distribution but reveals to each agent only the realized value of his variable and that the agent can condition his action on this value4 De nition 351 Correlated equilibrium Given an niggentgame G A u a corre lated equilibrium ii a tupZe v n a wyere 1 ii a tupZe ofmmz om variabZex v 2 v1 v witb 7e3peez ive domaim D 2 D1 D 7 1393 ajoim dix ibm im over 1 a 71 an is a vector afmappingx a D gt Ag mtde eaeb agenti and every mapping a D gt A it i3 tbe eme Mat 2 mm 1011 adgtgt 2 2 mm a dl aw 46D 46D 4This construction is closely related to one used later in the book in connection with Bayesian Games in Chapter 7 26 ESSENTIALS OF GAME THEORY Note that the mapping is to an action that is to a pure strategy rather than a mixed one One could allow a mapping to mixed strategies but that would add no greater generality Do you see why Clearly for every Nash equilibrium we can construct an equivalent correlated equilibrium in the sense that they induce the same distribution on outcomes Theorem 352 For every Nuxb equilibrium 7quot tbere exim u earreprmling correlated equilibrium 7 The proof is straightforward Roughly we can construct a correlated equilibrium from a given Nash equilibrium by letting each D A and letting the joint probability distribution be 7Zlt6lgt HiEN rtquot5b Then we choose a as the mapping from each cl to the corresponding ui When the agents play the strategy pro le a the distribution over outcomes is identical to that under 7quot Because the vi s are uncorrelated and no agent can bene t by deviating from 7quot a is a correlated equilibrium On the other hand not every correlated equilibrium is equivalent to a Nash equilibrium the BattleeofetheSexes example given earlier provides a countereexample Thus correlated equilibrium is a strictly weaker notion than Nash equilibrium Finally we note that correlated equilibria can be combined together to form new cor related equilibria Thus if the set of correlated equilibria of a game C does not contain a single element it is in nite lndeed any convex combination of correlated equilibrium payoffs can itself be realized as the payoff pro le of some correlated equilibrium The easiest way to understand this claim is to imagine a public random device that selects which of the correlated equilibria will be played next another random number is chosen in order to allow the chosen equilibrium to be played Overall each agent s expected payoff is the weighted sum of the payoffs from the correlated equilibria that were combined Since no agent has an incentive to deviate regardless of the probabilities governing the rst random device we can achieve any convex combination of correlated equilibrium payoffs Finally observe that having two stages of random number generation is not necessary we can simply derive new domains D and a new joint probability distribution 7 from the D s and It s of the original correlated equilibria and so perform the random number generation in one step 36 TREMBLINGHAND PERFECT EQUILIBRIUM Another important solution concept is the tremblingrbumlperfeez equilibrium or simply perfect equilibrium While rationaliZability is a weaker notion than that of a Nash equilibrium pere fection is a stronger one Several equivalent de nitions of the concept exist In the following de nition recall that a fully mixed strategy is one that assigns every action a strictly positive probability FURTHER SOLUTION CONCEPTS FORNORMAL FORM GAMES 27 De nition 361 Trembling hand perfect equilibrium A mixedn r z egy S ii a tremblinge hand perfect equilibrium oft normai rm game G ifz bere exixz x a requeme SO Sl mixedrxtmz egypra lex web tmz limnaoo S S imde tmz far eaeb SZ in tbe requeme and eaeb pZayer i tbe xtrm egy 3 ix a 5le rexpome to tbe xtmz egiex 3 1 Perfect equilibria are an involved topic and relate to other subtle re nements of the Nash equilibrium such as the proper equilibrium The notes at the end of the booklet point the reader to further readings on this topic We should however at least explain the term trembling hand One way to think about the concept is as requiring that the equilibrium be robust against slight errors trembles ion the part of players In other words one s action ought to be the best response not only against the opponents equilibrium strategies but also against small perturbation of those However since the mathematical de nition speaks about arbitrarily small perturbations whether these trembles in fact model player fallibility or are merely a mathematical device is open to debate 37 6 NASH EQUILIBRIUM Another solution concept reflects the idea that players might not care about changing their strategies to a best response when the amount of utility that they could gain by doing so is very small This leads us to the idea of an eeNash equilibrium De nition 371 6 Nash Fix 6 gt 0 A xtrm egypro ie 3 31 in ii an eerb equilibr rium iffar aliagem x i andfara xtmz egiex 3 75 3 296 37 Z 296 37 6 This concept has various attractive properties eeNash equilibria always existquot indeed every Nash equilibrium is surrounded by a region of eeNash equilibria for any 6 gt 0 The argument that agents are indifferent to suf ciently small gains is convincing to many Further the concept can be computationally useful algorithms that aim to identify eeNash equilibria need to consider only a nite set of mixedestrategy pro les rather than the whole continuous space Ofcourse the size ofthis nite set depends on both 6 and on the game s payoffs Since computers generally represent real numbers using a oatingepoint approximation it is usually the case that even methods for the exact computation of Nash equilibria actually nd only eeequilibria where e is roughly the machine precision on the order of10 16 or less for most modern computers eeNash equilibria are also important to multiagent learning algorithms not discussed in this booklet However eeNash equilibria also have several drawbacks First although Nash equilibria are always surrounded by eeNash equilibria the reverse is not true Thus a given eeNash equilibrium is not necessarily close to any Nash equilibrium This undermines the sense in 28 ESSENTIALS OF GAME THEORY D 1 1 500500 27 FIGURE 37 A game with an interesting eeNash equilibrium which eeNash equilibria can be understood as approximations of Nash equilibria Consider the game in Figure 37 This game has a unique Nash equilibrium of D R which can be identi ed through the iterated removal of dominated strategies D dominates U for player 1 on the removal of U R dominates L for player 2 D R is also an eeNash equilibrium of course However there is also another eeNash equilibrium U L This game illustrates two things First neither player s payoff under the eeNash equilibrium is within 6 of his payoff in a Nash equilibrium indeed in general both players payoffs under an eeNash equilibrium can be arbitrarily less than in any Nash equilibrium The problem is that the requirement that player 1 cannot gain more than 6 by deviating from the eeNash equilibrium strategy pro le of U L does not imply that pkg572 would not be able to gain more than 6 by best responding to player 1 s deviation Second some eeNash equilibria might be very unlikely to arise in play Although player 1 might not care about a gain of 5 he might reason that the fact that D dominates U would lead player 2 to expect him to play D and that player 2 would thus play R in response Player 1 might thus play D because it is his best response to R Overall the idea ofeeapproximation is much messier when applied to the identi cation ofa xed point than when it is applied to a singleeobjective optimization problem 38 EVOLUTIONARILY STABLE STRATEGIES Roughly speaking an evolutionarily stable strategy is a mixed strategy that is resistant to invasion by new strategies As can be gleaned from the name inspiration for to concept of evolutionarily stable strategies comes from evolutionary biology There one speaks about dif ferent species within a population and how each species relative tnessH causes its proportion within the population to grow or shrink In our setting the species are those agents playing a particular strategy Then suppose that a small population of invaders playing a different strategy is added to the population The original strategy is considered to be an ESS if it gets a FURTHER SOLUTION CONCEPTS FOR NORMALFORM GAMES 29 higher payoff against the resulting mixture of the new and old strategies than the invaders do thereby chasing out the invaders More formally we have the following De nition381 F 39 39 quot 39 39 a 1355 139 I HMmm Vicfwar ayernmma formg me G 1 2 A u d mixedxtmz egy S wemy tmf S i3 an evaZm immiZyxtablexz mz egy ifand only for Jame e gt 0 andfm aZZ atber xfmz egiex S if i3 tbe me tymf uS 1 S eS gt uS 1 S sS We can wepmpen im of expectation to Hate fyix condition equiv m y m 1 uS S euS S gt 1 euS S euS S Note that since this only need hold for small 6 this is equivalent to requiring that either uS S gt uS S holds or else both uS S uS S and uS S gt uS S hold Note that this is a strict de nition We can also state a weaker de nition of ESS De nition 382 Weak ESS S i3 a weak evolutionarily stable strategy ifmm only iffmmme e gt OzmdfanzZZ S inkth mmtmteiz beruS S gt uS S baldx orelxebw b uS S uS S anduS S 2 uS S bah This weaker de nition includes strategies in which the invader does just as well against the original population as it does against itself In these cases the population using the invading strategy will not grow but it will also not shrink We illustrate the concept of ESS with the instance of the HawkiDo ve game shown in Figure 38 The story behind this game might be as follows Two animals are ghting over a prize such as a piece of food Each animal can choose between two behaviors an aggressive hawkish behavior H or an accommodating dovish behavior D The prize is worth 6 to each of them Fighting costs each player 5 When a hawk meets a dove he gets the prize without a ght and hence the payoffs are 6 and 0 respectively When two doves meet they split the prize H D H 2 2 6 0 FIGURE 38 HawkiDove game 30 ESSENTIALS OF GAME THEORY without a ght hence a payoff of 3 to each one When two hawks meet a ght breaks out costing each player 5 or equivalently yielding 5 In addition each player has a 50 chance of ending up with the prize adding an expected bene t of3 for an overall payoffof Z It is not hard to verify that the game has a unique symmetric Nash equilibrium S S where S g g and that S is also the unique ESS of the game To con rm that S is an ESS we need that for all S 75 S aS S aS S and aS S gt aS S The equality condition is true of any mixed strategy equilibrium with full support so follows directly To demonstrate that the inequality holds it is suf cient to nd the S for equivalently the probability of playing Hithat minimizes fS aS S aS S Expanding fS we see that it is a quadratic equation with the unique maximum S S proving our result The connection between an E88 and a Nash equilibrium is not accidental The following two theorems capture this connection Theorem 383 Given axymme ie fwarpiayer normai rmgame G 1 2 A a anda mixed x az egy S ifS ii an e voZaz imariZy xtabie x az egy tyen S S ii a Narb eqailibiiam ofG This is easy to show Note that by de nition an ESS S must satisfy aS S 2 aS S In other words it is a best response to itself and thus must be a Nash equilibrium However not every Nash equilibrium is an ESS this property is guaranteed only for strict equilibria Theorem 384 Given axymme ie fwarpiayer normai rmgame G l 2 A a anda mixed x afegy S S S ii a x iez xymme ie Narb eqaiZibriam afG tyen S ii an e voZaz imariZy xtabie J az egy This is also easy to show Note that for any strict Nash equilibrium S it must be the case that aS S gt aS S But this satis es the rst criterion of an ESS CHAPTER 4 Games With Sequential Actions The PerfectInformation Extensive Form In Chapter 1 we assumed that a game is represented in normal form effectively as a big table In some sense this is reasonable The normal form is conceptually straightforward and most game theorists see it as fundamental While many other representations exist to describe nite games we will see in this chapter and in those that follow that each of them has an induced normal form a col I o llulllldl fofm in that preserves gameetheoretic prope r J erties such as Nash equilibria Thus the results given in previous chapters hold for all nite games no matter how they are represented in that sense the normaleform representation is universal In this chapter we will look at perfecteinformation extensiveeform games a nite repre sentation that does not always assume that players act simultaneously This representation is in general exponentially smaller than its induced normal form and furthermore can be much more natural to reason about While the Nash equilibria of an extensiveeform game can be found through its induced normal form computational bene t can be had by working with the extensive form directly Furthermore there are other solution concepts such as subgame perfect equilibrium see Section 43 which explicitly refer to the sequence in which players act and which are therefore not meaningful when applied to normaleform games The normaleform game representation does not incorporate any notion of sequence or time of the actions of the players The extemi ve or edfmm is an alternative representation that makes the temporal structure explicit In this chapter we discuss the special case of payed information extensiveeform gamesWe will restrict the discussion to nite games that is to games represented as nite trees 41 DEFINITION Tnformally speaking a perfecteinformation game in extensive form or more simply a perfect information game is a tree in the sense of graph theory in which each node represents the choice of one of the players each edge represents a possible action and the leaves represent nal 32 ESSENTIALS OF GAME THEORY outcomes over which each player has a utility function Indeed in certain circles in particular in arti cial intelligence these are known simply as game trees Formally we de ne them as follows De nition 411 Perfect information game A 0571115 p57f551 ri71f07m111 i071 g11m5 in 5x1 5n3i715 f07m i3111 11pZ5 G 2 IV A H Z x 0 a 11 706575 N 13 11 351 0f71 1131573 A 13 11 3i71gZ5 351 0f1151 i0713 H 13 11 351 0f710711 57mi7111l 560155 71051753 Z 13 11 351 0f1 57mim1l 71051753 di3j0im fr0m f I X H I gt 2A 13 1 65 1151 i071f117151 i071 7116156 113313713 1 0 51156 560155 7105175 11 351 0fp033i6l5 1151 i0n3 I 0 H gt N 13 1 65 113157 f117151 i071 7116156 113313713 1 0 51156 710711 5rmi7111l 7105175 11 pZ11y57i e N 7060 5600353 1171 1151 i071 111 1 6111 710515 I a H X A gt HU Z 13 1 65 3115553307f117151 i0n 706156 m0p3 11 560155 7105175 110039 1171 1151 i071 1 0 11 71570 560155 7105175 07 1 57mim1l 7105175 31156 1 6111 f07 11H 61 62 e H 1100 111 112 e A 061 111 2 062 1121 6571 61 62 117151111 2 115110517 I 11 111 11 706575 11 Z gt R 13 11 751167111615517 utilig mtim rpiay i 00 1 65 1 57mi7111l 7105153 Z Since the choice nodes form a tree we can unambiguously identify a node with its 6i31 07y that is the sequence of choices leading from the root node to it We can also de ne the d535571d11711 3 of a node 6 namely all the choice and terminal nodes in the subtree rooted in 6 An example of such a game is the S6117i71gg11m5 Imagine a brother and sister following the following protocol for sharing two indivisible and identical presents from their parents First the brother suggests a split which can be one ofthreeihe keeps both she keeps both or they each keep one Then the sister chooses whether to accept or reject the split If she accepts they each get their allocated presents and otherwise neither gets any gift Assuming both siblings Value the two presents equally and additively the tree representation of this game is shown in Figure 41 42 STRATEGIES AND EQUILIBRIA A pure strategy for a player in a perfecteinformation game is a complete speci cation of which deterministic action to take at every node belonging to that player A more formal de nition follows GAMESWITH SEQIIENTIALACTIONS THEPERFECT INFORMATION EXTENSIVE FORM 33 00 20 00 11 00 02 FIGURE 41 The Sharing game De nition 421 Pure strategies Let G 2 IV A P Z x 0 a u beaperfecfiinfmmatim extemive rm game Tym tbs pure xtmz egiex ef gy i comixt of tbe Can m39an product MILL75 XV Notice that the de nition contains a subtlety An agent s strategy requires a decision at each choice node regardless of whether or not it is possible to reach that node given the other choice nodes In the Sharing game above the situation is straightforwardiplayer 1 has three pure strategies and player 2 has eight as follows S1 2 2 0 110 2 S2 yexyexyex yexyex no yer nayex yer no no nayexyex nayexno 710710353 nonana But now consider the game shown in Figure 42 In order to de ne a complete strategy for this game each of the players must choose an action at each of his two choice nodes Thus we can enumerate the pure strategies of the players 38 83 210 10 FIGURE 42 A perfecteinformation game in extensive form 34 ESSENTIALS OF GAME THEORY 07E OF DE DE AG 38 3 8 AH 38 3 8 BG 55 210 55 210 BH 55 10 5 5 FIGURE 43 The game from Figure 42 in normal form as follows Si 147 G 147 H B G 3 PD 52 C E C F D E D F It is important to note that we have to include the strategies A G and A H even though once player 1 has chosen A then his own GiversuseH choice is moot The de nition of best response and Nash equilibria in this game are exactly as they are for normaleform games Indeed this example illustrates how every perfecteinformation game can be converted to an equivalent normaleform game For example the perfecteinformation game of Figure 42 can be converted into the normal form image of the game shown in Figure 43 Clearly the strategy spaces of the two games are the same as are the pureestrategy Nash equilibria Indeed both the mixed strategies and the mixedestrategy Nash equilibria of the two games are also the same however we defer further discussion of mixed strategies until we consider imperfecteinformation games in Chapter 5 In this way for every perfecteinformation game there exists a corresponding normale form game Note however that the temporal structure of the extensiveeform representation can result in a certain redundancy within the normal form For example in Figure 43 there are 16 different outcomes while in Figure 42 there are only 5 the payoff 3 8 occurs only once in Figure 42 but four times in Figure 43 One general lesson is that while this transformation can always be performed it can result in an exponential blowup of the game representation GAMESWITHSEQUENTIALACTIONS THEPERFECTINFORMATION EXTENSIVE FORM 35 This is an important lesson since the didactic examples of normaleform games are very small wrongly suggesting that this form is more compact The normal form gets its revenge however since the reverse transformationifrom the normal form to the perfecteinformation extensive formidoes not always exist Consider for example the Prisoner s Dilemma game from Figure 11 A little experimentationwill convince the reader that there does not exist a perfecteinformation game that is equivalent in the sense of having the same strategy pro les and the same payoffs lntuitively the problem is that perfecte information extensiveeform games cannot model simultaneity The general characterization of the class of normaleform games for which there exist corresponding perfecteinformation games in extensive form is somewhat complex The reader will have noticed that we have so far concentrated on pure strategies and pure Nash equilibria in extensiveeform games There are two reasons for this or perhaps one reason and one excuse The reason is that mixed strategies introduce a new subtlety and it is convenient to postpone discussion of it The excuse which also allows the postponement though not for long is the following theorem Theorem 422 Every nite perfeez rin rmm im game in extemi vefmm lynx a pureixz mz egy Nmb equiZiMium This is perhaps the earliest result in game theory due to Zermelo in 1913 see the historical notes at the end of the book The intuition here should be clear since players take turns and everyone gets to see everything that happened thus far before making a move it is never necessary to introduce randomness into action selection in order to nd an equilibrium We will see this plainly when we discuss backward induction below Both this intuition and the theorem will cease to hold when we discuss more general classes of games such as imperfecte information games in extensive form First however we discuss an important re nement of the concept of Nash equilibrium 43 SUBGAMEPERFECT EQUILIBRIUM As we have discussed the notion of Nash equilibrium is as well de ned in perfecteinformation games in extensive form as it is in the normal form However as the following example shows the Nash equilibrium can be too weak a notion for the extensive form Consider again the perfecteinformation extensiveeform game shown in Figure 42 There are three pureestrategy Nash equilibria in this game A C C F A H C F and B H C This can be determined by examining the normal form image ofthe game as indicated in Figure 44 However examining the normal form image of an extensiveeform game obscures the game s temporal nature To illustrate a problem that can arise in certain equilibria of 36 ESSENTIALS OF GAME THEORY 07E OF DE DE AG 3 8 38 83 83 AH 3 8 38 83 83 BG 55 210 55 210 BH 55 10 55 10 FIGURE 44 Equilibria of the game from Figure 42 extensiveeform games in Figure 45 we contrast the equilibria 14 C C and B H C by drawing them on the extensiveeform game tree First consider the equilibrium A C C If player 1 chooses A then player 2 receives a higher payoff by choosing C than by choosing D If player 2 played the strategy C E rather than C F then player 1 would prefer to play B at the rst node in the tree as it is player 1 gets a payoffof3 by playing A rather than a payoffon by playing B Hence we have an equilibrium The second equilibrium 8 H C is less intuitive First note that B G C is not an equilibrium player 2 s best response to B G is C Thus the only reason that player 2 chooses to play the action E is that he knows that player 1 would play H at his second decision node This behavior by player 1 is called a t mt by committing to choose an action that is harmful to player 2 in his second decision node player 1 can cause player 2 to avoid that part of the tree Note that player 1 bene ts from making this threat he gets a payoff of 5 instead on by playing B H instead of B So far so good The problem however is that player 2 may not consider player 1 s threat to be credible if player 1 did reach his nal decision node actually choosing H over C would also reduce player 1 s own utility If player 2 played F would player 1 really follow through on his threat and play H or would he relent and pick G instead To formally capture the reason why the B H C equilibrium is unsatisfying and to de ne an equilibrium re nement concept that does not suffer from this problem we rst de ne the notion ofa subgame GAMESWITH SEQUENTIALACTIONS THEPERFECT INFORMATION EXTENSIVE FORM 37 38 8 3 210 10 38 8 3 210 10 FIGURE 45 Two out of the three equilibria of the game from Figure 42 A G C F and B If C Bold edges indicate players choices at each node De nition 431 Subgame Given pf ffii l i extemive im game G tbe subgame afG mated at node 6 i3 tbe 7 3 iffi0 afG to tbe dexeena imtx afb Tbe 361 afmbgizmex afG camim afaZZ ofmbgizmex afG mated gimme node in G Nowwe can de ne the notion ofa Jubgizmerpeifeez equilibrium a re nement of the Nash equilibrium in perfecteinformation games in extensive form7 which eliminates those unwanted Nash equilibria1 De nition 432 quot 39 a Fufect 1 quotquot 39 Tilt sub amuperfect equilibria SPE ofiz game G me a xtmz egypm lex 3 web tmz fai any Jubgizme G afG tbe 7EJl 7iEl i07l oft to G ii a Nmb equiiibiium afG Since G is in particular its own subgame7 every SPE is also a Nash equilibrium Furthere more7 although SPE is a stronger concept than Nash equilibrium ie7 every SPE is a NE7 but 1Note that the word perfect is used in two different senses here 38 ESSENTIALS OF GAME THEORY not every NE is a SPE it is still the case that every perfecteinformation extensiveeform game has at least one subgameeperfect equilibrium This de nition rules out noncredible threats of the sort illustrated in the above example In particular note that the extensiveeform game in Figure 42 has only one subgameeperfect equilibrium7 Al7 G7 C Neither ofthe other Nash equilibria is subgame perfect Consider the subgame rooted in player 1 s second choice node The unique Nash equilibrium of this trivial game is for player 1 to play G Thus the action H7 the restriction of the strategies A7 H and B7 H to this subgame7 is not optimal in this subgame7 and cannot be part ofa subgameeperfect equilibrium of the larger game 44 BACKWARD INDUCTION Inherent in the concept of subgameeperfect equilibrium is the principle of backward induction One identi es the equilibria in the bottomemost subgame trees7 and assumes that those equilibria will be played as one backs up and considers increasingly larger trees We can use this procedure to compute a sample Nash equilibrium This is good news not only are we guaranteed to nd a subgameeperfect equilibrium rather than possibly nding a Nash equilibrium that involves noncredible threats7 but also this procedure is computationally simple The algorithm BACKWARDTNDUCTION is described in Figure 46 The variable utiLm J MM is a vector denoting the utility for each player at the child node utiLm J 517de denotes the element of this vector corresponding to the utility for player 0b the player who gets to move at node 6 Similarly7 553 Luz iZ is a vector giving utilities for each player Observe that this procedure does not return an equilibrium strategy for each of the n players7 but rather describes how to label each node with a vector of 71 real numbers This function BACKWARDTNDUCTION node 13 returns 216 if b e Zthen L return b is a terminal node 5531111177 OO forall a e xb do utiZJltJbiZd BACKWARDTNDUCTIONQIM 11 if utiLm J 517de gt bexhutilpw then 5531111177 utiZle JbiZd return 5531111177 FIGURE 46 Procedure for nding the value of a sample subgameeperfect Nash equilibrium of a perfecteinformation extensiveeform game GAMESWITH SEQUENTIALACTIONS THEPERFECTINFORMATION EXTENSIVE FORM 39 labeling can be seen as an extension of the game s utility function to the nonterminal nodes H The players equilibrium strategies follow straightforwardly from this extended utility function every time a given player i has the opportunity to act at a given node 6 e H ie7 0b 2 i7 that player will choose an action a e xb that solves arg maxilla 210 a These strategies can also be returned by BACKWARDINDUCTION given some extra bookkeeping In general in this bookletwe do not quotH ess r innnlissues7soth I d be misleading without additional explanation While the procedure demonstrates that in principle a sample SPE is effectively computable in practice many game trees are not enumerated in advance and are hence unavailable for backward induction For example7 the extensiveeform representation of chess has around 10150 nodes7 which is vastly too large to represent explicitly We note that BACKWARDINDUCTION has another name in the twoeplayer zeroesum context the minimax aigaritbm Recall that in such games7 only a single payoff number is required to characterize any outcome Player 1 wants to maximize this number7 while player 2 wants to minimize it In this context BACKWARDINDUCTION can be understood as propagating these single payoff numbers from the root of the tree up to the root Each decision node for player 1 is labeled with the maximum of the labels of its child nodes representing the fact that player 1 would choose the corresponding action7 and each decision node for player 2 is labeled with the minimum of that node s children s labels The label on the root node is the value of the game player 1 s payoff in equilibrium As we said7 realeworld games even zeroesum ones7 such as chess cannot be represented explicitly Such games require the gradual development of the tree and its heuristic search At least in the context of zeroesum games7 considerable effort has gone into such search algorithms The besteknown one7 ALPHABETAPRUNING7 is a heuristic version of the minimax algorithm Despite the fact that strong arguments can be made in its favor7 the concept of backward induction is not without controversy To see why this is7 consider the welleknown Centipede game depicted in Figure 47 The game starts at the node at the upper left In this game two players alternate in making decisions7 at each turn choosing between going down and ending the game or going across and continuing it except at the last node where going across also ends the game The payoffs are constructed in such a way that the only SPE is for each player 10 02 31 24 43 FIGURE 47 The Centipede game 40 ESSENTIALS OF GAME THEORY to always choose to go down So see why7 consider the last choice Clearly at that point the best choice for the player is to go down Since this is the case7 going down is also the best choice for the other player in the previous choice point By induction the same argument holds for all choice points This would seem to be the end of this story7 except for two pesky factors The rst problem is that the SPE prediction in this case ies in the face of intuition lndeed7 in laboratory experiments subjects in fact continue to stay play across until close to the end of the game The second problem is theoretical Imagine that you are the second player in the game7 and in the rst step of the game the rst player actually goes across What should you do The SPE suggests you should go down7 but the same analysis suggests that you would not have gotten to this choice point in the rst place In other words7 you have reached a state to which your analysis has given a probability of zero How should you amend your beliefs and course of action based on this measureezero event It turns out i small 4 i actually raises a fundamental problem in game theory We will not develop the subject further here7 but let us only mention that there exist different accounts of this situation7 and they depend on the probabilistic assumptions made on what is common knowledge in particular7 whether there is common knowledge of rationality7 and on exactly how one revises one s beliefs in the face Of measureezero CVCHES 41 CHAPTER 5 Generalizing the Extensive Form Imperfect Information Games In Chapter 4 in our discussion ofextensiveiform games we allowed players to specify the action that they would take at every choice node of the game This implies that players know the node they are in andirecalling that in such games we equate nodes with the histories that led to themiall the prior choices including those of other agents For this reason we have called these p57f551 rin 7mm im gam5x We might not always want to make such a strong assumption about our players and our environment In many situations we may want to model agents needing to act with partial or no knowledge of the actions taken by others or even agents with limited memory of their own past actions The sequencing of choices allows us to represent such ignorance to a limited degree an earlier choice might be interpreted as a choice made without knowing the later choices However so far we could not represent two choices made in the same play of the game in mutual ignorance of each other 51 DEFINITION Imp5yr55fiiryro7mm im games in extensive form address this limitation An imperfecti information game is an extensiveiform game inwhich each player s choice nodes are partitioned into information sets intuitively if two choice nodes are in the same information set then the agent cannot distinguish between them De nitionSALI Imperfect information game An imp57f55trinfa7mm im gam5 in 5xt5mi v5 fo7m i371 tupZ5 N A H Z x 0 a u I 703575 N A H Z x 0 a u ix ap57f55tiinfo7mm im 5xt5miv5 7m gam5 and I 11 In 703575 I 2 L31 131 ii an 5gm39sz5m5 75Zm im on 75 apan iz im of b e H Mb 2 i witb ty5p7op57ty tmfxb xb andKb ob wy5n57157 173575 5xixz x ajfm 703755 6 e I zmdb 6 L3 42 ESSENTIALS OF GAME THEORY 00 24 24 00 FIGURE 51 An imperfecteinformation game Note that in order for the choice nodes to be truly indistinguishable we require that the set of actions at each choice node in an information set be the same otherwise the player would be able to distinguish the nodes Thus if IN 6 I is an equivalence class we can unambiguously use the notation xIy to denote the set of actions available to player i at any node in information set L3 Consider the imperfecteinformation extensiveeform game shown in Figure 51 In this game player 1 has two information sets the set including the top choice node and the set including the bottom choice nodes Note that the two bottom choice nodes in the second information set have the same set of possible actions We can regard player 1 as not knowing whether player 2 chose A or B when he makes her choice between E and 7 52 STRATEGIES AND EQUILIBRIA A pure strategy for an agent in an imperfecteinformation game selects one of the available actions in each information set of that agent De nition 521 Pure strategies Let G 2 IV A H Z x 0 a u I be on imperfm r information extemive rm game Tym tbepme x oz egiex of pkg57139 tom th ofth Cortexionproo mt FILMS XIi39 Thus perfecteinformation games can be thought of as a special case of imperfecte information games in which every equivalence class of each partition is a singleton Consider again the Prisoner s Dilemma game shown as a normaleform game in Fige ure 11 An equivalent imperfecteinformation game in extensive form is given in Figure 52 Note that we could have chosen to make player 2 choose rst and player 1 choose second GENERALIZING THE EXTENSIVE FORM IMPERFECT INFORMATION GAMES 43 7171 740 074 7373 FIGURE 52 The Prisoner s Dilemma game in extensive form Recall that perfecteinformation games were not expressive enough to capture the Prise oner s Dilemma game and many other ones In contrast as is obvious from this example any normaliform game can be trivially transformed into an equivalent imperfecteinformation game However this example is also special in that the Prisoner s Dilemma is a game with a dominant strategy solution and thus in particular a pureestrategy Nash equilibrium This is not true in general for imperfecteinformation games To be precise about the equivalence between a normaliform game and its extensiveeform image we must consider mixed strategies and this is where we encounter a new subtlety As we did for perfecteinformation games we can de ne the normaliform game cor responding to any given imperfecteinformation game this normal game is again de ned by enumerating the pure strategies of each agent Now we de ne the set of mixed strategies of an imperfecteinformation game as simply the set of mixed strategies in its image normaleform game in the same way we can also de ne the set of Nash equilibria1 However we can also de ne the set of bebmviamZ x az egiex in the extensiveeform game These are the strategies in which each agent s potentially probabilistic choice at each node is made independently of his choices at other nodes The difference is substantive and we illustrate it in the special case of perfecteinformation games For example consider the game of Figure 42 A strategy for player 1 that selects A with probability 5 and G with probability 3 is a behavioral strategy In contrast the mixed strategy 6A G 4B is not abehavioral strategy for that player since the choices made by him at the two nodes are not independent in fact they are perfectly correlated In general the expressive power of behavioral strategies and the expressive power of mixed strategies are noncomparable in some games there are outcomes that are achieved via 1Note that we have de ned two transformationsione om any normaleform game to an imperfecteinformation game and one in the other direction However the rst transformation is not one to one and so ifwe transform a normaleform game to an extensiveeform one and then back to normal form we will not in general get back the same game we started out with However we will get a game with identical strategy spaces and equilibria 44 ESSENTIALS OF GAME THEORY 10 100100 51 22 FIGURE 53 A game with imperfect recall mixed strategies but not any behavioral strategies and in some games it is the other way around Consider for example the game in Figure 53 In this game when considering mixed strategies but not behavioral strategies R is a strictly dominant strategy for agent 1 D is agent 2 s strict best response and thus R D is the unique Nash equilibrium Note in particular that in a mixed strategy agent 1 decides probabilistically whether to play L or R in his information set but once he decides he plays that pure strategy consistently Thus the payoff of 100 is irrelevant in the context of mixed strategies On the other hand with behavioral strategies agent 1 gets to randomize afresh each time he nds himself in the information set Noting that the pure strategy D is weakly dominant for agent 2 and in fact is the unique best response to all strategies of agent 1 other than the pure strategy L agent 1 computes the best response to D as follows If he uses the behavioral strategy p 1 p ie choosing L with probability p each time he nds himself in the information set his expected payoff is 1p2100p1 p21 p The expression simpli es to 99p2 98p 2 whose maximum is obtained at p 2 98198 Thus R D 0 1 0 1 is no longer an equilibrium in behavioral strategies and instead we get the equilibrium 98198 100198 0 There is however a broad class of imperfecteinformation games in which the expressive power of mixed and behavioral strategies coincides This is the class of games of payed recaZZ Intuitively speaking in these games no player forgets any information he knew about moves made so far in particular he remembers precisely all his own moves A formal de nition follows De nition 522 Perfect recall PZayer i bar perfect recall in an imperfectrinformaz ion game G iffor any two noa ex lo 6 oat are in tbe Jame information xeffor ay i for any pain GENERALIZING THE EXTENSIVE FORM IMPERFECT INFORMATION GAMES 45 b0 a0 b1 a1 b2 b an bfrom tbe root oftbe game to b wbere tbe by are deeixion nodex andtbeaj are aetiom andforanypatb b0 a6 b l a3 b z M a b from tbe roottob it mth be tbe eaxe tbat I n m 2 for allO f f n by and b are in tbe Jame equivalence elaxxforplayeri and 3 for allO 5 5 n ifpbj i ie by ixadeeixion node ofplayer i tben a a G i3 a game ofperfeet recall every player baxperfeet recall in it Clearly every perfecteinformation game is a game of perfect recall Theorem 523 Kuhn 1953 In agame ofperfeet recall any mixed xtrategy ofa given agent can be replaced by an equivalent bebavioral xtrategy and any bebavioral strategy can be replaced by an equivalent mixed xtrategy Here two xtrategiex are equivalent in tbe xenxe tbat tbey induce tbe Jame probabilitiex on outeomex for any xed xtrategy pro le mixed or bebavioral oftbe remaining agentx As a corollary we can conclude that the set of Nash equilibria does not change if we restrict ourselves to behavioral strategies This is true only in games of perfect recall and thus for example in perfecteinformation games We stress again however that in general imperfecteinformation games mixed and behavioral strategies yield noncomparable sets of equilibria 53 SEQUENTIAL EQUILIBRIUM We have already seen that the Nash equilibrium concept is too weak for perfecteinformation games and how the more selective notion of subgameeperfect equilibrium can be more inf structive The question is whether this essential idea can be applied to the broader class of imperfecteinformation games it turns out that it can although the details are considerably more involved Recall that in a subgameeperfect equilibrium we require that the strategy of each agent be a best response in every subgame not only the whole game It is immediately apparent that the de nition does not apply in imperfecteinformation games if for no other reason than we no longer have a wellede ned notion of a subgame What we have instead at each information set is a subforest or a collection of subgames We could require that each player s strategy be a best response in each subgame in each forest but that would be both too strong a requirement and too weak To see why it is too strong consider the game in Figure 54 46 ESSENTIALS OF GAME THEORY 01000 00 10 31 FIGURE 54 Player 2 knows where in the information set he is The pure strategies ofplayer 1 are L C7 R and ofplayerZ U7 D Note also that the two pure Nash equilibria are L7 U and R7 D But should either ofthese be considered subgame perfect On the face of it the answer is ambiguous7 since in one subtree U dramatically dominates D and in the other D dominates U However7 consider the following argument R dominates C for player 17 and player 2 knows this So although player 2 does not have explicit information about which of the two nodes he is in within his information set7 he can deduce that he is in the rightmost one based on player is incentives7 and hence will go D Furthermore player 1 knows that player 2 can deduce this7 and therefore player 1 should go R Thus R7 D is the only subgameeperfect equilibrium This example shows how a requirement that a subestrategy be a best response in all subgames is too simplistic However7 in general it is not the case that subtrees of an information set can be pruned as in the previous example so that all remaining ones agree on the best strategy for the player In this case the naive application of the SPE intuition would rule out all strategies There have been several related proposals that apply the intuition underlying subgamee perfection in more sophisticated ways One of the more in uential notions has been that of requem iaZ equilibrium lt shares some features with the notion of tremblingehand perfection7 discussed in Section 36 Note that indeed tremblingehand perfection7 which was de ned for normaleform games7 applies here just as well just think of the normal form induced by the extensiveeform game However7 this notion makes no reference to the tree structure of the game SE does7 but at the expense of additional complexity Sequential equilibrium is de ned for games of perfect recall As we have seen7 in such games we can restrict our attention to behavioral strategies Consider for the moment a fully mixedestrategy pro le2 Such a strategy pro le induces a positive probability on every node in ZAgain recall that a strategy is fully mixed if at everyinformation set each action is given some positive probability GENERALIZING THE EXTENSIVE FORM IMPERFECTINFORMATION GAMES 47 the game tree This means in particular that every information set is given a positive probability Therefore for a given fully mixedestrategy pro le one can meaningfully speak of is expected utility given that he nds himself in any particular information set The expected utility of starting at any node is well de ned and since each node is given positive probability one can apply Bayes rule to aggregate the expected utilities of the different nodes in the information set If the fully mixedestrategy pro le constitutes an equilibrium it must be that each agent s strategy maximizes his expected utility in each of his information sets holding the strategies of the other agents xed All of the preceding discussion is for a fully mixedestrategy pro le The problem is that equilibria are rarely fully mixed and strategy pro les that are not fully mixed do not induce a positive probability on every information set The expected utility of starting in information sets whose probability is zero under the given strategy pro le is simply notwell de ned This is where the ingenious device of SE comes in Given any strategy pro le S not necessarily fully mixed imagine a probability distribution MM over each information set a has to be consistent with S in the sense that for sets whose probability is nonzero under their parents conditional distribution S this distribution is precisely the one de ned by Bayes rule However for other information sets it can be any distribution lntuitively one can think of these distributions as the new beliefs of the agents if they are surprised and nd themselves in a situation they thought would not occur This means that agents expected utility is now well de ned in any information set including those having measure zero For information set 6 belonging to agent i with the associated probability distribution MM the expected utility under strategy pro le S is denoted by aS lo With this the precise de nition of SE is as follows De nition 531 Sequential equilibrium A strategy profie S ii a sequential equilibrium of an extenxive rm game G tyere exixtprooaoility dixtrioationx MM for earn information 361 b in G xaeb tyat tyefoZZowing two eonditionx bold I S a limnaooS M for3omexeqaeneeS1 1 S2 2 wyere S isfallymixed and M ii eonxixtent witb S infaet xinee S mixed M ii aniqaely determined by S and 2 For any information 361 b belonging to agent i and any alternative xtrategy S ofi we lya ve tyat lids by Mb 2 HMS7 57 by M5 Analogous to subgame perfection in games of perfect information sequential equilibria are guaranteed to always exist 48 ESSENTIALS OF GAME THEORY Theorem 532 Every niz e game afperfeef reeuZZ buy it xequenz iui equiZibrium Finally while sequential equilibria are de ned for games of imperfect information they are obviously also well de ned for the special case of games of perfect information This raises the question of whether in the context of games of perfect information the two solution concepts coincide The answer is that they almost do but not quite Theorem 533 Every xuugumerperfeez equilibrium if u xequem iui equiZibrium but tbe earmerxe i3 not true in general 49 CHAPTER 6 Repeated and Stochastic Games In repeated games a given game often thought of in normal form is played multiple times by the same set of players The game being repeated is called the ridge game For example Figure 61 depicts two players playing the Prisoner s Dilemma exactly twice in a row This representation of the repeated game while intuitive obscures some key factors Do agents see what the other agents played earlier Do they remember what they knew And while the utility ofeach stage game is speci ed what is the utility of the entire repeated game We answer these questions in two steps We rst consider the case in which the game is repeated a nite and commonlyeknown number of times Then we consider the case in which the game is repeated in nitely often or a nite but unknown number of times 61 FINITELY REPEATED GAMES One way to completely disambiguate the semantics of a nitely repeated game is to specify it as an imperfecteinformation game in extensive form Figure 62 describes the twiceeplayed Prisoner s Dilemma game in extensive form Note that it captures the assumption that at each iteration the players do not know what the other player is playing but afterward they do Also note that the payoff function of each agent is additive that is it is the sum of payoffs in the twoestage games The extensive form also makes it clear that the strategy space of the repeated game is much richer than the strategy space in the stage game Certainly one strategy in the repeated C D C D c 71 71 74 0 c 71 71 74 0 gt D 0 74 73 73 D 0 74 73 73 FIGURE 61 Twiceeplayed Prisoner s Dilemma 5 o ESSENTIALS OF GAME THEORY 7 5 7 1 74 74 780 7773 7474 7377 7773 7676 FIGURE 62 Twiceeplayed Prisoner s Dilemma in extensive form game is to adopt the same strategy in each stage game clearly this memory less strategy called a xt timmy xtmz egy is a behavioral strategy in the extensiveeform representation of the game But in general the action or mixture of actions played at a stage game can depend on the history of play thus far Since this fact plays a particularly important role in in nitely repeated games we postpone further discussion of it to the next section Indeed in the nite known repetition case we encounter again the phenomenon of backward induction which we rst encountered when we introduced subgame perfect equilibria Recall that in the Centipede game discussed in Section 43 the unique SPE was to go down and terminate the game at every node Now consider a nitely repeated Prisoner s Dilemma game Again it can be argued in the last round it is a dominant strategy to defect no matter what happened so far This is common knowledge and no choice of action in the preceding rounds will impact the play in the last round Thus in the secondetoelast round too it is a dominant strategy to defect Similarly by induction it can be argued that the only equilibrium in this case is to always defect However as in the case of the Centipede game this argument is vulnerable to both empirical and theoretical criticisms 62 INFINITELY REPEATED GAMES When the in nitely repeated game is transformed into extensive form the result is an in nite tree So the payoffs cannot be attached to any terminal nodes nor can they be de ned as the sum of the payoffs in the stage games which in general will be in nite There are two common ways of de ning a player s payoff in an in nitely repeated game to get around this problem The rst is the average payoff of the stage game in the limit1 1The observant reader will notice a potential dif culty in this de nition since the limit may not exist One can extend the de nition to cover these cases by using the lim sup operator in De nition 621 rather than lim REPEATED AND STOCHASTIC GAMES 51 De nition 621 Average reward Given an in nite xequenee ofpnya fr rill 72 faipi yei i tbe average reward of i ii The future dixeauntea reward to a player at a certain point of the game is the sum of his payoff in the immediate stage game plus the sum of future rewards discounted by a constant factor This is a recursive de nition since the future rewards again give a higher weight to early payoffs than to later ones De nition 622 Discounted reward Given an in nite xequenee afpnya 71 752 fai z pizzer i nna n dixeaunt zetai witb 0 f 8 f 1 tbe future discounted reward afi i3 Blrim The discount factor can be interpreted in two ways First it can be taken to represent the fact that the agent cares more about his wellebeing in the near term than in the long term Alternatively it can be assumed that the agent cares about the future just as much as he cares about the present but with some probability the game will be stopped any given round 1 8 represents that probability The analysis of the game is not affected by which perspective is adopted Now let us consider strategy spaces in an in nitely repeated game In particular consider the in nitely repeated Prisoner s Dilemma game As we discussed there are many strategies other than stationary ones One of the most famous ones is TitrfoirTnt TfT is the strategy in which the player starts by cooperating and thereafter chooses in round l the action chosen by the other player in round Besides being both simple and easy to compute this strategy is notoriously hard to beat it was the winner in several repeated Prisoner s Dilemma competitions for computer programs Since the space of strategies is so large a natural question is whether we can characterize all the Nash equilibria of the repeated game For example if the discount factor is large enough both players playing TfT is a Nash equilibrium But there is an in nite number of others For example consider the trigger xtmtegy This is a draconian version of TfT in the trigger strategy a player starts by cooperating but if ever the other player defects then the rst defects forever Again for suf ciently large discount factor the trigger strategy forms a Nash equilibrium not only with itself but also with TfT The folk theoremisoecalled because it was part of the common lore before it was formally written downihelps us understand the space of all Nash equilibria of an in nitely repeated game by answering a related question It does not characterize the equilibrium strategy pro les but rather the payoffs obtained in them Roughly speaking it states that in an in nitely repeated game the set of average rewards attainable in equilibrium are precisely those pairs attainable 52 ESSENTIALS OF GAME THEORY under mixed strategies in a singleestage game with the constraint on the mixed strategies that each player s payoff is at least the amount he would receive if the other players adopted minmax strategies against him More formally consider any neplayer game G 2 IV A a and any payoff pro le r r1 r2 r Let 1 min maxai3x 5ES 56 In words 1 is player is minmaxvalue his utility when the other players play minmax strategies against him and he plays his best response Before giving the theorem we provide some more de nitions De nition 623 Enforceable Apayafpra ie r r1 r2 r it enforceable Vi e N 7139 Z 1 De nition624 Feasible Apayafpra le r r1 r2 r i3 feasible ifz bere exth rational nannegaz i ve vaZaex at web tyatfar a i we can exprexx r a3 ERA at aa witb ERA at 1 In other words a payoff pro le is feasible if it is a convex rational combination of the outcomes in G Theorem 625 Folk Theorem Conxia39er any nrpiayer normaifarm game G and anypaya quot pra ler r1 r2 rquot 1 If r i3 tyepaya rpra lefar any Naxb eqaiiiariam 3 aftbe iryfniz eiy repeated G witb average rewardx tyenfar earn pZayer i r i3 enforceable If r i3 batyfeaxiaie and enforceable tyen r i3 tyepayafpra iefar Jame Naxb equilibrium of tbe in niteZy repeated G witb average rewardx N Although we do not give the proof of this theorem here a highelevel description of the argument is both instructive and intuitive The proof proceeds in two parts The rst part uses the de nition of minmax and best response to show that an agent can never receive less than his minmax value in any equilibrium The second part shows how to construct an equilibrium that yields each agent the average payoffs given in any feasible and enforceable payoff pro le r This equilibrium has the agents cycle in perfect lockestep through a sequence of game outcomes that achieve the desired average payoffs If any agent deviates the others punish him forever by playing their minmax strategies against him Theorem 625 is actually an instance of a large family of folk theorems As stated Theorem 625 is restricted to in nitely repeated games to average reward to the Nash equie librium and to games of complete information However there are folk theorems that hold for other versions of each of these conditions as well as other conditions not mentioned here In REPEATED AND STOCHASTIC GAMES 53 particular there are folk theorems for in nitely repeated games with discounted reward for a large enough discount factor for nitely repeated games for subgameeperfect equilibria ie where agents only administer nite punishments to deviators and for games of incomplete information We do not review them here but the message of each of them is fundamentally the same the payoffs in the equilibria of a repeated game are essentially constrained only by enforceability and feasibility 63 STOCIMSTIC GAMES lntuitively speaking a stochastic game is a collection of normaleform games the agents ref peatedly play games from this collection and the particular game played at any given iteration depends probabilistically on the previous game played and on the actions taken by all agents in that game 631 De nition De nition 631 Stochastic game A stochastic game aka known m a Markov game ix 71 mph Q N A P R 703575 I Q ix a m39z 5 351 ofxmm I N ix a m39f5 351 af pl y x I A A1 gtlt X A 703575 A ix 71571715 35f of a iam a vm zbk topZay57 i I P Q X A X Q 7 0 1 731 65 tmmiz impmba i ty ma im Pq a ix ty5p7obabilr ity aftmmitiming am xtm 5 3 to 31 711 5 71 57 action p7o Z5 a and I R 71 7 703575 7 Q X A 7 1R ix a 7571Zr7mlu5dpayof mctimfa7pkg57 i In this de nition we have assumed that the strategy space of the agents is the same in all games and thus that the difference between the games is only in the payoff function Removing this assumption adds notation but otherwise presents no major dif culty or insights Restricting Q and each A to be nite is a substantive restriction but we do so for a reason the in nite case raises a number of complications that we wish to avoid We have speci ed the payoff of a player at each stage game or in each state but not how these payoffs are aggregated into an overall payoff To solve this problem we can use solutions already discussed earlier in connection with in nitely repeated games Section 62 Speci cally the two most commonly used aggregation methods are a715mg5 7570717577 and futu75 5173507177155 75707175 Stochastic games are very broad framework generalizing both Markov decision processes MDPs and repeated games An MD is simply a stochastic game with only one player while a repeated game is a stochastic game in which there is only one state or stage game Another 54 ESSENTIALS OF GAME THEORY interesting subclass of stochastic games is zeroesum stochastic games in which each stage game is a zeroesum game ie for any a 6 Q a e A we have that Z viq a 0 Finally in a Jingtercmtroller xtocyaxtic game the transition probabilities depend only on the actions of one particular agent while players payoffs still depend on their joint actions 632 Strategies and Equilibria Wewill nowde ne the strategy space ofan agent Let bi 70 a0 71 a1 at l 7 denote a history of t stages ofa stochastic game and let H be the set of all possible histories of this length The set of deterministic strategies is the Cartesian product Ht H A which requires a choice for each possible history at each point in time As in the previous game forms an agent s strategy can consist of any mixture over deterministic strategies However there are several restricted classes of strategies that are of interest and they form the following hierarchy The rst restriction is that the mixing take place at each history independently this is the restriction to behavioral strategies seen in connection with extensiveeform games De nition 632 Behavioral strategy A beya viarat strategy 3 a retarm tbe prababitity afplaying action a for lyixtary In A Markov strategy further restricts a behavioral strategy so that for a given time t the distribution over actions depends only on the current state De nition 633 Markov strategy A Markov strategy 3 ix a beya viarat strategy in wyicb xbt a 2 362 aij ifqt 7 wyere qt arm q are tye natxtatex ofbt andbg rexpecti vely The nal restriction is to remove the possible dependence on the time t De nition 634 Stationary strategy A stationary strategy 3 ix a Markov strategy in wyicb 3bt1 a 3b2 a qu1 72 wyere an arta qt 2 are tye natxtatex aqu arm biz rexpecti vely Now we can consider the equilibria of stochastic games a topic that turns out to be fraught with subtleties The discountedereward case is the less problematic case In this case it can be shown that a Nash equilibrium exists in every stochastic game In fact we can state a stronger property A strategy pro le is called a Markov perfect eqaitibriam if it consists of only Markov strategies and is a Nash equilibrium regardless of the starting state In a sense MPE plays a role analogous to the subgameeperfect equilibrium in perfecteinformation games Theorem 635 Every nrptayer generalrxam a ixcoantea39rrewara39 xtacyaxtic game lyax a Markov perfect eqaiZibriam The case of average rewards presents greater challenges For one thing the limit average may not exist ie although the stageegame payoffs are bounded their average may cycle and REPEATED AND STOCHASTIC GAMES 55 not converge However7 there is a class of stochastic games which is well behaved in this regard This is the class of irreducible stochastic games A stochastic game is irreducible if every strategy pro le gives rise to an irreducible Markov chain over the set of games7 meaning that every game can be reached with positive probability regardless of the strategy adopted In such games the limit averages are well de ned7 and we have the following theorem Theorem 636 Every twbrplayer generalnrum average reward irreducible xtbcbaxz ic game bay a Naxb equilibrium lndeed7 under the same condition we can state a folk theorem similar to that presented for repeated games in Section 62 That is7 as long as we give each player an expected payoffthat is at least as large as his minmaX value7 any feasible payoff pair can be achieved in equilibrium through the use of threats Theorem 637 For every twbrplayer generalrxum irreducible xtbcbaxtic game and every feaxible outcome witb a paybjfvecz br r tbaz providex to eacb player at leaxz bix minmax value tbere exixz x a Naxb equilibrium witb a pay vectbr r Tbix i3 truefbr gamex witb average rewardx a3 well as gamex witb large enougb dixcbum zcz m or witb playerx tbaz are Jigjfciem lypaz iem CHAPTER 7 Uncertainty About Payoffs Bayesian Games All of the game forms discussed so far assumed that all players know what game is being played Speci cally the number of players the actions available to each player and the payoff associated with each action vector have all been assumed to be common knowledge among the players Note that this is true even of imperfecteinformation games the actual moves of agents are not common knowledge but the game itselfis In contrast Bayexian gamer or games ofincomplete information allow us to represent players uncertainties about the very game being played1 This uncertainty is represented as a probability distribution over a set of possible games We make tWO assumptions 1 All possible games have the same number of agents and the same strategy space for each agent they differ only in their payoffs 2 The beliefs ofthe different agents are posteriors obtained by conditioning a common prior on individual private signals The second assumption is substantive and we return to it shortly The rst is not particularly restrictive although at rst it might seem to be One can imagine many other potential types of uncertainty that players might have about the gameihow many players are involved what actions are available to each player and perhaps other aspects of the situation It might seem that we have severely limited the discussion by ruling these out However it turns out that these other types of uncertainty can be reduced to uncertainty only about payoffs via problem reformulation For example imagine that we want to model a situation in which one player is uncertain about the number of actions available to the other players We can reduce this uncertainty to uncertainty about payoffs by padding the game with irrelevant actions For example consider the following twoeplayer game in which the row player does not know whether his opponent has only the two strategies L and R or also the third one C 1h is easy to confuse the term incomplete information with imperfect information don t 58 ESSENTIALS OF GAME THEORY L R L C R U 11 13 U 11 02 13 D 05 113 D 05 2 8 113 Now consider replacing the leftmost7 smaller game by a padded version7 in which we add a new C column U 11 0 100 1 3 D 05 2 100 113 Clearly the newly added column is dominated by the others and will not participate in any Nash equilibrium or any other reasonable solution concept Indeed7 there is an isomorphism between Nash equilibria of the original game and the padded one Thus the uncertainty about the strategy space is reduced to uncertainty about payoffs Using similar tactics7 it can be shown that it is also possible to reduce uncertainty about other aspects of the game to uncertainty about payoffs only This is not a mathematical claim7 since we have given no mathematical characterization of all the possible forms of uncere tainty7 but it is the case that such reductions have been shown for all the common forms of uncertainty The second assumption about Bayesian games is the commanrprim mmmpz im A Bayesian game thus de nes not only the uncertainties of agents about the game being played7 but also their beliefs about the beliefs of other agents about the game being played7 and indeed an entire in nite hierarchy of nested beliefs the soecalled epistemic type space The commoneprior assumption is a substantive assumption that limits the scope of applicability We nonetheless make this assumption since it allows us to formulate the main ideas in Bayesian games7 and without the assumption the subject matter becomes much more involved than is appropriate for this text Indeed7 most but not all work in game theory makes this assumption UNCERTAINTYABOUT PAYOFFS BAYESIAN GAMES 59 71 DEFINITION There are several different ways of presenting Bayesian games we will offer three de nitions of Bayesian games All three are equivalent modulo some subtleties which lie outside the scope of this booklet We include all three since each formulation is useful in different settings and offers different intuition about the underlying structure of this family of games 711 Information Sets First we resenta de nition that is based on information sets Under this de nition a Ba esian P y game consists ofa set of games that differ only in their payoffs a common prior de ned over them and a partition structure over the games for each agent De nition 711 Bayesian game information sets A Bayesian game it a tupl5 N G P I 706575 a N ii a 351 ofagmtx I G ii a 351 afgamm wit6 Nag5m x 51156 32156 1661 g g e G t65nfa7 51156 agmt i e Nt65 xtmz 5gy 1155 in g L id5m imZ to 1 65 xtmz 5gy 1155 in g P e H G L a commanprim w5rg m5x 706575 1391 G L 1 65 351 ofaZme6a6iZig distri6uz iam 07157 G and I 11 IN 13 a tupZ5 afpmtitiam afG m5far 51156 ag5m Figure 71 gives an example ofa Bayesian game It consists offour 2 X 2 games Matching Pennies Prisoner s Dilemma Coordination and Battle of the Sexes and each agent s partition consists of two equivalence classes 21 22 7 7 U to I p03 Coord p02 FIGURE 71 A Bayesian game 6 o ESSENTIALS OF GAME THEORY 20 02 02 20 22 03 30 11 22 00 00 11 21 00 00 12 FIGURE 72 The Bayesian game from Figure 71 in extensive form 712 A second way of capturing the common prior is to hypothesize a special agent called Nature who Extensive form with Chance Moves makes probabilistic choices While we could have Nature s choice be interspersed arbitrarily with the agents moves without loss of generality we assume that Nature makes all its choices at the outset Nature does not have a utility function or alternatively it can be viewed as having a constant one and has the unique strategy of randomizing in a commonly known way The agents receive individual signals about Nature s choice and these are captured by their information sets in a standard way The agents have no additional information in particular the information sets capture the fact that agents make their choices without knowing the choices of others Thus we have reduced games of incomplete information to games of imperfect information albeit ones with chance moves These chance moves of Nature require minor adjustments of existing de nitions replacing payoffs by their expectations given Nature s moves2 For example the Bayesian game of Figure 71 can be represented in extensive form as depicted in Figure 72 Although this second de nition of Bayesian games can be initially more intuitive than our rst de nition it can also be more cumbersome to work with This is because we use an extensiveeform representation in a setting where players are unable to observe each others moves Indeed for the same reason we do not routinelyuse extensiveeform games of imperfect information to model simultaneous interactions such as the Prisoner s Dilemma though we could do so if we wished For this reason we will not make further use of this de nition We close by noting one advantage that it does have however it extends very naturally to Bayesian 2Note that the special structure of t 39t c we do notL L of Nash equilibrium since agents have no information about prior choices made other than by Nature all Nash equilibria are also sequential equilibria UNCERTAINTYABOUT PAYOFFS BAYESIAN GAMES 61 games in which players move sequentially and do at least sometimes learn about previous players moves 713 Epistemic Types Recall that a game may be de ned by a set of players actions and utility functions In our rst de nition agents are uncertain about which game they are playing however each possible game has the same sets of actions and players and so agents are really only uncertain about the game s utility function Our third de nition uses the notion of an 5pix1 5mic 1fyp5 or simply a typ5 as a way of de ning uncertainty directly over a game s utility function De nition 712 Bayesian game types A Bayesian game it 11 1 u 5 IV A 9 p 71 706575 I N ii 11 351 of 11g57t1 3 I A 141 X X 14 706575 A ix 1 65 351 ofactiam avaiZa6l5 topZay57 i I 9 1 X X 9 706575 i ii 1 65 typ5 1155 apr1zy57 i I p gt 0 1 i3 11 cammmpn39m 071571 yp531md I u 711 71 706575 71 A X 9 I gt E ii 1 65 utilityfumtim 7 ay57i The assumption is that all of the above is common knowledge among the players and that each agent knows his own type This de nition can seem mysterious because the notion of type can be rather opaque In general the type of agent encapsulates all the information possessed by the agent that is not common knowledge This is often quite simple eg the agent s knowledge of his private payoff function but can also include his beliefs about other agents payoffs about their beliefs about his own payoff and any other highereorder beliefs We can get further insight into the notion of a type by relating it to the formulation at the beginning ofthis section Consider again the Bayesian game in Figure 71 For each ofthe agents we have two types corresponding to his two information sets Denote player 1 s actions as U and D player 2 s actions as L and R Call the types of the rst agent 011 and 012 and those of the second agent 021 and 022 The joint distribution on these types is as follows p011 021 2 3 p01y1 022 2 1 p01y2 021 2 2 p01y2 022 2 4 The conditional probabilities for the rst player are 911 l 911 347 912 l 911 147 911 l 912 137 and 912 l 912 23 Both players utility functions are given in Figure 73 72 STRATEGIES AND EQUILIBRIA Now that we have de ned Bayesian games we must explain how to reason about them We will do this using the epistemic type de nition given earlier because that is the de nition most 62 ESSENTIALS OF GAME THEORY a1 a2 91 92 U1 U2 111 a2 91 92 U1 U2 U L 0L1 011 2 0 D L 01 021 0 2 U L an on 2 2 D L an on 3 0 U L 0L2 0Z1 2 2 D L em on 0 0 U L 013 023 2 1 D L 013 023 0 0 U R an 0Z1 0 2 D R em on 2 0 U R 0L1 023 0 3 D R 0L1 023 1 1 U R 913 911 0 0 D R 913 911 1 1 U R 913 022 0 0 D R 012 022 1 2 FIGURE 73 Utility functions ul and M2 for the Bayesian game from Figure 71 commonly used in mechanism design7 one of the main applications of Bayesian games All of the concepts de ned below can also be expressed in terms of the rst two Bayesian game de nitions as well The rst task is to de ne an agent s strategy space in a Bayesian game Recall that in an imperfecteinformation extensiveeform game a pure strategy is a mapping from information sets to actions The de nition is similar in Bayesian games a pure strategy at 9 gt A is a mapping from every type agent i could have to the action he would play if he had that type We can then de ne mixed strategies in the natural way as probability distributions over pure strategies As before we denote a mixed strategy for i as 3 6 5 where S is the set of all is mixed strategies Furthermore7 we use the notation 3 jlttl 1101 to denote the probability under mixed strategy 31 that agent plays action tlj39 given that s type is Oj Next7 since we have de ned an environment with multiple sources ofuncertainty7 we will pause to reconsider the de nition of an agent s expected utility In a Bayesian game setting7 there are three meaningful notions of expected utility ex paxt ex interim and ex ante The rst is computed based on all agents actual types7 the second considers the setting in which an agent knows his own type but not the types of the other agents7 and in the third case the agent does not know anybody s type De nition 721 Ex past expected utility Agent i 3 ex post expected utiZity in a Bayexiizn game M A 9 p it wyeie tbe ngentx xtmtegiex are given by 3 and tbe agent typex are given by 0 i3 de ned at EUx0Z xingm Mme 71 4614 jEN UNCERTAINTYABOUT PAYOFFS BAYESIAN GAMES 63 In this case7 the only uncertainty concerns the other agents mixed strategies7 since agent is ex pmt expected utility is computed based on the other agents actual types Of course7 in a Bayesian game no agent wiZZ know the others types while that does not prevent us from offering the de nition given7 it might make the reader question its usefulness We will see that this notion of expected utility is useful both for de ning the other two and also for de ning a specialized equilibrium concept De nition 722 Ex interim expected utility Agent i 3 ex interim expected atiZity in a Bayexian game N A 9 p a wyeie i x type i3 0 and wyeie tbe agentx xtiategiex are given by tbe mixedr xtrategypm Ze 3 if de ned a3 Elx9 Z 074002 HumI0 Wat0 72 9 66 asA jEN or egai vaZentZy a3 EUx 0 Z plt0IegtEUltx ea07 73 9e Thus i must consider every assignment of types to the other agents 0 and every pure action pro le a in order to evaluate his utility function a a7 0 07139 He must weight this utility value by two amounts the probability that the other players types would be 0 given that his own type is 0 and the probability that the pure action pro le a would be realized given all players mixed strategies and types Observe that agents types may be correlated Because uncertainty over mixed strategies was already handled in the ex pmt case7 we can also write ex interim expected utility as a weighted sum of E Us7 0 terms Finally7 there is the ex ante case7 where we compute is expected utility under the joint mixed strategy 3 without observing any agents types De nition 723 Ex ante expected utility Agenti x ex ante expectedatiZity in a Bayexian game N A 9 7 p a wyeie tbe agentx x ategiex are given by tbe mixedrxtrategypm le 3 if de ned a3 Elix 2120 X H Wale w 0 74 966 aeA jEN or eqai vaZentZy a3 EUzs Z pltegtEUltx 0 75 966 64 ESSENTIALS OF GAME THEORY 07 again equivaZentZy m EUzs Z pwnEUlw 76 966 Next7 we de ne best response De nition 724 Best response in a Bayesian game Tbe 36f ofizgem i x best responses to mixedrxz mtegypm le 3 me given by Eli7 argmaxEU33 77 516539 Note that ER is a set because there may be many strategies for i that yield the same expected utility It may seem odd that ER is calculated based on is ex ante expected utility However7 write Ellx as 29166 p0EUs7 0 and observe that EUtJf7 37 0 does not depend on strategies that i would play ifhis type were not 0 Thus7 we are in fact performing independent maximization of is ex im eiim expected utility conditioned on each type that he could have lntuitively speaking7 ifa certain action is best after the signal is received7 it is also the best conditional plan devised ahead of time for what to do should that signal be received We are now able to de ne the BayesiNash equilibrium De nition 725 Bayes Nash equilibrium A BayesiNash equilibrium i3 i1 mixedrxtmtegy pm Ze 3 tmt mfix ex Vi 3 6 BR 3 ii This is exactly the de nition we gave for the Nash equilibrium in De nition 222 each agent plays a best response to the strategies of the other players The difference from Nash equilibrium7 of course7 is that the de nition of BayesiNash equilibrium is built on top of the Bayesian game de nitions of best response and expected utility Observe that we would not be able to de ne equilibrium in this way if an agent s strategies were not de ned for every possible type In order for a given agent i to play a best response to the other agents i7 i must know what strategy each agent would play for each of his possible types Without this information7 it would be impossible to evaluate the term EUAJE 3 in Equation 7 7 73 COMPUTING EQUILIBRIA Despite its similarity to the Nash equilibrium7 the BayesiNash equilibrium may seem more conceptually complicated However7 as we did with extensiveeform games7 we can construct a normaleform representation that corresponds to a given Bayesian game As with games in extensive form the induced normal form for Bayesian games has an action for every pure strategy That is7 the actions for an agenti are the distinct mappings from UNCERTAINTYABOUT PAYOFFS BAYESIAN GAMES 65 9 to Ag Each agent is payoff given a pureestrategy pro le 3 is his ex ante expected utility under 3 Then as it turns out7 the BayesiN ash equilibria of a Bayesian game are precisely the Nash equilibria of its induced normal form This fact allows us to note that Nash s theorem applies directly to Bayesian games7 and hence BayesiNash equilibria always exist An example will help Consider again the Bayesian game from Figure 73 Note that in this game each agent has four possible pure strategies two types and two actions Then player 1 s four strategies in the Bayesian game can be labeled UU7 UD7 DU7 and DD UU means that 1 chooses U regardless of his type7 UD that he chooses U when he has type 011 and D when he has type 012 and so forth Similarly7 we can denote the strategies of player 2 in the Bayesian game by RR7 RL LR7 and LL We now de ne a 4 X 4 normaleform game in which these are the four strategies of the two agents7 and the payoffs are the expected payoffs in the individual games7 given the agents common prior beliefs For example7 player 2 s ex ante expected utility under the strategy pro le UU7 LL is calculated as follows mwuLmijme 966 19117 920100 L7 9117 921 Plt9117922 2U7L7 9117922 Plt9L27 921 2U7 L7 9127 921 912 922 2U7 L7 9127 922 0 0 0 04h1 Continuing in this manner7 the complete payoff matrix can be constructed as indicated in Figure 74 LL LR RL RR UU a1 L07 LL2 Q09 UD 0amp02111 oiioaiy DU rai40aiiizo4ozoi DD 0amp060amp16LL02Lamp11 FIGURE 74 Induced normal form ofthe game from Figure 73 66 ESSENTIALS OF GAME THEORY LL LR RL RR UU 205 15075 052 0225 UD 205 15075 052 0225 DU 07515 025175 2250 175025 DD 07515 025175 2250 175025 FIGURE 75 Ex interim induced normaleform game where player 1 observes type 0111 Now the game may be analyzed straightforwardly For example7 we can determine that player 1 s best response to KL is DU Given a particular signal7 the agent can compute the posterior probabilities and recompute the expected utility of any given strategy vector Thus in the previous example once the row agent gets the signal 011 he can update the expected payoffs and compute the new game shown in Figure 75 Note that for the row player7 DU is still a best response to KL what has changed is how much better it is compared to the other three strategies In particular7 the row player s payoffs are now independent of his choice ofwhich action to take upon observing type 012 in effect7 conditional on observing type 011 the player needs only to select a single action U or D Thus7 we could have written the ex im erim induced normal form in Figure 75 as a table with four columns but only two rows Although we can use this matrix to nd best responses for player 17 it turns out to be meaningless to analyze the Nash equilibria in this payoff matrix This is because these expected payoffs are not common knowledge if the column player were to condition on his signal7 he would arrive at a different set of numbers though7 again7 for him best responses would be preserved Ironically7 it is only in the induced normal form7 in which the payoffs do not correspond to any ex interim assessment of any agent7 that the Nash equilibria are meaningful Other computational techniques exist for Bayesian games which also have temporal structureithat is7 for Bayesian games written using the extensive form with chance moves UNCERTAINTYABOUT PAYOFFS BAYESIAN GAMES 67 formulation for which the game tree is smaller than its induced normal form For example there is an algorithm for Bayesian games of perfect information that generalizes backward induction de ned in Section 44 called expectimax lntuitively this algorithm is very much like the standard backward induction algorithm given in Figure 46 Like that algorithm expectimax recursively explores a game tree labeling each noneleaf node 6 with a payoff vector by examining the labels of each of 6 s child nodesithe actual payoffs when these child nodes are leaf nodesiand keeping the payoff vector in which the agent who moves at b achieves maximal utility The new wrinkle is that chance nodes must also receive labels Expectimax labels a chance node 6 with a weighted sum of the labels of its child nodes where the weights are the probabilities that each child node will be selected This is a popular algorithmic framework for building computer players for perfecteinformation games of chance such as Backgammon 74 EXPOST EQUILIBRIA Finally working with ex pm utilities allows us to de ne an equilibrium concept that is stronger than the BayerNash equilibrium De nition 741 Ex past equilibrium An ex post equilibrium i3 i1 mixea rx iztegy profie 3 tmtmz ix exVO Vi 3 e argmax51ESI EUtxf Li 0 Observe that this de nition does not presume that each agent actually doex know the others types instead it says that no agent would ever want to deviate from his mixed strategy e ven he knew the complete type vector 0 This form of equilibrium is appealing because it is unaffected by perturbations in the type distribution p0 Said another way an ex pm equilibrium does not ever require any agent to believe that the others have accurate beliefs about his own type distribution Note that a standard BayesiNash equilibrium 171 imply this requirement The ex pm equilibrium is thus similar in avor to equilibria in dominant strategies which do not require agents to believe that other agents act rationally Indeed many dominant strategy equilibria are also ex pm equilibria making it easy to believe that this relationship always holds In fact it does not as the following example shows Consider a twoeplayer Bayesian game where each agent has two actions and two corresponding types ViEN A i H L distributed uniformly ViEN P0 H 05 and with the same utility function for each agent i 10 i1 0 0 0019 2 97175 9i 0 otherwise 68 ESSENTIALS OF GAME THEORY In this game each agent has a dominant strategy of choosing the action that corresponds to his type a 0 An equilibrium in these dominant strategies is not expwt because if either agent knew the other s type he would prefer to deviate to playing the strategy that corresponds to the other agent s type a 0 Finally ex pmt equilibria do share another unfortunate similarity with equilibria in dominant strategiesithey are not guaranteed to exist CHAPTER 8 Coalitional Game Theory So far we have concentrated on what has become the dominant branch of game theory7 the soecalled noncoopemti ve variant We now conclude with an overview of caniiz iannignme tyeaiy also known as cooperative game tyeaiy As was mentioned at the beginning of Chapter 17 when we introduced noncooperative game theory the term cooperative can be misleading It does not mean that each agent is agreeable and will follow arbitrary instructions Rather it means that the basic modeling unit is the group rather than the individual agent More precisely7 in coalitional game theorywe still model the individual preference of agents7 but not their possible actions Instead7 we have a coarser model of the capabilities of different groups 81 COALITIONAL GAMES WITH TRANSFERABLE UTILITY Tn coalitional game theory our focus is on what groups of agents7 rather than individual agents7 can achieve Given a set of agents7 a coalitional game de nes how well each group or coalition of agents can do for itself We are not concerned with how the agents make individual choices within a coalition7 how they coordinate7 or any other such detail y we simply take the payoff1 to a coalition as given In this chapter we will make the tmnxfemble utiZiz y mmmpz ianithat the payoffs to a coalition may be freely redistributed among its members This assumption is satis ed whenever there is a universal currency that is used for exchange in the system7 and means that each coalition can be assigned a single value as its payoff De nition 811 Coalitional game with transferable utility A coalitional game with transe ferable utility i3 npnii N7 1 wyeie n Nix iz niz e2 361 afpinyeix indexed by i and I v 2N gt JR imacim ex witb encb caniiz ian S g N it ienie vniued paya quot vS tmz tbe coalition memnen can dixz iibm e among tyemxelvex Tbe function 1 i3 aim anZea tbe 1Alternatively one might assign cost instead of payoffs to coalitions Throughout this chapter we will focus on the case of payoffs the concepts de ned herein can be extended analogously to the case of costs 2Observe that we consider only nite coalitional games The in nite case is also considered in the literature many but not all of the results from this chapter also hold in this case 7 o ESSENTIALS OF GAME THEORY characteristic function and n canZiz ian x payoff ii ana lem in worth We mmme tmz v 0 Ordinarily coalitional game theory is used to answer two fundamental questions 1 Which coalition will form 2 How should that coalition divide its payoff among its members It turns out that the answer to 1 is often the grand coalition ithe name given to the coalition of all the agents in Nithough this answer can depend on having made the right choice about Before we go any further in answering these questions however we provide a coalitional game example to which we will refer throughout the chapter Example 812 Voting game A parliament is made up of four political parties A B C and D which have 45 25 15 and 15 representatives respectively They are to vote on whether to pass a 100 million spending bill and how much of this amount should be controlled by each of the parties A majority vote that is a minimum of 51 votes is required in order to pass any legislation and if the bill does not pass then every party gets zero to spend More generally in a voting game there is a set ofagents Nand a set ofcoalitions W g 2N that are winning coalitions that is coalitions that are suf cient for the passage of the bill if all its members choose to do so To each coalition S e W we assign vS 1 and to the others we assign vS 0 82 CLASSES OF COALITIONAL GAMES In this section we will de ne a few important classes of coalitional games which have interesting applications as well as useful formal properties We start with the notion of superadditivity a property often assumed for coalitional games De nition 821 Superadditive game Agnme G 2 IV 1 i3 superadditive iffor n S T C N yrsn T n zm su T 2 ms v3939 Superadditivity is justi ed when coalitions can always work without interfering with one another hence the value of two coalitions will be no less than the sum of their individual values Note that superadditivity implies that the value of the entire set of players the grand coalition is no less than the sum of the value of any nonoverlapping set of coalitions In other words the grand coalition has the highest payoff among all coalitional structures The voting example we gave earlier is a superadditive game COALITIONAL GAME THEORY 71 Taking noninterference across coalitions to the extreme when coalitions can never affect one another either positively or negatively then we have additive or inexxem ial games De nition 822 Additive game A game G 2 IV 1 i3 additive or inessential iffm aZZ S TC NifS T z benvSU T vSvT A related class of games is that of constantesum games De nition 823 Constant sum game Agame G 2 IV 1 i3 constant sum iffor a S C IV MS UN 5 MN Note that every additive game is necessarily constant sum but not vice versa As in noncooperative game theory the most commonly studied constantesum games are zemrmm gamer An important subclass of superadditive games are convex games De nition824 Convexgame Agame G 2 IV 1 i3 convex iffonzZZ S T C IV vS U T Z vS vT vS Clearly convexity is a stronger condition than superadditivity While convex games may therefore appear to be a very specialized class of coalitional games these games are actually not so rare in practice Convex games have a number of useful properties as we will discuss in the next section Finally we present a class of coalitional games with restrictions on the values that payoffs are allowed to take De nition 825 Simple game Agame G N 1 it simple iffor a S C IV vS e 0 1 Simple games are useful for modeling voting situations such as those described in Example 812 In simple games we often add the requirement that ifa coalition wins then all larger sets are also winning coalitions ie ifvS 1 then for all T I S vT 1 This condition might seem to imply superadditivity but it does not quite For example the condition is met by a voting game in which only 50 of the votes are required to pass a bill but such a game is not superadditive Consider two disjoint winning coalitions S and when they join to form the coalition S U T they do not achieve at least the sum of the values that they achieve separately as superadditivity requires When simple games are also constant sum they are called proper xim e gamex In this case if S is a winning coalition then N S is a losing coalition Figure 81 graphically depicts the relationship between the different classes of games that we have discussed in this section 72 ESSENTIALS OF GAME THEORY Superadditive 3 Convex Q Additive Constantsum 3 Proper simple Simple 0 FIGURE 81 A hierarchy of coalitional game classes X D Y means that class X is a superclass of class Y 83 ANALYZING COALITIONAL GAMES The central question in coalitional game theory is the division of the payoff to the grand coalition among the agents This focus on the grand coalition is justi ed in two ways First since many of the most widely studied games are superadditive the grand coalition will be the coalition that achieves the highest payoff over all coalitional structures and hence we can expect it to form Second there may be no choice for the agents but to form the grand coalition for example public projects are often legally bound to include all participants If it is easy to decide to concentrate on the grand coalition however it is less easy to decide how this coalition should divide its payoffs In this sectionwe explore avariety of solution concepts that propose different ways of performing this division Before presenting the solution concepts it is helpful to introduce some terminology First let 1 IN gtlt RIM gt IRINI denote a mapping from a coalitional game that is a set of agents Nand a Value function 1 to a vector olel real Values and let 11 N 1 denote the ith such real Value Denote such a vector of real Values as x 6 RIM Each x denotes the share of the grand coalition s payoff that agent i e N receives When the coalitional game IV 1 is understood from context we write x as a shorthand for I7 1 Now we can give some basic de nitions about payoff division De nition 831 Feasible payo Given a caniiz ianni game IV 1 tbe feasible payoff set i3 de nedm x e IRNI 216in 5 In other words the feasible payoff set contains all payoff vectors that do not distribute more than the worth of the grand coalition De nition 832 Pm imputation Given a caniiz iannignme N 1 Me preiimputation set denatedfP i3 de nedm x e IRNI 216in 2 We can View the preiimputation set as the set of feasible payoffs that are ef cient that is they distribute the entire worth of the grand coalition D2 u 0221 I m 39 mmNvlLb r set3im e neaI mx e T I W 6 N7 96 Z W0 COALITIONAL GAME THEORY 73 Under an imputation each agent must be guaranteed a payoff of at least the amount that he could achieve by forming a singleton coalition Now we are ready to delve into different solution concepts for coalitional games 831 The Shapley Value Perhaps the most straightforward answer to the question of how payoffs should be divided is that the division should be zir Let us begin by laying down axioms that describe what fairness means in our context First7 say that agents i and are intereyangea e if they always contribute the same amount to every coalition of the other agents That is7 for all S that contains neither i nor vS U vS U The xymme y axiom states that such agents should receive the same payments Axiom 834 Symmetry For any 1 ifi ana j are intereyangea e tyen WAN v gigN7 v Second7 say that an agent i is a dummy pZayer if the amount that i contributes to any coalition is exactly the amount that i is able to achieve alone That is7 for all S such that i S vS U vS The a ammypZayer axiom states that dummy players should receive a payment equal to exactly the amount that they achieve on their own Axiom 835 Dummyplayer For any 1 ifi ii a a ammypZayer tyen IIN v Finally7 consider two different coalitional game theory problems7 de ned by two different characteristic functions v1 and 1 involving the same set of agents The aa a iz i viz y axiom states that if we remodel the setting as a single game in which each coalition S achieves a payoff of v1S 12S7 the agents payments in each coalition should be the sum of the payments they would have achieved for that coalition under the two separate games Axiom 836 Additivity For any two v1 ana vg we lya vefmanypZayeri tyaz WAN v1 1 WAN U1 WAN v2 quot107676 1 66 g mf M Di v2 i3 61515715617 173 v1 12 U1 125 for every eaaZiz ian S If we accept these three axioms7 we are led to a strong result there is always exactly one preeimputation that satis es them Theorem 837 Given a caniz ianaZgame N7 1 tyere ii a aniqae prerimpataz ion MN7 v MN 1 tyat xatix ex tbe Symmetry Dammyplayer Additivity axiamx Note that our requirement that MN 1 be a preeimputation implies that the payoff division be feasible and ef cient 74 ESSENTIALS OF GAME THEORY What is this unique payoff division MM 1 It is called the SmpZey value and it is de ned as follows De nition 838 Shapley value Given a caaZiz iamngame N 1 tbs Shapley value ef gy i ii given by N v Z S1Nl ISI 1gt1vltsu m um SENVil This expression can be Viewed as capturing the average marginal contribution of agent i where we average over all the different sequences according to which the grand coalition could be built up from the empty coalition More speci cally imagine that the coalition is assembled by starting with the empty set and adding one agent at a time with the agent to be added chosen uniformly at random Within any such sequence of additions look at agent is marginal contribution at the time he is added If he is added to the set S his contribution is 1S U vS Now multiply this quantity by the Sl different ways the set S could have been formed prior to agent is addition and by the S 1 different ways the remaining agents could be added afterward Finally sum over all possible sets S and obtain an average by dividing by INll the number ofpossible orderings of all the agents For a concrete example of the Shapley Value in action consider the voting game given in Example 812 Recall that the four political parties A B C and D have 45 25 15 and 15 representatives respectively and a simple majority 51 votes is required to pass the 100 million spendingbill If we want to analyze how much money it is fair for each party to demand we can calculate the Shapley Values of the game Note that every coalition with 51 or more members has a Value of 100 million3 and others have 0 In this game therefore the parties B C and D are interchangeable since they add the same Value to any coalition They add 100 million to the coalitions 8 C C D 8 D that do not include them already and to 14 they add 0 to all other coalitions The Shapley value of A is given by m 34 112 1gt1100 0 3 4 3113 1luoo 0 1W100 100 2 02 32 4100 wwmo 0 0 25 25 50 million 3Notice that for these calculations we scale the Value function to 100 for winning coalitions and 0 for losing coalitions in order to make it align more tightly with our example COALITIONAL GAME THEORY l The Shapley value for B and by symmetry also for C and D is given by 4 2l2 1 4 3l3 1 3 100 0 100 0 2 2 100 100 0 24 gt24 833 833 1666 million Thus the Shapley values are 50166616661666 which add up to the entire 100 million 832 The Core The Shapley value de ned a fair way of dividing the grand coalition s payment among its members However this analysis ignored questions of stability We can also ask would the agents be wiZZing to form the grand coalition given the way it will divide payments or would some of them prefer to form smaller coalitions Unfortunately sometimes smaller coalitions can be more attractive for subsets of the agents even if they lead to lower value overall Considering the majority voting example while A does not have a unilateral motivation to vote for a different split A and B have incentive to defect and divide the 100 million between themselves eg dividing it 75 25 This leads to the question of what payment divisions would make the agents want to form the grand coalition The answer is that theywould want to do so if and only if the payment pro le is drawn from a set called the core de ned as follows De nition 839 Core Apaya nvectm x ii in tbe core of caaZiz imaZgame IV 1 ifand only vs g N Ex 2 ms ieS Thus a payoff is in the core if and only if no subecoalition has an incentive to break away from the grand coalition share the payoff it is able to obtain independently That is it requires that the sum of payoffs to any group of agents S g N mustbe at least as large as the amount that these agents could share among themselves if they formed a coalition on their own Notice that De nition 839 implies that payoff vectors in the core must always be imputations Since the core provides a concept of stability for coalitional games we can see it as an analog of the Nash equilibrium from noncooperative games However it is actually a stronger notion Nash equilibrium describes stability only with respect to deviation by a single agent 76 ESSENTIALS OF GAME THEORY Instead the core is an analog of the concept of xtrong Naxb equilibrium which requires stability with respect to deviations by arbitrary coalitions of agents As a notion of stability for coalitional games the core is appealing However the alert reader might have two lingering doubts arising due to its implicit de nition through inequalities 1 Is the core always nonempty 2 Is the core always unique Unfortunately the answer to both questions is no Let us consider again the Parliament example with the four political parties The set of minimal coalitions that meet the required 51 votes is 14 B 14 C 14 D and B C D We can see that if the sum of the payoffs to parties B C and D is less than 100 million then this set of agents has incentive to deviate On the other hand if B C and D get the entire payoffof100 million then 14will receive 0 and will have incentive to form a coalition with whichever of B C and D obtained the smallest payoff Thus the core is empty for this game On the other hand when the core is nonempty it may not de ne a unique payoff vector either Consider changing our example so that instead of a simple majority an 80 majority is required for the bill to pass The minimal winning coalitions are now 14 B C and 14 B D Any complete distribution of the 100 million among parties 14 and B nowbelongs to the core since all winning coalitions must have both the support of these two parties These examples call into question the universality of the core as a solution concept for coalitional games We already saw in the context of noncooperative game theory that solution conceptsinotably the Nash equilibriumido not yield unique solutions in general Here we are in an arguably worse situation in that the solution concept may yield no solution at all Luckily there exist several results that allow us to predict the emptiness or nonemptiness of the core based on a coalitional game s membership in one of the classes we de ned in Section 82 Theorem 8310 Every eomtantixam game tyat i3 not additive lyax an empty core We say that a player i is a vetopZayei if vN 0 Theorem 8311 In a ximpZe game tbe core i3 empty tyeie i3 no veto pZayei Iftbere are veto pZayerx tbe core eomixtx ofaZZpayofveeton in wyieb tbe nonveto playerx get zero Theorem 8312 Every convex game bay a nonempty core A nal question we consider regards the relationship between the core and the Shapley value We know that the core may be empty but if it is not is the Shapley value guaranteed COALITIONAL GAME THEORY 77 to lie in the core The answer in general is no7 but the following theorem gives us a suf cient condition for this property to hold We already know from Theorem 8312 that the core of convex games is nonempty The following theorem further tells us that for such games the Shapley Value belongs to that set Theorem 8313 In every convex game tbe SmpZey thue ii in tbe core History and References There exist several excellent technical introductory textbooks for game theory including Osborne and Rubinstein 1994 Fudenberg and Tirole 1991 and Myerson 1991 The reader interested in gaining deeper insight into game theory should consult not only these but also the most relevant strands of the the vast literature on game theory which has evolved over the years In their seminal book von Neumann and Morgenstern 1944 introduced the normal form game the extensive form the concepts of pure and mixed strategies as well as other notions central to game theory and utility theory Schelling 1960 was one of the rst to show that interesting social interactions could usefully be modeled using game theory for which he was recognized in 2005 with a Nobel Prize The literature on Pareto optimality and social optimization dates back to the early twentieth century including seminal work by Pareto and Pigou but perhaps was best established byArrow in his seminal work on social choice Arrow 19 70 John Nash introduced the concept of what would become known as the Nash equilibrium Nash 1950 Nash 1951 without a doubt the most in uential concept in game theory to this date Indeed Nash received a Nobel Prize in 1994 because of this work1 In 1928 von Neumann derived the maximin solution concept to solve zeroesum normal form games von Neumann 1928 Nash s proof that all noncooperative games have equilibria Nash 1950 Nash 1951 opened the oodgates to a series of re nements and alternative solu tion concepts which continues to this day We covered several of these solution concepts The minimax regret decision criterion was rst proposed by Savage 1954 and further developed in Loomes and Sugden 1982 and Bell 1982 Recent work from a computer science per spective includes Hya l and Boutilier 2004 which also applies this criterion to the Bayesian games setting we introduce in Chapter 7 Iterated removal of dominated strategies and the closely related rationalizability enjoy a long history though modern discussion of them is most rmly anchored in two independent and concurrent publications Pearce 1984 and Bernheim 1984 Correlated equilibria were introduced in Aumann 1974 Myerson s quote is taken from Solan and Vohra 2002 Tremblingehand perfection was introduced in Selten 1975 1John Nash was also the topic of the Oscarewinning 2001 movie A Beautfu Mind however the movie had little to do with his scienti c contributions and indeed got the de nition ofNash equilibrium wrong 80 ESSENTIALS OF GAME THEORY The concept of evolutionarily stable strategies ESSs again has a long history but was most explicitly put forward in Maynard Smith and Price 1973iwhich also introduced the HawkiDove gameiand gured prominently a decade later in the seminal Maynard Smith 1982 Experimental work on learning and the evolution of cooperation appears in Axelrod 1984 It includes discussion of a celebrated tournament among computer programs that played a nitely repeated Prisoner s Dilemma game and in which the simple TiteforeTat strategy emerged victorious The earliest gameetheoretic publication is arguably that onermelo7 who in 1913 introe duced the notions of a game tree and backward induction and argued that in principle chess admits a trivial solution Zermelo 1913 It was already mentioned earlier that extensiveeform games were discussed explicitly in von Neumann and Morgenstern 19447 as was backward induction Subgame perfection was introduced by Selten 19657 who received a Nobel Prize in 1994 The Centipede game was introduced by Rosenthal 1981 many other papers discuss the rationality ofbackward induction in such games Aumann7 1995 Binmore7 1996 Aumann7 1996 In 1953 Kuhn introduced extensiveeform games of imperfect information7 including the distinction and connection between mixed and behavioral strategies Kuhn7 1953 Sequential equilibria were introduced by Kreps and Wilson 1982 Here7 as in normaleform games7 the full list of alternative solution concepts and connection among them is long7 and the interested reader is referred to Hillas and Kohlberg 2002 and Govindan and Wilson 2005 for a more extensive survey than is possible here Some of the earliest and most in uential work on repeated games is Luce and Raiffa 1957 and Aumann 1959 Of particular note is that the former provided the main ideas behind the folk theorem and that the latter explored the theoretical differences between nitely and in nitely repeated games Aumann s work on repeated games led to a Nobel Prize in 2005 Our proof of the folk theorem is based on Osborne and Rubinstein 1994 Stochastic games were introduced in Shapley 1953 The state of the art regarding them circa 2003 appears in the edited collection Neyman and Sorin 2003 Filar and Vrieze 1997 provide a rigorous introduction to the topic7 integrating MDPs or singleeagent stochastic games and twoeperson stochastic games Bayesian games were introduced by Harsanyi 19671968 in 1994 he received a Nobel Prize7 largely because of this work In the early days of game theory research7 coalitional game theory was a major focus7 particularly of economists This is partly because the theory is closely related to equilibrium analysis and seemingly bridges a gap between game theory and economics Von Neumann and HISTORYAND REFERENCES 81 Morgenstern for example devoted more than half of their classic text Tyemy ofGamex and Economic Bem viar von Neumann and Morgenstern 1944 to an analysis ofcoalitional games A large body of theoretical work on coalitional game theory has focused on the development of solution concepts possibly in an attempt to explain the behavior of large systems such as markets Solid explanations of the many solution concepts and their properties are given by Osborne and Rubinstein 1994 and Peleg and Sudholter 2003 References Arrow7 K 1970 Social cyoice ana ina i via39aaZ valaex New Haven7 CT Yale University Press Aumann7 R 195 9 Acceptable points in general cooperative neperson games Contributions to tbe Tyeoiy ofGaniex7 47 287324 Aumann7 R 1974 Subjectivity and correlation in randomized strategies journal ofMatber inaticaZEconomicr7 I 67961w 39I3939Iquot uh I Luau T mu39 Aumann7 R 1995 Backward induction and common knowledge of rationality GEE Gamer ana EconomicBeba vioi7 817 619 I H39 ll 1 39 3945quot 39 quot I W 39 Aumann7 R 1996 Reply to Binmore GEE Gamer and Economic Beya vioi7 1717 138146 39 Il39lquotquot 39 4mm 39 quot Iquotquot quot Axelrod7 R 1984 Tbe e voZation ofcoopeiation New York Basic Books Bell7 D E 1982 Regret in decision making under uncertainty Operationx Rereaicb7 30 9617981 Bernheim7 B D 1984 Rationalizable strategic behavior Econoinetiica7 52 100771028 39 Iquot39 quot 1quot l39 quot Binmore7 K 1996 A note on backward induction GEE Gamer ana Economic Beya vioi7 I 717 135137h 39II39 Iquot quot 39HII39 11quot quotquot quot Filar7 J and Vrieze7 K 1997 Competitive Markov a ecixion piocexxex SpringereVerlag Fudenberg7 D7 and Tirole 1991 Game tyeoiy Cambridge7 MA MIT Press Govindan7 S7 and VVilson7 R 2005 Re nements of Nash equilibrium In S Durlauf and L Blume Eds7 Tbe new PaZgia ve a ictionaiy ofecononiics7 vol 11 New York Macmillan Harsanyi7 19671968 Games with incomplete information played by Bayesian players7 parts I H and 111 Management Science7 147 1591827 3203347 486502 Hillas7 J and Kohlberg7 E 2002 Foundations of strategic equilibrium In R Aumann and S Hart Eds7 Handbook ofgame tyeoiy7 vol 1H chapter 427 15971663 Amsterdam Elsevier1I 39In39quot ul II39 III39L Iquotquotquot39 39 Hya l7 N7 and Boutilier7 C 2004 Regret minimizing equilibria and mechanisms for games with strict type uncertainty UAI Proceedingx oftbe Conference on Uncertainty in Artyfciat Intelligence pp 268277 Kreps7 D7 and VVilson7 R 1982 Sequential equilibria Econoi netiica7 50 863894 t lquot II 1quot Kuhn7 H 1953 Extensive games and the problem of information Contributions to tbe Tyeoiy ofGamex pp 193216 Princeton7 NJ Princeton University Press Reprinted in H Kuhn Ed7 Clam icx in Game Tyeoiy7 Princeton7 NJ Princeton University Press7 1997 84 ESSENTIALS OF GAME THEORY Loomes7 G and Sugden7 R 1982 Regret theory An alternative theory of rational choice under uncertainty Economicjournut7 92 805824 39139 j Iquot l ltI39 39 Luce7 R7 and Raiffa7 H 1957 Gurney uno decixzonx New York John Wiley and Sons Maynard Smith7 1982 Evolution uno tbs tyeory ofgumex Cambridge University Press Maynard Smith7 J and Price7 G R 1973 The logic of animal con ict Nuture 2467 1518 ll 39 39quot l 39 39 Myerson7 R 1991 Gums tyeoryAnuyxix ofcon ict Harvard Press Nash7 1950 Equilibrium points in neperson games Proceedingx ofth NutionuZAcuo emy of Sciencex USA7 36 4849 Reprinted in H Kuhn Ed7 CZum39cx in Gums Tyeory Princeton7 5 NJ Princeton University Press7 1997 39 I I quot mm quot39 J Nash7 1951 Nonecooperative games AnnuZx of Mutyemutim 54 286295 zlwquotw Neyman7 A7 and Sorin7 S 2003 Stocbuxtic gumex uno uppticutionx Kluwer Academic Press Osborne7 M J and Rubinstein7 A 1994 A nouns in gums tyeory Cambridge7 MA MIT Press Pearce7 D 1984 Rationalizable strategic behavior and the problem of perfection Econometr 51175210291050 II 1IT l39 I I 39 39 Peleg7 13 and Sudholter7 P 2003 Introduction to tbe tyeory of cooperative gumex Kluwer Academic Publishers Rosenthal7 R 1981 Games of perfect information7 predatory pricing and the chainestore paradoxou7nuofEconomic Tyeory 2517 92100 ilwle39quot quot I39 If I1 Z39 M quotI 39 Savage7 L 1954 Tyefouno utionx ofxtutixticx New York John Wiley and Sons 2nd edition7 Mineola7 NY Dover Press7 1972 Schelling7 T C 1960 T65 xtmtegy ofcon ict Cambridge7 MA Harvard University Press Selten7 R 1965 Spieltheoretische Behandlung eines Oligopolmodells mit Nachfragee traegheit Zeitxcyri fur die gexumte Stuutxwixxenxcbu 12 301324 Selten7 R 1975 Reexamination of the perfectness concept for equilibrium points in extensive games Internationutjournut oqume Tyeory 47 2555 ll39 39I II 397 W I II I Shapley7 L S 1953 Stochastic games Proceedings ofth NutionuZ Acuo emy ofSciencex 39 1095711005 Illquotquot v illw I I I IIquot Shoham7 Y7 and LeytoneBrown7 K 2008 Multiugent xyxtemxAgo7itbmic gamertbeoretic uno ogicutfoundutionL Cambridge UK Cambridge University Press Solan7 E7 and Vohra7 R 2002 Correlated equilibrium payoffs and public signalling in absorbe ing games Internationuljournut oqume Tyeory 3 91121 I II39 39I I x I quot F Iquot II II von Neumann7 1928 Zur Theorie der Gesellschaftsspiele Mutyemutixcbe AlnnuZen7 1007 2957320 39 l I I I39IT Ill39I 39 HMIT von Neumann7 J and Morgenstern7 O 1944 Tyeory ofgumex undeconomic oeuwior Princeton7 NJ Princeton University Press Zermelo7 E F F 1913 Uber eine Anwendung der Mengenlehre auf die Theorie des Schachspiels Fi b Internationut Congm x ofMutyemuticiuns7 H 501504 Index eeNash equilibrium 27 2728 action 1 3 477 10716 19721 23 25727 31733 35 36 38 39 41 42 45 46 50 51 53 54 57 61 62 64767 69 action pro le 3 4 10 13 19 20 53 63 additive game 71 alphaebeta pruning 39 average reward 51 5 255 Backoff game 2 backward induction 35 38 39 50 67 80 Battle ofthe Sexes game 6 7 12 15 24 25 59 Bayes rule 47 BayesiNash equilibrium 64 65 67 ex part xee ex pm equilibrium Bayesian game 357 5859 60 61 6267 79 80 behavioral strategy extensive form game 43 44 45 stochastic game 54 best response 10 11 13 14 16 18 19 21 2324 27 28 30 34 36 44746 52 64 66 in a Bayesian game 64 66 Centipede game 39 50 80 characteristic function 70 73 ChurchiRosser property 22 coalitional game theory 1 6977 81 coalitional game with transferable utility 69 commonepayoffgame 4 5 10 commoneprior assumption 58 constantesum game coalitional 71 76 noncooperative 5 convex game 71 76 77 Coordination game 5 59 core 75 76 77 correlated equilibrium 24 25 26 descendant 32 37 dominant solvable xee solvable by iterated elimination dominant strategy 20 21 43 44 50 67 68 dominated strategy 20 21 22 24 28 79 dummy player 5 73 ef ciency coalitional 72 73 empirical frequency 14 epistemic type 58 61 equilibrium xee solution concept equilibrium in dominant strategies 20 evolutionarily stable strategy 2830 weak 29 expwt equilibrium 67 expected utility 7 8 16 17 25 47 6264 66 ex Wife 63 64 65 ex interim 63 64 expoxz 62 63 expectimax algorithm 67 86 INDEX extensive form game 3148 80 imperfect information 41 4148 60 perfect information 32 3141 80 with chance nodes 6061 67 feasible payoff 55 72 xedepoint theorem 13 folk theorem 5152 55 80 fully mixed strategy 7 26 27 46 47 future discounted reward 51 53 game in matrix form 355 normal form game game in strategic form 355 normal form game game of incomplete information 355 Bayesian game game tree 355 extensive form game perfect information HawkiDove game 29 80 history 32 50 54 imperfecteinformation game 355 extensive form game imperfect information imputation 72 73 75 induced normal form 31 6467 inessential game 71 information set Bayesian game 5962 extensive form game 41 42 4447 interchangeable agents 73 irreducible stochastic game 55 Markov decision problem MDP 53 80 Markov game 53 Markov perfect equilibrium MPE 54 Markov strategy 54 Matching Pennies game 5 6 13 17 18 23 59 matrix form game 355 normal form game maximum regret 19 20 maxmin strategy 15 1619 maxmin value 15 1618 mechanism design 21 62 minimax algorithm 39 minimax regret 18 19 20 79 minmax strategy 15 16 17 18 52 minmax value 16 17 18 52 55 mixed strategy 7 8 1115 17 2022 24 26 28730 34 35 43745 52 62 63 67 79 support of 7111214 30 mixedestrategy pro le 7 8 16 27 46 47 63 64 67 Nash equilibrium 11 976 6 355 eeNash equilibrium Bayes 355 BayerNash equilibrium strict 11 13 weak 11 12 normal form game 3 39 15 31 34 3543 4649 64767 optimal strategy 9 Pareto domination 10 20 Pareto optimality 9 10 20 79 payoff function 3 49 53 61 perfect equilibrium 355 tremblingehand perfect equilibrium perfect recall 44 45 46 48 perfecteinformation game 355 extensive form game perfect information preeimputation 72 73 Prisoner s Dilemma game 2 34 6 20 22 23 35 42 43 49751 59 60 80 proper equilibrium 27 proper simple game 71 pure coordination game 4 5 pure strategy 6 7 1014 17 21 23 24 26 32 33 35 42744 46 62 64 65 pureestrategy pro le 6 7 65 rationaliZable strategy 23 24 regret 19 repeated game 14 16 4953 55 80 Rochambeau game 355 Rock Paper Scissors game Rock Paper Scissors game 5 6 security level 15 sequential equilibrium 46 4548 Shapley value 74 7577 Sharing game 32 33 simple game 71 76 singleecontroller stochastic game 54 solution concept 9 10 11 15 16 20 21 24 26 27 31 48 58 72 73 76 79781 coalitional core 355 core Shapley value 355 Shapley value noncooperative eeNash equilibrium 355 eeNash equilibrium BayesiNash equilibrium 355 BayesiNash equilibrium correlated equilibrium 355 correlated equilibrium INDEX 87 dominant solvable 355 dominant solvable equilibrium in dominant strategies 355 equilibrium in dominant strategies maxmin strategies 355 maxmin strategy minmaX strategies 355 minmaX strategy Pareto optimality 355 Pareto optimality perfect equilibrium 355 tremblingehand perfect equilibrium proper equilibrium 355 proper equilibrium rationaliZable strategies 355 rationaliZable strategy sequential equilibrium 355 sequential equilibrium strong Nash equilibrium me strong Nash equilibrium subgameeperfect equilibrium 355 subgameeperfect equilibrium solvable by iterated elimination 22 stage game 49 50 51 53 stationary strategy 50 54 stochastic game 53 54 55 80 strategic form game 355 normal form game strategy behavioral 355 behavioral strategy Markov 355 Markov strategy mixed 355 mixed strategy pure 355 pure strategy stationary 355 stationary strategy trigger 355 trigger strategy strict domination 20 22 strict Pareto ef ciency 10 355 Pareto optimality strong Nash equilibrium 76 subgame 31 3538 4548 50 53 54 80
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'