# Class Note for ECON 702 at UMass(4)

This 16 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 11 views.

Date Created: 02/06/15

## VI. Repeated games

In this chapter we study a class of extensive form games in which a group of players interacts repeatedly, playing the same strategic game in each period.

### General framework

Let $G = \{N, (A_i), (u_i)\}$ denote a simultaneous-move game of complete information. If the same game $G$ is played for many periods, we call the corresponding game a repeated game of $G$ and denote it by $G(T)$, where $T$ is the number of periods in which the game is repeated. If $T$ is finite, the game is called a finitely repeated game; if $T$ is infinite, it is called an infinitely repeated game. $G$ is called the stage game of the repeated game.

A repeated game is usually analyzed with the discounted sum of the payoffs from the $T$ stage games,

$$V_T = \sum_{t=1}^{T} \delta^{t-1} u_i(a_i^t, a_{-i}^t), \qquad \delta \in (0, 1).$$

If the game is repeated infinitely, then

$$V_\infty = \lim_{T \to \infty} \sum_{t=1}^{T} \delta^{t-1} u_i(a_i^t, a_{-i}^t).$$

We assume all players have the same discount factor.

Question: For any given stream of payoffs $(u_i(a^1), u_i(a^2), u_i(a^3), \dots)$, is there a value of $c$ such that player $i$ is indifferent between this stream and the constant stream $(c, c, c, \dots)$?

The discounted sum of $(c, c, c, \dots)$ is $\frac{1 - \delta^T}{1 - \delta}\, c$ if $T$ is finite and $\frac{c}{1 - \delta}$ if $T$ is infinite. Thus

$$c = \frac{1 - \delta}{1 - \delta^T}\, V_T \ \text{ if } T \text{ is finite}, \qquad c = (1 - \delta)\, V_\infty \ \text{ if } T \text{ is infinite}.$$

We call such $c$ the discounted average of the stream $(u_i(a^1), u_i(a^2), u_i(a^3), \dots)$. Because the factors $\frac{1 - \delta}{1 - \delta^T}$ and $1 - \delta$ are constants, the discounted sum and the discounted average represent the same preferences for a given value of $\delta$.

Note: People's preferences do not necessarily take this form. A widely used alternative is the average payoff obtained over a sufficiently long period of time, or its limit:

$$\liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(a^t).$$

Analyzing repeated games using the strategy space is quite cumbersome because a strategy usually involves a large tuple of actions. We will use a simpler notation. Let $a^t = (a_1^t, \dots, a_n^t)$ be the profile of actions taken in period $t$, where $a_i^t$ is the action taken in period $t$ by player $i$.

### Repeated prisoner's dilemma game

Consider the following version of the Prisoner's Dilemma, where $y > x > 1$. The second table is the special case $x = 2$, $y = 3$.

|   | C | D |
|---|---|---|
| **C** | $x, x$ | $0, y$ |
| **D** | $y, 0$ | $1, 1$ |

|   | C | D |
|---|---|---|
| **C** | $2, 2$ | $0, 3$ |
| **D** | $3, 0$ | $1, 1$ |

As we know, this game has a unique Nash equilibrium, $(D, D)$. What happens if the same game is played for a sufficiently long period of time? You may recall that there are generally many Nash equilibria in extensive form games. The same is true in repeated games. Instead of studying all of them, we will pay closer attention to a few possible candidates.

1. The first notable thing is that the strategy pair in which each player chooses $D$ after every history is still a Nash equilibrium of the repeated game, whether the game is repeated finitely or infinitely. The reason is simple: if one player adopts this strategy, then the other player can do no better than to adopt the strategy herself, regardless of how she values the future, because choosing $D$ is in her short-term interest and choosing $C$ has no effect on the other player's future behavior.

2. Also, if the stage game is repeated finitely, this is the only outcome path generated by any Nash equilibrium (see footnote 1).

Footnote 1: A player's strategy may specify an action other than $D$ for histories in which the outcome in some period is not $(D, D)$, but the outcome generated by any Nash equilibrium strategy pair is $(D, D)$ in every period.

Consider a strategy pair $(s_1, s_2)$ that generates an outcome path in which at least one player's action, say player 1's action, is different from $D$ in at least one period. Denote by $t$ the last period in which the outcome is not $(D, D)$, and suppose that in period $t$ player 1's action is $C$. We will show that this strategy pair is not a NE because player 1 can profitably deviate from $s_1$. Consider, for example, a strategy pair $(s_1', s_2)$, where $s_1'$ is identical to $s_1$ except that from period $t$ on it chooses $D$ after every history. Player 1's actions under the two strategy pairs are:

| Period | Under $(s_1, s_2)$ | Under $(s_1', s_2)$ |
|---|---|---|
| $t$ | C | D |
| $t+1$ | D | D |
| $t+2$ | D | D |

The outcome path generated by $(s_1', s_2)$ differs from the outcome path generated by $(s_1, s_2)$ only in player 1's action in period $t$, which is $D$ rather than $C$, and possibly in player 2's action in period $t+1$ and later. Player 2's action in period $t$ is the same under $(s_1', s_2)$ as it is under $(s_1, s_2)$ because $s_1'$ differs from $s_1$ only in its prescriptions for period $t$ and later. In period $t+1$ and later, player 1's action in both cases is $D$. Thus player 1's payoff is the same under both strategy pairs through period $t-1$, is higher under $(s_1', s_2)$ than under $(s_1, s_2)$ in period $t$, and is at least as high under $(s_1', s_2)$ as under $(s_1, s_2)$ from period $t+1$ on.

This result can be generalized to any finitely repeated game for which the profile of payoffs in every Nash equilibrium of the stage game is the profile of the players' minimax payoffs.

**Theorem.** Consider an arbitrary stage game $G$ for which the profile of payoffs in every Nash equilibrium is the profile of the players' minimax payoffs. Then for any finite $T$, every Nash equilibrium of $G(T)$ generates an outcome path in which the outcome in every period is a Nash equilibrium of $G$.

Note: Unfortunately, many stage games have a Nash equilibrium in which at least one player's payoff exceeds her minimax payoff. In that case we have the following theorem.

**Theorem (Benoit and Krishna).** Let $G$ be a stage game that, for every player, has a Nash equilibrium in which that player's payoff exceeds her minimax payoff. Suppose $v$ is a feasible payoff profile of $G$ for which each player's payoff exceeds her minimax payoff. Then for every $\varepsilon > 0$ there exists $T^*$ such that if $T > T^*$, the repeated game of $G$ has a Nash equilibrium whose average payoff is within $\varepsilon$ of $v_i$ for each $i$.

3. Every SPNE is a Nash equilibrium, so we know that every SPNE of a finitely repeated Prisoner's Dilemma generates the outcome $(D, D)$ in every period. But for a SPNE we can further restrict the strategies. Indeed, a finitely repeated Prisoner's Dilemma has a unique SPNE, in which each player's strategy chooses $D$ in every period regardless of history.

Consider, for example, a two-stage Prisoner's Dilemma. The second-stage game has a unique NE, $(D, D)$, regardless of the first-stage outcome. By folding back, we have the following game in stage 1:

|   | C | D |
|---|---|---|
| **C** | $x + \delta,\ x + \delta$ | $\delta,\ y + \delta$ |
| **D** | $y + \delta,\ \delta$ | $1 + \delta,\ 1 + \delta$ |

Thus the outcome in stage 1 is again $(D, D)$. By the logic of backward induction, this is true for every finite $T$. Generally, we have the following theorem.

**Theorem.** If the stage game $G$ has a unique NE, then for any finite $T$ the repeated game $G(T)$ has a unique subgame perfect outcome, i.e., the NE of the stage game $G$ is played in every stage.

Note: Some stage games have multiple Nash equilibria. In that case we have the following theorem.

**Theorem (Benoit and Krishna).** Let $G$ be a stage game that, for every player, has a Nash equilibrium in which that player's payoff exceeds her minimax payoff. Suppose $v$ is a feasible payoff profile of $G$ for which each player's payoff exceeds her minimax payoff. Finally, assume that the dimension of the set of feasible payoffs is equal to the number of players. Then for every $\varepsilon > 0$ there exists $T^*$ such that if $T > T^*$, the repeated game of $G$ has a SPNE whose average payoff is within $\varepsilon$ of $v_i$ for each $i$.

4. But if the stage game is infinitely repeated, there are other Nash equilibria in which cooperation is sustained in equilibrium. The main idea is that a player may be deterred from exploiting her short-term advantage by the threat of a punishment that reduces her long-term payoff.

Consider, for example, the following strategy: start with $C$ and choose $C$ as long as the other player chooses $C$; if in any period the other player chooses $D$, then choose $D$ in every subsequent period. This strategy is called a grim trigger strategy because a single defection by the opponent triggers relentless ("grim") retaliation.

How should a player respond if her opponent uses this strategy? If she chooses $C$ in every period, then the outcome is $(C, C)$ and her payoff is $x$ in every period. If she switches to $D$ in some period, then she obtains a payoff of $y > x$ in that period (a short-term gain) and a payoff of $1$ in every subsequent period (a long-term loss). As long as the value she attaches to future payoffs is not too small compared with the value she attaches to her current payoff, the stream of payoffs $(y, 1, 1, \dots)$ is worse than the stream $(x, x, x, \dots)$, so that she is better off choosing $C$ in every period than switching to $D$ in some period. The question is how patient the player must be.

Question: What is the intuition behind the fact that an infinitely repeated game may resolve the Prisoner's Dilemma? Why doesn't the possibility of punishment affect the outcome in a finitely repeated game?

The answer is that the backward induction argument applied to a finitely repeated game fails in an infinitely repeated game because there is no last period. In a finitely repeated game, players do not worry about retaliation in the last period. In an infinitely repeated game, however, there is no ending period. In the finitely repeated game $G(T)$, a subgame beginning at stage $t+1$ is the repeated game in which $G$ is played $T - t$ times. In the infinitely repeated game $G(\infty)$, each subgame beginning at stage $t+1$ is identical to the original game $G(\infty)$.

### Some equilibria in repeated games

The strategy space in repeated games is very large, and it is impossible to examine all strategies. We will pay closer attention to the following ones. Our purpose is to examine whether a strategy pair using these strategies constitutes a NE of the repeated game.

1. Continuous defection strategy: Start with $D$ and continue to choose $D$ whatever the other player chooses.

2. Grim trigger strategy: Start with $C$ and continue to choose $C$ as long as the other player chooses $C$. If the other player chooses $D$ in any period, then choose $D$ in every subsequent period.

*[State diagram: start in state C (play C); move to the absorbing state D (play D) once the other player chooses D.]*

3. Modified grim trigger strategy: Start with $C$ and choose $C$ as long as both players choose $C$. If either player (including oneself) chooses $D$ in any period, then choose $D$ in every subsequent period.

*[State diagram: start in state C (play C); move to the absorbing state D (play D) once either player chooses D.]*

The main difference between this and the grim trigger strategy is the following: after any deviation, the miscreant chooses $D$ in every period without waiting for the other player to deviate, so that her opponent is better off punishing her by choosing $D$ than choosing $C$. In this case, a player's strategy punishes her opponent if her opponent does not punish her for the deviation.

4. Limited punishment strategy: Start with $C$ and continue to choose $C$ as long as the other player chooses $C$. If the other player chooses $D$ in any period, then choose $D$ for $k$ subsequent periods and come back to $C$ in period $k+1$, no matter how the other player behaves during her punishment. The grim trigger strategy is a special case of the limited punishment strategy in which $k = \infty$.

*[State diagram: a cooperative state (play C) followed by punishment states $P_1, \dots, P_k$ (play D), each of which is left after any outcome.]*

5. Modified limited punishment strategy: Start with $C$ and continue to choose $C$ as long as the other player chooses $C$. If either player chooses $D$ in any period, then choose $D$ for $k$ subsequent periods and come back to $C$ in period $k+1$, no matter how the other player behaves during her punishment.

*[State diagram: as for the limited punishment strategy, except that the punishment phase starts once either player chooses D.]*

6. Tit-for-tat strategy: Start with $C$ and continue to choose $C$ as long as the other player chooses $C$. If the other player chooses $D$ in any period, then choose $D$ in the next period, and thereafter choose the action that the other player chose in the previous period. If the other player continues to choose $D$, then tit-for-tat continues to do so; if she reverts to $C$, then tit-for-tat reverts to $C$ also. Thus the length of the punishment depends on the behavior of the player being punished.

There are many other strategies that involve a certain amount of punishment, but we will not pursue them here.

We know that a Nash equilibrium in an extensive form game may entail threats that are not credible. The same may be true in repeated games. For the equilibria to be subgame perfect, these threats must be credible: each player must have an incentive to punish the other player if she deviates. A strategy pair in an extensive form game is a SPNE if the strategy pair it induces in every subgame is a NE of the subgame. Checking this condition in a repeated game is difficult, so we will use the following result.

**Theorem (One-deviation property).** A strategy profile in a repeated game is a SPNE if and only if no player can increase her payoff by changing her action at the beginning of any subgame in which she is the first mover, given the other players' strategies and the rest of her own strategy.

We have already shown that each player choosing the continuous defection strategy is a NE, whether the game is finitely or infinitely repeated. The continuous defection strategy pair is also a SPNE: whatever happens, each player chooses $D$, so it is optimal for the other player to do likewise. We will now discuss the other strategies.

1. **Each player choosing the grim trigger strategy is a NE if both players are sufficiently patient.**

We first show that player 2's using $C$ after every history is a best response to player 1's using the grim trigger strategy. Suppose player 1 uses the grim trigger strategy. If player 2 uses $C$ after every history, then the outcome is $(C, C)$ in every period, so that she obtains the stream of payoffs $(x, x, x, \dots)$, whose discounted average is $x$. If player 2 adopts a strategy under which her action is $D$ in at least one period, then in all subsequent periods player 1 chooses $D$; in this situation the best outcome for player 2 is obtained by choosing $D$ in every subsequent period as well. Thus the best stream of payoffs from such a deviation is $(y, 1, 1, \dots)$, whose discounted average is

$$(1 - \delta)(y + \delta + \delta^2 + \cdots) = y(1 - \delta) + \delta.$$

Thus player 2's always-$C$ strategy is a best response to player 1's grim trigger strategy if

$$x \ge y(1 - \delta) + \delta, \quad \text{or equivalently} \quad \delta \ge \frac{y - x}{y - 1}.$$

If $y = 3$ and $x = 2$, for instance, then $\delta \ge \frac{1}{2}$. Note that the lower bound on $\delta$ increases as $y$, the short-term gain, increases and as $x$ decreases. This is because a player is more likely to deviate the greater the short-term gain and the smaller the payoff from cooperation. Also, given the assumption that $y > x > 1$, the lower bound always lies strictly between 0 and 1.

Now we show that player 2's using the grim trigger strategy, not just the always-$C$ strategy, is a best response to player 1's using the grim trigger strategy. If player 1 uses the grim trigger strategy, then the outcome when player 2 uses the grim trigger strategy is the same as the outcome when player 2 uses the always-$C$ strategy. On the other hand, if player 2 deviates from the grim trigger strategy through an unprovoked use of $D$ while player 1 uses the grim trigger strategy, then the implication is the same as for a deviation from the always-$C$ strategy.

**Each player choosing the grim trigger strategy is not a SPNE.**

Consider the subgame following the outcome $(C, D)$. Suppose that player 1 adheres to the grim trigger strategy in the subgame. We will show that subsequent adherence to the grim trigger strategy is not optimal for player 2. If player 2 adheres to the grim trigger strategy, then the outcome is $(D, C)$ in the first period of the subgame and $(D, D)$ in every subsequent period of the subgame. Thus player 2's discounted average payoff in the subgame is

$$(1 - \delta)(0 + \delta + \delta^2 + \cdots) = \delta.$$

If player 2 deviates from the grim trigger strategy and chooses $D$ in every period of the subgame, then the outcome is $(D, D)$ in every period of the subgame and her discounted average payoff is $1$, which exceeds $\delta$.

|   | $t-1$ | $t$ | $t+1$ | $t+2$ |
|---|---|---|---|---|
| Player 1 | C | C | D | D |
| Player 2 | C | D | C | D |

2. **Each player choosing the modified grim trigger strategy is a NE if both players are sufficiently patient.**

Suppose all the preceding outcomes have been $(C, C)$. Given that player 1 is using the modified grim trigger strategy, player 2's adopting the modified grim trigger strategy yields the payoffs $(x, x)$ in every period. If player 2 adopts a strategy under which her action is $D$ in at least one period, then in all subsequent periods player 1 chooses $D$. Since player 1 will use $D$ forever, player 2's best response is also to play $D$ forever. Thus the modified grim trigger strategy is a best response to the modified grim trigger strategy if and only if $\delta \ge \frac{y - x}{y - 1}$.

**Each player choosing the modified grim trigger strategy is a SPNE if both players are sufficiently patient.**

If both players use this strategy, the outcome path in any subgame consists of either $(C, C)$ in every period or $(D, D)$ in every period.
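
The patience threshold $\delta \ge \frac{y - x}{y - 1}$ that appears in the grim trigger arguments can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `discounted_average`, `grim_trigger_threshold` and `cooperation_is_best_response` are hypothetical, and the payoffs $x = 2$, $y = 3$ are taken from the example above.

```python
def discounted_average(head, tail_value, delta):
    """Discounted average of a stream that starts with the payoffs in
    `head` and then pays `tail_value` in every later period."""
    total = sum((delta ** t) * payoff for t, payoff in enumerate(head))
    # Closed form of the geometric tail: tail_value * delta^len(head) / (1 - delta)
    total += tail_value * delta ** len(head) / (1.0 - delta)
    return (1.0 - delta) * total


def grim_trigger_threshold(x, y):
    """Lower bound on delta from the text: (y - x) / (y - 1)."""
    return (y - x) / (y - 1)


def cooperation_is_best_response(x, y, delta):
    """Compare the stream (x, x, x, ...) with the deviation stream (y, 1, 1, ...)."""
    cooperate = discounted_average([], x, delta)
    deviate = discounted_average([y], 1.0, delta)
    return cooperate >= deviate


if __name__ == "__main__":
    x, y = 2.0, 3.0  # example payoffs from the notes
    print("threshold:", grim_trigger_threshold(x, y))
    for delta in (0.4, 0.6):
        print(delta, cooperation_is_best_response(x, y, delta))
    # Subgame after (C, D): adhering to grim trigger gives (0, 1, 1, ...),
    # whose discounted average is delta, which is below the payoff 1 from always D.
    print("adhere:", discounted_average([0.0], 1.0, 0.7), "deviate:", 1.0)
```

The last line illustrates why the plain grim trigger pair fails subgame perfection: for any $\delta < 1$ the adhering player earns $\delta$, less than the payoff of $1$ from choosing $D$ throughout.
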
It suffices to show that in each case neither player has an incentive to change her action in the first period of the subgame, given the remainder of her strategy and her opponent's strategy. (Recall the one-deviation property.)

(a) Consider a subgame starting from $(C, C)$. When both players follow the modified grim trigger strategy in the subgame, the outcome is $(C, C)$ in every period of the subgame, yielding each player a discounted average payoff of $x$. If player 1 deviates from this strategy and chooses $D$ in the first period of the subgame but otherwise follows the strategy, then the outcome is $(D, C)$ in the first period of the subgame and $(D, D)$ subsequently, yielding player 1 the discounted average payoff

$$(1 - \delta)(y + \delta + \delta^2 + \cdots) = y(1 - \delta) + \delta.$$

Thus the deviation is not profitable if $\delta \ge \frac{y - x}{y - 1}$. A similar argument applies to a deviation by player 2.

(b) Consider a subgame starting from $(D, D)$. When both players follow the modified grim trigger strategy in the subgame, the outcome is $(D, D)$ in every period, yielding each player a discounted average payoff of $1$. For any discount factor, no player can profitably deviate from this because $(D, D)$ is a Nash equilibrium of the stage game and choosing $C$ rather than $D$ has no effect on the subsequent outcomes.

3. **Each player choosing the limited punishment strategy is a NE if both players are sufficiently patient and the punishment period is sufficiently long.**

Suppose player 1 uses the limited punishment strategy. If player 2 uses $C$ after every history, then the outcome is $(C, C)$ in every period, so that she obtains the stream of payoffs $(x, x, x, \dots)$. If player 2 adopts a strategy that generates a different outcome path, then in at least one period her action is $D$. Denote by $t$ the first period in which player 2 chooses $D$. Then player 1 chooses $D$ from $t+1$ through $t+k$ regardless of player 2's choices, so that player 2 also chooses $D$ in these periods. In period $t+k+1$ player 1 switches back to $C$, regardless of player 2's action in period $t+k$, and player 2 faces precisely the situation she faced at the beginning of the game.

|   | $t$ | $t+1$ | $t+k$ | $t+k+1$ |
|---|---|---|---|---|
| Player 1 | C | D | D | C |
| Player 2 | D | D | D | D |

Thus if this deviation yields her a greater payoff than does the limited punishment strategy, it does so from period $t$ to period $t+k$. The stream of payoffs for player 2 over these periods is $(y, 1, \dots, 1)$, whose discounted sum is $y + \delta + \delta^2 + \cdots + \delta^k$. Thus the limited punishment strategy is a best response to itself if and only if

$$x + \delta x + \delta^2 x + \cdots + \delta^k x \ \ge\ y + \delta + \delta^2 + \cdots + \delta^k,$$

which is equivalent to

$$\frac{x(1 - \delta^{k+1})}{1 - \delta} \ \ge\ y + \frac{\delta(1 - \delta^{k})}{1 - \delta}, \quad \text{or} \quad (x - 1)\delta^{k+1} - (y - 1)\delta + y - x \ \le\ 0.$$

Now we show that player 2's using the limited punishment strategy, not just the always-$C$ strategy, is a best response to player 1's using the limited punishment strategy. If player 1 uses the limited punishment strategy, then the outcome when player 2 uses the limited punishment strategy is the same as the outcome when player 2 uses the always-$C$ strategy. On the other hand, if player 2 deviates from the limited punishment strategy through an unprovoked use of $D$ while player 1 uses the limited punishment strategy, then the implication is the same as for a deviation from the always-$C$ strategy.

In the case where $y = 3$ and $x = 2$, the inequality above becomes $\delta^{k+1} - 2\delta + 1 \le 0$. If $k = 1$, then no value of $\delta < 1$ satisfies the inequality: a one-period punishment is not severe enough to discourage a deviation. If $k = 2$, the inequality is satisfied for $\delta \ge 0.62$ (approximately), and if $k = 3$, it is satisfied for $\delta \ge 0.55$ (approximately). One can show that as $k \to \infty$, the lower bound on $\delta$ approaches $\frac{y - x}{y - 1}$, the lower bound for the grim trigger strategy.

**Each player choosing the limited punishment strategy is not a SPNE.**

Consider the subgame following the outcome $(C, D)$ in period $t$. Suppose that player 1 adheres to the limited punishment strategy in the subgame. We show that subsequent adherence to the limited punishment strategy is not optimal for player 2. If player 2 adheres to the limited punishment strategy, then the outcome is $(D, C)$ in the first period of the subgame and $(D, D)$ in every subsequent period of the subgame up until period $t+k$. Counting from period $t$, player 2's discounted sum of payoffs through period $t+k$ is $y + 0 \cdot \delta + \delta^2 + \cdots + \delta^k$. If player 2 deviates from the limited punishment strategy and chooses $D$ in every period of the subgame, then the corresponding sum is $y + \delta + \delta^2 + \cdots + \delta^k$, which is strictly larger.

|   | $t$ | $t+1$ | $\cdots$ | $t+k$ | $t+k+1$ |
|---|---|---|---|---|---|
| Player 1 | C | D | D | D | C |
| Player 2 | D | C | D | D | D |

4. **Each player choosing the modified limited punishment strategy is a NE if both players are sufficiently patient and the punishment period is sufficiently long.** Proof omitted.

**Each player choosing the modified limited punishment strategy is a SPNE if both players are sufficiently patient and the punishment period is sufficiently long.** Proof omitted.

5. **Each player choosing the tit-for-tat strategy may or may not be a NE.**

Suppose that player 1 adheres to the tit-for-tat strategy. If player 2 uses $C$ after every history, then her discounted average payoff is $x$. If player 2 adopts a strategy that generates a different outcome path, then in at least one period her action is $D$. If player 2's best response to player 1's tit-for-tat chooses $D$ in some period, then it either alternates between $D$ and $C$ or chooses $D$ in every period (see footnote 2).

Footnote 2 (proof): Denote by $t$ the first period in which player 2 chooses $D$. Then player 1 chooses $D$ in period $t+1$ and continues to choose $D$ until player 2 reverts to $C$. Thus player 2 has two options from period $t+1$: she can revert to $C$, in which case in period $t+2$ she faces the same situation she faced at the beginning of the game, or she can continue to choose $D$, in which case player 1 will continue to do so too.

If player 2 alternates between $D$ and $C$, then her stream of payoffs is $(y, 0, y, 0, \dots)$, with the discounted average

$$(1 - \delta)\,\frac{y}{1 - \delta^2} = \frac{y}{1 + \delta}.$$

If she chooses $D$ in every period, her stream of payoffs is $(y, 1, 1, \dots)$, with the discounted average $y(1 - \delta) + \delta$. Thus tit-for-tat is a best response to tit-for-tat if and only if

$$x \ge \frac{y}{1 + \delta} \quad \text{and} \quad x \ge y(1 - \delta) + \delta, \qquad \text{or equivalently} \qquad \delta \ge \frac{y - x}{x} \quad \text{and} \quad \delta \ge \frac{y - x}{y - 1}.$$

If $y - 1 = x$, these two inequalities are identical and we have the same lower bound for $\delta$ as in the grim trigger strategy. But there are cases in which tit-for-tat is not a NE. If $y \ge 2x$, for example, then $\frac{y - x}{x} \ge 1$, so that the first inequality is not satisfied for any $\delta < 1$. Thus tit-for-tat is not a NE for any value of $\delta < 1$ in this case.

**Each player choosing the tit-for-tat strategy is not a SPNE in general.**

The behavior in a subgame of a player who uses tit-for-tat depends only on the last outcome in the history that precedes the subgame. Thus we need to consider four types of subgame, following histories in which the last outcome is $(C, C)$, $(C, D)$, $(D, C)$, and $(D, D)$.

(a) A subgame following a history ending in $(C, C)$. This case is already covered by our analysis of Nash equilibrium. Tit-for-tat is a best response to tit-for-tat in such a subgame if and only if $\delta \ge \frac{y - x}{x}$ and $\delta \ge \frac{y - x}{y - 1}$.

(b) A subgame following a history ending in $(C, D)$. Suppose player 2 adheres to tit-for-tat. If player 1 also adheres to tit-for-tat, then the outcome alternates between $(D, C)$ and $(C, D)$, yielding player 1 a discounted average payoff in the subgame of

$$(1 - \delta)(y + \delta^2 y + \delta^4 y + \cdots) = \frac{y}{1 + \delta}.$$

If player 1 instead chooses $C$ in the first period of the subgame and subsequently adheres to tit-for-tat, then the outcome is $(C, C)$ in every period of the subgame, yielding player 1 a discounted average payoff of $x$. Thus player 1's tit-for-tat is optimal against player 2's tit-for-tat if and only if $\frac{y}{1 + \delta} \ge x$, or equivalently $\delta \le \frac{y - x}{x}$.

Now suppose that player 1 adheres to tit-for-tat. If player 2 also adheres to tit-for-tat, the outcome alternates between $(D, C)$ and $(C, D)$, yielding player 2 a discounted average payoff in the subgame of

$$(1 - \delta)(\delta y + \delta^3 y + \delta^5 y + \cdots) = \frac{\delta y}{1 + \delta}.$$

If player 2 instead chooses $D$ in the first period of the subgame and subsequently adheres to tit-for-tat, the outcome is $(D, D)$ in every period of the subgame, yielding player 2 a discounted average payoff of $1$. Thus player 2's tit-for-tat is optimal against player 1's tit-for-tat if and only if $\frac{\delta y}{1 + \delta} \ge 1$, or equivalently $\delta \ge \frac{1}{y - 1}$.

(c) A subgame following a history ending in $(D, C)$. The argument is the same as for a history ending in $(C, D)$, except that the roles of the players are reversed. Thus we again have $\delta \le \frac{y - x}{x}$ and $\delta \ge \frac{1}{y - 1}$.

(d) A subgame following a history ending in $(D, D)$. The outcome is $(D, D)$ in every period if both players adhere to tit-for-tat, yielding each player a discounted average payoff of $1$.
If either player deviates to $C$ in the first period of the subgame and subsequently adheres to tit-for-tat, the outcome alternates between $(C, D)$ and $(D, C)$, yielding the deviant a discounted average payoff in the subgame of $\frac{\delta y}{1 + \delta}$. Thus we need $1 \ge \frac{\delta y}{1 + \delta}$, or equivalently $\delta \le \frac{1}{y - 1}$.

Thus the strategy pair (tit-for-tat, tit-for-tat) is a SPNE if and only if $\delta = \frac{y - x}{x}$ and $\delta = \frac{1}{y - 1}$, or equivalently $y - x = 1$ and $\delta = \frac{1}{x}$. This condition is very restrictive and is rarely met.

### Folk theorems

The Nash equilibria of an infinitely repeated Prisoner's Dilemma that we have discussed so far generate either the outcome $(C, C)$ in every period or the outcome $(D, D)$ in every period. The first outcome path yields the discounted average payoff $x$ to each player, whereas the second yields the discounted average payoff $1$ to each player. What other discounted average payoffs are generated by Nash equilibria?

We first note that for any outcome $(a_1, a_2)$ of the stage game, the outcome path in which $(a_1, a_2)$ occurs in every period yields the pair of discounted average payoffs $(u_1(a_1, a_2), u_2(a_1, a_2))$. Thus $(x, x)$, $(y, 0)$, $(0, y)$, and $(1, 1)$ are all feasible pairs of discounted average payoffs.

What are the feasible discounted average payoffs if the outcome path alternates among $(C, C)$, $(C, D)$, $(D, C)$, and $(D, D)$? This question is hard to answer for an arbitrary discount factor but easy to answer if the discount factor is close to 1. In that case, discounted average payoffs are close to the weighted averages of the payoffs to the outcomes, where the weight of each outcome is proportional to the number of times it occurs. Thus the set of feasible discounted average payoffs is approximately the convex hull of $(x, x)$, $(y, 0)$, $(0, y)$, and $(1, 1)$.

The discussion above determines the set of pairs of discounted average payoffs that can be achieved by some outcome path, regardless of whether the path is generated by an equilibrium. What pairs of discounted average payoffs can be achieved in Nash equilibria of a repeated game? We know that choosing $D$ in every period guarantees a player at least the payoff $1$ and that $(D, D)$ in every period is a Nash equilibrium outcome; thus $(1, 1)$ can be achieved in a Nash equilibrium, and each player's discounted average payoff in any Nash equilibrium must be at least $1$. We now claim that the set of Nash equilibrium payoffs is essentially not otherwise restricted. If the discount factor is close to 1, then every feasible pair of payoffs in which each player's payoff is greater than $1$ is close to the discounted average payoff pair of a Nash equilibrium.

*[Figure: the set of feasible payoff pairs, with the points $(x, x)$ and $(1, 1)$ marked.]*

We now summarize the result.

**Theorem (Nash folk theorem).**

1. For any $\delta \in (0, 1)$, the discounted average payoff of each player $i$ in any NE of $G(\infty)$ is at least $u_i(D, D)$.
2. There exists $\underline{\delta} \in (0, 1)$ such that for any $\delta \in (\underline{\delta}, 1)$, $G(\infty)$ has a NE in which the discounted average payoff of each player $i$ is a payoff in the feasible set that is strictly greater than $u_i(D, D)$.

We have a similar folk theorem for SPNE.

**Theorem (SPNE folk theorem, Friedman).**

1. For any $\delta \in (0, 1)$, the discounted average payoff of each player $i$ in any SPNE of $G(\infty)$ is at least $u_i(D, D)$.
2. There exists $\underline{\delta} \in (0, 1)$ such that for any $\delta \in (\underline{\delta}, 1)$, $G(\infty)$ has a SPNE in which the discounted average payoff of each player $i$ is strictly greater than $u_i(D, D)$.

Note: The folk theorems here are stated in the context of the Prisoner's Dilemma. For a general repeated game, similar folk theorems hold if we replace $u_i(D, D)$ with the minimax payoff. Similar theorems also hold if we use the limit average as the payoff function.
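
The patience bounds quoted for the limited punishment strategy can be reproduced numerically from the condition $(x - 1)\delta^{k+1} - (y - 1)\delta + y - x \le 0$. The following is a rough sketch under stated assumptions: the function names `punishment_gap` and `min_delta` are hypothetical, and the bisection relies on the fact that, for $x > 1$, the left-hand side is convex in $\delta$, positive at $\delta = 0$ and zero at $\delta = 1$.

```python
def punishment_gap(delta, x, y, k):
    """Left-hand side of (x-1)*delta^(k+1) - (y-1)*delta + (y-x) <= 0."""
    return (x - 1) * delta ** (k + 1) - (y - 1) * delta + (y - x)


def min_delta(x, y, k, tol=1e-12):
    """Smallest delta in (0, 1) satisfying the condition, or None if no
    delta < 1 does. Assumes y > x > 1 and an integer k >= 1."""
    # The gap is convex with gap(1) = 0, so it dips below zero before 1
    # only if its slope at delta = 1 is positive.
    slope_at_one = (x - 1) * (k + 1) - (y - 1)
    if slope_at_one <= 0:
        return None
    # Minimizer of the gap: the gap is negative here and positive at 0.
    hi = ((y - 1) / ((x - 1) * (k + 1))) ** (1.0 / k)
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if punishment_gap(mid, x, y, k) > 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    x, y = 2.0, 3.0  # example payoffs from the notes
    for k in (1, 2, 3, 10, 50):
        print(k, min_delta(x, y, k))
```

For $x = 2$, $y = 3$ the sketch returns no threshold for $k = 1$, and thresholds that decrease toward $\frac{y - x}{y - 1} = \frac{1}{2}$ as $k$ grows, in line with the limiting claim in the text.
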

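The tit-for-tat conditions collected in cases (a) through (d) can also be evaluated mechanically for given payoffs. This is only an illustrative sketch: the function name `tft_conditions` and the dictionary labels are hypothetical, the closed-form payoffs are the ones derived in the text, and a small tolerance `eps` is an added assumption to guard against floating-point rounding at the knife-edge value of $\delta$.

```python
def tft_conditions(x, y, delta, eps=1e-12):
    """Evaluate the conditions from cases (a)-(d) for (tit-for-tat, tit-for-tat).

    Returns a dict mapping each case to True if the condition holds.
    """
    alt_gain_first = y / (1 + delta)           # stream (y, 0, y, 0, ...)
    alt_gain_second = delta * y / (1 + delta)  # stream (0, y, 0, y, ...)
    defect_forever = y * (1 - delta) + delta   # stream (y, 1, 1, ...)
    return {
        "(a) after (C,C)": x + eps >= alt_gain_first and x + eps >= defect_forever,
        "(b),(c) punisher keeps punishing": alt_gain_first + eps >= x,
        "(b),(c) deviator accepts punishment": alt_gain_second + eps >= 1,
        "(d) after (D,D)": 1 + eps >= alt_gain_second,
    }


def tft_is_spne(x, y, delta):
    """True if every condition holds, i.e. y - x = 1 and delta = 1/x in the text."""
    return all(tft_conditions(x, y, delta).values())


if __name__ == "__main__":
    for delta in (0.4, 0.5, 0.6):
        print(delta, tft_is_spne(2.0, 3.0, delta), tft_conditions(2.0, 3.0, delta))
```

With $x = 2$ and $y = 3$, so that $y - x = 1$, the conditions hold together only at $\delta = \frac{1}{x} = \frac{1}{2}$, which shows how narrow the subgame perfect case is.
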