Prisoner's Dilemma (Stanford Encyclopedia of Philosophy) Cite this entry Search the SEP • Advanced Search • Tools • RSS FeedTable of Contents• What's New• Archives• Projected ContentsEditorial Information• About the SEP• Editorial Board• How to Cite the SEP• Special CharactersSupport the SEPContact the SEP ©Metaphysics Research Lab,CSLI,Stanford University Open access to the SEP is made possible by a world-wide funding initiative. Please Read How You Can Help Keep the Encyclopedia FreePrisoner's DilemmaFirst published Thu Sep 4, 1997; substantive revision Mon Oct 22, 2007Tanya and Cinque have been arrested for robbing the Hibernia SavingsBank and placed in separate isolation cells. Both care much more abouttheir personal freedom than about the welfare of their accomplice. Aclever prosecutor makes the following offer to each. “You maychoose to confess or remain silent. If you confess and your accompliceremains silent I will drop all charges against you and use yourtestimony to ensure that your accomplice does serious time. Likewise,if your accomplice confesses while you remain silent, they will gofree while you do the time. If you both confess I get two convictions,but I'll see to it that you both get early parole. If you both remainsilent, I'll have to settle for token sentences on firearms possessioncharges. If you wish to confess, you must leave a note with thejailer before my return tomorrow morning.”The “dilemma” faced by the prisoners here is that,whatever the other does, each is better off confessing than remainingsilent. But the outcome obtained when both confess is worse for eachthan the outcome they would have obtained had both remained silent. Acommon view is that the puzzle illustrates a conflict betweenindividual and group rationality. A group whose members pursuerational self-interest may all end up worse off than a group whosemembers act contrary to rational self-interest. More generally, if thepayoffs are not assumed to represent self-interest, a group whosemembers rationally pursue any goals may all meet less success than ifthey had not rationally pursued their goals individually. A closelyrelated view is that the prisoner's dilemma game and its multi-playergeneralizations model familiar situations in which it is difficult toget rational, selfish agents to cooperate for their common good. Muchof the contemporary literature has focused on identifying conditionsunder which players would or should make the “cooperative”move corresponding to remaining silent. A slightly differentinterpretation takes the game to represent a choice between selfishbehavior and socially desirable altruism. The move corresponding toconfession benefits the actor, no matter what the other does, whilethe move corresponding to silence benefits the other player no matterwhat that player does. Benefiting oneself is not always wrong, ofcourse, and benefiting others at the expense of oneself is not alwaysmorally required, but in the prisoner's dilemma game both playersprefer the outcome with the altruistic moves to that with the selfishmoves. This observation has led David Gauthier and others to take thePrisoner's Dilemma to say something important about the nature ofmorality.Puzzles with the structure of the prisoner's dilemma were devised anddiscussed by Merrill Flood and Melvin Dresher in 1950, as part of theRand Corporation's investigations into game theory (which Rand pursuedbecause of possible applications to global nuclear strategy). Thetitle “prisoner's dilemma” and the version with prisonsentences as payoffs are due to Albert Tucker, who wanted to makeFlood and Dresher's ideas more accessible to an audience of Stanfordpsychologists. Although Flood and Dresher didn't themselves rush topublicize their ideas in external journal articles, the puzzleattracted widespread attention in a variety of disciplines. ChristianDonninger reports that “more than a thousand articles”about it were published in the sixties and seventies. A bibliography(Axelrod and D'Ambrosio) of writings between 1988 and 1994 thatpertain to Robert Axelrod's research on the subject lists 209entries. Since then the flow has shown no signs of abating.The sections below provide a variety of more precise characterizationsof the prisoner's dilemma, beginning with the narrowest and surveysome connections with similar games and some applications inphilosophy and elsewhere. Particular attention is paid to“evolutionary” versions of the game in which members of apopulation play one another repeatedly and those who get higherpayoffs “reproduce” more rapidly than those who get lowerpayoffs. ‘Prisoner's dilemma’ is abbreviated as‘PD’.1. Symmetric 2×2 PD With Ordinal Payoffs2. Asymmetry3. Multiple Moves4. Multiple Players5. Single Person Interpretations6. Cardinal Payoffs7. The PD with Replicas and Causal Decision Theory8. The Stag Hunt and the PD9. Asynchronous Moves10. Transparency11. Finite Iteration12. The Centipede and the Finite IPD13. Infinite Iteration14. Indefinite Iteration15. Iteration With Error16. Evolution17. Spatial PDs18. PDs and Social Networks19. Group Selection and the Haystack PDBibliographyOther Internet ResourcesRelated Entries1. Symmetric 2×2 PD With Ordinal PayoffsIn its simplest form the PD is a game described by the payoff matrix: CDCR, RS, TDT, SP, Psatisfying the following chain of inequalities: PD1)T>R>P>SThere are two players, Row and Column. Each has two possible moves,“cooperate” or “defect,” corresponding,respectively, to the options of remaining silent or confessing in theillustrative anecdote above. For each possible pair of moves, thepayoffs to Row and Column (in that order) are listed in theappropriate cell. R is the “reward” payoff thateach player receives if both cooperate. P is the“punishment” that each receives if both defect. Tis the “temptation” that each receives if he alone defectsand S is the “sucker” payoff that he receives ifhe alone cooperates. We assume here that the game is symmetric, i.e.,that the reward, punishment, temptation or sucker payoff is the samefor each player, and payoffs have only ordinal significance, i.e.,they indicate whether one payoff is better than another, but tell usnothing about how much better. It is now easy to see that we have thestructure of a dilemma like the one in the story. Suppose Columncooperates. Then Row gets R for cooperating and Tfor defecting, and so is better off defecting. Suppose Columndefects. Then Row gets S for cooperating and P fordefecting, and so is again better off defecting. The moveD for Row is said to strictly dominate themove C: whatever his opponent does, he is better offchoosing D than C. By symmetryD also strictly dominates C forColumn. Thus two “rational” players will defect andreceive a payoff of P, while two “irrational”players can cooperate and receive greater payoff R. Instandard treatments, game theory assumes rationality and commonknowledge. Each player is rational, knows the other is rational, knowsthat the other knows he is rational, etc. Each player also knows howthe other values the outcomes. But since D strictlydominates C for both players, the argument fordilemma here requires only that each player knows his own payoffs.(The argument remains valid, of course, under the stronger standardassumptions.) It is also worth noting that the outcome(D, D) of both players defecting isthe game's only strong nash equilibrium, i.e., it is the only outcomefrom which each player could only do worse by unilaterally changingits move. Flood and Dresher's interest in their dilemma seems to havestemmed from their view that it provided a counterexample to the claimthat the nash equilibria of a game constitute its natural“solutions”.If there can be “ties” in rankings of the payoffs,condition PD1 can be weakened without destroying the nature of thedilemma. For suppose that one of the following conditions obtains:PD2)T>R>P≥S, orT≥R>P>SThen, for each player, although D does not strictlydominate C, it still weakly dominates in thesense that each player always does at least as well, and sometimesbetter, by playing C. Under these conditions it stillseems rational to play D, which again results in thepayoff that neither player prefers. Let us call a game that meets PD2a weak PD. Note that in a weak PD that does not satisfy PD1mutual defection is no longer a nash equilibrium in the strong sensedefined above. It is still, however, the only nash equilibrium in theweaker sense, that neither player can improve its position byunilaterally changing its move. Again, one might suppose that if thereis a unique nash equilibrium of this weaker variety, rationalself-interested players would reach it.2. AsymmetryWithout assuming symmetry, the PD can be represented by usingsubscripts r and c for the payoffs to Row and Column. CDCRr, RcSr, TcDTr, ScPr, PcIf we assume that the payoffs are ordered as before for each player,i.e., thatTi>Ri>Pi>Siwhen i=r, c, then, as before,D is the strictly dominant move for both players, butthe outcome (D, D) of both playersmaking this move is worse for each than (C,C). The force of the dilemma can now also be feltunder weaker conditions, however. Consider the following three pairsof inequalities:PD3)a.Tr>Rr andPr>Srb.Tc>Rc andPc>Scc.Rr>Pr andRc>PcIf these conditions all obtain the argument for dilemma goes throughas before. Defection strictly dominates cooperation for each player,and (C, C) is strictly preferred byeach to (D, D). If one of the two> signs in each of the conditions a - c is replaced by a weakinequality sign ≥ we have a weak PD. D weaklydominates C for each player (i.e., Dis as good as C in all cases and better in some) and(C, C) weakly better than(D, D) (i.e., it is at least as goodfor both players and better for one). Since none of the clausesrequires comparisons between r's payoffs and c's, weneed not assume that > has any “interpersonal”significance.Now suppose we drop the first inequality of either a orb (but not both). A game that meets the resulting conditionsmight be termed a common knowledge PD. As long as each playerknows that the other is rational and each knows the other's orderingof payoffs, we still feel the force of the dilemma. For suppose aholds. Then D is the dominant move for Row. Column,knowing that Row is rational, knows that Row will defect, and so, bythe remaining inequality in b, will defecthimself. Similarly, if b holds Column will defect, and Row,realizing this, will defect herself. By c, the resulting(D, D) is again worse for both than(C, C).3. Multiple MovesSpeaking generally, one might say that a PD is a game in which a“cooperative” outcome obtainable only when every playerviolates rational self-interest is unanimously preferred to the“selfish” outcome obtained when every player adheres torational self-interest. We can characterize the selfish outcomeeither as the result of each player pursuing its dominant (stronglydominant) strategy, or as the unique weak (strong) nashequilibrium. In a two move game the two characterizations come to thesame thing — a dominant move pair is a unique equilibrium and aunique equilibrium is a dominant move pair. As the payoff matrix belowshows, however, the two notions diverge in a game with more than twomoves.CDNCR, RS, TT, SDT, SP, PR, SNS, TS, RS, SHere each player can choose “cooperate”,“defect” or “neither” and the payoffs areordered as before. Defection is no longer dominant, because eachplayer is better off choosing C thanD when the other chooses N. Nevertheless(D, D) is still the uniqueequilibrium. Let us label a game like this in which the selfishoutcome is the unique equilibrium an equilibrium PD, and onein which the selfish outcome is a pair of dominant moves adominance PD. As will be seen below, attempts to“solve” the PD by allowing conditional strategies cancreate multiple-move games that are themselves equilibrium PDs.4. Multiple PlayersMost of those who maintain that the PD illustrates something importantabout morality seem to believe that the basic structure of the game isreflected in situations that larger groups, perhaps entire societies,face. The most obvious generalization from the two-player to themany-player game would pay each player R if all cooperate,P if all defect, and, if some cooperate and some defect, itwould pay the cooperators S and the defectors T. Butit is unlikely that we face many situations of this structure.A common view is that a multi-player PD structure is reflected in whatGarret Hardin popularized as “the tragedy of the commons.”Each member of a group of neighboring farmers prefers to allow his cowto graze on the commons, rather than keeping it on his own inadequateland, but the commons will be rendered unsuitable for grazing if it isused by more than some threshold number use it. More generally, thereis some social benefit B that each member can achieve ifsufficiently many pay a cost C. We might represent the payoffmatrix as follows:more than n choose Cn or fewer choose CCC+BCDB0The cost C is assumed to be a negative number. The“temptation” here is to get the benefit without the cost,the reward is the benefit with the cost, the punishment is to getneither and the sucker payoff is to pay the cost without realizing thebenefit. So the payoffs are ordered B >(B+C) > 0 > C. As in the two-playergame, it appears that D strongly dominatesC for all players, and so rational players wouldchoose D and achieve 0, while preferring thateveryone would choose C and obtainC+B.Unlike the more straightforward generalization, this matrix doesreflect common social choices — between depleting and conservinga scarce resource, between using polluting and non-polluting means ofmanufacture or disposal, and between participating and notparticipating in a group effort towards some common goal. Whenn is small, it represents a version of what has been calledthe “volunteer dilemma”. A group needs a few volunteers,but each member is better off if others volunteer. (Notice, however,that in a true volunteer dilemma, where only one volunteer is needed,n is zero and the top right outcome is impossible. Underthese conditions D no longer dominatesC and the game loses its PD flavor.) A particularlyvexing manifestation of this game occurs when a vaccination known tohave serious risks is needed to prevent the outbreak of a fataldisease. If enough of her neighbors get the vaccine, each person maybe protected without assuming the risks.The tragedy of the commons game diagramed above has a somewhatdifferent character than the two-player PD. First, even if eachplayer's moves are entirely independent of the others, thealternatives represented by the columns in the commons matrix aboveare no longer independent of the alternatives represented by therows. My choosing C necessarily increases the chancesthat more than n people will choose C. Toensure independence we should really redraw the matrix as follows:more than n others chooseCn others choose Cfewer than n others chooseCCC+BC+BCDB00But now we see that move D does not dominateC. When we are at the threshold of adequatecooperation, where exactly n others choose C, I ambetter off cooperating. Provided that n is large, however, itwould seem that this effect could be ignored and we could assume, forpractical purposes, that the payoff matrix is like the previous one. Similarly, whereas we saw in the original PD that mutual defection wasthe only nash equilibrium, this game has two equilibria. One isuniversal defection, since any player unilaterally departing from thatoutcome will move from payoff 0 to C. But a second is thestate of minimally effective cooperation, where the number ofcooperators just exceeds the threshold. A defector who unilaterallydeparts from that outcome will move from B toB+C and a cooperator who unilaterally departs willmove from B+C to 0. This might suggest that thetragedy of the commons is less tragic than the PD, but in real lifesituations, it would seem unlikely that the participants would knowwhen they are at the equilibrium point of minimally effectivecooperation.Furthermore, in the ordinary PD, universal cooperation is apareto-optimal outcome, i.e., there is no outcome in which each playeris at least as well off and some are better off. But in the commonsgame the only pareto optimal outcomes are those of minimally effectivecooperation. Whether universal cooperation is nevertheless desirablemay depend on the nature of the choices involved. In the medicalexample it may seem best to vaccinate everyone. In the agriculturalexample, however it seems foolish to stipulate that nobody use thecommons. Someone who avoids vaccination in the former case is seen asa “free rider”. An underused commons in the latter seemsto exemplify “surplus cooperation.”The two-person version of the tragedy of the commons game (withthreshold of one) produces a matrix presenting considerably less of adilemma.CDCB+C, B+CC,0D0, C0,0This game captures David Hume's example of a boat with one oarsman onthe port side and another on the starboard (provided we assume thatHume's oarsmen must make their choices between rest and exertionsimultaneously). Mutual cooperation is identical to minimally effectivecooperation and therefore is both an equilibrium outcome and a paretooptimal outcome. Games of this sort will be discussed below under thelabel “Stag Hunt.” The above representations of the tragedy of the commons make thesimplifying assumptions that the costs and benefits of cooperation arethe same for each player, that the cost of cooperation is independentof the number of players who cooperate, and that the size of thebenefit (0 or B) depends only on whether the number ofcooperators exceeds the threshold. A somewhat more general accountwould replace C and B by functionsC(i, j) and B(i,j) representing the cost of cooperation to player iwhen he is one of exactly j players who cooperate and thebenefit to player i when exactly j playerscooperate. We suppose that there is some threshold t forminimally effective cooperation so that B(i,j) is not defined unless j>t. We mayalso assume additional cooperation never reduces the benefiti gets from effective cooperation, i.e.,B(i, j+1)≥B(i,j) when j>t and that additionaldefection never reduces the cost i bears in cooperating,i.e., C(i, j+1)≥C(i,j). Now suppose, in addition, that, once the threshold ofeffective cooperation has been exceeded, any benefit one gets fromfrom the presence of an additional cooperator is exceeded by one'scost of cooperation and that the costs of ineffective cooperation aregenuine, i.e., for all players i, B(i,j)>(B(i,j+1)+C(i, j+1)) when j isgreater than t and 0>C(i, j)when j is less than or equal to t. Finally, supposethat the benefits to each player i, of effective cooperationexceed the costs, i.e., for j>t,B(i, j)+C(i,j)>0 We then have a tragedy of the commons game, whichpresents a familiar dilemma: defection benefits an individual in everycircumstance (except the one where exactly t otherscooperate) but everybody is better off in any state of effectivecooperation than in any state without it. This account could beeasily be modified to allow threshold of minimally effectivecooperation to differ from one individual to another (i'sclean water requirements might be more stringent than j's forexample) or to allow B to be defined everywhere (thuseliminating the threshold, so that we always benefit from another'scooperation). The resulting game would still have its PD flavor.Phillip Pettit has pointed out that examples that might be representedas many-player PDs come in two flavors. The examples discussed abovemight be classified as free-rider problems. My temptation is to enjoysome benefits brought about by burdens shouldered by others. The otherflavor is what Pettit calls “foul dealer” problems. Mytemptation is to benefit myself by hurting others. Suppose, forexample, that a group of people are applying for a single job, forwhich they are equally qualified. If all fill out their applicationshonestly, they all have an equal chance of being hired. If one lies,however, he can ensure that he is hired while, let us say, incurring asmall risk of being exposed later. If everyone lies, they again havean equal chance for the job, but now they all incur the risk ofexposure. Thus a lone liar, by reducing the others' chances ofemployment from slim to none, raises his own chances from slim tosure. As Pettit points out, when the minimally effective level ofcooperation is the same as the size of the population, there is noopportunity for free-riding (everyone's cooperation is needed), and sothe PD must be of the foul-dealing variety. But (Pettit's contraryclaim notwithstanding) not all foul-dealing PDs seem to have thisfeature. Suppose, for example, that two applicants in the story abovewill be hired. Then everyone gets the benefit (a chance of employmentwithout risk of exposure) unless two or more playerslie. Nevertheless, the liars seem to be foul dealers rather than freeriders. A better characterization of the foul-dealing dilemma might bethat every defection from a generally cooperative state strictlyreduces the payoffs to the cooperators, i.e., for every playeri and every j greater than the threshold, eitherB(i, j+1)+ C(i,j+1)>B(i,j)+C(i, j). A free-rider'sdefection benefits himself but does not, by itself, hurt thecooperators. A foul-dealer's defection benefits himself and hurts thecooperators.The game labeled a many-person PD in Schelling, Molander 1992 andelsewhere requires that the payoff to each co-operator and defectorincreases strictly with the number of cooperators and that the sum ofthe payoffs to all parties increases with the number of cooperators(so that one party's switching from defection to cooperation alwaysraises the sum). Neither of these conditions is met by the formulationand the examples discussed above. They may, however, hold“locally,” i.e., for j close to the thresholdt for minimally effective cooperation, it may be reasonableto assume that:for every individual i, B(i, j+1)+C(i, j+1)>B(i, j)+C(i, j) forj>t,for every individual i, C(i, j+1)>C(i, j) for j≤t, andB(1, j+1)+C(1, j+1)+…+B(j+1, j+1)+C(t+1, j+1)+B(j+2, j+1))+…+B(n, j+1)> B(1, j)+C(1, j)+…+B(j, j)+C(j, j)+B(j+1, j)+…+B(n, j).By requiring that cooperation of others always benefits each and all,the Schelling and Molander formulations of the n-person PDfail to model the surplus cooperation/free rider phenomenon that seemsto infuse “Tragedy of the Commons” situations. Theirconditions might, however be a plausible model for certain publicgood dilemmas. It is not unreasonable to suppose thatany contribution towards public health, national defense,highway safety, or clean air is valuable to everybody, no matter howlittle or how much we already have, but that the cost to eachindividual for his own contribution to those goods always exceeds thebenefit that he derives from that contribution. This outlook has theadvantage of focusing attention on the PD quality of thegame. Defection dominates cooperation, while universal cooperation isunanimously preferred to universal defection. Michael Taylor goeseven further in this direction. His version of the many-person PDrequires only the two PD-conditions just mentioned and the oneadditional condition that defectors are always better off when somecooperate than when none do. (Taylor's main concern is with theiterated version of this game, a topic that will be addressed infuture editions of this entry.)5. Single Person InterpretationsThe PD is usually thought to illustrate conflict between individualand collective rationality, but the multiple player form (or somethingvery similar) has also been interpreted as demonstrating problemswithin standard conceptions of individual rationality. One suchinterpretation, elucidated in Quinn, derives from an example ofParfit's. A medical device enables electric current to be applied to apatient's body in increments so tiny that there is no perceivabledifference between adjacent settings. You are attached to the deviceand given the following choice every day for ten years: advance thedevice one setting and collect a thousand dollars, or leave it whereit is and get nothing. Since there is no perceivable differencebetween adjacent settings, it is apparently rational to advance thesetting each day. But at the end of ten years the pain is so greatthat a rational person would sacrifice all his wealth to return to thefirst setting.We can view the situation here as a multi-player PD in which each“player” is the temporal stage of a single person. Soviewed, it has at least two features that were not discussed inconnection with the multi-player examples. First, the moves of theplayers are sequential rather than simultaneous (and each player hasknowledge of preceding moves). Second, there is the matter ofgradation. Increases in electric current between adjacent settings areimperceptible, and therefore irrelevant to rational decision-making,but sums of a number such increases are noticeable and highlyrelevant. Neither of these features, however, is peculiar toone-person examples. Consider, for example, the choice between apolluting and non-polluting means of waste disposal. Each resident ofa lakeside community may dump his or her garbage in the lake or use aless convenient landfill. It is reasonable to suppose that each actsin the knowledge of how others have acted before. (See“Asynchronous Moves” below.) It is also reasonable tosuppose that addition of one can of garbage to the lake has noperceptible effect on water quality, and therefore no effect on thewelfare of the residents. The fact that the dilemma remains suggeststhat PD-like situations sometimes involve something more than aconflict between individual and collective rationality. In theone-person example, our understanding that we care more about ouroverall well-being than that of our temporal stages does not (byitself) eliminate the argument that it is rational to continue toadjust the setting. Similarly, in the pollution example, a decision tolet collective rationality override individual rationality may noteliminate the argument for excessive dumping. It seems appropriate,however, to separate this issue from that raised in the standard PD.Gradations that are imperceptible individually, but weighty en massegive rise to intransitive preferences. This is a challenge to standardaccounts of rationality whether or not it arises in a PD-likesetting.A second one-person interpretation of the PD is suggested in Kavka,1991. On Kavka's interpretation, the prisoners are not temporalstages, but rather “subagents” reflecting differentdesiderata that I might bring to bear on a decision. Let us imaginethat I am hungry and considering buying a snack. The options open tome are:Buy a scoop of chocolate gelato.Buy a scoop of orange sherbet.Buy a granola bar.Buy nothing.My health-conscious side, “Arnold,” orders these optionsin the following order: c, b, d,a. My taste-conscious side, “Eppie,” ranks them:a, b, d, c. Such inner conflictamong preferences might often be resolved in ways consistent withstandard views about individual choice. My overall preferenceordering, for example, might be determined from a weighted average ofthe utilities that Arnold and Eppie assign to each of the options. Itis also possible, Kavka suggests, that my inner conflicts are resolvedas if they were a result of strategic interaction among rationalsubagents. In this case, Arnold and Eppie can each choose either toinsist on getting their way (I) or toacquiesce to a compromise (A). Theinteraction between subagents can then be represented by the followingpayoff matrix, where Arnold plays row and Eppie plays column.AIAbaIcdExamination of the table and preference orderings confirms that weagain have an intrapersonal PD. Kavka argues that a story like thismight “provide a psychologically plausible picture of howinternal conflict can lead to suboptimal action.” It alsoundermines a standard view that choices reflect values in favor of onethat they partially reflect, “the structure of innerconflict”.6. Cardinal PayoffsIf the game specifies absolute (as opposed to relative) payoffs, thenuniversal cooperation may not be a pareto optimal outcome even in thetwo person PD. For under some conditions both players do better byadopting a mixed strategy of cooperating with probabilityp and defecting with probability (1-p). This pointis illustrated in the graphs below. Figure 1Here the x and y axes represent the utilities of Rowand Column. The four outcomes entered in the matrix of the secondsection are represented by the labeled dots. Conditions PD3a and PD3bensure that (C, D) and(D, C) lie northwest and southeastof (D, D), and PD3c is reflected inthe fact that (C, C) lies northeastof (D, D). Suppose first that(D, D) and (C,C) lie on opposite sides of the line between(C, D) and (D,C), as in the graph on the left. Then the fourpoints form a convex quadrilateral, and the payoffs of the feasibleoutcomes of mixed strategies are represented by all the points on orwithin this quadrilateral. Of course a player can really only get oneof four possible payoffs each time the game is played, but the pointsin the quadrilateral represent the expected values of thepayoffs to the two players. If Row and Column cooperate withprobabilities p and q (and defect with probabilitiesp*=1-p and q*=1-q), for example,then the expected value of the payoff to Row isp*qT+pqR+p*q*P+pq*S. Arational self-interested player, according to a standard view, shouldprefer a higher expected payoff to a lower one. In the graph on theleft the payoff for universal cooperation (with probability one) ispareto optimal among the payoffs for all mixed strategies. In thegraph on the right, however, where both (D,D) and (C, C) liesouthwest of the line between (C, D)and (D, C), the story is morecomplicated. Here the payoffs of the feasible outcome lie within afigure bounded on the northeast by three distinct curve segments, twolinear and one concave. Notice that (C,C) is now in the interior of the region bounded bysolid lines, indicating that there are mixed strategies that provideboth players a higher expected payoff than (C,C). It is important to note that we are talking aboutindependent mixed strategies here. Row and Column use privaterandomizing devices and have no communication. If they were able tocorrelate their mixed strategies, so as to ensure, say(C, D) with probability pand (D, C) with probabilityp*, the set of feasible solutions would extend up to (andinclude) the dotted line between (C,D) and (D, C). Thepoint here is that, even confined to independent strategies, there aresome games satisfying PD3 in which both players can both do betterthan they do with universal cooperation. A PD in which universalcooperation is pareto optimal may be called a pure PD. (Thisphenomenon is identified in Kuhn and Moresi and applied to moralphilosophy in Kuhn 1996.) A pure PD is characterized by adding to PD3the following condition.P)(Tr-Rr)(Tc-Rc)≤(Rr-Sr)(Rc-Sc)In a symmetric game P reduces to the simpler condition RCA)R≥1/2(T+S)(named after the authors Rapoport, Chammah and Axelrod who employedit). 7. The PD with Replicas and Causal Decision TheoryOne controversial argument that it is rational to cooperate in a PDrelies on the observation that my partner in crime is likely to thinkand act very much like I do. (See, for example, Davis 1977 and 1985for a sympathetic presentation of one such argument and Binmore1994,Chapters 3.4 and 3.5, for a reformulation and extended rebuttal.)In the extreme case, my accomplice is an exact replica of mewho is wired just as I am so that, of necessity, we do the same thing.It would then seem that the only two possible outcomes are where bothplayers cooperate and where both players defect. Since the rewardpayoff exceeds the punishment payoff, I should cooperate. Moregenerally, even if my accomplice is not a perfect replica, the odds ofhis cooperating are greater if I cooperate and the odds of hisdefecting are greater if I defect. When the correlation between ourbehaviors is sufficiently strong or the differences in payoffs issufficiently great, my expected payoff (as that term isusually understood) is higher if I cooperate than if I defect. Thecounter argument, of course, is that my action is causallyindependent of my replica's. Since I can't affect what myaccomplice does and since, whatever he does, my payoff is greater if Idefect, I should defect. These arguments closely resemble thearguments for two positions on the Newcomb Problem, a puzzlepopularized among philosophers in Nozick. (The extent of theresemblance is made apparent in Lewis.) The Newcomb Problem asks us toconsider two boxes, one transparent and one opaque. In the transparentbox we can see a thousand dollars. The opaque box may contain either amillion dollars or nothing. We have two choices: take the contents ofthe opaque box or take the contents of both boxes. We know beforechoosing that a reliable predictor of our behavior has put a milliondollars in the opaque box if he predicted we would take the firstchoice and left it empty if he predicted we would take the second. Tosee that each player in a PD faces a Newcomb problem, consider thefollowing payoff matrix.CDCm, m0, m+tDm+t,0t, tBy “cooperating” (choosing the opaque box), each playerensures that the other gets a million dollars (and a thousand extrafor defecting). By “defecting” (choosing both boxes) eachplayer ensures that he will get thousand dollars himself (and amillion more if the other cooperates). As long asm>t>0, the structure of this game is anordinary two-player, two-move PD (and any such PD can be representedin this form). Furthermore, the arguments for “one-boxing”and “two-boxing” in a Newcomb problem are the same as thearguments for cooperating and defecting in a Prisoner's dilemma wherethere is positive correlation between the moves of the players. Twoboxing is a dominant strategy: two boxes are better than onewhether the first one is full or empty. On the other hand, if thepredictor is reliable, the expected payoff for one-boxing isgreater than the expected payoff for two-boxing. [See Hurley, however,for an argument that the two puzzles are significantly different. In aPD (of either the ordinary or the Newcombized variety) each playerknows that the other is rational and that the other ranks the outcomesin the ways described. This, Hurley argues, opens a possibility forcooperative joint action that is absent in the original Newcombproblem.]The intuition that two-boxing is the rational choice in a Newcombproblem, or that defection is the rational choice in the PD withpositive correlation between the players' moves, seems to conflictwith the idea that rationality requires maximizing expectation. Thisapparent conflict has led some to suggest that standard decisiontheory needs to be refined in cases in which an agent's actionsprovide evidence for, without causing, the contextin which he is acting. In the case of the PD, standard (evidential)decision theory asks Player One to compare his expected utilities ofcooperation and defection, which can be written asp(C2|C1)×R+p(D2|C1)×Sandp(C2|D1)×T+p(D2|D1)×P(where, for example,p(C2|C1)is the conditional probability that player Two cooperates given thatPlayer One cooperates). If the players' moves are strongly correlatedthenp(C2|C1)andp(D2|D1)willbe close to one andp(C2|D1)andp(D2|C1)will be close to zero. On the suggested revision, these conditionalprobabilities should be replaced by some kind of causally conditionalprobabilities, which might (on some accounts) be expressed by phraseslike “the probability that if One were to cooperate, Two wouldalso cooperate.” When the moves are causality independent thiswould just be the probability that Two cooperates.The rather far-fetched scenario described in Newcomb's Probleminitially led some to doubt the importance of the distinction betweencausal and evidential decision theory. Lewis argues that the link tothe PD suggests that situations where the two decisions diverge arenot so unusual, and recent writings on causal decision theory containmany examples far less bizarre than Newcomb's problem. (See Joyce, forexample.)It might be noted that what is here called “PD betweenreplicas” is usually called “PD with twins” in theliterature. One reason for the change in nomenclature is todistinguish these ideas from an experimental literature reporting onPD games played with real (identical or fraternal) twins. (See, forexample, Segal and Hershberger.) It turns out that twins aremore likely to cooperate in a PD than strangers, but there seems to beno suggestion that the reasoning that leads them to do so follows thecontroversial arguments presented above.8. The Stag Hunt and the PD The idea mentioned in the introduction that the PD models a problemof cooperation among rational agents is sometimes criticized because,in a true PD, the cooperative outcome is not a nash equilibrium. Any“problem” of this nature, the critics contend, would be anunsolvable one. (See for example, Sugden or Binmore 2005, Chapter4.5.) By changing the payoff structure of the PD slightly, so that thereward payoff exceeds the temptation payoff, we obtain a game wheremutual cooperation, as well as mutual defection, is a nashequilibrium. This game is known as the Stag Hunt. It might provide abetter model for situations where cooperation is difficult, but stillpossible, and it may also be a better fit for other roles sometimesassigned to the PD. More specifically, a Stag Hunt is a two player,two move game with a payoff matrix like that for the PD pictured abovewhere the conditions PD1 are replaced: SH)a.R>Tb.R>Pc.P>S.The fable dramatizing the game and providing its name, gleaned from apassage in Rousseau's Discourse on Inequality, concerns ahunting expedition rather than a jail cell interrogation. Two huntersare are looking to bag a stag. Success is uncertain and, if it comes,require the efforts of both. On the other hand, either hunter canforsake his partner and catch a hare with a good chance of success. Atypical payoff matrix is shown below.CDC4,40,3D3,03,3Here the “cooperative” move is hunting stag with one'spartner and “defection” is hunting hare by oneself. The“temptation” payoff in a Stag Hunt is no longer much of atemptation, but we retain the payoff terminology for ease ofexposition. In this case the temptation and punishment penalties areidentical, perhaps reflecting the fact that my partner's actions haveno effect on my hunt for hare. Alternatively we could have temptationexceeding punishment, perhaps because hunting hare is more rewardingtogether than alone (though still less rewarding, of course, thanhunting stag together), or we could have punishment exceedingtemptation, perhaps because a second hare hunter represents unhelpfulcompetition. Either way, the essence of the Stag Hunt remains. Thereare two equilibria, one unanimously preferred to the other. The StagHunt becomes a “dilemma” when rationality dictates thatboth players choose the action leading to the inferior equilibrium. Itis clear that if I am certain that my partner will hunt stag I shouldjoin him and that if I am certain that he will hunt hare I should hunthare as well. For this reason games with this structure are sometimescalled games of “assurance” or “trust.” If Ido not know what my partner will do, standard decision theory tells meto maximize expectation. This requires, however, that I estimate theprobability of my partner playing C orD. If I lack information to form any such estimates,then one putative principle of rationality(“indifference”) suggests that I ought to treat alloptions as equally likely. By this criterion I ought to hunt hare ifand only if the following condition is met:SHD) T+P>R+SWhen SHD obtains, hare hunting is said to be the “risk-dominant”equilibrium. Let us call a Stag Hunt game where this condition is met aStag Hunt Dilemma. The matrix above provides one example.Another proposed principle of rationality (“maximin”)suggests that I ought to consider the worst payoff I could obtainunder any course of action, and choose that action that maximizes thisvalue. Since the sucker payoff is the worst payoff in a Stag Hunt,this principle suggests that any Stag Hunt presents adilemma. Maximin, however, makes more sense as a principle ofrationality for zero sum games, where it can be assumed that arational opponent is trying to minimize my score, than for games likeStag Hunt, where a rational opponent may be quite happy to see me dowell, as long as he does so as well.The Stag Hunt can be generalized in the obvious way to accommodateasymmetric and cardinal payoffs. The quadrilateral formed by thegames' graphical representation is convex, so the pure/impuredistinction no longer applies. (I.e., in a Stag Hunt no mixedstrategies are ever preferred to mutual cooperation.) The most obviousway to generalize the game to many players would retain the conditionthat there be exactly two equilibria, one unanimously preferred to theother. This might be a good model for cooperative activity in whichsuccess requires full cooperation. Imagine, for example, that a singlepolluter would spoil a lake, or a single leak would thwart aninvestigation. If many agents are involved and, by appeal toindifference or for other reasons, we estimate a fifty-fifty chance ofcooperation from each, then these examples would represent Stag HuntDilemmas in an extreme form. Everyone would benefit if all cooperate,but only a very trusting fool would think it rational to cooperatehimself. Perhaps some broader generalization to the many-person casewould also be useful, but that matter will not be taken up here.The cooperative outcome in the Stag Hunt can be assured by many of thesame means as are discussed here for the PD. As might be expected,cooperation is somewhat easier to come by in the two person Stag Huntthan in the two person PD. Details will not be given here, but theinterested reader may consult Skyrms 2004, which is responsible for arecent resurgence of interest in this game.9. Asynchronous MovesIt has often been argued that rational self-interested players canobtain the cooperative outcome by making their moves conditional onthe moves of the other player. Peter Danielson, for example, favors astrategy of reciprocal cooperation: if the other player wouldcooperate if you cooperate and would defect if you don't, thencooperate, but otherwise defect. Conditional strategies like this areruled out in the versions of the game described above, but they may bepossible in versions that more accurately model real world situations.In this section and the next, we consider two such versions. Here weeliminate the requirement that the two players move simultaneously.Consider the situation of a firm whose sole competitor has justlowered prices. Or suppose the buyer of a car has just paid the agreedpurchase price and the seller has not yet handed over the title. Wecan think of these as situations in which one player has to choose tocooperate or defect after the other player has already made a similarchoice. The corresponding game is an asynchronous orextended PD.Careful discussion of an asynchronous PD example, as Skyrms (1998) andVanderschraaf recently note, occurs in the writings of David Hume,well before Flood and Dresher's formulation of the ordinary PD. Humewrites about two neighboring grain farmers:Your corn is ripe today; mine will be so tomorrow.’Tis profitable for us both, that I shou'd labour with youto-day, and that you shou'd aid me to-morrow. I have no kindness foryou, and know you have as little for me. I will not, therefore, takeany pains on your account; and should I labour with you upon my ownaccount, in expectation of a return, I know I shou'd be disappointed,and that I shou'd in vain depend upon your gratitude. Here then Ileave you to labour alone: You treat me in the same manner. Theseasons change; and both of us lose our harvests for want of mutualconfidence and security.In deference to Hume, Skyrms and Vanderschraaf refer to this kind ofasynchronous PD as the “farmer's dilemma.” It isinstructive to picture it in a tree diagram. Figure 2Here, time flows to the right. The node marked by a square indicatesPlayer One's choice point, those marked by circles indicate PlayerTwo's. The moves and the payoffs to each player are exactly as in theordinary PD, but here Player Two can choose his move according to whatPlayer One does. Tree diagrams like Figure 2 are said to beextensive-form game representations, whereas the payoffmatrices given previously are normal-form representations. AsHume's analysis indicates, making the game asynchronous does notremove the dilemma. Player One knows that if he were to chooseC on the first move, Player Two would chooseD on the second move (since she prefers thetemptation to the reward), so he would himself end up with the suckerpayoff. If Player One were to choose D, Player Twowould still choose D (since she prefers thepunishment to the sucker payoff), and he would end up with thepunishment payoff. Since he prefers the punishment payoff to thesucker payoff, Player One will choose D on the firstmove and both players will end up with the punishment payoff. Thiskind of “backward” reasoning, in which the players firstevaluate what would happen on the last move if various game historieswere realized, and use this to determine what would happen onpreceding moves applies quite broadly to games in extensive form, anda more general version of it will be discussed under finiteiteration below.The farmer's dilemma can be represented in normal form byunderstanding Player One to be choosing between C andD and Player Two to be (simultaneously) choosingamong four conditional moves: cooperate unconditionally(Cu), defect unconditionally (Du),imitate Player One's move (I), and do the opposite ofPlayer One's move (O). The result is a two playergame with the following matrix.CuDuIOCR, RS, TR, RS, TDT, SP, PP, PT, SThe reader may note that this game is a (multiple-move) equilibriumdilemma. The sole (weak) nash equilibrium results when Player Onechooses D and Player Two chooses Du,thereby achieving for themselves the inferior payoffs of Pand P. The game is not, however, a dominance PD. Indeed,there is no dominant move for either player. It is commonly believedthat rational self-interested players will reach a nash equilibriumeven when neither player has a dominant move. If so, the farmer'sdilemma is still a dilemma.To preserve the symmetry between the players that characterizes theordinary PD, we may wish to modify the asynchronous game. Let us takeextended PD to be played in stages. First each player chooses a firstmove (C or D) and a second move (Cu, Du, I orO). Next a referee determines who moves first, givingeach player an equal chance. Finally the outcome is computed in theappropriate way. For example, suppose Row plays (D,O) (meaning that he will defect if he moves first anddo the opposite of his opponent if he moves second) and Column plays(C, Du). Then Row will getP if he goes first and T if he goes second, whichimplies that his expected payoff is 1/2(P+T). Columnwill get S if she goes first and P if she goessecond, giving her an expected payoff of1/2(P+S). It is straightforward, but tedious, tocalculate the entire eight by eight payoff matrix. After doing so, thereader may observe that, like the farmer's dilemma, the symmetric formof the extended PD is an equilibrium PD, but not a dominance PD. Thesole nash equilibrium occurs when both players adopt the strategy(D, Du), thereby achieving theinferior payoffs of (P, P).It may be worth noting that an asynchronous version of the Stag Hunt,unlike the PD, presents few issues of interest. If the first playerdoes his part in the hunt for stag on day one, the second should doher part on day two. If he hunts hare on day one, she should dolikewise on day two. The first player, realizing this, should huntstag on day one. So rational players should have no difficultyreaching the cooperative outcome in the asynchronous Stag Hunt.10. TransparencyAnother way that conditional moves can be introduced into the PD is byassuming that players have the property that David Gauthier haslabeled transparency. A fully transparent player is one whoseintentions are completely visible to others. Nobody holds that wehumans are fully transparent, but the observation that we can oftensuccessfully predict what others will do suggests that we are at least“translucent.” Furthermore agents of larger scale, likefirms or countries, which may have to publicly deliberate beforeacting, may be more transparent than we are. Thus there may be sometheoretical interest in investigations of PDs with transparentplayers. Such players could presumably execute conditional strategiesmore sophisticated than those of the (non- transparent) extended gameplayers, strategies, for example that are conditional on theconditional strategies employed by others. There is some difficulty,however, in determining exactly what strategies are feasible for suchplayers. Suppose Row adopted the strategy “do the same asColumn” and Column adopted the strategy “do the oppositeof Row”. There is no way that both these strategies could besatisfied. On the other hand, if each adopted the strategy“imitate the other player”, there are two ways thestrategies could be satisfied, and there is no way to determine whichof the two they would adopt. Nigel Howard, who was probably the firstto study such conditional strategies systematically, avoided thisdifficulty by insisting on a rigidly typed hierarchy of games. At thebase level we have the ordinary PD game, where each player choosesbetween C and D. For any gameG in the hierarchy we can generate two new gamesRG and CG. InRG, Column has the same moves as in game Gand Row can choose any function that assigns C orD to each of Column's possible moves. Similarly inCG, Row has the same moves as in G andColumn has a new set of conditional moves. For example, if [PD] is thebase level game, then C[PD] is the game in which Column canchoose from among the strategies Cu,Du, I and Omentioned above. Howard observed that in the two third level gamesRC[PD] and CR[PD] (and in everyhigher level game) there is an equilibrium outcome giving each playerR. In particular, such an equilibrium is reached when oneplayer plays I and the other cooperates when hisopponent plays I and defects when his opponent playsCu, Du or O. Noticethat this last strategy is tantamount to Danielson's reciprocalcooperation.The lesson of all this for rational action is not clear. Suppose twoplayers in a PD were sufficiently transparent to employ theconditional strategies of higher level games. How do they decide whatlevel game to play? Who chooses the imitation move and who choosesreciprocal cooperation? To make a move in a higher level game ispresumably to form an intention observable by the other player. Butwhy should either player expect the intention to be carried out ifthere is benefit in ignoring it?Conditional strategies have a more convincing application when we takeour inquiry as directed, not towards playing the PD, but as designingagents who would play it well with a variety of likely opponents. Thisis the viewpoint of Danielson. (See also J.V. Howard for an earlierenlightening discussion of this viewpoint.) A conditional strategy isnot an intention that a player forms as a move in a game, but adeterministic algorithm defining a kind of player. Indeed, one of thelessons of the PD may be that transparent agents are better off ifthey can form irrevocable “action protocols” rather thanalways following the intentions they may form at the time of action.Danielson does not limit himself a priori to strategieswithin Howard's hierarchy. An agent is simply a computer program,which can contain lines permitting other programs to read and executeit. We could easily write two such programs, each designed todetermine whether its opponent plays C orD and to do the opposite. What happens when these twoplay a PD depends on the details of implementation, but it is likelythat they will be “incoherent,” i.e., they will enterendless loops and be unable to make any move at all. To be successfula program should be able to move when paired with a varietyof other programs, including copies of itself, and it should be ableto get valuable outcomes. Programs implementing I andO in a straightforward way are not likely to succeedbecause when paired with each other they will be incoherent. Programsimplementing Du are not likely to succeed becausethey get only P when paired with their clones. Thoseimplementing Cu are not likely to succeed becausethey get only S when paired with programs that recognize andexploit their unconditionally cooperative nature. There is somevagueness in the criteria of success. In Howard's scheme we couldcompare a conditional strategy with all the possible alternatives ofthat level. Here, where any two programs can be paired, that approachis senseless. Nevertheless, certain programs seem to do well whenpaired with a wide variety of players. One is a version of thestrategy that Gauthier has advocated as constrainedmaximization. The idea is that a player j shouldcooperate if the other would cooperate if j did, and defectotherwise. As stated, this appears to be a strategy for theRC[PD] or CR[PD] games. It is notclear how a program implementing it would move (if indeed it doesmove) when paired with itself. Danielson is able to construct anapproximation to constrained maximization, however, that doescooperate with itself. Danielson's program (and other implementationsof constrained maximization) cannot be coherently paired witheverything. Nevertheless it does move and score well against familiarstrategies. It cooperates with Cu and itself and itdefects against Du. If it is coherently paired itseems guaranteed a payoff no worse than P.A second successful program models Danielson's reciprocalcooperation. Again, it is not clear that the strategy (asformulated above) allows it to cooperate (or make any move) withitself, but Danielson is able to construct an approximation that does.The (approximate) reciprocal cooperation does as well as(approximate) constrained maximization against itself,Du and constrained maximization. AgainstCu it does even better, getting T whereconstrained maximization got only R.11. Finite IterationMany of the situations that are alleged to have the structure of thePD, like defense appropriations of military rivals or price settingfor duopolistic firms are better modeled by an iterated version of thegame in which players play the PD repeatedly, retaining access at eachround to the results of all previous rounds. In these iterated PDs(hence forth IPDs) players who defect in one round can be“punished” by defections in subsequent rounds and thosewho cooperate can be rewarded by cooperation. Thus the appropriatestrategy for rationally self-interested players is no longerobvious. The theoretical answer to this question, it turns out,depends strongly on the definition of IPD employed and the knowledgeattributed to rational players.An IPD can be represented in extensive form by a tree diagram likethe one for the farmer's dilemma above. Figure 3Here we have an IPD of length two. The end of each of the two roundsof the game is marked by a dotted vertical line. The payoffs to eachof the two players (obtained by adding their payoffs for the tworounds) are listed at the end of each path through the tree. Therepresentation differs from the previous one in that the two nodes oneach branch within the same division mark simultaneous choices by thetwo players. Since neither player knows the move of the other at thesame round, the IPD does not qualify as one of the game theorist'sstandard “games of perfect information.” If the playersmove in succession rather than simultaneously (which we might indicateby removing the dotted vertical lines), the resulting game is aniterated farmer's dilemma, which does meet the game theorist'sdefinition and which shares many of the features that make the IPDinteresting.Like the farmer's dilemma, an IPD can, in theory, be represented innormal form by taking the players' moves to be strategiestelling them how to move if they should reach any node at the end of around of the game tree. The number of strategies increases veryrapidly with the length of the game so that it is impossible inpractice to write out the normal form for all but the shortestIPD's. Every pair of strategies determines a “play” of thegame, i.e., a path through the extensive-form tree.In a game like this, the notion of nash equilibrium loses some of itsprivileged status. Recall that a pair of moves is a nash equilibriumif each is a best reply to the other. Let us extend the notation usedin the discussion of the asynchronous PD and let Dube the strategy that calls for defection at every node of an IPD. Itis easy to see that Du and Du form anash equilibrium. But against Du, a strategy thatcalls for defection unless the other player cooperated at, say, thefifteenth node, would determine the same play (and therefore the samepayoffs) as Du itself does. The components that callfor cooperation never come into play, because the other player doesnot cooperate on the fifteenth (or any other) move. Similarly, astrategy calling for cooperation after the second cooperation byitself does equally well. Thus these strategies and many others formnash equilibria with Du. There is a sense in whichthese strategies are clearly not equally rational. Although they yieldthe same payoffs at the nodes along the path representing the actualplay, they would not yield the same payoffs if other nodes had beenreached. If Player One had cooperated in the past, that wouldstill provide no good reason for him to cooperate now. A nashequilibrium requires only that the two strategies are best replies toeach other as the game actually develops. A stronger solution conceptfor extensive-form games requires that the two strategies would stillbe best replies to each other no matter what node on the game treewere reached. This notion of subgame-perfect equilibrium isdefined and defended in Selten 1975. It can be expressed by sayingthat the strategy-pair is a nash equilibrium for every subgame of theoriginal game, where a subgame is the result of taking a node of theoriginal game tree as the root, pruning away everything that does notdescend from it.Given this new, stronger solution concept, we can ask about thesolutions to the IPD. There is a significant theoretical difference onthis matter between IPDs of fixed, finite length, like the onepictured above, and those of infinite or indefinitely finitelength. In games of the first kind, one can prove by an argument knownas backward induction that Du,Du is the only subgame perfect equilibrium. Supposethe players know the game will last exactly n rounds. Then,no matter what node have been reached, at round n-1 theplayers face an ordinary (“one-shot”) PD, and they willdefect. At round n-2 the players know that, whatever they donow, they will both defect at the next round. Thus it is rational forthem to defect now as well. By repeating this argument sufficientlymany times, the rational players deduce that they should defect atevery node on the tree. Indeed, since at every node defection is abest response to any move, there can be no other subgame-perfectequilibria.In practice, there is not a great difference between how people behavein long fixed-length IPDs (except in the final few rounds) and thoseof indeterminate length. This suggests that some of the rationalityand common knowledge assumptions used in the backward inductionargument (and elsewhere in game theory) are unrealistic. There is aconsiderable literature attempting to formulate the argumentcarefully, examine its assumptions, and to see how relaxingunrealistic assumptions might change the rationally acceptablestrategies in the PD and other games of fixed length. (For a smallsample, see Bovens, Kreps et al, Kreps and Wilson, Pettit andSugden, Sobel 1993 and Binmore 1997).Player One's belief that there is a slight chance that Two mightpursue an “irrational” strategy other than continualdefection could make it rational for her to cooperate frequentlyherself. Indeed, even if One were certain of Two's rationality, One'sbelief that there was some chance that Two believed she harbored suchdoubts could have the same effect. Thus the argument for continualdefection in the IPD of fixed length depends on complex iteratedclaims of certain knowledge of rationality. An even more unrealisticassumption, noted by Rabinowicz and others, is that each playercontinue to believe that the other will choose rationally on the nextmove even after evidence of irrational play on previous moves. Forexample, it is assumed that, at the node reached after a long seriesof moves (C, C),…,(C, C), Player One willchoose D despite never having done so before.Some have used these kinds of observation to argue that the backwardinduction argument shows that standard assumptions about rationality(with other plausible assumptions) are inconsistent or self-defeating.For (with plausible assumptions) one way to ensure that a rationalplayer will doubt one's own rationality is to behave irrationally. Inthe fixed-length IPD, for example, Player One may be able to deducethat, if she were to follow an appropriate “irrational”strategy, Player Two would rationally react so that they can achievemutual cooperation in almost all rounds. So our assumptions seem toimply both that Player One should continually defect and that shewould do better if she didn't. (See Skyrms 1990, pp. 125-139 andBicchieri 1989.)12. The Centipede and the Finite IPDMany of the issues raised by the fixed-length IPD can be raised ineven starker form by a somewhat simpler g¨ï·¨ï·ed weak broadstability) (rwb-stability) if, when evolution proceeds accordingto the proportional fitness rule and the native population is playings, any (possibly heterogeneous) group of invaders ofsufficiently small size will fail to drive the natives toextinction. This condition turns out to be equivalent to a weakenedversion of MS identified by Bendor and Swistak.BS) For all strategies j,V(i, i) >V(j, i) or bothV(i,i)=V(j, i)and V(i,j)≥V(j,j) BS and rwb-stability are non-trivial conditions in the more generalevolutionary framework: strategies for the EPD that satisfyrwb-stability do exist. This does not particularly vindicate any ofthe strategies discussed above, however. Bendor and Swistak prove aresult analogous to the folk theorem mentioned previously: If theshadow of the future is sufficiently large, there are rwb-stablestrategies supporting any degree of cooperation from zero to one. Oneway to distinguish among the strategies that meet BS is by the size ofthe invasion required to overturn the natives, or, equivalently, bythe proportion of natives required to maintain stability. Bendor andSwistak show that this number, the minimal stabilizingfrequency, never exceeds 1/2: no population can resist everyinvading group as large as itself. They maintain that this result doesallow them to begin to provide a theoretical justification forAxelrod's claims. They are able to show that, as the shadow of thefuture approaches one, any strategy that is nice (meaning that it isnever first to defect) and retaliatory (meaning that it always defectsimmediately after it has been defected against) has a mn sense and experimental evidence suggest that realplayers rarely act in this way and this leads to questions aboutexactly what assumptions this kind of argument requires and whetherthey are realistic. (In addition to the sample mentioned in thesection on finitely iterated PDs, see, for example, Aumann 1998,Selten 1978, and Rabinowicz.) The centipede also raises the samequestions about cooperation and socially desirable altruism as doesthe PD and it is a favorite tool in empirical investigations of gameplaying.13. Infinite IterationOne way to avoid the dubious conclusion of the backward inductionargument without delving too deeply into conditions of knowledge andrationality is to consider infinitely repeated PDs. No human agentscan actually play an infinitely repeated game, of course, but theinfinite IPD has been considered an appropriate way to model a seriesof interactions in which the participants never have reason to thinkthe current interaction is their last. In this setting a pair ofstrategies determines an infinite path through of the game tree. Ifthe payoffs of the one-shot game are positive, their total along anysuch path is infinite. This makes it somewhat awkward to comparestrategies. If we confine ourselves to those strategies that can beimplemented by mechanical devices (with finite memories and speeds ofcomputation), however, it turns out that the sequence of payoffs toeach player will always, after a finite number of rounds, cyclerepeatedly through a particular finite sequence of payoffs. Therelative value of such infinite sequences of payoffs can then beidentified with the average value of the payoffs in one cycle. Thisvalue reflects the limit of average payoff per round as the number ofrounds increases. (See Binmore 1992, page 365 for furtherjustification.) Since there is no last round, it is obvious thatbackward induction does not apply to the infinite IPD.14. Indefinite IterationMost contemporary investigations the IPD take it to be neitherinfinite nor of fixed finite length but rather of indeterminatelength. This is accomplished by including in the game specification aprobability p (the “shadow of the future”) suchthat at each round the game will continue with probabilityp. Alternatively, a “discount factor” pis applied to the payoffs after each round so that present payoffs arevalued more highly than future. Mathematically, it makes littledifference whether p is regarded as a probability ofcontinuation or a discount on payoffs. The value of cooperation at agiven stage in an IPD clearly depends on the odds of meeting one'sopponent in later rounds. (This has been said to explain why the levelof courtesy is higher in a village than a metropolis and why customerstend leave better tips in local restaurants than distant ones.) Asp approaches zero, the IPD becomes a one-shot PD, and thevalue of defection increases. As p approaches one the IPDbecomes an infinite IPD, and the value of defection decreases. It isalso customary to insist that the game has the property labeled RCAabove, so that (in the symmetric game) players do better bycooperating on every round than they would do by “takingturns” — you cooperate while I defect and then I cooperatewhile you defect.There is an observation, apparently originating in Kavka, 1983, andgiven more mathematical form in Carroll, that the backward inductionargument applies as long as an upper bound to the length of the gameis common knowledge. For if b is such an upper bound, then,if the players were to get to stage b, they would know thatit was the last round and they would defect; if they were to get tostage b-1, they would know that their behavior on this roundcannot affect the decision to defect on the next, and so they woulddefect; and so on. It seems an easy matter to compute upper bounds onthe number of interactions in real-life situations. For example, sinceshopkeeper Jones cannot make more than one sale a second and since hewill live less than a thousand years, he and customer Smith cancalculate (conservatively) that they cannot possibly conduct more than1012 transactions. It is instructive to examine thisargument more closely in order to dramatize the assumptions made instandard treatments of the indefinite IPD and other indefinitelyrepeated games. Note first that, in an indefinite IPD as describedabove, there can be no upper bound on the length of the game. Thereis, instead, some fixed probability p that, at any time inwhich the game is still being played, it will continue to be playedwith probability p. If the interaction of Smith and Joneswere modeled as an indefinite IPD, therefore, the probability of theirinteracting in a thousand years would not be zero, but rather somenumber greater than pk where pis the probability of their interacting again now and k isthe number of seconds in a thousand years. A more realistic way tomodel the interaction might be to allow the value of p todecrease as the game progressed. As long as p always remainsgreater than zero, however, it remains true that there can be no upperbound on the number of possible interactions, i.e., no time at whichthe probability of future interactions becomes zero. Suppose, on theother hand, that there was a number n such that thatthere was zero probability of the game's continuing to stagen. Let p1, …,pn be the probabilities that gamecontinues after stage 1, …, stage n. Then there mustbe a smallest i such that pibecomes 0. (It would happen at i=n if not sooner.)Given the standard common knowledge assumptions that we have beenmaking, the players would know this value of i, and the IPDwould be one of fixed length, and not an indefinite IPD at all. In thecase of the shopkeeper and his customer, we are to suppose that bothknow today that their last interaction will occur, let's say, at noonon June 10th, 2020. The very plausible idea that we began with, viz.,that some upper bounds on the number of interactions arecommon knowledge, even though the smallest upper bound is not, isincompatible with the assumption that we know all the continuationprobabilities pi from the start.As Becker and Cudd astutely observe, we don't need an upper bound onthe number of possible iterations to make a backward inductionargument for defection possible. If the players know all the values ofpi from the outset, then, as long as thevalue of pi becomes and remainssufficiently small, they (and we) can compute a stage k atwhich the risk of future punishment and the chance of future reward nolonger outweighs the benefit of immediate defection. So they knowtheir opponent will defect at stage k, and the inductionbegins. This modification of the Kavka/Carroll argument, however, onlyfurther exposes the implausibility of its assumptions. Not only areSmith and Jones expected to believe that there is non-zero probabilitythat they will be interacting in a thousand years, each is expected tobe able to compute the precise day on which future interactions willbecome and remain so unlikely that their expected future return isoutweighed by that day's payoff. Furthermore each is expected tobelieve that the other has made this computation, and that the otherexpects him to have made it, and so on.The iterated version of the PD was discussed from the time the gamewas devised, but interest accelerated after influential publicationsof Robert Axelrod in the early eighties. Axelrod invited professionalgame theorists to submit computer programs for playing IPDs. All theprograms were entered into a tournament in which each played everyother (as well as a clone of itself and a strategy that cooperated anddefected at random) hundreds of times. It is easy to see that in agame like this no strategy is “best” in the sense that itsscore would be highest among any group of competitors. If the otherstrategies never consider the previous history of interaction inchoosing their next move, it would be best to defectunconditionally. If the other strategies all begin by cooperating andthen “punish” any defection against themselves bydefecting on all subsequent rounds, then a policy of unconditionalcooperation is better. Nevertheless, as in the transparent game, somestrategies have features that seem to allow them to do well in avariety of environments. The strategy that scored highest in Axelrod'sinitial tournament, Tit for Tat (henceforth TFT),simply cooperates on the first round and imitates its opponent'sprevious move thereafter. More significant than TFT'sinitial victory, perhaps is the fact that it won Axelrod's secondtournament, whose sixty three entrants were all given the results ofthe first tournament. In analyzing the his second tournament, Axelrodnoted that each of the entrants could be assigned one of five“representative” strategies in such a way that astrategy's success against a set of others can be accurately predictedby its success against their representative. As a furtherdemonstration of the strength of TFT, he calculatedthe scores each strategy would have received in tournaments in whichone of the representative strategies was five times as common as inthe original tournament. TFT received the highestscore in all but one of these hypothetical tournaments.Axelrod attributed the success of TFT to fourproperties. It is nice, meaning that it is never the first todefect. The eight nice entries in Axelrod's tournament were the eighthighest ranking strategies. It is retaliatory, making itdifficult for it to be exploited by the rules that were not nice. Itis forgiving, in the sense of being willing to cooperate evenwith those who have defected against it (provided their defectionwasn't in the immediately preceding round). An unforgiving rule isincapable of ever getting the reward payoff after its opponent hasdefected once. And it is clear, presumably making it easierfor other strategies to predict its behavior so as to facilitatemutually beneficial interaction.Suggestive as Axelrod's discussion is, it is worth noting that theideas are not formulated precisely enough to permit a rigorousdemonstration of the supremacy of TFT. One doesn'tknow, for example, the extent of the class of strategies that mighthave the four properties outlined, or what success criteria might beimplied by having them. It is true that if one's opponent is playingTFT (and the shadow of the future is sufficientlylarge) then one's maximum payoff is obtained by a strategy thatresults in mutual cooperation on every round. SinceTFT is itself one such strategy this implies thatTFT forms a nash equilibrium with itself in the spaceof all strategies. But that does not particularly distinguishTFT, for Du, Du isalso a nash equilibrium. Indeed, a “folk theorem” ofiterated game theory (now widely published — see, for example,Binmore 1992, pp. 373-377) implies that, for any p,0≤p≤1 there is a nash equilibrium in which pis the fraction of times that mutual cooperation occurs. IndeedTFT is, in some respects, worse than many ofthese other equilibrium strategies, because the folk theorem can besharpened to a similar result about subgame perfectequilibria. TFT is, in general, not subgameperfect. For, were one TFT player (perimpossible) to defect against another in a single round, thesecond would have done better as an unconditional cooperator.15. Iteration With ErrorIn a survey of the field several years after the publication of theresults reported above, Axelrod and Dion, chronicle several successesof TFT and modifications of it. They conclude that“research has shown that many of Axelrod's findings…canbe generalized to settings that are quite different from the originaltwo-player iterated Prisoner's Dilemma game.” But in severalreasonable settings TFT has serious drawbacks. Onesuch case, noted in the Axelrod and Dion survey, is when attempts aremade to incorporate the plausible assumption that players are subjectto errors of execution and perception. There are a number of ways thiscan be done. Bendor, for example, considers “noisypayoffs.” When a player cooperates while its opponent defects,its payoff is S+e, where e is a randomvariable whose expected value is 0. Each player infers the other'smove from its own payoff, and so if e is sufficiently highits inference may be mistaken. Sugden (pp 112-115) considers playerswho have a certain probability of making an error of execution that isapparent to them but not their opponents. Such players can adoptstrategies by which they “atone” for mistaken defectionsby being more cooperative on later rounds than they would be afterintended defection. Assuming that players themselves cannotdistinguish a mistaken move or observation from a real one, however,the simplest way to model the inevitability of error is simply toforbid completely deterministic strategies like TFT,replacing them with “imperfect” counterparts, like“imitate the other player's last move with 99% probability andoppose it with 1% probability.” Imperfect TFTis much less attractive than its deterministic sibling, because whentwo imperfect TFT strategies play each other, an“error” by either one will set off a long chain of movesin which the players take turns defecting. In a long iterated gamebetween two imperfect TFT's with any probabilityp of error, 0 < p < 1/2, players will approachthe same average payoffs as in a game between two strategies thatchoose randomly between cooperation and defection, namely1/4(R+P+S+T). That is considerablyworse than the payoff of R, that results when p=0.The predominant view seems to be that, when imperfection isinevitable, successful strategies will have to be more forgiving ofdefections by their opponents (since those defections might well beunintended). Molander 1985 demonstrates that strategies that mixTFT with Cu do approach a payoff ofR as the probability of error approaches zero. When thesemixes play each other, they benefit from higher ratios ofCu to TFT, but if they become toogenerous, they risk exploitation by “stingy” strategiesthat mix TFT with defection. Molander calculates thatwhen the mix is set so that, following a defection, one cooperateswith probability g(R, P, T,S)=min{1-(T-R)/(R-S),(R-P)/(T-P)}, the generousstrategies will get the highest score with each other that is possiblewithout allowing stingy strategies to do better against them than TFTdoes. Following Nowak and Sigmund, we label this strategygenerous TFT, or GTFT.The idea that the presence of imperfection induces greater forgivenessor generosity is only plausible for low levels of imperfection. As thelevel of imperfection approaches 1/2, Imperfect TFTbecomes indistinguishable from the random strategy, for which the veryungenerous Du is the best reply. A simulation byKollock seems to confirm that at high levels of imperfection, morestinginess is better policy than more forgiveness. But Bendor, Kramerand Swistak note that the strategies employed in the Kollocksimulation are not representative and so the results must beinterpreted with caution.A second idea is that an imperfect environment encourages strategiesto observe their opponent's play more carefully. In a tournamentsimilar to Axelrod's (Donninger) in which each player's moves weresubject to a 10% chance of alteration, TFT finishedsixth out twenty-one strategies. As might have predicted on thedominant view, it was beaten by the more generousTit-for-Two-Tats (which cooperates unless defectedagainst twice in a row). It was also beaten, however, by two versionsof Downing, a program that bases each new move on itsbest estimate how responsive its opponent has been to its previousmoves. In Axelrod's two original tournaments, Downinghad ranked near the bottom third of the programs submitted. Bendor(1987) demonstrates deductively that against imperfect strategiesthere are advantages to basing one's probability of defection onlonger histories than does TFT.One clever implementation of the idea that a strategies in animperfect environment should pay attention to their previousinteractions is the family of “Pavlovian” strategiesinvestigated by Kraines and Kraines. For each natural numbern, n-Pavlov, orPn, adjusts its probability ofcooperation in units of 1/n, according to how well it faredon the previous round. More precisely, ifPn was cooperating withprobability p on the last round, then on this round it willcooperate with probability p[+](1/n) if it receivedthe reward payoff on the previous round,p[−](1/n) if it received the punishmentpayoff, p[+](2/n)if it received the temptationpayoff, and p[−](2/n) if it received thesucker payoff. [+] and [−] are bounded addition and subtraction,i.e., x[+]y is the sum x+y unlessthat number exceeds one, in which case it is one (or as close to oneas the possibility of error allows), and x[−]yis similarly either x-y or close to zero. Strictlyspeaking, Pn is not fullyspecified until an initial probability of cooperation is given, butfor most purposes the value of that parameter becomes insignificant insufficiently long games and can be safely ignored. It may appear thatPn requires far morecomputational resources to implement than, say,TFT. Each move for the latter depends on only on itsopponent's last move, whereas each move forPn is a function of the entirehistory of previous moves of both players.Pn, however, can always calculateits next move by tracking only its current probability of cooperationand its last payoff. As its authors maintain, this seems like “anatural strategy in the animal world.” One can calculate thatfor n>1, Pn doesbetter against the random strategy than doesTFT. More generally,Pn does as well or better thanTFT against the generous unresponsive strategiesCp that always cooperate with fixed probabilityp≥1/2 (because an occasional temptation payoff can teachit to exploit the unresponsive strategies.) In these cases the“slow learner” versions of Pavlov with higher values ofn do slightly better than the “fast learners”with low values. Against responsive strategies, like other Pavlovianstrategies and TFT,Pn and its opponent eventuallyreach a state of (almost) constant cooperation. The total payoff isthen inversely related to the “training time,” i.e., thenumber of rounds required to reach that state. Since training time ofPn varies exponentially withn, Kraines and Kraines maintain thatP3 or P4 areto be preferred to other Pavlovian strategies, and are close to“ideal” IPD strategies. It should be noted, however, thatwhen (deterministic) TFT plays itself, no trainingtime at all is required, whereas when a Pavlovian strategy playsTFT or another Pavlov, the training time can belarge. Thus the cogency of the argument for the superiority of Pavlovover TFT depends on the observation that itsperformance shows less degradation when subject to imperfections. Itis also worth remembering that no strategy is best in everyenvironment, and the criteria used in defense of various strategies inthe IPD are vague and heterogeneous. One advantage of the evolutionaryversions of the IPD discussed in the next section is that they permitmore careful formulation and evaluation of success criteria.16. EvolutionPerhaps the most active area of research on the PD concernsevolutionary versions of the game. A population of players employingvarious strategies play IPDs among themselves. The lower scoringstrategies decrease in number, the higher scoring increase, and theprocess is repeated. Thus success in an evolutionary PD (henceforthEPD), requires doing well with other successful strategies, ratherthan doing well with a wide range of strategies.The initial population in an EPD can be represented by a set of pairs{(p1, s1),…(pn,sn)} where p1,…, pn are the proportions of thepopulation playing strategies s1,…, sn, respectively. Thedescription of EPDs given above does not specify exactly how thepopulation of strategies is to be reconstituted after each IPD. Theusual assumption, and the most sensible one for biologicalapplications, is that a score in any round indicates the relativenumber of “offspring” in the next. It is assumed that thesize of the entire population stays fixed, so that births of moresuccessful strategies are exactly offset by deaths of less successfulones. This amounts to the condition that the proportionpi* of each strategysi in the successor population isdetermined by the equationpi*=pi(Vi/V),where Vi is the score ofsi in the previous round and Vis the average of all scores in the population. Thus every strategythat scores above the population average will increase in number andevery one that scores below the average will decrease. This kind ofevolution is referred to as “replicator dynamics” orevolution according to the “proportional fitness”rule. Other rules of evolution are possible. Bendor and Swistak arguethat, for social applications, it makes more sense to think of theplayers as switching from one strategy to another rather than ascoming into and of existence. Since rational players would presumablyswitch only to strategies that received the highest payoff in previousrounds, only the highest scoring strategies would increase innumbers. A variety of other possible evolutionary dynamics aredescribed and discussed in Kuhn 2004. Discussion here, however, willprimarily concern EPDs with the proportional fitness rule.Axelrod, borrowing from Trivers and Maynard Smith, includes adescription of the EPD with proportional fitness, and a brief analysisof the evolutionary version of his IPD tournament. For Axelrod, theEPD provides one more piece of evidence in favor ofTFT:TIT FOR TAT had a very slight lead in the originaltournament, and never lost this lead in simulated generations. By theone-thousandth generation it was the most successful rule and stillgrowing at a faster rate than any other rule.Axelrod's EPD tournament, however, incorporated several features thatmight be deemed artificial. First, it permitted deterministicstrategies in a noise-free environment. As noted above,TFT can be expected to do worse under conditions thatmodel the inevitability of error. Second, it began with only the 63strategies from the original IPD tournament. Success againststrategies concocted in the ivory tower may not imply success againstall those that might be found in nature. Third, the only strategiespermitted to compete at a given stage were the survivors from theprevious stage. A more realistic model, one might argue, would allownew “mutant” strategies to enter the game at anystage. Changing this third feature might well be expected to hurtTFT. For a large growth in the TFTpopulation would make it possible for mutants employing more naivestrategies like Cu to regain a foothold, and thepresence of these naifs in the population might favor nastierstrategies like Du over TFT.Nowak and Sigmund simulated two kinds of tournaments that avoid thethree questionable features. The first examined the family of“reactive” strategies. For any probabilities y,p, and q, R(y, p,q) is the strategy of cooperating with probability yin the first round and thereafter with probability p if theother player has cooperated in the previous round, and withprobability q if she has defected. This is a broad family,including many of the strategies alreadyconsidered. Cu, Du,TFT, and Cp areR(1,1,1), R(0,0,0),R(1,1,0), and R(p,p, p). To capture the inevitability of error, Nowakand Sigmund exclude the deterministic strategies, where p andq are exactly 1 or 0, from their tournaments. As before, ifthe game is sufficiently long (and p and q are notintegers), the first move can be ignored and a reactive strategy canbe identified with its p and q values. Particularattention is paid to the strategies close to Molander'sGTFT described above, where p=1 andq=min{1-(T-R)/(R-S),(R-P)/(T-P). The first series ofNowak and Sigmund's EPD tournaments begin with representative samplesof reactive strategies. For most such tournaments, they found thatevolution led irreversibly to Du. Those strategiesR(p, q) closest toR(0,0) thrived while the others perished. When one ofthe initial strategies is very close to TFT, however,the outcome changes.TFT and all other reciprocating strategies (near (1,0))seem to have disappeared. But an embattled minority remains and fightsback. The tide turns when ‘suckers’ are so decimated thatexploiters can no longer feed on them. Slowly at first, but gatheringmomentum, the reciprocators come back, and the exploiters nowwane. But the TFT-like strategy that caused thisreversal of fortune is not going to profit from it: having eliminatedthe exploiters, it is robbed of its mission and superseded by thestrategy closest to GTFT. Evolution then stops. Evenif we introduce occasionally 1% of another strategy it willvanish.On the basis of their tournaments among reactive strategies, Nowak andSigmund conjectured that, while TFT is essential forthe emergence of cooperation, the strategy that actually underliespersistent patterns of cooperation in the biological world is morelikely to be GTFT.A second series of simulations with a wider class of strategies,however, forced them to revise their opinion. The strategiesconsidered in the second series allowed each player to base itsprobability of cooperation on its own previous move as well as itsopponent's. A strategy can now be represented asS(p1, p2,p3, p4) wherep1, p2,p3, p4 are the probabilitiesof defecting after outcomes (C, C),(C, D), (D,C), and (D, D),respectively i.e., after receiving the reward, sucker, temptation andpunishment payoffs. (Again, we can ignore the probability ofdefecting on the first move as long as thepis are not zero or one.) The initialpopulation in these tournaments all play the random strategyS(.5, .5, .5, .5) and after every 100 generations asmall amount of a randomly chosen (non-deterministic) mutant isintroduced, and the population evolves by proportional fitness. Theresults are quite different than before. After 107generations, a state of steady mutual cooperation was reached in 90%of the simulation trials. But less than 8.3% of these states werepopulated by players using TFT orGTFT. The remaining 91.7% were dominated bystrategies close to S(1,0,0,1). This is the just thePavlovian strategy P1 of Kraines andKraines, which cooperates after receiving R or T anddefects after receiving P or S. Kraines and Kraineshad been somewhat dismissive of P1. Theyrecall that Rapoport and Chammah, who identified it early in thehistory of game theory had labeled it “simpleton” andremark that “the appellation is well deserved”. Indeed,P1 has the unfortunate characteristic oftrying to cooperate with Du on every other turn, andagainst TFT it can get locked into the inferiorrepeating series of payoffs T, P, S,T, P, S, … . But Nowak and Sigmund'ssimulations suggest that these defects do not matter very much inevolutionary contexts. One reason may be thatP1 helps to make its environmentunsuitable for its enemies. Du does well in anenvironment with generous strategies, like Cu orGTFT. TFT, as we have seen, allowsthese strategies to flourish, which could pave the way forDu. Thus, although TFT fares lessbadly against Du than P1does, P1 is better at keeping itsenvironment free of Du.Simulations in a universe of deterministic strategies yield resultsquite different than those of Nowak and Sigmund. Bruce Linster (1992and 1994) suggests that natural classes of strategies and realisticmechanisms of evolution can be defined by representing strategies assimple Moore machines. For example,P1 is represented by the machine picturedbelow. Figure 5This machine has two states, indicated by circles. It begins in theleftmost state. The C in the left circle means thatthe machine cooperates on the first move. The arrow leading from theleft to the right circle indicates that machine defects (enters theD) after it has cooperated (been in theC state) and its opponent has defected (the arrow islabeled by d). Linster has conducted simulations ofevolutionary PD's among the strategies that can be represented bytwo-state Moore machines. It turns out that these are exactly thedeterministic versions of the S strategies of Nowakand Sigmund. Since the strategies are deterministic, we mustdistinguish between the versions that cooperate on the first round andthose that defect on the first round. Among the first roundcooperators, S(0,0,0,0), S(0,0,0,1),S(0,0,1,0) and S(0,0,1,1) allrepresent the strategy Cu of unconditionalcooperation. Similarly, four of the first-round defectors allrepresent Du. Each of the otherS(p1, p2,p3, p4) wherep1, p2,p3, p4 are either zero or onerepresent unique strategies. By deleting the six duplicates from thethirty-two deterministic versions of Nowak and Sigmund's strategies,we obtain the twenty-six “two-state” strategies consideredby Linster.Linster simulated a variety of EPD tournaments among the two-statestrategies. Some used “uniform mutation” in which eachstrategy in the population has an equal probability m ofmutating into any of the other strategies. Some used “stylizedmutation” in which the only mutations permitted are those thatcan be understood as the result of a single “broken link”in the Moore machine diagrams. In some, mutations were assumed tooccur to a tiny proportion of the population at each generation; inothers the “mutants” represented an invading forceamounting to one percent of the original population. In some, apenalty was levied for increased complexity in the form of reducedpayoffs for machines requiring more states or more links. As one mightexpect, results vary somewhat depending on conditions. There are somestriking differences, however, between all of Linster's results andthose of Nowak and Sigmund. In Linster's tournaments, no singlestrategy ever dominated the surviving populations in the way thatP1 and GTFT did in Nowakand Sigmund's. The one strategy that did generally come to compriseover fifty percent of the population was the initially-cooperatingversion of S(0,1,1,1). This is a strategy whoseimperfect variants seem to have been remarkably uncompetitive forNowak and Sigmund. It has been frequently discussed in the game theoryliterature under the label GRIM orTRIGGER. It cooperates until its opponent hasdefected once, and then defects for the rest of the game. According toSkyrms (1998) and Vanderschraaf, both Hobbes and Hume identified it asthe strategy that underlies our cooperative behavior in importantPD-like situations. The explanation for the discrepancy betweenGRIM's performance for Linster and for Nowak andSigmund probably has to do with its sharp deterioration in thepresence of error. In a match between two imperfectGRIMs, an “erronious” defection by eitherleads to a long string of mutual defections. Thus, in the long runimperfect GRIM does poorly against itself. The otherstrategies that survived (in lesser numbers) Linster's tournaments areTFT, P1,Cu, and the initially-cooperativeS(0,1,1,0). (Note that imperfectGRIM is also likely to do poorly against imperfectversions of these.) The observation that evolution might lead to astable mix of strategies (perhaps each serving to protect othersagainst particular types of invaders) rather than a single dominantstrategy is suggestive. Equally suggestive is the result obtainedunder a few special conditions in which evolution leads to a recurringcycle of population mixes.One might expect it to be possible to predict the strategies that willprevail in EPDs meeting various conditions, and to justify suchpredictions by formal proofs. Until recently, however, mathematicalanalyses of the EPD have been plagued by conceptual confusions about“evolutionary stability,” the condition under which, asNowak and Sigmund say, “evolution stops”. Axelrod andAxelrod & Hamilton claim to show that TFT isevolutionarily stable. Selten 1983, includes an example of a gamewith no evolutionarily stable strategy, and Selten's argument thatthere is no such strategy clearly applies to the EPDand other evolutionary games. Boyd and Lorberbaum and Farrell and Warepresent still different proofs demonstrating that no strategies forthe EPD are evolutionarily stable. Unsurprisingly,the paradox is resolved by observing that the three groups of authorseach employ slightly different conceptions of evolutionarystability. The conceptual tangle is unraveled in a series of papers byBendor and Swistak. Two stability concepts are described and appliedto the EPD below. Readers who wish to compare these with some othersthat appear in the literature may consult the following briefguide:Concepts of Stability in Evolutionary Games.A strategy s for an evolutionary game hasuniversal strong narrow stability(“usn-stability”) if a population playing strategys will, under any rule of evolution, drive toextinction any sufficiently small group of invaders all of which playthe same strategy. An evolutionary game has usn-stability just in caseit meets a simple condition on payoffs identified by MaynardSmith:MS) For all strategies j,V(i, i) >V(j, i) or bothV(i,i)=V(j, i)and V(i,j)>V(j,j) (Here, and in what follows, the notationV(i, j) indicates thepayoff to strategy i when i playsj.) MS says that any invaders do strictly worseagainst the natives than the natives themselves do against the nativesor else they get exactly the same payoff against the natives as thenatives themselves do, but the native does better against the invaderthan the invader himself does.For any strategy i in the IPD (or indeed in anyiterated finite game), however, there are strategiesj different from i such thatj mimics the way i plays when itplays against i or j. The existenceof these “neutral mutants” implies that MS cannot besatisfied and so no EPD has usn-stability. This argument, of course,uses the assumption that any strategy in the iterated game is apossible invader. There may be good reason to restrict the availablestrategies. For example, if the players are assumed to have noknowledge of previous interactions, then it may be appropriate torestrict available strategies to the unconditional ones. Since a pairof players then get the same payoffs in every round of an iteratedgame, we may as well take each round of the evolutionary game to beone-shot games between every pair of players rather than iteratedgames. Indeed, this is the kind of evolutionary game that MaynardSmith himself considered. In this framework, any strategyS such that (S, S)is a strict nash equilibrium in the underlying one-shot game(including unconditional defection in the PD) meets the MScondition. Thus MS and usn-stability are non-trivial conditions insome contexts.A strategy s has restricted weak broadstability) (rwb-stability) if, when evolution proceeds accordingto the proportional fitness rule and the native population is playings, any (possibly heterogeneous) group of invaders ofsufficiently small size will fail to drive the natives toextinction. This condition turns out to be equivalent to a weakenedversion of MS identified by Bendor and Swistak.BS) For all strategies j,V(i, i) >V(j, i) or bothV(i,i)=V(j, i)and V(i,j)≥V(j,j) BS and rwb-stability are non-trivial conditions in the more generalevolutionary framework: strategies for the EPD that satisfyrwb-stability do exist. This does not particularly vindicate any ofthe strategies discussed above, however. Bendor and Swistak prove aresult analogous to the folk theorem mentioned previously: If theshadow of the future is sufficiently large, there are rwb-stablestrategies supporting any degree of cooperation from zero to one. Oneway to distinguish among the strategies that meet BS is by the size ofthe invasion required to overturn the natives, or, equivalently, bythe proportion of natives required to maintain stability. Bendor andSwistak show that this number, the minimal stabilizingfrequency, never exceeds 1/2: no population can resist everyinvading group as large as itself. They maintain that this result doesallow them to begin to provide a theoretical justification forAxelrod's claims. They are able to show that, as the shadow of thefuture approaches one, any strategy that is nice (meaning that it isnever first to defect) and retaliatory (meaning that it always defectsimmediately after it has been defected against) has a minimalstabilizing frequency approaching one half. TFT hasboth these properties and, in fact, they are the first two of the fourproperties Axelrod cited as instrumental to TFT'ssuccess. There are, of course, many other nice and retaliatorystrategies, and there are strategies (likeP1) that are not retaliatory but stillsatisfy rwb-stability. But Bendor and Swistak are at least able toshow that any “maximally robust” strategy, i.e., anystrategy whose minimum stabilizing frequency approaches one half,chooses cooperation on all but a finite number of moves in aninfinitely repeated PD.Bendor and Swistak's results must be interpreted with some care.First, one should keep in mind that no probabilistic or noise-sensitive strategies can fit the definitions of either“nice” or “retaliatory”strategies. Furthermore, imperfect versions of TFT donot satisfy rwb-stability. They can be overthrown by arbitrarily smallinvasions of deterministic TFT or, indeed, byarbitrary small invasions of any less imperfectTFT. Second, one must remember that the results aboutminimal stabilizing frequencies only concern weak stability. If thenumber of generations is large compared with the original population(as it often is in biological applications), a population that isinitially composed entirely of players employing the same maximallyrobust strategy, could well admit a sequence of small invading groupsthat eventually reduces the original strategy to less than half of thepopulation. At that point the original strategy could beoverthrown.It is likely that both of these caveats play some role in explainingan apparent discrepancy between the Bendor/Swistak results and theNowak/Sigmund simulations. One would expect Bendor/Swistak's minimalstabilizing frequency to provide some indication of the length of timethat a population plays a particular strategy. A strategy requiring alarge invasion to overturn is likely to prevail longer than a strategyrequiring only a small invasion. A straightforward calculation revealsthat P1 has a relatively low minimumstabilizing frequency. It is overturned by invasions of unconditionaldefectors exceeding 10% of the population. Yet in the Nowak/Sigmundsimulations, P1-like strategies predominateover TFT-like strategies. Since the simulationsrequired imperfection and since they generated a sequence of mutantsvastly larger than the original population, there is no realcontradiction here. Nevertheless the discrepancy suggests that we donot yet have a theoretical understanding of EPDs sufficient to predictthe strategies that will emerge under various plausible conditions.Like usn-stability, the concept of rwb-stability can be morediscriminating if it is relativized to a particular set of strategies.Molander's 1992 investigation of Schelling's many-person version ofthe PD, for example, restricts attention to the family{S1, …,Sn} of TFT-likestratgies. A player adopting Sicooperates on the first round and on every subsequent round after atleast i others cooperate. By construing stability asresistance to invasions by other family members, Molander is able toshow that there are conditions under which a particular mix of two ofthe Si's (one equivalent toDu) is uniquely stable. The significance of resultslike these, however, depends on the plausibility of such limitationson the set of permissible strategies.17. Spatial PDsA previous section discussed a controversial argument that cooperationis rational in a PD when each player knows that the other is enoughlike himself to make it likely that they will choose the same move. Ananalog of this argument in the evolutionary context is more obviouslycogent. If agents are not paired at random, but rather are more likelyto play others employing similar strategies, then cooperative behavioris more likely to emerge.There are at least three mechanisms by which this kind of“association” among players can be achieved. One suchmechanism in evolutionary PDs has been widely studied under the label“spatial PD.” Players are arranged in some“geographical” arrangement. This may be an array with arectangular boundary, for example, or a circle, or surface of a sphereor surface of a torus with no boundary. From the geographicalarrangement two (possibly identical) kinds of neighborhoods areidentified for each player. Agents meet only those in their“interaction” neighborhood, and the evolutionary dynamicsconsiders only the payoffs to those in their “comparison”neighborhood. Generally, the evolutionary dynamics employed is one of“winner imitation” within the interactionneighborhood. (This can model either the idea that each player isinvaded by its most successful neighbor or the idea that each playeradopts the most successful strategy that it sees.) Because bothevolution and interaction are “local,” players are morelikely (after the first round) to meet those playing strategies liketheir own in an SPD than they would be in an ordinary evolutionarygame. In addition to the “association” effects, one shouldalso keep in mind that the outcome of SPDs may be influenced by winnerimitation dynamics, which may drive to extinction strategies thatmight survive—and eventually predominate—with thereplicator dynamics more commonly employed in ordinary EPDs.As usual, the impetus for looking at spatial SPDs seems to come fromAxelrod. Four copies of each of the 63 strategies submitted toAxelrod's tournament were arranged on a grid with a spherical geometryso that each cell had four neighbors for both interaction andcomparison. For every initial random distribution, the resulting SPDeventually reached a state where the strategy in every cell wascooperating with all its neighbors, at which point no furtherevolution is possible. Only about ten of the 63 original strategiesremained in these end-states, and they were no longer randomlydistributed, but segregated into clumps of various sizes. Axelrod alsoshowed that under special conditions evolution in an SPD can createsuccessions of complex symmetrical patterns that do not appear toreach any steady-state equilibrium.To get an idea of why cooperative behavior might spread in this andsimilar frameworks, consider two agents on either side of a frontierbetween cooperating and non-cooperating subpopulations. Thecooperative agent sees a cooperative neighbor whose four neighbors allcooperate, and who therefore gets four times the reward payoff afterplaying them all. So he will imitate this neighbor's strategy andremain cooperative. The non-cooperating agent, on the other hand, seeshis cooperative counterpart, who gets three reward payoffs from hiscooperative neighbors and one sucker payoff. He compares this to thepayoffs of his non-cooperatiave neighbors. The best these can do is toget three punishments and a temptation. So, as long as3R+S exceeds 3P+T, thenon-coperative agent on the frontier will adopt the strategy of hiscooperative neighbor. Axelrod's payoffs of 5,3,1 and 0 for T,R, P and S, do meet this condition.Nowak and May have investigated in greater detail SPDs in which theonly permitted strategies are Cu andDu. (These are the strategies appropriate amongindividuals lacking memory or recognition skills.) They find that, fora variety of spatial configurations, and distributions of strategiesevolution depends on relative payoffs in a uniform way. When thetemptation payoff is sufficiently high, clusters ofDu grow and those of Cu shrink; whenit is sufficiently low, the Du clusters shrink andthe Cu ones grow. For a narrow range of intermediatevalues, we get more of the complicated patterns noted above. Theevolving patterns exhibit great variety. For a given spatialconfiguration, however, the ratio of the two strategies seems toapproach the same constant value for all initial distributions ofstrategies and all payoffs within the special range. More recently,Nowak, Boenhoeffer and May have observed similar phenomena under avariety of error-conditions identified by Mukherji and Rajan, althoughcooperators seem to require lower relative temptation values to thriveunder these conditions, and the level of error must be sufficientlylow.Grim, Mar and St Denis report a number of SPD simulations with agreater variety of initial strategies. In general their observationsconfirm the plausible conjecture that cooperative outcomes are morecommon in SPDs than ordinary EPDs. Simulations starting with all ofthe pure reactive strategies of Nowak and Sigmund (i.e., all of thestrategies R(y, p, q)described above where y, p and q are either0 or 1.), all ended with TFT (i.e., withR(1,1,0)) as the only survivor (though otheroutcomes—including one in which Du is the solesurvivor and ones in which Cu andTFT are intermixed—are clearly possible.)Simulations of the 64 possible pure strategies in which a move maydepend on the opponent's previous two moves, ended with mixedpopulations of survivors all of whom defect after two defections,cooperate after two cooperations and cooperate in the second round ofthe game. (Again, other outcomes are possible.) Simulations with many(viz., 100) evenly distributed samples of Nowak and Sigmunds mixedreactive strategies, tended to be taken over byR(.99, .1), which is a version of generousTFT with slightly less generosity thanGTFT. Simulations beginning with a random selectionof a few (viz., 8) of these strategies tended to evolve to a mixedstable or cyclic pattern dominated by a single version of generousTFT with considerably more generosity thanGTFT. R(.99, .6) seems to have beena frequent victor.The “geographical” aspect of SPD's need not be taken tooliterally. For social applications, and probably even for biologicalones, there seems to be no motivation for any particular geometricalarrangement. (Why not a “honeycomb,” for example, whereeach agent has six neighbors, rather than a grid where each agent hasfour?) The interest in SPDs presumably lies in the insight that my“neighborhood” of interaction and my“neighborhood” of comparison, is much smaller than thepopulation as a whole even if it turns out not to be limited bydetails of physical geography. Nevertheless SPD models of theevolution of cooperation in particular geometrical arrangements havegiven us some suggestive and pretty pictures to contemplate. Severalexamples are accessible through the links at the end of thisentry.18. PDs and Social NetworksOne way to make the idea of local interaction more realistic for someapplications is to let the agents choose the partners withwhom to interact, based on payoffs in past interactions. Skyrms 2004considers iterated PDs among a population of unconditional cooperatorsand defectors. Initially, as usual, each agent chooses a partner atrandom from the remaining members of the population. For subsequentinteractions, however, the odds of choosing that partner are adjusted,according to either the payoffs from previous times that partner waschosen, or (more realistically) the payoffs from previous times thatthere has been interaction with that partner (regardless of which onewas “chooser”). In a typical PD, where the payoffs fortemptation, reward, punishment and sucker are 3,2,1 and 0, bothcooperators and defectors eventually choose only cooperators. Sincethe cooperators are chosen by both cooperators and defectors, theyplay more often than the defectors who play only when they arechosen. If we assume that there is an equal division betweencooperators and defectors, then cooperators can expect a return of tworeward payoffs, while defectors can expect a return of one temptationpayoff. So with payoff structure indicated, cooperators do better,even with this “one-way” association. The story mayunfold somewhat differently in what Skyrms calls an“attenuated” PD, where the payoffs are, let us say, 2.01,2, 1.98, and 0. (We might think of this as the “Just Don't Be aSucker” game.) Here, as before, the cooperators quickly learnnot to choose defectors as partners. The defectors get roughly thesame payoffs whether they choose cooperators or defectors aspartners. Since they rapidly cease being chosen by cooperators,however, their returns from interactions with cooperators will be lessthan returns from defectors and they will soon limit their choices toother defectors. (It is important to understand here that the learningalgorithm that determines the probability I will interact with agenta depends on total returns from interactingwith a (or total recent returns frominteracting with a) rather than averagereturns from interacting with a.) So in theattenuated game we end up with perfect association: defectors playdefectors and cooperators play cooperators. Since the reward payoffslightly exceeds the punishment payoff, the cooperators again dobetter than the defectors.The social network games considered above are not really evolutionaryPDs in the sense described above. The patterns of interaction evolve,but the strategy profile of the population remains fixed. It isnatural to allow both strategies and probabilities of interaction toevolve simultaneously as payoffs are distributed. Whether cooperationor defection (or neither) comes to dominate the population under suchconditions depends on a multitude of factors: the values of thepayoffs, the initial distribution of strategies, the relative speed ofthe adjustments in strategy and interaction probabilities, and otherproperties of those two evolutionary dynamics. Skyrms 2004 contains ageneral discussion and a number of suggestive examples, but it doesnot provide (or aim to provide) a comprehensive account of socialnetwork PDs or a careful analysis of precise formulations to properlymodel particular phenomena. Much remains unknown.19. Group Selection and the Haystack PDA third mechanism by which players can be made more likely to meetthose like themselves is to consider a more sophisticated dynamics ofevolution that operates on groups of players as well as on theindividuals within those groups. There has been a heated debate amongbiologists and philosophers of biology about the appropriate“units of selection” on which natural selectionoperates. The idea that in many cases it makes sense to take theseunits to be groups of individuals (instead of, or in addition to,genes or individuals) has recently been resuscitated as a respectableand plausible viewpoint. (See Sober and Wilson or Wilson and Sober fora history and impassioned defense of this resuscitation.) For culturalevolution, the idea is equally plausible—within-group behaviormay be in equilibrium, but the equilibria reached by different groupsmay b | |