A new strategy for the iterated prisoner’s dilemma allows one player, over the very long run, to unilaterally claim an unfair share of the rewards.
The prisoner’s dilemma, in which two prisoners, separately offered a deal, must each decide whether to cooperate with or betray the other, is a classic case study in cooperation and game theory because
if both players keep quiet, each gets a brief sentence. But if one betrays the other, the snitch gets off scot-free while their partner suffers a long sentence. If both players betray each other, each gets a medium sentence. As a united pair, players do better if they both keep shtum. But crucially, if criminal A thinks B won’t blab, it is in A’s best interest to snitch, as he will then walk free – at B’s expense.
In repeated play, the best strategy is usually to cooperate: a betrayal in one round invites retaliation in the next, and both players end up with longer sentences overall.
Now, New Scientist reports, Bill Press and Freeman Dyson have published a paper in which they claim to prove that “unexpectedly”, strategies exist “whereby one player can enforce a unilateral claim to an unfair share of rewards”. The abstract explains that
in particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y’s score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores. Against such a player, an evolutionary player’s best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better, in which case Iterated Prisoner’s Dilemma is an Ultimatum Game.
The New Scientist article doesn’t explain the strategy. In the comments, though, author Michael Marshall does an admirable job of engaging with readers’ queries. Commenter Eric Kvaalen gives an example of the strategy from the paper:
Say that the medium jail sentence is 4 years, the long one is 5 years, and if both prisoners keep quiet they both get 2 years. Then the strategy is that in the present game I cooperate with a probability $p$ which depends on what happened in the last round:
I cooperated, you cooperated, $p=\frac{11}{13}.$
I cooperated, you defected, $p=\frac{1}{2}.$
I defected, you cooperated, $p=\frac{7}{26}.$
We both defected, $p=0.$
If I adopt this strategy, your best defense is to always cooperate and you’ll get on average about 3.1 years, whereas I’ll get on average about 1.3 years. If you also adopt the strategy, you’ll do worse — we will get into a rut of constantly defecting, so we’ll both get a 4 year sentence on each round.
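The averages quoted in that comment can be checked by hand. Below is a minimal sketch in Python, not code from the paper. It assumes the sentences given in the comment (2, 4 and 5 years) plus 0 years for a lone betrayer, taken from "scot-free" above, and it assumes the opponent always cooperates. The names `YEARS`, `P_COOP` and `long_run_vs_always_cooperate` are made up for illustration.

```python
from fractions import Fraction

# Sentences in years, indexed by (my move, your move); "C" = keep quiet, "D" = betray.
# The 0-year entry is an assumption based on "gets off scot-free" in the article.
YEARS = {("C", "C"): 2, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 4}

# My probability of cooperating, given (my last move, your last move).
P_COOP = {
    ("C", "C"): Fraction(11, 13),
    ("C", "D"): Fraction(1, 2),
    ("D", "C"): Fraction(7, 26),
    ("D", "D"): Fraction(0),
}


def long_run_vs_always_cooperate():
    """Return (my average years, your average years) when you always cooperate."""
    a = P_COOP[("C", "C")]  # I cooperate again after mutual cooperation
    b = P_COOP[("D", "C")]  # I go back to cooperating after exploiting you
    # Only my last move matters here, so this is a two-state Markov chain.
    # Solving q = q*a + (1 - q)*b gives the long-run share of rounds I cooperate.
    share_c = b / (1 - a + b)
    share_d = 1 - share_c
    mine = share_c * YEARS[("C", "C")] + share_d * YEARS[("D", "C")]
    yours = share_c * YEARS[("C", "C")] + share_d * YEARS[("C", "D")]
    return mine, yours


mine, yours = long_run_vs_always_cooperate()
print(mine, float(mine))    # 14/11, about 1.27 years
print(yours, float(yours))  # 34/11, about 3.09 years
```

Under these assumptions the extorting player cooperates in 7 rounds out of 11, which reproduces the "about 1.3" and "about 3.1" years in the comment. Measured against the 4 years of mutual defection, the extorter saves 30/11 years per round and the cooperator saves 10/11, exactly a three-to-one ratio, which is the kind of linear relation between scores that the abstract describes.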
Also in the comments, Michael Marshall points out that the strategy
plays out over very many turns of the game. So it’s rather akin to spread-betting. The player who uses this strategy does better in the long run, but may take a pounding on many individual turns in the process.
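To get a feel for how many turns that takes, here is a rough simulation sketch in Python. It is an illustration, not anything from the paper or the article: the opening move (both cooperate), the 0-year sentence for a lone betrayer, the fixed random seed, and the names `simulate`, `EXTORT` and `ALWAYS_COOPERATE` are all assumptions.

```python
import random

# Sentences in years, indexed by (my move, your move); "C" = keep quiet, "D" = betray.
# The 0-year entry is an assumption (the lone betrayer walks free).
YEARS = {("C", "C"): 2, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 4}

# Probability of cooperating, given (my last move, your last move).
EXTORT = {("C", "C"): 11 / 13, ("C", "D"): 1 / 2, ("D", "C"): 7 / 26, ("D", "D"): 0.0}
ALWAYS_COOPERATE = {state: 1.0 for state in EXTORT}


def simulate(strategy_x, strategy_y, rounds, seed=0):
    """Play `rounds` rounds and return the average years per round for X and Y."""
    rng = random.Random(seed)
    x, y = "C", "C"  # assumed opening moves; the source does not specify them
    total_x = total_y = 0
    for _ in range(rounds):
        total_x += YEARS[(x, y)]
        total_y += YEARS[(y, x)]
        # Each player looks at the last round from their own point of view.
        next_x = "C" if rng.random() < strategy_x[(x, y)] else "D"
        next_y = "C" if rng.random() < strategy_y[(y, x)] else "D"
        x, y = next_x, next_y
    return total_x / rounds, total_y / rounds


# Averages over short runs can sit well away from the long-run values.
for rounds in (10, 1_000, 200_000):
    print(rounds, simulate(EXTORT, ALWAYS_COOPERATE, rounds))

# Both players using the strategy: mutual defection is a trap, since p = 0 there.
print(simulate(EXTORT, EXTORT, 200_000))
```

With enough rounds the first pairing should settle near 1.3 and 3.1 years, while the second should drift to roughly 4 years each once both players have defected in the same round. Changing `seed` or the opponent's table is an easy way to see how noisy short runs are.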
Source: A dirty twist on beating the prisoner’s dilemma.
Paper: Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent.