From the prisoner’s dilemma to collective AI behavior risks
How rational agents make decisions when outcomes depend on each other.
1949 — The Soviets have the bomb.
“With the Russians it is not a question of whether but of when. If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not one o’clock?”
— John von Neumann, LIFE Magazine, 1957

Two players choose to cooperate or defect. Defecting is always the rational choice, yet mutual defection is worse than mutual cooperation. “If I don’t do it, someone else will.”
| Someone else | |||
|---|---|---|---|
| Cooperate | Defect | ||
| Me | Cooperate | Reward | Sucker |
| Defect | Temptation | Punishment | |
Robert Axelrod invited experts to submit strategies for a repeated prisoner’s dilemma. We’ll start with Random as the baseline, then add student strategies one by one.
| Someone else | |||
|---|---|---|---|
| Cooperate | Defect | ||
| Me | Cooperate | Reward | Sucker |
| Defect | Temptation | Punishment | |
In the real world, noise — misunderstandings, accidents — can trigger a death spiral of endless retaliation.
Stanislav Petrov, 1983: a Soviet officer correctly identified a “noise” error in a missile detection system, preventing nuclear war.
One possible repair: forgive some defections to restore cooperation. Under the classic payoff matrix, a stable forgiveness rate can be about one third.
Agents only copy the neighbor who earned the highest payoff. No one is altruistic, yet cooperative clusters can survive and spread.
This is not chess, where one must lose for the other to win.
In a non-zero-sum world, you don’t “win” by beating the other player — you win by extracting the most reward from the environment.
Cooperation unlocks rewards that defection cannot reach.
Why some games have no “average” outcome — and what that means for races.
Normal distributions have a scale: most outcomes cluster around the average.
Log-normal distributions come from multiplicative growth. Power laws go further: on log-log axes, the tail becomes a straight line.
A coin is flipped until heads. The payout doubles each round.
E=∑n=1∞2n1⋅2n=∑n=1∞1=∞
The expected value is infinite — yet no rational person would pay $1,000 to play.
This is the mathematical backbone of power laws: tiny probability × massive payout skews the entire system.
Heat a magnet to its Curie temperature. At the critical point:
At criticality, technical details stop mattering — only universality classes remain.
Some systems drive themselves to the critical point.
Forest fires: suppress all small fires → the forest grows too dense → a single lightning strike causes a mega-fire. The cause of a small event and a catastrophe is the same — only the state of the system determines the outcome.
Sandpiles: add grains one by one. The pile self-organizes to a critical slope where avalanches follow a power law.
In networks, new nodes connect to already-popular nodes. The rich get richer.
This “snowball effect” creates power-law distributions where a few hubs (Google, YouTube) dominate the entire network.
The same dynamic applies to AI labs, capital flows, and talent concentration.
| Feature | Normal | Power Law |
|---|---|---|
| Randomness | Additive | Multiplicative |
| Scale | Has inherent scale | Scale-free (fractal) |
| Outliers | Mathematically rare | Dominate the average |
| Examples | Height, IQ | Earthquakes, wealth, citations |
| Strategy | Be consistent | Be persistent, take many bets |
VCs aren’t playing a normal-distribution game. They’re playing St. Petersburg.
One mega-winner pays for 99 failures. The rational strategy is to fund everything that could be huge, regardless of individual risk.
This is the same logic as “if I don’t do it, someone else will” — but now with multiplicative stakes.
The prisoner’s dilemma meets the power law.
What happens when game theory and power-law incentives collide in AI.
As competition intensifies, safety investment drops.
Limited interactions promote cooperation.
When everyone competes with everyone, cooperation collapses.
Network structure determines whether norms can survive.
High-frequency local feedback can overpower low-frequency grounding.
When every agent forms a private dyad, context fragments: the population loses shared reality while coordinated players retain an advantage.
Discussion, questions, and feedback are welcome!