Game Theory & AI Safety

From the prisoner’s dilemma to collective AI behavior risks

Part I: Game Theory

How rational agents make decisions when outcomes depend on each other.

The Nuclear Stakes

1949 — The Soviets have the bomb.

“With the Russians it is not a question of whether but of when. If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not one o’clock?”

— John von Neumann, LIFE Magazine, 1957

The Prisoner’s Dilemma

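Two players each choose to cooperate or defect. In a single round, defecting is the dominant strategy: it pays more whatever the other player does (Temptation > Reward and Punishment > Sucker). Yet mutual defection leaves both players worse off than mutual cooperation. “If I don’t do it, someone else will.”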

		Someone else
		Cooperate	Defect
Me	Cooperate	Reward	Sucker
Me	Defect	Temptation	Punishment
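The dominance argument can be checked mechanically. The sketch below assumes the commonly used payoff values T = 5, R = 3, P = 1, S = 0 (the slide only names the four outcomes), and `best_response` is a hypothetical helper name.

```python
# Illustrative payoff values (an assumption; the slide only names the outcomes).
T, R, P, S = 5, 3, 1, 0  # Temptation, Reward, Punishment, Sucker

# PAYOFF[(my_move, their_move)] -> my payoff; "C" = cooperate, "D" = defect
PAYOFF = {
    ("C", "C"): R,
    ("C", "D"): S,
    ("D", "C"): T,
    ("D", "D"): P,
}

def best_response(their_move):
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, their_move)])

# Defection is the best response to either move, so it is a dominant strategy...
defect_dominates = all(best_response(move) == "D" for move in "CD")
# ...yet mutual defection pays each player less than mutual cooperation.
dilemma = defect_dominates and PAYOFF[("D", "D")] < PAYOFF[("C", "C")]

print(defect_dominates, dilemma)
```

Any payoffs with T > R > P > S give the same result; only the ordering matters for the one-shot dilemma.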

[Interactive demo: Me vs. Someone else. Games = 25, I defect = 50%, They defect = 50%]

Axelrod’s Tournament (1980)

Robert Axelrod invited experts to submit strategies for a repeated prisoner’s dilemma. We’ll start with Random as the baseline, then add student strategies one by one.
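A round-robin of this kind can be sketched in a few lines. This is a minimal illustration, not Axelrod's actual setup: the payoff values (5, 3, 1, 0), the four example strategies, the 200-round match length, and the omission of self-play are all assumptions made here.

```python
import random
from itertools import combinations

T, R, P, S = 5, 3, 1, 0  # assumed payoff values
# PAYOFF[(move_a, move_b)] -> (payoff to a, payoff to b)
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

# Each strategy sees its own history, the opponent's history, and a random source.
def random_strategy(my_hist, their_hist, rng):
    return rng.choice("CD")

def tit_for_tat(my_hist, their_hist, rng):
    return their_hist[-1] if their_hist else "C"

def always_defect(my_hist, their_hist, rng):
    return "D"

def grudger(my_hist, their_hist, rng):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_hist else "C"

def play_match(strat_a, strat_b, rounds=200, seed=0):
    """Play one repeated game and return both total scores."""
    rng = random.Random(seed)
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b, rng)
        move_b = strat_b(hist_b, hist_a, rng)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def tournament(strategies, rounds=200):
    """Every strategy meets every other strategy once; return total scores."""
    totals = {name: 0 for name in strategies}
    for a, b in combinations(strategies, 2):
        score_a, score_b = play_match(strategies[a], strategies[b], rounds)
        totals[a] += score_a
        totals[b] += score_b
    return totals

strategies = {"Random": random_strategy, "TitForTat": tit_for_tat,
              "AlwaysDefect": always_defect, "Grudger": grudger}
for name, score in sorted(tournament(strategies).items(), key=lambda kv: -kv[1]):
    print(name, score)
```

Rankings depend heavily on which strategies are in the pool, which is why adding entries one by one is instructive.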

[Interactive tournament: Rounds = 200, Noise = 0%, with an option to show literature strategies]

The Noise Problem

In the real world, noise — misunderstandings, accidents — can trigger a death spiral of endless retaliation.

Stanislav Petrov, 1983: a Soviet officer correctly judged a missile early-warning alert to be a false alarm, a “noise” error in the detection system, and so helped avert a possible nuclear war.

One possible repair: forgive some defections to restore cooperation. Under the classic payoff matrix, a stable forgiveness rate can be about one third.
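The one-third figure can be reproduced under two assumptions that the slide does not state: the classic payoffs T = 5, R = 3, P = 1, S = 0, and the generous tit-for-tat rule usually attributed to Nowak and Sigmund, g = min(1 - (T - R)/(R - S), (R - P)/(T - P)). The simulation below is a rough sketch comparing plain and forgiving tit-for-tat in noisy self-play; the 5% noise level is illustrative.

```python
import random

T, R, P, S = 5, 3, 1, 0  # assumed classic payoff values
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def generosity(T, R, P, S):
    """Forgiveness probability from the generous tit-for-tat formula (assumed here)."""
    return min(1 - (T - R) / (R - S), (R - P) / (T - P))

def noisy_self_play(forgive, noise=0.05, rounds=20000, seed=1):
    """Average per-round payoff when two copies of the same rule meet under noise."""
    rng = random.Random(seed)
    last = ["C", "C"]  # each player's previous executed move
    total = 0
    for _ in range(rounds):
        moves = []
        for me in (0, 1):
            other_last = last[1 - me]
            # Copy the opponent's last move, but forgive a defection
            # with probability `forgive`.
            move = "C" if other_last == "C" or rng.random() < forgive else "D"
            # Noise: the executed move is flipped with probability `noise`.
            if rng.random() < noise:
                move = "D" if move == "C" else "C"
            moves.append(move)
        total += PAYOFF[(moves[0], moves[1])] + PAYOFF[(moves[1], moves[0])]
        last = moves
    return total / (2 * rounds)

g = generosity(T, R, P, S)
plain = noisy_self_play(forgive=0.0)
generous = noisy_self_play(forgive=g)
print(f"forgiveness rate {g:.3f}: plain {plain:.2f} vs generous {generous:.2f} per round")
```

With `forgive=0.0` a single flipped move echoes back and forth indefinitely; a nonzero forgiveness rate lets the pair fall back into mutual cooperation.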

Emergence of Cooperation

Each agent simply copies the neighbor who earned the highest payoff. No one is altruistic, yet cooperative clusters can survive and spread.
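The copy-the-best-neighbor rule can be sketched on a small grid. This is a simplified version under stated assumptions: agents are pure cooperators or defectors (no repeated-game strategies or mutation), each plays its eight neighbors on a wrap-around grid, cooperation between two cooperators pays 1, defecting against a cooperator pays a temptation of b = 1.5, and everything else pays 0.

```python
import random

def step(grid, b=1.5):
    """One generation: play all neighbors, then copy the best-scoring cell nearby."""
    n = len(grid)
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

    def neighbors(r, c):
        return [((r + dr) % n, (c + dc) % n) for dr, dc in offsets]

    # Score each cell: a cooperator earns 1 per cooperating neighbor,
    # a defector earns b per cooperating neighbor.
    score = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            coop_neighbors = sum(grid[rr][cc] for rr, cc in neighbors(r, c))
            score[r][c] = coop_neighbors * (1.0 if grid[r][c] else b)

    new_grid = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            best_r, best_c = r, c  # keep own strategy unless a neighbor did strictly better
            for rr, cc in neighbors(r, c):
                if score[rr][cc] > score[best_r][best_c]:
                    best_r, best_c = rr, cc
            new_grid[r][c] = grid[best_r][best_c]
    return new_grid

def cooperation_share(grid):
    return sum(map(sum, grid)) / len(grid) ** 2

rng = random.Random(0)
n = 30
grid = [[1 if rng.random() < 0.8 else 0 for _ in range(n)] for _ in range(n)]  # 1 = cooperate
for generation in range(30):
    grid = step(grid)
print(f"cooperators after 30 generations: {cooperation_share(grid):.0%}")
```

Cells in the interior of a cooperative cluster earn more than the defectors on its edge, which is how a cluster can hold its ground or grow even though no agent is altruistic.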

[Interactive simulation: Temptation = 1.50, Match rounds = 20, Mutation = 0.2%. Readouts: cooperative moves, generation, dominant strategy. Legend: good, mixed, bad.]

Four Traits of Successful Strategies

Nice — never be the first to defect
Retaliatory — if the opponent defects, defect back immediately
Forgiving — if the opponent returns to cooperating, stop retaliating
Clear — your strategy should be predictable so others can learn to trust you

Life Is Non-Zero-Sum

This is not chess, where one must lose for the other to win.

In a non-zero-sum world, you don’t “win” by beating the other player — you win by extracting the most reward from the environment.

Cooperation unlocks rewards that defection cannot reach.

Part II: Venture Capital

Why some games have no “average” outcome — and what that means for races.

Normal, Log-Normal, Power Law

Normal distributions have a scale: most outcomes cluster around the average.

Log-normal distributions come from multiplicative growth. Power laws go further: on log-log axes, the tail becomes a straight line.
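One way to feel the difference is to ask how much of the total the top 1% of samples hold. The sketch below uses illustrative parameters chosen here, not taken from the slides: a normal with mean 100 and standard deviation 15, a log-normal with mu = 0 and sigma = 1, and a Pareto with shape 1.4 (a density exponent of 2.4).

```python
import random

def top_share(samples, fraction=0.01):
    """Share of the total sum held by the largest `fraction` of samples."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)
n = 100_000
normal = [rng.gauss(100, 15) for _ in range(n)]           # additive noise
lognormal = [rng.lognormvariate(0, 1) for _ in range(n)]  # multiplicative growth
pareto = [rng.paretovariate(1.4) for _ in range(n)]       # power-law tail

shares = {
    "normal": top_share(normal),
    "log-normal": top_share(lognormal),
    "power law": top_share(pareto),
}
for name, share in shares.items():
    print(f"{name}: top 1% hold {share:.1%} of the total")
```

In the normal case the top 1% hold barely more than 1% of the total; in the heavy-tailed cases a handful of samples carry a large part of the sum.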

[Interactive plot with linear, log-x, and log-log axis options. Power-law exponent = 2.400, mean = 3.5]

The St. Petersburg Paradox

A fair coin is flipped until it lands heads. The payout doubles with every flip: if the first heads comes on flip $n$, the game pays $2^n$ dollars.

$E = \sum_{n=1}^{\infty} \frac{1}{2^n} \cdot 2^n = \sum_{n=1}^{\infty} 1 = \infty$

The expected value is infinite, yet hardly anyone would pay even $1,000 to play.

This is the mathematical backbone of power laws: tiny probability × massive payout skews the entire system.
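A quick simulation shows why the infinite expectation is hard to feel in practice. This sketch assumes the version of the game in which the first heads on flip n pays 2^n; `play_once` and `truncated_expectation` are names invented for this example.

```python
import random

def play_once(rng):
    """Flip until heads; if the first heads comes on flip n, the payout is 2**n."""
    n = 1
    while rng.random() < 0.5:  # tails: keep flipping
        n += 1
    return 2 ** n

def truncated_expectation(max_flips):
    """Expected payout counting only games that end within max_flips.

    Each possible flip count contributes (1/2**n) * 2**n = 1 to the sum.
    """
    return sum((0.5 ** n) * (2 ** n) for n in range(1, max_flips + 1))

rng = random.Random(0)
for plays in (100, 10_000, 100_000):
    mean = sum(play_once(rng) for _ in range(plays)) / plays
    print(f"{plays} plays: average payout {mean:.2f}")

print(truncated_expectation(10), truncated_expectation(20))
```

The truncated sum grows by exactly 1 per extra allowed flip, so the expectation diverges only very slowly, and the sample average is dominated by whichever rare long streak happens to occur.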

Criticality & Phase Transitions

Heat a magnet to its Curie temperature. At the critical point:

The system becomes scale-free and fractal
A single atom flipping can cascade across the entire material
The system is maximally unpredictable

At criticality, technical details stop mattering — only universality classes remain.

[Interactive magnet simulation: temperature 2.40, magnetization 0.00, state near critical, $T_c = 2.269$]
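The value $T_c = 2.269$ matches the exact critical temperature of the two-dimensional square-lattice Ising model, $2 / \ln(1 + \sqrt{2})$, in units where the coupling and Boltzmann's constant are 1. Assuming that is the model behind the demo, a minimal Metropolis sketch shows the magnetization surviving below $T_c$ and vanishing above it. The lattice size, sweep count, and seed are arbitrary choices made here.

```python
import math
import random

# Exact critical temperature of the 2D square-lattice Ising model (J = k_B = 1).
T_C = 2 / math.log(1 + math.sqrt(2))

def magnetization(temperature, size=16, sweeps=200, seed=0):
    """Run Metropolis updates from an all-up state; return final |magnetization| per spin."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    for _ in range(sweeps * size * size):
        r, c = rng.randrange(size), rng.randrange(size)
        neighbor_sum = (spins[(r + 1) % size][c] + spins[(r - 1) % size][c]
                        + spins[r][(c + 1) % size] + spins[r][(c - 1) % size])
        delta_e = 2 * spins[r][c] * neighbor_sum  # energy cost of flipping this spin
        # Accept flips that lower the energy; accept others with Boltzmann probability.
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[r][c] = -spins[r][c]
    return abs(sum(map(sum, spins))) / size ** 2

m_cold = magnetization(1.5)  # well below T_C: spins stay aligned
m_hot = magnetization(3.5)   # well above T_C: alignment melts away
print(f"T_c = {T_C:.3f}, |m| at T=1.5: {m_cold:.2f}, |m| at T=3.5: {m_hot:.2f}")
```

Close to `T_C` the result fluctuates strongly from run to run, which is the small-lattice signature of the critical behavior described above.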

Self-Organized Criticality

Some systems drive themselves to the critical point.

Forest fires: suppress all small fires → the forest grows too dense → a single lightning strike causes a mega-fire. The cause of a small event and a catastrophe is the same — only the state of the system determines the outcome.

Sandpiles: add grains one by one. The pile self-organizes to a critical slope where avalanches follow a power law.
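The sandpile rule is short enough to write out. The sketch below follows the standard Bak-Tang-Wiesenfeld convention, which is an assumption about what the slide has in mind: a cell holding four or more grains topples, sending one grain to each of its four neighbors, and grains that fall off the edge are lost. Grid size and grain count are arbitrary.

```python
import random
from collections import Counter

def add_grain(grid, r, c):
    """Drop one grain at (r, c), topple until stable, return the number of topplings."""
    n = len(grid)
    grid[r][c] += 1
    unstable = [(r, c)]
    topplings = 0
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < 4:
            continue  # already relaxed by an earlier toppling
        grid[i][j] -= 4
        topplings += 1
        if grid[i][j] >= 4:
            unstable.append((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n:  # grains falling off the edge are lost
                grid[ni][nj] += 1
                if grid[ni][nj] >= 4:
                    unstable.append((ni, nj))
    return topplings

rng = random.Random(0)
n = 15
grid = [[0] * n for _ in range(n)]
sizes = [add_grain(grid, rng.randrange(n), rng.randrange(n)) for _ in range(5000)]

histogram = Counter(size for size in sizes if size > 0)
print("grains that caused no avalanche:", sizes.count(0))
print("avalanches of size 1:", histogram[1])
print("largest avalanche:", max(sizes))
```

Every grain is dropped the same way; whether it causes nothing or a grid-wide avalanche depends only on the state the pile has built up.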

[Interactive forest-fire simulation. Readouts: forest cover, last fire, largest fire. Control: growth.]

Preferential Attachment

In many growing networks, new nodes are more likely to connect to already-popular nodes. The rich get richer.

This “snowball effect” creates power-law distributions where a few hubs (Google, YouTube) dominate the entire network.

The same dynamic applies to AI labs, capital flows, and talent concentration.
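The rich-get-richer rule can be sketched as a growth process in the style of the Barabási-Albert model. The simplifications are assumptions made here: each new node adds exactly one link, and the target is chosen with probability proportional to its current degree.

```python
import random

def grow_network(num_nodes, seed=0):
    """Grow a network one node at a time; return the list of node degrees."""
    rng = random.Random(seed)
    degree = [1, 1]     # start with two connected nodes
    endpoints = [0, 1]  # one entry per edge endpoint, so a uniform draw
                        # from this list is a degree-proportional draw
    for new_node in range(2, num_nodes):
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints += [target, new_node]
    return degree

degree = grow_network(10_000)
ranked = sorted(degree, reverse=True)
hub_share = ranked[0] / sum(degree)
print("top degree:", ranked[0])
print("median degree:", ranked[len(ranked) // 2])
print(f"share of all link endpoints held by the top hub: {hub_share:.2%}")
```

The `endpoints` list is the whole trick: popular nodes appear in it more often, so they are picked more often, which makes them appear even more often.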

[Interactive network-growth simulation. Readouts: nodes, top degree, hub share. Control: pace.]

Distribution Comparison

Feature	Normal	Power Law
Randomness	Additive	Multiplicative
Scale	Has inherent scale	Scale-free (fractal)
Outliers	Mathematically rare	Dominate the average
Examples	Height, IQ	Earthquakes, wealth, citations
Strategy	Be consistent	Be persistent, take many bets

How Venture Capital Thinks

VCs aren’t playing a normal-distribution game. They’re playing St. Petersburg.

One mega-winner pays for 99 failures. The rational strategy is to fund everything that could be huge, regardless of individual risk.
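The arithmetic behind "one winner pays for 99 failures" is worth spelling out. The numbers below are hypothetical and assume 100 equal-sized investments, 99 of which return nothing.

```python
def portfolio_multiple(multiples):
    """Overall return multiple of a portfolio of equal-sized investments."""
    return sum(multiples) / len(multiples)

# Hypothetical portfolio: 99 write-offs and a single winner.
break_even = portfolio_multiple([0.0] * 99 + [100.0])  # winner returns 100x
strong_fund = portfolio_multiple([0.0] * 99 + [300.0])  # winner returns 300x

print(break_even, strong_fund)
```

Under these assumptions a 100x outcome only returns the fund's capital, so a deal is worth funding only if it could plausibly return far more than that. This is why the size of the possible upside matters more than the odds of any single bet failing.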

This is the same logic as “if I don’t do it, someone else will” — but now with multiplicative stakes.

The prisoner’s dilemma meets the power law.

[Figure: The VC Power Law Curve. A few winners produce most of the returns. Venture investments by return bucket (>10x, 5-10x, 2-5x, 1-2x, <1x), comparing deals done, cost of deals, and share of total returns (labelled value: 60%). Schematic after Horsley Bridge / a16z.]

Part III: Automation Velocity

What happens when game theory and power-law incentives collide in AI.

Alignment between Capital and Labor

[Interactive model. Scenario parameters: peak acceleration $a_{peak}$ (pp/yr), time scale $\sigma$ (yr), institutional catch-up $k$ (pp/yr²). Readouts: valley length, peak gap, integrated cost.]

Safety at Equilibrium

As competition intensifies (more teams, more enmity between them), equilibrium safety investment drops.

[Interactive model: $N$ (teams) = 5, $e$ (enmity) = 0.3]

Network Topology Matters

Limited interactions promote cooperation.

When everyone competes with everyone, cooperation collapses.

Network structure determines whether norms can survive.

Technological Folie à Deux

High-frequency local feedback can overpower low-frequency grounding.

When every agent forms a private dyad, context fragments: the population loses shared reality while coordinated players retain an advantage.

[Interactive simulation: $\lambda = 1.20$, shared component 100%, clusters 1]

Thank You

Discussion, questions, and feedback are welcome!