Defining the paradox

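After teams adopt AI coding tools such as Claude Code, Copilot, or Cursor, commits per engineer grow by 30–60%, yet Lead Time for Changes often stays flat and Change Failure Rate (CFR) ticks up by a few percentage points. Output rises, delivery does not speed up, and quality slips: that is the paradox.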

Three mechanisms

1. Code review bottlenecks

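The same team that used to open 8 PRs a day now opens 18. Reviewers still work at the old pace, so once PRs arrive faster than they can be reviewed, the backlog grows every day. Wait time suffers even before that point: it does not rise in step with load but climbs steeply as reviewers approach full utilization, roughly in proportion to utilization / (1 − utilization) in the simplest queueing model.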
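To make that nonlinearity concrete, here is a minimal sketch that treats the whole review team as a single M/M/1 queue, where mean wait before review is Wq = ρ / (μ − λ). The review capacity of 20 PRs per day is a hypothetical figure, not from the text, and real review queues are burstier than this model assumes.

```python
import math


def mean_review_wait(arrivals_per_day: float, reviews_per_day: float) -> float:
    """Mean days a PR waits before review starts (M/M/1: Wq = rho / (mu - lambda)).

    Treats the whole review team as one server. Returns infinity when
    arrivals meet or exceed review capacity, because the queue never settles.
    """
    if arrivals_per_day >= reviews_per_day:
        return math.inf
    utilization = arrivals_per_day / reviews_per_day
    return utilization / (reviews_per_day - arrivals_per_day)


def backlog_after(days: float, arrivals_per_day: float, reviews_per_day: float) -> float:
    """Unreviewed PRs after `days` when arrivals outpace capacity (steady rates assumed)."""
    return max(0.0, (arrivals_per_day - reviews_per_day) * days)


if __name__ == "__main__":
    CAPACITY = 20  # hypothetical reviews per day for the whole team
    for load in (8, 18):
        print(f"{load} PRs/day -> mean wait {mean_review_wait(load, CAPACITY):.3f} days")
```

Under these assumptions, going from 8 to 18 PRs a day multiplies arrivals by 2.25 but mean wait by 13.5 (from 1/30 of a day to 0.45 of a day). Past capacity, `backlog_after` shows the backlog growing linearly with time.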

2. Test surface explosion

More code means more surface to test, and AI-generated code, although it runs, has not necessarily passed through its author's own mental verification the way hand-written code does. CI becomes the only filter; if the suite is slow or flaky, defects slip through.
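CI is only a reliable filter if its signal can be trusted. One common heuristic for spotting flaky tests is to flag any test that both passed and failed on the same commit, since the code did not change between those runs. The sketch below assumes CI results are exported as hypothetical `(commit_sha, test_name, passed)` records.

```python
from collections import defaultdict
from typing import Iterable


def find_flaky_tests(runs: Iterable[tuple[str, str, bool]]) -> list[str]:
    """Return tests that both passed and failed on the same commit.

    `runs` is a hypothetical export of CI results as (commit_sha, test_name, passed).
    Mixed outcomes on one commit mean the result is nondeterministic.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for commit_sha, test_name, passed in runs:
        outcomes[(commit_sha, test_name)].add(passed)
    # A set holding both True and False has length 2.
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})


if __name__ == "__main__":
    runs = [
        ("a1f", "test_login", True),
        ("a1f", "test_login", False),  # same commit, different result
        ("a1f", "test_cart", True),
        ("b2c", "test_cart", False),   # different commit: a real failure, not flakiness
    ]
    print(find_flaky_tests(runs))  # ['test_login']
```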

3. Ownership decay

When code feels like “my AI’s, not mine”, engineers are less willing to own it once it reaches production. MTTR (Mean Time to Restore) grows because, during an incident, nobody knows why the code ever worked.

Counter-measures

AI in the review loop: a bot flags risky changes, suggests tests, and blocks commits that add no test assertions.
Smaller batch size: a hard limit on PR size (e.g. 400 lines) forces decomposition.
Failure budget: if CFR rises above 10%, merges halt for the rest of the day.
Authorship rotation: every AI-generated PR has a “human reviewer of record” who stays accountable for it for 14 days.
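Three of these counter-measures are mechanical enough to enforce in a merge check. The sketch below combines the 400-line limit, the 10% failure budget, and the assertion rule into one function. The `PullRequest` fields and the `adds_test_assertions` signal are hypothetical stand-ins for whatever a review bot and deploy tracker actually report, and computing CFR over the current day's deployments is an assumption.

```python
from dataclasses import dataclass

MAX_CHANGED_LINES = 400  # hard PR size limit (example value from the text)
CFR_BUDGET = 0.10        # failure budget: halt merges when CFR exceeds 10%


@dataclass
class PullRequest:
    lines_added: int
    lines_deleted: int
    adds_test_assertions: bool  # hypothetical signal, e.g. reported by a review bot


def change_failure_rate(deploys_today: int, failed_today: int) -> float:
    """Share of today's deployments that caused a failure (0.0 if none deployed)."""
    return failed_today / deploys_today if deploys_today else 0.0


def merge_gate(pr: PullRequest, deploys_today: int, failed_today: int) -> list[str]:
    """Return reasons to block the merge; an empty list means the PR may merge."""
    reasons = []
    changed = pr.lines_added + pr.lines_deleted
    if changed > MAX_CHANGED_LINES:
        reasons.append(f"PR changes {changed} lines (limit {MAX_CHANGED_LINES}); split it")
    if not pr.adds_test_assertions:
        reasons.append("no test assertions added")
    cfr = change_failure_rate(deploys_today, failed_today)
    if cfr > CFR_BUDGET:
        reasons.append(f"CFR {cfr:.0%} exceeds budget {CFR_BUDGET:.0%}; merges halted for today")
    return reasons
```

Returning a list of reasons rather than a boolean lets the bot tell the author exactly what to fix. The reviewer-of-record rule is organizational and is deliberately left out of the gate.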

What to measure

If you’re rolling out an AI assistant, instrument these four metrics simultaneously:

PRs / engineer / week (adoption proxy),
Lead Time for Changes (bottleneck proxy),
CFR (quality proxy),
MTTR (ownership proxy).
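A minimal sketch of computing the quartet from raw records follows. The `Deployment` and `Incident` shapes are hypothetical; in practice the data would come from the VCS, the deploy pipeline, and the incident tracker. Lead time is measured from commit to production deploy and summarized with the median, an assumed choice because a few long-running changes can skew the mean.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    commit_time: datetime  # when the change was committed
    deploy_time: datetime  # when it reached production
    failed: bool           # whether it caused a failure in production


@dataclass
class Incident:
    started: datetime
    restored: datetime


def quartet(prs_merged: int, engineers: int, weeks: float,
            deploys: list[Deployment], incidents: list[Incident]) -> dict[str, object]:
    """Compute the four rollout metrics; None where there is no data yet."""
    lead_times = [d.deploy_time - d.commit_time for d in deploys]
    restore_times = [i.restored - i.started for i in incidents]
    return {
        # adoption proxy
        "prs_per_engineer_week": prs_merged / (engineers * weeks),
        # bottleneck proxy
        "median_lead_time": median(lead_times) if lead_times else None,
        # quality proxy
        "cfr": sum(d.failed for d in deploys) / len(deploys) if deploys else None,
        # ownership proxy
        "mttr": sum(restore_times, timedelta()) / len(restore_times) if restore_times else None,
    }
```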

Without that quartet, the “AI productivity” conversation is anecdotal — and the board doesn’t buy anecdotes.