Defining the paradox
After a team adopts tools like Claude Code, Copilot, or Cursor, commits per person grow 30–60%, yet Lead Time for Changes often stays flat and Change Failure Rate (CFR) ticks up a few percentage points.
Three mechanisms
1. Code review bottlenecks
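The same team that used to produce 8 PRs a day now produces 18, while reviewers still work at the old pace. Once PRs arrive faster than they can be reviewed, the queue grows steadily and review wait time grows with it, without bound. Even just below capacity, wait time rises sharply rather than in proportion to load.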
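A rough way to see why wait time reacts so strongly is a single-server queueing model. The sketch below assumes an M/M/1 queue, which is a simplification and not something measured here; the function name `review_queue` and the capacity of 10 PRs per day are hypothetical, chosen only so that 8 PRs a day fits and 18 does not.

```python
def review_queue(arrivals_per_day: float, capacity_per_day: float) -> dict:
    """Estimate review delay with an M/M/1 model (an assumed simplification)."""
    utilization = arrivals_per_day / capacity_per_day
    if utilization >= 1.0:
        # No steady state: the backlog grows by the surplus every day,
        # so the delay for a newly opened PR keeps increasing.
        return {
            "utilization": utilization,
            "stable": False,
            "backlog_growth_per_day": arrivals_per_day - capacity_per_day,
            "mean_days_in_review": float("inf"),
        }
    # M/M/1 mean time in system (waiting plus review): W = 1 / (mu - lambda)
    return {
        "utilization": utilization,
        "stable": True,
        "backlog_growth_per_day": 0.0,
        "mean_days_in_review": 1.0 / (capacity_per_day - arrivals_per_day),
    }


CAPACITY = 10.0  # hypothetical: PRs the reviewers can clear per day

for prs_per_day in (8, 9, 9.5, 18):
    print(prs_per_day, review_queue(prs_per_day, CAPACITY))
```

Under these assumptions, moving from 8 to 9.5 PRs a day quadruples the mean time in review (0.5 to 2 days), and at 18 PRs a day there is no steady state at all: the backlog gains 8 PRs every day.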
2. Test surface explosion
Auto-generated code runs, but the author has not necessarily traced through it and convinced themselves it is correct. CI becomes the only filter; if it is slow or flaky, defects slip through.
3. Ownership decay
When the code is “my AI’s, not mine”, engineers are less willing to stand behind it in production. MTTR (mean time to restore) grows because nobody knows why the code ever worked.
Counter-measures
- AI in the review loop — a bot flags risky changes, suggests tests, blocks commits without assertions.
- Smaller batch size — a hard PR line limit (e.g. 400) forces decomposition.
- Failure budget — CFR above 10% halts merges for the day.
- Authorship rotation — every AI-generated PR has a “human reviewer of record” accountable for 14 days.
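Two of these counter-measures, the PR line limit and the failure budget, are simple enough to sketch as a merge gate. This is an illustration under assumptions, not a reference implementation: `PullRequest`, `merge_blockers`, and the `has_test_assertions` flag are hypothetical names, and the window over which CFR is computed is left to the caller because the list above does not specify one.

```python
from dataclasses import dataclass
from typing import List

MAX_PR_LINES = 400   # hard PR line limit from the list above
CFR_BUDGET = 0.10    # failure budget: CFR above 10% halts merges


@dataclass
class PullRequest:
    lines_changed: int
    has_test_assertions: bool  # hypothetical flag set by the review bot


def change_failure_rate(deployments: int, failed_deployments: int) -> float:
    """Share of deployments that caused a failure in the chosen window."""
    return failed_deployments / deployments if deployments else 0.0


def merge_blockers(pr: PullRequest, deployments: int,
                   failed_deployments: int) -> List[str]:
    """Return the reasons a PR may not merge; an empty list means it may."""
    reasons = []
    if pr.lines_changed > MAX_PR_LINES:
        reasons.append(f"PR exceeds {MAX_PR_LINES} changed lines; split it")
    if not pr.has_test_assertions:
        reasons.append("no test assertions found")
    cfr = change_failure_rate(deployments, failed_deployments)
    if cfr > CFR_BUDGET:
        reasons.append(f"CFR {cfr:.0%} is over budget; merges halted for the day")
    return reasons


# Hypothetical window: 50 deployments, 6 of which failed (CFR 12%).
print(merge_blockers(PullRequest(lines_changed=900, has_test_assertions=False), 50, 6))
```

Returning a list of reasons rather than a boolean keeps the gate explainable: the author sees every rule that fired, not just the first one.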
What to measure
If you’re rolling out an AI assistant, instrument these four metrics simultaneously:
- PRs / engineer / week (adoption proxy),
- Lead Time for Changes (bottleneck proxy),
- CFR (quality proxy),
- MTTR (ownership proxy).
Without that quartet, the “AI productivity” conversation is anecdotal — and the board doesn’t buy anecdotes.
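As a minimal sketch of what instrumenting the quartet could look like, the snippet below computes all four numbers from one list of change records. The `Change` record and the `quartet` function are hypothetical, and the sketch assumes one record per merged PR, lead time measured from commit to deployment, and MTTR measured from the failing deployment to restoration.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import List, Optional


@dataclass
class Change:
    """One merged PR (hypothetical record shape)."""
    author: str
    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool
    restored_at: Optional[datetime] = None  # set only for failed changes


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def quartet(changes: List[Change], weeks: float) -> dict:
    """Compute the four metrics over the same set of changes."""
    if not changes or weeks <= 0:
        raise ValueError("need at least one change and a positive window")
    engineers = {c.author for c in changes}
    failures = [c for c in changes if c.caused_failure]
    restore_hours = [hours(c.deployed_at, c.restored_at)
                     for c in failures if c.restored_at is not None]
    return {
        "prs_per_engineer_per_week": len(changes) / len(engineers) / weeks,
        "lead_time_hours_median": median(
            hours(c.committed_at, c.deployed_at) for c in changes),
        "cfr": len(failures) / len(changes),
        "mttr_hours": mean(restore_hours) if restore_hours else None,
    }
```

Computing all four from the same records and the same window is the point: a rise in PRs per engineer can then be read directly against what happened to lead time, CFR, and MTTR over that period.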