Who Reviewed the Reviewer?
This post is about a realization: what happens when AI becomes the default reviewer and starts learning from its own reflections. We’re not just debugging code anymore. We’re debugging the system that teaches itself how to review.
I didn’t set out to write this.
I was just trying to fix a bug.
It was a subtle one — buried in an AI-generated PR that looked perfectly reasonable on the surface. But when I traced it, I realized the flaw had been introduced by the same AI that was now helping me review it.
That’s when the loop hit me.
I was writing code.
The AI was generating code.
Then another AI was reviewing that code.
And then I was reviewing the reviewer.
Not just for quality — for judgment, for context, for pattern recognition.
It felt like a hall of mirrors.
Reviewers reviewing reflections of themselves.
Rules enforced by patterns the system had learned… from itself.
A bug introduced by a model, reviewed by another model, approved by a pipeline.
Then me, standing at the end of it — wondering where the intent had gone.
And that’s when the question landed:
What happens when code review is no longer a human ritual — but an autonomous layer of the machine?
We Used to Call This a Conversation
Code review used to feel like a conversation.
Sometimes painful, sometimes enlightening.
But undeniably human.
It’s where you picked up tricks. Learned unwritten norms. Saw how someone else solved a problem you didn't even know was hard yet.
I’ve been watching that ritual shift. Not all at once — but piece by piece. First it was the linters. Then the test coverage gates. Now the AI reviewer chimes in before the human does — or sometimes instead of a human at all.
And look — I get it.
Code reviews were slow. Inconsistent. Burdened by personality, bandwidth, and Friday afternoon energy.
But they were also where the team became a team.
Now that function is being automated — at speed, at scale, without blinking.
And I’m not sure we’re ready for what that changes.
What I’m Seeing (And Maybe You Are Too)
We’re deep into what TechRadar calls “Gen‑3” AI tooling — where agents don’t just autocomplete, they review, score, explain, and even rewrite.
I’ve watched GitHub Copilot fix a bug it introduced. I’ve asked one AI to double-check another’s PR. I’ve had systems explain themselves in a human tone, with confidence levels, citations, and next-step links.
And that’s the moment I had to stop and ask:
Are we still reviewing the code?
Or are we reviewing the review system itself?
What We Gain Feels Real
- Speed: No more waiting for someone to be “back online.”
- Coverage: Every logic branch, every pattern — scanned without fatigue.
- Consistency: The same rules applied, every time, without negotiation.
What We’re Losing Feels Familiar
- Context Collapse: The AI flags a pattern without understanding the fragile legacy database it touches.
- Tribal Amnesia: There’s no sidebar. No “we tried that and it broke prod.” Just comments in a vacuum.
- Reviewer Fatigue: When the bot leaves 43 notes and only two matter, you start ignoring all of them.
- Trust Drift: You fix the bug the AI introduced… by training the AI to catch it next time. And you wonder if this is what teaching means now.
We're trading human wisdom, with all its flaws, for machine velocity, with all its blind spots.
The Dilution Question
Here’s what really keeps me thinking:
Right now, the AI systems reviewing our code are still grounded in data written by humans. They’ve learned from our styles, our patterns, our bugs, and our best practices.
But what happens when that ratio flips?
When the majority of code — and the reviews of that code — come from AI?
When the machine starts learning primarily from its own output?
Every time an AI writes code, and another AI reviews it, and that result becomes part of a future model’s training set… the signal gets blurrier. The context thins. The intent dissolves.
Eventually, you’re not learning from real software decisions anymore.
You’re learning from an echo of an echo.
🔎 Sidebar: What the Industry Calls This
Researchers have names for what I’m describing here:
- Model Collapse: When generative systems are trained too heavily on synthetic data, they degrade — losing clarity, creativity, and groundedness.
- Synthetic Data Drift: When AI-generated output becomes the norm, and the training loop amplifies its own noise.
In code, that means patterns might start to reinforce themselves without ever being questioned. Rules harden into dogma. Bugs get re-learned. And no one remembers where the decision came from.
(See: The Curse of Recursion, arXiv:2305.17493)
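To make the dilution concrete, here is a deliberately crude toy in Python (my own sketch, not the paper’s method). It treats a codebase as a bag of distinct “patterns,” where every new generation is trained only on samples of the previous generation’s output. The pattern count, sample size, and Dirichlet starting distribution are all invented for illustration.

```python
# Toy model of recursive training (illustrative only).
# A "pattern" that misses one generation's sample is gone for good,
# so diversity can only shrink as models train on models.
import numpy as np

rng = np.random.default_rng(42)

n_patterns = 1000                            # distinct approaches in the original human corpus
probs = rng.dirichlet(np.ones(n_patterns))   # generation 0: learned from human-written code
sample_size = 5000                           # synthetic examples each new generation trains on

for generation in range(31):
    if generation % 5 == 0:
        surviving = int((probs > 0).sum())
        print(f"gen {generation:2d}: {surviving:4d} distinct patterns survive")
    # The next model sees only what the current model emitted.
    draws = rng.choice(n_patterns, size=sample_size, p=probs)
    counts = np.bincount(draws, minlength=n_patterns)
    probs = counts / sample_size
```

Run it and the surviving-pattern count only goes down. That’s the echo of an echo, in miniature: nothing looks wrong in any single generation, but the tails quietly disappear.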
So maybe the long-term question isn’t just who reviews the reviewer, but:
What happens when there’s no human reviewer left in the loop?
Do we shift toward rigid specs and formal verification to restore signal?
Do we accept statistical software as “good enough”?
Or do we draw a line — not to stop the system, but to remind it where it came from?
A New Job Description Is Taking Shape
I’m not a reviewer anymore.
I’m a reviewer-of-reviewers.
Sometimes I’m debugging the bug.
Sometimes I’m debugging the process that let the bug through.
Sometimes I’m debugging the AI that reviewed the process that let the bug through.
And weirdly — it’s starting to make sense.
The layers aren’t going away.
They’re compounding.
And the only way forward is to accept that the system is now part of the team.
📡 This Is What I Think Happens Next:
- AI reviewers will escalate comments to humans when confidence is low
- “Reviewer dashboards” will track who — or what — reviewed each change
- We’ll have governance reviews of the AI reviewer itself
- PRs will include a trace of not just what changed, but who made the decision — and whether that “who” was a person or a model (a sketch of what such a trace might record follows below)
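None of this exists yet as far as I know, so here is a purely hypothetical sketch in Python of what a per-PR review trace might record. Every name in it (ReviewEvent, ReviewTrace, needs_human, the 0.7 threshold) is made up; the point is only that “who reviewed this, and was it a person or a model” becomes a queryable fact instead of folklore.

```python
# Hypothetical data shapes only; no existing review tool defines these.
from dataclasses import dataclass, field
from typing import Literal, Optional

@dataclass
class ReviewEvent:
    reviewer: str                                    # e.g. "copilot-review" or "jane@example.com"
    reviewer_kind: Literal["human", "model"]         # who, or what, made the call
    decision: Literal["approve", "request_changes", "escalate"]
    confidence: Optional[float] = None               # models report one; humans usually don't
    notes: str = ""

@dataclass
class ReviewTrace:
    pr_id: str
    events: list[ReviewEvent] = field(default_factory=list)

    def needs_human(self, threshold: float = 0.7) -> bool:
        """Escalate when a model reviewer is unsure, or when no human has looked at all."""
        low_confidence = any(
            e.reviewer_kind == "model"
            and e.confidence is not None
            and e.confidence < threshold
            for e in self.events
        )
        no_human_yet = all(e.reviewer_kind != "human" for e in self.events)
        return low_confidence or no_human_yet
```

The escalation rule in needs_human is the part I’d bet on first: low-confidence model reviews get routed to a person, and a PR with no human event at all never merges silently.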
Final Thought
I used to ask: “Who wrote this?”
Now I ask:
“Who reviewed the reviewer?”
“And who trained the reviewer who reviewed the reviewer?”
The recursion isn’t the problem.
It’s the loss of original input.
The dilution of human intent under layers of AI-generated assumptions.
So maybe the real work isn’t just writing code.
Maybe it’s holding the line on what clarity looks like.
Maybe we’re not just debugging software anymore.
We’re debugging the lineage of logic — the chain of reviewers, reflections, and rules that shape what gets shipped.
Because if we stop seeding the system with real thinking,
all we’ll get back is the system, staring into a mirror.
A hall of reflections.
Clear. Precise. And hollow.