Pull request comments used to be the slowest, most annoying, and most underrated training program in software engineering. A senior would leave a note about a race condition. A staff engineer would ask why this service owned that data. A tech lead would push back on a naming choice and force a thirty-message thread about domain boundaries. Nobody enjoyed it. Everybody learned from it. Then agentic coding tools showed up, started generating diffs faster than humans could read them, and the ritual quietly collapsed.
Ankit Jain, cofounder of Aviator and the engineer behind The Hangar DevEx community, has been tracking what happens next. His argument, laid out in a recent piece for CIO, is blunt: code review as a knowledge-transfer mechanism is effectively dead, and most engineering organizations have not noticed yet because their dashboards still show green (CIO). Throughput is up. Cycle time is down. Senior engineers are reviewing fewer lines per merge and signing off faster. That looks like a win until you ask who still understands the system.
The hidden ritual that PR comments actually served
Adrienne Braganza Tacke wrote a whole book called Looks Good to Me arguing that code review was never really about catching bugs. Linters and tests catch bugs. Code review was about socializing context: why a module exists, which invariants matter, which past incident produced the current shape of an interface. The comments were the curriculum. The diff was just the prompt.
When the author of a PR is a human who struggled for two days to make the change work, a reviewer's question lands somewhere. The author has a model in their head, the reviewer challenges it, and the model updates. When the author is an agent that produced 800 lines in 90 seconds and the human submitter has only skimmed it, the reviewer's question lands nowhere. There is no mental model to update on either side, because nobody built one in the first place. The PR thread becomes a polite ceremony around code that no human in the loop fully owns.
Cognitive debt is the new kind of tech debt
Margaret-Anne Storey at the University of Victoria has been using the phrase cognitive debt for the gap that opens up when developers ship code they did not have to reason through. It compounds the same way tech debt does, except it lives in people instead of repositories, which makes it much harder to pay down. You cannot refactor a team's missing intuition about why the payments service retries on a 409.
An Anthropic study of 52 engineers using AI assistants put numbers on the effect. The AI-assisted group scored 17 percent lower on comprehension questions about code they had just shipped than the control group did about code they had written themselves. They were faster. They were also measurably less able to explain what the code did, why it was structured that way, or how it would fail. Multiply that by a few quarters of staffing changes and you get an organization where the people on call do not know the systems they are paged about.
Decisions over diffs: moving the checkpoint upstream
The teams adapting fastest are not trying to revive line-by-line review. They are moving the human checkpoint upstream, to the point where decisions get made instead of the point where code gets written. The reviewable artifact is the spec, the interface contract, the test plan, the migration strategy. Once those are agreed, the agent is allowed to produce the diff and the PR review becomes a verification step rather than a teaching moment.
This is closer to how hardware teams have always worked, and closer to how the broader work-management stack is starting to be rebuilt. As CIO noted in a parallel piece, most ticketing and review workflows were designed around human latency, with handoffs and queues that assume a person is the bottleneck at every step. When the producer is an agent, those queues stop making sense, and the meaningful checkpoints move to wherever a human still needs to commit to a direction.
Three patterns engineering orgs are trying
First, mandatory design docs for any change above a threshold of complexity, with the doc reviewed by humans before any agent is allowed to touch the repo. The doc becomes the artifact juniors learn from. The PR becomes a compliance check against the doc.
Second, paired agent sessions where a senior and a junior drive the agent together, talking through prompts and rejections out loud. It looks like pair programming from 2015 with one of the chairs replaced by a model. The knowledge transfer happens in the verbal layer around the tool, not in the diff the tool produces.
Third, comprehension audits. Random samples of merged PRs get pulled into a weekly session where the nominal author has to walk through the change without the agent open. Teams running this report uncomfortable first months and meaningful recovery of system understanding by month three. It also surfaces who is shipping code they cannot defend, which is information worth having before the on-call rotation surfaces it for you.
None of these are free. All of them slow down the headline throughput numbers that made leadership fall in love with agentic coding in the first place. That tension is the actual management problem, and it is going to get worse as the underlying models keep getting cheaper and faster, a dynamic we covered when Microsoft and Google opened a price war on AI coding models.
What we'd implement Monday to keep knowledge flowing through agentic teams
For engineering organizations above 30 people, three concrete moves are worth putting on the board this quarter.
One: institute a design-doc gate for any change touching a service boundary, a data model, or a public API. The doc is human-written, human-reviewed, and explicitly the place where context gets transferred. Agents are allowed downstream of the doc, not upstream. Measure adoption by the ratio of merged PRs that link to an approved doc, and make that ratio a staff-engineer-level metric, not a junior compliance task.
Two: run a weekly comprehension review. Pick three merged PRs at random, put the authors in a room, and have them whiteboard the change cold. Track who can and who cannot, but do not punish the failures in the first quarter. Use the data to decide where to invest in deliberate practice and where the agent is being asked to carry work the team should be doing itself. This also doubles as a hiring-bar signal that resists the kinds of monoculture problems Lattice and others have flagged in AI-driven hiring pipelines.
Three: budget for AI usage at the team level, not the seat level, and publish the burn weekly. Cognitive debt grows fastest where AI consumption is invisible to the people whose systems are accumulating it. Make the spend legible, and the conversation about whether a team is using the tool to learn faster or to avoid learning becomes a normal staff-meeting topic instead of a year-end surprise.
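For teams that want to see what those three measurements look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the MergedPR record, its field names, and the (team, dollars) usage tuples are hypothetical and do not correspond to any particular code host or billing export. It computes the share of merged PRs that link to an approved doc, draws a random sample of PRs for the weekly comprehension review, and totals AI spend per team.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergedPR:
    # Hypothetical record; field names are illustrative only.
    number: int
    author: str
    team: str
    design_doc: Optional[str]  # link to an approved design doc, or None


def doc_link_ratio(prs):
    """Share of merged PRs that link to an approved design doc."""
    if not prs:
        return 0.0
    linked = sum(1 for pr in prs if pr.design_doc)
    return linked / len(prs)


def pick_for_review(prs, k=3, seed=None):
    """Randomly pick up to k merged PRs for the weekly comprehension review."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(prs, min(k, len(prs)))


def weekly_burn_by_team(usage):
    """Total AI spend per team from (team, usd) records for one week."""
    totals = defaultdict(float)
    for team, usd in usage:
        totals[team] += usd
    return dict(totals)
```

In this sketch a week's list of MergedPR records would feed doc_link_ratio and pick_for_review, while weekly_burn_by_team would take whatever usage records the AI vendor's billing export can be reduced to. The sampling is deliberately uniform so that nobody can predict which change they will be asked to explain.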
The bill is already arriving
If you think the spend conversation is theoretical, Uber just made it concrete. The company capped employee AI spending this week after blowing through its annual budget in four months, according to TechCrunch. The interesting part is not the overrun. The interesting part is that nobody inside Uber could say, with confidence, what the company got for the money. Throughput numbers, sure. Shipped features, sure. A workforce that understands its own codebase better than it did in January? Nobody was measuring that. That is the shape of the next two years of engineering leadership: the orgs that survive agentic coding will be the ones that treated comprehension as a first-class metric while everyone else was still counting merged PRs.



