
How Agent-Driven Coding Is Breaking the PR Review Process

AI coding agents are pushing development velocity to unprecedented levels, but the traditional PR review process wasn't designed for this pace. Here's what's breaking and how teams can adapt.


Long Horizon Team

Engineering

The rise of AI coding agents has fundamentally changed how software gets written. Tools like Claude Code, Cursor, and GitHub Copilot are enabling developers to ship features at unprecedented speeds. But there's a growing problem that teams are starting to notice: the traditional pull request review process wasn't designed for this velocity.

The Velocity Gap

When a developer can generate hundreds of lines of working code in minutes, the bottleneck shifts dramatically. What used to be a balanced workflow—write code, submit PR, get review, iterate—has become lopsided. The code generation phase has been compressed from hours to minutes, but the review phase remains stubbornly human-paced.

Consider a typical scenario: a developer uses an AI agent to implement a new feature. The agent writes the component, adds the API integration, handles error states, and even writes some tests. Total time: 15 minutes. The PR sits in the review queue for two days.

What's Actually Breaking

The traditional PR review process rests on assumptions that no longer hold in an agent-driven workflow:

  • The author deeply understands every line. When code is AI-generated, the developer may have a high-level understanding but hasn't manually reasoned through every implementation detail.
  • Code volume is manageable. AI agents can produce large, coherent changesets that would take humans days to write. Reviewing 2,000 lines of AI-generated code is cognitively different from reviewing 2,000 lines a human wrote over a week.
  • The review catches what testing misses. With AI-generated code, subtle bugs can hide in plain sight. The code looks reasonable, follows patterns, and passes linting—but may have edge cases the reviewer won't catch by reading alone.

The Trust Problem

There's an uncomfortable truth emerging: many PR approvals have become rubber stamps. Reviewers, faced with large AI-generated changesets and pressure to maintain velocity, often do a cursory review and approve. They trust that the AI probably got it right, and that tests will catch any issues.

This isn't laziness—it's a rational response to an impossible situation. You can't deeply review code at the rate it's being produced. Something has to give.

Evidence Over Inspection

The solution isn't to slow down code generation or to hire more reviewers. It's to change what we're reviewing. Instead of asking "does this code look correct?", we should be asking "does this code demonstrably work?"

This is where agentic testing becomes essential. When the same AI agent that writes the code also plans and executes comprehensive tests, reviewers get something they've never had before: evidence.

  • Screenshots showing the feature actually renders correctly
  • Execution logs proving the happy path works
  • Network traces showing API integrations behave as expected
  • Error state coverage demonstrating graceful failure handling
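As a concrete illustration, here is the kind of agent-written test that could produce those artifacts. This is a minimal sketch using Playwright, and everything in it is an assumption for illustration: the password-reset feature, the URLs, the selectors, and the evidence paths. The point is that each scenario leaves behind something a reviewer can look at, not just a green checkmark.

```typescript
import { test, expect } from "@playwright/test";

// Log API traffic during each test so the run leaves a network trace alongside
// the screenshots. (Writing to stdout keeps the sketch simple; a real setup
// would persist this into the run's report.)
test.beforeEach(async ({ page }) => {
  page.on("response", (response) => {
    if (response.url().includes("/api/")) {
      console.log(`${response.status()} ${response.request().method()} ${response.url()}`);
    }
  });
});

test("password reset: happy path sends a confirmation", async ({ page }) => {
  await page.goto("https://app.example.com/forgot-password");
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByRole("button", { name: "Send reset link" }).click();

  // The assertion proves the flow works; the screenshot is the reviewer-facing evidence.
  await expect(page.getByText("Check your inbox")).toBeVisible();
  await page.screenshot({ path: "evidence/reset-happy-path.png", fullPage: true });
});

test("password reset: expired token fails gracefully", async ({ page }) => {
  await page.goto("https://app.example.com/reset-password?token=expired");

  // Error-state coverage: the user sees a clear message rather than a crash.
  await expect(page.getByText("This link has expired")).toBeVisible();
  await page.screenshot({ path: "evidence/reset-expired-token.png", fullPage: true });
});
```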

A New Review Workflow

Imagine a PR review where, instead of reading through hundreds of lines of code, you start by reviewing an execution log of the feature being used. You see the test plan that was executed, the edge cases that were covered, and the specific scenarios that passed.

Now your review becomes focused: Are the right things being tested? Are there scenarios missing? Does the execution evidence match what the feature should do? This is a fundamentally different—and more valuable—use of reviewer time.
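To make that concrete, here is a hedged sketch of what a reviewer-facing summary of an executed test plan might look like, expressed as data rather than prose. The shape, field names, and scenarios are illustrative assumptions, not a fixed format; the point is that the reviewer scans this plan and its outcomes before (or instead of) reading the diff.

```typescript
// Illustrative only: a made-up shape for a reviewer-facing test plan summary.
type Scenario = {
  name: string;
  kind: "happy-path" | "edge-case" | "error-state";
  status: "passed" | "failed";
  evidence: string; // link to the screenshot, log, or trace backing this result
};

const passwordResetRun: Scenario[] = [
  {
    name: "registered email receives a reset link",
    kind: "happy-path",
    status: "passed",
    evidence: "reports/run-042/happy-path.png",
  },
  {
    name: "unknown email gets the same confirmation (no account enumeration)",
    kind: "edge-case",
    status: "passed",
    evidence: "reports/run-042/unknown-email.png",
  },
  {
    name: "expired token shows a clear error",
    kind: "error-state",
    status: "passed",
    evidence: "reports/run-042/expired-token.png",
  },
  {
    name: "repeated requests from one address are rate-limited",
    kind: "edge-case",
    status: "failed",
    evidence: "reports/run-042/rate-limit.log",
  },
];

// The reviewer's questions map directly onto this list: is a scenario missing
// (say, an already-used token), and does the failed edge case block the merge?
```

The review conversation then happens at the level of scenarios and evidence links, which is where reviewer judgment actually adds value.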

The Path Forward

The PR review process isn't going away, but it needs to evolve. Teams that adapt will find they can maintain quality while embracing the velocity that AI agents enable. Those that don't will either slow down (losing competitive advantage) or rubber-stamp (accumulating technical debt and bugs).

The key insight is this: in an AI-driven development world, the artifact being reviewed should shift from "code that looks correct" to "evidence that the feature works." This isn't about trusting AI blindly—it's about verifying AI output through execution rather than inspection.

At Long Horizon, we're building tools to make this workflow practical. Our agentic testing platform lets your coding agent plan, write, and execute tests, producing shareable execution reports that give reviewers the evidence they need to approve with confidence.

The future of code review isn't reading more code—it's seeing more proof.
