[Diagram: a PR/diff fans out to three specialists -- Security Reviewer (Sonnet: OWASP / IAM / auth patterns), Architecture Reviewer (Sonnet: coupling / separation / design), Style Reviewer (Haiku: naming / formatting / docs) -- then a Reflection Pass in which each reviewer reads the others' findings and can upgrade, downgrade, or withdraw its own, producing a Unified Review sorted by severity.]
Three specialists, one reflection pass. The cross-examination catches what individuals miss.

Why fan-out alone isn't enough

The naive version of multi-agent code review is simple: run three reviewers in parallel, collect their findings, return the union. This is Pattern 3 (Parallelisation) applied mechanically.

It's better than a single agent, but it misses something. A security reviewer focused on injection and auth will look at a function differently from an architecture reviewer looking at coupling and separation of concerns. Both might notice the same vulnerable pattern -- one will call it an injection risk, the other will call it missing validation. Without a cross-examination pass, you get duplicate findings at different severity levels and no way to know which assessment is right.

The reflection pass solves this. Each reviewer sees the other two's findings and is asked: do any of these change your assessment? Are there findings you missed? Would you upgrade or downgrade any of your own findings in light of what your colleagues found?

The five-step pipeline

Step 1: Parse input (PR URL or raw diff)

Step 2: Fan-out (parallel)
  Security reviewer    (Sonnet) -- injection, auth, secrets, crypto
  Architecture reviewer (Sonnet) -- SOLID, coupling, error handling, API design
  Style reviewer       (Haiku)  -- naming, docs, complexity, dead code

Step 3: Collect findings
  Graceful degradation: if one reviewer fails, the other two continue

Step 4: Reflection pass (parallel)
  Each reviewer reads the other two's findings
  Returns: list of amendments (upgrade / downgrade / retract / add)

Step 5: Synthesis (Sonnet)
  Deduplicate overlapping findings
  Apply amendments
  Order by severity
  Produce unified verdict: approve / request_changes / needs_discussion

Steps 2 and 4 use ThreadPoolExecutor with max_workers=3. Step 3 is a simple collection pass with error checking. Step 5 is sequential -- the Synthesiser needs all amended findings before it can deduplicate.
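The fan-out step can be sketched in a few lines. The `review_in_parallel` name and the reviewer-callable shape here are illustrative, not the demo's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def review_in_parallel(diff, reviewers):
    """Step 2: run all reviewers concurrently on the same diff.

    `reviewers` maps a reviewer name to a callable that takes the
    diff and returns that reviewer's findings.
    """
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in reviewers.items()}
        # result() blocks until each reviewer finishes
        return {name: f.result() for name, f in futures.items()}
```

The same shape serves Step 4: submit one reflection call per reviewer and wait on all three results.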

Graceful degradation

One of the patterns the spec requires is graceful degradation if a reviewer fails (Pattern 12: Exception Handling). In practice this means that if one reviewer errors or times out, the pipeline records the failure and continues with whatever findings the other two reviewers produced.

This is a real design choice. The alternative is to fail the entire pipeline if any single reviewer fails. That's simpler but it means a Sonnet timeout on the security reviewer throws away perfectly good architecture and style findings. The graceful degradation approach delivers partial value instead of nothing.
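A sketch of the collection step with that trade-off made concrete, assuming each reviewer is a callable from diff to findings (function names and the 120-second timeout are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def collect_findings(diff, reviewers, timeout=120):
    """Steps 2-3 with graceful degradation: a reviewer that raises or
    times out is recorded as a failure, and the pipeline continues
    with whatever the other reviewers produced."""
    findings, failures = {}, {}
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in reviewers.items()}
        for name, future in futures.items():
            try:
                findings[name] = future.result(timeout=timeout)
            except Exception as exc:
                failures[name] = repr(exc)  # partial value beats nothing
    return findings, failures
```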

The finding schema

All three reviewers are given the same output schema. Structured output is Pattern 18 (Guardrails) -- the schema is the guardrail that ensures findings are comparable:

{
  "category":   "security | architecture | style",
  "severity":   "critical | high | medium | low | info",
  "file":       "app/auth.py",
  "line":       13,
  "title":      "SQL injection in login function",
  "description": "User input concatenated directly into SQL query.",
  "suggestion": "Use parameterised queries via %s placeholders.",
  "confidence": 0.95
}

The confidence field is the most useful part. A security reviewer with 0.65 confidence on a potential SSRF should be treated differently from one with 0.95 confidence on a clear SQL injection. The Synthesiser uses confidence to decide whether borderline findings make the unified review, and the frontend can filter below a threshold if needed.
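In Python the schema maps naturally onto a dataclass, and the threshold filter is a one-liner (the 0.7 cutoff is an illustrative default, not the demo's actual setting):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    category: str      # security | architecture | style
    severity: str      # critical | high | medium | low | info
    file: str
    line: int
    title: str
    description: str
    suggestion: str
    confidence: float  # 0.0-1.0, the reviewer's own estimate

def confident_findings(findings, threshold=0.7):
    """Keep only findings at or above the confidence threshold."""
    return [f for f in findings if f.confidence >= threshold]
```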

Before reflection:
  Security:      SQL injection risk (HIGH), Missing auth check (MED)
  Architecture:  Missing input validation (MED), Tight coupling (LOW)
  Style:         Redundant null check (LOW)
  (no overlap awareness)

After reflection:
  Security (amended):  SQL injection risk (HIGH), Missing auth check (MED),
                       + Input validation gap (HIGH, upgraded)
  Architecture:        Missing input validation (MED), Tight coupling (LOW)
  Style (amended):     Redundant null check withdrawn (intentional guard)
Reflection changes the output. Findings get upgraded, downgraded, or withdrawn.

What the reflection prompt looks like

Each reviewer in the reflection pass gets a prompt structured like this:

YOUR FINDINGS:
[...their original findings as JSON...]

COLLEAGUE 1 (security reviewer):
[...security findings...]

COLLEAGUE 2 (style reviewer):
[...style findings...]

Review these cross-findings. Respond with a JSON array of AMENDMENTS.
An amendment has: action (add|upgrade|downgrade|retract),
original_title, updated_severity, reason, new_finding.

The amendment types are worth explaining: add introduces a new finding prompted by a colleague's observation; upgrade and downgrade change the severity of one of the reviewer's own findings, with a stated reason; retract withdraws a finding the reviewer no longer stands behind.

The Synthesiser sees both the original findings and the amendments. It applies them before deduplication. A finding upgraded by two different reviewers after reflection is a strong signal that the Synthesiser should include it prominently.
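Applying amendments before deduplication can be sketched as follows; matching findings by title and the dict-based shapes are assumptions about the demo's internals, not its actual code:

```python
def apply_amendments(findings, amendments):
    """Fold a reviewer's reflection amendments into their original
    findings. Findings and amendments are dicts using the schema
    fields shown earlier (title, severity, ...)."""
    by_title = {f["title"]: dict(f) for f in findings}
    for a in amendments:
        action = a["action"]
        if action == "add":
            new = a["new_finding"]
            by_title[new["title"]] = dict(new)
        elif action == "retract":
            by_title.pop(a["original_title"], None)
        elif action in ("upgrade", "downgrade"):
            if a["original_title"] in by_title:
                by_title[a["original_title"]]["severity"] = a["updated_severity"]
    return list(by_title.values())
```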

Model routing and cost

The cost model is explicit in the config: the style reviewer routes to Haiku, while the security and architecture reviewers route to Sonnet.
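A sketch of what that routing config might look like; the model identifiers and the exact structure are assumptions, not the demo's actual values:

```python
# Illustrative routing config. Model names are placeholders.
MODEL_ROUTING = {
    "security":     "sonnet",  # high stakes: findings can block a deployment
    "architecture": "sonnet",  # high stakes: production-incident risk
    "style":        "haiku",   # low stakes: fast and cheap is adequate
}
DAILY_REVIEW_LIMIT = 20  # enforced by the SSM kill switch
```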

Style findings are low-stakes. A misnamed variable is not worth $0.02. Haiku is fast and cheap and produces perfectly adequate style analysis. Security and architecture findings can block a deployment or introduce a production incident. Sonnet's extra capacity is worth the cost for those categories.

At 20 reviews per day (the global daily limit), total LLM spend is about $1.80/day. The SSM kill switch at 20/day ensures this doesn't drift.

What the patterns are actually doing

This demo implements seven of the 21 Gulli patterns: 3 (Parallelisation), 4, 7, 12 (Exception Handling), 15, 16, and 18 (Guardrails).

Try it: The Code Review Pipeline is live. Paste a diff or enter a public GitHub PR URL. Three specialists will review it, cross-examine each other, and produce a unified verdict.
Reference: Antonio Gulli, Agentic Design Patterns with Claude (Anthropic, 2025). Patterns demonstrated: 3, 4, 7, 12, 15, 16, 18.
Related:
Structured deliberation: when agents argue, you get better answers
The 21 agentic patterns mapped to ticketyboo.dev
Planning and parallelisation

ticketyboo brings governed AI development to your pull request workflow. 5 governance runs free, one-time welcome grant. No card required.
