The problem with asking one model

Ask a large language model "should we migrate from a monolith to microservices?" and it will almost certainly say yes, then hedge. The model is pattern-matching on the framing of the question. "Should we X?" implies the asker is considering X, and the model obliges.

This is not a hallucination problem. It's a sycophancy problem. The model is optimised to produce responses that feel useful and agreeable. Asking it to steelman opposing positions in the same response produces a weaker version of both -- the model is hedging, not arguing.

Structured deliberation fixes this by separating the roles. Three agents, three distinct system prompts, three different jobs: the Advocate argues for the proposition, the Critic argues against it, and the Synthesiser reads both sides and produces a recommendation.

The three-round structure

The deliberation runs in three rounds. Rounds 1 and 2 are parallel -- Advocate and Critic run concurrently using Python's ThreadPoolExecutor. Round 3 is sequential -- the Synthesiser only runs after all four debate arguments are available.

Round 1 (parallel):
  Advocate opens -- argues FOR the proposition
  Critic opens   -- argues AGAINST the proposition

Round 2 (parallel):
  Advocate rebuts the Critic's Round 1 argument
  Critic rebuts the Advocate's Round 1 argument

Round 3 (sequential):
  Synthesiser reads all four arguments
  Produces recommendation + confidence + trade-offs
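The round structure above can be sketched in a few lines of Python. This is a minimal sketch, not the production orchestrator: the agent callables are stand-ins for real model calls, and the callable-based signature is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_deliberation(question, advocate, critic, synthesiser):
    """Three rounds: two parallel debate rounds, then sequential synthesis."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Round 1: opening arguments run concurrently.
        f_adv, f_cri = pool.submit(advocate, question, None), pool.submit(critic, question, None)
        adv1, cri1 = f_adv.result(), f_cri.result()

        # Round 2: each agent rebuts the other's Round 1 argument.
        f_adv, f_cri = pool.submit(advocate, question, cri1), pool.submit(critic, question, adv1)
        adv2, cri2 = f_adv.result(), f_cri.result()

    # Round 3: the Synthesiser reads all four arguments.
    return synthesiser(question, [adv1, cri1, adv2, cri2])
```

Each round blocks on the slower of its two futures, which is where the latency variance discussed later comes from.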

The cross-rebuttal in Round 2 is the key insight. The Advocate's Round 2 prompt explicitly includes the Critic's Round 1 argument: "The Critic made the following arguments. Respond to their strongest points. Where are they right? Where are they wrong? What did they miss?" The Critic gets the mirror image.

This surfaces the strongest objections on both sides. A Critic that has to respond to "the benefits outweigh the costs because X" produces more targeted counterarguments than a Critic operating in isolation.
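The mirror-image rebuttal prompt is simple to construct. A sketch, using the wording quoted above; the helper name is hypothetical:

```python
def build_rebuttal_prompt(own_role, opponent_role, opponent_argument):
    # Mirror-image prompt: each agent sees the other's Round 1 argument verbatim.
    return (
        f"You are the {own_role}. The {opponent_role} made the following "
        f"arguments:\n\n{opponent_argument}\n\n"
        "Respond to their strongest points. Where are they right? "
        "Where are they wrong? What did they miss?"
    )
```

Swapping `own_role` and `opponent_role` produces the Critic's version of the same prompt.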

Why model routing matters

Advocate and Critic use a small, fast model. They're generating directional arguments: persuasive, specific, but not requiring deep synthesis. A small model is fast and cheap: around $0.001 per Advocate call at typical token counts.

Synthesiser uses a mid-tier model. It's reading 3,000 to 5,000 tokens of debate and producing a nuanced recommendation. That requires more capacity. The mid-tier model costs roughly 10x more per token, but there's only one Synthesiser call per deliberation.

Total cost per deliberation at typical token counts:

  Advocate R1 (Haiku)    ~$0.001
  Critic R1 (Haiku)      ~$0.001
  Advocate R2 (Haiku)    ~$0.002
  Critic R2 (Haiku)      ~$0.002
  Synthesiser (Sonnet)   ~$0.04
  Total                  ~$0.05

At 20 deliberations per day -- generous for a portfolio site -- that's $1/day. A kill switch in SSM Parameter Store lets the demo be paused if costs spike.
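The kill switch can be a single SSM parameter read at the top of the handler. A sketch: the parameter name and the "true means paused" convention are assumptions, and the client is injectable so the check is testable without AWS.

```python
def demo_enabled(ssm_client=None):
    """Return False when the SSM kill-switch parameter is set, pausing the demo."""
    if ssm_client is None:
        import boto3  # deferred so the function can be exercised without AWS
        ssm_client = boto3.client("ssm")
    # Hypothetical parameter name; "true" means the demo is paused.
    resp = ssm_client.get_parameter(Name="/deliberation/kill-switch")
    return resp["Parameter"]["Value"].lower() != "true"
```

Flipping the parameter in the console stops new deliberations without a redeploy.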

The architecture

POST /              start a deliberation
GET  /session/{id}  fetch a completed session
GET  /recent        return 5 most recent sessions

Lambda handler
  → orchestrator.run_deliberation(question)
       Round 1: ThreadPoolExecutor(advocate.generate, critic.generate)
         Each writes to DynamoDB immediately on completion
       Round 2: ThreadPoolExecutor(advocate.generate, critic.generate)
         Advocate gets critic_round_1 in context
         Critic gets advocate_round_1 in context
       Round 3: synthesiser.generate(all four arguments)
         Sonnet reads all four, returns structured recommendation

DynamoDB: deliberation-sessions table
  SESSION#{id}  META     -- question, status, cost, timestamps
  SESSION#{id}  ROUND#1#AGENT#advocate
  SESSION#{id}  ROUND#1#AGENT#critic
  SESSION#{id}  ROUND#2#AGENT#advocate
  SESSION#{id}  ROUND#2#AGENT#critic
  SESSION#{id}  ROUND#3#AGENT#synthesiser
  RATELIMIT#{ip}  {ts}  -- 1-hour TTL, max 5/IP/hour
  DAILY#usage    {date} -- atomic counter, daily limit
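The DAILY#usage item maps to a single atomic ADD update per request. A sketch against a boto3 Table-like object; the attribute names and the limit value are assumptions:

```python
import datetime

def within_daily_limit(table, daily_limit=100):
    """Atomically increment today's usage counter; deny once over the limit."""
    today = datetime.date.today().isoformat()
    resp = table.update_item(
        Key={"PK": "DAILY#usage", "SK": today},
        UpdateExpression="ADD #c :one",  # atomic counter, no read-modify-write race
        ExpressionAttributeNames={"#c": "calls"},
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    return resp["Attributes"]["calls"] <= daily_limit
```

Because ADD is atomic, concurrent Lambda invocations cannot undercount; the per-IP rate limit works the same way with a TTL attribute on the item.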

The Lambda is exposed via Lambda Function URL -- no API Gateway. The deliberation Lambda runs synchronously: one POST call, one response containing all four debate arguments plus the synthesis. The frontend populates the three columns with a small visual delay between rounds to make the debate structure legible.

What the Synthesiser prompt asks for

The Synthesiser receives a structured prompt with all four arguments labelled by round and agent. It's asked to produce six things:

  1. Recommendation (1-2 sentences)
  2. Confidence: low / medium / high (and why)
  3. Strongest arguments FOR
  4. Strongest arguments AGAINST
  5. Key trade-offs
  6. Conditions that would change the recommendation

The conditions section is the most useful part. A recommendation of "defer microservices migration until the team exceeds 20 engineers" is only useful if it also tells you what changes when the team hits 20. The structured output forces this.
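The six-part ask can be pinned into the prompt template itself. A sketch: the six sections come from the list above, but the exact template wording and the labelling helper are assumptions.

```python
SYNTHESIS_PROMPT = """\
Below are four arguments from a structured debate, labelled by round and agent.

{arguments}

Produce exactly six sections:
1. Recommendation (1-2 sentences)
2. Confidence: low / medium / high (and why)
3. Strongest arguments FOR
4. Strongest arguments AGAINST
5. Key trade-offs
6. Conditions that would change the recommendation
"""

def build_synthesis_prompt(arguments):
    # arguments: list of (round_number, agent_name, text) tuples
    labelled = "\n\n".join(
        f"[Round {r} | {agent}]\n{text}" for r, agent, text in arguments
    )
    return SYNTHESIS_PROMPT.format(arguments=labelled)
```

Numbering the sections in the prompt makes the output easy to split and render column by column on the frontend.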

What doesn't work

The deliberation is only as good as the question. Vague questions produce vague arguments. "Should we use AI?" is too broad to deliberate productively. "Should a 50-person e-commerce company adopt AI coding assistants for a team that has no existing ML infrastructure?" is much better.

The Advocate and Critic are role-locked but they're still language models. On genuinely one-sided questions -- "should we write code with no tests?" -- the Advocate will struggle to make a compelling case and both agents know it. The quality of the debate tracks the quality of the question.

Parallelisation adds latency variance. Round 1 completes when the slower of the two agents finishes. On a bad network day, one Haiku call can take 8 seconds. The total deliberation time ranges from 30 to 90 seconds depending on model latency.

[Diagram: Question feeds three columns -- Advocate, Critic, Synthesiser. Round 1: opening arguments for and against (Haiku). Round 2: cross-rebuttals, each agent reading the other's Round 1 argument (Haiku). Round 3: Synthesiser reads all four arguments (Sonnet, structured output).]
Three rounds, three agents. The synthesis is better than any single opinion.

[Diagram: per-call costs -- Advocate R1 ~$0.001, Critic R1 ~$0.001, Advocate R2 ~$0.002, Critic R2 ~$0.002, Synthesiser (Sonnet) ~$0.04; total ~$0.05 per deliberation. The Synthesiser is 80% of the cost -- and 80% of the value.]
Model routing in action. Cheap models argue, the expensive model decides.

The patterns in use

This demo implements ten of the 21 Gulli agentic patterns -- numbers 1, 3, 4, 6, 7, 8, 15, 16, 17, and 20 in the reference below.

None of these patterns are exotic. They're composable primitives. The deliberation engine is what you get when you stack ten of them on top of a question and a cost budget.

Try it: The Deliberation Engine is live. Enter any technical proposition. Advocate and Critic will argue both sides. Synthesiser will tell you what to do -- and what would change the answer.
Reference: Antonio Gulli, Agentic Design Patterns with Claude (Anthropic, 2025). Patterns demonstrated: 1, 3, 4, 6, 7, 8, 15, 16, 17, 20.
Related:
  The 21 agentic patterns mapped to ticketyboo.dev
  Planning and parallelisation
  Multi-model reasoning
