Resource-Aware Optimization (pattern 16 in the Gulli taxonomy) and Human-in-the-Loop (pattern 13) address opposite ends of the agentic system design space. Resource awareness asks: how do you do more with less? Human-in-the-loop asks: where should a human be in the loop, and at what granularity? They are not obviously related patterns, but they share an underlying principle: constraints, whether financial or procedural, produce better systems when they are treated as first-class design inputs rather than afterthoughts.
This article covers both through the lens of ticketyboo.dev's actual constraints: an AWS Free Tier budget, a one-person operation, and a commitment to not publishing anything without human review.
The Free Tier constraint as architectural clarity
Every architectural decision on ticketyboo.dev is bounded by the AWS Free Tier. 1 million Lambda requests per month. 25GB DynamoDB storage. 5GB S3. 10 custom CloudWatch metrics. The constraint is real and non-negotiable.
The interesting thing about hard constraints is what they eliminate. When you cannot use RDS, you use DynamoDB. When you cannot use Secrets Manager ($0.40/secret/month), you use SSM Parameter Store SecureString (free). When you cannot use NAT Gateways ($32+/month), your Lambda functions run outside a VPC entirely. When you cannot provision capacity, you use on-demand billing.
In each case, the constrained choice is also the simpler choice. DynamoDB on-demand requires no capacity planning. SSM Parameter Store requires no monthly budget allocation. Lambda outside a VPC has no networking complexity. The Free Tier constraint, paradoxically, produces an architecture with fewer moving parts than an unconstrained design would.
Resource-aware optimization beyond money
The Free Tier framing makes resource-aware optimization concrete. But the pattern applies to any scarce resource. The same design discipline that asks "does this Lambda need 512MB or will 128MB do?" also asks:
- Token budget: does this prompt need to include the full file, or will a relevant excerpt do? Sending the minimum context that answers the question is not just economical. It reduces noise and improves output quality.
- Model tier: does this task need the heaviest model, or will a lighter one produce acceptable results? The fixer-bot routes Simple tier tasks to a cheaper, faster model. Most simple tasks do not benefit from extra capability.
- API rate limits: does this workflow need to call the GitHub API on every run, or can results be cached? The scanner caches rate-limited responses. Cache misses are paid in API calls. Cache hits are free.
- Human review time: does every change need human review, or only changes above a risk threshold? Routing all changes to human review squanders the reviewer's time on routine work and trains them to approve without reading.
The common thread: make the resource constraint explicit. Once you have named the constraint, you can design around it deliberately. An unnamed constraint gets violated accidentally.
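One way to make a constraint explicit is to represent it as an object that every spend must pass through, so an overrun fails loudly instead of silently. A minimal sketch, assuming nothing about the ticketyboo codebase; the `ResourceBudget` and `BudgetExceeded` names are illustrative:

```python
class BudgetExceeded(Exception):
    """Raised when a spend would push a named budget past its limit."""
    pass

class ResourceBudget:
    """A named, first-class constraint: spend against it deliberately."""

    def __init__(self, name: str, limit: float):
        self.name = name
        self.limit = limit
        self.used = 0.0

    def spend(self, amount: float) -> None:
        # Refuse the spend rather than discover the overrun on the bill.
        if self.used + amount > self.limit:
            raise BudgetExceeded(
                f"{self.name}: spending {amount:.0f} would exceed limit {self.limit:.0f}"
            )
        self.used += amount

# Usage: the constraint has a name, so it cannot be violated accidentally.
tokens = ResourceBudget("monthly-tokens", limit=1_000_000)
tokens.spend(40_000)
```

The same wrapper works for any of the resources in the list above: tokens, API calls, or minutes of human review time.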
Model tier routing: resource-awareness in the fixer-bot
The fixer-bot's three-tier classifier is a resource-aware routing system. The routing decision is: what is the minimum capability level this task requires? Not: which model is available? Not: which model produces the best output? The minimum capability question is the resource-aware question.
```python
# Resource-aware model selection (fixer-bot, simplified)
MODEL_TIERS = {
    "simple": {"model": "claude-3-haiku", "max_tokens": 2000, "review": False},
    "medium": {"model": "claude-3-sonnet", "max_tokens": 4000, "review": False},
    "complex": {"model": "claude-sonnet-4", "max_tokens": 8000, "review": True},
}

def get_execution_config(tier: str) -> dict:
    """Return model, token budget, and review requirement for tier."""
    config = MODEL_TIERS.get(tier, MODEL_TIERS["medium"])
    return {
        "model": config["model"],
        "max_tokens": config["max_tokens"],
        "human_review": config["review"],
        "cost_weight": {"simple": 1, "medium": 6, "complex": 20}.get(tier, 6),
    }
```
The cost_weight field is the resource accounting mechanism. A Simple task costs 1 unit. A Complex task costs 20. When the monthly token budget is tracking high, the classifier can raise the threshold for Medium-to-Complex routing without changing the tier definitions. Resource awareness is baked into the execution path, not bolted on after the fact.
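That budget check can be sketched as a small routing function. This is an illustration of the idea, not the fixer-bot's actual code; the `choose_tier` helper and the 0.8 utilisation threshold are assumptions:

```python
# Hypothetical sketch: raise the bar for the most expensive tier
# when the monthly budget is tracking high.
def choose_tier(classified: str, budget_used: float, budget_limit: float) -> str:
    """Downgrade Complex to Medium once budget utilisation passes 80%."""
    utilisation = budget_used / budget_limit
    if classified == "complex" and utilisation > 0.8:
        return "medium"  # tier definitions are unchanged; only routing tightens
    return classified

choose_tier("complex", budget_used=850, budget_limit=1000)  # -> "medium"
choose_tier("complex", budget_used=200, budget_limit=1000)  # -> "complex"
```

The point of the sketch is that the downgrade lives in the routing layer: tier definitions, prompts, and review rules stay untouched.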
Human-in-the-Loop: the autonomy spectrum
The Human-in-the-Loop pattern is often framed as a safety mechanism for high-risk actions. That framing is incomplete. HITL is an architectural decision about where to set the autonomy boundary for every workflow in the system. The right boundary is not always "approve everything" and it is not always "approve nothing."
The spectrum runs from fully supervised (a human approves every action before it executes) to fully autonomous (the agent executes without any human checkpoint). Neither extreme is universally correct. A fully supervised agent that generates 50 actions per day for human approval will have those approvals rubber-stamped within a week, defeating the purpose. A fully autonomous agent operating on a production system with no checkpoints is a liability.
Where each ticketyboo workflow sits on the spectrum
Publishing content (supervised): every new page has data-draft="true" on its body tag. It does not appear in navigation. It does not appear in the sitemap. A human reads it, checks the quality rules, and removes the flag before it goes live. The blast radius of a bad publish is low in absolute terms (it is a static site), but the autonomy boundary is set at supervised because the content represents the platform's public voice.
Security reviews (supervised): when a Gatekeep check identifies a high-severity finding, it routes to a human review queue. The agent does not approve its own security findings. A human makes the final call. This reflects the asymmetry in security decisions: false positives cost time, false negatives cost credibility. Human review at the high-severity threshold is the right trade-off.
Sprint planning (human reads before execution): an AI agent produces the sprint plan. A human reads it and decides whether to proceed. The agent does not start building until the human has approved the plan. This is not approval of individual actions. It is approval of the strategy before any implementation begins. One review, covering many subsequent autonomous actions.
Fixer-bot (tier-dependent): Simple tier tasks auto-merge after passing automated tests. Medium tier tasks require test passage plus code review. Complex tier tasks require human review before merge. The autonomy level scales with the risk and complexity of the change.
Ops agents (autonomous): the four ops agents run on schedules, check metrics, and write results to DynamoDB. No human approves these writes. The autonomy is justified by the blast radius: a wrong CloudWatch metric check produces a stale dashboard reading, not a change to production infrastructure. The risk is bounded, so full autonomy is the right setting.
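The spectrum above can be written down as a routing table, which is itself a way of naming the constraint. A minimal sketch: the workflow names mirror this article, but the `Autonomy` enum and `needs_human` helper are illustrative, not part of the platform:

```python
from enum import Enum

class Autonomy(Enum):
    SUPERVISED = "human approves before the action takes effect"
    PLAN_APPROVAL = "human approves the plan; actions then run autonomously"
    TIERED = "autonomy scales with task tier"
    AUTONOMOUS = "no human checkpoint; blast radius is bounded"

# Each workflow gets a deliberate boundary, not a default.
AUTONOMY_BOUNDARY = {
    "publish-content": Autonomy.SUPERVISED,
    "security-review": Autonomy.SUPERVISED,
    "sprint-planning": Autonomy.PLAN_APPROVAL,
    "fixer-bot": Autonomy.TIERED,
    "ops-agents": Autonomy.AUTONOMOUS,
}

def needs_human(workflow: str) -> bool:
    """Anything short of fully autonomous has a human checkpoint somewhere."""
    return AUTONOMY_BOUNDARY[workflow] is not Autonomy.AUTONOMOUS
```

Writing the table down forces the deliberate choice the next section argues for: a workflow missing from the table has no boundary, and that absence is visible.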
The data-draft flag: HITL for $0
The simplest Human-in-the-Loop implementation on this platform cost nothing to build. A single HTML attribute on every new page's body tag:
```html
<!-- Every new page starts as draft -->
<body data-draft="true">

<!-- The site's CSS uses this attribute to hide draft pages from nav -->
[data-draft="true"] .site-nav a[href*="this-page"] { display: none; }

<!-- Publishing = removing the attribute. One line change. One commit. -->
<body>
```
The flag is visible in the HTML. It requires no database. It requires no state management. It requires no workflow tooling. The human check is: read the page, confirm it meets quality standards, delete the attribute, commit. That sequence is the entire publish workflow.
The pattern generalises. Any agent output that has a review stage before it takes effect can be modelled as a draft flag on the output artifact. A Terraform plan before apply. A pull request before merge. A spec document before implementation begins. These are all HITL checkpoints implemented through the artifact state, not through process tooling.
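Because the review state lives in the artifact, the gate is trivially checkable by any script in the pipeline. A minimal sketch, assuming plain string matching on the page source; the `is_draft` and `publish` helpers are hypothetical, not the site's actual tooling:

```python
# Illustrative draft gate: the HITL checkpoint is readable
# straight from the artifact, with no workflow state to query.
def is_draft(html: str) -> bool:
    """A page is a draft while its body carries the flag."""
    return 'data-draft="true"' in html

def publish(html: str) -> str:
    """Publishing is the one-line change: remove the attribute."""
    return html.replace(' data-draft="true"', "", 1)

page = '<body data-draft="true">\n  <h1>New article</h1>\n</body>'
is_draft(page)           # -> True
is_draft(publish(page))  # -> False
```

A deploy step could call `is_draft` to exclude pages from the sitemap; the same check works unchanged in CI, in a local script, or by eye in the browser's view-source.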
When autonomy level is a bug, not a feature
The most common autonomy failure mode is not agents acting too autonomously. It is the opposite: systems designed as fully supervised because no one made a deliberate choice about the boundary. Every action routes to a human queue. The queue grows. Reviewers stop reading. Approvals become reflexive. The HITL check exists in process but not in practice.
The resource-aware framing helps here: human review time is a scarce resource. The goal is to allocate it to decisions that actually benefit from human judgment. A Lambda function memory change from 128MB to 256MB for a routine optimisation does not benefit from a CTO review. A new external API integration requires one. Setting the autonomy boundary correctly maximises the value of every human review in the queue.
The two patterns together produce a coherent design principle: use the minimum resources necessary to accomplish each task (resource-aware optimization), and apply human oversight at the decision points where human judgment actually adds value (human-in-the-loop). Both principles require naming your constraints and making deliberate choices. Both produce better systems than unconstrained defaults.
If the articles or tools have been useful, a coffee helps keep things running.
☕ buy me a coffee