The ticketyboo.dev platform was substantially built by AI agents working within a human-defined governance framework. The Terraform modules, the Lambda functions, the scanner logic, and the articles you're reading were all produced by AI agents acting on structured specifications — with humans reviewing, approving, and adjusting at key gates.
This is not a novelty. It's a repeatable engineering methodology. This article documents what I've learned about making agentic development work reliably, safely, and in a way that produces code you'd actually want to own.
What makes a development agent different from a code completion tool
AI code completion tools operate at the level of a single file or function. A development agent operates at the level of a task: it reads context across multiple files, plans a sequence of changes, executes them, and verifies the results.
The defining characteristic of an agent is the loop: Plan → Execute → Verify → Adjust. Each iteration of the loop produces a concrete artefact (a file, a command output, a test result) that informs the next iteration. Without this loop, you have autocomplete. With it, you have a collaborator that can take a task from specification to implementation.
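The loop can be sketched as a small driver with plan/execute/verify supplied as callables. This is an illustrative shape, not the platform's actual agent code; all names here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TaskState:
    """Context for one task: the spec plus every artefact produced so far."""
    spec: str
    artefacts: list[str] = field(default_factory=list)  # diffs, logs, test output
    done: bool = False

def run_agent_loop(
    state: TaskState,
    plan: Callable[[TaskState], str],
    execute: Callable[[str], str],
    verify: Callable[[TaskState], bool],
    max_iterations: int = 5,
) -> TaskState:
    """Plan -> Execute -> Verify -> Adjust: each artefact informs the next pass."""
    for _ in range(max_iterations):
        step = plan(state)                # Plan: choose the next change
        artefact = execute(step)          # Execute: apply it, capture the output
        state.artefacts.append(artefact)  # every iteration leaves a concrete artefact
        if verify(state):                 # Verify: do tests/constraints pass?
            state.done = True
            break
        # Adjust: the appended artefact is exactly what plan() sees next time around
    return state
```

The point of the shape is that the artefact list only grows: without it, each pass is autocomplete; with it, each pass is informed by the last.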
The specification-first pattern
Agentic development works best when the task is fully specified before execution begins. Ambiguity at the specification level produces ambiguous code — but unlike human ambiguity, AI ambiguity can be confident and subtly wrong.
The specification pattern we use has three documents:
requirements.md
User-facing functional requirements. Written in plain language, numbered for traceability. Each requirement has a unique ID that appears in both the design document and the task list. If a requirement can't be traced to a task, it won't get implemented.
design.md
Technical design document. Architecture diagrams (mermaid), data models, interface definitions, key design decisions with rationale. This is the document that an agent reads to understand how to implement a requirement — not just what the output should be.
tasks.md
Atomic, ordered implementation tasks. Each task references the requirements it satisfies and the design sections it implements. Tasks are small enough that each one can be executed, reviewed, and approved independently. No task should take more than a few hundred lines of code.
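The traceability rule ("if a requirement can't be traced to a task, it won't get implemented") is easy to check mechanically. A minimal sketch, assuming a hypothetical `REQ-<n>` ID convention rather than the project's real one:

```python
import re

def untraced_requirements(requirements_md: str, tasks_md: str) -> set[str]:
    """Return requirement IDs that no task references.

    Assumes (hypothetically) that IDs look like REQ-<number> in
    requirements.md and are cited verbatim in tasks.md.
    """
    req_ids = set(re.findall(r"\bREQ-\d+\b", requirements_md))
    cited = set(re.findall(r"\bREQ-\d+\b", tasks_md))
    return req_ids - cited

requirements = "REQ-1: upload scan results\nREQ-2: show cost dashboard\n"
tasks = "Task 1 (satisfies REQ-1): implement the upload Lambda\n"
# REQ-2 appears in no task, so it would silently never be implemented
```

Running a check like this in CI turns the traceability rule from a review habit into a gate.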
Governing agent actions
The most important governance rule for agentic development: the agent can propose; only a human can approve. This applies at different granularities:
| Action type | Agent can | Human must |
|---|---|---|
| Write code | Generate and apply | Review diff before merge |
| Run tests | Execute and report | Interpret failures |
| Deploy to production | Prepare and propose | Approve and trigger |
| Create AWS resources | Write Terraform, run plan | Review plan, approve apply |
| Rotate secrets | Identify compromised credentials | Rotate and propagate |
Context management: the unsolved problem
The biggest practical challenge in agentic development is context degradation. As a task gets longer, older context gets truncated or deprioritised. An agent that was given the right constraints at the start of a task may violate them by the end because the constraints have fallen outside its effective context window.
Mitigations we use:
- Checkpointing: At the end of each task, the agent writes a summary of decisions made and constraints applied. This summary is prepended to the next task's context.
- Constraint files: Key rules (no npm, no NAT gateways, SSM not Secrets Manager) are in a .clinerules file that is injected into every agent context automatically.
- Small tasks: Tasks that take more than ~1000 lines of code are split. Smaller tasks fit within context more reliably.
- Explicit verification steps: Tasks include "verify X" steps that force the agent to re-check constraints, not just assume they were followed.
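Checkpointing is the simplest of these to show concretely. A minimal sketch, with the summary format and helper names invented for illustration:

```python
def checkpoint_summary(decisions: list[str], constraints: list[str]) -> str:
    """Summarise a finished task's decisions and constraints for hand-off."""
    lines = ["## Checkpoint from previous task"]
    lines += [f"- decision: {d}" for d in decisions]
    lines += [f"- constraint: {c}" for c in constraints]
    return "\n".join(lines)

def build_task_context(summary: str, task_spec: str) -> str:
    """Prepend the checkpoint so constraints sit at the top of the next context."""
    return f"{summary}\n\n{task_spec}"
```

Because the summary is prepended rather than appended, the constraints are the freshest thing in the new context instead of the first thing to be truncated.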
The coding standards contract
Agentic development produces consistent code only if the coding standards are explicit, machine-readable, and enforced by automated tools rather than post-hoc review. Having human reviewers catch style violations in agent-generated code wastes their attention; the linter should catch them.
For this project, the standards contract includes:
# Python standards (enforced by ruff + mypy)
- Type hints on ALL function signatures (mypy --strict)
- Docstrings on all public functions (pydocstyle)
- logging.getLogger(__name__) # never print()
- Specific exception types # never bare except:
- All Lambda responses via _build_response() helper
# Infrastructure standards (enforced by tfsec)
- All resources tagged: Project, Environment, Owner
- No public S3 buckets
- SSE-S3 encryption on all storage
- No NAT gateways, WAF, KMS CMKs, VPC endpoints
When an agent generates code that violates these standards, CI fails and the agent is asked to fix the violation. The feedback loop is automated — no human needs to review every line for style compliance.
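That feedback loop can be as simple as running the checkers and handing any non-zero output straight back to the agent. A sketch, assuming ruff and mypy are on the PATH; the command lists are placeholders for whatever the contract actually enforces:

```python
import subprocess

CHECKS = [                     # each tool exits non-zero when the contract is violated
    ["ruff", "check", "."],
    ["mypy", "--strict", "."],
]

def ci_feedback(checks: list[list[str]] = CHECKS) -> list[str]:
    """Run the standards checks; collect failure output to hand back to the agent."""
    failures = []
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return failures            # empty list: the standards contract is satisfied
```

The returned failure text is the agent's next prompt input, which is what makes the loop automated end to end.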
The allowlist pattern: agent data never leaks
One concrete governance pattern the agents themselves run under: the Lambda proxy that serves the team dashboard uses an explicit field allowlist. Every response from DynamoDB passes through it before reaching the browser. Internal IDs, raw telemetry, anything not on the list — none of it can leak accidentally, because the allowlist is the only exit:
# api/team_proxy.py — field allowlist enforced on every response
FIELD_ALLOWLISTS: dict[str, list[str]] = {
    "activity": ["id", "actor_type", "actor_id", "action", "entity_type",
                 "entity_id", "timestamp", "details_agent_id"],
    "runs": ["id", "agent_id", "status", "started_at", "finished_at",
             "source"],
    "board": ["id", "title", "status", "assignee", "priority",
              "created_at", "updated_at", "ref"],
    "costs": ["period", "total", "currency", "agent_spend", "aws_costs",
              "budget_remaining", "byAgent"],
    "status": ["service", "status", "last_checked", "details"],
    "governance": ["id", "type", "status", "requestType", "createdAt",
                   "approvedAt", "ref", "title"],
}

def _filter_list(items: list, resource: str) -> dict:
    """Apply allowlist filtering to a list of DynamoDB items."""
    allowlist = FIELD_ALLOWLISTS.get(resource, [])
    filtered = [
        {k: v for k, v in item.items() if k in allowlist}
        for item in items
        if isinstance(item, dict)
    ]
    return {"items": filtered}
The pattern applies equally to humans writing the proxy and agents using it as a tool. An agent that calls /api/team/activity gets exactly the fields listed, nothing more. The governance rule is in the code, not in the prompt.
What ticketyboo.dev was built with
This platform was built using AI coding assistants and direct model API calls for specific reasoning tasks. Each tool has different strengths: the implementation agent is effective for tasks with clear specifications; the planning agent is strong for architecture review and cross-cutting constraint enforcement; direct API calls are used for quorum reasoning on design decisions.
The governance framework that shaped this development is open. The Gatekeep specification, the .clinerules file, and the spec documents are all in the public repository. Copy them, adapt them, use them.
If the articles or tools have been useful, a coffee helps keep things running.
☕ Buy me a coffee

ticketyboo brings governed AI development to your pull request workflow. 5 governance runs free, one-time welcome grant. No card required.