Most governance documents are aspirational. They describe what agents should and should not do, then sit in a Confluence page while the agents do whatever they want. The gap between written policy and enforced behaviour is where incidents happen.

The governance proxy described here is a different approach. Rules are written as JSON. Every agent action is evaluated against them before it executes. The result is one of three decisions: allow, deny, or gate. Gate means "pause and ask a human." The whole engine is 200 lines of Python with no external dependencies and no LLM call.

[Diagram: the agent proposes an action, which flows through the governance proxy's four stages (policy loader: JSON rules from repo; rule evaluator: first-match-wins; persona matcher: Sentinel / Auditor / Architect; audit logger: DynamoDB, every decision) before an allow or deny reaches the tool. Every decision lands in the audit ledger in DynamoDB.]
Every action flows through the proxy. Nothing executes without evaluation.

Why not use an LLM to judge governance?

The obvious alternative is to pass the proposed action to an LLM and ask "is this permitted?" That approach has real appeal: natural language rules, nuanced interpretation, handles edge cases gracefully. It also has three problems:

First, latency. An LLM call to a small model adds 600-1000ms to every agent action. Governance should not be the bottleneck. Second, cost. At 1000 agent actions per day, Haiku adds roughly $0.60/month. Small but real. Third, and most important: consistency. LLMs are probabilistic. The same action text evaluated twice might produce different verdicts. Governance that is inconsistent is not governance.

Deterministic evaluation is correct for this problem. The rules are written by humans, reviewed by humans, and committed to version control. The engine just applies them.

The schema

A guardrails config is a JSON object. It has a name, a default action, and an array of personas. Each persona contains an array of rules. Each rule has a match block and an action.

{
  "version": "1.0",
  "name": "Gatekeep (ticketyboo.dev)",
  "default_action": "allow",
  "personas": [
    {
      "id": "sentinel",
      "name": "Security Sentinel",
      "rules": [
        {
          "id": "no-iam-modification",
          "description": "Deny IAM policy or role modifications",
          "match": {
            "action_contains": ["iam", "policy", "role", "permission"],
            "action_type": ["write", "create", "delete", "modify"]
          },
          "action": "deny",
          "reason": "IAM modifications require human approval."
        }
      ]
    }
  ]
}

The default_action is what happens when no rule matches. Setting it to "allow" creates a permissive-by-default policy. Setting it to "deny" creates a zero-trust policy where everything not explicitly permitted is blocked. Most real configs land somewhere in between: allow by default, deny specific dangerous patterns, gate expensive operations.
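For illustration, a zero-trust variant of the schema above would flip the default and explicitly allow the safe cases (the names here are hypothetical, not a shipped example config):

```json
{
  "version": "1.0",
  "name": "Zero-trust example",
  "default_action": "deny",
  "personas": [
    {
      "id": "baseline",
      "name": "Baseline",
      "rules": [
        {
          "id": "allow-reads",
          "description": "Read-only operations are permitted",
          "match": { "action_type": ["read"] },
          "action": "allow",
          "reason": "Read-only access is safe by policy."
        }
      ]
    }
  ]
}
```

Everything that is not a read falls through to the default and is blocked.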

Five match conditions

Every rule has a match block. All conditions in the block are ANDed: every condition must pass for the rule to match. This prevents overly broad rules from triggering on partial text.

The five supported conditions are:

action_contains: any of the listed strings must appear (case-insensitive) in the action text. Used for keyword matching: ["nat", "nat gateway"] catches both abbreviated and full forms.

action_type: the action type field must match one of the listed values. Typical values: read, write, execute, deploy, create, delete. This separates read-only operations from mutations.

target_contains: any of the listed strings must appear in the target resource field. Used to scope rules to specific environments or services: ["production", "prod", "live"].

context_has_key: the context dict must contain any of the listed keys. Useful for environment-specific rules: only apply if "environment" is present.

context_key_value: the context dict must have a key with an exact value. {"environment": "production"} matches only production deployments.
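Because conditions are ANDed, combining them narrows a rule's scope. A hypothetical rule that gates deploys only when both the target and the context say production:

```json
{
  "id": "gate-prod-deploy",
  "description": "Production deploys require approval",
  "match": {
    "action_type": ["deploy"],
    "target_contains": ["production", "prod"],
    "context_key_value": { "environment": "production" }
  },
  "action": "gate",
  "reason": "Production deployments require human approval."
}
```

A deploy to staging, or a deploy with no environment context, fails one of the conditions and does not trigger the gate.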

The implementation is a single function:

def _match_condition(
    rule_match: dict, action_text: str,
    action_type: str, target: str, context: dict,
) -> bool:
    action_lower = action_text.lower()

    if "action_contains" in rule_match:
        terms = rule_match["action_contains"]
        if not any(t.lower() in action_lower for t in terms):
            return False

    if "action_type" in rule_match:
        types = [t.lower() for t in rule_match["action_type"]]
        if action_type.lower() not in types:
            return False

    if "target_contains" in rule_match:
        terms = rule_match["target_contains"]
        if not any(t.lower() in target.lower() for t in terms):
            return False

    if "context_has_key" in rule_match:
        keys = rule_match["context_has_key"]
        if not any(k in context for k in keys):
            return False

    if "context_key_value" in rule_match:
        for k, v in rule_match["context_key_value"].items():
            if context.get(k) != v:
                return False

    return True

Each condition returns False immediately if it fails. If all conditions pass, the function returns True. The short-circuit evaluation keeps the common case (no match) fast.

Persona dispatch and first-match-wins

Personas are a structural device. They let you group rules by concern: security rules in one persona, cost rules in another, compliance rules in a third. Each team or discipline can own its persona without touching the others.

The evaluation loop is simple: iterate personas in order, iterate rules within each persona in order, return on the first match. If no rule matches, apply the default action.

def evaluate(config: dict, action_text: str,
             action_type: str = "unknown",
             target: str = "",
             context: dict | None = None) -> EvaluationResult:
    if context is None:
        context = {}

    default_action = config.get("default_action", "allow")
    evaluation_steps: list[str] = []

    for persona in config.get("personas", []):
        persona_id = persona.get("id", "?")
        evaluation_steps.append(f"Checking persona: {persona.get('name', persona_id)}")

        for rule in persona.get("rules", []):
            rule_id = rule.get("id", "?")
            matched = _match_condition(
                rule.get("match", {}),
                action_text, action_type, target, context,
            )
            evaluation_steps.append(
                f"  Rule '{rule_id}': {'MATCH' if matched else 'no match'}"
            )

            if matched:
                return EvaluationResult(
                    decision=rule["action"],
                    matched_persona=persona_id,
                    matched_rule=rule_id,
                    reason=rule.get("reason", f"Matched rule '{rule_id}'"),
                    config_name=config.get("name", "unnamed"),
                    evaluation_steps=evaluation_steps,
                )

    evaluation_steps.append(f"No rules matched. Applying default: {default_action}")
    return EvaluationResult(
        decision=default_action,
        matched_persona=None,
        matched_rule=None,
        reason=f"No governance rule matched. Default action is '{default_action}'.",
        config_name=config.get("name", "unnamed"),
        evaluation_steps=evaluation_steps,
    )

The evaluation_steps list is a trace of everything the engine checked. It is returned with the result and shown in the demo UI. When a rule fires unexpectedly, the trace tells you exactly which persona and rule matched and why.
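Given the format strings in the loop above, a trace for an action that matches nothing might read (illustrative output, using the persona and rule names from the examples in this post):

```
Checking persona: Security Sentinel
  Rule 'no-iam-modification': no match
Checking persona: Cost Guardian
  Rule 'no-nat-gateway': no match
  Rule 'no-secrets-manager': no match
No rules matched. Applying default: allow
```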

[Diagram: two evaluations side by side. "write /etc/passwd" matches rule forbidden_path_pattern in the Sentinel persona and is denied with reason "system files are never writable"; "read ./src/handler.py" matches no rules, triggers no persona, is allowed, and is logged to the audit trail.]
Same proxy, different actions, different outcomes. The rules decide.

Three decisions

allow: the action is permitted. The agent proceeds. No further action required. Most actions in a permissive-by-default config will result in allow.

deny: the action is blocked. The agent should not proceed. The reason field explains why. Deny rules cover actions that are always wrong regardless of context: storing secrets in environment variables, creating NAT Gateways, modifying IAM without review.

gate: the action is paused pending human approval. The agent cannot proceed unilaterally. Gate rules cover actions that are sometimes right but always require a second set of eyes: production deployments, destructive database operations, cost-significant resource provisioning.

The distinction between deny and gate is the distinction between "this is never acceptable" and "this might be acceptable but requires human judgement." Getting that boundary right is the design work. The engine just executes whatever the config says.
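The same domain can sit on both sides of that line. A hypothetical pair of rules, one gate and one deny, sketched in the schema above:

```json
[
  {
    "id": "gate-destructive-db",
    "match": {
      "action_contains": ["drop table", "truncate", "delete from"],
      "target_contains": ["production", "prod"]
    },
    "action": "gate",
    "reason": "Destructive database operations need a second set of eyes."
  },
  {
    "id": "no-plaintext-credentials",
    "match": {
      "action_contains": ["credential", "api key", "secret"],
      "action_type": ["write", "create"]
    },
    "action": "deny",
    "reason": "Storing raw credentials is never acceptable."
  }
]
```

The first might be approved on a given day; the second never will be, so asking a human would only add noise.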

Mapping GATEKEEP.md to personas

ticketyboo.dev has a governance document at docs/GATEKEEP.md. It describes security constraints, cost constraints, and deployment gates in prose. The governance proxy demo ships a JSON config that encodes those constraints as executable rules.

The mapping is direct. The sentinel persona captures security rules from Gatekeep: no IAM modifications, no secrets in Lambda environment variables, no raw credentials in code. The cost-guardian persona captures cost rules: no NAT Gateways ($32+/month), no Secrets Manager ($0.40/secret/month), no RDS outside the specific free-tier allowance.

{
  "id": "cost-guardian",
  "name": "Cost Guardian",
  "rules": [
    {
      "id": "no-nat-gateway",
      "match": {
        "action_contains": ["nat", "nat gateway", "natgateway"],
        "action_type": ["create", "deploy", "provision", "add"]
      },
      "action": "deny",
      "reason": "NAT Gateways cost $32+/month. Use Lambda Function URLs instead."
    },
    {
      "id": "no-secrets-manager",
      "match": {
        "action_contains": ["secrets manager", "secretsmanager", "create secret"],
        "action_type": ["create", "add", "provision"]
      },
      "action": "deny",
      "reason": "Use SSM Parameter Store SecureString instead."
    }
  ]
}

The JSON config is committed alongside the codebase. When governance policies change, the diff shows exactly what changed and who approved it. The rule that blocked a specific agent action can be traced to a specific commit.

The Lambda handler

The proxy is a Lambda Function URL. It exposes four endpoints:

POST /evaluate: evaluate an action against a config. The request body contains action_text, action_type, target, context, and optionally a config_example name or a full config object. If neither is provided, the Gatekeep config is the default.

GET /examples: list the bundled example configs by name and description. The frontend uses this to populate the config selector chips.

GET /examples/{id}: return the full JSON for a named example config. Lets clients display and modify the config before evaluating against it.

POST /validate: validate a config without evaluating. Returns a 200 with valid: true or a 400 with the validation error. Used by the frontend's "Validate" button.
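A minimal client payload for /evaluate, sketched with the standard library. The field names follow the endpoint description above; the action text and the "gatekeep" example name are illustrative, not confirmed identifiers:

```python
import json

# Request body for POST /evaluate. "config_example" selects a bundled
# config by name; a full "config" object could be sent instead. If neither
# is present, the server falls back to the Gatekeep config.
payload = {
    "action_text": "Create a NAT Gateway for the private subnet",
    "action_type": "create",
    "target": "vpc-prod",
    "context": {"environment": "production"},
    "config_example": "gatekeep",  # hypothetical example name
}

body = json.dumps(payload)
# POST `body` to <function-url>/evaluate with Content-Type: application/json.
# The response carries the EvaluationResult fields: decision, matched_persona,
# matched_rule, reason, config_name, evaluation_steps.
```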

The handler includes rate limiting (20 requests per IP per hour, 200 per day total) via DynamoDB TTL records, and an SSM kill switch for emergency shutdown. Both are patterns used across every other Lambda in the system.

Validation before evaluation

User-submitted configs are validated before evaluation. The validator checks: structural correctness (personas array, rules array, required fields), action values are one of allow/deny/gate, and aggregate limits (max 20 personas, max 100 rules total). Configs over 32 KB are rejected.

def validate_config(config: dict) -> None:
    if "personas" not in config:
        raise GuardrailsError("Config must have a 'personas' array")
    if len(config["personas"]) > 20:
        raise GuardrailsError("Maximum 20 personas")

    total_rules = 0
    for i, persona in enumerate(config["personas"]):
        if "id" not in persona:
            raise GuardrailsError(f"Persona {i} missing 'id'")
        if "rules" not in persona:
            raise GuardrailsError(f"Persona '{persona.get('id', i)}' missing 'rules'")

        for rule in persona["rules"]:
            rule_id = rule.get("id", "?")
            if "action" not in rule:
                raise GuardrailsError(f"Rule '{rule_id}' missing 'action'")
            if rule["action"] not in ("allow", "deny", "gate"):
                raise GuardrailsError(f"Rule '{rule_id}' action must be allow|deny|gate")
            if "match" not in rule:
                raise GuardrailsError(f"Rule '{rule_id}' missing 'match' conditions")

        total_rules += len(persona["rules"])
        if total_rules > 100:
            raise GuardrailsError("Maximum 100 rules total")

Validation errors are returned as 400 responses with a human-readable message. This gives integrators immediate feedback when a config is malformed, without leaking internal stack traces.
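The 32 KB limit is not part of validate_config itself; a plausible guard, run before validation, might look like this (a sketch, not the shipped code; the real handler presumably raises its own GuardrailsError rather than ValueError):

```python
import json

MAX_CONFIG_BYTES = 32 * 1024  # configs over 32 KB are rejected


def check_config_size(config: dict) -> None:
    # Measure the serialised size, since that is what crosses the wire.
    size = len(json.dumps(config).encode("utf-8"))
    if size > MAX_CONFIG_BYTES:
        raise ValueError(f"Config is {size} bytes; maximum is {MAX_CONFIG_BYTES}")
```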

What this leaves out

This proxy is intentionally narrow. Several governance capabilities are out of scope by design:

LLM-as-judge mode: passing the action to a model and asking for a verdict. Useful for nuanced or contextual cases where keyword matching is insufficient. The architecture supports it as an optional extension: if no deterministic rule matches, invoke an LLM and cache the result. Not implemented here because it adds latency and cost that the current use cases do not justify.

Audit log: every evaluation result written to DynamoDB for compliance purposes. The current implementation logs to CloudWatch. A full audit trail would add a DynamoDB write to every evaluation call. Straightforward to add; deferred because the demo does not require it for its stated purpose.

Rule chaining: the ability to say "if sentinel denies, check cost-guardian as well." The first-match-wins model is simpler and avoids the complexity of conflicting verdicts from multiple matching rules. For most real-world policies, this is the right tradeoff.

Dynamic rules: rules that reference external state (e.g., "deny if the AWS cost for today exceeds $10"). The current model is purely static. Dynamic rules would require an additional lookup on every evaluation and introduce the possibility of the governance proxy itself failing due to an external dependency.

Patterns

This build demonstrates three agentic patterns from Gulli's taxonomy:

Pattern 18 (Guardrails): constrain agent behaviour within defined boundaries. The proxy is the constraint layer. Every action that an agent proposes passes through it before execution. The rules define the boundary; the engine enforces it.

Pattern 2 (Routing): classify input and direct it to the appropriate handler. Persona dispatch is routing: the action is classified by type and content, then routed to the first persona whose rules match it. The personas play the role of specialised handlers.

Pattern 13 (Governance and Auditability): systematic tracking of agent decisions with enough context to reproduce and explain them. The evaluation_steps trace returned with every result is the audit record. It is human-readable, deterministic, and requires no additional infrastructure to produce.

Try it: the live demo lets you paste a config, write an action, and see the evaluation trace. The Gatekeep example config is pre-loaded. You can modify any rule and see how it changes the verdict.
Patterns 18, 2, and 13 from Agentic Design Patterns by Antonio Gulli. Pattern 18 is Guardrails and Safety Constraints. Pattern 2 is Routing and Classification. Pattern 13 is Governance, Auditability, and Compliance.

If this was useful, buy me a coffee. It keeps the Free Tier running.

