Every scanning tool has the same problem: it's very good at generating findings, and not particularly interested in what happens to them afterwards. You scan a repo, you get a list of issues, you file them somewhere, and a month later that list is longer and the oldest items have been quietly ignored.
I wanted to close that loop. Not by building an automated system that makes changes without oversight — that's a different kind of problem — but by automating the mechanical steps between "we know about this" and "there's a PR ready for review." The pipeline I've been calling AutoDev does that work.
The gap between scanning and fixing
When I mapped out the lifecycle of a finding, the expensive part wasn't the scan. It was everything that happened after:
- Someone reads the finding and decides whether it's worth fixing.
- Someone else translates it into a task and assigns it.
- A developer spends time context-switching to understand what needs changing.
- They make the change, open a PR, wait for review.
For a single critical finding, that process is fine — it warrants human attention at every stage. But for the long tail of medium-severity, mechanical fixes (unpinned dependency versions, missing resource tags, hardcoded strings that should be config, out-of-date patterns), the overhead kills momentum. Either the team processes them slowly, or they don't process them at all.
AutoDev targets that long tail. High-severity, complex findings still go to humans directly. The mechanical ones go through the pipeline.
The pipeline in five stages
1. Ingestion
A finding arrives — from a scanner, a governance check, a dependency audit, whatever. It gets normalised into a standard shape: what repository, what file, what line, what rule was violated, what severity, what suggested fix. That last field is the key one — without a suggested fix, there's nothing for the pipeline to work with.
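That normalised shape can be sketched in a few lines. The field names here are illustrative, not the pipeline's actual schema; the one hard rule from the text is that a finding without a suggested fix never enters the pipeline:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NormalisedFinding:
    repository: str
    file_path: Optional[str]
    line_number: Optional[int]
    rule_id: str
    severity: str
    suggested_fix: str  # without this, there's nothing for the pipeline to work with

def normalise(raw: dict, repository: str) -> Optional[NormalisedFinding]:
    """Map a raw scanner payload (keys assumed) into the standard shape.

    Returns None when there is no suggested fix -- such findings go
    to a human queue instead of the pipeline.
    """
    fix = raw.get("remediation") or raw.get("fix", "")
    if not fix.strip():
        return None
    return NormalisedFinding(
        repository=repository,
        file_path=raw.get("file"),
        line_number=raw.get("line"),
        rule_id=raw.get("rule", "unknown"),
        severity=raw.get("severity", "info"),
        suggested_fix=fix,
    )
```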
2. Proposal generation
The pipeline reads the finding and the relevant file content, then generates a concrete fix proposal. Not a description of what should change — an actual diff, with an explanation of what was wrong and why the proposed change fixes it. This is where an LLM does useful work: translating "dependency X is unpinned" into "here's the exact version to pin it to, and here's the line to change."
The proposal is stored and surfaced for human review before anything touches the repository. Approving a proposal means: "yes, this is the right fix — proceed." Rejecting it means: "no, route this differently" or "I'll handle this manually."
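A proposal, then, is a small record pairing the diff with its explanation and an approval state. A minimal sketch, with names assumed rather than taken from the real implementation:

```python
from dataclasses import dataclass
from enum import Enum

class ProposalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class FixProposal:
    finding_id: str
    diff: str         # an actual unified diff, ready to apply
    explanation: str  # what was wrong and why this change fixes it
    status: ProposalStatus = ProposalStatus.PENDING

    def approve(self) -> None:
        # "yes, this is the right fix -- proceed"
        self.status = ProposalStatus.APPROVED

    def reject(self) -> None:
        # "no, route this differently" or "I'll handle this manually"
        self.status = ProposalStatus.REJECTED
```

Nothing downstream runs until the status flips to APPROVED; the record itself is what gets surfaced for review.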
3. Branch and change
Once a proposal is approved, the pipeline creates a branch, applies the diff, and runs the project's own verification steps — linting, type-checking, whatever the repo has configured. If verification fails, the branch is flagged and a human is notified. The pipeline doesn't try to iterate on a failing fix; that's a sign the proposed change was too mechanical and needs rethinking.
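The branch-and-verify step reduces to a handful of git operations plus whatever checks the repo configures. A hedged sketch, assuming an approved unified diff on disk and repo-specific verification commands; note there is deliberately no retry loop:

```python
import subprocess

def apply_and_verify(repo_dir: str, branch: str, diff_path: str,
                     verify_cmds: list[list[str]]) -> bool:
    """Create a branch, apply the approved diff, run the repo's own checks.

    Returns True when every verification command passes. On failure the
    branch is left in place for a human to inspect -- the pipeline does
    not iterate on a failing fix.
    """
    def run(cmd: list[str]) -> bool:
        return subprocess.run(cmd, cwd=repo_dir).returncode == 0

    if not run(["git", "checkout", "-b", branch]):
        return False
    if not run(["git", "apply", diff_path]):
        return False
    # e.g. [["ruff", "check", "."], ["mypy", "."]] -- whatever the repo has configured
    return all(run(cmd) for cmd in verify_cmds)
```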
4. Pull request
If verification passes, a PR is opened. The PR description is generated from the proposal — it explains the original finding, what was changed, and references the governance rule that flagged it. Reviewers get context, not just a diff.
The PR is explicitly labelled as machine-generated. I think transparency here matters: reviewers should know they're looking at automated output so they apply appropriate scrutiny rather than assuming a human thought carefully about every line.
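Rendering that description is plain templating over the proposal. A sketch; the template wording and field names are illustrative:

```python
PR_TEMPLATE = """\
## Automated fix (machine-generated)

**Finding:** {title} ({rule_id}, severity: {severity})

**What changed:** {explanation}

This PR was generated by the AutoDev pipeline. Review it with the
scrutiny you would apply to any automated output.
"""

def render_pr_body(title: str, rule_id: str, severity: str,
                   explanation: str) -> str:
    """Build the PR description from the proposal's fields, so reviewers
    get context -- the finding, the rule, and the rationale -- not just a diff."""
    return PR_TEMPLATE.format(title=title, rule_id=rule_id,
                              severity=severity, explanation=explanation)
```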
5. Close the finding
When the PR merges, the original finding is marked resolved. The scan that next runs on that repo should not surface the same finding. If it does, something went wrong with the fix and the system raises it again — this time with "previously attempted" context so a human knows the history.
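Detecting a resurfaced finding is a key lookup against what was already resolved. A sketch, keyed on (file_path, category, title), which is the same key the scanner deduplicates on; everything else here is illustrative:

```python
def annotate_resurfaced(findings: list[dict], resolved_keys: set) -> list[dict]:
    """Flag findings that were previously fixed but have come back.

    A finding whose key matches an already-resolved one is not silently
    re-queued: it carries "previously attempted" context so a human
    knows the history.
    """
    out = []
    for f in findings:
        key = (f.get("file_path"), f.get("category"), f.get("title"))
        if key in resolved_keys:
            f = {**f, "previously_attempted": True}
        out.append(f)
    return out
```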
What this doesn't solve
AutoDev is useful for a specific class of findings: ones where the correct fix is deterministic enough that an LLM can generate it reliably. That's a narrower category than it sounds. Security vulnerabilities, architectural flaws, logic errors, test coverage gaps — these need human reasoning. The pipeline won't help you there, and claiming otherwise would be dishonest.
What it does help with:
- Dependency pinning and version bumps
- Missing resource tags in infrastructure code
- Configuration values that should be environment variables
- Linting violations with deterministic fixes
- Missing docstrings on public functions
- Boilerplate patterns that the codebase enforces (logging, response helpers, type hints)
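To make "mechanical" concrete, dependency pinning is the canonical case: given the version the audit resolved, the fix is a pure line edit. A sketch (the lookup that supplies the version is assumed to come from the dependency audit):

```python
import re

def pin_requirement(line: str, resolved_version: str) -> str:
    """Pin a bare requirements.txt line to an exact version.

    'requests' -> 'requests==2.31.0'. Comments, blank lines, and lines
    that already carry a version specifier are left untouched.
    """
    name = line.strip()
    if not name or name.startswith("#"):
        return line
    if re.search(r"[<>=!~]", name):
        return line  # already constrained; not this fix's business
    return f"{name}=={resolved_version}"
```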
For larger codebases, this class of finding is a significant proportion of the backlog. Automating it clears space for the team to focus on the findings that actually require thinking.
It was tested: the ACME sandbox
I want to be concrete about this because "I built a pipeline" without any evidence of it running is exactly the kind of vaporware I try to avoid on this site.
The AutoDev pipeline was tested against purpose-built sandbox repositories — deliberately constructed with governance violations, unpinned dependencies, missing tags, and lint failures. The pipeline scanned them, generated proposals, opened branches, applied fixes, and raised PRs. The ACME Inventory System and ACME Widgets CRM were both built as autonomous dev sandboxes — known bad state, controlled environment, real pipeline execution against a real GitHub organisation.
Not everything worked first time. The proposal generation step was over-confident early on — it would generate fixes for findings it didn't fully understand, producing changes that were syntactically correct but semantically wrong. The verification step (run linting and type-checking before opening the PR) caught most of these. The ones it didn't catch were caught in PR review. That's the right failure mode: noisy PRs, not merged bad code.
The upstream contract: the Finding schema
AutoDev works best when scanning is governed — when findings have consistent severity ratings, standard rule identifiers, and machine-readable remediation hints. If your scanner outputs free-text descriptions with no structure, the pipeline has nothing to work with.
This is the actual Finding dataclass from the scanner (api/models.py). It's the contract between the scanner and AutoDev — the core fields are required, and none are free-text blobs:
```python
# api/models.py
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    """Individual scan finding — PK=SCAN#{id} SK=FINDING#{index}."""
    index: int
    category: str     # dependency | security | code_quality | iac | governance
                      # + deep: secret | sast | license | quality
    severity: str     # critical | high | medium | low | info
    title: str
    description: str
    remediation: str  # concrete, actionable — this is what AutoDev works from
    file_path: Optional[str] = None
    # Deep scan additions
    analysis_layer: Optional[str] = None  # dependency|secret|sast|iac|license|quality
    line_number: Optional[int] = None     # exact line for SAST and secret findings
```
The remediation field is the key one. A finding without a concrete remediation hint ("pin to version 2.31.0" rather than "dependency is outdated") gives the pipeline nothing to act on. The scanner is the upstream contract; AutoDev is a consumer of it.

Deduplication is also part of the contract — the scanner runs _deduplicate(findings) before storing, keyed on (file_path, category, title), keeping the highest severity when the same issue appears in multiple layers. AutoDev never sees duplicate findings.
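What that deduplication amounts to, sketched (the real _deduplicate lives in the scanner; the severity ranking here is assumed to follow the schema's five levels):

```python
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "info": 0}

def deduplicate(findings: list[dict]) -> list[dict]:
    """Collapse findings sharing (file_path, category, title), keeping
    the highest-severity copy when the same issue appears in multiple layers."""
    best: dict = {}
    for f in findings:
        key = (f.get("file_path"), f.get("category"), f.get("title"))
        cur = best.get(key)
        if cur is None or SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[cur["severity"]]:
            best[key] = f
    return list(best.values())
```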
The full cycle — scan, propose, branch, PR, merge, re-scan — is what I mean when I talk about a closed loop. Not lights-out automation, but a pipeline where the mechanical steps don't pile up in a backlog waiting for human time that never comes.