Security vulnerabilities are cheapest to fix before they are written. Shift left means moving security checks as early in the development lifecycle as possible. At one end of the scale, a pre-commit hook runs in under a second and blocks nothing that would have worked anyway; at the other, a production incident involves on-call engineers, customer communication, and a post-mortem.

The interesting question is not whether to shift left, but how far left you can go. The answer: as far as you like. The full stack described here runs on open-source tools, the GitHub Actions free tier, and a scheduled Lambda. No enterprise contract required.

[Figure: the four stages of defence: pre-commit, CI PR check, scheduled scan, external/on-demand. Cost to fix increases left to right; the goal is to catch everything in stage 1.]

Stage 1: pre-commit

Pre-commit hooks run locally on every git commit. They have no network dependency, no queue, no CI minutes. If they fail, the commit is rejected before it leaves the developer's machine.

The hooks configured in this stack:

- terraform_fmt: HCL formatting, enforced before the file is committed, not after review
- tflint: Terraform IaC linting (provider-specific rules, deprecated arguments, type errors)
- ruff: Python lint (style, unused imports, obvious bugs); sub-second on any codebase
- mypy: Python type checking; catches type mismatches before runtime
- bandit: Python SAST (injection patterns, hardcoded passwords, shell=True, weak crypto)

The combined runtime for all five hooks on a typical Python + Terraform repo is under three seconds. There is no reason not to run them.
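Under the hood, a hook is just a script that exits non-zero to reject the commit. A minimal stand-alone sketch (illustrative only; the stack above uses the pre-commit framework, which adds caching, per-file filtering, and pinned tool versions on top of this idea):

```python
import subprocess
import sys

# Each command must exit 0 or the commit is rejected
CHECKS = [
    ["ruff", "check", "."],
    ["terraform", "fmt", "-check", "-recursive"],
]

def run_checks(checks: list[list[str]] = CHECKS) -> int:
    """Return 0 if every check passes, 1 at the first failure."""
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            print(f"pre-commit check failed: {' '.join(cmd)}", file=sys.stderr)
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(run_checks())
```

Saved as .git/hooks/pre-commit and made executable, git runs this on every commit and aborts if it exits non-zero.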

Stage 2: CI PR check

Two workflows run on every pull request: python-ci.yml (application security) and iac-scan-orca.yml (infrastructure security). Critical or High findings block the merge. Medium and Low findings annotate the PR without blocking.

The PR check adds tools that are too slow for pre-commit but fast enough for CI:

- Semgrep: cross-language SAST with community rulesets (injection, auth bypass, misuse of crypto APIs)
- pip-audit: known CVEs in Python dependencies, cross-referenced against the PyPI Advisory Database
- Orca IaC scan: Terraform misconfiguration (public exposure, missing encryption, over-permissive IAM)
- pytest (70% coverage gate): regression guard; not security-specific, but a regression that bypasses auth is a security issue
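The blocking rule is simple enough to state directly. A sketch of the gate, assuming findings arrive as dicts carrying a severity field (the field name is my assumption, not the workflow's actual schema):

```python
BLOCKING_SEVERITIES = {"critical", "high"}

def should_block_merge(findings: list[dict]) -> bool:
    """Critical or High findings fail the PR check."""
    return any(f.get("severity", "").lower() in BLOCKING_SEVERITIES for f in findings)

def annotations_only(findings: list[dict]) -> list[dict]:
    """Medium and Low findings annotate the PR without blocking."""
    return [f for f in findings if f.get("severity", "").lower() not in BLOCKING_SEVERITIES]
```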

Stage 3: scheduled scans

Some security problems cannot be caught at commit or PR time. Two categories matter here:

Secrets committed and later rotated. A developer commits an AWS key, realises, rotates it, and removes it in a follow-up commit. The secret is gone from HEAD — but it is still in git history. Gitleaks scans the full commit history on a nightly schedule and will find it. The rotation was correct; the history still needs to be reviewed and optionally rewritten.

CVEs published after the last PR merge. Your dependency on requests==2.28.1 was clean when you merged six weeks ago. A CVE was published last Thursday. No code has changed, so no PR check would have caught it. Trivy runs nightly against the live filesystem and container images, and will surface newly-published vulnerabilities against unchanged dependencies.

The security-github-scanner Lambda — deployed in this stack — runs both Gitleaks and Trivy on a schedule across the full GitHub organisation, writing structured findings to DynamoDB for review.
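The write path for those findings is one put_item per finding. A sketch of the item shape (the single-table key design and field names here are assumptions, not the stack's actual schema):

```python
from datetime import datetime, timezone

def build_finding_item(repo: str, tool: str, rule: str,
                       path: str, line: int, severity: str) -> dict:
    """Shape a scanner finding for a single-table DynamoDB layout.

    One partition per repository; the sort key encodes tool, rule and
    location, so a nightly re-scan overwrites the same item instead of
    accumulating duplicates.
    """
    return {
        "pk": f"REPO#{repo}",
        "sk": f"FINDING#{tool}#{rule}#{path}#{line}",
        "severity": severity,
        "detected_at": datetime.now(timezone.utc).isoformat(),
        "status": "open",
    }
```

The Lambda would pass this dict straight to a boto3 Table.put_item call.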

Secret detection: the highest-value scan

Scanning for hardcoded credentials is the single highest-return security check. Automated scanners harvest leaked credentials from GitHub within minutes of a commit: bots run continuously, watching for AWS key patterns, GitHub tokens, Stripe keys, and more.

The patterns running in the ticketyboo scanner (api/layers/secret.py):

# api/layers/secret.py — pattern registry (name, compiled_re, severity)
import math
import re

_PATTERNS = [
    ("AWS Access Key",  re.compile(r"AKIA[0-9A-Z]{16}"),                     "critical"),
    ("AWS Secret Key",  re.compile(r"(?i)aws_secret_access_key\s*[=:]\s*['\"]?[A-Za-z0-9/+=]{40}"), "critical"),
    ("Private Key",     re.compile(r"-----BEGIN (RSA|EC|OPENSSH|DSA|PGP) PRIVATE KEY"), "critical"),
    ("Database URL",    re.compile(r"(?i)(postgres|mysql|mongodb|redis)://[^\s'\"]+:[^\s'\"]+@"),  "critical"),
    ("Generic API Key", re.compile(r"(?i)(api[_-]?key|apikey)\s*[=:]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"), "high"),
    ("Generic Token",   re.compile(r"(?i)(token|secret|auth)\s*[=:]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),   "high"),
    ("JWT Token",       re.compile(r"eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"),   "high"),
    ("Webhook URL",     re.compile(r"https://hooks\.(slack|discord)\.com/[^\s'\"]+"),                     "high"),
]

# Plain regex isn't enough — high-entropy strings catch keys that don't match patterns
_ENTROPY_THRESHOLD = 4.5   # bits/char
_ENTROPY_MIN_LENGTH = 16

# False positives suppressed by placeholder pattern
_PLACEHOLDER_RE = re.compile(
    r"(?i)(your[_-]?api[_-]?key|REPLACE_ME|xxx+|placeholder|example|changeme|TODO)"
)

def _shannon_entropy(s: str) -> float:
    freq: dict[str, int] = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    length = len(s)
    return -sum((c / length) * math.log2(c / length) for c in freq.values())

Entropy analysis catches secrets that don't match known patterns — randomly-generated tokens, session keys, internal service credentials. A 32-character base64 string assigned to a variable named auth with entropy > 4.5 bits/char gets flagged even if it doesn't look like any known credential format. Matched values are redacted (first 4 + last 4 chars) before storage — the finding records the location, not the secret itself.
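Putting the layers together, scanning a single line looks roughly like this sketch (pattern list trimmed to one entry; the full registry is the one above):

```python
import math
import re

# Trimmed registry: one pattern stands in for the full list above
_PATTERNS = [
    ("AWS Access Key", re.compile(r"AKIA[0-9A-Z]{16}"), "critical"),
]
_PLACEHOLDER_RE = re.compile(
    r"(?i)(your[_-]?api[_-]?key|REPLACE_ME|xxx+|placeholder|example|changeme|TODO)"
)
_TOKEN_RE = re.compile(r"[A-Za-z0-9/+_=-]{16,}")  # candidates for entropy analysis
_ENTROPY_THRESHOLD = 4.5

def _shannon_entropy(s: str) -> float:
    freq: dict[str, int] = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in freq.values())

def _redact(value: str) -> str:
    """Store location, not the secret: keep first and last 4 chars only."""
    return value[:4] + "..." + value[-4:] if len(value) > 8 else "***"

def scan_line(line: str) -> list[dict]:
    if _PLACEHOLDER_RE.search(line):
        return []  # obvious placeholders are suppressed outright
    findings = [
        {"rule": name, "severity": sev, "match": _redact(m.group())}
        for name, pattern, sev in _PATTERNS
        for m in pattern.finditer(line)
    ]
    if not findings:
        # Entropy fallback: flag long tokens that no pattern recognised
        for token in _TOKEN_RE.findall(line):
            if _shannon_entropy(token) > _ENTROPY_THRESHOLD:
                findings.append({"rule": "High Entropy String",
                                 "severity": "high", "match": _redact(token)})
    return findings
```

Note an interaction between the two thresholds: a string of length n has at most log2(n) bits/char of entropy, so with a 4.5 threshold only tokens of 23+ characters can ever trip the fallback.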

When you find a hardcoded credential: Treat it as already compromised. Rotate it immediately — before patching the code. Assume the credential was harvested within the first hour of the commit being public. The patch is secondary to the rotation.

SAST: AST-based detection in Python

Regex-based SAST has a false-positive problem: it can't tell the difference between a string that looks dangerous and code that is dangerous. The scanner uses Python's ast module to parse Python files and inspect the actual call graph:

# api/layers/sast.py — command injection check via AST walk
import ast
from typing import Optional

def _check_command_injection(node: ast.AST) -> Optional[str]:
    """Flag subprocess/os calls with shell=True."""
    if isinstance(node, ast.Call):
        func = node.func
        func_name = ""
        if isinstance(func, ast.Attribute):
            func_name = func.attr
        elif isinstance(func, ast.Name):
            func_name = func.id
        if func_name in ("system", "popen", "run", "call", "Popen", "check_output"):
            for kw in node.keywords:
                if kw.arg == "shell" and isinstance(kw.value, ast.Constant) \
                        and kw.value.value is True:
                    return "shell=True enables command injection if user input is passed"
    return None

# SQL injection: detect f-strings or concatenation inside .execute()
def _check_sql_injection(node: ast.AST) -> Optional[str]:
    if isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr == "execute":
            if node.args:
                arg = node.args[0]
                if isinstance(arg, (ast.JoinedStr, ast.BinOp)):
                    return "String-formatted SQL query is vulnerable to injection"
    return None

# Checks are registered with name + severity, run against every node in the AST
_AST_CHECKS = [
    ("Command Injection",          "critical", _check_command_injection),
    ("SQL Injection",              "critical", _check_sql_injection),
    ("Insecure Deserialization",   "high",     _check_insecure_deser),
    ("Cross-Site Scripting (XSS)", "high",     _check_xss),
    ("Path Traversal",             "high",     _check_path_traversal),
    ("Weak Cryptography",          "medium",   _check_weak_crypto),
]

AST walking catches things that a regex can't: shell=True as a keyword argument regardless of spacing or quoting, f-strings inside .execute() regardless of variable names, pickle.loads() vs json.loads(). For JavaScript, Go, and Ruby — where Python's ast module doesn't apply — the scanner falls back to regex patterns. Parse failures on Python files also fall back to regex.
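Running the registry is a plain tree walk. A minimal driver sketch, with the command-injection check repeated so the example is self-contained (the scan_source name and finding shape are my assumptions):

```python
import ast
from typing import Callable, Optional

def _check_command_injection(node: ast.AST) -> Optional[str]:
    """Same logic as the check above, repeated for a self-contained example."""
    if isinstance(node, ast.Call):
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
        if name in ("system", "popen", "run", "call", "Popen", "check_output"):
            for kw in node.keywords:
                if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    return "shell=True enables command injection if user input is passed"
    return None

_AST_CHECKS: list[tuple[str, str, Callable[[ast.AST], Optional[str]]]] = [
    ("Command Injection", "critical", _check_command_injection),
]

def scan_source(source: str, path: str = "<memory>") -> list[dict]:
    """Run every registered check against every node in the parsed tree."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # the real scanner falls back to regex patterns here
    findings = []
    for node in ast.walk(tree):
        for rule, severity, check in _AST_CHECKS:
            if check(node):
                findings.append({"rule": rule, "severity": severity,
                                 "path": path, "line": node.lineno})
    return findings
```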

IaC security scanning

Infrastructure as Code files (Terraform, CloudFormation, Pulumi) are often more security-critical than application code, but receive less scrutiny. Common patterns:

Public S3 buckets

Any Terraform resource with acl = "public-read" or block_public_acls = false without explicit data classification sign-off. Default should be private. Public delivery should use CloudFront OAC, not open ACLs.

Unbounded Lambda concurrency

A Lambda with no reserved_concurrent_executions set can exhaust your account's total concurrency (default: 1000). On Free Tier, unexpected traffic can consume your monthly compute allocation before you notice.

Missing encryption

DynamoDB tables, S3 buckets, and SQS queues without server-side encryption enabled. SSE-S3 (AES-256) is free and should be the default for all storage resources.
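Several of these patterns reduce to text matching over the raw HCL. A minimal sketch (rule names and regexes are illustrative, not the actual iac.py checks):

```python
import re

# Illustrative checks over raw HCL text: (rule, severity, pattern)
_IAC_CHECKS = [
    ("Public S3 ACL", "critical", re.compile(r'acl\s*=\s*"public-read(-write)?"')),
    ("Public access block disabled", "high", re.compile(r"block_public_acls\s*=\s*false")),
    ("Encryption disabled", "high", re.compile(r"server_side_encryption_enabled\s*=\s*false")),
]

def scan_hcl(text: str) -> list[dict]:
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # strip trailing HCL comments
        for rule, severity, pattern in _IAC_CHECKS:
            if pattern.search(code):
                findings.append({"rule": rule, "severity": severity, "line": lineno})
    return findings
```

Absence checks, such as a Lambda resource with no reserved_concurrent_executions set, cannot be expressed as a line match; they need the parsed configuration, which is why real IaC scanners operate on the HCL graph rather than raw text.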

Stage 4: org-level scanning and the OWASP Top 10

The GitHub API's /orgs/{org}/repos endpoint returns all repositories in an organisation (paginated). With an authenticated PAT (5,000 requests/hour), you can enumerate all repos, fetch their file trees, and download specific files for analysis — all within the free tier of the GitHub API.

# Enumerate all repos in an org and scan each
import asyncio
from github_client import GitHubClient
from scanner import scan_repository, generate_scan_id

async def scan_org(org: str, pat: str) -> list[dict]:
    client = GitHubClient(pat)
    repos = client.list_org_repos(org)  # handles pagination

    # Scan up to 10 repos concurrently
    semaphore = asyncio.Semaphore(10)

    async def scan_one(repo):
        async with semaphore:
            return await scan_repository(
                repo_url=f"https://github.com/{org}/{repo['name']}",
                scan_id=generate_scan_id(),
            )

    return await asyncio.gather(*[scan_one(r) for r in repos])
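list_org_repos hides one detail worth knowing: GitHub returns at most 100 repositories per page and advertises the next page in the Link response header. A parser sketch for that header (RFC 8288 format):

```python
import re
from typing import Optional

_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')

def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    for url, rel in _LINK_RE.findall(link_header or ""):
        if rel == "next":
            return url
    return None
```

The client simply loops on GET until next_page_url returns None.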

The OWASP Top 10 is not a checklist — it's a risk taxonomy. Each category maps to a set of detectable code and configuration patterns covered by the scanner layers:

- A01 Broken Access Control (iac.py): S3 buckets with public ACLs, missing IAM conditions, no resource policies
- A02 Cryptographic Failures (secret.py): hardcoded secrets, HTTP (not HTTPS) endpoints, weak cipher configs
- A03 Injection (sast.py): string concatenation in SQL queries, shell=True in subprocess calls
- A05 Security Misconfiguration (iac.py): debug mode enabled, default credentials, missing security headers
- A06 Vulnerable and Outdated Components (dependency.py): dependencies with known CVEs via GHSA GraphQL batch query
- A09 Security Logging and Monitoring Failures (quality.py): missing audit logging, print() in production, no structured log format

This separation — fast targeted scan on PR, full org scan on schedule — keeps CI times under 60 seconds while still catching the full range of findings over time.

In production: the scanner flagged a SECRET_KEY assignment in the pallets/flask demo scan as a potential hardcoded credential, and detected missing SECURITY.md and CODEOWNERS files across multiple repositories in the demo set.
