# Designing quality gates for AI-driven development: enforce types, tests, static analysis, and security in CI to make AI's speed safe

> A guide to designing quality gates that keep an AI coding agent's output at production quality. Mechanically enforce type safety (mypy strict / tsc), tests, static analysis, and security scanning in pre-commit and CI, so that quality no longer depends on a human remembering to review. From exhaustiveness checking with NeverError to golden vectors, it organizes lessons from real projects run in production with 100% test coverage.

- Published: 2026-06-25
- Author: 友田 陽大
- Tags: AI-driven development, Type safety, Testing, CI/CD, Security, Claude Code
- URL: https://tomodahinata.com/en/blog/ai-driven-development-quality-gates-ci-type-safety-test-security
- Category: AI-driven development & productivity
- Pillar guide: https://tomodahinata.com/en/blog/spec-driven-development-claude-code-ai-agent-production-workflow

## Key points

- What makes AI's speed safe is 'verification gates.' Rather than relying on human review alone for AI's output, lay down a mechanism that mechanically judges pass/fail.
- The gates are four layers: types, tests, static analysis, and security. In two stages — pre-commit (a few seconds) and CI (a full mirror) — forbid skipping verification and direct push to main.
- Type safety yields value only when 'enforced,' not as a 'policy' — eliminate any and turn switch omissions into compile errors with NeverError.
- Pin pure logic with golden vectors so regressions are detected instantly even when AI changes the implementation. I have mandated 100% coverage in CI on a production project.
- Automate security with secret scanning, dependency vulnerability checks, and CVE checks to structurally close the holes AI tends to introduce.

---

Let me state the conclusion first. **What makes an AI coding agent's speed "safe" is not human review but mechanical quality gates.** AI writes code several times faster than humans, and human review can't keep up with that speed. That is exactly why you need a mechanism that **mechanically enforces types, tests, static analysis, and security checks in CI** and decides whether AI's output may go to production by pass/fail rather than by subjective judgment. AI's speed and production quality are not a trade-off; well-designed quality gates reconcile them.

This article shares the concrete quality-gate design I have used in two settings: **an AI/GPU pipeline run in production with 100% test coverage mandated**, and a B2B SaaS that went through four rounds of security audits. It is the "verification installment" of [the production workflow for spec-driven development](/blog/spec-driven-development-claude-code-ai-agent-production-workflow).

---

## 1. Why "human review" alone isn't enough

Traditional quality assurance centered on human code review. But in AI-driven development, this breaks down.

- **Asymmetry of speed** — AI generates a large amount of code in a day. Human review can't keep up with that speed, becoming a bottleneck or a formality.
- **Review misses** — humans miss type escape hatches, boundary conditions, and security holes. In particular, the "plausible but wrong code" AI generates easily slips through review.
- **Lack of reproducibility** — quality varies by "who reviewed it."

The solution is to **split review into "what machines can do" and "what only humans can do."** Quality that can be judged mechanically — types, tests, static analysis, security — **let machines enforce it**. Humans concentrate on the parts machines can't do: design decisions and domain validity. This is the philosophy of quality gates.

---

## 2. The four-layer quality gate

My projects' quality gates consist of four layers (static analysis, types, tests, and security) on top of a formatting baseline. I run them in two stages: `pre-commit` (changed files only, a few seconds) and `pre-push` / CI (a full mirror of every check).

| Layer | Backend (Python example) | Frontend (TS example) | What it prevents |
|---|---|---|---|
| **Format** | Ruff | Prettier | Meaningless diffs, review load |
| **Static analysis / Lint** | Ruff / Bandit / Vulture / deptry | ESLint / Knip | Latent bugs, unused, dangerous patterns |
| **Types** | mypy --strict | tsc --noEmit | `any` escape hatches, type mismatches |
| **Tests** | pytest (golden vectors) | Vitest | Regression, missed abnormal cases |
| **Security** | pip-audit / gitleaks / Trivy | npm audit / gitleaks | Vulnerabilities, secrets, CVEs |

This gate **forbids skipping verification with `--no-verify` and direct pushes to `main`**. However fast AI writes, nothing reaches production until the gate is green. On my AI video-localization platform, I **mandated 100% backend test coverage in CI (the build fails if it is not met)** and kept `mypy strict`, Ruff, and Vulture at zero errors, which brought heavy, unstable AI/GPU processing up to production quality.
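The two-stage idea can be sketched as a small model. This is an illustration under assumptions, not the actual hook or CI configuration: the `Gate` shape, the `gatesFor` and `mayMerge` helpers, and the choice of which gates run at pre-commit are all hypothetical.

```ts
type Stage = "pre-commit" | "ci";
type GateName = "format" | "lint" | "types" | "tests" | "security";

interface Gate {
  readonly name: GateName;
  /** True when the gate is cheap enough to run on every commit (assumed split). */
  readonly runsAtPreCommit: boolean;
}

// Hypothetical catalogue mirroring the rows of the table above.
const GATES: readonly Gate[] = [
  { name: "format", runsAtPreCommit: true },
  { name: "lint", runsAtPreCommit: true },
  { name: "types", runsAtPreCommit: true },
  { name: "tests", runsAtPreCommit: false },
  { name: "security", runsAtPreCommit: false },
];

/** CI is the full mirror: it runs every gate, including the pre-commit ones. */
function gatesFor(stage: Stage): readonly GateName[] {
  return GATES.filter((g) => stage === "ci" || g.runsAtPreCommit).map((g) => g.name);
}

/**
 * A change may merge only if every CI gate explicitly reported success.
 * A gate with no recorded result counts as a failure, so skipping is not a pass.
 */
function mayMerge(results: ReadonlyMap<GateName, boolean>): boolean {
  return gatesFor("ci").every((name) => results.get(name) === true);
}
```

The design choice worth noting is in `mayMerge`: a missing result is treated the same as a red one. That is what makes the pre-commit stage a convenience and CI the actual enforcement point, since a commit made with `--no-verify` still has to pass the full mirror.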

---

## 3. Type safety yields value only when "enforced"

The policy "let's be type-safe" becomes a formality unless it is enforced. Left uninstructed, AI will get code to compile by reaching for `any` or a convenient type cast. There are two concrete measures for enforcing type safety with quality gates.

### ① A total ban on `any` / dangerous casts

Tighten `tsconfig` to strict settings and ban escape hatches such as `any`, non-null assertions, and `enum` with lint rules. If `any` slips into AI's output, CI fails. The crux is making it "a constraint that fails the build," not "a policy."
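To make the ban concrete, here is a minimal sketch of what code tends to look like once those escape hatches are unavailable: a const tuple with a derived union in place of `enum`, and `unknown` plus type guards in place of `any` or a cast. The `Money` type and the `parseMoney`, `isCurrency`, and `isRecord` helpers are hypothetical names chosen for illustration.

```ts
// Instead of `enum`: a const tuple plus a union type derived from it.
const CURRENCIES = ["JPY", "USD"] as const;
type Currency = (typeof CURRENCIES)[number];

interface Money {
  readonly amount: number;
  readonly currency: Currency;
}

/** Type guard: narrows `unknown` to Currency without `any` or a cast. */
function isCurrency(value: unknown): value is Currency {
  return typeof value === "string" && CURRENCIES.some((c) => c === value);
}

/** Type guard for "some non-null object whose fields are still unknown". */
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

/** Parses untrusted input (for example a JSON payload); returns undefined if invalid. */
function parseMoney(input: unknown): Money | undefined {
  if (!isRecord(input)) return undefined;
  const { amount, currency } = input;
  if (typeof amount !== "number" || !Number.isInteger(amount)) return undefined;
  if (!isCurrency(currency)) return undefined;
  return { amount, currency };
}
```

Because every field starts as `unknown`, the compiler refuses to build a `Money` until each check has run. A cast would have silenced exactly that refusal.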

### ② Exhaustiveness checking via `NeverError`

Missing branches in a `switch` are a common bug in AI-generated code. Turn them into **a compile-time error**.

```ts
/** Thrown when a value that should be unreachable arrives. A sentinel that enforces exhaustiveness at compile time. */
class NeverError extends Error {
  constructor(value: never) {
    super(`Unreachable: ${JSON.stringify(value)}`);
    this.name = "NeverError";
  }
}

type PaymentStatus = "pending" | "authorized" | "captured" | "refunded";

function label(status: PaymentStatus): string {
  switch (status) {
    case "pending":
      return "Pending";
    case "authorized":
      return "Authorized";
    case "captured":
      return "Captured";
    case "refunded":
      return "Refunded";
    default:
      // If you add a new state to PaymentStatus and forget to write its case,
      // `status` is no longer of type `never` here and compilation fails
      // (the type system catches the omission).
      throw new NeverError(status);
  }
}
```

This pattern pays off because **it catches the typical regression of "a state was added but the branch wasn't written" at compile time, without waiting for the test run.** Even if AI adds a new state and forgets the branch, the build tells you. For payment state machines and fee classification, I strictly apply the discipline of banning `as`/`any`/`enum` and enforcing exhaustiveness with `NeverError`, in team development as well.
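The same guard carries over to a state machine's transition rules. The sketch below is an assumption-laden illustration: the transition rules and the `nextStatuses` and `canTransition` helpers are hypothetical, not the rules of any real payment system.

```ts
class NeverError extends Error {
  constructor(value: never) {
    super(`Unreachable: ${JSON.stringify(value)}`);
    this.name = "NeverError";
  }
}

type PaymentStatus = "pending" | "authorized" | "captured" | "refunded";

/** Allowed next states. These rules are illustrative assumptions. */
function nextStatuses(status: PaymentStatus): readonly PaymentStatus[] {
  switch (status) {
    case "pending":
      return ["authorized"];
    case "authorized":
      return ["captured"];
    case "captured":
      return ["refunded"];
    case "refunded":
      return [];
    default:
      // Adding a member to PaymentStatus without a case above makes this
      // line fail to compile, because `status` is no longer `never`.
      throw new NeverError(status);
  }
}

function canTransition(from: PaymentStatus, to: PaymentStatus): boolean {
  return nextStatuses(from).includes(to);
}
```

There is also a runtime benefit. If a value outside the union arrives anyway, for instance from an untyped JSON payload, the `default` branch throws instead of silently returning `undefined`.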

---

## 4. Tests: fix pure logic with golden vectors

To keep behavior from regressing when AI rewrites an implementation, it's effective to **pin the core pure logic with golden vectors (a fixed set of inputs and expected outputs).**

```ts
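import { describe, expect, it } from "vitest";
// Hypothetical module path; point this at wherever resolveTotal lives in your project.
import { resolveTotal } from "./resolveTotal";

// Isolate fee resolution, state transitions, idempotency, and similar logic as pure
// functions with no side effects, and pin "input → expected output" without a DB.
// Even if AI changes the internal implementation, the behavior stays protected.
const goldenCases = [
  { input: { lines: [{ qty: 2, unit: 500 }], taxRate: 0.1 }, expected: 1100 },
  { input: { lines: [], taxRate: 0.1 }, expectThrow: "EmptyOrderError" },
  { input: { lines: [{ qty: 1, unit: 100 }], taxRate: 0.05 }, expectThrow: "InvalidTaxRateError" },
] as const;

describe("resolveTotal (golden vectors)", () => {
  for (const c of goldenCases) {
    it(JSON.stringify(c.input), () => {
      if ("expectThrow" in c) {
        // A string passed to toThrow is matched against the error message, so this
        // assumes each error's message contains its class name.
        expect(() => resolveTotal(c.input)).toThrow(c.expectThrow);
      } else {
        expect(resolveTotal(c.input).totalJpy).toBe(c.expected);
      }
    });
  }
});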
```

The point is to **isolate logic into pure functions with no side effects (no DB or other I/O).** This lets you test quickly without standing up a DB and cover boundary conditions as well. In my projects I pin payments, fees, and state machines this way, which gives a CI that runs hundreds to thousands of tests in anywhere from a few seconds to a little over ten. Fast, green tests are a precondition for delegating implementation to AI with peace of mind.
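For reference, here is one possible shape of a pure `resolveTotal` that would satisfy the three golden cases above. It is a sketch under stated assumptions, not the production implementation: the set of valid tax rates (8% and 10%), rounding tax down to whole yen, and every type and field name other than `totalJpy` are hypothetical.

```ts
class EmptyOrderError extends Error {
  constructor() {
    super("EmptyOrderError: an order needs at least one line");
    this.name = "EmptyOrderError";
  }
}

class InvalidTaxRateError extends Error {
  constructor(rate: number) {
    super(`InvalidTaxRateError: unsupported tax rate ${rate}`);
    this.name = "InvalidTaxRateError";
  }
}

interface OrderLine {
  readonly qty: number;
  readonly unit: number; // unit price in whole yen
}

interface OrderInput {
  readonly lines: readonly OrderLine[];
  readonly taxRate: number;
}

interface OrderTotal {
  readonly subtotalJpy: number;
  readonly taxJpy: number;
  readonly totalJpy: number;
}

// Assumption: only these rates are valid. Mapping each rate to a whole percent
// keeps the tax calculation in integer arithmetic.
const TAX_PERCENT_BY_RATE: ReadonlyMap<number, number> = new Map([
  [0.08, 8],
  [0.1, 10],
]);

/** Pure function: no DB, no clock, no I/O. Same input always gives the same output. */
function resolveTotal(input: OrderInput): OrderTotal {
  if (input.lines.length === 0) throw new EmptyOrderError();
  const percent = TAX_PERCENT_BY_RATE.get(input.taxRate);
  if (percent === undefined) throw new InvalidTaxRateError(input.taxRate);

  const subtotalJpy = input.lines.reduce((sum, line) => sum + line.qty * line.unit, 0);
  const taxJpy = Math.floor((subtotalJpy * percent) / 100); // assumed: round down
  return { subtotalJpy, taxJpy, totalJpy: subtotalJpy + taxJpy };
}
```

With two items at 500 yen and a 10% rate, the subtotal is 1000, the tax is 100, and `totalJpy` is 1100, matching the first golden case. Because the function touches nothing outside its arguments, AI can restructure its internals freely while the vectors hold the observable behavior in place.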

---

## 5. Security: automatically close the holes AI tends to introduce

AI-generated code tends to create security holes. Hardcoded secrets, dependency vulnerabilities, dangerous patterns — **detect these mechanically with automated scanning.**

| Scan | What it detects |
|---|---|
| **gitleaks** | API keys / secrets hardcoded into the code |
| **Dependency audit** (pip-audit / npm audit) | Dependency packages with known vulnerabilities |
| **Trivy** | CVEs in container images |
| **Static analysis** (Bandit, etc.) | Use of dangerous functions, signs of injection |

Embed these in CI, and reduce exposure on the secrets-management side with **keyless CI/CD via OIDC** (authenticating from GitHub Actions without issuing long-lived cloud keys). Keep dependencies continuously updated with `Dependabot` and mandate Conventional Commits. Mechanisms like these, which do not rely on manual effort, are what let quality keep pace with AI's speed. Together with [idempotency](/blog/payment-double-charge-prevention-idempotency-procurement-guide) in payments and [productionizing](/blog/vibe-coding-ai-generated-code-production-hardening-guide) AI-built code, I keep the design consistently verification-first.
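As one small example of replacing a manual convention with a mechanical check, a Conventional Commits header can be validated in a few lines. This is a simplified sketch, not a substitute for a dedicated tool such as commitlint: the allowed type list and the scope character set are assumptions, and `isConventionalHeader` is a hypothetical helper name.

```ts
// Assumed type list; the specification itself only defines `feat` and `fix`.
const ALLOWED_TYPES = [
  "feat",
  "fix",
  "docs",
  "refactor",
  "test",
  "chore",
  "ci",
  "build",
  "perf",
] as const;

// type, optional (scope), optional "!" for breaking changes, then ": description".
const HEADER_PATTERN = new RegExp(
  `^(${ALLOWED_TYPES.join("|")})(\\([a-z0-9-]+\\))?!?: \\S.*$`,
);

/** Checks only the first line of the commit message (the header). */
function isConventionalHeader(message: string): boolean {
  const header = message.split("\n", 1)[0] ?? "";
  return HEADER_PATTERN.test(header);
}
```

Run from a commit-message hook and again in CI, a check like this turns "please follow the convention" into a gate with a pass/fail result, in the same spirit as the scanners in the table above.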

---

## FAQ

### Q. Is it OK to not have humans review AI-written code?

Rather than "removing" human review, the right answer is "let machines do what machines can, and have humans concentrate on what machines can't." Enforce types, tests, static analysis, and security mechanically, and have humans review design decisions and domain validity. Since human review can't keep up with AI's generation speed, quality assurance becomes a formality without mechanical quality gates.

### Q. What concretely should I put in the quality gates?

At minimum, five families: format, static analysis (Lint), type checking (mypy strict / tsc), tests, and security scanning (secrets, dependency vulnerabilities, CVEs). Run these in two stages — pre-commit (changes, a few seconds) and CI (a full mirror) — and forbid skipping verification and direct push to main. What matters is making them "a constraint that fails the build" — merely writing them as a policy won't keep them.

### Q. Is 100% test coverage realistic?

Limited to the core pure logic (fee calculation, state machines, idempotency, etc.), it's realistic and effective. If you isolate logic into pure functions with no side effects, you can test quickly without a DB and cover boundary conditions too. I have mandated 100% backend coverage in CI while running an AI/GPU pipeline in production. Rather than mechanically chasing 100% overall, including UI and external I/O, reliably pinning "the core that hurts most when broken" is more cost-effective.

### Q. What is NeverError? Why is it important?

It's a mechanism that guarantees at compile time that you've handled all cases in a branch like `switch`. By calling a function in the `default` clause that receives a value of type `never`, if you add a new case and forget the branch, it becomes a compile error without waiting for the test run. It can prevent, at the type level, the typical regression of AI adding a new state and forgetting the branch, so it's especially effective in AI-driven development.

### Q. How do you ensure security?

Embed secret scanning (gitleaks), dependency vulnerability audits (pip-audit / npm audit), container CVE scanning (Trivy), and static analysis (Bandit, etc.) into CI so that problems are detected automatically. In addition, reduce the risk of managing long-lived keys with keyless CI/CD via OIDC, and keep dependencies continuously updated with Dependabot. Since AI tends to create security holes, enforcing these mechanically is important.

---

## Summary: let machines enforce, and have humans concentrate on design

To make AI's speed safe, here is what to take away about quality gates.

1. **What makes AI's speed safe is verification gates** — human review alone can't keep up with the speed.
2. **The gates are four layers: types, tests, static analysis, security** — in two stages, pre-commit and CI, forbid skipping and direct push to main.
3. **Type safety yields value only when enforced** — ban `any`, and turn exhaustiveness into a compile error with `NeverError`.
4. **Fix pure logic with golden vectors** — regressions are detected instantly even when AI changes the implementation. A track record of 100% coverage.
5. **Close security holes with automated scanning** — detect secrets, dependency vulnerabilities, and CVEs mechanically.

"I want to build fast with AI, but I don't want to sacrifice quality and security." This quality-gate design is how I have consistently answered that demand. I take on building verification-first CI/CD and quality gates, including introducing them into existing projects.
