# Spec-driven development × Claude Code: a production workflow that doesn't break even when you hand large implementations to AI

> This article explains spec-driven development, the opposite of vibe coding: a workflow that keeps production quality even when you hand large implementations to an AI coding agent (Claude Code). Make the spec the single source of truth, have the AI 'implement' rather than 'decide,' and harden the result with verification gates. Drawing on real-project know-how, it systematizes the four phases of explore→plan→implement→verify and the design of machine-checkable acceptance criteria.

- Published: 2026-06-25
- Author: 友田 陽大
- Tags: Claude Code, Generative AI, AI-driven development, Type safety, Testing, Architecture design
- URL: https://tomodahinata.com/en/blog/spec-driven-development-claude-code-ai-agent-production-workflow
- Category: AI-driven development & productivity

## Key points

- The opposite of vibe coding is spec-driven development. Make the spec the single source of truth, and have the AI 'implement' but not 'decide the spec.'
- The workflow is four phases: explore→plan→implement→verify. The one-shot success rate is decided by how much energy you pour into 'plan.'
- Make acceptance criteria 'machine-checkable tests' rather than prose — judge whether AI's output passes or fails without relying on human subjectivity.
- What makes AI's speed safe is verification gates (types, tests, static analysis, security). Don't trust AI's output as-is.
- Context is the scarcest resource. File-ify specs and conventions, and separate unrelated tasks, to keep AI's accuracy.

---

Let me state the conclusion first. **The key to keeping production quality even when handing large implementations to an AI coding agent is to make the spec the single source of truth and have the AI 'implement' but not 'decide the spec.'** This is **spec-driven development**, the opposite of "vibe coding," where you issue instructions to the AI on the fly. Fix the spec first, translate it into machine-checkable acceptance criteria (tests), have the AI implement, and judge pass/fail mechanically with verification gates. This discipline reconciles AI's overwhelming implementation speed with quality that withstands production operations.

With GitHub's Spec Kit gathering tens of thousands of stars in a short time and being called "the fastest-growing approach to AI coding," software development in 2026 is shifting strongly in this direction. This article shares, in a reproducible form, the actual workflow with which I, as a solo developer working with generative AI (Claude Code), **built a B2B SaaS that won a METI Minister's Award and a payments platform with zero double charges in production**.

---

## 1. Why "vibe coding" breaks down at scale

"Tell the AI to build it like this, then use whatever comes out": this vibe-coding style is astonishingly fast for prototypes, but it breaks down as scale grows. The reason is simple: **the spec (what should be built) exists only inside the conversation with the AI, and it vanishes when the conversation ends.**

| Symptom of vibe coding | Root cause |
|---|---|
| "It came out as a different implementation than before" | The spec depends on the conversation and isn't reproducible |
| Every fix breaks something else | No acceptance criteria (tests), so regressions can't be detected |
| Whole-system consistency collapses | Each part is generated locally and the design isn't coherent |
| "It works but can't ship to production" | The verification gates (types, security) are missing |

Vibe coding is a "let the AI think" approach. But **what you should delegate to AI is 'implementation,' not 'the decision of what to build.'** Humans holding the spec decision and making it a persistent source of truth is the starting point of spec-driven development. (Also see [productionizing AI-built code](/blog/vibe-coding-ai-generated-code-production-hardening-guide).)

---

## 2. The four-phase workflow: explore → plan → implement → verify

What I consistently use is the following four phases. This aligns with how Anthropic itself recommends using Claude Code, and **the one-shot success rate is decided almost entirely by how much energy you pour into the "plan" phase.**

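```text
1. Explore    Read the existing code, constraints, and conventions. Don't write yet.
   ↓
2. Plan       Lock down the spec and the sequence of work. Make acceptance criteria machine-checkable. ← this is where it's won or lost
   ↓
3. Implement  Have the AI implement according to the plan. Stop it when it drifts from the spec.
   ↓
4. Verify     Pass types, tests, static analysis, and security checks mechanically. Not green means not done.
```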

### Explore: read before you write

Start with a phase that **only reads** the relevant code, existing conventions, and constraints. The crux is that the AI does not implement anything yet. The larger the codebase, the more likely it is that implementing right away produces "floating code" that ignores existing conventions. When the exploration load is high, delegate it to a dedicated investigation subagent so the main context isn't polluted.

### Plan: make acceptance criteria "tests"

This is the most important phase. A **prose spec** like "the user can log in correctly" leaves pass/fail ambiguous to AI and humans alike. In spec-driven development, turn the acceptance criteria into **machine-checkable tests**.

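```ts
// Write the spec as executable acceptance criteria, not as prose.
// The tests, not human judgment, decide whether the AI's implementation passes or fails.
// Assumptions (not in the original snippet): Vitest as the test runner,
// the module paths, and the fixture shapes below.
import { describe, expect, it } from "vitest";
import { InvalidTaxRateError, resolveTotal } from "./order-total";
import { charge, countCharges } from "./payments";

// Illustrative fixtures.
const lines = [{ unitPriceJpy: 1000, quantity: 2 }];
const expectedTotal = 2200; // 2,000 subtotal + 10% tax
const req = { idempotencyKey: "order-123", amountJpy: expectedTotal };

describe("Finalizing the order amount (spec)", () => {
  it("accepts only tax rates of 0, 8%, and 10% and rejects anything else", () => {
    expect(() => resolveTotal({ lines, taxRate: 0.05 })).toThrow(InvalidTaxRateError);
  });

  it("derives the total from the line items plus tax and never trusts client-supplied values", () => {
    // Recalculate on the server (rules out amount tampering by construction)
    expect(resolveTotal({ lines, taxRate: 0.1 }).totalJpy).toBe(expectedTotal);
  });

  it("does not double-charge when re-run with the same idempotency key", async () => {
    await charge(req);
    await charge(req); // retry
    expect(await countCharges(req.idempotencyKey)).toBe(1);
  });
});
```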

If this test exists first, the AI's implementation heads toward the clear goal of "make the test green." In projects I've handled, I **isolate core logic — payment fee resolution, state machines, idempotency — as pure functions and unit-test them with golden vectors (a fixed set of inputs and expected outputs)**. Since the spec is fixed as tests, even if the AI changes the implementation, regressions are detected instantly.
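As a minimal sketch of what "pure function plus golden vectors" could look like, the block below implements a `resolveTotal` that matches the call shape used in the tests above. Everything else is an assumption made for illustration: the `Line` shape, the `unitPriceJpy`/`quantity` field names, rounding tax down to whole yen, and the three vectors themselves. It is not the production implementation.

```ts
// Hypothetical line-item shape; field names are illustrative.
type Line = { unitPriceJpy: number; quantity: number };

class InvalidTaxRateError extends Error {
  constructor(taxRate: number) {
    super(`unsupported tax rate: ${taxRate}`);
    this.name = "InvalidTaxRateError";
  }
}

// Allowed rates mapped to integer percentages so the arithmetic stays in integers.
const TAX_PERCENT_BY_RATE = new Map<number, number>([
  [0, 0],
  [0.08, 8],
  [0.1, 10],
]);

// Pure function: no I/O, no clock, no globals. Same input, same output.
function resolveTotal(input: { lines: Line[]; taxRate: number }): {
  subtotalJpy: number;
  taxJpy: number;
  totalJpy: number;
} {
  const percent = TAX_PERCENT_BY_RATE.get(input.taxRate);
  if (percent === undefined) {
    throw new InvalidTaxRateError(input.taxRate);
  }
  const subtotalJpy = input.lines.reduce(
    (sum, line) => sum + line.unitPriceJpy * line.quantity,
    0,
  );
  // Assumption: tax is rounded down to whole yen.
  const taxJpy = Math.floor((subtotalJpy * percent) / 100);
  return { subtotalJpy, taxJpy, totalJpy: subtotalJpy + taxJpy };
}

// Golden vectors: fixed inputs paired with the expected outputs.
const GOLDEN_VECTORS: Array<{
  name: string;
  lines: Line[];
  taxRate: number;
  expectedTotalJpy: number;
}> = [
  { name: "zero rate", lines: [{ unitPriceJpy: 1000, quantity: 2 }], taxRate: 0, expectedTotalJpy: 2000 },
  { name: "8% with rounding", lines: [{ unitPriceJpy: 199, quantity: 3 }], taxRate: 0.08, expectedTotalJpy: 644 },
  {
    name: "10% with two lines",
    lines: [
      { unitPriceJpy: 1000, quantity: 1 },
      { unitPriceJpy: 250, quantity: 2 },
    ],
    taxRate: 0.1,
    expectedTotalJpy: 1650,
  },
];

// Returns the names of the vectors that fail; an empty array means all pass.
function checkGoldenVectors(): string[] {
  return GOLDEN_VECTORS.filter(
    (v) => resolveTotal({ lines: v.lines, taxRate: v.taxRate }).totalJpy !== v.expectedTotalJpy,
  ).map((v) => v.name);
}
```

Because the function is pure, each vector can be replayed in a unit test without mocks, and a change in rounding or rate handling shows up as a named failing vector rather than as a production incident.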

---

## 3. Don't let AI "decide": file-ify specs and conventions

The core of spec-driven development is to **persist the "source of truth" handed to the AI as files.** Don't let it sink into the conversation. Concretely:

- **Project conventions (the constitution)** — articulate coding principles, prohibitions (e.g. no `any`), the architecture's layer separation, and the test policy in a file at the repository root (e.g. `CLAUDE.md`). The AI reads this every time and implements per the conventions.
- **Feature specs** — document the behavior and acceptance criteria of the feature to be built, before implementation.
- **Architecture decision records (ADRs)** — leave behind "why this choice was made" and "what risk was accepted."

This repository itself places a `CLAUDE.md` (the project's constitution) at the root and keeps local convention files in each directory. The AI agent always loads the relevant conventions before implementing. **Rather than stating conventions verbally and hoping they aren't broken, write them down in a file and enforce them mechanically in CI.** This is the mechanism that keeps AI's speed disciplined.

> **Design philosophy**: AI is an excellent "implementer," but left alone it "optimizes on the spot" and breaks whole-system consistency. Fix the three sources of truth — spec, conventions, acceptance criteria — in files, and run the AI within those constraints. This lets you enjoy AI's speed while humans keep holding design consistency.

---

## 4. Verification gates: the last line of defense that makes AI's speed "safe"

Even when implementation is done, that's not "complete." **Only after mechanically passing the verification gates of types, tests, static analysis, and security is it complete.** Refusing to accept AI output just because "it looks like it's working" is what separates production quality from everything else.

The verification gates in my projects are multi-layered. For example:

| Layer | Content |
|---|---|
| **Types** | `mypy --strict` / `tsc --noEmit`. Eliminate `any`, make invalid states unrepresentable |
| **Tests** | Unit tests of pure logic + golden vectors. Enforce coverage in CI |
| **Static analysis** | Lint, unused detection, dependency-health checks |
| **Security** | Secret scanning, dependency vulnerability audit, container CVE scanning |
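
To illustrate the "make invalid states unrepresentable" entry in the Types row, here is a small sketch using a TypeScript discriminated union. The `Payment` states and field names are hypothetical and chosen only for illustration; the point is that the compiler, not a reviewer, rejects impossible combinations.

```ts
// Hypothetical payment states. Each state carries only the fields that are
// valid for it, so "captured without a chargeId" cannot be constructed.
type Payment =
  | { status: "pending" }
  | { status: "captured"; chargeId: string }
  | { status: "refunded"; chargeId: string; refundId: string }
  | { status: "failed"; reason: string };

// Exhaustiveness helper: if a new status is added to Payment and a switch
// does not handle it, the call below stops type-checking.
function assertNever(value: never): never {
  throw new Error(`unhandled state: ${JSON.stringify(value)}`);
}

function describePayment(payment: Payment): string {
  switch (payment.status) {
    case "pending":
      return "awaiting capture";
    case "captured":
      return `captured as ${payment.chargeId}`;
    case "refunded":
      return `refunded ${payment.chargeId} via ${payment.refundId}`;
    case "failed":
      return `failed: ${payment.reason}`;
    default:
      return assertNever(payment);
  }
}
```

With this shape, `tsc --noEmit` fails on a value such as `{ status: "captured" }` that omits `chargeId`, so an AI-generated change that forgets a required field is caught at the type gate before any test runs.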

I run these gates in two stages: `pre-commit` (changed files only, a few seconds) and `pre-push` (a full mirror of CI). Direct pushes to `main` and skipping verification are both forbidden. The concrete design of verification gates is detailed in [quality gates for AI-driven development](/blog/ai-driven-development-quality-gates-ci-type-safety-test-security). What matters is to **integrate "AI writes fast" and "harden with verification" as a single workflow.** Speed and safety are not a trade-off; they can be reconciled by design.
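
The "not green means not done" rule can be sketched as a small fail-fast gate runner. This is an illustration under stated assumptions, not my actual hook scripts: the `Gate` type, the `runGates` and `allGreen` helpers, and the command strings (which presume `test` and `lint` scripts exist in `package.json`) are all hypothetical. The executor is injected so the runner itself stays pure and testable; in a real hook it would wrap a process spawn and return the exit code.

```ts
type Gate = { name: string; command: string };
type GateResult = { name: string; passed: boolean };

// Example gate list for a pre-push hook. The commands are assumptions.
const PRE_PUSH_GATES: readonly Gate[] = [
  { name: "types", command: "tsc --noEmit" },
  { name: "tests", command: "npm test" },
  { name: "static analysis", command: "npm run lint" },
  { name: "security", command: "npm audit --audit-level=high" },
];

// Runs gates in order and stops at the first failure.
// `exec` returns the exit code of the command (0 means success).
function runGates(
  gates: readonly Gate[],
  exec: (command: string) => number,
): GateResult[] {
  const results: GateResult[] = [];
  for (const gate of gates) {
    const passed = exec(gate.command) === 0;
    results.push({ name: gate.name, passed });
    if (!passed) break; // fail fast: later gates are not attempted
  }
  return results;
}

// Complete only if every gate ran and every gate passed.
function allGreen(gates: readonly Gate[], results: readonly GateResult[]): boolean {
  return results.length === gates.length && results.every((r) => r.passed);
}
```

A hook script would call `runGates(PRE_PUSH_GATES, exec)` and exit non-zero unless `allGreen` returns `true`, which is what makes skipping a gate impossible rather than merely discouraged.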

---

## 5. Context is the scarcest resource

A principle often overlooked in AI coding, yet decisively important, is that **context is the scarcest resource.** An AI agent's accuracy strongly depends on the quality and quantity of the context given. When the context is polluted with irrelevant information, accuracy drops.

The practical discipline is as follows.

- **Separate unrelated tasks** — when moving to other work, reset the context. Don't cram multiple unrelated tasks into one session.
- **Offload heavy investigation to a subagent** — delegate analysis of large search results and logs to a dedicated agent, and don't pollute the main context.
- **If the same fix fails twice in a row, the context is contaminated** — rather than continuing a long chain of corrections, resetting the context and redoing with a refined prompt is almost always faster.

"Handing the key points in clean context" yields higher-quality results than "persisting in one long conversation." This is the same for human review and for AI agents.

---

## FAQ

### Q. What's the difference between spec-driven development and vibe coding?

Vibe coding is the "talk to AI and use what comes out" style, where the spec exists only inside the conversation and isn't reproducible. Spec-driven development fixes the spec first and persists it as files, turns it into machine-checkable acceptance criteria (tests), has the AI implement, and judges pass/fail with verification gates. Vibe coding is fast for prototypes, but spec-driven development is needed for large-scale, production-quality work.

### Q. Doesn't quality drop when you delegate to AI?

It depends on how you delegate. Delegating even "the decision of what to build" to AI breaks consistency. In spec-driven development, humans fix the sources of truth — spec, conventions, acceptance criteria — in files, and the AI implements within those constraints. Further, since the verification gates of types, tests, static analysis, and security are enforced mechanically in CI, you can keep production quality while enjoying AI's speed.

### Q. How much time should I spend on the "plan" phase?

The one-shot success rate is decided almost entirely by how much energy you pour into the plan phase. Rushing to implement and skipping planning ends up slower because of rework. In particular, turning acceptance criteria into machine-checkable tests is the most important investment to make in the plan phase. If the spec is fixed as tests, the AI's implementation heads toward a clear goal and regressions are detected instantly.

### Q. What does it concretely mean to make acceptance criteria "tests"?

It means writing them not as prose like "the user can log in correctly," but as tests whose pass/fail can be judged mechanically by running them — like "return this output for this input" or "no double charge on re-running with the same idempotency key." This lets you judge whether the AI's implementation meets the spec by the test's green/red rather than human subjectivity.
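
As a hedged illustration of the idempotency criterion, the sketch below shows one way an implementation could satisfy "no double charge on re-running with the same idempotency key." All names here (`createChargeService`, `ChargeRequest`, `Charge`, the in-memory ledger) are hypothetical. The in-memory `Map` stands in for a database table with a unique constraint on the key, and the functions are synchronous only to keep the example short; a real service would be asynchronous.

```ts
type ChargeRequest = { idempotencyKey: string; amountJpy: number };
type Charge = { chargeId: string; amountJpy: number };

function createChargeService() {
  // First result recorded per idempotency key.
  const firstResultByKey = new Map<string, Charge>();
  // Every charge actually created; this is what the acceptance test counts.
  const ledger: Array<Charge & { idempotencyKey: string }> = [];

  function charge(req: ChargeRequest): Charge {
    const previous = firstResultByKey.get(req.idempotencyKey);
    if (previous !== undefined) {
      return previous; // retry: return the original result, create nothing
    }
    const created: Charge = {
      chargeId: `ch_${ledger.length + 1}`,
      amountJpy: req.amountJpy,
    };
    ledger.push({ ...created, idempotencyKey: req.idempotencyKey });
    firstResultByKey.set(req.idempotencyKey, created);
    return created;
  }

  function countCharges(idempotencyKey: string): number {
    return ledger.filter((entry) => entry.idempotencyKey === idempotencyKey).length;
  }

  return { charge, countCharges };
}
```

The acceptance criterion then reduces to a mechanical check: call `charge` twice with the same request and assert that `countCharges` returns 1. Whether the AI reaches that result with a lookup table, a unique index, or something else is an implementation detail the test does not care about.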

### Q. Can it be used with AI agents other than Claude Code?

It can. Spec-driven development is not a feature of a specific tool but a methodology of "make the spec the source of truth, have the AI implement, and harden with verification." The four phases of explore→plan→implement→verify, test-ifying acceptance criteria, file-ifying conventions, and CI enforcement of verification gates — these are principles applicable to any AI coding agent.

---

## Summary: hold the spec, delegate the implementation, harden with verification

To keep production quality even when handing large implementations to AI, keep these five points in mind.

1. **Make the spec the single source of truth, and have the AI 'implement' but not 'decide'** — the opposite of vibe coding.
2. **The workflow is explore→plan→implement→verify** — the one-shot success rate is decided by investment in "plan."
3. **Make acceptance criteria machine-checkable tests** — don't rely on subjectivity for pass/fail.
4. **What makes AI's speed safe is verification gates** — enforce types, tests, static analysis, and security in CI.
5. **Context is the scarcest resource** — file-ify specs and conventions, and separate unrelated tasks.

"I want to build fast using AI, but also ensure quality." Reconciling the two is exactly what I have practiced as a solo developer working with generative AI. With spec-driven development and a verification-first workflow, I take on development that reconciles speed and production quality, from requirements definition through operations.
