# Vulnerability assessment of AI-generated code (vibe coding) [2026 edition] — a practical procedure to crush, before release, the vulnerabilities that generative AI multiplies

> Why does code written by generative AI (Copilot, Claude, etc.) contain more vulnerabilities? Drawing on research data from Stanford and NYU and on 'slopsquatting,' this article explains the risks specific to AI-generated code and a practical four-layer procedure (SCA/secrets/SAST/DAST) to crush them before release, with real code. It also covers the safety AI cannot create in principle (authorization, business logic) and how to design human verification gates.

- Published: 2026-06-28
- Author: 友田 陽大
- Tags: Security, Vulnerability assessment, AI-driven development, OWASP
- URL: https://tomodahinata.com/en/blog/ai-generated-code-vulnerability-assessment-vibe-coding-security-guide
- Category: Application-layer security
- Pillar guide: https://tomodahinata.com/en/blog/nextjs-supabase-application-security-guide

## Key points

- Research data: users of AI assistants are more likely to write vulnerable code (injection, etc.), and moreover overconfidently believe 'I wrote it safely' (Stanford). About 40% of Copilot-generated code was vulnerable (NYU). AI's speed comes with debt that should be offset by verification.
- An AI-specific new risk, 'slopsquatting': 19.7% of the packages referenced in AI-generated code did not exist, and when an attacker pre-registers such a name, AI's hallucination turns directly into a supply-chain attack (OWASP A03). Prevent it with lockfile pinning, existence verification, and allowlists.
- The top 5 vulnerabilities AI tends to mass-produce: ① missing authorization (IDOR/A01) ② missing input validation (A05) ③ hardcoded secrets ④ old/non-existent dependencies (A03) ⑤ insecure defaults (A02). For each, this article shows how to crush it, with code.
- Diagnose AI-generated code in four layers before release: SCA (slopsquatting/A03) → secrets → SAST (A05) → DAST (A01/A05). All can be gated in CI with free official tools, reconciling generation speed and production quality.
- There's safety AI cannot create in principle: authorization, tenant separation, and business logic depend on 'the meaning of business rules,' and AI doesn't know the context. AI implements, humans decide the spec and verify — this role split makes speed safe.

---

The speed of writing code with generative AI has reached a point of no return. Working solo with generative AI (Claude Code), I have built a B2B SaaS that won a METI Minister's Award and a payments platform with zero double charges in production. That is exactly why I can say it clearly: **AI is fast, but left alone it mass-produces vulnerabilities at the same speed.**

This isn't a gut feeling; it's a fact backed by research.

- **Developers using AI assistants are more likely to write vulnerable code than those who don't.** Moreover, they tend to **overconfidently believe "I wrote secure code"** (Stanford University, [Perry et al. 2022](https://arxiv.org/pdf/2211.03622)).
- **About 40% of code generated by GitHub Copilot** contained **exploitable vulnerabilities** across 89 security-relevant scenarios (NYU, [Pearce et al. 2021, "Asleep at the Keyboard?"](https://arxiv.org/abs/2108.09293)).

But **this is not a "don't use AI" story.** My conclusion is the opposite. **AI's speed can be reconciled with production quality if you offset the debt with a verification gate before release.** The key is the role split: delegate implementation to AI, and keep spec decisions and verification in human hands. This article walks through that verification, meaning **vulnerability assessment of AI-generated code**, end to end: from the data and the new risks to real code and verification-gate design.

---

## 1. Why does AI generate vulnerabilities — three structural reasons

AI does not produce vulnerabilities because "it isn't smart." There are **structural reasons**, and understanding them narrows the diagnostic target.

1. **It reproduces the flawed patterns of its training data.** AI learns from the world's code. Since that includes **a large amount of insecure code**, it probabilistically outputs "common ways of writing," which are also "common vulnerable patterns." SQL built by string concatenation and unsanitized input passed straight to `dangerouslySetInnerHTML` are typical.
2. **It doesn't know the context (your business rules).** AI doesn't know "who may access this API." So it **blithely omits authorization checks**. This is #1 of the top 5 below.
3. **It hallucinates.** It confidently generates non-existent functions and non-existent packages. This is the breeding ground for the next section's new risk, "slopsquatting."

In other words, AI's weaknesses are three: **"reproduction of structural flaws," "lack of context," and "hallucination."** Diagnosis targets these three.
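
To make the first reason concrete, here is a minimal sketch of the SQL string-concatenation pattern next to a parameterized alternative. The function names and the `users` table are illustrative assumptions, and the `{ text, values }` shape is modeled on the query objects that drivers such as node-postgres accept. No database is contacted; the point is only to show what ends up in the SQL text.

```ts
// A parameterized query keeps the SQL text and the user-supplied values separate.
type ParameterizedQuery = { text: string; values: unknown[] };

// Vulnerable pattern: user input is spliced into the SQL text itself.
function findUserUnsafe(email: string): string {
  return `SELECT id, email FROM users WHERE email = '${email}'`;
}

// Safer pattern: the SQL text is constant and the input travels as a bound value.
function findUserSafe(email: string): ParameterizedQuery {
  return { text: "SELECT id, email FROM users WHERE email = $1", values: [email] };
}

const attack = "' OR '1'='1";

// The attacker's quote closes the string literal and changes the query's meaning.
console.log(findUserUnsafe(attack));
// SELECT id, email FROM users WHERE email = '' OR '1'='1'

// The SQL text is unaffected; the payload stays inert inside `values`.
console.log(findUserSafe(attack));
```

A SAST rule can flag the first function in any codebase because the flaw is visible in the code's structure alone, which is why this class of bug belongs to the automated layers.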

---

## 2. The AI-specific new risk "slopsquatting" — hallucination becomes a supply-chain attack

**Slopsquatting**, a term coined in 2025, is a vulnerability specific to the AI era. The mechanism is this.

1. AI recommends installing a package whose **name does not exist**.
2. An attacker **pre-registers that hallucinated package name on npm/PyPI** and plants malicious code.
3. When a developer runs `npm install` as the AI says, **the malicious code makes its way into production.**

This isn't a pipe dream. According to research, **19.7% of the roughly 2.23 million package references in code generated by 16 models pointed to packages that do not exist.** Moreover, **the same hallucination is easy to reproduce** (when the same prompt was re-run ten times, 43% of hallucinated names came back every time), so attackers can predict "which name to register to land a hit" ([DevOps.com](https://devops.com/ai-generated-code-packages-can-lead-to-slopsquatting-threat-2/) / [Cloud Security Alliance](https://labs.cloudsecurityalliance.org/research/csa-research-note-slopsquatting-ai-supply-chain-20260419-csa/)).

This is **the latest AI-era risk**, and it falls squarely under [**A03 Software Supply Chain Failures**, newly established/expanded in OWASP Top 10:2025](https://owasp.org/Top10/2025/). There are three defenses.

```bash
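# ① Before installing, verify that the suggested package actually exists and looks legitimate
npm view <package-name>   # be suspicious if it is missing, extremely new, or has abnormally few downloads

# ② Make the lockfile the single source of truth and prevent surprise resolutions (always use ci in CI)
npm ci                    # installs exactly what package-lock.json specifies (ci, not install)

# ③ Scan dependencies every time (slopsquatting shows up as "a growing number of unfamiliar dependencies")
npm audit --audit-level=moderate
# OSV-Scanner is distributed as a standalone binary; this assumes it is installed and on PATH
# (resolving the tool's name through npx would itself be an unverified package fetch)
osv-scanner scan --lockfile=package-lock.json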
```

> **An operational tip:** when AI suggests adding a dependency, **simply having a human check once whether that name really exists** prevents most slopsquatting. When reviewing the `package.json` diff, always look for **newly added packages you don't recognize**.
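
The `package.json` diff review can be partly mechanized. The following is a minimal sketch, not an established tool: `findUnreviewedAdditions`, the manifest shape, and the allowlist are assumptions introduced here, and `totally-made-up-auth-kit` is a deliberately fictional package name standing in for a hallucinated one. It only compares names; it does not contact the registry.

```ts
type DependencyMap = Record<string, string>;

interface PackageManifest {
  dependencies?: DependencyMap;
  devDependencies?: DependencyMap;
}

// Collect every dependency name declared in a manifest (duplicates collapse).
function dependencyNames(manifest: PackageManifest): string[] {
  return Object.keys({ ...manifest.dependencies, ...manifest.devDependencies });
}

// Return packages that appear only in `after` and are not on the human-reviewed allowlist.
function findUnreviewedAdditions(
  before: PackageManifest,
  after: PackageManifest,
  allowlist: readonly string[],
): string[] {
  const trusted = new Set([...dependencyNames(before), ...allowlist]);
  return dependencyNames(after)
    .filter((name) => !trusted.has(name))
    .sort();
}

// Placeholder versions; only the names matter for this check.
const baseBranch: PackageManifest = { dependencies: { next: "^1.0.0", zod: "^1.0.0" } };
const pullRequest: PackageManifest = {
  dependencies: { next: "^1.0.0", zod: "^1.0.0", "totally-made-up-auth-kit": "^1.0.0" },
  devDependencies: { vitest: "^1.0.0" },
};

console.log(findUnreviewedAdditions(baseBranch, pullRequest, ["vitest"]));
// [ 'totally-made-up-auth-kit' ]
```

Each name this returns is a prompt for the human check described above (`npm view <package-name>`) before the dependency is accepted into the lockfile.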

---

## 3. The top 5 vulnerabilities AI tends to mass-produce, and how to crush them

Diagnosing AI-generated code in practice, the same holes appear repeatedly. Here are the **top 5 in order of frequency**, with before/after code.

### ① Missing authorization (IDOR / OWASP A01) — most frequent and most serious

AI writes authentication (whether you're logged in), but **omits authorization (whether that person owns that data)**.

```ts
// ❌ What AI often writes: at best it checks login (not even shown here), but never ownership → other users' data is exposed
export async function GET(_req: Request, { params }: { params: { id: string } }) {
  const order = await db.order.findUnique({ where: { id: params.id } });
  return Response.json(order); // supplying someone else's ID returns someone else's order (IDOR)
}
```

```ts
// ✅ Filter by ownership ("who owns what" is a business rule, an invariant a human must specify)
export async function GET(req: Request, { params }: { params: { id: string } }) {
  const userId = await requireUserId(req);          // authentication
  const order = await db.order.findFirst({
    where: { id: params.id, ownerId: userId },      // authorization: narrow by ownership
  });
  if (!order) return Response.json({ error: "not found" }, { status: 404 });
  return Response.json(order);
}
```

### ② Missing input validation (OWASP A05)

AI tends to write the "happy path" and may **skip validation at the boundary**. Always validate external input at the trust boundary.

```ts
import { z } from "zod";

// ✅ Validate with Zod at the system boundary (values arriving from outside), and narrow the type before use
const Body = z.object({ email: z.string().email(), age: z.number().int().min(0).max(150) });
const parsed = Body.safeParse(await request.json());
if (!parsed.success) return Response.json({ error: "invalid" }, { status: 400 });
```

### ③ Hardcoded secrets (OWASP A02/A03)

AI may **write API keys directly into code** as a sample. Read server-only secrets exclusively through `process.env`, and use types to keep them from mixing with `NEXT_PUBLIC_` variables ([details](/blog/nextjs-env-secret-leak-prevention-public-vars-guide)). Block leaks mechanically with Gitleaks and GitHub's Push Protection.
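
One way to express "keep secrets away from `NEXT_PUBLIC_` via types" is sketched below. This is an assumption-laden illustration rather than the approach of the linked article: `requireServerSecret`, `ServerSecretName`, and the variable name `PAYMENT_API_KEY` are hypothetical. The environment is passed in as a parameter so the sketch stays self-contained.

```ts
type Env = Record<string, string | undefined>;

// Compile-time guard: a literal name starting with NEXT_PUBLIC_ resolves to `never`,
// so passing one as a secret name is a type error.
type ServerSecretName<T extends string> = T extends `NEXT_PUBLIC_${string}` ? never : T;

// Read a server-only secret, failing fast instead of falling back to a hardcoded value.
function requireServerSecret<T extends string>(env: Env, name: ServerSecretName<T>): string {
  const key = String(name);
  // Runtime backstop for names that are only known dynamically (typed as plain `string`).
  if (key.startsWith("NEXT_PUBLIC_")) {
    throw new Error(`${key} is exposed to the browser and must not hold a secret`);
  }
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing required server secret: ${key}`);
  }
  return value;
}

// Hypothetical environment for illustration; the value is a placeholder, not a real key.
const exampleEnv: Env = { PAYMENT_API_KEY: "placeholder-value" };
console.log(requireServerSecret(exampleEnv, "PAYMENT_API_KEY").length > 0);
// requireServerSecret(exampleEnv, "NEXT_PUBLIC_PAYMENT_API_KEY"); // rejected by the type checker
```

In a server module, the call would presumably take `process.env` as the first argument. Failing fast on a missing value also removes the temptation to leave a sample key in the code as a default.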

### ④ Old / non-existent dependencies (OWASP A03)

AI suggests versions that were current at its training cutoff and are now outdated, or the hallucinated packages described in Section 2. Crush them with `npm audit`, Dependabot, and existence verification.

### ⑤ Insecure defaults (OWASP A02)

Opening CORS to `*`, forgetting `HttpOnly`/`Secure` on cookies, returning a stack trace on error — AI prioritizes a "minimal working configuration," so it tends to drop the **safe-side defaults**. Strengthen security headers/CSP and CORS [in bulk via middleware](/blog/nextjs-security-headers-csp-nonce-middleware-guide).
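
As a rough illustration of "safe-side defaults," the sketch below builds a session cookie with the protective attributes always on and answers CORS only for an explicit allowlist. The helper names, the cookie name, and the origins are assumptions made for this example; it is framework-agnostic and only produces header values.

```ts
// Build a Set-Cookie value with the protective attributes on by default.
function buildSessionCookie(name: string, value: string, maxAgeSeconds: number): string {
  return [
    `${name}=${encodeURIComponent(value)}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "HttpOnly", // not readable from JavaScript
    "Secure", // only sent over HTTPS
    "SameSite=Lax", // not sent on most cross-site subrequests
  ].join("; ");
}

// Echo the Origin back only when it is explicitly allowed; never answer with "*".
function corsHeadersFor(
  origin: string | null,
  allowedOrigins: readonly string[],
): Record<string, string> {
  if (origin === null || !allowedOrigins.includes(origin)) return {};
  return { "Access-Control-Allow-Origin": origin, Vary: "Origin" };
}

const allowed = ["https://app.example.com"]; // hypothetical origin
console.log(buildSessionCookie("session", "abc 123", 3600));
// session=abc%20123; Path=/; Max-Age=3600; HttpOnly; Secure; SameSite=Lax
console.log(corsHeadersFor("https://evil.example.net", allowed));
// {}
```

Putting the defaults inside one helper means a generated handler has to opt out of safety explicitly, which is much easier to spot in review than a forgotten flag.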

---

## 4. Pre-release checklist — diagnose AI-generated code in four layers

For AI-generated code, the baseline is to **always pass it through four layers of automated diagnosis before merging**. These four layers catch almost all of the top 5 above; the exception is authorization (①), which is covered in Section 6.

| Layer | The AI weakness it targets | Command | Corresponding OWASP |
|---|---|---|---|
| **SCA** | Hallucinated/old dependencies (slopsquatting) | `npm ci` + `npm audit` + OSV-Scanner | **A03** |
| **Secrets** | Hardcoded keys | `gitleaks` + Push Protection | A02 |
| **SAST** | Structural flaws (injection, config) | `semgrep scan --config=auto` | **A05** and others |
| **DAST** | Runtime behavior (headers, cookies; reflected XSS needs an active scan) | ZAP `zap-baseline.py` (passive; `zap-full-scan.py` for active checks) | **A01/A05/A02** |

If you **gate these four layers as PR checks in GitHub Actions**, no matter how fast AI generates code, **vulnerabilities are automatically blocked before merge**. The concrete CI workflow (down to SARIF aggregation) is in the [CI-integration procedure article](/blog/nextjs-supabase-security-ci-sarif-github-actions-guide), and the implementation details of the four layers are in the [hands-on article on the OWASP official methodology](/blog/web-application-vulnerability-assessment-owasp-zap-sast-dast-guide). **"Writing fast with AI" and "safely stopping it at a gate" hold only as a set.**
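
The gating decision itself is small. Assuming each scanner is configured to emit a SARIF report, a merge gate could look roughly like the sketch below. The `blockingFindings` function and the sample data are illustrative assumptions, the types cover only the handful of SARIF fields used here, and treating a missing `level` as `warning` is a simplification of how the SARIF specification derives defaults.

```ts
type SarifLevel = "none" | "note" | "warning" | "error";

// Only the subset of the SARIF log shape that this gate reads.
interface SarifLog {
  runs: Array<{
    tool: { driver: { name: string } };
    results?: Array<{ ruleId?: string; level?: SarifLevel }>;
  }>;
}

const severityRank: Record<SarifLevel, number> = { none: 0, note: 1, warning: 2, error: 3 };

// List findings at or above the chosen severity; a non-empty list should fail the PR check.
function blockingFindings(log: SarifLog, failOn: SarifLevel = "error"): string[] {
  const findings: string[] = [];
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      const level = result.level ?? "warning"; // simplification: absent level treated as warning
      if (severityRank[level] >= severityRank[failOn]) {
        findings.push(`${run.tool.driver.name}: ${result.ruleId ?? "unknown-rule"} (${level})`);
      }
    }
  }
  return findings;
}

// Hypothetical merged report from two scanners; the rule IDs are made up.
const report: SarifLog = {
  runs: [
    { tool: { driver: { name: "sast" } }, results: [{ ruleId: "sql-concat", level: "error" }, { ruleId: "todo", level: "note" }] },
    { tool: { driver: { name: "secrets" } }, results: [{ ruleId: "generic-api-key" }] },
  ],
};

console.log(blockingFindings(report));
// [ 'sast: sql-concat (error)' ]
```

A CI step would read the real report files, call a function like this, and exit non-zero when the list is non-empty. Lowering `failOn` to `"warning"` tightens the gate at the cost of more triage.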

---

## 5. Designing the human verification gate — "AI implements, humans hold spec and verification"

Tool gates are necessary but not sufficient. **The key to turning generation speed into production quality is not letting go of the two things humans should hold.**

1. **Humans hold spec decisions.** "Who owns what, and which operations they're allowed" and "what are the invariants for amounts, quantities, and state transitions" — these are business rules **AI must not be allowed to decide**. Delegate to AI "an implementation that satisfies this spec."
2. **Drop acceptance criteria into tests.** If you write "user A cannot fetch user B's order by supplying its ID" as a test, then even if AI omits authorization, **the test fails and detects it**. Preparing the verification path first is the biggest quality lever of the AI era.

```ts
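// ✅ Pin the authorization invariant as a test (it fails even if AI omits the check)
it("cannot fetch another user's resource (IDOR prevention)", async () => {
  const res = await GET(reqAs(userA), { params: { id: orderOwnedByUserB.id } });
  expect(res.status).toBe(404); // a 200 with a body means an authorization bug
});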
```

This is exactly the workflow I consistently use. In the four phases of **explore → plan → implement → verify**, let AI produce speed while humans confirm spec and safety at each gate. The full picture of quality gates for AI-driven development is covered in the [AI-driven-development quality-gates article](/blog/ai-driven-development-quality-gates-ci-type-safety-test-security).

---

## 6. The safety AI cannot create in principle — authorization and business logic are the human domain

Finally, the most important line. **No matter how excellent the AI or the diagnostic tool, there's safety that cannot in principle be guaranteed.** That is **authorization (A01), tenant separation, and business logic (OWASP WSTG 4.5 / 4.10).**

The reason comes down to the "lack of context" touched on in Section 1. "Who may see this invoice" and "how many times this discount can be used" are **the 'meaning' of your business rules**, and neither AI nor a scanner knows that meaning. So it can't judge a missing authorization as "missing." **This is in contrast to SQL injection being "the same structural flaw in any app."**

| What AI / tools can protect (horizontal) | What only humans can protect (vertical) |
|---|---|
| Injection, misconfiguration, known CVEs, secret leakage | **Authorization/IDOR, tenant separation** |
| Structural / known-pattern flaws | **Business logic** (abuse of quantity, price, state transitions) |
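
To show what a "vertical" rule looks like once a human has decided it, here is a small sketch of two invariants written down as code. Every rule in it is hypothetical (the order states, the transition table, the per-user coupon limit); the point is that no scanner could infer these values, while code can enforce them once they are stated.

```ts
type OrderStatus = "pending" | "paid" | "shipped" | "refunded";

// Hypothetical business rule: which state changes are legal. A human decides this table.
const allowedTransitions: Record<OrderStatus, readonly OrderStatus[]> = {
  pending: ["paid"],
  paid: ["shipped", "refunded"],
  shipped: ["refunded"],
  refunded: [],
};

function canTransition(from: OrderStatus, to: OrderStatus): boolean {
  return allowedTransitions[from].includes(to);
}

interface Coupon {
  code: string;
  maxUsesPerUser: number; // hypothetical rule: how many times one user may redeem it
}

type RedeemResult = { ok: true; usesAfter: number } | { ok: false; reason: "limit_reached" };

// Refuse redemption once the per-user limit is reached.
function redeemCoupon(coupon: Coupon, usesSoFar: number): RedeemResult {
  if (usesSoFar >= coupon.maxUsesPerUser) return { ok: false, reason: "limit_reached" };
  return { ok: true, usesAfter: usesSoFar + 1 };
}

console.log(canTransition("pending", "shipped")); // false: shipping an unpaid order
console.log(redeemCoupon({ code: "WELCOME", maxUsesPerUser: 1 }, 1)); // { ok: false, reason: 'limit_reached' }
```

This in-memory version ignores concurrency; in a real system the usage count would need to be checked and updated atomically in the database, which is itself a detail worth confirming in a manual review.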

Beyond here is the domain not of tool gates but of **manual authorization review and audit.** In particular, **"right before release, just after generating a lot of code with AI" is the timing where gaps in authorization and business logic most easily slip in** — the biggest opportunity to bring in an audit. For detecting vertical risks and multi-layered defense, see the [IDOR / broken-authorization detection article](/blog/nextjs-supabase-idor-broken-authorization-rls-detection-guide), and for the scope and cost feel of audits, see [what does a security audit look at](/blog/nextjs-supabase-security-audit-scope-when-needed-guide).

---

## Summary — turn AI's speed into production quality with verification

1. **AI mass-produces vulnerabilities** (Stanford: overconfidence; NYU: about 40% vulnerable). But the speed can be offset with verification.
2. **The AI-era new risk slopsquatting** (hallucinated packages = A03). Prevent it with lockfile pinning, existence verification, and dependency scanning.
3. **Cure the top 5 AI mass-produces** (missing authorization, missing input validation, hardcoded secrets, old/non-existent dependencies, dangerous defaults) with before/after patterns.
4. **Diagnose in four layers before release** (SCA → secrets → SAST → DAST). Gate as PR checks with free tools.
5. **Humans hold spec decisions and verification.** Don't hand authorization and business logic to AI. The safety AI can't create = the domain of audit.

"Writing fast with AI" and "shipping safely" are not in conflict. **Offsetting, every time, the debt born behind the speed with a pre-release verification gate** — with just this design, generative AI becomes the strongest weapon. It's exactly the method by which I keep building production-quality products with one-person × generative AI. When you need pre-release diagnosis of an AI-generated app, or an audit of authorization and business logic, start by visualizing the current state with the free OSS [Aegis](/aegis).
