# Playwright E2E test design guide [2026 edition] — unbreakable, fast, trustworthy tests at production quality

> A complete guide to designing production-quality E2E tests with Playwright. With real code, it explains: eliminating flakes with role-based locators and web-first assertions (auto-waiting), mocking external APIs, inspecting a11y in CI, a configuration that runs against the production build, observability with traces/reports, and CI sharding.

- Published: 2026-06-24
- Author: 友田 陽大
- Tags: Next.js, Frontend, Accessibility, TypeScript, Architecture design, Performance
- URL: https://tomodahinata.com/en/blog/playwright-e2e-testing-production-design-guide
- Category: Frontend
- Pillar guide: https://tomodahinata.com/en/blog/nextjs-16-app-router-cache-components-data-fetching

## Key points

- Focus E2E on high-value flows, not full coverage. Divide the roles: pure logic goes to unit tests; flows, SEO, and a11y go to E2E.
- Write role-based locators with getByRole / getByLabel and avoid CSS/XPath. This gives both resilience and a11y verification.
- Auto-wait with web-first assertions (toBeVisible, etc.), and drop fixed sleeps via waitForTimeout, which are a source of flakes.
- Mock external APIs with page.route so tests never reach real payment or email services (cost, reliability, security).
- Run against the production build served by next start, not next dev, and make the suite operable with traces, screenshots, and CI sharding.

---

E2E is the last line of defense that answers "is it OK to release?" That is exactly why slow, fragile tests are worse than none. Design determines quality.

---

## 1. What to verify with E2E and what with unit

Tests have layers. Writing everything as heavy E2E makes them slow, fragile, and high-cost.

| Target | Suited test | Reason |
| ---- | ------------ | ---- |
| Pure logic (validation, computation) | Unit (Vitest, etc.) | Fastest, cheapest, easy to cover |
| User flow (input → submit → confirm) | E2E (Playwright) | Guarantees integration in a real browser |
| SEO plumbing (JSON-LD, sitemap, RSS) | E2E | Can verify the output after rendering |
| Automated accessibility inspection | E2E (axe) | Run against the real DOM |

The principle is "**concentrate on a few high-value E2E tests.**" For example, on this site I cover form submission, navigation, the booking flow, and a11y with E2E, and push data formatting and rate-limit window calculation down to unit tests.
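
As one illustration of this split, rate-limit window logic can live in a pure function that a unit runner such as Vitest checks in milliseconds, with no browser involved. The sketch below is hypothetical: `isRateLimited`, its parameters, and the sliding-window rule are assumptions made for illustration, not this site's actual implementation.

```ts
// Hypothetical sliding-window check: the name and the rule are illustrative assumptions.
// Returns true when `maxRequests` or more requests already fall inside the window
// that ends at `now`.
function isRateLimited(
  timestamps: readonly number[], // past request times, in ms since epoch
  now: number,
  windowMs: number,
  maxRequests: number,
): boolean {
  const windowStart = now - windowMs;
  // Count only requests newer than the start of the window.
  const recent = timestamps.filter((t) => t > windowStart && t <= now);
  return recent.length >= maxRequests;
}

// Three requests in the last 60 s with a limit of 3: the next one is blocked.
const now = 100_000;
console.log(isRateLimited([50_000, 70_000, 90_000], now, 60_000, 3)); // true
// The oldest request has aged out of the window, so the caller is allowed again.
console.log(isRateLimited([30_000, 70_000, 90_000], now, 60_000, 3)); // false
```

Because the current time is passed in as an argument, boundary cases can be checked deterministically without touching a clock.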

---

## 2. Locators: write on a role basis (unbreakability = a11y)

The most important design decision is the locator strategy. A CSS selector like `page.locator(".btn-primary")` breaks instantly on a design change. Grab elements with **the cues a user recognizes (role, label, text).**

```ts
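// ❌ Depends on implementation details; breaks when class names change
await page.locator("div.form > button.submit").click();

// ✅ User's point of view: select by role and accessible name
// (the strings below are this site's Japanese UI labels)
await page.getByRole("button", { name: "送信する" }).click(); // "Submit"
await page.getByLabel("メールアドレス").fill("taro@example.com"); // "Email address"
await page.getByRole("radio", { name: /プロジェクト単位/ }).click(); // "Per project"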
```

The side effect is powerful. Being able to grab an element with `getByRole` / `getByLabel` is **proof that the element has an accessible name.** The locator strategy itself becomes an a11y smoke test ([WCAG 2.2 implementation guide](/blog/react-nextjs-web-accessibility-wcag22-guide)).

---

## 3. Web-first assertions: discard fixed sleeps

Playwright's assertions **auto-wait and retry.** Playwright handles the waiting "until it's visible" or "until it's enabled," so `waitForTimeout` is rarely needed. Fixed sleeps are a leading cause of flakes.

```ts
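// ❌ Breeding ground for flakes: fails when the environment is slow, waits needlessly when it is fast
await page.click("button");
await page.waitForTimeout(3000);
expect(await page.isVisible(".success")).toBe(true);

// ✅ Waits automatically until the condition holds (retries until the timeout)
await page.getByRole("button", { name: "送信する" }).click(); // "Submit"
await expect(page.getByText("お問い合わせを送信しました")).toBeVisible(); // "Your inquiry has been sent"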
```

> Exception: waiting a fixed time is legitimate only when verifying an intentional time gate (e.g., a spam measure where "submission is disabled until 2.5 seconds after display"). Distinguish "a sleep to paper over uncertainty" from "a wait that is part of the specification."

---

## 4. Mocking external APIs: don't reach the real thing

In E2E, **external side effects** such as email, payment, or billing **must never reach the real service** (a triple problem of cost, reliability, and security). Intercept the request with `page.route`, verify the shape of the payload, and return whatever response the test needs.

```ts
import { test, expect, type Page, type Route } from "@playwright/test";

// Intercept /api/contact so the request never reaches the real service (Resend, etc.).
// Inspect the shape of the request body and return the given status/JSON.
async function mockContactApi(
  page: Page,
  { status = 200, body = { success: true }, onRequest }: {
    status?: number;
    body?: Record<string, unknown>;
    onRequest?: (payload: unknown) => void;
  } = {},
) {
  await page.route("**/api/contact", async (route: Route) => {
    onRequest?.(route.request().postDataJSON());
    await route.fulfill({ status, contentType: "application/json", body: JSON.stringify(body) });
  });
}
```

With this, you can **deterministically** reproduce "normal," "429 (rate-limited)," and "500 (server error)." You can verify resilience (the UI on failure) too, without waiting for external services.

```ts
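test("shows the rate-limit message", async ({ page }) => {
  // Register the mock before navigating so no request can slip past it.
  await mockContactApi(page, { status: 429, body: { error: "rate limited" } });
  await page.goto("/contact");
  // ...fill in the form and submit...
  // UI text: "Multiple submissions were sent in a short time"
  await expect(page.getByText("短時間に複数回送信されています")).toBeVisible();
});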
```

Capture the submission body with `onRequest` and verify it with `toMatchObject`, and you can even guarantee that the request was sent "with the correct payload, exactly once."
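
One way to capture the payload is a small recorder whose callback is passed as `onRequest`. The helper below is a hypothetical sketch: `createRequestRecorder` is not a Playwright API, and the payload fields are placeholders. It has no Playwright dependency, so the intercepted request is simulated here by calling the callback directly.

```ts
// Hypothetical helper (not a Playwright API): collects every payload handed to it.
function createRequestRecorder<T = unknown>() {
  const calls: T[] = [];
  return {
    calls,
    // Pass this as `onRequest` to a mock such as mockContactApi.
    onRequest: (payload: T): void => {
      calls.push(payload);
    },
  };
}

// Simulate one intercepted submission (in a real test the route handler calls this).
const recorder = createRequestRecorder();
recorder.onRequest({ email: "taro@example.com", message: "hello" });

console.log(recorder.calls.length); // 1
```

In a Playwright test this could be wired as `await mockContactApi(page, { onRequest: recorder.onRequest })`. Once the success message is visible, `expect(recorder.calls).toHaveLength(1)` checks "exactly once" and `expect(recorder.calls[0]).toMatchObject({ email: "taro@example.com" })` checks the shape.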

---

## 5. Build accessibility into E2E

With `@axe-core/playwright`, WCAG violations are checked automatically against the real DOM. If a design change breaks contrast or labels, CI catches it.

```ts
import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

test("main page has no serious a11y violations", async ({ page }) => {
  await page.goto("/");
  const results = await new AxeBuilder({ page })
    .withTags(["wcag2a", "wcag2aa", "wcag22aa"])
    .analyze();
  expect(results.violations).toEqual([]);
});
```

Automated inspection is not exhaustive: it detects only the machine-checkable portion of issues, roughly 30–50%. Keyboard operation and the screen-reader experience still need manual confirmation. Even so, the value of "stopping regressions in CI" is immense.
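
The assertion above fails on any violation, whatever its impact. If the suite should fail only on high-impact findings, the violations can be filtered before asserting. This is a minimal sketch, assuming each violation carries the `id` and `impact` fields that axe-core reports; `blockingViolations` and the choice of which impact levels block a release are assumptions.

```ts
// Minimal structural type: only the fields this sketch reads from an axe result.
type Violation = {
  id: string;
  impact?: "minor" | "moderate" | "serious" | "critical" | null;
};

// Assumption: only "serious" and "critical" findings should fail the build.
const BLOCKING_IMPACTS: ReadonlySet<string> = new Set(["serious", "critical"]);

// Hypothetical helper: keeps only the violations that should block a release.
function blockingViolations<V extends Violation>(violations: readonly V[]): V[] {
  return violations.filter(
    (v) => v.impact != null && BLOCKING_IMPACTS.has(v.impact),
  );
}

const sample: Violation[] = [
  { id: "color-contrast", impact: "serious" },
  { id: "region", impact: "moderate" },
];
console.log(blockingViolations(sample).map((v) => v.id).join(", ")); // color-contrast
```

In the test, `expect(blockingViolations(results.violations)).toEqual([])` would then replace the stricter assertion. Starting strict and relaxing only when needed keeps lower-impact regressions visible.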

---

## 6. Testing SEO plumbing: verify the output after rendering

Structured data (JSON-LD), sitemap, RSS, and robots are hard to notice on screen even when broken. With E2E, you can directly verify the output after rendering.

```ts
test("blog post emits BlogPosting structured data", async ({ page }) => {
  await page.goto("/blog/tanstack-query");
  const raw = await page
    .locator('script[type="application/ld+json"]')
    .first()
    .textContent();
  const data = JSON.parse(raw ?? "{}");
  expect(data["@type"]).toBe("BlogPosting");
  expect(data.headline).toBeTruthy();
});
```

Keeping the SEO elements that drive traffic in a "fails if broken" state gives confidence on every release.
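
Note that `.first()` assumes the BlogPosting data is the first JSON-LD block on the page. If a page emits several blocks, a small search helper is more robust. The sketch below is hypothetical: `findJsonLdByType` is an illustrative name, and it assumes the data may appear as a top-level object, an array, or inside an `@graph` container, with `@type` given as a string or an array.

```ts
type JsonLdNode = Record<string, unknown>;

function isNode(value: unknown): value is JsonLdNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: returns the first node whose @type matches `type`.
function findJsonLdByType(
  rawBlocks: readonly string[],
  type: string,
): JsonLdNode | undefined {
  for (const raw of rawBlocks) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // skip malformed blocks instead of aborting the search
    }
    // Collect top-level items plus one level of @graph children.
    const candidates: unknown[] = [];
    for (const item of Array.isArray(parsed) ? parsed : [parsed]) {
      candidates.push(item);
      const graph = isNode(item) ? item["@graph"] : undefined;
      if (Array.isArray(graph)) candidates.push(...graph);
    }
    for (const node of candidates) {
      if (!isNode(node)) continue;
      const t = node["@type"];
      const types: unknown[] = Array.isArray(t) ? t : [t];
      if (types.includes(type)) return node;
    }
  }
  return undefined;
}

const blocks = [
  '{"@type":"BreadcrumbList"}',
  '{"@graph":[{"@type":"BlogPosting","headline":"Hello"}]}',
];
console.log(findJsonLdByType(blocks, "BlogPosting")?.headline); // Hello
```

In a test, the raw blocks could come from `await page.locator('script[type="application/ld+json"]').allTextContents()`, followed by `expect(findJsonLdByType(raw, "BlogPosting")?.headline).toBeTruthy()`.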

---

## 7. A configuration that runs against the production build

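`next dev` has development-only behavior (unoptimized output, extra logging). Run E2E against the **production build (`next start`)** to verify under conditions close to production. `next start` serves the output of `next build`, so build before the tests run. Playwright's `webServer` option starts the server and waits for it to respond.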

```ts
// playwright.config.ts (key points)
import { defineConfig, devices } from "@playwright/test";
const PORT = Number(process.env.PLAYWRIGHT_PORT ?? 3100);

export default defineConfig({
  testDir: "./tests/e2e",
  expect: { timeout: 7_500 },
  fullyParallel: true,
  forbidOnly: Boolean(process.env.CI),     // forbid .only in CI
  retries: process.env.CI ? 2 : 0,         // retry only in CI
  use: {
    baseURL: `http://127.0.0.1:${PORT}`,
    trace: "on-first-retry",               // keep evidence for reproducing failures
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  projects: [
    { name: "chromium-desktop", use: { ...devices["Desktop Chrome"] } },
    { name: "mobile-safari", use: { ...devices["iPhone 14"] } }, // verify mobile too
  ],
  webServer: {
    command: `npx next start --port ${PORT}`, // start the production build on a dedicated port
    url: `http://127.0.0.1:${PORT}`,
    reuseExistingServer: !process.env.CI,     // reuse locally, start fresh in CI
    timeout: 120_000,
  },
});
```

Using a dedicated port (e.g., 3100) avoids collision even with `next dev` running on another terminal. With both desktop and mobile projects, you can catch responsive breakage too.

---

## 8. Observability and flake countermeasures (reliability)

An E2E suite that "fails occasionally" erodes trust in every test. Design it so that it does not fail by chance.

- **Leave traces, screenshots, and video on failure.** `trace: "on-first-retry"` records a trace when a failed test is retried, so even hard-to-reproduce failures can be followed step by step in the Trace Viewer.
- **Retry only in CI, and only a few times.** `retries: 0` locally surfaces flakes during development; 1–2 retries in CI is enough.
- **Keep tests independent.** Don't depend on shared state. Each test does its own setup (for example, `goto` in `beforeEach`).
- **Design for parallel execution.** Use test-specific data so nothing breaks under `fullyParallel`.
- **Don't use fixed sleeps** (Chapter 3).
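
For the test-specific data point, one option is to derive identifiers from values that differ per test and per worker. The sketch below is hypothetical: `uniqueTestEmail` is not a Playwright API, and its address format is an assumption. In a real suite the worker index could come from `testInfo.workerIndex`.

```ts
// Hypothetical helper: builds an email address that differs per test and worker,
// so parallel tests never read or write each other's records.
function uniqueTestEmail(
  testTitle: string,
  workerIndex: number,
  now: number = Date.now(),
): string {
  const slug =
    testTitle
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
      .replace(/^-+|-+$/g, "") // trim leading and trailing "-"
      .slice(0, 24) || "test"; // fall back when the title has no ASCII letters
  return `e2e+${slug}-w${workerIndex}-${now.toString(36)}@example.com`;
}

console.log(uniqueTestEmail("Booking: happy path", 2));
```

The timestamp is injectable so the helper itself stays unit-testable, in line with the split described in Chapter 1.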

---

## 9. Running in CI: optimizing speed and cost

- **Parallelize with sharding.** Split the run across jobs with `--shard=1/4` and so on, then merge the blob reports afterward to shorten total time.
- **Run in stages.** Run major browsers and important flows on PRs, and all browsers and all scenarios on merges to main. This improves cost-efficiency.
- **Cache browser binaries.** Cache what `npx playwright install` downloads to reduce CI time.
- **Publish the HTML report as an artifact.** It speeds up investigation on failure.
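
The sharding bullet can be sketched as the commands each CI job would run. `shardCommand` and `MERGE_COMMAND` are hypothetical names, and the `./all-blob-reports` directory is an assumption about where the CI collects each shard's output. The `--shard` and `--reporter` options and the `merge-reports` command are Playwright CLI features.

```ts
// Hypothetical helper: the command for shard `index` of `total` (1-based).
function shardCommand(index: number, total: number): string {
  if (
    !Number.isInteger(index) ||
    !Number.isInteger(total) ||
    index < 1 ||
    index > total
  ) {
    throw new RangeError(`shard ${index}/${total} is out of range`);
  }
  // The blob reporter writes a mergeable report for this shard.
  return `npx playwright test --shard=${index}/${total} --reporter=blob`;
}

// After all shards finish, one job merges the collected blobs into a single HTML report.
// Assumption: every shard's blob report was downloaded into ./all-blob-reports.
const MERGE_COMMAND =
  "npx playwright merge-reports --reporter html ./all-blob-reports";

const total = 4;
for (let i = 1; i <= total; i++) console.log(shardCommand(i, total));
console.log(MERGE_COMMAND);
```

Each shard would typically map to one entry in the CI provider's job matrix, with the merge running as a final job that depends on all of them.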

"Everything, every time, on all browsers" tends to be YAGNI. Design the run scope against the risks you want to protect.

---

## 10. Antipatterns

- ❌ **Wait with `waitForTimeout`.** The main cause of flakes. Auto-wait with web-first assertions.
- ❌ **Depend on CSS/XPath selectors.** They break on a design change. Select by role/label.
- ❌ **Reach the real external API (payment, email).** A problem of cost, reliability, security. Mock with `page.route`.
- ❌ **Test against `next dev`.** Verify with the production build (`next start`).
- ❌ **Share state between tests.** Order dependence produces flakes and makes failures hard to investigate. Keep each test independent.
- ❌ **Write everything as E2E.** Pure logic to unit. E2E concentrates on high-value flows.
- ❌ **Test implementation details (internal state, functions).** Verify user-visible behavior.

---

## 11. FAQ

**Q. How many E2E should I write?**
A. The basis is to narrow to "flows that are fatal if broken." In the test-trophy way of thinking, thicken the base with unit and integration, and make E2E a small elite force.

**Q. `getByRole` or `getByTestId`, which to use?**
A. Start with `getByRole` / `getByLabel` (the user's point of view, which also exercises a11y). Use `data-testid` as a last resort when those cannot identify the element uniquely.

**Q. How do I handle a flow that requires login?**
A. The standard is to save the auth state in `storageState` and reuse it. Not stepping through the login UI every test makes it fast and stable.

**Q. Flakes (unstable tests) won't stop.**
A. The cause is mostly one of fixed sleep, order dependence, shared state, or implementation-detail dependence. Crush them with web-first assertions and independence, and identify the root cause with the Trace Viewer.

**Q. What's the difference between Playwright and Cypress?**
A. Playwright is strong in multi-browser (Chromium/WebKit/Firefox) support, parallel execution, auto-waiting, and traces. It also pairs well with production-build verification and CI parallelism.

---

## Conclusion: E2E is a design artifact that builds "release trust"

E2E tests don't bring reassurance just by being written. Only when they're **unbreakable, fast, and deterministic** can a team release with trust.

1. **Divide roles.** Pure logic to unit, flows/SEO/a11y to E2E.
2. **Role-based locators** to get both unbreakability and a11y.
3. **Web-first assertions** to eliminate fixed sleeps and cut flakes.
4. **Mock external APIs** to protect cost, reliability, and security.
5. **Run against the production build**, and make it withstand operation with observability (traces) and CI parallelism.

When tests are trustworthy, development speed rises. The state of "changes aren't scary" is the essence of maintainability and extensibility.

**If you need to build production-grade E2E / test infrastructure, or to create a quality-assurance mechanism, feel free to consult me.** The case study below introduces the process of designing and implementing an internal platform bundling multiple services, emphasizing quality and reliability.