# Incident-response practical guide [2026 edition]: CSIRT, Runbooks, and automated containment aligned with NIST SP 800-61 Rev.3 (CSF 2.0)

> A practical guide to designing production-grade security incident response (IR). It is built around the framework introduced in NIST SP 800-61 Rev.3 (a CSF 2.0 Community Profile), revised in 2025, and covers CSIRT structure, severity triage, idempotent containment Runbooks, blameless postmortems, and SOAR automation, with working code grounded in official sources.

- Published: 2026-06-28
- Author: 友田 陽大
- Tags: Security, Incident response, CSIRT, NIST, Security engineer
- URL: https://tomodahinata.com/en/blog/incident-response-nist-800-61r3-csirt-runbook-playbook-production-guide
- Category: Security engineering & career
- Pillar guide: https://tomodahinata.com/en/blog/security-engineer-how-to-become-roadmap-skills-certification-guide

## Key points

- A breach is not a question of "whether" but "when." The quality of your preparation determines how much damage you take and how fast you recover: roughly 90% of incident response is decided by the design you put in place before anything happens.
- NIST revamped its incident-response guidance in April 2025 (SP 800-61 Rev.3). It retired the former four-phase lifecycle and redefined incident response as a "Community Profile" mapped onto CSF 2.0's six functions (Govern, Identify, Protect, Detect, Respond, Recover).
- The new framework has two groups: "preparation" (Govern, Identify, Protect, plus lessons learned for continuous improvement) and "response" (Detect, Respond, Recover). Incident response is now part of risk management and runs across the whole organization.
- Implement containment as an "idempotent Runbook": it produces the same result no matter how many times you run it, and destructive operations must pass a human approval gate. Confirming the impact with a dry-run before executing prevents secondary damage.
- The post-recovery "Lessons Learned" step matters most. A blameless postmortem fixes the flaw in the mechanism, not the person, and feeds back into the next round of preparation through CSF's "Identify: Improvement."

---

The question that separates mature security organizations isn't "will we be breached?" It's **"when we are breached, how quickly and correctly can we act?"** Perfect defense doesn't exist. So top-tier organizations start from the premise that **"a breach will certainly happen"** and invest in preparation so they won't panic when it does.

The global standard for that preparation is NIST's **[SP 800-61](https://csrc.nist.gov/pubs/sp/800/61/r3/final).** One important caveat: **this guidance was significantly revised in April 2025 (Rev.3).** The four independent phases that many older articles describe ("preparation → detection & analysis → containment, eradication & recovery → post-incident activity") are the **Rev.2 (previous) framework.** This article explains incident-response design faithful to **the latest Rev.3**, with working code.

> **Where this article fits:** it covers the core skills behind NIST CSF 2.0's **"Respond" and "Recover"** functions. For the role as a whole, see [how to become a security engineer](/blog/security-engineer-how-to-become-roadmap-skills-certification-guide); for the stage before response (detection that notices anomalies), see [log design / detection engineering](/blog/security-logging-detection-engineering-sigma-mitre-attack-siem-guide); for automated response (SOAR) on AWS, see [GuardDuty automated incident response](/blog/aws-guardduty-eventbridge-automated-remediation-incident-response-guide).

---

## 0. NIST "revamped" incident response — Rev.3's new framework

[NIST SP 800-61 Rev.3](https://csrc.nist.gov/pubs/sp/800/61/r3/final) (finalized April 2025) replaced the old "Computer Security Incident Handling Guide." The biggest change is that it **redefined incident response not as an independent process but as part of [CSF 2.0](/blog/security-engineer-how-to-become-roadmap-skills-certification-guide) risk management (a Community Profile).**

Why the change? NIST's reasoning is that **the concrete details of incident response vary greatly by technology, environment, and organization, and they change quickly. So instead of binding it to a fixed procedure manual, Rev.3 expresses it as "outcomes" aligned with CSF 2.0's functions.**

The new framework organizes CSF's six functions into **two groups.**

| Group | CSF function | Meaning in incident response |
|---|---|---|
| **Preparation** | **Govern** | Define policy, roles, responsibilities, supply chain |
| | **Identify** | Grasp assets, risks, vulnerabilities |
| | **Protect** | Take defensive measures to make damage less likely |
| **Incident Response** | **Detect** | Discover, analyze, and triage incidents |
| | **Respond** | Contain, eradicate, and notify stakeholders |
| | **Recover** | Return to normal operations, strengthen monitoring |
| **Lessons Learned** | **Identify: Improvement** | Feed learnings back into the next "preparation" |

This cycle is the core idea: **preparation → detection → response → recovery → lessons, with the lessons strengthening the next round of preparation.** Incident response is no longer a one-off firefight but **a risk-management loop the organization keeps running.** The sections below walk through the practice along this flow.
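The table can also be read as a small data model. The sketch below is an illustration rather than anything Rev.3 defines: the type and constant names (`Group`, `FUNCTIONS_BY_GROUP`, `NEXT_GROUP`) are hypothetical, and it encodes only the grouping from the table plus the loop in which lessons hand off to preparation.

```ts
// Hypothetical model of the Rev.3 grouping shown in the table; names are illustrative.
type Group = "Preparation" | "Incident Response" | "Lessons Learned";

const FUNCTIONS_BY_GROUP: Record<Group, readonly string[]> = {
  Preparation: ["Govern", "Identify", "Protect"],
  "Incident Response": ["Detect", "Respond", "Recover"],
  "Lessons Learned": ["Identify: Improvement"],
};

// The loop: Lessons Learned does not end the process; it hands off to Preparation.
const NEXT_GROUP: Record<Group, Group> = {
  Preparation: "Incident Response",
  "Incident Response": "Lessons Learned",
  "Lessons Learned": "Preparation",
};

// Look up which group a CSF function (or category) belongs to.
function groupOf(csfFunction: string): Group | undefined {
  return (Object.keys(FUNCTIONS_BY_GROUP) as Group[]).find((g) =>
    FUNCTIONS_BY_GROUP[g].includes(csfFunction),
  );
}

// Walk the loop `steps` times from a starting group.
function walk(start: Group, steps: number): Group[] {
  const path: Group[] = [start];
  for (let i = 0; i < steps; i++) path.push(NEXT_GROUP[path[path.length - 1]]);
  return path;
}
```

Walking three steps from `"Preparation"` lands back on `"Preparation"`, which is the whole point of the model: there is no terminal state.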

---

## 1. Preparation (Govern / Identify / Protect) — 90% is decided here

The success or failure of incident response is decided almost entirely by **preparation before anything happens.** Searching for the fire extinguisher after the fire starts is too late. Here is what to prepare, concretely.

- **Define the CSIRT (response structure).** Who commands (the incident commander), who handles the technical response, and who notifies legal, PR, and management. **Define these as roles, not people** (so the structure still works when an assignee is on vacation).
- **Contact network and escalation.** For each severity level, who to tell, when, and how. Include criteria for contacting external parties ([JPCERT/CC](https://www.jpcert.or.jp/), the supervisory authority, customers).
- **Pre-granted authority.** Decide in advance who may exercise emergency authority to "isolate a server for containment" or "rotate keys." If this isn't settled beforehand, the initial response stalls.
- **Runbooks (procedures).** Prepare response procedures for typical scenarios (account compromise, ransomware, data leak, etc.) as the "code" described below.
- **Drills.** Confirm that people can actually act by running tabletop exercises. A procedure's gaps only become visible once it is used.
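"Roles, not people" and "pre-granted authority" can be made concrete in a few lines. A minimal sketch, assuming a hypothetical roster in which each role has a primary and ordered backups, and an authority table agreed before any incident; the role names, action names, and people are all illustrative.

```ts
// Hypothetical roster: roles are the stable keys, people are interchangeable.
type Role = "incidentCommander" | "technicalLead" | "legalLiaison" | "communications";
type EmergencyAction = "isolateServer" | "rotateKeys" | "notifyRegulator";

interface RoleAssignment {
  readonly primary: string;
  readonly backups: readonly string[]; // tried in order when the primary is unavailable
}

type Roster = Record<Role, RoleAssignment>;

// Pre-granted authority: which roles may authorize each emergency action.
// Agreed in advance so nobody negotiates permissions mid-incident.
const AUTHORITY: Record<EmergencyAction, readonly Role[]> = {
  isolateServer: ["incidentCommander", "technicalLead"],
  rotateKeys: ["incidentCommander", "technicalLead"],
  notifyRegulator: ["incidentCommander", "legalLiaison"],
};

// Who currently holds a role: the first available person, primary first.
function resolveRole(
  roster: Roster,
  role: Role,
  isAvailable: (person: string) => boolean,
): string | undefined {
  const { primary, backups } = roster[role];
  return [primary, ...backups].find(isAvailable);
}

// A person may authorize an action only while they actually hold an authorized role.
function canAuthorize(
  roster: Roster,
  person: string,
  action: EmergencyAction,
  isAvailable: (person: string) => boolean,
): boolean {
  return AUTHORITY[action].some((role) => resolveRole(roster, role, isAvailable) === person);
}
```

Because procedures refer to roles, a vacation changes only who is available, not the procedure itself: if the primary commander is away, the first backup resolves as commander and inherits the authority to isolate a server.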

Involving management is also part of preparation. [METI's Cybersecurity Management Guidelines](https://www.meti.go.jp/policy/netsecurity/mng_guide.html) position setting up an incident-response structure as a management responsibility and provide an appendix guide to building it.

---

## 2. Detection & triage (Detect) — judge severity instantly

Once you detect an anomaly (→ [detection engineering](/blog/security-logging-detection-engineering-sigma-mitre-attack-siem-guide)), the first step is **triage**: quickly judge whether it is a real incident or a false positive and, if real, how serious it is. The key is to define the severity criteria **in advance.**

| Severity | Example | Initial response |
|---|---|---|
| **Critical** | Production data leak, ransomware, compromise of the auth platform | Immediately convene the CSIRT, appoint a commander, notify management |
| **High** | Compromise of a privileged account, unauthorized access to a key system | Start response within 1 hour |
| **Medium** | Compromise of a general account, limited unauthorized operation | Respond within the business day |
| **Low** | Traces of recon, minor policy violation | Record, strengthen monitoring |

The most important rule in triage is to **err on the serious side when unsure.** Finding out later that "it was nothing" is a healthy outcome. Underestimating an incident and leaving it alone produces the worst one.
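Writing the severity table down as code makes "decide in advance" literal and makes the "when unsure, go up" rule impossible to forget. A minimal sketch, assuming hypothetical indicator names derived from the table's Example column and a single `confident` flag set by the analyst; real triage would use richer inputs.

```ts
// Hypothetical triage helper mirroring the severity table; indicator names are illustrative.
type Severity = "Low" | "Medium" | "High" | "Critical";
const SEVERITY_ORDER: readonly Severity[] = ["Low", "Medium", "High", "Critical"];

type Indicator =
  | "productionDataLeak" | "ransomware" | "authPlatformCompromise"
  | "privilegedAccountCompromise" | "keySystemUnauthorizedAccess"
  | "generalAccountCompromise" | "limitedUnauthorizedOperation"
  | "reconTraces" | "minorPolicyViolation";

// Base severity per indicator, taken from the table's Example column.
const BASE_SEVERITY: Record<Indicator, Severity> = {
  productionDataLeak: "Critical",
  ransomware: "Critical",
  authPlatformCompromise: "Critical",
  privilegedAccountCompromise: "High",
  keySystemUnauthorizedAccess: "High",
  generalAccountCompromise: "Medium",
  limitedUnauthorizedOperation: "Medium",
  reconTraces: "Low",
  minorPolicyViolation: "Low",
};

// Initial response per severity, copied from the table's last column.
const INITIAL_RESPONSE: Record<Severity, string> = {
  Critical: "Immediately convene the CSIRT, appoint a commander, notify management",
  High: "Start response within 1 hour",
  Medium: "Respond within the business day",
  Low: "Record, strengthen monitoring",
};

// The worst indicator wins; if the analyst is unsure, escalate one level (capped at Critical).
function triage(indicators: readonly Indicator[], confident: boolean): Severity {
  let level = 0; // Low when nothing matches
  for (const indicator of indicators) {
    level = Math.max(level, SEVERITY_ORDER.indexOf(BASE_SEVERITY[indicator]));
  }
  if (!confident) level = Math.min(level + 1, SEVERITY_ORDER.length - 1);
  return SEVERITY_ORDER[level];
}
```

For example, a general-account compromise is Medium when the analyst is confident, but the same evidence with doubt attached becomes High.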

---

## 3. Respond — contain with an idempotent Runbook

Containment is the top-priority action to **stop the spread of damage.** The key here is to **keep the response procedure as idempotent code (Runbook as Code).** Manual work invites mistakes, and mistakes multiply in a late-night emergency.

A good containment Runbook needs three properties — **idempotency** (the same result no matter how many times you run it), **dry-run** (you can confirm the impact before executing), and **a human approval gate** (don't auto-execute destructive operations).

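```ts
// containment-runbook.ts: containment Runbook for a compromised account.
// Idempotent, dry-run by default, and approval-gated, so it is safe to run
// even during a late-night emergency.
import { z } from "zod";
// Hypothetical adapter around your identity provider (not a real package).
// Each function it exports must itself be idempotent.
import {
  revokeAllSessions,
  revokeApiTokens,
  suspendAccount,
  forcePasswordReset,
} from "./identity-adapter";

const Options = z.object({
  userId: z.string().uuid(),
  dryRun: z.boolean().default(true),     // ① Dry-run by default; real execution must be explicit.
  approvedBy: z.string().optional(),     // ② Destructive operations require a named approver.
});
type Options = z.infer<typeof Options>;

interface Step {
  readonly name: string;
  readonly destructive: boolean;         // destructive = must pass the approval gate
  run(userId: string): Promise<void>;
}

// Every step is idempotent: even if already applied, re-running converges on the same end state.
const STEPS: readonly Step[] = [
  { name: "Revoke all sessions", destructive: false,
    run: (id) => revokeAllSessions(id) },          // converges to "revoked" however often it runs
  { name: "Revoke API tokens", destructive: false,
    run: (id) => revokeApiTokens(id) },
  { name: "Suspend account", destructive: true,
    run: (id) => suspendAccount(id) },             // destructive: requires approval
  { name: "Force credential reset", destructive: true,
    run: (id) => forcePasswordReset(id) },
];

export async function containAccount(rawOptions: unknown): Promise<void> {
  const opts: Options = Options.parse(rawOptions); // validate at the boundary

  for (const step of STEPS) {
    // ③ Never run a destructive step without an approver (warned consistently, dry-run or not).
    if (step.destructive && !opts.approvedBy) {
      console.warn(`⏸  [awaiting approval] ${step.name} (requires approvedBy)`);
      continue;
    }
    if (opts.dryRun) {
      console.log(`🔍 [dry-run] would run: ${step.name}`);
      continue; // ④ Dry-run causes no side effects; it only shows what would happen.
    }
    await step.run(opts.userId); // idempotent, so re-running after a mid-way failure is safe
    console.log(`✅ ${step.name} done`);
  }
}
```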

The value of this design is that **even amid panic, it can only fail toward the safe side.** The default is dry-run, destructive operations require approval, and each step is idempotent, so even if you mistype a command in an emergency, secondary damage (like locking out legitimate users as collateral) is unlikely. **Setting out to contain an attack and breaking your own service instead** is a common real-world secondary disaster, and idempotency plus the approval gate prevent it.
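What "idempotent" means for a single step is easiest to see side by side. A minimal sketch, assuming hypothetical in-memory stand-ins (`SessionStore`, `AccountState`) for what would really be an IdP, database, or cache: the idempotent versions describe the desired end state, while a toggle does not.

```ts
// Hypothetical in-memory stand-ins; a real system would call an IdP, database, or cache.
class SessionStore {
  private readonly sessions = new Map<string, Set<string>>(); // userId -> session IDs

  add(userId: string, sessionId: string): void {
    const ids = this.sessions.get(userId) ?? new Set<string>();
    ids.add(sessionId);
    this.sessions.set(userId, ids);
  }

  count(userId: string): number {
    return this.sessions.get(userId)?.size ?? 0;
  }

  // Idempotent: targets the end state ("no sessions") instead of assuming a start state.
  // Returns how many sessions this call revoked, so a re-run reports 0.
  revokeAll(userId: string): number {
    const revoked = this.count(userId);
    this.sessions.delete(userId);
    return revoked;
  }
}

interface AccountState {
  suspended: boolean;
}

// NOT idempotent: a retry after a timeout would silently un-suspend the account.
function toggleSuspended(account: AccountState): void {
  account.suspended = !account.suspended;
}

// Idempotent: sets the target state, so any number of retries converges.
function ensureSuspended(account: AccountState): void {
  account.suspended = true;
}
```

Running `ensureSuspended` twice leaves the account suspended; running `toggleSuspended` twice quietly undoes the containment, which is exactly the retry-after-timeout failure idempotency guards against.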

Eradication (removing malware, fixing the exploited vulnerability) comes after containment. **Rushing to eradicate before containing erases evidence, tips off the attacker, and can widen the damage.** The order matters.
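One way to make that order hard to violate is to track the incident's phase explicitly and refuse skips. In the sketch below, the phase names and the separate "evidence preserved" step between containment and eradication are assumptions added for illustration; Rev.3 does not prescribe this state machine.

```ts
// Hypothetical phase tracker; the phase list is an illustrative assumption.
type Phase = "detected" | "contained" | "evidencePreserved" | "eradicated" | "recovered";
const PHASE_ORDER: readonly Phase[] = [
  "detected", "contained", "evidencePreserved", "eradicated", "recovered",
];

class IncidentPhaseTracker {
  private current: Phase = "detected";

  get phase(): Phase {
    return this.current;
  }

  // Move forward exactly one phase. Re-requesting the current phase is a no-op (idempotent);
  // skipping ahead (e.g. eradicating before containment) or moving backward throws.
  advance(to: Phase): void {
    const from = PHASE_ORDER.indexOf(this.current);
    const target = PHASE_ORDER.indexOf(to);
    if (target === from) return;
    if (target !== from + 1) {
      throw new Error(`Cannot move from "${this.current}" to "${to}"; phases must not be skipped`);
    }
    this.current = to;
  }
}
```

An automation that calls `advance("eradicated")` straight after detection fails loudly instead of wiping the evidence the postmortem will need.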

---

## 4. Recover — return to normal safely

Once containment and eradication are done, return to normal operations. Care is needed here too.

- **Restore from a clean state.** Restoring from a backup taken after the breach is pointless, because it may bring the attacker's foothold back with it. Use a clean backup from before the breach ([backup/PITR design](/blog/postgresql-backup-pitr-pg-dump-wal-archiving-guide)).
- **Return in stages.** Rather than fully reopening all at once, restore service in stages while strengthening monitoring.
- **Strengthen monitoring.** Keep detection sensitivity raised for a while after recovery to catch the same attacker coming back.
- **Define the recovery completion criteria.** Don't leave "what counts as resolved" ambiguous.

The goal of recovery isn't just to "return to normal" but to **return in a state where the same attack can't get through again.**
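"Restore from a clean state" hinges on picking the right restore point. A minimal sketch, assuming you have a list of backups with timestamps and an estimated compromise time; the `Backup` shape and the safety margin are hypothetical. The margin exists because the first-seen time is usually an estimate and the real intrusion may be earlier.

```ts
// Hypothetical backup metadata; field names are illustrative.
interface Backup {
  readonly id: string;
  readonly takenAt: Date;
}

// Pick the newest backup taken before (estimated compromise time - safety margin).
function selectCleanBackup(
  backups: readonly Backup[],
  estimatedCompromiseAt: Date,
  safetyMarginMs: number,
): Backup | undefined {
  const cutoff = estimatedCompromiseAt.getTime() - safetyMarginMs;
  let best: Backup | undefined;
  for (const b of backups) {
    const t = b.takenAt.getTime();
    if (t < cutoff && (best === undefined || t > best.takenAt.getTime())) best = b;
  }
  // undefined: no backup predates the cutoff; a decision for the commander, not a blind restore.
  return best;
}
```

Widening the margin trades data freshness for confidence: with backups on June 1, 3, and 5 and a compromise estimated at June 4, a 12-hour margin selects June 3, while a 48-hour margin falls back to June 1.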

---

## 5. Lessons Learned — fix the mechanism, not the person

What Rev.3 emphasizes under "Identify: Improvement" is the **post-incident retrospective (postmortem).** This is where an organization actually grows stronger.

The overriding principle is to be **blameless (don't blame individuals).** In a culture that asks "who made the mistake?", people hide facts and you never reach the root cause. Instead, ask:

- **What happened** (the chronological facts)
- **Why couldn't it be prevented / noticed** (the mechanism's flaw)
- **How to prevent recurrence** (improving the mechanism)
- **Were detection and response fast** (measuring MTTD/MTTR)

Don't stop at "an employee fell for phishing"; dig down to **"why is the system designed so that one click can steal credentials (where was MFA)?"** Fix the flaws in the mechanism, not people's attentiveness. Through CSF's cycle, this learning **strengthens the next round of "preparation."** This is the turning point that changes an incident from a "loss" into an "investment."
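The last question in the list, measuring MTTD/MTTR, is simple arithmetic once the timeline records three timestamps per incident. A minimal sketch, assuming hypothetical fields `occurredAt`, `detectedAt`, and `resolvedAt` and one common definition of each metric (MTTR is defined several ways in practice; here it is measured from detection to resolution).

```ts
// Hypothetical incident record; field names are illustrative.
interface IncidentTimes {
  readonly occurredAt: Date; // best estimate of when the incident began
  readonly detectedAt: Date;
  readonly resolvedAt: Date; // when the agreed recovery-completion criteria were met
}

const HOUR_MS = 60 * 60 * 1000;

function meanHours(durationsMs: readonly number[]): number {
  if (durationsMs.length === 0) return NaN; // no incidents: the mean is undefined
  return durationsMs.reduce((sum, d) => sum + d, 0) / durationsMs.length / HOUR_MS;
}

// MTTD = mean(detected - occurred); MTTR = mean(resolved - detected).
function responseMetrics(
  incidents: readonly IncidentTimes[],
): { mttdHours: number; mttrHours: number } {
  return {
    mttdHours: meanHours(incidents.map((i) => i.detectedAt.getTime() - i.occurredAt.getTime())),
    mttrHours: meanHours(incidents.map((i) => i.resolvedAt.getTime() - i.detectedAt.getTime())),
  };
}
```

Tracking these per quarter shows whether the mechanism fixes from each postmortem are actually making detection and response faster.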

---

## 6. Connect to automation (SOAR)

As your program matures, automate part of the path from detection to response with **SOAR (Security Orchestration, Automation and Response).** The principle is the same as this article's Runbook: **"notify broadly, contain narrowly, gate destruction through a human."**

For example, on AWS a standard pattern is to receive GuardDuty findings via EventBridge, automate the idempotent initial response (notification, isolation) with Lambda/Step Functions, and require human approval only for destructive operations. The concrete implementation is covered in [GuardDuty automated incident response (SOAR)](/blog/aws-guardduty-eventbridge-automated-remediation-incident-response-guide). Even when automating, **never remove the safety devices of idempotency and the approval gate**; this is the iron rule.
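The three-part principle translates into a small, testable planning function that sits between detection and execution. A minimal sketch using a hypothetical, vendor-neutral finding shape (this is not the GuardDuty or EventBridge schema); the step labels and the `security-alerts` channel name are illustrative.

```ts
// Hypothetical, vendor-neutral shapes; not the GuardDuty or EventBridge schema.
type FindingSeverity = "Low" | "Medium" | "High" | "Critical";

interface Finding {
  readonly id: string;
  readonly resourceId: string;
  readonly severity: FindingSeverity;
}

interface ResponseStep {
  readonly label: string;
  readonly destructive: boolean;
}

type PlannedAction =
  | { readonly kind: "notify"; readonly channel: string }
  | { readonly kind: "autoRun"; readonly step: string; readonly resourceId: string }
  | { readonly kind: "awaitApproval"; readonly step: string; readonly resourceId: string };

// Illustrative initial-response steps for a compromised compute resource.
const HIGH_SEVERITY_STEPS: readonly ResponseStep[] = [
  { label: "isolate network access", destructive: false },
  { label: "snapshot disk for forensics", destructive: false },
  { label: "terminate instance", destructive: true },
];

function planResponse(finding: Finding): PlannedAction[] {
  // Notify broadly: every finding reaches people, whatever its severity.
  const plan: PlannedAction[] = [{ kind: "notify", channel: "security-alerts" }];
  // Contain narrowly: only High/Critical findings trigger any automated action.
  if (finding.severity !== "High" && finding.severity !== "Critical") return plan;

  for (const step of HIGH_SEVERITY_STEPS) {
    if (step.destructive) {
      // Gate destruction: destructive steps are queued for a human, never auto-run.
      plan.push({ kind: "awaitApproval", step: step.label, resourceId: finding.resourceId });
    } else {
      plan.push({ kind: "autoRun", step: step.label, resourceId: finding.resourceId });
    }
  }
  return plan;
}
```

Keeping the planner pure (it returns a plan instead of calling cloud APIs) lets you unit-test the safety rules separately from the executor that carries out the plan.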

---

## 7. FAQ

**Q. Does even a small company need a CSIRT?**
A. A dedicated team isn't required, but **deciding "who does what during an incident" is essential.** One person can hold several roles. What matters is never ending up asking "so, who decides?" in an emergency.

**Q. Does knowledge of the old version (four phases) become useless?**
A. No. The practical flow of "preparation, detection, containment, recovery, lessons" itself is still valid. Rev.3 **integrated it into CSF 2.0 risk management and repositioned it as a whole-organization endeavor.** The thinking is continuous.

**Q. When an incident occurs, what should I do first?**
A. (1) Calmly triage the severity, (2) designate a single commander, and (3) start keeping records. Before rushing to act, establishing **"who commands and what gets recorded"** determines the quality of the initial response.
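"Start keeping records" works best when the record cannot be quietly rewritten later. A minimal sketch of an append-only incident timeline, assuming a hypothetical `IncidentLog` class with an injectable clock (so tests can pin timestamps); a real setup might write to a ticket system or a dedicated channel instead.

```ts
// Hypothetical append-only incident log; names are illustrative.
interface TimelineEntry {
  readonly at: string;    // ISO 8601 timestamp in UTC
  readonly actor: string; // role or person who acted or observed
  readonly note: string;  // what was observed or done, stated as fact
}

class IncidentLog {
  readonly commander: string;
  private readonly entries: TimelineEntry[] = [];
  private readonly now: () => Date;

  // Exactly one commander is fixed when the log is opened.
  constructor(commander: string, now: () => Date = () => new Date()) {
    this.commander = commander;
    this.now = now;
  }

  // Append-only: entries are frozen and never edited, keeping the timeline
  // trustworthy as the factual basis for the postmortem.
  record(actor: string, note: string): TimelineEntry {
    const entry: TimelineEntry = Object.freeze({ at: this.now().toISOString(), actor, note });
    this.entries.push(entry);
    return entry;
  }

  timeline(): readonly TimelineEntry[] {
    return [...this.entries]; // a copy, so callers cannot reorder or delete history
  }
}
```

The same timeline later supplies the "what happened" facts and the detection and resolution timestamps that the postmortem needs.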

**Q. What if a personal-data leak is suspected?**
A. In Japan, if the leak meets the statutory reporting criteria, reporting to the Personal Information Protection Commission (a preliminary report and a final report) and notifying the affected individuals are legally required. **Involve legal and PR early**, and consult JPCERT/CC and other experts. Legal and external responses need to run in parallel with the technical response.

**Q. How many Runbooks should I prepare?**
A. Start with high-probability scenarios (account compromise, phishing, ransomware, accidental public exposure of data). You don't need all of them from the start. A realistic approach is to build one, run it in a drill, and add more as you close the gaps it reveals.

---

## 8. Summary

Incident response is mature preparation that stands on the premise that **"a breach will certainly happen."**

- **Preparation is 90%.** Set up the CSIRT roles, contact network, authority, Runbooks, and drills before it happens.
- **Align with NIST's new framework (Rev.3).** Incident response is part of CSF 2.0 risk management, cycling between preparation (Govern, Identify, Protect) and response (Detect, Respond, Recover).
- **When triaging, err on the serious side.** Define severity criteria in advance, and when unsure, rate it higher.
- **Contain with an idempotent Runbook.** Dry-run, the approval gate, and idempotency keep you on the safe side even amid panic.
- **Fix the mechanism with lessons.** With a blameless postmortem, fix the design flaw rather than the people, and feed it back into the next round of preparation.

If you'd like support building an incident-response structure, preparing Runbooks, and designing tabletop exercises, or want to check before release whether you can really act when it counts, feel free to reach out. Like "the ability to build fast," "the preparation to respond correctly" is an investment that supports business continuity.

---

### References (official primary sources)

- Incident response: [NIST SP 800-61 Rev.3](https://csrc.nist.gov/pubs/sp/800/61/r3/final) / [NIST Incident Response project](https://csrc.nist.gov/projects/incident-response)
- Framework: [NIST CSF 2.0](https://www.nist.gov/cyberframework)
- Domestic: [JPCERT/CC](https://www.jpcert.or.jp/) / [METI Cybersecurity Management Guidelines](https://www.meti.go.jp/policy/netsecurity/mng_guide.html)
- Related: [log design / detection engineering](/blog/security-logging-detection-engineering-sigma-mitre-attack-siem-guide) / [GuardDuty automated incident response](/blog/aws-guardduty-eventbridge-automated-remediation-incident-response-guide)
