友田 陽大
Security engineering & career
Security
Incident response
CSIRT
NIST
Security engineer

Incident-response practical guide [2026 edition]: CSIRT, Runbooks, and automated containment aligned with NIST SP 800-61 Rev.3 (CSF 2.0)

A practical guide to designing production-grade security incident response (IR). Built around NIST SP 800-61 Rev.3 (the CSF 2.0 Community Profile), which was overhauled in 2025, it covers CSIRT structure, severity triage, idempotent containment Runbooks, blameless postmortems, and SOAR automation, with working code that stays faithful to official sources.

Published
Reading time
10 min read
Author
友田 陽大

The question that separates mature security organizations from the rest isn't "will you be breached." It's "when you are breached, how quickly and correctly can you act." Perfect defense doesn't exist. So top-tier organizations start from the premise that "a breach will certainly happen" and invest in preparation so they don't panic when it does.

The global standard for that preparation is NIST's SP 800-61. One important caveat: this guidance was significantly revised in April 2025 (Rev.3). The four-phase model that many older articles explain ("preparation → detection & analysis → containment, eradication & recovery → post-incident activity") is the Rev.2 framework, now superseded. This article explains incident-response design faithful to the latest Rev.3, with working code.

This article's positioning: incident response is the core skill behind NIST CSF 2.0's "Respond" and "Recover" functions. For the occupation as a whole, see how to become a security engineer; for the pre-response stage (detection that notices anomalies), see log design / detection engineering; for automated response (SOAR) on AWS, see GuardDuty automated incident response.


0. NIST "revamped" incident response — Rev.3's new framework

NIST SP 800-61 Rev.3 (finalized April 2025) replaced Rev.2, the "Computer Security Incident Handling Guide." The biggest change is that it redefines incident response not as a standalone process but as part of CSF 2.0 risk management (a Community Profile).

Why the change? NIST's explanation is that the concrete way to carry out incident response differs greatly by technology, environment, and organization, and changes quickly. So rather than prescribing a fixed procedure manual, Rev.3 presents incident response as "outcomes" aligned with CSF 2.0's functions.

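The new framework organizes CSF's six functions into two groups (Preparation and Incident Response) and closes the loop with the Improvement category of Identify as Lessons Learned.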

| Group | CSF function | Meaning in incident response |
|---|---|---|
| Preparation | Govern | Define policy, roles, responsibilities, supply chain |
| Preparation | Identify | Grasp assets, risks, vulnerabilities |
| Preparation | Protect | Take defensive measures to make damage less likely |
| Incident Response | Detect | Discover, analyze, and triage incidents |
| Incident Response | Respond | Contain, eradicate, and notify stakeholders |
| Incident Response | Recover | Return to normal operations, strengthen monitoring |
| Lessons Learned | Identify: Improvement | Feed learnings back into the next "preparation" |

This cycle is the core: preparation → detection → response → recovery → lessons, and the lessons strengthen the next round of preparation. Incident response is no longer a one-off firefight but a risk-management loop the organization keeps running. The sections below walk through the practice along this flow.


1. Preparation (Govern / Identify / Protect) — 90% is decided here

The success or failure of incident response is decided almost entirely by preparation done before anything happens. Searching for the fire extinguisher after the fire starts is too late. Concretely, prepare the following.

  • Define the CSIRT (response structure). Who commands (the incident commander), who responds technically, who notifies legal, PR, and management. Decide it as a "role," not a person (so it works even when the assignee is on vacation).
  • Contact network and escalation. Per severity, who to tell, when, and how. Include criteria for external contact (JPCERT/CC, the supervisory authority, customers).
  • Pre-grant authority. Who can exercise the authority to "isolate a server for containment" or "rotate keys" in an emergency. Without deciding in advance, the initial response stalls.
  • Prepare Runbooks (procedures). Prepare response procedures per typical scenario (account compromise, ransomware, data leak, etc.) as the "code" described below.
  • Drills. Confirm you can actually act with a tabletop exercise. A procedure's holes are only visible once used.
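The roles, escalation paths, and pre-granted authority above can be held as data rather than as a wiki page. The following is a minimal sketch under stated assumptions: the role names, the notification deadlines (other than the one-hour figure for High, which follows this article's severity table), and the operation names are all hypothetical and need to be replaced with your organization's own.

```typescript
// Hypothetical roles and deadlines; adapt them to your organization.
type Severity = "critical" | "high" | "medium" | "low";
type Role = "incident-commander" | "technical-lead" | "legal" | "pr" | "management";

interface EscalationRule {
  readonly notify: readonly Role[];          // roles, never named individuals
  readonly withinMinutes: number | null;     // null = record only, no notification deadline
  readonly considerExternalContact: boolean; // e.g. JPCERT/CC, supervisory authority, customers
}

const ESCALATION: Record<Severity, EscalationRule> = {
  critical: { notify: ["incident-commander", "technical-lead", "legal", "pr", "management"],
              withinMinutes: 15, considerExternalContact: true },
  high:     { notify: ["incident-commander", "technical-lead"],
              withinMinutes: 60, considerExternalContact: true },
  medium:   { notify: ["technical-lead"], withinMinutes: 480, considerExternalContact: false },
  low:      { notify: [], withinMinutes: null, considerExternalContact: false },
};

// Pre-granted authority: which roles may approve which emergency operation.
const EMERGENCY_AUTHORITY: Record<string, readonly Role[]> = {
  "isolate-server": ["incident-commander", "technical-lead"],
  "rotate-keys": ["incident-commander"],
};

// The roster maps each role to whoever holds it right now (on-call, deputies).
type Roster = Record<Role, readonly string[]>;

function resolveContacts(severity: Severity, roster: Roster): string[] {
  const contacts: string[] = [];
  for (const role of ESCALATION[severity].notify) {
    const holders = roster[role];
    // An unstaffed role is a preparation gap; surface it loudly.
    if (holders.length === 0) throw new Error(`No one currently holds the role "${role}"`);
    for (const person of holders) {
      if (contacts.indexOf(person) === -1) contacts.push(person); // de-duplicate
    }
  }
  return contacts;
}

function mayAuthorize(role: Role, operation: string): boolean {
  const allowed = EMERGENCY_AUTHORITY[operation];
  return allowed !== undefined && allowed.indexOf(role) !== -1;
}
```

Because contacts are resolved through the roster, swapping the person on call changes who is paged without touching the escalation rules, which is the point of defining roles instead of people.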

Involving management is also part of preparation. METI's Cybersecurity Management Guidelines position setting up an incident-response structure as a management responsibility and provide an appendix guide to building it.


2. Detection & triage (Detect) — judge severity instantly

Once you detect an anomaly (see detection engineering), triage first: quickly judge whether this is a real incident or a false positive and, if real, how serious it is. The crux is deciding the severity criteria in advance.

| Severity | Example | Initial response |
|---|---|---|
| Critical | Production data leak, ransomware, compromise of the auth platform | Immediately convene the CSIRT, appoint a commander, notify management |
| High | Compromise of a privileged account, unauthorized access to a key system | Start response within 1 hour |
| Medium | Compromise of a general account, limited unauthorized operation | Respond within the business day |
| Low | Traces of recon, minor policy violation | Record, strengthen monitoring |

The most important rule in triage is to err toward the more serious rating when unsure. Finding out later that it was nothing is a healthy outcome. Underestimating an incident and leaving it alone produces the worst one.
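Predefined criteria plus the "when unsure, rate it higher" rule can be written down as a small function. This is a minimal sketch: the input fields mirror the examples in the severity table, while the `unsure` flag and the one-level bump are assumptions made for illustration, not a rule taken from NIST.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical triage inputs; each flag corresponds to an example in the severity table.
interface TriageInput {
  readonly productionDataLeak: boolean;
  readonly ransomware: boolean;
  readonly authPlatformCompromised: boolean;
  readonly privilegedAccountCompromised: boolean;
  readonly keySystemUnauthorizedAccess: boolean;
  readonly generalAccountCompromised: boolean;
  readonly unsure: boolean; // true when the analyst cannot yet bound the scope
}

// Severity according to the predefined criteria alone.
function baseSeverity(input: TriageInput): Severity {
  if (input.productionDataLeak || input.ransomware || input.authPlatformCompromised) return "critical";
  if (input.privilegedAccountCompromised || input.keySystemUnauthorizedAccess) return "high";
  if (input.generalAccountCompromised) return "medium";
  return "low";
}

function oneLevelUp(severity: Severity): Severity {
  switch (severity) {
    case "low": return "medium";
    case "medium": return "high";
    default: return "critical"; // high and critical both cap at critical
  }
}

// When unsure, rate one level higher; downgrading later is cheap.
function triage(input: TriageInput): Severity {
  const base = baseSeverity(input);
  return input.unsure ? oneLevelUp(base) : base;
}
```

Keeping the bump explicit in code makes the bias auditable: a postmortem can see whether an incident was rated by criteria or escalated because the scope was unknown.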


3. Respond — contain with an idempotent Runbook

Containment is the top-priority action for stopping the spread of damage. The key is to hold the response procedure as idempotent code (Runbook as Code). Manual work breeds mistakes, and mistakes multiply in a late-night emergency.

A good containment Runbook needs three properties: idempotency (the same result no matter how many times you run it), dry-run (you can confirm the impact before executing), and a human approval gate (destructive operations never execute automatically).

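// containment-runbook.ts: containment Runbook for a compromised account.
// Idempotent, dry-run by default, and approval-gated, so it stays safe to run
// even in a late-night emergency.
import { z } from "zod";
// Hypothetical module: these helpers are not defined in this article. Each is
// assumed to be idempotent with the signature (userId: string) => Promise<void>.
import {
  revokeAllSessions,
  revokeApiTokens,
  suspendAccount,
  forcePasswordReset,
} from "./identity-provider";

const Options = z.object({
  userId: z.string().uuid(),
  dryRun: z.boolean().default(true),     // ① Dry-run by default; real execution must be explicit.
  approvedBy: z.string().optional(),     // ② Destructive operations require an approver.
});
type Options = z.infer<typeof Options>;

interface Step {
  readonly name: string;
  readonly destructive: boolean;         // destructive = must pass the approval gate
  run(userId: string): Promise<void>;
}

// Every step is idempotent: even if the target is already disabled,
// re-running converges on the same final state.
const STEPS: readonly Step[] = [
  { name: "Revoke all sessions", destructive: false,
    run: (id) => revokeAllSessions(id) },          // always converges on "revoked"
  { name: "Revoke API tokens", destructive: false,
    run: (id) => revokeApiTokens(id) },
  { name: "Suspend account", destructive: true,
    run: (id) => suspendAccount(id) },             // destructive: approval required
  { name: "Force credential reset", destructive: true,
    run: (id) => forcePasswordReset(id) },
];

export async function containAccount(rawOptions: unknown): Promise<void> {
  const opts: Options = Options.parse(rawOptions); // validate at the boundary

  for (const step of STEPS) {
    // ③ Destructive steps never run without an approver (the warning is shown in dry-run too).
    if (step.destructive && !opts.approvedBy) {
      console.warn(`⏸  [awaiting approval] ${step.name} (requires --approvedBy)`);
      continue;
    }
    if (opts.dryRun) {
      console.log(`🔍 [dry-run] would run: ${step.name}`);
      continue; // ④ Dry-run causes no side effects; it only shows what would happen.
    }
    await step.run(opts.userId); // idempotent, so re-running after a mid-way failure is safe
    console.log(`✅ ${step.name} done`);
  }
}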

The value of this design is that it fails safe even amid panic. The default is dry-run, destructive operations require approval, and each step is idempotent, so even if you mistype a command in an emergency, it is hard to cause secondary damage (such as collaterally locking out legitimate users). Setting out to contain an attack and breaking your own service instead is a common secondary disaster in practice; idempotency and the approval gate guard against it.
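The Runbook relies on each step being idempotent, so it helps to see what that means at the level of a single step. The sketch below uses a hypothetical in-memory account store in place of a real identity provider; the names match the Runbook's steps only for readability and are not any real API.

```typescript
// Hypothetical in-memory store standing in for an identity provider.
interface Account {
  suspended: boolean;
  sessions: string[];
}

const accounts: Record<string, Account> = {
  "user-1": { suspended: false, sessions: ["s1", "s2"] },
};

function getAccount(userId: string): Account {
  const account = accounts[userId];
  if (account === undefined) throw new Error(`Unknown account: ${userId}`);
  return account;
}

// Idempotent: sets the target state ("no sessions") instead of applying a delta.
// Returns how many sessions this particular call revoked.
function revokeAllSessions(userId: string): number {
  const account = getAccount(userId);
  const revoked = account.sessions.length;
  account.sessions = [];
  return revoked;
}

// Idempotent: "already suspended" is treated as success, not as an error.
// Returns true only if this call changed the state.
function suspendAccount(userId: string): boolean {
  const account = getAccount(userId);
  if (account.suspended) return false;
  account.suspended = true;
  return true;
}
```

The design choice is to describe the desired end state rather than an action relative to the current state. A "toggle suspension" operation, by contrast, would re-enable the account on a second run, which is exactly the failure mode a retried Runbook must avoid.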

Eradication (removing malware, fixing the exploited vulnerability) comes after containment. Rushing to eradicate before containing destroys evidence, tips off the attacker, and widens the damage. The order matters.


4. Recover — return to normal safely

Once containment and eradication are done, return to normal operations. Care is needed here too.

  • Restore from a clean state. Restoring from a backup taken while the system was already breached is meaningless. Use a known-good backup from before the breach (see backup/PITR design).
  • Return in stages. Rather than going full-open at once, restore the service in stages while strengthening monitoring.
  • Strengthen monitoring. Raise detection sensitivity for a while after recovery to catch whether the same attacker returns.
  • Define the recovery completion criteria. Don't leave "what counts as resolved" ambiguous.
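Staged restoration and explicit completion criteria from the list above can be expressed as two small checks. This is a sketch under stated assumptions: the criteria, the 72-hour quiet period, and the traffic stages are illustrative values chosen for this example, not figures from NIST or from this article.

```typescript
// Hypothetical recovery status; choose criteria that fit your own incident.
interface RecoveryStatus {
  readonly restoredFromPreBreachBackup: boolean;
  readonly exploitedVulnerabilityFixed: boolean;
  readonly compromisedCredentialsRotated: boolean;
  readonly hoursWithoutRecurrence: number;
}

const REQUIRED_QUIET_HOURS = 72;                           // assumption: observation window
const TRAFFIC_STAGES: readonly number[] = [0, 10, 50, 100]; // assumption: staged reopening

// Lists every completion criterion that is not yet satisfied.
function unmetCriteria(status: RecoveryStatus): string[] {
  const unmet: string[] = [];
  if (!status.restoredFromPreBreachBackup) unmet.push("restore from a pre-breach backup");
  if (!status.exploitedVulnerabilityFixed) unmet.push("fix the exploited vulnerability");
  if (!status.compromisedCredentialsRotated) unmet.push("rotate compromised credentials");
  if (status.hoursWithoutRecurrence < REQUIRED_QUIET_HOURS) {
    unmet.push(`observe ${REQUIRED_QUIET_HOURS}h without recurrence`);
  }
  return unmet;
}

// Advances one stage at a time; any new alert closes the service again.
function nextTrafficPercent(current: number, newAlertsSinceLastStage: number): number {
  if (newAlertsSinceLastStage > 0) return 0;
  for (const stage of TRAFFIC_STAGES) {
    if (stage > current) return stage;
  }
  return 100;
}

// "Resolved" means fully reopened and every criterion met.
function isResolved(status: RecoveryStatus, trafficPercent: number): boolean {
  return trafficPercent === 100 && unmetCriteria(status).length === 0;
}
```

Returning the list of unmet criteria, rather than a bare boolean, gives the incident commander a concrete answer to "why isn't this closed yet."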

Recovery's goal isn't just "return to normal" but "return in a state where the same attack never gets through again."


5. Lessons Learned — fix the mechanism, not the person

What Rev.3 emphasizes as "Identify: Improvement" is the post-incident retrospective (postmortem). This is the only place where an organization grows stronger.

The first principle is to be blameless (don't blame the individual). In a culture that asks "who made the mistake," people hide facts and you never reach the true cause. Ask instead:

  • What happened (the chronological facts)
  • Why couldn't it be prevented / noticed (the mechanism's flaw)
  • How to prevent recurrence (improving the mechanism)
  • Were detection and response fast (measuring MTTD/MTTR)

Don't stop at "Mr. A fell for phishing"; dig down to "why was the system designed so that one click is enough to steal credentials (where was MFA?)." Fix the flaw in the mechanism, not people's attentiveness. Through CSF's cycle, this learning strengthens the next round of preparation. This is the turning point that changes an incident from a "loss" into an "investment."
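Measuring MTTD and MTTR only requires three timestamps per incident. The sketch below assumes one common pair of definitions, which vary between organizations: time to detect runs from the start of the incident to detection, and time to respond runs from detection to resolution. The field names are hypothetical.

```typescript
// Hypothetical postmortem record; timestamps are ISO 8601 strings.
interface IncidentTimeline {
  readonly startedAt: string;  // best estimate of when the incident began
  readonly detectedAt: string; // when it was first detected
  readonly resolvedAt: string; // when the recovery completion criteria were met
}

function minutesBetween(fromIso: string, toIso: string): number {
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  if (isNaN(from) || isNaN(to)) throw new Error("Invalid timestamp");
  return (to - from) / 60000;
}

function mean(values: number[]): number {
  if (values.length === 0) throw new Error("No incidents to average");
  let sum = 0;
  for (const v of values) sum += v;
  return sum / values.length;
}

// Mean time to detect, in minutes (start -> detection).
function mttdMinutes(incidents: readonly IncidentTimeline[]): number {
  return mean(incidents.map((i) => minutesBetween(i.startedAt, i.detectedAt)));
}

// Mean time to respond, in minutes (detection -> resolution).
function mttrMinutes(incidents: readonly IncidentTimeline[]): number {
  return mean(incidents.map((i) => minutesBetween(i.detectedAt, i.resolvedAt)));
}
```

Tracking both numbers across postmortems shows whether fixes to the mechanism are actually shortening detection and response, rather than relying on impressions.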


6. Connect to automation (SOAR)

Once the process is mature, automate part of the detection-to-response path with SOAR (Security Orchestration, Automation and Response). The principle is the same as this article's Runbook: "notify broadly, contain narrowly, gate destruction through a human."

For example, on AWS, a common pattern is to receive GuardDuty findings with EventBridge, automate the idempotent initial response (notification, isolation) with Lambda/Step Functions, and interpose human approval only for destructive operations. The concrete implementation is summarized in GuardDuty automated incident response (SOAR). Even when automating, never remove the safety devices of idempotency and the approval gate. This is the iron rule.
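The "notify broadly, contain narrowly, gate destruction through a human" principle can be sketched as a pure planning function that a Lambda handler might call. It makes no AWS SDK calls. The event interface covers only the fields used here, and the channel name, the operation name, and the choice of 7.0 as the isolation threshold are assumptions made for illustration.

```typescript
// Subset of the EventBridge event emitted for a GuardDuty finding (only fields used here).
interface GuardDutyFindingEvent {
  readonly source: string;        // expected: "aws.guardduty"
  readonly "detail-type": string; // expected: "GuardDuty Finding"
  readonly detail: {
    readonly id: string;
    readonly type: string;
    readonly severity: number;
  };
}

type Action =
  | { readonly kind: "notify"; readonly channel: string; readonly findingId: string }
  | { readonly kind: "isolate"; readonly findingId: string }
  | { readonly kind: "request-approval"; readonly operation: string; readonly findingId: string };

const HIGH_SEVERITY_THRESHOLD = 7.0; // assumption: isolate only at this severity or above

// Decides what to do; executing the actions is left to the caller.
function planResponse(event: GuardDutyFindingEvent): Action[] {
  if (event.source !== "aws.guardduty" || event["detail-type"] !== "GuardDuty Finding") {
    return []; // not a GuardDuty finding: ignore
  }
  const findingId = event.detail.id;
  // Notify broadly: every finding is reported.
  const actions: Action[] = [{ kind: "notify", channel: "#security-alerts", findingId }];
  if (event.detail.severity >= HIGH_SEVERITY_THRESHOLD) {
    // Contain narrowly: isolation is automated only for high-severity findings.
    actions.push({ kind: "isolate", findingId });
    // Gate destruction: credential rotation is requested, never executed automatically.
    actions.push({ kind: "request-approval", operation: "rotate-credentials", findingId });
  }
  return actions;
}
```

Separating planning from execution keeps the decision logic testable without AWS access, and the same plan for the same finding makes a redelivered event harmless as long as each executed action is itself idempotent.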


7. FAQ

Q. Does even a small company need a CSIRT? A. A dedicated team isn't needed, but deciding "who does what during an incident" is essential. One person can hold multiple roles. What matters is creating a state where, in an emergency, you don't end up at "so, who decides?"

Q. Does knowledge of the old version (four phases) become useless? A. No. The practical flow of "preparation, detection, containment, recovery, lessons" itself is still valid. Rev.3 integrated it into CSF 2.0 risk management and repositioned it as a whole-organization endeavor. The thinking is continuous.

Q. When an incident occurs, what should I do first? A. (1) Calmly triage the severity, (2) decide on one commander, and (3) start keeping records. Before rushing to act, establishing "who commands and what is recorded" decides the quality of the initial response.

Q. What if a personal-data leak is suspected? A. In Japan, a leak that meets the statutory criteria triggers legal obligations to report to the Personal Information Protection Commission (a preliminary report followed by a final report) and to notify the affected individuals. Involve legal and PR early, and consult JPCERT/CC and experts. The legal and external responses need to proceed in parallel with the technical response.

Q. How many Runbooks should I prepare? A. Start with high-probability scenarios (account compromise, phishing, ransomware, public leak). You don't need to have them all from the start. Building one, running it in a drill, and adding more while filling holes is realistic.


8. Summary

Incident response is mature preparation that stands on the premise that "a breach will certainly happen."

  • Preparation is 90%. Set up the CSIRT roles, contact network, authority, Runbooks, and drills before it happens.
  • Align with NIST's new framework (Rev.3). Incident response is part of CSF 2.0 risk management. It is a cycle of preparation (Govern, Identify, Protect), response (Detect, Respond, Recover), and lessons learned (Identify: Improvement).
  • Triage errs toward the serious side. Define severity criteria in advance, and when unsure, rate the incident higher.
  • Contain with an idempotent Runbook. Fail safe even amid panic, with dry-run, the approval gate, and idempotency.
  • Fix the mechanism with lessons. With a blameless postmortem, fix the design flaw, not people, and return it to the next preparation.

If you want support in building an incident-response structure, preparing Runbooks, and designing tabletop exercises, or want to verify before release whether you can really act when it counts, feel free to reach out. Like the power to build fast, preparation to respond correctly is an investment that supports business continuity.


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

