What separates mature security organizations is not whether they get breached but how quickly and correctly they act when it happens. Perfect defense doesn't exist, so top-tier organizations assume that a breach will happen and invest in preparation so that they don't panic when it does.
The global reference for that preparation is NIST SP 800-61. One important caveat: the guidance was substantially revised in April 2025 (Rev.3). The four-phase lifecycle that many older articles describe ("preparation → detection & analysis → containment, eradication & recovery → post-incident activity") is the Rev.2 (old) framework. This article explains incident-response design that follows the latest Rev.3, with real code.
Where this article fits: incident response is the core skill behind NIST CSF 2.0's "Respond" and "Recover" functions. For the profession as a whole, see how to become a security engineer; for the stage before response (detection that notices anomalies), see log design / detection engineering; for automated response (SOAR) on AWS, see GuardDuty automated incident response.
0. NIST "revamped" incident response — Rev.3's new framework
NIST SP 800-61 Rev.3 (finalized April 2025) replaced the old "Computer Security Incident Handling Guide." The biggest change is that it redefined incident response not as an independent process but as part of CSF 2.0 risk management (a Community Profile).
Why the change? NIST's explanation is that the concrete way to perform incident response varies widely by technology, environment, and organization, and changes quickly. So instead of prescribing a fixed procedure, Rev.3 expresses incident response as "outcomes" aligned with the CSF 2.0 functions.
The new framework organizes CSF's six functions into two groups (Preparation and Incident Response), with the Improvement category of Identify feeding lessons learned back in.
| Group | CSF function | Meaning in incident response |
|---|---|---|
| Preparation | Govern | Define policy, roles, responsibilities, supply chain |
| Preparation | Identify | Grasp assets, risks, vulnerabilities |
| Preparation | Protect | Take defensive measures to make damage less likely |
| Incident Response | Detect | Discover, analyze, and triage incidents |
| Incident Response | Respond | Contain, eradicate, and notify stakeholders |
| Incident Response | Recover | Return to normal operations, strengthen monitoring |
| Lessons Learned | Identify: Improvement | Feed learnings back into the next "preparation" |
This cycle is the core: preparation → detection → response → recovery → lessons learned, with the lessons strengthening the next round of preparation. Incident response is no longer a one-off firefight but a risk-management loop the organization keeps running. The sections below follow this flow in practice.
1. Preparation (Govern / Identify / Protect) — 90% is decided here
The success or failure of incident response is decided almost entirely by the preparation done before anything happens. Searching for the fire extinguisher after the fire starts is too late. Here is what to prepare, concretely.
- Define the CSIRT (response structure). Decide who commands (the incident commander), who handles the technical response, and who notifies legal, PR, and management. Assign these as roles, not named individuals, so the structure still works when an assignee is on vacation.
- Contact network and escalation. For each severity, define who is told, when, and how. Include the criteria for external contact (JPCERT/CC, the supervisory authority, customers).
- Pre-grant authority. Decide who may "isolate a server for containment" or "rotate keys" in an emergency. If this isn't settled in advance, the initial response stalls.
- Prepare Runbooks (procedures). Write response procedures for typical scenarios (account compromise, ransomware, data leak, etc.) as the "code" described below.
- Drills. Confirm with a tabletop exercise that you can actually act. The holes in a procedure only show once it is used.
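The role-based structure and escalation rules above can be kept as data rather than as a document nobody opens in an emergency. The following is a minimal sketch under stated assumptions: the role names, the per-severity notification policy, and every address are hypothetical placeholders, not something prescribed by NIST or by this article.

```typescript
// Hypothetical roles, severities, and addresses; replace them with your own.
type IrSeverity = "critical" | "high" | "medium" | "low";
type IrRole = "incident-commander" | "technical-lead" | "legal" | "pr" | "management";

// Which roles are notified at each severity (illustrative policy, an assumption).
const NOTIFY_BY_SEVERITY: Record<IrSeverity, readonly IrRole[]> = {
  critical: ["incident-commander", "technical-lead", "legal", "pr", "management"],
  high: ["incident-commander", "technical-lead", "management"],
  medium: ["incident-commander", "technical-lead"],
  low: ["technical-lead"],
};

// Each role lists a primary first and backups after it,
// so the role keeps working when the primary is on vacation.
const ROSTER: Record<IrRole, readonly string[]> = {
  "incident-commander": ["alice@example.com", "bob@example.com"],
  "technical-lead": ["carol@example.com", "dave@example.com"],
  legal: ["legal-oncall@example.com"],
  pr: ["pr-oncall@example.com"],
  management: ["cto@example.com", "ceo@example.com"],
};

// Resolve each notified role to the first assignee who is currently available.
// A role maps to undefined when nobody on its list is available.
function resolveContacts(
  severity: IrSeverity,
  unavailable: ReadonlySet<string> = new Set<string>(),
): Map<IrRole, string | undefined> {
  const contacts = new Map<IrRole, string | undefined>();
  for (const role of NOTIFY_BY_SEVERITY[severity]) {
    contacts.set(role, ROSTER[role].find((person) => !unavailable.has(person)));
  }
  return contacts;
}
```

Because the lookup goes through the role, marking the primary commander as unavailable simply resolves the role to the backup. An `undefined` entry is worth surfacing in a drill, since it means a role has no one to fill it.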
Involving management is also part of preparation. METI's Cybersecurity Management Guidelines position setting up an incident-response structure as a management responsibility and provide an appendix guide to building it.
2. Detection & triage (Detect) — judge severity instantly
Once an anomaly is detected (→ detection engineering), triage comes first: quickly judge whether this is a real incident or a false positive and, if real, how serious it is. The key is to define the severity criteria in advance.
| Severity | Example | Initial response |
|---|---|---|
| Critical | Production data leak, ransomware, compromise of the auth platform | Immediately convene the CSIRT, appoint a commander, notify management |
| High | Compromise of a privileged account, unauthorized access to a key system | Start response within 1 hour |
| Medium | Compromise of a general account, limited unauthorized operation | Respond within the business day |
| Low | Traces of recon, minor policy violation | Record, strengthen monitoring |
The most important rule in triage is to err on the serious side when unsure. Discovering later that "it was nothing" is a healthy outcome. Underestimating an incident and leaving it alone produces the worst one.
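One way to make "err on the serious side" mechanical is to treat every unanswered triage question as if the answer were yes. The sketch below maps the severity table above onto a small function. The question names and the three-valued answer type are assumptions made for illustration, and a real rubric would carry more signals.

```typescript
type TriageAnswer = "yes" | "no" | "unknown";
type TriageLevel = "Critical" | "High" | "Medium" | "Low";

// Hypothetical triage questions derived from the examples in the severity table.
interface TriageAnswers {
  readonly productionDataLeak: TriageAnswer;
  readonly ransomware: TriageAnswer;
  readonly authPlatformCompromise: TriageAnswer;
  readonly privilegedAccountCompromise: TriageAnswer;
  readonly keySystemAccess: TriageAnswer;
  readonly generalAccountCompromise: TriageAnswer;
}

// "unknown" counts as suspected: when unsure, the result moves to the serious side.
const isSuspected = (answer: TriageAnswer): boolean => answer !== "no";

// Check the most serious conditions first and return the first level that matches.
function triageSeverity(t: TriageAnswers): TriageLevel {
  if (
    isSuspected(t.productionDataLeak) ||
    isSuspected(t.ransomware) ||
    isSuspected(t.authPlatformCompromise)
  ) {
    return "Critical";
  }
  if (isSuspected(t.privilegedAccountCompromise) || isSuspected(t.keySystemAccess)) {
    return "High";
  }
  if (isSuspected(t.generalAccountCompromise)) {
    return "Medium";
  }
  return "Low";
}
```

With this shape, an incident drops to a lower level only after someone has explicitly answered "no" to every more serious question, which matches the rule in the text.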
3. Respond — contain with an idempotent Runbook
Containment is the top-priority action for stopping the spread of damage. What matters here is keeping the response procedure as idempotent code (Runbook as Code). Manual work breeds mistakes, especially in a late-night emergency.
A good containment Runbook needs three properties: idempotency (the same result no matter how many times you run it), dry-run (you can confirm the impact before executing), and a human approval gate (destructive operations never run automatically).
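```typescript
// containment-runbook.ts: containment Runbook for a compromised account.
// Idempotent, dry-run by default, and approval-gated, so it is safe to run
// even in a late-night emergency.
import { z } from "zod";

// Assumed to be provided by your identity platform; each must be idempotent.
declare function revokeAllSessions(userId: string): Promise<void>;
declare function revokeApiTokens(userId: string): Promise<void>;
declare function suspendAccount(userId: string): Promise<void>;
declare function forcePasswordReset(userId: string): Promise<void>;

const Options = z.object({
  userId: z.string().uuid(),
  dryRun: z.boolean().default(true), // ① Dry-run by default; real execution must be explicit.
  approvedBy: z.string().optional(), // ② Destructive operations require an approver.
});
type Options = z.infer<typeof Options>;

interface Step {
  readonly name: string;
  readonly destructive: boolean; // destructive = requires the approval gate
  run(userId: string): Promise<void>;
}

// Every step is idempotent: even if it was already applied, re-running converges to the same final state.
const STEPS: readonly Step[] = [
  { name: "Revoke all sessions", destructive: false,
    run: (id) => revokeAllSessions(id) }, // converges to "revoked" however often it is called
  { name: "Revoke API tokens", destructive: false,
    run: (id) => revokeApiTokens(id) },
  { name: "Suspend the account", destructive: true,
    run: (id) => suspendAccount(id) }, // destructive: approval required
  { name: "Force a credential reset", destructive: true,
    run: (id) => forcePasswordReset(id) },
];

export async function containAccount(rawOptions: unknown): Promise<void> {
  const opts: Options = Options.parse(rawOptions); // validate at the boundary
  for (const step of STEPS) {
    // ③ A destructive step never runs without an approver (dry-run warns the same way).
    if (step.destructive && !opts.approvedBy) {
      console.warn(`⏸ [awaiting approval] ${step.name} (--approvedBy required)`);
      continue;
    }
    if (opts.dryRun) {
      console.log(`🔍 [dry-run] would run: ${step.name}`);
      continue; // ④ Dry-run causes no side effects; it only shows what would happen.
    }
    await step.run(opts.userId); // idempotent, so re-running after a mid-way failure is safe
    console.log(`✅ ${step.name} done`);
  }
}
```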
The value of this design is that it fails safe even amid panic. The default is dry-run, destructive operations require approval, and each step is idempotent, so a mistyped command in an emergency is unlikely to cause secondary damage (such as locking out legitimate users as collateral). Breaking your own service while trying to contain an attack is a common secondary disaster, and idempotency and the approval gate guard against it.
Eradication (removing malware, fixing the exploited vulnerability) comes after containment. Rushing to eradicate before containing can erase evidence, tip off the attacker, and widen the damage. The order matters.
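To show what "idempotent" asks of an individual step, here is a minimal sketch of a suspend operation that checks the current state before acting. The in-memory `accountStates` map and `auditTrail` array are hypothetical stand-ins for a real identity store and audit log.

```typescript
type AccountState = "active" | "suspended";

// Hypothetical in-memory stand-ins for an identity store and an audit log.
const accountStates = new Map<string, AccountState>([["user-1", "active"]]);
const auditTrail: string[] = [];

// Idempotent suspend: the first call changes state and records it;
// later calls see the state is already "suspended" and do nothing.
function suspendAccountIdempotent(userId: string): AccountState {
  const current = accountStates.get(userId);
  if (current === undefined) {
    throw new Error(`unknown account: ${userId}`);
  }
  if (current === "suspended") {
    return current; // already converged, so no side effects on a re-run
  }
  accountStates.set(userId, "suspended");
  auditTrail.push(`suspended ${userId}`);
  return "suspended";
}
```

Calling the function twice for the same user leaves one audit entry and the same final state. That property is what makes it safe to re-run a Runbook after it failed partway through.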
4. Recover — return to normal safely
Once containment and eradication are done, return to normal operations. Care is needed here too.
- Restore from a clean state. Restoring from a backup taken after the breach is pointless. Use a known-good backup from before the breach (→ backup/PITR design).
- Return in stages. Rather than reopening everything at once, restore the service in stages while strengthening monitoring.
- Strengthen monitoring. Raise detection sensitivity for a while after recovery to catch the same attacker returning.
- Define the recovery completion criteria. Don't leave "what counts as resolved" ambiguous.
The goal of recovery isn't just to "return to normal" but to "return in a state where the same attack does not get through again."
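Staged return and explicit completion criteria can both be written as small pure functions. This is a sketch only: the traffic percentages, the rule of pulling back to zero when monitoring looks unhealthy, and the criterion names in the usage note are illustrative assumptions, not values from NIST or this article.

```typescript
interface RecoveryCriterion {
  readonly name: string;
  readonly met: boolean;
}

// Recovery is complete only when every criterion is explicitly met.
// An empty list is never treated as "done", so undefined criteria cannot pass.
function isRecoveryComplete(criteria: readonly RecoveryCriterion[]): boolean {
  return criteria.length > 0 && criteria.every((criterion) => criterion.met);
}

// Illustrative stages, as a percentage of traffic restored (an assumption).
const ROLLOUT_STAGES = [5, 25, 50, 100] as const;

// Advance one stage while monitoring stays healthy; otherwise pull back to 0
// so the service is re-isolated for investigation.
function nextRolloutPercent(current: number, monitoringHealthy: boolean): number {
  if (!monitoringHealthy) {
    return 0;
  }
  return ROLLOUT_STAGES.find((stage) => stage > current) ?? 100;
}
```

A checklist might hold entries such as "restored from pre-breach backup", "exploited vulnerability patched", and "heightened monitoring enabled". The incident is closed only when `isRecoveryComplete` returns true for that list.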
5. Lessons Learned — fix the mechanism, not the person
What Rev.3 emphasizes as "Identify: Improvement" is the post-incident retrospective (postmortem). This is the only place where an organization grows stronger.
The most important principle is to be blameless (don't blame the individual). In a culture that asks "who made the mistake," people hide facts and you never reach the root cause. The questions to ask are:
- What happened (the chronological facts)
- Why couldn't it be prevented / noticed (the mechanism's flaw)
- How to prevent recurrence (improving the mechanism)
- Were detection and response fast (measuring MTTD/MTTR)
Don't stop at "Mr. A fell for phishing"; dig down to "why was the system designed so that a single click exposes credentials (where was MFA?)." Fix the flaw in the mechanism, not people's attentiveness. Through the CSF cycle, this learning strengthens the next round of "preparation," and it is the turning point that changes an incident from a "loss" into an "investment."
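The speed question in the list above becomes answerable once each postmortem records three timestamps. The sketch below assumes MTTD is measured from the start of the intrusion to detection and MTTR from detection to resolution. Definitions of both metrics vary between organizations, and the field names here are hypothetical.

```typescript
interface IncidentTimeline {
  readonly startedAt: string;  // ISO 8601; when the intrusion began (often estimated afterwards)
  readonly detectedAt: string; // ISO 8601; when the incident was detected
  readonly resolvedAt: string; // ISO 8601; when recovery was declared complete
}

const minutesBetween = (from: string, to: string): number =>
  (Date.parse(to) - Date.parse(from)) / 60000;

function meanMinutes(values: readonly number[]): number {
  if (values.length === 0) {
    throw new Error("no incidents to average");
  }
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Mean time to detect: intrusion start to detection, in minutes.
function meanTimeToDetect(incidents: readonly IncidentTimeline[]): number {
  return meanMinutes(incidents.map((i) => minutesBetween(i.startedAt, i.detectedAt)));
}

// Mean time to recover: detection to resolution, in minutes (one common definition).
function meanTimeToRecover(incidents: readonly IncidentTimeline[]): number {
  return meanMinutes(incidents.map((i) => minutesBetween(i.detectedAt, i.resolvedAt)));
}
```

As a worked example, take two incidents that both start at 00:00 and resolve at 05:00, detected at 01:00 and 03:00 respectively. MTTD is (60 + 180) / 2 = 120 minutes and MTTR is (240 + 120) / 2 = 180 minutes. Tracking the trend across postmortems matters more than any single value.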
6. Connect to automation (SOAR)
Once the process is mature, automate part of the path from detection to response with SOAR (Security Orchestration, Automation and Response). The principle is the same as for this article's Runbook: "notify broadly, contain narrowly, gate destruction through a human."
For example, a common pattern on AWS is to receive GuardDuty detections with EventBridge, automate the idempotent initial response (notification, isolation) with Lambda/Step Functions, and require human approval only for destructive operations. The concrete implementation is summarized in GuardDuty automated incident response (SOAR). Even when automating, don't remove the safety devices of idempotency and the approval gate. This is the iron rule.
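The "notify broadly, contain narrowly, gate destruction through a human" rule can be sketched as a pure planning function that a Lambda handler might call. The event interface below models only the fields used here (`source`, `detail-type`, `detail.severity`). The action names and the isolation threshold of 7 are assumptions for illustration, and no AWS SDK calls are made.

```typescript
// Minimal shape of an EventBridge event for a GuardDuty finding (only the fields used here).
interface GuardDutyFindingEvent {
  readonly source: string;        // "aws.guardduty" for GuardDuty events
  readonly "detail-type": string; // "GuardDuty Finding" for findings
  readonly detail: {
    readonly id: string;
    readonly type: string;
    readonly severity: number;
  };
}

// Hypothetical action names; "terminate-instance" stands in for any destructive operation.
interface PlannedAction {
  readonly kind: "notify" | "isolate" | "terminate-instance";
  readonly requiresApproval: boolean;
}

// Plan the response without side effects: always notify, isolate only at or above
// the threshold, and mark anything destructive as needing human approval.
function planResponse(
  event: GuardDutyFindingEvent,
  isolateAtOrAbove = 7, // assumed threshold; tune it to your own severity policy
): PlannedAction[] {
  if (event.source !== "aws.guardduty" || event["detail-type"] !== "GuardDuty Finding") {
    return []; // not a GuardDuty finding, so nothing to plan
  }
  const plan: PlannedAction[] = [{ kind: "notify", requiresApproval: false }];
  if (event.detail.severity >= isolateAtOrAbove) {
    plan.push({ kind: "isolate", requiresApproval: false });
    plan.push({ kind: "terminate-instance", requiresApproval: true });
  }
  return plan;
}
```

Separating planning from execution keeps the decision logic testable without AWS credentials. The executor can then refuse any action with `requiresApproval` set until a human signs off.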
7. FAQ
Q. Does even a small company need a CSIRT? A. A dedicated team isn't needed, but deciding "who does what during an incident" is essential. One person can hold multiple roles. What matters is creating a state where, in an emergency, you don't end up at "so, who decides?"
Q. Does knowledge of the old version (four phases) become useless? A. No. The practical flow of "preparation, detection, containment, recovery, lessons" itself is still valid. Rev.3 integrated it into CSF 2.0 risk management and repositioned it as a whole-organization endeavor. The thinking is continuous.
Q. When an incident occurs, what should I do first? A. (1) Calmly triage the severity, (2) decide on one commander, and (3) start keeping records. Before rushing to act, establishing "who commands and what is recorded" decides the quality of the initial response.
Q. What if a personal-data leak is suspected? A. In Japan, for leaks that meet the statutory criteria, the law requires reporting to the Personal Information Protection Commission (a preliminary report followed by a final report) and notifying the affected individuals. Involve legal and PR early, and consult JPCERT/CC and outside experts. The legal and external responses need to proceed in parallel with the technical response.
Q. How many Runbooks should I prepare? A. Start with high-probability scenarios (account compromise, phishing, ransomware, public leak). You don't need to have them all from the start. Building one, running it in a drill, and adding more while filling holes is realistic.
8. Summary
Incident response is mature preparation that stands on the premise that "a breach will certainly happen."
- Preparation is 90%. Set up the CSIRT roles, contact network, authority, Runbooks, and drills before it happens.
- Align with NIST's new framework (Rev.3). Incident response is part of CSF 2.0 risk management: a cycle of preparation (Govern, Identify, Protect), response (Detect, Respond, Recover), and lessons learned.
- Err on the serious side in triage. Define severity criteria in advance and, when unsure, treat the incident as the more serious case.
- Contain with an idempotent Runbook. Dry-run, the approval gate, and idempotency make the response fail safe even amid panic.
- Fix the mechanism with lessons learned. In a blameless postmortem, fix the design flaw, not the person, and feed the result back into the next round of preparation.
If you want hands-on support building an incident-response structure, preparing Runbooks, or designing tabletop exercises, or want to check before release whether you can really act when it counts, feel free to reach out. Like "the power to build fast," "preparation to respond correctly" is an investment that supports business continuity.
References (official primary sources)
- Incident response: NIST SP 800-61 Rev.3 / NIST Incident Response project
- Framework: NIST CSF 2.0
- Domestic: JPCERT/CC / METI Cybersecurity Management Guidelines
- Related: log design / detection engineering / GuardDuty automated incident response