"I installed GuardDuty, but can I notice if something weird runs inside a container?" — in a setting where I'm consulted about container-platform security, this is a question that hits the core.
The answer is "with the foundational GuardDuty alone, you can't notice." Agentless foundational detection (CloudTrail, VPC Flow Logs, DNS) sees AWS's control-plane operations and the outline of the network. But OS-level behavior that happens inside the workload — like "an unknown binary was executed inside a container," "/etc/shadow was tampered with," or "a reverse shell was set up" — isn't visible from the outside. What fills this in is GuardDuty Runtime Monitoring.
This article is an implementation guide for designing and operating Runtime Monitoring at production quality across three surfaces: EKS, ECS-Fargate, and EC2. GuardDuty's overall design (protection-plan selection, organizational control, EventBridge auto-response) is left to the pillar article; this piece digs one level deeper into that article's single line on "Runtime Monitoring," covering the distribution architecture, coverage confirmation, cost, and troubleshooting. I cross-implemented IAM, observability, and DR on a multi-account AWS serverless payment platform and have run production workloads on ECS on Fargate. With that experience as a foundation, I'll explain how to design and implement Runtime Monitoring in your environment.
The rules of this article: the specs, supported OS/architecture, kernel requirements, the agent's resource limits, and the pricing model are based on the AWS official documentation (as of June 2026). Since the supported OS/kernel/Kubernetes versions are updated frequently, always confirm the official "support matrix (verified platforms)" before going to production (in the body too I avoid enumerating specific versions and show the confirmation procedure). And one more thing — Runtime Monitoring is one feature (one layer) of GuardDuty, and doesn't substitute for WAF, least-privilege IAM, image scanning, or network isolation. Since the cost is proportional to protected vCPU, the basics of the design are not "put it in everything" but "narrow it to important workloads."
0. Mental model: Runtime Monitoring is "the eye that sees inside the workload"
Before starting the design, let's fix in one line what Runtime Monitoring complements about foundational detection.
Runtime Monitoring = GuardDuty's optional protection plan that places an eBPF security agent on EKS / ECS-Fargate / EC2 workloads, observes OS-level "process execution, file access, and network connections" from the inside, and generates findings.
The official definition goes like this — "Runtime Monitoring observes and analyzes operating system-level, networking, and file events", and the agent visualizes "file access, process execution, command line arguments, and network connections." From here, three consequences emerge.
- Foundational detection is "the outline," Runtime Monitoring is "the inside." The foundational VPC Flow Logs see "which IP it communicated with" from outside the network. Runtime Monitoring sees, from inside the host, down to "which process initiated that communication," "what the command-line arguments are," and "what the parent process is (process lineage)." For example, cryptocurrency mining can also be caught by domain with foundational DNS detection, but Runtime Monitoring captures the fact that "a miner binary was actually executed" as Impact:Runtime/CryptoMinerExecuted.
- To see, you need an "agent." Whereas foundational detection is agentless, Runtime Monitoring requires an eBPF-based security agent that runs on the workload. How to distribute this agent is the main subject of this article (the distribution differs between EKS / Fargate / EC2).
- "Enabled" ≠ "protected." Detection works only once the agent is correctly placed and runtime events reach GuardDuty via the VPC endpoint. What represents this "is it really arriving" is the coverage status (Healthy / Unhealthy). Leaving it Unhealthy creates a hole where you believe the feature is enabled but it produces no findings at all (chapter 4).
With these three points in hand, introducing Runtime Monitoring comes down to four design decisions: ① which surface, ② how to distribute (automatic or manual), ③ confirming that events really arrive, and ④ controlling the cost. Let's build them in order.
1. Why foundational detection alone isn't enough: what Runtime Monitoring adds
As seen in the pillar article, enabling GuardDuty turns the foundational data sources on immediately. So what does Runtime Monitoring specifically make visible additionally? Lining it up along the attack kill chain makes the complementary relationship clear.
| Attack stage | What foundational detection (agentless) sees | What Runtime Monitoring sees additionally |
|---|---|---|
| Initial intrusion | Access from a known malicious IP (VPC Flow) | Suspicious shell spawning inside a container, new-binary execution |
| Execution | (hard to see) | Execution:Runtime/NewBinaryExecuted, ReverseShell, SuspiciousTool |
| Privilege escalation | Anomalies in IAM operations (CloudTrail) | Docker socket access, runc container escape, elevation to root |
| Defense evasion | (hard to see) | process injection, fileless execution, kernel module loading |
| Persistence | (hard to see) | modification of sensitive files (Persistence:Runtime/SensitiveFileModified) |
| C&C / exfiltration | DNS queries to a malicious domain | which process initiated that communication (with process lineage) |
The key point: foundational detection captures "signs observable from outside," while Runtime Monitoring captures "behavior observable only inside the host." The typical scenario the official documentation cites goes like this: a single container running a vulnerable web app is compromised, and using misconfigured credentials as a foothold, access spreads to the entire account. This chain of "container compromise → privilege escalation → data access" can't be caught at an early stage unless you look inside the container.
And one more decisive thing is the relationship with Extended Threat Detection (attack sequences). Details are left to the attack-sequence article, but just the conclusion:
- The ECS attack sequence (AttackSequence:ECS/CompromisedCluster) presupposes Runtime Monitoring on Fargate-ECS or EC2.
- The EC2 attack sequence (AttackSequence:EC2/CompromisedInstanceGroup) is strengthened by Runtime Monitoring.
In other words, if you "want to bundle ECS/EC2 multi-stage attacks into one Critical finding," Runtime Monitoring becomes a prerequisite, not an optional plan. This is the biggest reason "it's worth putting in for important workloads."
2. The three distribution surfaces: how the agent arrives
The essential difficulty of Runtime Monitoring is that the same "security agent" is distributed in completely different ways per resource type. The official documentation collectively calls EKS, Fargate-ECS, and EC2 "resource types."
| Resource type | Agent distribution form | Automatic management | Manual management |
|---|---|---|---|
| Amazon EKS | Managed add-on aws-guardduty-agent (placed on each node as a DaemonSet) | Possible (GuardDuty deploys/updates the add-on) | Possible (manage the add-on's lifecycle/version yourself) |
| ECS (AWS Fargate) | Inject a GuardDuty-managed sidecar container into the task | Possible | Not possible (automatic management only; no manual option) |
| Amazon EC2 | Introduce/update the agent on the host via SSM (AWS Systems Manager) | Possible (automatic deployment via SSM) | Possible (install/update yourself) |
Understanding this accurately is the foundation of the design judgments. Let's look at each of the three surfaces' "sweet spots."
2.1 EKS: the managed add-on aws-guardduty-agent
On EKS, the agent is distributed as the EKS managed add-on aws-guardduty-agent. Its substance is a DaemonSet within the cluster (1 Pod per node).
- Automatic management: GuardDuty deploys/updates the add-on as needed. "GuardDuty manages the security agent (EKS add-on) on your behalf, it updates the add-on, as needed". Default values are set for the configuration parameters (CPU/memory, PriorityClass, dnsPolicy), but you can override them yourself if needed.
- Manual management: you hold the add-on's version and lifecycle yourself. Choose this when you want to strictly pin the version or put it on the cluster's change-management process. In the manual case, you need to confirm yourself whether the Kubernetes version supports the agent version (there's an official correspondence table, described later).
EKS support scope (important): Runtime Monitoring supports EKS clusters on EC2 nodes and EKS Auto Mode. On the other hand, EKS Hybrid Nodes and EKS clusters on AWS Fargate are unsupported ("GuardDuty doesn't support Amazon EKS clusters running on AWS Fargate"). If you "run EKS on Fargate," you can't protect with this feature — always confirm at design time.
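To confirm that the add-on actually landed on a cluster, a check along the following lines can help. This is a minimal sketch that assumes a configured AWS CLI; the function names and the cluster name passed to them are placeholders of mine, not part of any official tooling.

```shell
# Print the lifecycle status of the aws-guardduty-agent add-on on one cluster.
# The EKS API reports values such as ACTIVE, CREATING, or DEGRADED.
eks_guardduty_addon_status() {
  local cluster_name="$1"
  aws eks describe-addon \
    --cluster-name "$cluster_name" \
    --addon-name aws-guardduty-agent \
    --query 'addon.status' \
    --output text
}

# Succeeds only when the add-on reports ACTIVE.
eks_guardduty_addon_is_active() {
  [ "$(eks_guardduty_addon_status "$1")" = "ACTIVE" ]
}
```

Usage might look like `eks_guardduty_addon_is_active my-cluster && echo "add-on active"`. An ACTIVE add-on does not by itself prove that events arrive, so the coverage status in chapter 4 remains the final check.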
2.2 ECS-Fargate: the injected sidecar
On Fargate, since you can't touch the host OS, the agent is injected into the task as a sidecar container. Here's the most important constraint — Fargate (ECS) is automatic-management-only and has no manual-management option (the official also clearly states "with an exception to Fargate (Amazon ECS only)"). GuardDuty fully handles placement and updates.
If you run containers in production on Fargate, Runtime Monitoring is the surface with the lightest introduction. No host patching or SSM management is needed, and once enabled, GuardDuty takes care of the sidecar. In return, you have no room to control the version or placement yourself — this is a trade-off consistent with Fargate's "serverless, leave it to us" nature.
2.3 EC2: the agent via SSM
On EC2, the agent is distributed through SSM (AWS Systems Manager). "GuardDuty uses AWS Systems Manager (SSM) to automatically deploy, install, and manage the security agent on your instances".
- Premise of automatic management: the instance is under SSM management (a state shown in Fleet Manager). If you use automatic management, this is a hard requirement.
- Manual management: "If you plan to manually install and manage the GuardDuty agent, SSM is not required". In an environment that doesn't/can't use SSM (a special base image, etc.), choose manual.
- Exclusion tag: if you want to exclude a specific instance under automatic management, attach the GuardDutyManaged:false tag before launch and enable access to instance tags in the instance metadata (IMDS). This suits the "put it in everything but exclude only some" operation.
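The two EC2 premises above (SSM management for automatic deployment, and the exclusion tag with instance metadata tags) can be sketched with the AWS CLI as follows. The function names, AMI ID, and instance type are illustrative assumptions; a real launch would also carry your subnet, security group, and IAM profile settings.

```shell
# Report the SSM ping status of one instance.
# "Online" indicates the instance is registered with SSM and reachable.
ssm_ping_status() {
  local instance_id="$1"
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=${instance_id}" \
    --query 'InstanceInformationList[0].PingStatus' \
    --output text
}

# Launch an instance that automatic agent management should skip:
# the exclusion tag is applied at launch, and instance tags are exposed via IMDS.
launch_excluded_instance() {
  local ami_id="$1" instance_type="$2"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=GuardDutyManaged,Value=false}]' \
    --metadata-options 'InstanceMetadataTags=enabled'
}
```

Applying the tag at launch rather than afterwards keeps the instance from being picked up between boot and tagging.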
3. Enable it with Terraform: RUNTIME_MONITORING and the three toggles
Once the design is firm, drop it into code. Runtime Monitoring is added with the aws_guardduty_detector_feature resource, separate from the aws_guardduty_detector body.
# Enable Runtime Monitoring and let GuardDuty manage agent placement on each surface.
# Cost grows with protected vCPU, so enable it only where runtime visibility is really needed.
resource "aws_guardduty_detector_feature" "runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "RUNTIME_MONITORING"
  status      = "ENABLED"
  # --- Per-surface switch: should GuardDuty manage the agent automatically? ---
  # ENABLED: GuardDuty handles agent placement and updates on that surface (automatic management).
  # DISABLED: Runtime Monitoring itself stays enabled, but the agent on that surface
  # is assumed to be manually managed (you run the add-on / SSM yourself).
  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT" # EKS: auto-deploy and update the aws-guardduty-agent add-on
    status = "ENABLED"
  }
  additional_configuration {
    name   = "ECS_FARGATE_AGENT_MANAGEMENT" # Fargate: auto-inject the sidecar (no manual option)
    status = "ENABLED"
  }
  additional_configuration {
    name   = "EC2_AGENT_MANAGEMENT" # EC2: manage the agent automatically via SSM
    status = "ENABLED"
  }
}
Three points of the design.
- The toggles are "per-surface automatic-management switches." After setting RUNTIME_MONITORING to ENABLED, switch EKS_ADDON_MANAGEMENT / ECS_FARGATE_AGENT_MANAGEMENT / EC2_AGENT_MANAGEMENT individually. If you want to make only EKS manual, set EKS_ADDON_MANAGEMENT to DISABLED and manage the add-on separately (next point).
- The EKS add-on can be made explicit as a separate resource. With automatic management (EKS_ADDON_MANAGEMENT = ENABLED), GuardDuty installs the add-on, but if you manage it manually, declare the EKS add-on explicitly in Terraform.
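# [Only if you chose manual management] Manage the EKS add-on yourself.
# Useful when you want to pin the version and put it on the cluster's change-management process.
# With automatic management (EKS_ADDON_MANAGEMENT = ENABLED), this resource is unnecessary.
resource "aws_eks_addon" "guardduty_agent" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "aws-guardduty-agent"
  # Pin the version only after confirming in the official correspondence table that
  # the Kubernetes version in use supports that agent version.
  # Manage it via a variable or data source rather than hard-coding (the table is updated).
  addon_version = var.guardduty_agent_addon_version
  # Behavior on conflict with existing settings. Use PRESERVE if you override parameters (CPU/memory, etc.) manually.
  resolve_conflicts_on_update = "PRESERVE"
}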
- If you enable it in bulk across the organization, you can roll out RUNTIME_MONITORING org-wide with the same structure as the pillar article's aws_guardduty_organization_configuration_feature. But since it's a cost driver, estimate the cost before defaulting it ON org-wide (chapter 6).
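After an apply, it can be worth reading the toggles back from the detector rather than trusting the plan output. The sketch below assumes a configured AWS CLI and that `get-detector` returns the feature list with per-feature additional configuration; the function names are hypothetical.

```shell
# List the per-surface toggles of RUNTIME_MONITORING as "NAME<TAB>STATUS" lines.
runtime_monitoring_toggles() {
  local detector_id="$1"
  aws guardduty get-detector \
    --detector-id "$detector_id" \
    --query "Features[?Name=='RUNTIME_MONITORING'].AdditionalConfiguration[].[Name,Status]" \
    --output text
}

# Succeeds when the named toggle is reported as ENABLED.
toggle_is_enabled() {
  local detector_id="$1" toggle="$2"
  runtime_monitoring_toggles "$detector_id" |
    awk -v t="$toggle" '$1 == t && $2 == "ENABLED" { found = 1 } END { exit !found }'
}
```

For example, `toggle_is_enabled "$DETECTOR_ID" EKS_ADDON_MANAGEMENT || echo "EKS is on manual management"` makes the automatic/manual split visible in a deploy log.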
The relationship with EKS_RUNTIME_MONITORING (needs confirmation): historically, there was an independent feature called EKS_RUNTIME_MONITORING, dedicated to EKS. Currently, consolidation into RUNTIME_MONITORING (the unified feature that bundles EKS/ECS/EC2) is progressing. For new builds, use RUNTIME_MONITORING. Since the migration path and deadline for existing environments using EKS_RUNTIME_MONITORING may be revised, always confirm the current treatment in the official documentation (at the time of writing, the consolidation direction is confirmed; the specific deprecation schedule follows the official documentation).
4. Confirm coverage: plug the "thought-I-enabled-it" hole
This is the point where accidents most often occur in Runtime Monitoring operation. Just "enabled it in Terraform" is insufficient, and you must separately confirm whether the agent is really delivering events. What represents this is the coverage status.
4.1 What Healthy / Unhealthy means
The official definition is clear. The coverage status becomes Healthy only when three conditions are all in place.
Coverage status is determined by making sure that you have enabled Runtime Monitoring, your Amazon VPC endpoint has been created, and the GuardDuty security agent for the corresponding resource has been deployed.
| Status | Meaning | Consequence |
|---|---|---|
| Healthy | The three points are in place — ① Runtime Monitoring enabled ② VPC endpoint created ③ agent placed — and runtime events reach GuardDuty | findings can be generated normally |
| Unhealthy | There's a problem in any of the above (config, VPC endpoint, agent placement) | GuardDuty can't receive events and generates no Runtime Monitoring findings at all |
What's decisive is — while Unhealthy, findings are zero. Whether "I enabled it but no alerts come" is because "it's peaceful" or because "I'm blind with Unhealthy" can't be distinguished unless you look at the coverage. So always make "enabling" and "confirming coverage Healthy" a set.
4.2 Confirm coverage with bash
GuardDuty has APIs that return coverage statistics and the status of individual resources. You can incorporate them into a CI or post-deploy verification script.
#!/usr/bin/env bash
# Check Runtime Monitoring coverage and list the Unhealthy resources.
# Use it for post-deploy verification or periodic audits to close the "thought-I-enabled-it" hole mechanically.
set -euo pipefail
DETECTOR_ID="${1:?Usage: check-coverage.sh <detector-id>}"
# (1) Get the coverage statistics (counts of Healthy / Unhealthy).
# This gives the overall picture at a glance.
echo "=== Coverage statistics by coverage status ==="
aws guardduty get-coverage-statistics \
  --detector-id "$DETECTOR_ID" \
  --statistics-type COUNT_BY_COVERAGE_STATUS \
  --output table
# (2) List only the Unhealthy resources (the individual targets to investigate).
# The filter narrows the result to coverage status = UNHEALTHY.
echo "=== Unhealthy resources (need troubleshooting) ==="
aws guardduty list-coverage \
  --detector-id "$DETECTOR_ID" \
  --filter-criteria '{
    "FilterCriterion": [
      { "CriterionKey": "COVERAGE_STATUS", "FilterCondition": { "Equals": ["UNHEALTHY"] } }
    ]
  }' \
  --query 'Resources[].{Id:ResourceId,Type:ResourceDetails.ResourceType,Status:CoverageStatus,Issue:Issue}' \
  --output table
Build the verification path first: I recommend placing this script in a post-deploy hook in CI. Run it right after turning RUNTIME_MONITORING ON in Terraform, and don't consider the deploy "complete" until you confirm the important workloads are Healthy. This is the most effective guard for structurally eliminating the "thought-I-enabled-it" hole.
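A listing only helps if someone reads it, so a CI step usually needs a pass/fail signal as well. The following is a minimal sketch of such a gate, assuming a configured AWS CLI; the function names are mine, and the gate only counts Unhealthy resources.

```shell
# Count resources whose coverage status is UNHEALTHY.
# JSON output is used so that length() is evaluated over the whole result.
unhealthy_count() {
  local detector_id="$1"
  aws guardduty list-coverage \
    --detector-id "$detector_id" \
    --filter-criteria '{"FilterCriterion":[{"CriterionKey":"COVERAGE_STATUS","FilterCondition":{"Equals":["UNHEALTHY"]}}]}' \
    --query 'length(Resources)' \
    --output json
}

# Return non-zero (failing the pipeline step) when anything is Unhealthy.
coverage_gate() {
  local count
  count="$(unhealthy_count "$1")"
  if [ "$count" -gt 0 ]; then
    echo "coverage gate failed: ${count} Unhealthy resource(s)" >&2
    return 1
  fi
  echo "coverage gate passed"
}
```

Note the limit of this design: zero Unhealthy resources is not the same as "the important workloads are Healthy." A workload that never registered an agent may not appear in the list at all, so a stricter gate would also assert that the expected resource IDs are present with a Healthy status.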
4.3 Typical causes of Unhealthy and troubleshooting
The cause of Unhealthy is that somewhere in the three conditions is missing. Triage per surface.
| Surface | Common Unhealthy cause | Direction of action |
|---|---|---|
| Common | VPC endpoint not created / unreachable (in manual management you need to create it yourself) | Confirm the endpoint's existence and SG/routes |
| EC2 | The instance is not under SSM management (doesn't meet the premise of automatic management) | Confirm the SSM Agent's introduction and IAM role |
| EC2 | The OS/kernel is out of support, or the CONFIG_DEBUG_INFO_BTF flag is unset | Confirm the official verified platforms |
| EKS | The add-on aws-guardduty-agent is not deployed / the K8s version is unsupported | Confirm the add-on's status and the supported-version table |
| EKS | The agent Pod has reached the CPU/memory limit | Adjust the add-on's resource settings (chapter 5) |
| Organization | An SCP denies guardduty:SendSecurityTelemetry | Confirm the SCP's permission boundary |
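For the kernel-flag row in the table, a quick check on the host can rule the cause in or out before digging further. This sketch assumes the kernel config is available as a plain file; the `/boot/config-<release>` path is a common distribution convention rather than a guarantee, and the function names are hypothetical.

```shell
# Succeeds when the given kernel config file enables BTF debug info.
kernel_config_has_btf() {
  local config_file="$1"
  grep -q '^CONFIG_DEBUG_INFO_BTF=y$' "$config_file"
}

# Check the running kernel using the conventional config path.
running_kernel_has_btf() {
  kernel_config_has_btf "/boot/config-$(uname -r)"
}
```

On a host, `running_kernel_has_btf || echo "BTF flag not set or config not found"` gives a first answer; the official verified-platforms list is still the authority on what is supported.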
If you chose manual management, note that creating the VPC endpoint is your responsibility (with automatic management GuardDuty creates it). "Install agent manually, which requires you to create the VPC endpoint as a prerequisite." — forgetting this lands you in the typical pitfall where, even though the agent is running, events don't arrive and it stays Unhealthy.
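Whether the endpoint exists in a given VPC can be checked directly. The sketch below assumes the endpoint uses the service name `com.amazonaws.<region>.guardduty-data` and that the AWS CLI is configured; the VPC ID, region, and function names are placeholders.

```shell
# Print the state of the GuardDuty data endpoint in a VPC (empty output means none found).
guardduty_endpoint_state() {
  local vpc_id="$1" region="$2"
  aws ec2 describe-vpc-endpoints \
    --region "$region" \
    --filters "Name=vpc-id,Values=${vpc_id}" \
              "Name=service-name,Values=com.amazonaws.${region}.guardduty-data" \
    --query 'VpcEndpoints[].State' \
    --output text
}

# Succeeds when at least one matching endpoint exists.
guardduty_endpoint_exists() {
  [ -n "$(guardduty_endpoint_state "$1" "$2")" ]
}
```

An existing endpoint can still be unreachable, so security groups and routes need a separate look when the status stays Unhealthy.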
5. Is the agent "lightweight": pin down the CPU/memory limits with numbers
When you hear "put a security agent into production," what first concerns you is the load on the workload. The basis for Runtime Monitoring's agent being called "lightweight" is that explicit limits are set. Let's pin it down with official numbers, not guesses.
| Surface | CPU limit | Memory limit | Note |
|---|---|---|---|
| EC2 | up to 10% of total vCPU cores | follows the official table | with 4 vCPU, up to 40% out of the available 400% |
| EKS (add-on aws-guardduty-agent) | 200m–1000m (0.2–1.0 vCPU) | 256Mi–1024Mi | the values are configurable from add-on v1.5.0 onward |
| ECS-Fargate | follows the sidecar's resource allocation | same as left | monitor the measured values with Container Insights |
Three points.
- EC2 is capped by "ratio." "The maximum CPU limit for the GuardDuty security agent associated with Amazon EC2 instances is 10 percent of the total vCPU cores." The larger the instance, the larger the absolute amount, but since the ratio is constant (10%), it's a design that doesn't easily crowd the workload.
- EKS is capped by "absolute value." The add-on's defaults are CPU 200m–1000m, memory 256Mi–1024Mi. From v1.5.0 onward, CPU / Memory / PriorityClass / dnsPolicy are configurable, and you can adjust them to the workload and instance size. If Container Insights shows the agent is reaching the limit, raise these.
- Monitor with measurement. The official documentation recommends monitoring CPU/memory consumption with Container Insights (both ECS and EKS). Don't take "lightweight" on faith; measure it. This is the production discipline.
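The EC2 ratio is easy to misread, so here is the arithmetic as a tiny helper. It works on the scale where one vCPU equals 100%, which is how the 4 vCPU example is expressed; the function name is mine.

```shell
# EC2 agent CPU ceiling, as a percentage on the scale where one vCPU = 100%.
# 10% of the total: 4 vCPUs (400% in total) give a ceiling of 40.
ec2_agent_cpu_ceiling_percent() {
  local vcpus="$1"
  echo $(( vcpus * 10 ))
}
```

So a 16 vCPU instance has a ceiling of 160 on that scale, which is still the same 10% share of the machine.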
A note on PriorityClass: the EKS add-on's default PriorityClass is a setting that "doesn't give the agent Pod special treatment based on priority." When the node is under resource pressure and the agent Pod is evicted first, the coverage drops to Unhealthy at that moment. For a cluster where you don't want security visibility to break, consider changing it to system-node-critical or similar, weighing the trade-off (the eviction risk of important workloads vs. the continuity of monitoring).
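Before overriding any add-on parameter, it helps to ask EKS which configuration keys a given add-on version accepts instead of guessing them. This sketch uses `aws eks describe-addon-configuration`; the function names are hypothetical, and the key you search for (for example a priority-class field) is an assumption to verify against the returned schema.

```shell
# Print the JSON schema of configuration values accepted by an add-on version.
guardduty_addon_config_schema() {
  local addon_version="$1"
  aws eks describe-addon-configuration \
    --addon-name aws-guardduty-agent \
    --addon-version "$addon_version" \
    --query 'configurationSchema' \
    --output text
}

# Succeeds when the schema text mentions the given key name.
addon_schema_mentions() {
  local addon_version="$1" key="$2"
  guardduty_addon_config_schema "$addon_version" | grep -q "$key"
}
```

Reading the schema first avoids applying configuration values that the pinned add-on version does not accept.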
6. Cost discipline: the biggest driver, proportional to protected vCPU
This is the core of decision-making in Runtime Monitoring operation. Runtime Monitoring is billed in proportion to protected vCPU, and among GuardDuty's protection plans it is the one most likely to become the largest cost. GuardDuty's overall cost optimization is left to the dedicated article, but here are three points specific to Runtime Monitoring.
6.1 Billing is "the vCPU scale of the protected workload × running time"
The official billing unit is based on "the number of vCPUs of the provisioned instances/tasks protected as monitoring targets × running time." In other words, the larger the workload's vCPU count and the longer you protect it, the higher the bill. The nature of the billing differs from foundational detection (event volume, GB billing), and it's directly tied to "the scale of the assets." So "just put it in all EC2 and all clusters" is the shortest route to a snowballing vCPU bill.
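The billing relationship above can be turned into a rough estimator. This is a sketch only: the rate is a caller-supplied placeholder, not an AWS price, and any tiering in the official price list is ignored, so take the real figure from the pricing page.

```shell
# Gross estimate = protected vCPUs x running hours x rate per vCPU-hour.
# The rate is a placeholder supplied by the caller, not an AWS price.
runtime_monitoring_estimate() {
  local vcpus="$1" hours="$2" rate_per_vcpu_hour="$3"
  awk -v v="$vcpus" -v h="$hours" -v r="$rate_per_vcpu_hour" \
    'BEGIN { printf "%.2f\n", v * h * r }'
}
```

With a hypothetical rate of 0.002 per vCPU-hour, `runtime_monitoring_estimate 8 730 0.002` prints 11.68. Doubling either the vCPU count or the hours doubles the result, which is exactly why scope matters.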
6.2 So "narrow it to important workloads"
The basics of the design are "narrow the scope by importance." The phased introduction I recommend:
- Start from the surface that needs attack sequences: first put it in production important workloads where you want to get ECS/EC2 attack sequences (Critical findings).
- Surfaces handling sensitive data: workloads with a large impact when compromised, like payments, personal information, and authentication infrastructure.
- Forgo the rest, or explicitly exclude it with an EC2 exclusion tag.
Rather than "put it all in for peace of mind," "concentrate investment in high-impact assets and maximize cost-effectiveness including the ETD correlation effect" wins with a limited security budget.
6.3 The "VPC Flow Logs double-charge exemption" worth knowing
In discussing Runtime Monitoring's cost, this is an easily-overlooked relief. The official text:
When you manage the security agent ... and GuardDuty is presently deployed on an Amazon EC2 instance and receives the Collected runtime event types from this instance, GuardDuty will not charge your AWS account for the analysis of VPC flow logs from this Amazon EC2 instance. This helps GuardDuty avoid double usage cost in the account.
In other words, if the agent runs on an EC2 instance and GuardDuty receives that instance's runtime events, GuardDuty doesn't charge for that instance's VPC Flow Logs analysis. Since Runtime Monitoring sees the network connections via the agent, the design avoids charging twice for the same instance. Runtime Monitoring's vCPU billing is therefore partly offset by the reduction in foundational Flow Logs billing. Incorporate this exemption in the cost estimate: if you estimate Runtime Monitoring as a pure add-on without it, the estimate comes out overstated.
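In an estimate, the exemption shows up as a subtraction. The helper below is a sketch in which both inputs are your own figures: the Runtime Monitoring charge for the covered EC2 instances, and the Flow Logs analysis charge those same instances would otherwise have incurred.

```shell
# Net added cost = Runtime Monitoring charge minus the Flow Logs analysis charge
# that no longer applies to agent-covered EC2 instances. Both inputs are your own estimates.
net_runtime_monitoring_cost() {
  local runtime_cost="$1" exempt_flow_logs_cost="$2"
  awk -v r="$runtime_cost" -v f="$exempt_flow_logs_cost" \
    'BEGIN { printf "%.2f\n", r - f }'
}
```

For example, `net_runtime_monitoring_cost 100.00 12.50` prints 87.50. The exemption described above concerns EC2 instances, so leave Fargate and EKS charges out of the second argument.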
The 30-day free trial for new regions: Runtime Monitoring also comes with a 30-day free trial. You can grasp the expected bill at production volume before billing starts. The judgment to narrow to important workloads gains further precision with the measurement during this trial period.
7. Reading findings: Runtime Monitoring's type families
Runtime Monitoring findings can be identified at a glance because their resource segment is Runtime (ThreatPurpose:Runtime/ThreatName). They form a separate lineage from the foundational EC2, S3, IAMUser, and other types. Let me organize the major families per attack stage.
| Family (ThreatPurpose) | Representative finding types | What it captures |
|---|---|---|
| Execution | Execution:Runtime/NewBinaryExecuted, ReverseShell, SuspiciousTool, NewLibraryLoaded | execution of a new binary/library, reverse shell, suspicious tool |
| PrivilegeEscalation | PrivilegeEscalation:Runtime/DockerSocketAccessed, RuncContainerEscape, ElevationToRoot, ContainerMountsHostDirectory | Docker socket access, container escape, root elevation, host-directory mount |
| DefenseEvasion | DefenseEvasion:Runtime/ProcessInjection.*, FilelessExecution, KernelModuleLoaded, PtraceAntiDebugging | process injection, fileless execution, kernel module loading |
| Persistence | Persistence:Runtime/SensitiveFileModified, SuspiciousCommand | modification of sensitive files, a command aiming for persistence |
| Impact | Impact:Runtime/CryptoMinerExecuted, MaliciousDomainRequest.Reputation, etc. | execution of a cryptocurrency miner, communication to a malicious domain |
| CryptoCurrency | CryptoCurrency:Runtime/BitcoinTool.B (has a !DNS derivative) | inquiries to cryptocurrency-related IPs/domains |
| Backdoor | Backdoor:Runtime/C&CActivity.B (has a !DNS derivative) | communication with a C&C server |
| Trojan | Trojan:Runtime/BlackholeTraffic, DropPoint, DGADomainRequest.C!DNS, etc. | typical trojan communication patterns |
| UnauthorizedAccess | UnauthorizedAccess:Runtime/TorRelay, TorClient, MetadataDNSRebind | Tor relay/client, metadata DNS rebind |
| Discovery | Discovery:Runtime/SuspiciousCommand | a command aiming for reconnaissance |
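Because every type in the table follows the same ThreatPurpose:Runtime/ThreatName shape, a handler can classify findings with plain string matching. The function names below are hypothetical; this is a sketch of the idea rather than a complete parser.

```shell
# Succeeds when a finding type has Runtime as its resource segment.
is_runtime_finding() {
  case "$1" in
    *:Runtime/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the ThreatPurpose part (the text before the first colon).
finding_threat_purpose() {
  printf '%s\n' "${1%%:*}"
}
```

A handler could call `is_runtime_finding "$type"` first and then branch on `finding_threat_purpose "$type"` to pick a playbook per family.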
Two implementation points.
- You can route by the Runtime segment. With EventBridge auto-response (chapter 6 of the pillar article), for findings whose type matches *:Runtime/*, you can treat them as "a compromise inside the container/host" and consider a response that goes beyond network isolation, up to stopping or isolating the relevant Pod/task. Since the finding includes process-lineage information, investigation is fast.
- Sanitizing finding fields is mandatory. The official documentation states an important caveat: because Runtime Monitoring findings include file paths and other values that an attacker can control, "When processing Runtime Monitoring findings outside of GuardDuty console, you must sanitize finding fields. For example, you can HTML encode finding fields." When you display or forward findings on the web in auto-response or notifications, always pass them through sanitization such as HTML escaping (treat them as untrusted input). This matches the "validate/escape external input" rule in my own conventions.
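The HTML-encoding step mentioned in the caveat can be as small as the following. It is a minimal sketch using sed; the function name is mine, and a notification pipeline written in another language would normally use that language's own escaping routine instead.

```shell
# HTML-encode text so attacker-influenced finding fields render as inert text.
# The ampersand is replaced first so the entities added afterwards are not re-encoded.
html_escape() {
  printf '%s' "$1" | sed \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' \
    -e "s/'/\&#39;/g"
}
```

A file path such as `/tmp/<script>alert(1)</script>` in a finding would then reach a web page or chat message as visible text rather than markup.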
8. Decision table: on which surface, automatic or manual, to put it in
Let me drop the above into practical judgment. "On which workload, on which surface, automatic or manual" in one sheet.
| Your situation | Should you put it in | Surface | Automatic / manual | Reason |
|---|---|---|---|---|
| Running production APIs/workers on Fargate | Put it in (the lightest introduction) | ECS-Fargate | automatic only | GuardDuty fully manages sidecar injection. ECS attack sequence is also unlocked |
| Operating a production cluster on EKS (EC2 nodes) | Put it in | EKS | automatic in principle | GuardDuty updates the add-on. Low operational load |
| Want to strictly pin the version / put it on change management on EKS | Put it in | EKS | manual | Hold the add-on's lifecycle yourself. Confirming the K8s correspondence table is self-responsibility |
| Running sensitive workloads on EC2, already SSM-managed | Put it in | EC2 | automatic | Since SSM exists, automatic deployment is straightforward |
| EC2 but can't/won't use SSM | Put it in | EC2 | manual | SSM-independent. Creating the VPC endpoint is self-responsibility |
| Running EKS on Fargate | Can't put it in | — | — | Unsupported (EKS on Fargate / Hybrid Nodes are out of support) |
| ECS on EC2 nodes (not Fargate) | Can put it in (Runtime detection is possible) | EC2 | automatic/manual | Confirm ECS attack-sequence support for this surface in the official documentation (chapter 1 gives Runtime Monitoring on Fargate-ECS or EC2 as its premise) |
| Small-scale workloads for dev/verification | Forgo in principle | — | — | The detection value worth the vCPU billing is thin. Explicitly exclude with an exclusion tag |
There are only two axes for the decision: ① is that workload of an importance worth "runtime visibility / attack sequences" (= is it worth paying the cost), and ② do you need to hold version management yourself (= automatic vs. manual). With these two axes, it falls into the table above.
9. Summary: GuardDuty Runtime Monitoring production cheat sheet
A quick reference for when you're lost.
- What layer it is: Runtime Monitoring is an optional protection plan that observes the inside of the workload (OS-level processes, files, networking) with an eBPF security agent. It complements agentless foundational detection (the outline).
- The three distribution surfaces: EKS = the managed add-on aws-guardduty-agent (DaemonSet) / ECS-Fargate = the injected sidecar (automatic only) / EC2 = via SSM. Understanding that the distribution differs per surface is the first step.
- Automatic vs. manual: when in doubt, automatic (low operational load). Manual is only for when you want to pin the version or put it on change management. Fargate has no manual option. With manual, creating the VPC endpoint is your responsibility.
- Terraform: aws_guardduty_detector_feature's RUNTIME_MONITORING + the three toggles (EKS_ADDON_MANAGEMENT / ECS_FARGATE_AGENT_MANAGEMENT / EC2_AGENT_MANAGEMENT). For new builds use RUNTIME_MONITORING (EKS_RUNTIME_MONITORING is being consolidated into it).
- Coverage confirmation (mandatory): enabled ≠ protected. Healthy = the state where all three points are in place: Runtime Monitoring enabled + VPC endpoint + agent placement. With Unhealthy, findings are zero. Confirm mechanically with get-coverage-statistics / list-coverage and incorporate the check into CI.
- Lightness in numbers: EC2 is up to 10% of total vCPU, the EKS add-on is CPU 200m–1000m, memory 256Mi–1024Mi (configurable from v1.5.0 onward). Measure with Container Insights.
- Cost discipline: GuardDuty's biggest cost driver, proportional to protected vCPU. Narrow to important workloads. However, the VPC Flow Logs analysis cost for EC2 instances where the agent runs is exempted (no double charge), so incorporate it into the estimate. Measure with the 30-day free trial.
- The trap of support scope: EKS on Fargate / EKS Hybrid Nodes are unsupported. For ECS on EC2 nodes, confirm ECS attack-sequence support in the official documentation (the stated premise is Runtime Monitoring on Fargate-ECS or EC2). Always confirm before designing.
- The support matrix: the supported OS/kernel/K8s versions are updated frequently. Don't hard-code; confirm the official verified platforms. Don't forget kernel requirements like CONFIG_DEBUG_INFO_BTF=y.
Runtime Monitoring is not "a box that, once enabled, sees inside for you," but a feature that produces value only once you've fully designed '① on which surface, ② how to distribute, ③ confirm coverage, and ④ control cost.' The biggest leverage lies in the discipline of "narrow to important workloads, guarantee coverage Healthy in CI, and optimize cost including the Flow Logs exemption."
I cross-implemented IAM, observability, and DR on a multi-account serverless payment platform, and have run production workloads on ECS on Fargate. I design the introduction of Runtime Monitoring with the same philosophy — ① discern the important workloads and narrow the scope, ② choose automatic/manual by requirements, ③ incorporate coverage Healthy into the verification path, and ④ read the cost incorporating vCPU billing and the Flow Logs exemption. Structurally crush the "thought-I-enabled-it" hole, and fully put detection into operation.
"How to design Runtime Monitoring for your EKS / Fargate / EC2, how much to leave to automatic management, and how to hold down cost" — from selecting the distribution surface to Terraform implementation, coverage verification, cost estimation, and attack-sequence integration, I can accompany you fast and safely with one person × generative AI (Claude Code). Even from the requirement-organizing stage, please feel free to consult me.
References (official documentation)
- GuardDuty Runtime Monitoring: Runtime Monitoring's definition, supported resources, the security agent's observation targets
- How Runtime Monitoring works: the two stages of enablement + agent management, VPC endpoint, the VPC Flow Logs double-charge exemption
- Managing GuardDuty security agents: automatic/manual management, per-surface agent management, the premise of Healthy/Unhealthy
- Reviewing runtime coverage statistics and troubleshooting: definition of the coverage status (Healthy = the three conditions met / Unhealthy = zero findings)
- Prerequisites for Amazon EC2 instance support: SSM management, supported architectures, the EC2 CPU limit (10% of total vCPU), kernel requirements
- Prerequisites for Amazon EKS cluster support: verified platforms (OS/kernel/K8s correspondence table), the EKS add-on's CPU/memory limits, Fargate-EKS / Hybrid Nodes unsupported
- Configure GuardDuty security agent (add-on) parameters for Amazon EKS: v1.5.0+'s CPU/memory, PriorityClass, dnsPolicy settings
- GuardDuty Runtime Monitoring finding types: the list of *:Runtime/* finding types and the sanitization caveat
- Amazon GuardDuty pricing: Runtime Monitoring's protected-vCPU-based billing, the 30-day free trial