Running GuardDuty Runtime Monitoring in production on EKS / ECS-Fargate / EC2: security agent, coverage, cost, troubleshooting

"I installed GuardDuty, but can I notice if something weird runs inside a container?" — in a setting where I'm consulted about container-platform security, this is a question that hits the core.

The short answer: with foundational GuardDuty alone, you would not. Agentless foundational detection (CloudTrail, VPC Flow Logs, DNS) sees AWS control-plane operations and the outline of network traffic. OS-level behavior inside the workload, such as "an unknown binary was executed inside a container," "/etc/shadow was tampered with," or "a reverse shell was opened," is not visible from the outside. GuardDuty Runtime Monitoring fills that gap.

This article is an implementation guide for designing and operating Runtime Monitoring at production quality across three surfaces: EKS, ECS-Fargate, and EC2. GuardDuty's overall design (protection-plan selection, organizational control, EventBridge auto-response) is covered in the pillar article; this piece takes that article's single line on "Runtime Monitoring" one level deeper, down to the distribution architecture, coverage confirmation, cost, and troubleshooting. I have implemented IAM, observability, and DR across a multi-account AWS serverless payment platform and have run production workloads on ECS on Fargate. Building on that experience, I'll walk through how to design and implement Runtime Monitoring in your own environment.

Ground rules for this article: the specs, supported OS/architectures, kernel requirements, agent resource limits, and pricing model are based on the official AWS documentation (as of June 2026). Because the supported OS, kernel, and Kubernetes versions change frequently, always check the official support matrix (verified platforms) before going to production; in the body I avoid listing specific versions and show how to check them instead. One more thing: Runtime Monitoring is one feature (one layer) of GuardDuty and does not replace WAF, least-privilege IAM, image scanning, or network isolation. Because cost scales with protected vCPUs, the design baseline is not "enable it everywhere" but "narrow it to important workloads."

0. Mental model: Runtime Monitoring is "the eye that sees inside the workload"

Before starting the design, let's pin down in one sentence what Runtime Monitoring adds to foundational detection.

Runtime Monitoring = GuardDuty's optional protection plan that places an eBPF security agent on EKS / ECS-Fargate / EC2 workloads, observes OS-level "process execution, file access, and network connections" from the inside, and generates findings.

The official documentation defines it this way: "Runtime Monitoring observes and analyzes operating system-level, networking, and file events", and the agent gives visibility into "file access, process execution, command line arguments, and network connections." Three consequences follow.

Foundational detection is "the outline," Runtime Monitoring is "the inside." Foundational VPC Flow Logs analysis sees which IP a workload talked to, from outside the host. Runtime Monitoring sees, from inside the host, which process initiated that connection, what its command-line arguments were, and what its parent process was (process lineage). For example, foundational DNS detection can catch cryptocurrency mining by domain, but Runtime Monitoring captures the fact that a miner binary actually ran, as Impact:Runtime/CryptoMinerExecuted.
To see inside, you need an agent. Foundational detection is agentless; Runtime Monitoring requires an eBPF-based security agent running on the workload. How to distribute that agent, which differs between EKS, Fargate, and EC2, is the main subject of this article.
"Enabled" ≠ "protected." Detection works only once the agent is correctly placed and runtime events reach GuardDuty through the VPC endpoint. Whether events are really arriving is expressed by the coverage status (Healthy / Unhealthy). If a resource is left Unhealthy, you believe it is protected while no findings are produced at all (chapter 4).

With these three points in hand, introducing Runtime Monitoring comes down to four design decisions: ① which surface, ② how to distribute the agent (automatic or manual), ③ confirming that events really arrive, and ④ controlling cost. Let's take them in order.

1. Why foundational detection alone isn't enough: what Runtime Monitoring adds

As covered in the pillar article, enabling GuardDuty turns on the foundational data sources immediately. So what, specifically, does Runtime Monitoring make visible on top of that? Lining the two up along the attack kill chain makes the complementary relationship clear.

Attack stage	What foundational detection (agentless) sees	What Runtime Monitoring sees additionally
Initial intrusion	Access from a known malicious IP (VPC Flow)	Suspicious shell spawning inside a container, new-binary execution
Execution	(hard to see)	`Execution:Runtime/NewBinaryExecuted`, `ReverseShell`, `SuspiciousTool`
Privilege escalation	Anomalies in IAM operations (CloudTrail)	Docker socket access, `runc` container escape, elevation to root
Defense evasion	(hard to see)	process injection, fileless execution, kernel module loading
Persistence	(hard to see)	modification of sensitive files (`Persistence:Runtime/SensitiveFileModified`)
C&C / exfiltration	DNS queries to a malicious domain	which process initiated that communication (with process lineage)

The key point: foundational detection captures signs observable from outside, while Runtime Monitoring captures behavior observable only inside the host. The typical scenario in the official documentation goes like this: a single container running a vulnerable web app is compromised and, using misconfigured credentials as a foothold, the attacker's access spreads to the entire account. This chain of "container compromise → privilege escalation → data access" cannot be caught at an early stage unless you look inside the container.

One more decisive factor is the relationship with Extended Threat Detection (attack sequences). The details are left to the attack-sequence article; here is just the conclusion:

The ECS attack sequence (AttackSequence:ECS/CompromisedCluster) presupposes Runtime Monitoring on Fargate-ECS or EC2.
The EC2 attack sequence (AttackSequence:EC2/CompromisedInstanceGroup) is strengthened by Runtime Monitoring.

In other words, if you want ECS/EC2 multi-stage attacks bundled into a single Critical finding, Runtime Monitoring is a prerequisite rather than an optional extra. This is the biggest reason it is worth enabling for important workloads.

2. The three distribution surfaces: how the agent arrives

The essential difficulty of Runtime Monitoring is that the same security agent is distributed in completely different ways per resource type. The official documentation refers to EKS, Fargate-ECS, and EC2 collectively as "resource types."

Resource type	Agent distribution form	Automatic management	Manual management
Amazon EKS	Managed add-on `aws-guardduty-agent` (placed on each node as a DaemonSet)	Possible (GuardDuty deploys/updates the add-on)	Possible (manage the add-on's lifecycle/version yourself)
ECS (AWS Fargate)	Inject a GuardDuty-managed sidecar container into the task	Possible	Not possible (automatic management only; no manual option)
Amazon EC2	Introduce/update the agent on the host via SSM (AWS Systems Manager)	Possible (automatic deployment via SSM)	Possible (install/update yourself)

Getting this right underpins every design decision. Let's look at the key points of each of the three surfaces.

2.1 EKS: the managed add-on `aws-guardduty-agent`

On EKS, the agent is distributed as the EKS managed add-on aws-guardduty-agent. Under the hood it is a DaemonSet in the cluster (one Pod per node).

Automatic management: GuardDuty deploys and updates the add-on as needed. "GuardDuty manages the security agent (EKS add-on) on your behalf, it updates the add-on, as needed". Default values are set for the configuration parameters (CPU/memory, PriorityClass, dnsPolicy), and you can override them if needed.
Manual management: you own the add-on's version and lifecycle. Choose this when you want to pin the version strictly or put the add-on under the cluster's change-management process. With manual management, you must confirm yourself that your Kubernetes version supports the agent version (there is an official compatibility table; see the support-matrix note in the summary).

EKS support scope (important): Runtime Monitoring supports EKS clusters on EC2 nodes and EKS Auto Mode. EKS Hybrid Nodes and EKS clusters on AWS Fargate are not supported ("GuardDuty doesn't support Amazon EKS clusters running on AWS Fargate"). If you run EKS on Fargate, this feature cannot protect it, so always confirm this at design time.
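
As a hedged illustration of the manual-management check described above, the sketch below wraps two AWS CLI calls: one shows the installed add-on's status and version, and the other lists the aws-guardduty-agent versions that EKS publishes for a given Kubernetes version. The function names are hypothetical helpers, and the cluster name and Kubernetes version are placeholders you supply. The official compatibility table remains the source of truth.

```bash
# Sketch only: helper names are hypothetical; arguments are placeholders.

# Show the status and version of the installed GuardDuty agent add-on.
gd_addon_status() {
  local cluster="$1"
  aws eks describe-addon \
    --cluster-name "$cluster" \
    --addon-name aws-guardduty-agent \
    --query 'addon.{Status:status,Version:addonVersion}' \
    --output table
}

# List add-on versions that EKS publishes for one Kubernetes version.
gd_addon_versions_for() {
  local k8s_version="$1"
  aws eks describe-addon-versions \
    --addon-name aws-guardduty-agent \
    --kubernetes-version "$k8s_version" \
    --query 'addons[].addonVersions[].addonVersion' \
    --output text
}
```

Usage would look like `gd_addon_status my-cluster` followed by `gd_addon_versions_for 1.30`, where both arguments are examples rather than recommendations.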

2.2 ECS-Fargate: the injected sidecar

On Fargate you cannot touch the host OS, so the agent is injected into the task as a sidecar container. The most important constraint: Fargate (ECS) is automatic-management-only and has no manual-management option (the official documentation also states "with an exception to Fargate (Amazon ECS only)"). GuardDuty handles placement and updates entirely.

If you run production containers on Fargate, this is the surface where Runtime Monitoring is easiest to introduce. No host patching or SSM management is needed; once enabled, GuardDuty takes care of the sidecar. In return, you have no control over the version or placement, a trade-off consistent with Fargate's serverless, hands-off nature.

2.3 EC2: the agent via SSM

On EC2, the agent is distributed through SSM (AWS Systems Manager). "GuardDuty uses AWS Systems Manager (SSM) to automatically deploy, install, and manage the security agent on your instances".

Prerequisite for automatic management: the instance is managed by SSM (it appears in Fleet Manager). For automatic management, this is a hard requirement.
Manual management: "If you plan to manually install and manage the GuardDuty agent, SSM is not required". Choose manual in environments that do not or cannot use SSM (a special base image, for example).
Exclusion tag: to exclude a specific instance under automatic management, attach the GuardDutyManaged:false tag before launch and enable access to tags in instance metadata (IMDS). This suits the "enable everywhere but exclude a few" style of operation.
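
The exclusion-tag step can be sketched with the AWS CLI as follows. This is a minimal example under assumptions: the function name is a hypothetical helper, the AMI ID and instance type are placeholders, and a real launch would normally also pass subnet, security group, and instance-profile options.

```bash
# Sketch only: launch an instance that automatic agent management should skip.
# The tag is applied at launch and instance-metadata tags are enabled,
# matching the two conditions described in the text.
launch_excluded_instance() {
  local ami_id="$1" instance_type="$2"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --metadata-options 'InstanceMetadataTags=enabled' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=GuardDutyManaged,Value=false}]'
}
```

Setting the tag in the launch request, rather than tagging afterwards, keeps the "before launch" condition from depending on operator timing.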

3. Enable it with Terraform: `RUNTIME_MONITORING` and the three toggles

Once the design is settled, put it into code. Runtime Monitoring is added with the aws_guardduty_detector_feature resource, separate from the aws_guardduty_detector resource itself.

# Enable Runtime Monitoring and let GuardDuty auto-manage the agent on each surface.
# Cost grows in proportion to protected vCPUs, so enable it only in environments that really need runtime visibility.
resource "aws_guardduty_detector_feature" "runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "RUNTIME_MONITORING"
  status      = "ENABLED"

  # --- Toggle per surface whether GuardDuty auto-manages the agent ---
  # ENABLED: GuardDuty handles agent placement and updates for that surface (automatic management).
  # DISABLED: Runtime Monitoring itself stays enabled, but the agent on that surface is
  # assumed to be manually managed (you run the add-on / SSM yourself).

  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT" # EKS: auto-deploy and update the aws-guardduty-agent add-on
    status = "ENABLED"
  }
  additional_configuration {
    name   = "ECS_FARGATE_AGENT_MANAGEMENT" # Fargate: auto-inject the sidecar (no manual option)
    status = "ENABLED"
  }
  additional_configuration {
    name   = "EC2_AGENT_MANAGEMENT" # EC2: auto-manage the agent via SSM
    status = "ENABLED"
  }
}

Three design points.

The toggles are per-surface automatic-management switches. After setting RUNTIME_MONITORING to ENABLED, switch EKS_ADDON_MANAGEMENT / ECS_FARGATE_AGENT_MANAGEMENT / EC2_AGENT_MANAGEMENT individually. To make only EKS manual, set EKS_ADDON_MANAGEMENT to DISABLED and manage the add-on separately (see the next point).
The EKS add-on can be declared explicitly as a separate resource. With automatic management (EKS_ADDON_MANAGEMENT = ENABLED), GuardDuty installs the add-on; if you manage it manually, declare the EKS add-on explicitly in Terraform.

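# [Only if you chose manual management] Manage the EKS add-on yourself.
# Useful when you want to pin the version and put it under the cluster's change-management process.
# With automatic management (EKS_ADDON_MANAGEMENT = ENABLED), this resource is unnecessary.
resource "aws_eks_addon" "guardduty_agent" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "aws-guardduty-agent"

  # Pin the version only after checking the official compatibility table to confirm that
  # the Kubernetes version in use supports that agent version.
  # Manage it via a variable or data source rather than hard-coding (the table is updated).
  addon_version = var.guardduty_agent_addon_version

  # Behavior on conflict with existing settings. Use PRESERVE if you override parameters (CPU/memory, etc.) manually.
  resolve_conflicts_on_update = "PRESERVE"
}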

To enable it in bulk across the organization, roll out RUNTIME_MONITORING org-wide with the same structure as the pillar article's aws_guardduty_organization_configuration_feature. Because it is a cost driver, estimate the cost before defaulting it to ON org-wide (chapter 6).

Relationship with EKS_RUNTIME_MONITORING (needs confirmation): historically there was a separate, EKS-only feature called EKS_RUNTIME_MONITORING. It is now being consolidated into RUNTIME_MONITORING, the unified feature covering EKS, ECS, and EC2. For new builds, use RUNTIME_MONITORING. For existing environments that use EKS_RUNTIME_MONITORING, the migration path and deadline may be revised, so always check the current status in the official documentation (at the time of writing, the direction of consolidation is confirmed; the specific deprecation schedule is whatever the official documentation states).
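
To see which of the two features an existing detector actually has enabled, a small check along these lines may help. It is a sketch, not an official migration procedure: the function name is a hypothetical helper and the detector ID is a placeholder. It relies only on the Features list returned by get-detector.

```bash
# Sketch only: show the status of RUNTIME_MONITORING and the older
# EKS_RUNTIME_MONITORING feature on one detector.
gd_runtime_features() {
  local detector_id="$1"
  aws guardduty get-detector \
    --detector-id "$detector_id" \
    --query "Features[?contains(Name, 'RUNTIME_MONITORING')].{Name:Name,Status:Status}" \
    --output table
}
```

The substring filter matches both feature names, so one call shows whether an environment is still on the EKS-only feature.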

4. Confirm coverage: plug the "thought-I-enabled-it" hole

This is where accidents most often happen when operating Runtime Monitoring. Enabling it in Terraform is not enough; you must separately confirm that the agent is actually delivering events. The coverage status tells you this.

4.1 What Healthy / Unhealthy means

The official definition is clear. The coverage status becomes Healthy only when all three conditions are met.

Coverage status is determined by making sure that you have enabled Runtime Monitoring, your Amazon VPC endpoint has been created, and the GuardDuty security agent for the corresponding resource has been deployed.

Status	Meaning	Consequence
Healthy	The three points are in place — ① Runtime Monitoring enabled ② VPC endpoint created ③ agent placed — and runtime events reach GuardDuty	findings can be generated normally
Unhealthy	There's a problem in any of the above (config, VPC endpoint, agent placement)	GuardDuty can't receive events and generates no Runtime Monitoring findings at all

The decisive point: while a resource is Unhealthy, it produces zero findings. Unless you look at the coverage status, you cannot tell whether "I enabled it but no alerts come" means things are quiet or means you are blind because coverage is Unhealthy. So always treat "enable" and "confirm coverage is Healthy" as one step.

4.2 Confirm coverage with bash

GuardDuty has APIs that return coverage statistics and the status of individual resources. You can incorporate them into a CI or post-deploy verification script.

#!/usr/bin/env bash
# Check Runtime Monitoring coverage and list Unhealthy resources.
# Use for post-deploy verification or periodic audits, to mechanically close the "thought-I-enabled-it" hole.
set -euo pipefail

DETECTOR_ID="${1:?Usage: check-coverage.sh <detector-id>}"

# ① Get coverage statistics (counts by coverage status: Healthy / Unhealthy).
#    Gives the overall picture at a glance.
echo "=== Coverage statistics by coverage status ==="
aws guardduty get-coverage-statistics \
  --detector-id "$DETECTOR_ID" \
  --statistics-type COUNT_BY_COVERAGE_STATUS \
  --output table

# ② List only the Unhealthy resources (identify what to investigate).
#    The filter narrows results to coverage status = UNHEALTHY.
echo "=== Unhealthy resources (need troubleshooting) ==="
aws guardduty list-coverage \
  --detector-id "$DETECTOR_ID" \
  --filter-criteria '{
    "FilterCriterion": [
      { "CriterionKey": "COVERAGE_STATUS", "FilterCondition": { "Equals": ["UNHEALTHY"] } }
    ]
  }' \
  --query 'Resources[].{Id:ResourceId,Type:ResourceDetails.ResourceType,Status:CoverageStatus,Issue:Issue}' \
  --output table

Build the verification path first: I recommend putting this script in a post-deploy hook in CI. Run it right after turning RUNTIME_MONITORING on in Terraform, and do not consider the deploy complete until the important workloads are confirmed Healthy. This is the most effective guard for structurally eliminating the "thought-I-enabled-it" problem.
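
The script above prints tables for a human to read; a CI gate also needs a pass/fail exit code. One way to sketch that, reusing the same list-coverage filter, is shown below. The function name is a hypothetical helper and the detector ID is a placeholder. JSON output is used so that the count is computed once over the full result.

```bash
# Sketch only: return non-zero when any resource is UNHEALTHY,
# so a CI step fails instead of merely printing a table.
assert_no_unhealthy() {
  local detector_id="$1" count
  count="$(aws guardduty list-coverage \
    --detector-id "$detector_id" \
    --filter-criteria '{"FilterCriterion":[{"CriterionKey":"COVERAGE_STATUS","FilterCondition":{"Equals":["UNHEALTHY"]}}]}' \
    --query 'length(Resources)' \
    --output json)"
  if [ "$count" -gt 0 ]; then
    echo "FAIL: $count resource(s) are UNHEALTHY" >&2
    return 1
  fi
  echo "OK: no UNHEALTHY resources"
}
```

In a pipeline this would be called as `assert_no_unhealthy "$DETECTOR_ID"` after the Terraform apply step. Narrowing the gate to important workloads would need additional filter criteria that are not shown here.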

4.3 Typical causes of Unhealthy and troubleshooting

Unhealthy means at least one of the three conditions is missing. Triage by surface.

Surface	Common Unhealthy cause	Direction of action
Common	VPC endpoint not created / unreachable (in manual management you need to create it yourself)	Confirm the endpoint's existence and SG/routes
EC2	The instance is not managed by SSM (the prerequisite for automatic management is not met)	Confirm that SSM Agent is installed and the instance IAM role is in place
EC2	The OS/kernel is out of support, or the `CONFIG_DEBUG_INFO_BTF` flag is unset	Confirm the official verified platforms
EKS	The add-on `aws-guardduty-agent` is not deployed / the K8s version is unsupported	Confirm the add-on's status and the supported-version table
EKS	The agent Pod has reached the CPU/memory limit	Adjust the add-on's resource settings (chapter 5)
Organization	An SCP denies `guardduty:SendSecurityTelemetry`	Review the SCP and confirm the action is not denied

If you chose manual management, note that creating the VPC endpoint is your responsibility (with automatic management, GuardDuty creates it). "Install agent manually, which requires you to create the VPC endpoint as a prerequisite." Forgetting this leads to the classic pitfall where the agent is running but events never arrive and coverage stays Unhealthy.
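
Two of the triage checks above lend themselves to quick commands. The sketch below is illustrative only: the function names are hypothetical helpers, the VPC ID and region are placeholders, and the endpoint service name com.amazonaws.REGION.guardduty-data is my understanding of the name the official documentation uses, so confirm it there. The kernel check assumes the distribution ships its config at /boot/config-KERNELVERSION, which is common but not universal.

```bash
# Sketch only: list GuardDuty data endpoints in one VPC.
# A working endpoint is expected to report the state "available".
gd_vpc_endpoint_state() {
  local vpc_id="$1" region="$2"
  aws ec2 describe-vpc-endpoints \
    --region "$region" \
    --filters "Name=vpc-id,Values=$vpc_id" \
              "Name=service-name,Values=com.amazonaws.$region.guardduty-data" \
    --query 'VpcEndpoints[].{Id:VpcEndpointId,State:State}' \
    --output table
}

# Return 0 when the kernel config file declares CONFIG_DEBUG_INFO_BTF=y.
# Run on the instance itself; pass a path to override the default location.
kernel_has_btf() {
  local config="${1:-/boot/config-$(uname -r)}"
  grep -q '^CONFIG_DEBUG_INFO_BTF=y' "$config"
}
```

An empty table from the first function points at a missing endpoint; a non-zero exit from the second points at the kernel requirement.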

5. Is the agent "lightweight": pin down the CPU/memory limits with numbers

When you hear "put a security agent into production," the first concern is the load on the workload. The Runtime Monitoring agent can be called "lightweight" because explicit limits are set. Let's pin them down with official numbers rather than guesses.

Surface	CPU limit	Memory limit	Note
EC2	up to 10% of total vCPU cores	follows the official table	with 4 vCPU, up to 40% out of the available 400%
EKS (add-on `aws-guardduty-agent`)	200m–1000m (0.2–1.0 vCPU)	256Mi–1024Mi	the values are configurable from add-on v1.5.0 onward
ECS-Fargate	follows the sidecar's resource allocation	same as CPU (follows the sidecar's allocation)	monitor the measured values with Container Insights

Three points.

EC2 is capped by ratio. "The maximum CPU limit for the GuardDuty security agent associated with Amazon EC2 instances is 10 percent of the total vCPU cores." The larger the instance, the larger the absolute amount, but because the ratio is constant (10%), the agent is unlikely to crowd out the workload.
EKS is capped by absolute value. The add-on's defaults are CPU 200m–1000m and memory 256Mi–1024Mi. From v1.5.0 onward, CPU, memory, PriorityClass, and dnsPolicy are configurable, so you can tune them to the workload and instance size. If Container Insights shows the agent reaching its limit, raise these values.
Monitor by measurement. The official documentation recommends monitoring CPU/memory consumption with Container Insights (for both ECS and EKS). Do not take "lightweight" on faith; measure it. That is production discipline.
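
The EC2 ratio cap is easy to misread, so here is the arithmetic as a tiny sketch. The function name is a hypothetical helper; the only input is the instance's vCPU count, and the result is expressed in percentage points of a single core (so 100 means one full core).

```bash
# Sketch only: the EC2 agent CPU cap, in percent of a single core.
# Total capacity is vcpus * 100%; the cap is 10% of that total.
ec2_agent_cpu_cap_percent() {
  local vcpus="$1"
  echo $(( vcpus * 100 * 10 / 100 ))
}
```

For a 4 vCPU instance this gives 40, matching the "40% out of 400%" reading of the official wording, and for 16 vCPU it gives 160, that is, at most 1.6 cores.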

A note on PriorityClass: the EKS add-on's default PriorityClass does not give the agent Pod any priority-based special treatment. If the agent Pod is evicted first when a node comes under resource pressure, coverage drops to Unhealthy at that moment. For clusters where security visibility must not be interrupted, consider changing it to system-node-critical or similar, weighing the trade-off (eviction risk for important workloads vs. continuity of monitoring).
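
Before overriding CPU, memory, or PriorityClass, it is safer to read the add-on's published configuration schema than to guess key names. The sketch below does only that. The function name is a hypothetical helper and the add-on version is a placeholder; the key names themselves are deliberately not shown here because they should come from the schema output.

```bash
# Sketch only: print the JSON schema of configurable values for one
# version of the aws-guardduty-agent add-on.
gd_addon_config_schema() {
  local addon_version="$1"
  aws eks describe-addon-configuration \
    --addon-name aws-guardduty-agent \
    --addon-version "$addon_version" \
    --query 'configurationSchema' \
    --output text
}
```

The values chosen from that schema can then be supplied as the add-on's configuration values, either through Terraform or the EKS API.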

6. Cost discipline: the biggest driver, proportional to protected vCPU

This is the core of decision-making when operating Runtime Monitoring. Runtime Monitoring is billed in proportion to protected vCPUs and, among GuardDuty's protection plans, is the one most likely to become the largest line item. GuardDuty's overall cost optimization is left to the dedicated article; here are three points specific to Runtime Monitoring.

6.1 Billing is "the vCPU scale of the protected workload × running time"

The official billing unit is based on the number of vCPUs of the provisioned instances/tasks being monitored, multiplied by running time. In other words, the more vCPUs the workload has and the longer you protect it, the higher the bill. This differs in nature from foundational detection (billed by event volume and GB) and is tied directly to the scale of your assets. So enabling it on every EC2 instance and every cluster "just in case" is the fastest way for the vCPU bill to snowball.
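
A back-of-the-envelope estimate of "vCPUs × running time" can be sketched as follows. Everything numeric here is an assumption: the function name is a hypothetical helper, the rate is a placeholder you must take from the official pricing page for your region, and the sketch ignores volume tiers and the Flow Logs exemption.

```bash
# Sketch only: rough cost = vCPUs * hours protected * rate per vCPU-hour.
# The rate is a caller-supplied placeholder, not a real AWS price.
estimate_runtime_monitoring_cost() {
  local vcpus="$1" hours="$2" rate_per_vcpu_hour="$3"
  awk -v v="$vcpus" -v h="$hours" -v r="$rate_per_vcpu_hour" \
    'BEGIN { printf "%.2f\n", v * h * r }'
}
```

Running the estimate once for "everything" and once for only the important workloads makes the effect of narrowing the scope concrete before any billing starts.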

6.2 So "narrow it to important workloads"

The design baseline is to narrow the scope by importance. The phased introduction I recommend:

Start from the surface that needs attack sequences: first put it in production important workloads where you want to get ECS/EC2 attack sequences (Critical findings).
Surfaces handling sensitive data: workloads with a large impact when compromised, like payments, personal information, and authentication infrastructure.
Forgo the rest, or explicitly exclude it with an EC2 exclusion tag.

Rather than "enable everything for peace of mind," the approach that wins on a limited security budget is to concentrate investment on high-impact assets and maximize cost-effectiveness, including the ETD correlation benefit.

6.3 The "VPC Flow Logs double-charge exemption" worth knowing

When discussing Runtime Monitoring's cost, this is an easily overlooked relief. The official text:

When you manage the security agent ... and GuardDuty is presently deployed on an Amazon EC2 instance and receives the Collected runtime event types from this instance, GuardDuty will not charge your AWS account for the analysis of VPC flow logs from this Amazon EC2 instance. This helps GuardDuty avoid double usage cost in the account.

In other words, if the agent runs on an EC2 instance and GuardDuty receives that instance's runtime events, GuardDuty does not charge for analyzing that instance's VPC Flow Logs. Because Runtime Monitoring already sees network connections through the agent, the design avoids charging twice alongside the foundational VPC Flow Logs analysis. Runtime Monitoring's vCPU billing thus offsets part of the foundational Flow Logs billing. Factor this exemption into the cost estimate; if you treat Runtime Monitoring as a pure add-on, the estimate will be overstated.

The 30-day free trial for new regions: Runtime Monitoring also comes with a 30-day free trial. You can grasp the expected bill at production volume before billing starts. The judgment to narrow to important workloads gains further precision with the measurement during this trial period.

7. Reading findings: Runtime Monitoring's type families

Runtime Monitoring findings can be identified at a glance because the resource segment of the finding type is Runtime (ThreatPurpose:Runtime/ThreatName). They form a separate lineage from the foundational EC2, S3, IAMUser, and so on. The major families, organized by attack stage:

Family (ThreatPurpose)	Representative finding types	What it captures
Execution	`Execution:Runtime/NewBinaryExecuted`, `ReverseShell`, `SuspiciousTool`, `NewLibraryLoaded`	execution of a new binary/library, reverse shell, suspicious tool
PrivilegeEscalation	`PrivilegeEscalation:Runtime/DockerSocketAccessed`, `RuncContainerEscape`, `ElevationToRoot`, `ContainerMountsHostDirectory`	Docker socket access, container escape, root elevation, host-directory mount
DefenseEvasion	`DefenseEvasion:Runtime/ProcessInjection.*`, `FilelessExecution`, `KernelModuleLoaded`, `PtraceAntiDebugging`	process injection, fileless execution, kernel module loading
Persistence	`Persistence:Runtime/SensitiveFileModified`, `SuspiciousCommand`	modification of sensitive files, a command aiming for persistence
Impact	`Impact:Runtime/CryptoMinerExecuted`, `MaliciousDomainRequest.Reputation`, etc.	execution of a cryptocurrency miner, communication to a malicious domain
CryptoCurrency	`CryptoCurrency:Runtime/BitcoinTool.B` (has a `!DNS` derivative)	inquiries to cryptocurrency-related IPs/domains
Backdoor	`Backdoor:Runtime/C&CActivity.B` (has a `!DNS` derivative)	communication with a C&C server
Trojan	`Trojan:Runtime/BlackholeTraffic`, `DropPoint`, `DGADomainRequest.C!DNS`, etc.	typical trojan communication patterns
UnauthorizedAccess	`UnauthorizedAccess:Runtime/TorRelay`, `TorClient`, `MetadataDNSRebind`	Tor relay/client, metadata DNS rebind
Discovery	`Discovery:Runtime/SuspiciousCommand`	a command aiming for reconnaissance

Two implementation points.

You can route on the Runtime segment. With EventBridge auto-response (chapter 6 of the pillar article), for findings whose type matches *:Runtime/*, you can treat them as a compromise inside the container or host and go beyond network isolation to stopping or isolating the affected Pod or task. Because process-lineage information is included in the finding, investigation is fast.
Sanitizing finding fields is mandatory. The official documentation states an important caveat: because Runtime Monitoring findings include values an attacker can control, such as file paths, "When processing Runtime Monitoring findings outside of GuardDuty console, you must sanitize finding fields. For example, you can HTML encode finding fields." When you display or forward findings on the web in auto-response or notifications, always sanitize them, for example with HTML escaping (treat them as untrusted input). This is fully consistent with the "validate/escape external input" rule in my own conventions.
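
Both points can be sketched in a few lines of shell. This is an illustration under assumptions, not the pillar article's implementation: the function names are hypothetical helpers, the event pattern uses EventBridge's wildcard matching on the finding type, and the escaping covers only the five characters that matter for HTML contexts.

```bash
# Sketch only: an EventBridge event pattern that matches every
# GuardDuty finding whose type has Runtime as the resource segment.
runtime_finding_pattern() {
  cat <<'JSON'
{
  "source": ["aws.guardduty"],
  "detail-type": ["GuardDuty Finding"],
  "detail": { "type": [{ "wildcard": "*:Runtime/*" }] }
}
JSON
}

# HTML-encode a finding field read from stdin before it is rendered.
# The ampersand is replaced first so later entities are not double-encoded.
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' -e "s/'/\&#39;/g"
}
```

A notification handler would pipe each attacker-influenced field, such as a file path or command line, through `html_escape` before embedding it in a web page or chat message, and the pattern would be supplied as the rule's event pattern.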

8. Decision table: on which surface, automatic or manual, to put it in

Let's turn the above into practical decisions: which workload, which surface, automatic or manual, on one sheet.

Your situation	Enable it?	Surface	Automatic / manual	Reason
Running production APIs/workers on Fargate	Yes (the easiest to introduce)	ECS-Fargate	automatic only	GuardDuty fully manages sidecar injection. Also unlocks the ECS attack sequence
Operating a production cluster on EKS (EC2 nodes)	Yes	EKS	automatic in principle	GuardDuty updates the add-on. Low operational load
Want to pin the version strictly / use change management on EKS	Yes	EKS	manual	You own the add-on's lifecycle. Checking the K8s compatibility table is your responsibility
Running sensitive workloads on EC2, already SSM-managed	Yes	EC2	automatic	SSM is already in place, so automatic deployment is straightforward
EC2 but can't/won't use SSM	Yes	EC2	manual	No SSM dependency. Creating the VPC endpoint is your responsibility
Running EKS on Fargate	Not possible	n/a	n/a	Unsupported (EKS on Fargate / Hybrid Nodes are out of support)
ECS on EC2 nodes (not Fargate)	Possible (runtime detection works)	EC2	automatic/manual	But the ECS attack sequence is unsupported (Fargate-ECS or EC2 is the premise)
Small-scale workloads for dev/verification	Skip in principle	n/a	n/a	The detection value is thin relative to the vCPU billing. Explicitly exclude with an exclusion tag

There are only two decision axes: ① is the workload important enough to warrant runtime visibility and attack sequences (is it worth the cost), and ② do you need to own version management (automatic vs. manual). With these two axes, every case lands somewhere in the table above.

9. Summary: GuardDuty Runtime Monitoring production cheat sheet

A quick reference for when you're lost.

What layer it is: Runtime Monitoring is an optional protection plan that observes the inside of the workload (OS-level processes, files, networking) with an eBPF security agent. It complements agentless foundational detection (the outline).
The three distribution surfaces: EKS = the managed add-on aws-guardduty-agent (DaemonSet) / ECS-Fargate = the injected sidecar (automatic only) / EC2 = via SSM. Understanding that the distribution differs per surface is the first step.
Automatic vs. manual: when in doubt, automatic (low operational load). Manual is only when you want to pin the version / put it on change management. Fargate has no manual option. With manual, creating the VPC endpoint is self-responsibility.
Terraform: aws_guardduty_detector_feature's RUNTIME_MONITORING + the three toggles (EKS_ADDON_MANAGEMENT / ECS_FARGATE_AGENT_MANAGEMENT / EC2_AGENT_MANAGEMENT). For new builds use RUNTIME_MONITORING (EKS_RUNTIME_MONITORING is in the consolidation direction).
Coverage confirmation (mandatory): enabled ≠ protected. Healthy = the state where the three points Runtime Monitoring enabled + VPC endpoint + agent placement are all in place. With Unhealthy, findings are zero. Confirm mechanically with get-coverage-statistics / list-coverage and incorporate into CI.
Lightness in numbers: EC2 is up to 10% of total vCPU, the EKS add-on is CPU 200m–1000m, memory 256Mi–1024Mi (configurable from v1.5.0 onward). Measure with Container Insights.
Cost discipline: GuardDuty's biggest cost driver, proportional to protected vCPU. Narrow to important workloads. However, the VPC Flow Logs analysis cost for EC2 where the agent runs is exempted (no double charge) — incorporate it into the estimate. Measure with the 30-day free trial.
The trap of support scope: EKS on Fargate / EKS Hybrid Nodes are unsupported. ECS on EC2 nodes doesn't support the ECS attack sequence (Fargate-ECS or EC2 is the premise). Always confirm before designing.
The support matrix: the supported OS/kernel/K8s versions are updated frequently. Don't hard-code; confirm the official verified platforms. Don't forget kernel requirements like CONFIG_DEBUG_INFO_BTF=y.

Runtime Monitoring is not a box that shows you the inside as soon as you enable it. It produces value only once you have designed all four of ① which surface, ② how to distribute, ③ confirming coverage, and ④ controlling cost. The biggest leverage lies in the discipline of narrowing to important workloads, guaranteeing Healthy coverage in CI, and optimizing cost with the Flow Logs exemption included.

I have implemented IAM, observability, and DR across a multi-account serverless payment platform and have run production workloads on ECS on Fargate. I design Runtime Monitoring rollouts with the same philosophy: ① identify the important workloads and narrow the scope, ② choose automatic or manual by requirements, ③ build Healthy coverage into the verification path, and ④ estimate cost with both vCPU billing and the Flow Logs exemption. Structurally eliminate the "thought-I-enabled-it" hole, and put detection fully into operation.

"How to design Runtime Monitoring for your EKS / Fargate / EC2, how much to leave to automatic management, and how to hold down cost" — from selecting the distribution surface to Terraform implementation, coverage verification, cost estimation, and attack-sequence integration, I can accompany you fast and safely with one person × generative AI (Claude Code). Even from the requirement-organizing stage, please feel free to consult me.

References (official documentation)

GuardDuty Runtime Monitoring — Runtime Monitoring's definition, supported resources, the security agent's observation targets
How Runtime Monitoring works — the two stages of enablement + agent management, VPC endpoint, the VPC Flow Logs double-charge exemption
Managing GuardDuty security agents — automatic/manual management, per-surface agent management, the premise of Healthy/Unhealthy
Reviewing runtime coverage statistics and troubleshooting — definition of the coverage status (Healthy = the three conditions met / Unhealthy = zero findings)
Prerequisites for Amazon EC2 instance support — SSM management, supported architectures, the EC2 CPU limit (10% of total vCPU), kernel requirements
Prerequisites for Amazon EKS cluster support — verified platforms (OS/kernel/K8s correspondence table), the EKS add-on's CPU/memory limits, Fargate-EKS / Hybrid Nodes unsupported
Configure GuardDuty security agent (add-on) parameters for Amazon EKS — v1.5.0+'s CPU/memory, PriorityClass, dnsPolicy settings
GuardDuty Runtime Monitoring finding types — the list of *:Runtime/* finding types and the sanitization caveat
Amazon GuardDuty pricing — Runtime Monitoring's protected-vCPU-based billing, the 30-day free trial

Related articles

Designing AWS Threat Detection in Production with Amazon GuardDuty: Protection Plans, Extended Threat Detection, Org-Wide Bulk Enablement, and EventBridge Automated Response, in Real Code

GuardDuty × Amazon Detective: The 'Next Step' After Detection—A Workflow to Investigate Root Cause and Blast Radius

Amazon GuardDuty pricing and cost optimization (FinOps): decompose the billing model, cut waste, and predict the bill

GuardDuty EKS Protection: Detecting Control-Plane Threats (Anonymous Access, RBAC Tampering, Privilege Escalation) with Kubernetes Audit Logs