# Running GuardDuty Runtime Monitoring in production on EKS / ECS-Fargate / EC2: security agent, coverage, cost, troubleshooting

> A guide to operating GuardDuty Runtime Monitoring in production. Using Terraform and bash, it covers how the eBPF agent observes OS-level processes, files, and networking from inside the workload; the three distribution surfaces (EKS, ECS-Fargate, EC2) and automatic versus manual management; confirming coverage (Healthy/Unhealthy); the VPC Flow Logs double-charge exemption; and keeping vCPU-based billing under control.

- Published: 2026-06-27
- Author: 友田 陽大
- Tags: Security, AWS, GuardDuty, EKS, Containers
- URL: https://tomodahinata.com/en/blog/aws-guardduty-runtime-monitoring-eks-ecs-fargate-ec2-guide
- Category: Amazon GuardDuty in production
- Pillar guide: https://tomodahinata.com/en/blog/aws-guardduty-threat-detection-multi-account-terraform-eventbridge-guide

## Key points

- Runtime Monitoring uses an eBPF security agent to observe OS-level process execution, file access, and network connections from inside the workload. It fills in the container- and host-internal behavior that foundational detection (CloudTrail, VPC Flow Logs, DNS) can't see.
- The agent is distributed on three surfaces: on EKS as the managed add-on `aws-guardduty-agent`, on ECS-Fargate as a GuardDuty-managed sidecar, and on EC2 via SSM. Fargate is the only surface that is automatic-management-only (no manual option).
- Use the coverage status (Healthy/Unhealthy) to confirm that you are actually protected. It becomes Healthy only when all three conditions hold: Runtime Monitoring is enabled, the VPC endpoint exists, and the agent is deployed. While a resource is Unhealthy, it produces no Runtime Monitoring findings at all.
- Cost needs discipline: Runtime Monitoring is billed in proportion to protected vCPU and tends to be GuardDuty's biggest cost driver, so enable it only for important workloads. However, when the agent runs on an EC2 instance, the VPC Flow Logs analysis charge for that instance is waived, which avoids double charging.
- The ECS/EC2 attack sequences (Extended Threat Detection) presuppose Runtime Monitoring. Fargate-ECS and EC2 are supported, but ECS on EC2 nodes doesn't support attack sequences, so pin down the exact scope before you design.

---

"I installed GuardDuty, but can I notice if something weird runs inside a container?" — in a setting where I'm consulted about container-platform security, this is a question that hits the core.

The answer: with foundational GuardDuty **alone**, you can't. Agentless foundational detection (CloudTrail, VPC Flow Logs, DNS) sees AWS **control-plane operations** and the **outline of the network**. But OS-level behavior **inside the workload**, such as "an unknown binary was executed inside a container," "`/etc/shadow` was tampered with," or "a reverse shell was opened," isn't visible from the outside. **GuardDuty Runtime Monitoring** fills that gap.

This article is an implementation guide for designing and operating Runtime Monitoring at **production quality** across three surfaces: **EKS, ECS-Fargate, and EC2**. GuardDuty's overall design (protection-plan selection, organization-wide control, EventBridge auto-response) is covered in [the pillar article](/blog/aws-guardduty-threat-detection-multi-account-terraform-eventbridge-guide); this piece takes that article's single line on "Runtime Monitoring" **one level deeper, down to the distribution architecture, coverage confirmation, cost, and troubleshooting**. I have implemented IAM, observability, and DR end to end on a multi-account AWS [serverless payment platform](/case-studies/payment-platform-reliability) and have run production workloads on [ECS on Fargate](/blog/aws-ecs-fargate-production-guide). Drawing on that experience, I'll explain **how to design and implement Runtime Monitoring in your own environment**.

> **Ground rules for this article**: the specs, supported OS/architecture, kernel requirements, agent resource limits, and pricing model are based on the **official AWS documentation (as of June 2026)**. Because the supported OS, kernel, and Kubernetes versions change frequently, **always check the official support matrix (verified platforms) before going to production** (for the same reason, the body shows how to check rather than listing specific versions). One more thing: **Runtime Monitoring is one feature (one layer) of GuardDuty and doesn't replace WAF, least-privilege IAM, image scanning, or network isolation**. And because the cost is **proportional to protected vCPU**, the default design is not "enable it everywhere" but "narrow it to important workloads."

---

## 0. Mental model: Runtime Monitoring is "the eye that sees inside the workload"

Before starting the design, let's pin down in one line **what** Runtime Monitoring **adds** to foundational detection.

> **Runtime Monitoring = GuardDuty's optional protection plan that places an eBPF security agent on EKS / ECS-Fargate / EC2 workloads, observes OS-level "process execution, file access, and network connections" from the inside, and generates findings.**

The official definition reads: *"Runtime Monitoring observes and analyzes operating system-level, networking, and file events"*, and the agent gives visibility into *"file access, process execution, command line arguments, and network connections."* Three consequences follow.

1. **Foundational detection is "the outline," Runtime Monitoring is "the inside."** Foundational VPC Flow Logs analysis sees "which IP it communicated with" from the **outside of the network**. Runtime Monitoring sees, from the **inside of the host**, "**which process** initiated that communication," "what the command-line arguments were," and "what the parent process was (process lineage)." For example, cryptocurrency mining can also be caught by domain with foundational DNS detection, but Runtime Monitoring captures the fact that a miner binary was actually executed, as `Impact:Runtime/CryptoMinerExecuted`.
2. **To see inside, you need an agent.** Foundational detection is agentless, whereas Runtime Monitoring requires an **eBPF-based security agent running on the workload**. **How this agent is distributed** is the main subject of this article (it differs between EKS, Fargate, and EC2).
3. **"Enabled" ≠ "protected."** Detection works only once the agent is correctly deployed and runtime events reach GuardDuty through the VPC endpoint. The **coverage status (Healthy / Unhealthy)** tells you whether events are really arriving. Leaving a resource `Unhealthy` creates a blind spot: you believe it's enabled, yet it **produces no findings at all** (chapter 4).

With these three points in hand, introducing Runtime Monitoring comes down to four design decisions: **① which surface, ② how to distribute the agent (automatic or manual), ③ how to confirm events are really arriving, and ④ how to control the cost**. Let's take them in order.

---

## 1. Why foundational detection alone isn't enough: what Runtime Monitoring adds

As covered in the [pillar article](/blog/aws-guardduty-threat-detection-multi-account-terraform-eventbridge-guide), enabling GuardDuty turns on the foundational data sources immediately. So what does Runtime Monitoring **make visible on top of that**? Lining the two up along the attack kill chain makes the complementary relationship clear.

| Attack stage | What foundational detection (agentless) sees | What Runtime Monitoring sees additionally |
| --- | --- | --- |
| Initial intrusion | Access from a known malicious IP (VPC Flow) | **Suspicious shell spawning** inside a container, new-binary execution |
| Execution | (hard to see) | `Execution:Runtime/NewBinaryExecuted`, `ReverseShell`, `SuspiciousTool` |
| Privilege escalation | Anomalies in IAM operations (CloudTrail) | **Docker socket access,** `runc` container escape, elevation to root |
| Defense evasion | (hard to see) | process injection, fileless execution, kernel module loading |
| Persistence | (hard to see) | **modification of sensitive files** (`Persistence:Runtime/SensitiveFileModified`) |
| C&C / exfiltration | DNS queries to a malicious domain | **which process** initiated that communication (with process lineage) |

The key point: **foundational detection captures signs observable from outside, while Runtime Monitoring captures behavior observable only inside the host.** The typical scenario cited in the official documentation: *a single container running a vulnerable web app is compromised, and using misconfigured credentials as a foothold, access spreads to the entire account.* This chain of "container compromise → privilege escalation → data access" **can't be caught at an early stage unless you look inside the container.**

The other decisive factor is **the relationship with Extended Threat Detection (attack sequences)**. Details are in the [attack-sequence article](/blog/aws-guardduty-extended-threat-detection-attack-sequence-findings-guide); here is just the conclusion:

- The **ECS attack sequence** (`AttackSequence:ECS/CompromisedCluster`) **presupposes Runtime Monitoring on Fargate-ECS or EC2.**
- The **EC2 attack sequence** (`AttackSequence:EC2/CompromisedInstanceGroup`) is **strengthened** by Runtime Monitoring.

In other words, if you want multi-stage ECS/EC2 attacks bundled into one Critical finding, Runtime Monitoring is effectively **a prerequisite rather than an optional extra** (required for the ECS sequence, a strengthening signal for the EC2 one). This is the biggest reason it's worth enabling for important workloads.

---

## 2. The three distribution surfaces: how the agent arrives

The essential difficulty of Runtime Monitoring is that **the same security agent is distributed in completely different ways per resource type**. The official documentation refers to EKS, Fargate-ECS, and EC2 collectively as **"resource types."**

| Resource type | Agent distribution form | Automatic management | Manual management |
| --- | --- | --- | --- |
| **Amazon EKS** | Managed add-on **`aws-guardduty-agent`** (placed on each node as a DaemonSet) | Possible (GuardDuty deploys/updates the add-on) | **Possible** (manage the add-on's lifecycle/version yourself) |
| **ECS (AWS Fargate)** | Inject a GuardDuty-managed **sidecar container** into the task | Possible | **Not possible** (automatic management only; no manual option) |
| **Amazon EC2** | Introduce/update the agent on the host via **SSM (AWS Systems Manager)** | Possible (automatic deployment via SSM) | Possible (install/update yourself) |

Understanding this table accurately is the foundation of every design decision. Let's look at where each of the three surfaces fits best.

### 2.1 EKS: the managed add-on `aws-guardduty-agent`

On EKS, the agent is distributed as the **EKS managed add-on `aws-guardduty-agent`**. Under the hood it runs as a **DaemonSet** in the cluster (one Pod per node).

- **Automatic management**: GuardDuty deploys and updates the add-on as needed. *"GuardDuty manages the security agent (EKS add-on) on your behalf, it updates the add-on, as needed"*. The configuration parameters (CPU/memory, `PriorityClass`, `dnsPolicy`) have defaults, which you can override if needed.
- **Manual management**: you own the add-on's version and lifecycle. Choose this when you **want to pin the version strictly or run it through the cluster's change-management process**. With manual management, you must confirm yourself **that your Kubernetes version supports the agent version** (there is an official compatibility table, described later).
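
For the manual-management case, the compatibility check can be partly scripted. The following is a minimal sketch, assuming the AWS CLI is configured for the cluster's region. The function names are hypothetical, and the Kubernetes version and add-on version passed in are placeholders you supply. It complements the official compatibility table rather than replacing it.

```bash
# List the aws-guardduty-agent add-on versions that EKS reports as
# compatible with a given Kubernetes version (for example "1.30").
list_guardduty_addon_versions() {
  local k8s_version="$1"
  aws eks describe-addon-versions \
    --addon-name aws-guardduty-agent \
    --kubernetes-version "$k8s_version" \
    --query 'addons[].addonVersions[].addonVersion' \
    --output text
}

# Succeed only when the version you intend to pin appears in that list.
is_addon_version_supported() {
  local k8s_version="$1" pinned="$2"
  list_guardduty_addon_versions "$k8s_version" \
    | tr '\t' '\n' \
    | grep -Fxq -- "$pinned"
}
```

Calling `is_addon_version_supported "<k8s-version>" "<addon-version>"` before `terraform apply` would let a pipeline stop early when the pinned version isn't offered for the cluster's Kubernetes version.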

> **EKS support scope (important)**: Runtime Monitoring supports **EKS clusters on EC2 nodes** and **EKS Auto Mode**. **EKS Hybrid Nodes** and **EKS clusters on AWS Fargate are unsupported** (*"GuardDuty doesn't support Amazon EKS clusters running on AWS Fargate"*). If you run EKS on Fargate, this feature can't protect it, so always confirm at design time.

### 2.2 ECS-Fargate: the injected sidecar

On Fargate, you can't touch the host OS, so the agent is injected into the task as a **sidecar container**. The most important constraint here: **Fargate (ECS) is automatic-management-only and has no manual-management option** (the official documentation also states *"with an exception to Fargate (Amazon ECS only)"*). GuardDuty handles placement and updates entirely.

If you [run containers in production on Fargate](/blog/aws-ecs-fargate-production-guide), this is **the surface with the lightest rollout**. No host patching or SSM management is needed, and once the feature is enabled, GuardDuty takes care of the sidecar. In return, **you have no control over the version or placement**, a trade-off consistent with Fargate's serverless, "leave it to us" nature.

### 2.3 EC2: the agent via SSM

On EC2, the agent is distributed through **SSM (AWS Systems Manager).** *"GuardDuty uses AWS Systems Manager (SSM) to automatically deploy, install, and manage the security agent on your instances"*.

- **Prerequisite for automatic management**: the instance must be **managed by SSM** (visible in Fleet Manager). For automatic management, this is a **hard requirement**.
- **Manual management**: *"If you plan to manually install and manage the GuardDuty agent, SSM is not required"*. Choose manual in environments that don't or can't use SSM (a special base image, for example).
- **Exclusion tag**: to **exclude a specific instance** under automatic management, attach the `GuardDutyManaged:false` tag **before launch** and enable access to tags in instance metadata (IMDS). This suits the "cover everything except a few instances" style of operation.
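
As an illustration of the exclusion tag, here is a minimal sketch that launches an instance with the tag already attached and with instance-metadata tags enabled. The function name is hypothetical, the AMI ID and instance type are placeholders, and it assumes a default VPC/subnet is acceptable. In a real setup you would more likely set the same two options in a launch template.

```bash
# Launch an EC2 instance that automatic agent management should skip.
# The tag is applied at launch (not afterwards), and tags are exposed
# through instance metadata so the exclusion can be recognized.
launch_excluded_instance() {
  local ami_id="$1" instance_type="$2"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=GuardDutyManaged,Value=false}]' \
    --metadata-options 'InstanceMetadataTags=enabled' \
    --query 'Instances[0].InstanceId' \
    --output text
}
```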

---

## 3. Enable it with Terraform: `RUNTIME_MONITORING` and the three toggles

Once the design is settled, put it into code. Runtime Monitoring is added with the **`aws_guardduty_detector_feature`** resource, separate from the `aws_guardduty_detector` resource itself.

```hcl
# Enable Runtime Monitoring and let GuardDuty manage agent deployment on each surface.
# Cost grows with protected vCPU, so enable this only where runtime visibility is really needed.
resource "aws_guardduty_detector_feature" "runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "RUNTIME_MONITORING"
  status      = "ENABLED"

  # --- Per-surface switch: should GuardDuty manage the agent automatically? ---
  # ENABLED: GuardDuty deploys and updates the agent on that surface (automatic management).
  # DISABLED: Runtime Monitoring stays enabled, but the agent on that surface is yours
  # to manage (manual management: you run the add-on / SSM rollout yourself).

  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT" # EKS: auto-deploy and update the aws-guardduty-agent add-on
    status = "ENABLED"
  }
  additional_configuration {
    name   = "ECS_FARGATE_AGENT_MANAGEMENT" # Fargate: auto-inject the sidecar (no manual option)
    status = "ENABLED"
  }
  additional_configuration {
    name   = "EC2_AGENT_MANAGEMENT" # EC2: manage the agent automatically via SSM
    status = "ENABLED"
  }
}
```

Three design points.

- **The toggles are per-surface automatic-management switches.** With `RUNTIME_MONITORING` set to `ENABLED`, you switch `EKS_ADDON_MANAGEMENT` / `ECS_FARGATE_AGENT_MANAGEMENT` / `EC2_AGENT_MANAGEMENT` individually. To **make only EKS manual**, set `EKS_ADDON_MANAGEMENT` to `DISABLED` and manage the add-on separately (next point).
- **The EKS add-on can be declared as its own resource.** With automatic management (`EKS_ADDON_MANAGEMENT = ENABLED`), GuardDuty installs the add-on; **if you manage it manually, declare the EKS add-on explicitly in Terraform.**

```hcl
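# [Manual management only] Manage the EKS add-on yourself.
# Useful when you want to pin the version and run it through the cluster's change-management process.
# Not needed with automatic management (EKS_ADDON_MANAGEMENT = ENABLED).
resource "aws_eks_addon" "guardduty_agent" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "aws-guardduty-agent"

  # Pin the version only after checking the official compatibility table to confirm
  # that your Kubernetes version supports this agent version.
  # Manage it via a variable or data source rather than hardcoding (the table changes).
  addon_version = var.guardduty_agent_addon_version

  # Behavior on conflict with existing settings. Use PRESERVE if you override parameters (CPU/memory, etc.) manually.
  resolve_conflicts_on_update = "PRESERVE"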
}
```

- **To enable it in bulk across the organization,** roll out `RUNTIME_MONITORING` org-wide with `aws_guardduty_organization_configuration_feature`, using the same structure as in the pillar article. But **because it's a cost driver, estimate the cost before defaulting it ON org-wide** (chapter 6).

> **Relationship with `EKS_RUNTIME_MONITORING` (confirm before relying on this)**: historically there was an **independent feature**, `EKS_RUNTIME_MONITORING`, dedicated to EKS. It is now **being consolidated into `RUNTIME_MONITORING`** (the unified feature covering EKS/ECS/EC2). For new builds, **use `RUNTIME_MONITORING`**. Because the migration path and deadline for existing environments on `EKS_RUNTIME_MONITORING` may be revised, **always check the current treatment in the official documentation** (at the time of writing, only the consolidation direction is confirmed; defer to the official documentation for any specific deprecation schedule).
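
To see which of the two features an existing detector actually has enabled, something like the following may help. It's a minimal sketch, assuming the AWS CLI is configured for the detector's region; the function names are hypothetical. A feature the detector doesn't report shows up as `None`.

```bash
# Print the status of one detector feature (ENABLED / DISABLED / None).
runtime_feature_status() {
  local detector_id="$1" feature="${2:-RUNTIME_MONITORING}"
  aws guardduty get-detector \
    --detector-id "$detector_id" \
    --query "Features[?Name=='${feature}'].Status | [0]" \
    --output text
}

# Report the unified feature and the legacy EKS-only feature side by side.
report_runtime_features() {
  local detector_id="$1" f
  for f in RUNTIME_MONITORING EKS_RUNTIME_MONITORING; do
    printf '%s=%s\n' "$f" "$(runtime_feature_status "$detector_id" "$f")"
  done
}
```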

---

## 4. Confirm coverage: plug the "thought-I-enabled-it" hole

This is where **accidents most often happen** when operating Runtime Monitoring. Enabling it in Terraform isn't enough; you must separately confirm **that the agent is really delivering events**. The **coverage status** is what tells you.

### 4.1 What Healthy / Unhealthy means

The official definition is clear: the coverage status becomes `Healthy` only when **all three conditions** hold.

> *Coverage status is determined by making sure that you have enabled Runtime Monitoring, your Amazon VPC endpoint has been created, and the GuardDuty security agent for the corresponding resource has been deployed.*

| Status | Meaning | Consequence |
| --- | --- | --- |
| **Healthy** | All three are in place (① Runtime Monitoring enabled ② VPC endpoint created ③ agent deployed) and runtime events reach GuardDuty | Findings can be generated normally |
| **Unhealthy** | Something is wrong with one of the above (configuration, VPC endpoint, or agent deployment) | **GuardDuty can't receive events and generates no Runtime Monitoring findings at all** |

The decisive point: **while a resource is `Unhealthy`, it yields zero findings.** When you've enabled the feature but no alerts arrive, you can't tell "all quiet" from "blind because `Unhealthy`" **without looking at coverage**. So always treat enabling and confirming `Healthy` coverage as **one inseparable step**.

### 4.2 Confirm coverage with bash

GuardDuty has **APIs that return coverage statistics and the status of individual resources**, so you can build them into CI or a post-deploy verification script.

```bash
#!/usr/bin/env bash
# Check Runtime Monitoring coverage and list Unhealthy resources.
# Use for post-deploy verification or periodic audits to close the "thought I enabled it" gap mechanically.
set -euo pipefail

DETECTOR_ID="${1:?Usage: check-coverage.sh <detector-id>}"

# (1) Get coverage statistics grouped by coverage status (Healthy/Unhealthy counts)
#     for an at-a-glance overview.
echo "=== Coverage statistics by coverage status ==="
aws guardduty get-coverage-statistics \
  --detector-id "$DETECTOR_ID" \
  --statistics-type COUNT_BY_COVERAGE_STATUS \
  --output table

# (2) List only Unhealthy resources (to identify what to investigate).
#     The filter narrows results to coverage status = UNHEALTHY.
echo "=== Unhealthy resources (need troubleshooting) ==="
aws guardduty list-coverage \
  --detector-id "$DETECTOR_ID" \
  --filter-criteria '{
    "FilterCriterion": [
      { "CriterionKey": "COVERAGE_STATUS", "FilterCondition": { "Equals": ["UNHEALTHY"] } }
    ]
  }' \
  --query 'Resources[].{Id:ResourceId,Type:ResourceDetails.ResourceType,Status:CoverageStatus,Issue:Issue}' \
  --output table
```

> **Build the verification path first**: I recommend running this script as a **post-deploy hook in CI**. Run it right after turning `RUNTIME_MONITORING` on in Terraform, and **don't consider the deploy complete until the important workloads are confirmed `Healthy`**. It's the most effective guard for structurally eliminating the "thought I enabled it" gap.
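
The script above prints tables for a human reader. For a CI hook you also need a non-zero exit code. Here is a minimal sketch of such a gate, built on the same `list-coverage` call; the function names are hypothetical, and the "fail on any Unhealthy resource" policy is an assumption you may want to relax (for example, by filtering to specific resource IDs).

```bash
# Count resources whose coverage status is UNHEALTHY.
count_unhealthy() {
  local detector_id="$1" ids
  ids=$(aws guardduty list-coverage \
    --detector-id "$detector_id" \
    --filter-criteria '{"FilterCriterion":[{"CriterionKey":"COVERAGE_STATUS","FilterCondition":{"Equals":["UNHEALTHY"]}}]}' \
    --query 'Resources[].ResourceId' \
    --output text)
  # Text output is tab/newline separated; skip blanks and the literal "None".
  printf '%s\n' "$ids" | tr '\t' '\n' | grep -c -v -e '^$' -e '^None$' || true
}

# Return non-zero (failing the CI step) when anything is UNHEALTHY.
coverage_gate() {
  local detector_id="$1" n
  n=$(count_unhealthy "$detector_id")
  if [ "$n" -gt 0 ]; then
    echo "Runtime Monitoring coverage gate failed: $n UNHEALTHY resource(s)" >&2
    return 1
  fi
  echo "Runtime Monitoring coverage gate passed"
}
```

Note that this gate also passes when no resources are listed at all, so it is worth pairing it with a check that the workloads you expect actually appear as `HEALTHY`.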

### 4.3 Typical causes of Unhealthy and troubleshooting

`Unhealthy` means at least one of the three conditions is missing. Triage per surface.

| Surface | Common Unhealthy cause | Direction of action |
| --- | --- | --- |
| **Common** | **VPC endpoint not created or unreachable** (with manual management you must create it yourself) | Confirm the endpoint exists and check its security groups and routes |
| **EC2** | The instance is **not managed by SSM** (the prerequisite for automatic management isn't met) | Check that SSM Agent is installed and the instance's IAM role allows SSM |
| **EC2** | The OS/kernel is **unsupported**, or the `CONFIG_DEBUG_INFO_BTF` flag is unset | Check the official verified platforms |
| **EKS** | The add-on `aws-guardduty-agent` is **not deployed**, or the Kubernetes version is unsupported | Check the add-on's status and the supported-version table |
| **EKS** | The agent Pod has **hit its CPU/memory limit** | Adjust the add-on's resource settings (chapter 5) |
| **Organization** | An SCP denies **`guardduty:SendSecurityTelemetry`** | Check that no SCP (or permissions boundary) denies this action |

If you chose manual management, note that **creating the VPC endpoint is your responsibility** (with automatic management, GuardDuty creates it): *"Install agent manually, which requires you to create the VPC endpoint as a prerequisite."* Forgetting this is the classic pitfall: the agent is running, but events never arrive and coverage stays `Unhealthy`.
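
The first two rows of the table can be checked from the command line. The following is a minimal sketch with hypothetical function names. The endpoint service name `com.amazonaws.<region>.guardduty-data` is stated here as my understanding of the naming and should be verified against the official documentation for your region.

```bash
# Show the GuardDuty data endpoint(s) in a VPC and their state.
# Empty output suggests the endpoint has not been created in this VPC.
guardduty_endpoint_state() {
  local region="$1" vpc_id="$2"
  aws ec2 describe-vpc-endpoints \
    --region "$region" \
    --filters "Name=vpc-id,Values=${vpc_id}" \
              "Name=service-name,Values=com.amazonaws.${region}.guardduty-data" \
    --query 'VpcEndpoints[].[VpcEndpointId,State]' \
    --output text
}

# Show the SSM ping status of an instance.
# Empty output suggests the instance is not managed by SSM.
ssm_ping_status() {
  local instance_id="$1"
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=${instance_id}" \
    --query 'InstanceInformationList[].PingStatus' \
    --output text
}
```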

---

## 5. Is the agent "lightweight": pin down the CPU/memory limits with numbers

When you're told to put a security agent into production, the first concern is **the load on the workload**. The agent can be called "lightweight" because **explicit limits are set**. Let's pin them down with official numbers rather than guesses.

| Surface | CPU limit | Memory limit | Note |
| --- | --- | --- | --- |
| **EC2** | **up to 10% of total vCPU cores** | follows the official table | with 4 vCPUs, up to 40% out of the available 400% |
| **EKS** (add-on `aws-guardduty-agent`) | **200m–1000m** (0.2–1.0 vCPU) | **256Mi–1024Mi** | the values are configurable from add-on v1.5.0 onward |
| **ECS-Fargate** | follows the sidecar's resource allocation | follows the sidecar's resource allocation | monitor measured values with Container Insights |

Three points.

- **EC2 is capped by ratio.** *"The maximum CPU limit for the GuardDuty security agent associated with Amazon EC2 instances is 10 percent of the total vCPU cores."* Larger instances allow a larger absolute amount, but because the **ratio is fixed (10%)**, the agent is unlikely to crowd out the workload.
- **EKS is capped by absolute value.** The add-on's defaults are CPU `200m`–`1000m` and memory `256Mi`–`1024Mi`. **From v1.5.0 onward, `CPU` / `Memory` / `PriorityClass` / `dnsPolicy` are configurable,** so you can tune them to the workload and instance size. If Container Insights shows the agent hitting its limit, raise these values.
- **Monitor by measuring.** The official documentation recommends monitoring CPU/memory consumption with **Container Insights** (for both ECS and EKS). **Don't take "lightweight" on faith; measure it.** That is the production discipline.
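
The EC2 ratio cap is easy to misread, so here is the arithmetic as a tiny sketch (the function name is hypothetical). CPU percentages are expressed the way tools like `top` show them, where each vCPU contributes 100%.

```bash
# EC2 agent CPU cap: 10% of total vCPU capacity, where N vCPUs = N x 100%.
# So the cap in "percent of one core" terms is N x 10.
ec2_agent_cpu_cap_percent() {
  local vcpus="$1"
  echo $(( vcpus * 10 ))
}

# Example: a 4-vCPU instance has 400% available, of which the agent
# may use at most 40%.
ec2_agent_cpu_cap_percent 4   # prints 40
```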

> **A note on `PriorityClass`**: the EKS add-on's default `PriorityClass` gives the agent Pod no special priority. Under node resource pressure, if the agent Pod is **evicted first**, coverage drops to `Unhealthy` at that moment. For clusters where security visibility must not break, consider changing it to something like `system-node-critical`, **weighing the trade-off (eviction risk for important workloads vs. continuity of monitoring).**
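
Before overriding any of these parameters, it helps to look at what the add-on version actually accepts and whether the DaemonSet is fully scheduled. This is a minimal sketch with hypothetical function names. The add-on version is a placeholder, and the `amazon-guardduty` namespace and `aws-guardduty-agent` DaemonSet name are assumptions to verify in your cluster.

```bash
# Print the JSON schema of configuration values the add-on version accepts,
# so overrides (CPU/memory, PriorityClass, dnsPolicy) use the real keys.
show_guardduty_addon_schema() {
  local addon_version="$1"
  aws eks describe-addon-configuration \
    --addon-name aws-guardduty-agent \
    --addon-version "$addon_version" \
    --query 'configurationSchema' \
    --output text
}

# Print "ready/desired" for the agent DaemonSet; a gap between the two
# means some nodes currently have no running agent Pod.
agent_daemonset_status() {
  kubectl get daemonset aws-guardduty-agent -n amazon-guardduty \
    -o jsonpath='{.status.numberReady}/{.status.desiredNumberScheduled}{"\n"}'
}
```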

---

## 6. Cost discipline: the biggest driver, proportional to protected vCPU

This is the **core decision** in operating Runtime Monitoring. It is **billed in proportion to protected vCPU** and, among GuardDuty's protection plans, is the one **most likely to become the most expensive**. GuardDuty's overall cost optimization is covered in the [dedicated article](/blog/aws-guardduty-cost-optimization-pricing-finops-guide); here are three points specific to Runtime Monitoring.

### 6.1 Billing is "the vCPU scale of the protected workload × running time"

Billing is based on **the vCPU count of the provisioned instances and tasks being monitored × their running time**. In other words, **the more vCPUs a workload has and the longer you protect it, the more you pay**. This differs in nature from foundational detection (billed by event volume or GB) and is **tied directly to the scale of your assets**. So **"just enable it on every EC2 instance and every cluster" is the fastest way to make the vCPU bill snowball**.
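
The billing shape can be sketched as simple arithmetic. The function name below is hypothetical, and the per-vCPU-hour rate is a **placeholder, not a real AWS price**. Actual pricing varies by region and may be tiered, so take the rate from the official pricing page.

```bash
# Rough monthly estimate: protected vCPUs x running hours x rate per vCPU-hour.
# All three inputs are supplied by you; the rate must come from the
# official pricing page for your region.
estimate_runtime_monitoring_cost() {
  local vcpus="$1" hours="$2" rate="$3"
  awk -v v="$vcpus" -v h="$hours" -v r="$rate" \
    'BEGIN { printf "%.2f\n", v * h * r }'
}

# Example with a made-up rate of 0.002 per vCPU-hour:
# 8 vCPUs running 730 hours -> 8 * 730 * 0.002
estimate_runtime_monitoring_cost 8 730 0.002   # prints 11.68
```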

### 6.2 So "narrow it to important workloads"

The basic design principle is to **narrow the scope by importance**. The phased rollout I recommend:

1. **Start with the surfaces that need attack sequences**: first enable it on the **important production workloads** where you want ECS/EC2 attack sequences (Critical findings).
2. **Then surfaces handling sensitive data**: workloads where a compromise has a large impact, such as payments, personal information, and authentication infrastructure.
3. **Skip the rest, or exclude it explicitly with the EC2 exclusion tag.**

With a limited security budget, **concentrating investment on high-impact assets and maximizing cost-effectiveness, including the ETD correlation benefit,** beats "enable it everywhere for peace of mind."
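
To size a candidate scope before enabling anything, you can total the vCPUs of the running instances that carry a given tag. This is a minimal sketch; the function name and the tag key/value (for example `Tier=critical`) are hypothetical and stand in for whatever convention marks your important workloads.

```bash
# Sum vCPUs (cores x threads per core) of running instances with a given tag.
sum_running_vcpus() {
  local tag_key="$1" tag_value="$2"
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
              "Name=tag:${tag_key},Values=${tag_value}" \
    --query 'Reservations[].Instances[].[CpuOptions.CoreCount,CpuOptions.ThreadsPerCore]' \
    --output text |
    awk 'NF == 2 { total += $1 * $2 } END { print total + 0 }'
}
```

The resulting vCPU total, multiplied by expected running hours and the current rate, gives a first-order estimate for that scope.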

### 6.3 The "VPC Flow Logs double-charge exemption" worth knowing

When discussing Runtime Monitoring's cost, this is **an easily overlooked relief**. The official text:

> *When you manage the security agent ... and GuardDuty is presently deployed on an Amazon EC2 instance and receives the Collected runtime event types from this instance, GuardDuty will not charge your AWS account for the analysis of VPC flow logs from this Amazon EC2 instance. This helps GuardDuty avoid double usage cost in the account.*

In other words, **if the agent runs on an EC2 instance and GuardDuty receives that instance's runtime events, GuardDuty doesn't charge for analyzing that instance's VPC Flow Logs**. Because Runtime Monitoring already sees network connections through the agent, you are **not billed twice** for the same instance. Runtime Monitoring's vCPU billing thus **offsets part of the foundational Flow Logs billing**, so factor this exemption into the cost estimate (treating Runtime Monitoring as a pure add-on overstates the cost).

> **The 30-day free trial for new regions**: Runtime Monitoring also comes with a 30-day free trial. You can **gauge the expected bill at production volume before billing starts**. Measurements taken during the trial sharpen the decision about which workloads to cover.

---

## 7. Reading findings: Runtime Monitoring's type families

Runtime Monitoring findings can be identified at a glance by their resource segment, **`Runtime`** (finding types take the form `ThreatPurpose:Runtime/ThreatName`). They form a separate lineage from the foundational `EC2`, `S3`, `IAMUser`, and so on. Here are the major families by threat purpose.

| Family (ThreatPurpose) | Representative finding types | What it captures |
| --- | --- | --- |
| **Execution** | `Execution:Runtime/NewBinaryExecuted`, `ReverseShell`, `SuspiciousTool`, `NewLibraryLoaded` | execution of a new binary/library, reverse shell, suspicious tool |
| **PrivilegeEscalation** | `PrivilegeEscalation:Runtime/DockerSocketAccessed`, `RuncContainerEscape`, `ElevationToRoot`, `ContainerMountsHostDirectory` | Docker socket access, container escape, root elevation, host-directory mount |
| **DefenseEvasion** | `DefenseEvasion:Runtime/ProcessInjection.*`, `FilelessExecution`, `KernelModuleLoaded`, `PtraceAntiDebugging` | process injection, fileless execution, kernel module loading |
| **Persistence** | `Persistence:Runtime/SensitiveFileModified`, `SuspiciousCommand` | modification of sensitive files, a command aiming for persistence |
| **Impact** | `Impact:Runtime/CryptoMinerExecuted`, `MaliciousDomainRequest.Reputation`, etc. | execution of a cryptocurrency miner, communication to a malicious domain |
| **CryptoCurrency** | `CryptoCurrency:Runtime/BitcoinTool.B` (has a `!DNS` derivative) | inquiries to cryptocurrency-related IPs/domains |
| **Backdoor** | `Backdoor:Runtime/C&CActivity.B` (has a `!DNS` derivative) | communication with a C&C server |
| **Trojan** | `Trojan:Runtime/BlackholeTraffic`, `DropPoint`, `DGADomainRequest.C!DNS`, etc. | typical trojan communication patterns |
| **UnauthorizedAccess** | `UnauthorizedAccess:Runtime/TorRelay`, `TorClient`, `MetadataDNSRebind` | Tor relay/client, metadata DNS rebind |
| **Discovery** | `Discovery:Runtime/SuspiciousCommand` | a command aiming for reconnaissance |

Two implementation points.

- **You can route by the `Runtime` segment.** In EventBridge auto-response (chapter 6 of the pillar article), findings whose `type` matches `*:Runtime/*` indicate a compromise inside the container or host, so you can consider a response that goes beyond network isolation to **stopping or isolating the affected Pod/task**. Because the finding includes process-lineage information, investigation is fast.
- **Sanitizing finding fields is mandatory**: the official documentation states an important caveat. Because Runtime Monitoring findings **include values an attacker can control, such as file paths,** *"When processing Runtime Monitoring findings outside of GuardDuty console, you must sanitize finding fields. For example, you can HTML encode finding fields."* When you display or forward findings on the web in auto-response or notifications, **always sanitize them, for example with HTML escaping** (treat them as untrusted input). This matches the "validate and escape external input" rule in my [route conventions](/blog/aws-ecs-fargate-production-guide).
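
Both points can be sketched in a few lines of bash. The function names are hypothetical. The event pattern assumes EventBridge's `wildcard` matching operator, and the escaping covers only the five characters that matter for HTML contexts, so treat it as a starting point rather than a complete sanitizer.

```bash
# EventBridge event pattern that matches only Runtime Monitoring findings.
runtime_finding_event_pattern() {
  cat <<'JSON'
{
  "source": ["aws.guardduty"],
  "detail-type": ["GuardDuty Finding"],
  "detail": { "type": [{ "wildcard": "*:Runtime/*" }] }
}
JSON
}

# HTML-escape an attacker-controllable finding field (file path, command line).
# "&" is escaped first so entities produced by later rules are not re-escaped.
html_escape() {
  printf '%s' "$1" | sed \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' \
    -e "s/'/\&#39;/g"
}
```

The pattern could be passed to `aws events put-rule --event-pattern "$(runtime_finding_event_pattern)"`, and `html_escape` applied to each field before it is rendered in a notification or dashboard.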

---

## 8. Decision table: on which surface, automatic or manual, to put it in

Let's turn all of this into a practical decision: **which workload, which surface, automatic or manual**, on one sheet.

| Your situation | Enable it? | Surface | Automatic / manual | Reason |
| --- | --- | --- | --- | --- |
| Running production APIs/workers on Fargate | **Enable** (the lightest rollout) | ECS-Fargate | **automatic only** | GuardDuty fully manages sidecar injection. The ECS attack sequence is also unlocked |
| Operating a production cluster on EKS (EC2 nodes) | **Enable** | EKS | **automatic** in principle | GuardDuty updates the add-on. Low operational load |
| Want to pin the version strictly or use change management on EKS | Enable | EKS | **manual** | You own the add-on's lifecycle. Checking the Kubernetes compatibility table is your responsibility |
| Running sensitive workloads on EC2, already SSM-managed | **Enable** | EC2 | **automatic** | SSM is already in place, so automatic deployment is straightforward |
| EC2 but can't or won't use SSM | Enable | EC2 | **manual** | No SSM dependency. Creating the VPC endpoint is your responsibility |
| Running EKS on **Fargate** | **Not possible** | - | - | **Unsupported** (EKS on Fargate and Hybrid Nodes are out of scope) |
| **ECS** on EC2 nodes (not Fargate) | Possible (runtime detection works) | EC2 | automatic/manual | But **the ECS attack sequence is unsupported** (Fargate-ECS or EC2 is the premise) |
| Small dev/test workloads | **Skip in principle** | - | - | The detection value rarely justifies the vCPU billing. Exclude explicitly with the exclusion tag |

**The decision has only two axes**: ① **is the workload important enough to justify runtime visibility and attack sequences** (is it worth the cost), and ② **do you need to own version management** (automatic vs. manual). Those two axes are enough to place any workload in the table above.

---

## 9. Summary: GuardDuty Runtime Monitoring production cheat sheet

A quick reference for when you're lost.

- **What layer it is**: Runtime Monitoring is an optional protection plan that observes **the inside of the workload** (OS-level processes, files, networking) with an **eBPF security agent.** It complements agentless foundational detection, which sees only the outside.
- **The three distribution surfaces**: **EKS** = the managed add-on `aws-guardduty-agent` (DaemonSet) / **ECS-Fargate** = an injected sidecar (**automatic only**) / **EC2** = via SSM. The first step is understanding that distribution differs per surface.
- **Automatic vs. manual**: when in doubt, choose **automatic** (low operational load). **Choose manual only when you need to pin the version or route it through change management.** Fargate has no manual option. With manual, **creating the VPC endpoint is your responsibility.**
- **Terraform**: `aws_guardduty_detector_feature` with `RUNTIME_MONITORING` plus the three toggles (`EKS_ADDON_MANAGEMENT` / `ECS_FARGATE_AGENT_MANAGEMENT` / `EC2_AGENT_MANAGEMENT`). For new builds use `RUNTIME_MONITORING` (`EKS_RUNTIME_MONITORING` is being consolidated into it).
- **Coverage confirmation (mandatory)**: enabled ≠ protected. **Healthy = all three in place: Runtime Monitoring enabled + VPC endpoint + agent placement.** With `Unhealthy`, no findings are produced. Check mechanically with `get-coverage-statistics` / `list-coverage` and build the check into CI.
- **Agent overhead in numbers**: **on EC2 the agent's CPU limit is 10% of total vCPU,** and **the EKS add-on uses CPU 200m–1000m, memory 256Mi–1024Mi** (configurable from v1.5.0 onward). Measure with Container Insights.
- **Cost discipline**: it is GuardDuty's biggest cost driver, **proportional to protected vCPU.** **Narrow it to important workloads.** However, **the VPC Flow Logs analysis cost is exempted for EC2 instances where the agent runs** (no double charge), so factor that into the estimate. Measure actual cost with the 30-day free trial.
- **Support-scope traps**: **EKS on Fargate and EKS Hybrid Nodes are unsupported.** **ECS on EC2 nodes doesn't support the ECS attack sequence** (it presupposes Fargate-ECS or EC2). Always confirm before designing.
- **The support matrix**: supported OS/kernel/K8s versions are **updated frequently.** **Don't hard-code them; check the official list of verified platforms.** Don't forget kernel requirements such as `CONFIG_DEBUG_INFO_BTF=y`.
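
The coverage item above is the easiest one to automate. Below is a minimal sketch of a CI gate, assuming the AWS CLI is installed and authenticated and that `get-coverage-statistics` is called with `--statistics-type COUNT_BY_COVERAGE_STATUS`. The function names `coverage_gate` and `check_coverage` are hypothetical, and failing the build when zero resources are Healthy is a policy choice made for this example, not an AWS rule.

```bash
#!/usr/bin/env bash
# Sketch of a CI coverage gate. Function names are hypothetical.

# Pure check: fail if any resource is Unhealthy, or if nothing is covered.
coverage_gate() {
  local healthy="${1:-0}" unhealthy="${2:-0}"

  # The AWS CLI prints "None" in text output when a queried key is absent;
  # treat that as zero.
  if [ "$healthy" = "None" ]; then healthy=0; fi
  if [ "$unhealthy" = "None" ]; then unhealthy=0; fi

  if [ "$unhealthy" -gt 0 ]; then
    echo "FAIL: $unhealthy unhealthy resource(s)" >&2
    return 1
  fi
  if [ "$healthy" -eq 0 ]; then
    echo "FAIL: no healthy resources reported" >&2
    return 1
  fi
  echo "OK: $healthy healthy, 0 unhealthy"
}

# Fetch the counts from GuardDuty and apply the gate.
check_coverage() {
  local detector_id="$1" counts healthy unhealthy

  counts="$(aws guardduty get-coverage-statistics \
    --detector-id "$detector_id" \
    --statistics-type COUNT_BY_COVERAGE_STATUS \
    --query 'CoverageStatistics.CountByCoverageStatus.[HEALTHY, UNHEALTHY]' \
    --output text)" || return 2

  # Text output is whitespace-separated: "<healthy> <unhealthy>".
  read -r healthy unhealthy <<<"$counts"
  coverage_gate "$healthy" "$unhealthy"
}
```

In a pipeline step this could be called as `check_coverage "$DETECTOR_ID" || exit 1`, where `DETECTOR_ID` is a variable you supply. Keeping `coverage_gate` separate from the AWS call makes the pass/fail rule testable without credentials. To see which resources are Unhealthy and why, follow up with `list-coverage`.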

Runtime Monitoring is not "a box that sees inside for you once enabled" but a feature that **produces value only once you've designed all four parts: ① which surface, ② how to distribute, ③ how to confirm coverage, and ④ how to control cost.** The biggest leverage lies in the discipline of **"narrow to important workloads, guarantee Healthy coverage in CI, and optimize cost including the Flow Logs exemption."**

I implemented IAM, observability, and DR across a multi-account [serverless payment platform](/case-studies/payment-platform-reliability), and have run production workloads on [ECS on Fargate](/blog/aws-ecs-fargate-production-guide). I design Runtime Monitoring rollouts with the same philosophy: **① identify the important workloads and narrow the scope, ② choose automatic or manual based on requirements, ③ build the Healthy coverage check into the verification path, and ④ estimate cost with both vCPU billing and the Flow Logs exemption factored in.** The goal is to structurally eliminate the "I thought I'd enabled it" gap and get detection fully into operation.

**"How should Runtime Monitoring be designed for your EKS / Fargate / EC2, how much should be left to automatic management, and how do you keep cost down?" From selecting the distribution surface to Terraform implementation, coverage verification, cost estimation, and attack-sequence integration, I can support you quickly and safely as one person × generative AI (Claude Code).** Feel free to reach out, even at the requirements-gathering stage.

---

### References (official documentation)

- [GuardDuty Runtime Monitoring](https://docs.aws.amazon.com/guardduty/latest/ug/runtime-monitoring.html) — Runtime Monitoring's definition, supported resources, the security agent's observation targets
- [How Runtime Monitoring works](https://docs.aws.amazon.com/guardduty/latest/ug/how-does-runtime-monitoring-work.html) — the two stages of enablement + agent management, VPC endpoint, **the VPC Flow Logs double-charge exemption**
- [Managing GuardDuty security agents](https://docs.aws.amazon.com/guardduty/latest/ug/runtime-monitoring-managing-agents.html) — automatic/manual management, per-surface agent management, the premise of Healthy/Unhealthy
- [Reviewing runtime coverage statistics and troubleshooting](https://docs.aws.amazon.com/guardduty/latest/ug/runtime-monitoring-assessing-coverage.html) — definition of the coverage status (Healthy = the three conditions met / Unhealthy = zero findings)
- [Prerequisites for Amazon EC2 instance support](https://docs.aws.amazon.com/guardduty/latest/ug/prereq-runtime-monitoring-ec2-support.html) — SSM management, supported architectures, **the EC2 CPU limit (10% of total vCPU)**, kernel requirements
- [Prerequisites for Amazon EKS cluster support](https://docs.aws.amazon.com/guardduty/latest/ug/prereq-runtime-monitoring-eks-support.html) — verified platforms (OS/kernel/K8s correspondence table), **the EKS add-on's CPU/memory limits**, Fargate-EKS / Hybrid Nodes unsupported
- [Configure GuardDuty security agent (add-on) parameters for Amazon EKS](https://docs.aws.amazon.com/guardduty/latest/ug/guardduty-configure-security-agent-eks-addon.html) — v1.5.0+'s CPU/memory, `PriorityClass`, `dnsPolicy` settings
- [GuardDuty Runtime Monitoring finding types](https://docs.aws.amazon.com/guardduty/latest/ug/findings-runtime-monitoring.html) — the list of `*:Runtime/*` finding types and the sanitization caveat
- [Amazon GuardDuty pricing](https://aws.amazon.com/guardduty/pricing/) — Runtime Monitoring's protected-vCPU-based billing, the 30-day free trial
