"This production setting: who changed it, and when?" On the night of an incident this is the first question asked, and whether you can answer it immediately determines your time to recovery.
For a system where money moves, such as a payment platform, this is not a matter of attitude or good intentions. Being able to keep, in tamper-evident form, a record of who called which AWS API, when, with what permissions, and from where is the starting point for everything: compliance audits, fraud investigation, and root-cause analysis of failures. I designed and led the reliability layer of a serverless (Lambda + DynamoDB) payment platform and maintained zero double-charges in production. Underlying that result is the discipline of building, from the start, a state where "correctness can be proven with code and an audit trail." At its core is AWS CloudTrail.
This article is an implementation guide for designing and operating CloudTrail at production quality. Rather than stopping at "just enable it," we'll assemble it end to end (multi-region trail, encryption, integrity validation, real-time detection, long-term investigation, and cost optimization) with real Terraform / TypeScript / SQL code.
The rules of this article: specs, pricing, conditions, and feature statuses are all cross-checked against the AWS official docs (as of June 2026). Pricing and managed features in particular are revised fast, so always check the official pricing page and the latest docs before going to production. Account IDs (111122223333), bucket names, and regions are illustrative.
0. Mental Model: CloudTrail Is the "API Ledger of an AWS Account"
Before starting design, let's fix in one line what CloudTrail is and is not.
CloudTrail = a service for governance, compliance, operational auditing, and risk auditing that records operations performed within an AWS account as "events." Operations via the console, CLI, SDK, or API — wherever they come through — are recorded.
The official definition is exactly this.
Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. Events include actions taken in the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs.
From this come three consequences that matter in the field.
- CloudTrail is not application observability. Where OpenTelemetry looks at "what happened inside the app (traces, metrics, logs)," CloudTrail looks at "who hit which API on the AWS management and data planes." The two are complementary, and they're different things (for the app side's three pillars, go to Observability with OpenTelemetry).
- CloudTrail doesn't line up logs "in order." As the docs state explicitly, logs are not a stack trace, and events do not appear in a particular order. You sort by eventTime yourself to track the timeline.
- "Enabled" and "usable as evidence" are different. As shown later, the default event history is 90 days, management events only. Permanent retention, tamper detection, and capturing all event types require you to create a trail yourself.
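Since events arrive in no particular order, rebuilding a timeline is a small sorting step. The following is a minimal sketch of it; the `TimelineRecord` shape and the `toTimeline` helper are illustrative names, not part of any AWS SDK, and using eventID as a tiebreaker is only an assumption to make the output deterministic.

```typescript
// Minimal shape of a CloudTrail record for timeline work; real records carry many more fields.
interface TimelineRecord {
  eventTime: string; // ISO 8601 UTC, e.g. "2026-06-27T02:14:51Z"
  eventName: string;
  eventID: string;
}

// Returns a new array ordered by eventTime (oldest first).
// eventID is used only as a deterministic tiebreaker for events within the same second.
function toTimeline<T extends TimelineRecord>(records: readonly T[]): T[] {
  return [...records].sort((a, b) => {
    const diff = Date.parse(a.eventTime) - Date.parse(b.eventTime);
    return diff !== 0 ? diff : a.eventID.localeCompare(b.eventID);
  });
}

const timeline = toTimeline([
  { eventTime: "2026-06-27T02:14:51Z", eventName: "ConsoleLogin", eventID: "b" },
  { eventTime: "2026-06-27T02:13:05Z", eventName: "StopLogging", eventID: "a" },
  { eventTime: "2026-06-27T02:14:51Z", eventName: "CreateUser", eventID: "a" },
]);
console.log(timeline.map((r) => r.eventName).join(" -> "));
```

Sorting a copy rather than the input keeps the raw evidence untouched, which matters when the same array is later written to an incident record.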
1. The Overall Map: The Four Event Types and "Event History vs. Trail"
The shortest route to understanding CloudTrail is to grasp two axes separately: "what gets recorded (event type)" and "where it accumulates (event history / trail / Lake)."
1-1. The Four Event Types That Get Recorded
The docs define four kinds. By default, only management events are recorded; data / Insights / network activity are not.
| Event type | What it records | Default | Billing |
|---|---|---|---|
| Management | Control-plane operations (RunInstances, CreateUser, ConsoleLogin, etc.). Read/write selectable separately | Recording ON | The first copy is free per region |
| Data | Data-plane operations (S3 object GetObject, Lambda Invoke, DynamoDB PutItem, etc.). High volume | OFF | Billed from the first copy |
| Insights | Anomaly detection of API call rate / error rate. Continuously analyzes both management and data | OFF | Billed per analysis target |
| Network activity | API activity via VPC endpoints. Detects approach by credentials from outside the org | OFF | Billed |
Network activity events are a relatively new feature that went GA in February 2025 (starting with 5 services — S3, EC2, KMS, Secrets Manager, CloudTrail — with the supported services continually expanding). Because you can see "who is using a VPC endpoint, from where," it's effective for detective controls of a data perimeter.
1-2. Where It Accumulates: Event History, Trail, CloudTrail Lake
This is where the most common misconception arises. "CloudTrail is on by default" is half right and half dangerous.
- Event history: automatically on and free from account creation. But the constraints are strong.
  "The Event history provides a viewable, searchable, downloadable, and immutable record of the past 90 days of management events in an AWS Region."
  That is, "the past 90 days, management events only, a single region." Data events are not included, and it vanishes after 90 days. It is not a permanent record for use as evidence.
- Trail: the setting that "continuously delivers events to S3" (optionally to CloudWatch Logs / EventBridge too). Retention beyond 90 days, data events, integrity validation, and all-region aggregation all presuppose a trail. This is what to do first in production.
- CloudTrail Lake: a managed audit data lake you can query with Trino SQL. As discussed later, it stopped accepting new customers as of May 31, 2026 (existing customers can keep using it). For a new build, Athena + S3 becomes the realistic option.
The starting point of design: don't rely on event history. Create one multi-region trail and deliver it permanently to S3. This is the foundation. In the next chapter we'll pin down an "unbreakable initial setup" with Terraform.
2. The First Step: A Production-Quality Initial Setup of a Multi-Region Trail with Terraform
Creating a trail is instantaneous, but to make it production quality you need to satisfy the following five from the start. These are exactly the official security best practices (the full list is organized in §8).
- Multi-region (don't drop events from any region)
- SSE-KMS encryption (confidentiality at rest)
- Log-file integrity validation (detect tampering / deletion)
- Least-privilege S3 bucket policy (restrict to the trail with aws:SourceArn)
- CloudWatch Logs integration (the foundation of real-time monitoring)
There's a trap where the default differs between the console and CLI/API. The docs state plainly "All trails created using the CloudTrail console are multi-Region trails," while creating via CLI/API or Terraform defaults to single-region. So in IaC you must explicitly set is_multi_region_trail = true.
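As a complement to the IaC settings, the sketch below checks an existing trail's configuration for four of the five requirements (the bucket policy cannot be judged from the trail description alone). The field names follow the trail object returned by the DescribeTrails API as I understand it; `findTrailGaps` and the sample values are hypothetical.

```typescript
// Subset of the fields describing a trail (as returned by DescribeTrails).
interface TrailConfig {
  Name: string;
  IsMultiRegionTrail?: boolean;
  LogFileValidationEnabled?: boolean;
  KmsKeyId?: string;
  CloudWatchLogsLogGroupArn?: string;
}

// Returns a human-readable list of missing production requirements (empty = all present).
function findTrailGaps(trail: TrailConfig): string[] {
  const gaps: string[] = [];
  if (!trail.IsMultiRegionTrail) gaps.push("not multi-region");
  if (!trail.KmsKeyId) gaps.push("no SSE-KMS key");
  if (!trail.LogFileValidationEnabled) gaps.push("log-file validation disabled");
  if (!trail.CloudWatchLogsLogGroupArn) gaps.push("no CloudWatch Logs integration");
  return gaps;
}

// Example: a trail created from the CLI with defaults plus a KMS key (illustrative values).
const cliDefaultTrail: TrailConfig = {
  Name: "org-audit-trail",
  IsMultiRegionTrail: false,
  KmsKeyId: "arn:aws:kms:us-east-1:111122223333:key/example",
};
console.log(findTrailGaps(cliDefaultTrail));
```

A check like this could run in CI against the output of `aws cloudtrail describe-trails`, so a trail that drifts back to single-region is caught before an audit does.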
2-1. The Delivery S3 Bucket (Versioning, Public Blocking, Encryption)
# Bucket dedicated to audit logs. Ideally it is isolated in a dedicated log-archive account (§8).
resource "aws_s3_bucket" "trail" {
bucket = "prod-audit-trail-111122223333"
}
# Versioning is mandatory as a safeguard against tampering and accidental deletion
resource "aws_s3_bucket_versioning" "trail" {
bucket = aws_s3_bucket.trail.id
versioning_configuration { status = "Enabled" }
}
# Audit logs must never become public
resource "aws_s3_bucket_public_access_block" "trail" {
bucket = aws_s3_bucket.trail.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# Default encryption on the S3 side (CloudTrail itself also encrypts with KMS; this adds a second layer)
resource "aws_s3_bucket_server_side_encryption_configuration" "trail" {
bucket = aws_s3_bucket.trail.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.trail.arn
}
bucket_key_enabled = true # Use an S3 Bucket Key to reduce KMS requests and cost
}
}
2-2. The Least-Privilege Bucket Policy (Constrain the Trail with aws:SourceArn)
For CloudTrail to write to the bucket, it needs two permissions: an ACL check and the write. The best practice is to restrict to "only writes from this trail" with the aws:SourceArn condition (preventing the confused-deputy problem).
data "aws_caller_identity" "current" {}
locals {
trail_arn = "arn:aws:cloudtrail:us-east-1:${data.aws_caller_identity.current.account_id}:trail/org-audit-trail"
}
data "aws_iam_policy_document" "trail_bucket" {
# ① Allow CloudTrail to check the bucket ACL
statement {
sid = "AWSCloudTrailAclCheck"
actions = ["s3:GetBucketAcl"]
resources = [aws_s3_bucket.trail.arn]
principals {
type = "Service"
identifiers = ["cloudtrail.amazonaws.com"]
}
condition {
test = "StringEquals"
variable = "aws:SourceArn"
values = [local.trail_arn]
}
}
# ② Allow CloudTrail to write log objects (bucket-owner-full-control is required)
statement {
sid = "AWSCloudTrailWrite"
actions = ["s3:PutObject"]
resources = ["${aws_s3_bucket.trail.arn}/AWSLogs/${data.aws_caller_identity.current.account_id}/*"]
principals {
type = "Service"
identifiers = ["cloudtrail.amazonaws.com"]
}
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = ["bucket-owner-full-control"]
}
condition {
test = "StringEquals"
variable = "aws:SourceArn"
values = [local.trail_arn]
}
}
}
resource "aws_s3_bucket_policy" "trail" {
bucket = aws_s3_bucket.trail.id
policy = data.aws_iam_policy_document.trail_bucket.json
}
When making it an organization (AWS Organizations) trail, ②'s resource path becomes the organization ID path (AWSLogs/o-xxxxxxxxxx/<account>/...) rather than the account ID. This is easy to forget when using is_organization_trail = true.
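To make the path difference concrete, here is a small sketch that builds the s3:PutObject resource ARNs for statement ②. It assumes, based on my reading of the documented example policy for organization trails, that the management account's own path is kept alongside the organization path; `trailWriteResources` and the organization ID `o-exampleorgid` are illustrative.

```typescript
// Builds the Resource ARNs for the CloudTrail write statement of the bucket policy.
// For an organization trail, the organization-ID prefix is added next to the account prefix.
function trailWriteResources(
  bucket: string,
  accountId: string,
  organizationId?: string,
): string[] {
  const base = `arn:aws:s3:::${bucket}/AWSLogs`;
  const resources = [`${base}/${accountId}/*`];
  if (organizationId) resources.push(`${base}/${organizationId}/*`);
  return resources;
}

// Single-account trail vs. organization trail (all identifiers are placeholders).
console.log(trailWriteResources("prod-audit-trail-111122223333", "111122223333"));
console.log(trailWriteResources("prod-audit-trail-111122223333", "111122223333", "o-exampleorgid"));
```

In Terraform the same idea would translate to a conditional list in `resources`, keyed on whether `is_organization_trail` is set.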
2-3. The KMS Key (Allow CloudTrail to Encrypt)
For SSE-KMS, allow cloudtrail.amazonaws.com to encrypt in the key policy. With the double condition of kms:EncryptionContext and aws:SourceArn, don't let unrelated services use the key.
data "aws_iam_policy_document" "trail_kms" {
# Account administrators (key management permissions)
statement {
sid = "EnableRoot"
actions = ["kms:*"]
resources = ["*"]
principals {
type = "AWS"
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
}
}
# Allow CloudTrail to generate data keys (restricted by trail ARN and encryption context)
statement {
sid = "AllowCloudTrailEncrypt"
actions = ["kms:GenerateDataKey*"]
resources = ["*"]
principals {
type = "Service"
identifiers = ["cloudtrail.amazonaws.com"]
}
condition {
test = "StringEquals"
variable = "aws:SourceArn"
values = [local.trail_arn]
}
condition {
test = "StringLike"
variable = "kms:EncryptionContext:aws:cloudtrail:arn"
values = ["arn:aws:cloudtrail:*:${data.aws_caller_identity.current.account_id}:trail/*"]
}
}
# Let log readers decrypt with KMS (least privilege)
statement {
sid = "AllowDecryptForReaders"
actions = ["kms:Decrypt", "kms:DescribeKey"]
resources = ["*"]
principals {
type = "AWS"
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/SecurityAuditor"]
}
}
}
resource "aws_kms_key" "trail" {
description = "CloudTrail log encryption key"
enable_key_rotation = true # Automatic annual rotation
deletion_window_in_days = 30
policy = data.aws_iam_policy_document.trail_kms.json
}
2-4. The Trail Itself (Multi-Region, Integrity Validation, CloudWatch Logs Integration)
resource "aws_cloudwatch_log_group" "trail" {
name = "/aws/cloudtrail/org-audit"
retention_in_days = 365 # Retention on the CloudWatch Logs side (managed separately from S3)
}
# Role that lets CloudTrail write to CloudWatch Logs (least privilege; shown in abbreviated form)
resource "aws_iam_role" "cloudtrail_cw" {
name = "CloudTrail_CloudWatchLogs_Role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Service = "cloudtrail.amazonaws.com" }
Action = "sts:AssumeRole"
}]
})
}
resource "aws_iam_role_policy" "cloudtrail_cw" {
role = aws_iam_role.cloudtrail_cw.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["logs:CreateLogStream", "logs:PutLogEvents"]
Resource = "${aws_cloudwatch_log_group.trail.arn}:*"
}]
})
}
resource "aws_cloudtrail" "org_audit" {
name = "org-audit-trail"
s3_bucket_name = aws_s3_bucket.trail.id
kms_key_id = aws_kms_key.trail.arn
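is_multi_region_trail = true # The CLI/Terraform default is single-region, so always set this explicitly
include_global_service_events = true # Also capture events from global services such as IAM
enable_log_file_validation = true # Turn on integrity validation (digest generation)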
enable_logging = true
cloud_watch_logs_group_arn = "${aws_cloudwatch_log_group.trail.arn}:*"
cloud_watch_logs_role_arn = aws_iam_role.cloudtrail_cw.arn
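# is_organization_trail = true # Applies to all member accounts when set from the AWS Organizations management account
# Explicit dependency: trail creation fails unless the bucket policy already exists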
depends_on = [aws_s3_bucket_policy.trail]
}
With this, we have the foundation to "deliver management events from all regions permanently to S3 and CloudWatch Logs, with encryption + tamper detection." Per the docs, delivery from the trail to S3 takes about 5 minutes on average (not a guaranteed value).
3. How to Read a Log: The Record JSON and userIdentity (the Starting Point of Forensics)
Both detection and investigation come down, in the end, to whether you can correctly read a single event's JSON. Below is an example of an event that should never happen — "the root user logged into the console without MFA, from an unfamiliar IP" (the current format is eventVersion 1.11).
{
"eventVersion": "1.11",
"userIdentity": {
"type": "Root",
"principalId": "111122223333",
"arn": "arn:aws:iam::111122223333:root",
"accountId": "111122223333"
},
"eventTime": "2026-06-27T02:14:51Z",
"eventSource": "signin.amazonaws.com",
"eventName": "ConsoleLogin",
"awsRegion": "us-east-1",
"sourceIPAddress": "203.0.113.42",
"userAgent": "Mozilla/5.0 ...",
"requestParameters": null,
"responseElements": { "ConsoleLogin": "Success" },
"additionalEventData": { "MFAUsed": "No" },
"eventID": "8a9b0c1d-2e3f-4a5b-6c7d-8e9f0a1b2c3d",
"eventType": "AwsConsoleSignIn",
"recipientAccountId": "111122223333"
}
Nail down the fields to read, along with the official definitions.
| Field | Meaning (per official) | Use in investigation |
|---|---|---|
| userIdentity | "Who" called it. IAM identity information | The star. Drilled into in the table below |
| eventSource / eventName | "Which service's, which operation" (iam.amazonaws.com / CreateUser, etc.) | Identifying the operation; the axis for filters |
| eventTime | Request completion time (UTC) | Reconstructing the timeline (sort key) |
| sourceIPAddress | Request source IP (AWS-internal shows AWS Internal) | Identifying suspicious sources |
| errorCode / errorMessage | The code and description on error (AccessDenied, etc.) | Signs of attack / insufficient permission |
| readOnly | Whether it's a read-only operation (true/false) | Extract only "change-type" operations |
| eventCategory | Management / Data / NetworkActivity | Sorting by type |
| recipientAccountId | The account that received this event | Detecting cross-account operations |
| tlsDetails | TLS version, cipher suite, FQDN | Inventorying old TLS connections |
| sessionCredentialFromConsole | Whether it's from a console session (shown only when true) | Distinguishing human vs. automation |
userIdentity.type: Memorize the Correct Spelling
The value of type, which represents "who," is the key for investigation queries. The official values (current), precisely:
| type | What it is |
|---|---|
| Root | The root user. Must not appear in normal operation |
| IAMUser | An IAM user |
| AssumedRole | A session that assumed a role (carries sessionContext) |
| Role | A service role, etc. |
| FederatedUser | STS federation |
| AWSService | An AWS service acting on behalf |
| AWSAccount | Another account |
| IdentityCenterUser | An IAM Identity Center user (not IAMIdentityCenter) |
| SAMLUser / WebIdentityUser | SAML / Web identity federation |
When it's AssumedRole, userIdentity.sessionContext.sessionIssuer (from which role) and sessionContext.attributes.mfaAuthenticated (MFA or not) are decisively important. "Whose role, assumed as a session with MFA" is found out here.
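The sketch below shows one way to pull "whose role, with or without MFA" out of a userIdentity block. `summarizeActor` and the `ActorSummary` shape are hypothetical helpers; the only assumption about CloudTrail itself is that mfaAuthenticated arrives as the string "true" or "false", and the role name in the example is made up.

```typescript
// Subset of the userIdentity element relevant to attribution.
interface UserIdentity {
  type: string;
  arn?: string;
  sessionContext?: {
    sessionIssuer?: { arn?: string; userName?: string };
    attributes?: { mfaAuthenticated?: string }; // "true" / "false" as strings
  };
}

interface ActorSummary {
  kind: string;
  actor: string;
  mfa: boolean | "unknown";
}

// For AssumedRole sessions, report the issuing role instead of the transient session ARN.
function summarizeActor(id: UserIdentity): ActorSummary {
  const issuer = id.sessionContext?.sessionIssuer?.arn;
  const mfaRaw = id.sessionContext?.attributes?.mfaAuthenticated;
  return {
    kind: id.type,
    actor: id.type === "AssumedRole" && issuer ? issuer : (id.arn ?? id.type),
    mfa: mfaRaw === undefined ? "unknown" : mfaRaw === "true",
  };
}

const assumed = summarizeActor({
  type: "AssumedRole",
  arn: "arn:aws:sts::111122223333:assumed-role/Deployer/session-1",
  sessionContext: {
    sessionIssuer: { arn: "arn:aws:iam::111122223333:role/Deployer", userName: "Deployer" },
    attributes: { mfaAuthenticated: "false" },
  },
});
console.log(assumed);
```

Reporting "unknown" rather than false when the attribute is absent avoids flagging identities (such as AWSService) that never carry a session context.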
4. In Practice by Scenario: Detection, Monitoring, Investigation, Data Events
Once the foundation is built, turn CloudTrail from something that is "just sitting there" into "a working audit foundation." We implement four patterns, one per use case.
4.1 Real-Time Detection: EventBridge → Lambda/SNS
The highest-value detection targets the moment an attacker tries to stop the trail itself. Stopping the trail (destroying evidence) is typically among the first things done after an intrusion. Detect it immediately by watching for StopLogging / DeleteTrail / UpdateTrail / PutEventSelectors.
EventBridge can react in near real time to API calls CloudTrail recorded (the detail-type is AWS API Call via CloudTrail; console sign-in is AWS Console Sign In via CloudTrail).
resource "aws_cloudwatch_event_rule" "trail_tampering" {
name = "detect-cloudtrail-tampering"
description = "Immediately detect stopping, deletion, or modification of a CloudTrail trail"
event_pattern = jsonencode({
"detail-type" = ["AWS API Call via CloudTrail"]
detail = {
eventSource = ["cloudtrail.amazonaws.com"]
eventName = ["StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"]
}
})
}
resource "aws_cloudwatch_event_target" "to_lambda" {
rule = aws_cloudwatch_event_rule.trail_tampering.name
arn = aws_lambda_function.audit_alert.arn
}
resource "aws_lambda_permission" "allow_eventbridge" {
statement_id = "AllowExecutionFromEventBridge"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.audit_alert.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.trail_tampering.arn
}
If you want to detect root login, just swap the pattern.
{
"detail-type": ["AWS Console Sign In via CloudTrail"],
"detail": { "userIdentity": { "type": ["Root"] }, "eventName": ["ConsoleLogin"] }
}
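Before deploying a rule, it can help to unit-test the pattern against sample events. The matcher below is a deliberately simplified sketch that handles only the exact-value arrays and nested objects used in the two patterns above; it is not EventBridge's real matching engine (no prefix, anything-but, or numeric operators), and `matchesPattern` is a hypothetical name.

```typescript
// A pattern is a nested object whose leaves are arrays of allowed values.
type Pattern = { [key: string]: Pattern | Array<string | number | boolean> };

// Every key in the pattern must match (AND); a leaf matches if any listed value equals the field (OR).
function matchesPattern(pattern: Pattern, event: unknown): boolean {
  if (typeof event !== "object" || event === null) return false;
  const obj = event as Record<string, unknown>;
  return Object.entries(pattern).every(([key, expected]) => {
    const actual = obj[key];
    if (Array.isArray(expected)) {
      return expected.some((v) => v === actual);
    }
    return matchesPattern(expected, actual);
  });
}

const rootLoginPattern: Pattern = {
  "detail-type": ["AWS Console Sign In via CloudTrail"],
  detail: { userIdentity: { type: ["Root"] }, eventName: ["ConsoleLogin"] },
};

const rootLogin = {
  "detail-type": "AWS Console Sign In via CloudTrail",
  detail: { userIdentity: { type: "Root" }, eventName: "ConsoleLogin", awsRegion: "us-east-1" },
};
const iamLogin = {
  "detail-type": "AWS Console Sign In via CloudTrail",
  detail: { userIdentity: { type: "IAMUser" }, eventName: "ConsoleLogin" },
};

console.log(matchesPattern(rootLoginPattern, rootLogin), matchesPattern(rootLoginPattern, iamLogin));
```

For anything beyond exact matches, validating against the real service (for example with the EventBridge TestEventPattern API) is the safer route.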
The receiving Lambda applies the principle "don't trust even AWS-originated events at the boundary": it strictly validates only the fields it uses with Zod, then formats and sends the notification. It uses eventID (a GUID unique per event) as the dedup key so that it tolerates re-delivery.
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";
import { z } from "zod";
import type { EventBridgeEvent } from "aws-lambda";

const sns = new SNSClient({});
const TOPIC_ARN = process.env.ALERT_TOPIC_ARN;
if (!TOPIC_ARN) throw new Error("ALERT_TOPIC_ARN is not set"); // Fail fast at cold start

// Validate at the boundary only the CloudTrail record fields used for the notification
const CloudTrailDetail = z.object({
  eventID: z.string().uuid(),
  eventName: z.string(),
  eventSource: z.string(),
  awsRegion: z.string(),
  sourceIPAddress: z.string().optional(),
  errorCode: z.string().optional(),
  userIdentity: z.object({
    type: z.string(),
    arn: z.string().optional(),
  }),
});

export const handler = async (
  event: EventBridgeEvent<"AWS API Call via CloudTrail", unknown>,
): Promise<void> => {
  const detail = CloudTrailDetail.parse(event.detail); // Throws immediately on an unexpected shape
  const actor = detail.userIdentity.arn ?? detail.userIdentity.type;
  // SNS subjects must be ASCII and shorter than 100 characters, so no emoji or non-ASCII text here
  const subject = `[AUDIT] ${detail.eventName} by ${detail.userIdentity.type}`;
  const message = [
    `Operation: ${detail.eventName} (${detail.eventSource})`,
    `Actor: ${actor}`,
    `Region: ${detail.awsRegion}`,
    `Source IP: ${detail.sourceIPAddress ?? "unknown"}`,
    detail.errorCode ? `Result: failed (${detail.errorCode})` : "Result: succeeded",
    `eventID: ${detail.eventID}`,
  ].join("\n");
  await sns.send(
    new PublishCommand({
      TopicArn: TOPIC_ARN,
      Subject: subject.slice(0, 99),
      Message: message,
      // The two fields below apply to FIFO topics only (MessageGroupId is required there);
      // remove both when publishing to a standard topic. The group name is illustrative.
      MessageGroupId: "cloudtrail-audit",
      MessageDeduplicationId: detail.eventID, // Idempotency key
    }),
  );
};
The premise of detection: to catch this reliably with EventBridge, you need a trail logging in that region (detecting data events in particular requires a trail). That's why we created the "multi-region trail" in §2 first. The official EventBridge tutorial also begins Step 1 with creating a trail.
4.2 CloudWatch Logs Metric Filters + Alarms
If you want to fire not on "notification of individual events" but on "how many times it happened in a given period," CloudWatch Logs metric filters + alarms fit. The premise is the setting that flows the trail to CloudWatch Logs (completed in §2-4).
The official docs explicitly give three filters as examples — "security group change," "console sign-in failure," and "IAM policy change." Below is the IAM-policy-change example (metric name IAMPolicyEventCount, fires even on one occurrence in 5 minutes).
resource "aws_cloudwatch_log_metric_filter" "iam_policy_changes" {
name = "IAMPolicyChanges"
log_group_name = aws_cloudwatch_log_group.trail.name
pattern = "{ ($.eventName = DeleteGroupPolicy) || ($.eventName = DeleteRolePolicy) || ($.eventName = DeleteUserPolicy) || ($.eventName = PutGroupPolicy) || ($.eventName = PutRolePolicy) || ($.eventName = PutUserPolicy) || ($.eventName = CreatePolicy) || ($.eventName = DeletePolicy) || ($.eventName = CreatePolicyVersion) || ($.eventName = DeletePolicyVersion) || ($.eventName = AttachRolePolicy) || ($.eventName = DetachRolePolicy) || ($.eventName = AttachUserPolicy) || ($.eventName = DetachUserPolicy) || ($.eventName = AttachGroupPolicy) || ($.eventName = DetachGroupPolicy) }"
metric_transformation {
name = "IAMPolicyEventCount"
namespace = "CloudTrailMetrics"
value = "1"
}
}
resource "aws_cloudwatch_metric_alarm" "iam_policy_changes" {
alarm_name = "IAMPolicyChanges"
namespace = "CloudTrailMetrics"
metric_name = "IAMPolicyEventCount"
comparison_operator = "GreaterThanOrEqualToThreshold"
threshold = 1
evaluation_periods = 1
period = 300
statistic = "Sum"
alarm_actions = [aws_sns_topic.security_alerts.arn]
}
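Long filter patterns like the one above are easy to break by hand, so one option is to generate the string from a list. This is a minimal sketch: `eventNameFilter` is a hypothetical helper, and it assumes the event names are plain alphanumeric identifiers that need no quoting in the filter syntax.

```typescript
// Builds a CloudWatch Logs JSON filter pattern that matches any of the given eventName values.
function eventNameFilter(eventNames: readonly string[]): string {
  if (eventNames.length === 0) {
    throw new Error("at least one event name is required");
  }
  for (const name of eventNames) {
    // Guard the assumption stated above: plain identifiers only.
    if (!/^[A-Za-z0-9]+$/.test(name)) throw new Error(`unsupported event name: ${name}`);
  }
  const terms = eventNames.map((name) => `($.eventName = ${name})`);
  return `{ ${terms.join(" || ")} }`;
}

console.log(eventNameFilter(["PutRolePolicy", "DeleteRolePolicy", "AttachRolePolicy"]));
```

The resulting string can be passed to the `pattern` argument of `aws_cloudwatch_log_metric_filter`, for example through a generated tfvars file.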
An honest note: the staple filter sets you often see online ("root account usage," "unauthorized API calls," "sign-in without MFA," "NACL changes," etc.) are not on CloudTrail's official page for this topic. Their source is the CIS AWS Foundations Benchmark or Security Hub controls. They're highly worth implementing (I add them too), but keeping "the three the official docs exemplify" separate from "those you assemble yourself from a benchmark" is part of being a designer others can trust. Combine with GuardDuty / Security Hub and many of these can be detected as managed (§8).
4.3 Investigate Beyond 90 Days with Athena
The mainstay of incident investigation is to query the raw logs accumulated in S3, across accounts and regions, whenever needed. You can auto-create an Athena table from the CloudTrail console, but in production, apply partition projection to cut scan volume, and with it cost and execution time.
CREATE EXTERNAL TABLE cloudtrail_logs (
eventVersion STRING,
userIdentity STRUCT<
type: STRING, principalId: STRING, arn: STRING, accountId: STRING, userName: STRING,
sessionContext: STRUCT<attributes: STRUCT<mfaAuthenticated: STRING, creationDate: STRING>>
>,
eventTime STRING, eventSource STRING, eventName STRING, awsRegion STRING,
sourceIPAddress STRING, userAgent STRING, errorCode STRING, errorMessage STRING,
requestParameters STRING, responseElements STRING, eventID STRING,
readOnly BOOLEAN, eventType STRING, recipientAccountId STRING
)
PARTITIONED BY (`account` STRING, `region` STRING, `date` STRING)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://prod-audit-trail-111122223333/AWSLogs/111122223333/CloudTrail/'
TBLPROPERTIES (
'projection.enabled' = 'true',
'projection.account.type' = 'enum',
'projection.account.values' = '111122223333',
'projection.region.type' = 'enum',
'projection.region.values' = 'us-east-1,ap-northeast-1',
'projection.date.type' = 'date',
'projection.date.range' = '2024/01/01,NOW',
'projection.date.format' = 'yyyy/MM/dd',
'projection.date.interval' = '1',
'projection.date.interval.unit' = 'DAYS',
'storage.location.template' =
's3://prod-audit-trail-111122223333/AWSLogs/${account}/CloudTrail/${region}/${date}'
);
Three queries often used in investigation:
-- ① "Who opened this security group?" (past week, narrowed by partition)
SELECT eventTime, userIdentity.arn AS who, sourceIPAddress, eventName, requestParameters
FROM cloudtrail_logs
WHERE region = 'ap-northeast-1' AND date >= '2026/06/20'
AND eventSource = 'ec2.amazonaws.com'
AND eventName IN ('AuthorizeSecurityGroupIngress', 'RevokeSecurityGroupIngress')
ORDER BY eventTime DESC;
-- ② Principals accumulating AccessDenied errors (a sign of an attack or a permission-design mistake)
SELECT userIdentity.arn AS who, count(*) AS denied
FROM cloudtrail_logs
WHERE date >= '2026/06/01' AND errorCode = 'AccessDenied'
GROUP BY 1 ORDER BY denied DESC LIMIT 20;
-- ③ Root account usage (should be zero in production as a rule)
SELECT eventTime, eventName, sourceIPAddress
FROM cloudtrail_logs
WHERE date >= '2026/06/01' AND userIdentity.type = 'Root'
ORDER BY eventTime DESC;
The key to cost is partitions. Athena is billed by the amount of data scanned. Always put the projection columns account / region / date in the WHERE clause so the query doesn't read irrelevant partitions. Depending on data volume, this alone can shrink a scan from tens of GB to hundreds of MB.
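Because the date partition is a yyyy/MM/dd string, it is worth generating the predicate rather than typing it. The helper below is a sketch under that assumption; `partitionDate` and `partitionPredicate` are hypothetical names, and the region regex is only a rough guard against pasting arbitrary text into SQL.

```typescript
// Formats a Date as the yyyy/MM/dd string used by the table's `date` partition (UTC).
function partitionDate(d: Date): string {
  const y = d.getUTCFullYear();
  const m = String(d.getUTCMonth() + 1).padStart(2, "0");
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `${y}/${m}/${day}`;
}

// Builds the partition part of a WHERE clause for an inclusive date range in one region.
// Zero-padded yyyy/MM/dd strings compare correctly as plain strings.
function partitionPredicate(region: string, from: Date, to: Date): string {
  if (!/^[a-z]{2}(-[a-z]+)+-\d$/.test(region)) {
    throw new Error(`unexpected region format: ${region}`);
  }
  return `region = '${region}' AND date BETWEEN '${partitionDate(from)}' AND '${partitionDate(to)}'`;
}

console.log(
  partitionPredicate(
    "ap-northeast-1",
    new Date("2026-06-20T00:00:00Z"),
    new Date("2026-06-27T00:00:00Z"),
  ),
);
```

Using UTC here matches eventTime, which CloudTrail records in UTC, so an incident window converted from local time does not silently skip a day's partition.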
4.4 Narrow Data Events Surgically with "Advanced Event Selectors"
Data events (S3 objects, Lambda Invoke, DynamoDB item operations) are high volume and billed from the first copy, so turning everything on is a billing accident waiting to happen. With advanced event selectors, narrow recording to only the audited resources and to writes only.
resource "aws_cloudtrail" "data_events" {
name = "payments-evidence-data-trail"
s3_bucket_name = aws_s3_bucket.trail.id
kms_key_id = aws_kms_key.trail.arn
is_multi_region_trail = true
enable_log_file_validation = true
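# Using advanced selectors overrides the default management-event logging.
# To keep management events on this trail, add an explicit Management selector (an important trap).
# If the §2 trail already records management events in the same regions, this selector produces a billed 2nd copy (§7).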
advanced_event_selector {
name = "Log all management events"
field_selector {
field = "eventCategory"
equals = ["Management"]
}
}
# Record only write-type object operations on the evidence bucket
advanced_event_selector {
name = "Audit writes on the payment evidence bucket only"
field_selector {
field = "eventCategory"
equals = ["Data"]
}
field_selector {
field = "resources.type"
equals = ["AWS::S3::Object"]
}
field_selector {
field = "resources.ARN"
starts_with = ["arn:aws:s3:::prod-payments-evidence/"]
}
field_selector {
field = "readOnly"
equals = ["false"] # Exclude reads such as GetObject to keep cost down
}
}
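# The §2 bucket policy and KMS key policy allow only the org-audit-trail ARN via aws:SourceArn;
# add this trail's ARN to both, or delivery from this trail will be rejected.
depends_on = [aws_s3_bucket_policy.trail]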
}
Basic event selectors cover only three kinds of data events: S3 objects, Lambda, and DynamoDB. The many other resource types (RDS, SQS, SNS, Bedrock, etc., continually expanding) and field-level narrowing are exclusive to advanced selectors. CloudTrail Lake event data stores can also use only advanced selectors.
5. CloudTrail Lake: Honestly About the Current State (New Customers Cut Off May 31, 2026)
CloudTrail Lake is a managed audit data lake you can query with Trino SQL. Where a trail "puts files in S3," Lake provides "an immutable data store you can cross-analyze with SQL."
But — since this concerns the article's trustworthiness, I'll write it honestly. The official docs, as of June 2026, state clearly:
AWS CloudTrail Lake will no longer be open to new customers starting May 31, 2026. If you would like to use CloudTrail Lake, sign up prior to that date. Existing customers can continue to use the service as normal.
That is, new sign-ups end as of May 31, 2026. Existing customers can keep using it as before, but if you're newly building an audit-analysis foundation, the realistic choice is the Athena + S3 of §4.3. Articles that unconditionally recommend "use CloudTrail Lake" haven't accounted for this change.
For existing customers, let me nail down only the key points of Lake.
- Event Data Store (EDS): an immutable collection of events chosen with advanced selectors. Encrypted by CloudTrail by default.
- Retention period: with "one-year extendable," default 366 days, max 3,653 days (about 10 years); with "seven-year," about 2,557 days (about 7 years). Supports long-term compliance retention.
- SQL: fully leverage Trino's SELECT syntax and functions. JOIN across multiple EDSs is also possible.
- Natural-language query via generative AI (query generator): generates immediately usable SQL from an English prompt (GA). Meanwhile, the query-result summarization feature is in preview. These two differ in status, so don't conflate them.
-- Example: search in Lake (Trino) for the moment a trail was stopped
SELECT eventTime, userIdentity.arn, eventName, sourceIPAddress
FROM <event_data_store_id>
WHERE eventName IN ('StopLogging', 'DeleteTrail')
AND eventTime > '2026-06-01 00:00:00'
ORDER BY eventTime DESC;
6. Log-File Integrity Validation: Guaranteeing Non-Repudiation
An audit log being "there" alone is insufficient; you need to be able to prove "neither tampered with nor deleted." What guarantees this is log-file integrity validation (already done in §2-4 with enable_log_file_validation = true).
The mechanism is solid.
- CloudTrail computes a hash of each delivered log file and, every hour, generates and delivers a digest file referencing that hour's logs.
- The digest file is signed with CloudTrail's private key and contains the signature of the previous digest — this forms a chain that can also detect the deletion of the digest file itself.
- The algorithms used are, per the docs, hash = SHA-256, signature = SHA-256 with RSA. With this, "altering, deleting, or forging logs without being detected is computationally infeasible."
Validation is a one-shot CLI command.
aws cloudtrail validate-logs \
--trail-arn arn:aws:cloudtrail:us-east-1:111122223333:trail/org-audit-trail \
--start-time 2026-06-25T00:00:00Z \
--region us-east-1
Why it matters. As the docs say, a validated log can affirmatively assert "that the log file was not altered" and "that a particular credential performed a particular API activity" — the heart of forensics and non-repudiation. When you say "we have audit logs" for a system handling payments or personal information, what truly has meaning is "we have verifiable audit logs."
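To show what the hash comparison amounts to, here is a partial sketch that checks log file contents against the hashValue entries of a digest file. It covers only the hashing step: it does not verify the digest's RSA signature or the digest chain, which `aws cloudtrail validate-logs` handles. The `readLog` loader and the in-memory store are hypothetical stand-ins for fetching and decompressing objects from S3, and the `DigestLogEntry` fields reflect my understanding of the digest file's logFiles entries.

```typescript
import { createHash } from "node:crypto";

// Subset of one entry in a digest file's "logFiles" array.
interface DigestLogEntry {
  s3Object: string; // key of the delivered log file
  hashValue: string; // hex-encoded hash of the uncompressed log content
  hashAlgorithm: string; // "SHA-256"
}

type Verdict = "ok" | "modified" | "missing";

function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// readLog returns the uncompressed log body, or undefined when the object no longer exists.
function checkLogFiles(
  entries: readonly DigestLogEntry[],
  readLog: (s3Object: string) => string | Uint8Array | undefined,
): Record<string, Verdict> {
  const verdicts: Record<string, Verdict> = {};
  for (const entry of entries) {
    const body = readLog(entry.s3Object);
    if (body === undefined) {
      verdicts[entry.s3Object] = "missing";
    } else {
      const same = sha256Hex(body) === entry.hashValue.toLowerCase();
      verdicts[entry.s3Object] = same ? "ok" : "modified";
    }
  }
  return verdicts;
}

// Illustrative data: the digest recorded the same original content for three files;
// one was altered afterwards and one was deleted.
const originalBody = '{"Records":[]}';
const store = new Map<string, string>([
  ["logs/a.json", originalBody],
  ["logs/b.json", '{"Records":[{"eventName":"Tampered"}]}'],
]);
const digestEntries: DigestLogEntry[] = ["logs/a.json", "logs/b.json", "logs/c.json"].map((key) => ({
  s3Object: key,
  hashValue: sha256Hex(originalBody),
  hashAlgorithm: "SHA-256",
}));
const verdicts = checkLogFiles(digestEntries, (key) => store.get(key));
console.log(verdicts);
```

Distinguishing "missing" from "modified" mirrors the two failure modes the section describes: deletion and alteration.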
7. The Reality of Pricing and Cost Optimization
CloudTrail can be either "nearly free" or "expensive before you notice." Nail down the boundaries precisely (us-east-1, as of 2026. Since revisions happen, confirm on the official pricing page).
| Target | Price | Default |
|---|---|---|
| Management events | The first copy is free per region / from the 2nd copy $2.00 / 100K | Recording ON |
| Data events | $0.10 / 100K (billed from the first copy) / aggregation +$0.03 / 100K | OFF |
| Insights events | management $0.35 / 100K, data $0.03 / 100K (per insight type, per analysis target) | OFF |
| Network activity | $0.10 / 100K | OFF |
| CloudTrail Lake ingestion | one-year extendable $0.75/GB (CloudTrail events) / seven-year is tiered (first 5 TB $2.50/GB, next 20 TB $1.00/GB, beyond 25 TB $0.50/GB) | - |
| CloudTrail Lake query | $0.005 / GB scanned | - |
| S3 / CloudWatch Logs / KMS / SNS / Athena | separately metered by each service | - |
The official wording, precisely.
The first copy of management events within each region is delivered free of charge. ... For data events, all deliveries incur CloudTrail costs, including the first.
From this, three cost rules that matter in the field:
- A single management-event trail (single-region or multi-region) is nearly free. What's billed is the S3 storage fee (usually a few cents to a few dollars a month) and, if you use KMS, a small amount of KMS request fees. So cost is no reason to hold back on "one trail first."
- The "2nd copy" trap. As in the official example, if a multi-region trail already exists and you add a single-region trail that captures the same management events, the additional copy is billed. The same goes for overlap between an organization trail and member-account trails. Add trails "for auditing" and "for developers," and 2nd copies pile up before you know it.
- Beware the explosion of KMS events. The docs warn about this too — heavy use of SSE-KMS on S3 puts a large volume of KMS management events on CloudTrail and pushes up cost. You can drop the noise with "Exclude AWS KMS events" / "Exclude Amazon RDS Data API events" at trail creation (advanced selectors can narrow both management and data).
Weigh "effect" against "amount" for data events and Insights, and enable them surgically as in §4.4. The moment you go full-open across all resources including reads, the bill changes by an order of magnitude.
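A quick back-of-the-envelope calculation makes the order-of-magnitude point tangible. The sketch below uses the per-100K prices from the table above (which should be re-checked against the official pricing page); `estimateUsd` and the event volumes are hypothetical.

```typescript
// Per-100,000-event prices taken from the table in this section (USD, us-east-1).
const PRICE_PER_100K = {
  managementAdditionalCopy: 2.0, // 2nd and later copies of management events
  dataEvent: 0.1, // billed from the first copy
  networkActivity: 0.1,
} as const;

type BilledKind = keyof typeof PRICE_PER_100K;

// Estimates the CloudTrail charge for a number of events of one kind.
function estimateUsd(kind: BilledKind, events: number): number {
  return (events / 100000) * PRICE_PER_100K[kind];
}

// Hypothetical month: 50M S3 data events when reads are included vs. 2M when writes only,
// plus 5M management events duplicated by an overlapping second trail.
const allDataEvents = estimateUsd("dataEvent", 50000000);
const writesOnly = estimateUsd("dataEvent", 2000000);
const duplicateManagement = estimateUsd("managementAdditionalCopy", 5000000);
console.log({ allDataEvents, writesOnly, duplicateManagement });
```

With these made-up volumes, narrowing to writes moves the data-event line from roughly 50 USD to roughly 2 USD, while the overlapping management trail alone costs roughly 100 USD.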
8. The Security Best-Practices Checklist (Official)
The official "Security best practices in AWS CloudTrail" is organized in two lines, Detective and Preventative. You can use it as-is as the final pre-production checklist.
Detective
- Create a trail: event history (90 days, management events only) is not a permanent record. A trail is the premise.
- Make it a multi-region trail: capture all regions + global service events. Continuously monitor with the AWS Config rule multi-region-cloud-trail-enabled.
- Enable log-file integrity validation: detect tampering, deletion, and delivery gaps with SHA-256 / SHA-256 with RSA (§6).
- Integrate with CloudWatch Logs: monitoring and alerting for specific events (§4.2). Monitor with cloud-trail-cloud-watch-logs-enabled.
- Use GuardDuty: ML-based threat detection. Continuously analyzes multiple logs including CloudTrail.
- Use Security Hub (CSPM): evaluate configuration with detective controls.
Preventative
- Aggregate into a dedicated, centralized S3 bucket: a log-archive-dedicated account + a centralized bucket. With Organizations, an organization trail.
- Encrypt with SSE-KMS: CloudTrail encrypts by default, but control the key with a CMK (§2-3). Monitor with cloud-trail-encryption-enabled.
- Add condition keys to the SNS topic policy: add aws:SourceArn (optionally aws:SourceAccount) to prevent unauthorized access.
- Least privilege on the log-storage bucket: review the bucket policy and restrict with the aws:SourceArn condition (§2-2).
- Enable S3 MFA Delete: additional authentication for version deletion and versioning changes (not usable together with lifecycle).
- Object lifecycle management: implement retention policies with lifecycle rules (e.g., move to an archive tier after one year).
- Restrict the grant of AWSCloudTrail_FullAccess: holders of this policy can disable or reconfigure auditing. Limit to the minimum number of administrators.
CloudTrail stands on the same philosophy as defense-in-depth with WAF, IAM least privilege for DynamoDB, and keyless CI/CD with OIDC — "don't trust the client / defend at an unbreakable layer." Within that, CloudTrail is the last bastion of detective controls that "proves, non-repudiably and after the fact, what happened."
9. Summary: A CloudTrail Design Cheat Sheet
| Question | Conclusion |
|---|---|
| What to do first? | One multi-region trail. Permanent S3 delivery + SSE-KMS + integrity validation + a least-privilege bucket policy |
| Is event history enough? | NO. 90 days, management events only, a single region. Not a permanent record |
| What's the cost boundary? | The first copy of management events is free per region / data events are billed from the first copy. Beware 2nd-copy overlap and KMS-event explosion |
| Real-time detection? | EventBridge → Lambda/SNS. Top priority is detecting "trail stop (evidence destruction)" |
| Firing on aggregation? | CloudWatch Logs metric filters + alarms (official examples: SG change, sign-in failure, IAM policy change) |
| Investigating beyond 90 days? | Athena + partition projection. Cut scan volume (= cost) with projection columns in the WHERE |
| Proving it's untampered? | Log-file integrity validation (digest / SHA-256 / validate-logs). The heart of non-repudiation |
| What about CloudTrail Lake? | New customers cut off as of 2026/5/31 (existing can continue). For new builds, Athena+S3 is the realistic answer |
| How to add data events? | Surgically with advanced selectors. Narrow to target resources and writes only |
CloudTrail is not a service of "enable it and you're done." Only by assembling trail design, encryption, integrity, detection, investigation, and cost as a single audit foundation can you answer "who did what, when" instantly even on the night of an incident, and withstand a compliance audit.
I designed, from the start, a state where "correctness can be proven with code and an audit trail" in a serverless payment platform, and maintained zero double-charges in production. With one person × generative AI (Claude Code), I build production-quality AWS audit and security foundations like these, fast and safely, in a verifiable form. If you're struggling with AWS audit and governance design, feel free to reach out from Contact.