# AWS CloudTrail Lake Practical Guide (2026 Edition): Analyzing Events with Trino SQL and How to Choose Between It and Athena+S3 — The Realistic Answer After New-Customer Onboarding Ended

> A guide to using / sizing up CloudTrail Lake in practice. Explained with real queries faithful to the official docs: the immutable event data store and cross-cutting analysis with Trino SQL, 14 managed dashboards and natural-language query generation, and — given the end of new-customer onboarding on May 31, 2026 — how to choose between and migrate to Athena+S3 / CloudWatch.

- Published: 2026-06-27
- Author: 友田 陽大
- Tags: AWS, CloudTrail, CloudTrail Lake, Athena, SQL, アーキテクチャ設計
- URL: https://tomodahinata.com/en/blog/aws-cloudtrail-lake-trino-sql-athena-migration-guide
- Category: AWS CloudTrail audit & governance
- Pillar guide: https://tomodahinata.com/en/blog/aws-cloudtrail-audit-logging-governance-security-guide

## Key points

- CloudTrail Lake ended new-customer onboarding after 2026/5/31. Existing customers can continue as usual; for new builds, Athena+S3 or CloudWatch is the realistic answer.
- The essence of Lake is 'an immutable, ORC-columnar event data store + all Trino SELECT statements.' It becomes an asset for cross-cutting SQL analysis that goes beyond the limits of event history, which can only search 90 days and a single attribute.
- A set of practical, copy-paste-ready Trino queries (top API calls, AccessDenied spikes, logins without MFA, cross-account, IAM-change tracking, JOIN across multiple EDS), faithful to the official samples.
- The new-build decision is 'managed / immutable / instant SQL = CloudWatch, Lake's successor' vs. 'cheap scan unit price / partition projection / self-managed = S3+Athena.' Recommendations are presented by scenario.
- Lake and Athena can be bridged via EDS → Glue Data Catalog → Athena federation. Natural-language query generation is GA; result summarization is preview.

---

"I want to aggregate audit logs older than 90 days across attributes with SQL," "When an incident occurs, I want to narrow down in one go, with multiple conditions, who did what to which resource, when, and from which IP" — these are walls you always hit in the audit and security field.

But CloudTrail's **event history** (the thing you can view without a trail) has the constraints of 90 days, management events only, a single Region, and single-attribute search, and it's utterly no match for such cross-cutting analysis. That's exactly why AWS provides separate mechanisms for analyzing audit logs with SQL — **CloudTrail Lake** and **Athena**.

I led the reliability layer of a serverless payment platform and kept double charges in production at zero. What I keenly felt there is that **audit logs don't become an asset just by "accumulating" them**. The first-pass triage of an incident, and answering a compliance audit, ultimately come down to "whether the structure lets you query the logs cross-sectionally with SQL." This article is a practical guide to turning CloudTrail Lake into that "analyzable asset."

That said, in 2026, this article has **an inconvenient truth to convey first**.

> **CloudTrail Lake ended new-customer onboarding as of May 31, 2026.** Existing customers can continue using it as before ([CloudTrail Lake availability change](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake-service-availability-change.html)). That is, this article is written so it can be read two ways: (a) **for those already using Lake to master it**, and (b) **for those about to build an audit-log analysis platform to choose the optimal answer on the premise that "Lake can't be chosen."** I judged that placing this reality first is the most honest and the most useful.

---

## 0. Mental model: Lake is "an immutable audit data lake you can analyze cross-sectionally with SQL"

Before getting into the fine features, let's pin down the essence in one picture. What is CloudTrail Lake? Borrowing the official words, it's **"an audit solution that complements your compliance stack and supports near-real-time troubleshooting."** In substance, it boils down to these three points.

1. Accumulate AWS / non-AWS activity events in an **immutable event data store (EDS)**.
2. Accumulated events are internally **converted from row-based JSON to Apache ORC (columnar)** and can be retrieved fast.
3. On top of that, you can do SQL analysis that cuts across fields using **all valid Trino `SELECT` statements and functions**.

```text
[AWS / non-AWS events]
        │  Selected/filtered by advanced event selectors
        ▼
┌─────────────────────────────┐
│  Event Data Store (EDS)     │  ← Immutable, ORC columnar, KMS-encrypted, retention tiers
│  1 category/EDS, org-EDS OK │
└─────────────────────────────┘
        │  All Trino SELECT, JOIN across multiple EDS
        ▼
[Trino SQL query] → results (retained up to 7 days / can save to S3)
        │
        ├─→ 14 managed dashboards / custom / Highlights
        ├─→ Natural language → SQL (generative-AI query generation: GA)
        └─→ Federate to Glue Data Catalog → query in Athena
```

> This article is the CloudTrail cluster's "Lake / SQL-analysis" piece (a spoke). The trail-creation procedure, the overall picture of event types, the basic security design, and the DDL for Athena tables (partition projection) are consolidated in the [CloudTrail audit-logging, governance, and security complete guide](/blog/aws-cloudtrail-audit-logging-governance-security-guide) (the pillar article). This article doesn't reprint the DDL in full and instead **digs into Lake itself and "how to choose between Lake and Athena."**

---

## 1. The reality of the onboarding end, and the first decision

First, let's settle your standing. Which side are you on right now?

| Your situation | Conclusion | The path to take |
| --- | --- | --- |
| Already operating a Lake EDS | **Can continue using it. Use Sections 2–4 of this article as-is** | Use the existing EDS to the fullest. Consider migration too if needed (described later) |
| About to newly build an audit-log analysis platform | **Lake can't be chosen. Athena+S3 or CloudWatch is the realistic answer** | Choose with the decision matrix in Section 5 |
| An existing customer planning to add member accounts to the org | **Check right now whether you have an organization EDS** | With an organization EDS, new accounts are auto-covered. With account EDS only, new accounts can't be onboarded |

### 1-1. What existing customers "can / can't continue" (official)

The official [availability change](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake-service-availability-change.html) clearly divides the support scope for existing customers by EDS type. There's a lot of misunderstanding here, so let me quote precisely.

- If you have an **organization event data store (organization EDS)**: Lake functions as before. **Support for member accounts newly added to the org, and expansion to additional Regions, also continue.**
- If you have **only an account event data store (account EDS)**: existing accounts continue to be supported, and you can expand to new Regions. However, **onboarding accounts newly added to the org is not supported.** If you want to cover new accounts too, you need to create an organization EDS or migrate to CloudWatch.
- Note that Lake enters a state of receiving **only critical bug fixes and security updates**. Don't expect new features.

> Important: this is not a story of "Lake stopping." The official docs explicitly state "**AWS CloudTrail continues to be fully supported. Only CloudTrail Lake ended new-customer onboarding. Trails, Insights, and aggregated events are unaffected.**" There's no need to panic and rebuild even your trails.

### 1-2. The migration destination AWS recommends is CloudWatch (not only Athena+S3)

Accuracy is everything here, so let me separate my opinion from the official line. **The migration destination the official docs recommend first is Amazon CloudWatch.** A mechanism to directly import a CloudTrail Lake EDS into CloudWatch (the console's **Export to CloudWatch**, or the CLI's `aws logs create-import-task`) is provided, carrying features Lake customers had strongly requested — OCSF/OTel support, OpenSearch-native analysis, and third-party connectors.

```bash
# 既存EDSをCloudWatchへインポート（CLI例：公式の create-import-task）
aws logs create-import-task \
  --import-source-arn "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id" \
  --import-role-arn "arn:aws:iam::123456789012:role/CloudTrailLakeExportRole" \
  --import-filter '{"startEventTime": 1704067200, "endEventTime": 1706745600}'

# 非同期処理なので進捗を確認
aws logs describe-import-tasks --import-id "EXAMPLE-import-id"
```

> A migration pitfall (official note): **data from before 2023 is not migrated from CloudTrail Lake to CloudWatch.** If you need events from before 2023, keep querying them directly in Lake, or move them to an S3 bucket. That's exactly why, rather than streaming the whole volume at once, the official best practice is to **first pilot-migrate a narrow range like 24 hours**, verify the schema, queries, dashboards, and IAM roles work as expected, and then migrate to production.

So why does this article also place **Athena+S3** in a leading role? There are three reasons.

1. Many teams **already have an S3 destination for trail delivery**, separate from Lake/CloudWatch. Just by defining an Athena table there, they can start SQL analysis with no additional ingestion cost.
2. **The scan unit price is cheap** (Lake is $0.005/GB scanned, Athena is $5/TB ≒ $0.005/GB, but Athena can physically reduce the range it reads with partition projection), so you can beat cost down with structure.
3. Lake and Athena **can be bridged with federation** (Section 6). Not an either-or — you can take both concurrent use and a migration gradient.

In conclusion, **for a new build, place "trail → S3 → Athena" as the foundation, and choose CloudWatch if you want a managed orientation / OCSF / OpenSearch** — that's the realistic answer in 2026. The fit and unfit of each are tabulated in Section 5.

---

## 2. How Lake works: correctly understanding the event data store (EDS)

The key to mastering Lake is accurately grasping **the nature of the EDS**, its foundation. Use it while leaving this vague, and you'll later have irreversible accidents like "I got the retention period wrong" or "I can't detach the KMS key."

### 2-1. The five essentials of an EDS

| Property | Content | Design implication |
| --- | --- | --- |
| **Immutable** | An EDS is an immutable collection of events. It can't be rewritten or tampered with afterward | Trustworthy as an audit trail. Becomes a forensic foundation |
| **1 category/EDS** | One EDS holds only one event category (management / data / network, etc.) | Separate management-event and data-event ones. Cross them with JOIN |
| **Selected with advanced event selectors** | Precisely control what to ingest with advanced event selectors | Ingestion volume = cost. Narrow it surgically from the start |
| **Multi-Region / multi-account** | With an organization EDS, aggregate multiple accounts and multiple Regions into one | Cut across the whole org's audit logs in one EDS with SQL |
| **KMS-encrypted by default** | CloudTrail encrypts by default. Optionally, you can also specify a customer KMS key | **If you specify a customer KMS key, you can't detach or change it later.** Be careful |

### 2-2. The retention tier is a choice between "extendable one year" or "seven years"

The most important parameter when creating an EDS is the retention period. The official docs have retention types corresponding to two billing models.

| Retention type | Default | Maximum | Use-case guideline |
| --- | --- | --- | --- |
| **One-year-extendable** | 366 days | 3,653 days (about 10 years) | Flexibly, from ordinary operational analysis to long-term retention |
| **Seven-year** | About 2,557 days (about 7 years) | About 2,557 days (about 7 years, fixed) | Compliance uses where 7-year retention is a fixed requirement |

> As intuition: "**If you just want to analyze for now, One-year-extendable; if 7-year retention is fixed by law or contract, Seven-year.**" The longer you make retention, the more storage charges (the one-year-extendable portion is $0.023/GB/month) kick in. I leave the deep dive on pricing to the [CloudTrail pricing & cost-optimization guide](/blog/aws-cloudtrail-pricing-cost-optimization-guide) and here just carve in the single point "retention = cost."

### 2-3. You can "copy-ingest" existing trail events

An EDS doesn't only accumulate future events; you can **copy events already accumulated in an existing trail into the EDS**, creating a snapshot up to some past point. This pays off in the field where "I adopted Lake later but want to analyze past logs with SQL too." Since it's ingestion of existing logs rather than new ingestion, you can rehouse past incident investigations onto Lake's SQL.

### 2-4. The organization EDS: the whole company's audit on one SQL surface

If you bundle multiple accounts with AWS Organizations, creating an **organization EDS** lets you aggregate all member accounts' events into one EDS. As we saw in 1-1, even after the onboarding end, **the organization EDS automatically covers new member accounts**. The design of multi-account auditing itself is detailed in the [organization trail / multi-account audit guide](/blog/aws-cloudtrail-organization-trail-multi-account-audit-guide), so refer to that too on the premise of combining it with Lake.

---

## 3. The Trino SQL cookbook: a set of practical, copy-paste-ready queries

This is the leading role of this article. CloudTrail Lake supports **all valid Trino `SELECT` statements and functions** ([SQL constraints](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/query-limitations.html)). Only `SELECT` — no query that modifies or tampers with data passes at all. The queries below are faithful to the field paths of the official [sample query set](https://github.com/aws-samples/cloud-trail-lake-query-samples) and [query-generation examples](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/lake-query-generator.html).

> **Usage premises (three)**
> 1. In `FROM`, write the **EDS's ID** (the tail of the ARN). In this article I denote it `$EDS_ID`. Replace it with your ID at execution time.
> 2. **Always include an `eventTime` range.** Query billing is incurred against **the amount of data scanned** ($0.005/GB). Narrowing the period is itself cost reduction.
> 3. Access nested fields with dot notation (`userIdentity.arn`), and map types with `element_at(map, 'key')`.

### 3-1. Top API callers (who is calling which API)

A basic operational query that surfaces "which principal is calling IAM the most." Change the service and it's usable generically.

```sql
SELECT
    COUNT(*) AS apiCount,
    eventName,
    recipientAccountId,
    userIdentity.principalId
FROM $EDS_ID
WHERE userIdentity.principalId IS NOT NULL
    AND eventSource = 'iam.amazonaws.com'
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY eventName, recipientAccountId, userIdentity.principalId
ORDER BY apiCount DESC
LIMIT 50;
```

### 3-2. Detecting AccessDenied spikes by principal

A surge of authorization errors is either a misconfiguration or a sign of an attacker probing permissions. Aggregate who is being denied on which actions.

```sql
SELECT
    userIdentity.arn AS principal,
    eventName,
    eventSource,
    COUNT(*) AS deniedCount
FROM $EDS_ID
WHERE (errorCode = 'AccessDenied' OR errorMessage = 'Access Denied')
    AND eventTime >= '2026-06-20 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY userIdentity.arn, eventName, eventSource
ORDER BY deniedCount DESC
LIMIT 100;
```

> Why look at both `errorCode` and `errorMessage`: depending on the service, it may be recorded as `AccessDenied` (errorCode) or as `Access Denied` (errorMessage). The official query-generation example also picks up both with an OR. I leave the forensic-perspective incident-response framework to the [threat-detection / incident-response guide](/blog/aws-cloudtrail-security-threat-detection-incident-response-guide). This article's queries lean strictly toward "analysis / aggregation."

### 3-3. Surfacing root-user usage

Root-user operations are themselves the most important audit signal. Extract them with `userIdentity.type = 'Root'`.

```sql
SELECT
    eventTime,
    eventName,
    eventSource,
    sourceIPAddress,
    awsRegion,
    userIdentity.arn
FROM $EDS_ID
WHERE userIdentity.type = 'Root'
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
```

### 3-4. Console logins without MFA

One of the most important security queries, straight from the official sample. Pick up sign-in events where `MFAUsed` is `No`.

```sql
SELECT
    eventTime,
    userIdentity.arn,
    sourceIPAddress,
    awsRegion,
    element_at(additionalEventData, 'MFAUsed') AS mfaUsed
FROM $EDS_ID
WHERE eventSource = 'signin.amazonaws.com'
    AND eventName = 'ConsoleLogin'
    AND element_at(additionalEventData, 'MFAUsed') = 'No'
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
```

> `additionalEventData` is a map type, so you extract the key value with `element_at(additionalEventData, 'MFAUsed')`. This is faithful to the field path of the official sample `console-login-with-no-mfa.sql`.

### 3-5. Cross-account access (from another account to your resources)

Events where "the caller's account ID (`userIdentity.accountId`) and the recipient's account ID (`recipientAccountId`) don't match" are cross-account access. Use this to detect unintended external access.

```sql
SELECT
    eventTime,
    userIdentity.principalId,
    eventName,
    eventSource,
    userIdentity.accountId   AS callerAccount,
    recipientAccountId       AS resourceAccount
FROM $EDS_ID
WHERE userIdentity.accountId != recipientAccountId
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
```

### 3-6. A timeline of IAM policy changes

Permission changes are the core of auditing. Lay out change-type events like `PutRolePolicy` chronologically, together with the target role name and policy document. The point is to extract values from the map type `requestParameters` with `element_at`.

```sql
SELECT
    eventTime,
    recipientAccountId,
    eventName,
    userIdentity.arn,
    element_at(requestParameters, 'roleName')       AS roleName,
    element_at(requestParameters, 'policyDocument')  AS policyDocument
FROM $EDS_ID
WHERE eventName IN ('PutRolePolicy', 'AttachRolePolicy', 'PutUserPolicy', 'CreatePolicyVersion')
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
```

### 3-7. Daily read/write trend

Split read-type and write-type with the `readOnly` flag and aggregate daily. This becomes a starting point for finding abnormal peaks in activity. A `CASE` aggregation faithful to the official query-generation example.

```sql
SELECT
    date(eventTime) AS eventDate,
    SUM(CASE WHEN readOnly = true  THEN 1 ELSE 0 END) AS readEvents,
    SUM(CASE WHEN readOnly = false THEN 1 ELSE 0 END) AS writeEvents
FROM $EDS_ID
WHERE eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY date(eventTime)
ORDER BY eventDate ASC;
```

### 3-8. Activity by source IP

Find concentrated access from a suspicious single IP. Output the count and the number of unique principals by `sourceIPAddress`.

```sql
SELECT
    sourceIPAddress,
    COUNT(*) AS eventCount,
    COUNT(DISTINCT userIdentity.arn) AS distinctPrincipals
FROM $EDS_ID
WHERE eventTime >= '2026-06-20 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY sourceIPAddress
ORDER BY eventCount DESC
LIMIT 50;
```

### 3-9. JOIN across multiple EDS (e.g. matching the production EDS and the audit EDS)

Lake supports **advanced queries that span multiple EDS** ([multi-table query support](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/query-limitations.html)). The supported JOIN-type operators are `UNION` / `UNION ALL` / `EXCEPT` / `INTERSECT` / `LEFT JOIN` / `RIGHT JOIN` / `INNER JOIN`.

```sql
-- 2つのEDS（本番=edsA, 監査=edsB）を eventId で突き合わせる
SELECT
    edsA.eventName  AS prodEvent,
    edsB.eventName  AS auditEvent,
    edsA.eventTime
FROM eds1_id AS edsA
LEFT JOIN eds2_id AS edsB
    ON edsA.eventId = edsB.eventId
WHERE edsA.eventTime >= '2026-06-01 00:00:00'
    AND edsA.eventTime <  '2026-06-27 00:00:00'
ORDER BY edsA.eventTime DESC;
```

```sql
-- 3つのEDSのイベントを縦に連結して横断的に並べる
SELECT eventId, eventName FROM eds1_id
UNION
SELECT eventId, eventName FROM eds2_id
UNION ALL
SELECT eventId, eventName FROM eds3_id
ORDER BY eventId
LIMIT 100;
```

> Query results are retained for **up to 7 days** and, if needed, can be **saved to an S3 bucket** (the CLI's `start-query --delivery-s3uri`). Queries running for a long time (over an hour) may time out, but even then you can retrieve the processed partial results. That's exactly why narrowing the range with `eventTime` pays off on both cost and completion-rate fronts.

---

## 4. Dashboards and generative AI: visualization and "analysis without writing SQL"

If only people who can write SQL can analyze, the org's auditing doesn't run. Lake has dashboards and generative-AI assistance.

### 4-1. Three kinds of dashboards

| Kind | Content | Editing | Update |
| --- | --- | --- | --- |
| **Managed dashboards** | **14** ready-to-use dashboards provided by CloudTrail | Not editable (but can be saved as custom) | — |
| **Custom dashboards** | You build them yourself. **Up to 10 widgets**, each widget = one SQL query | Editable | Manual or scheduled update |
| **Highlights dashboard** | A managed "at-a-glance" summary. **Updates every 6 hours** and shows **the most recent 24 hours** | — | Automatic every 6 hours |

The point is that **the substance of each widget is one SQL query**. That is, the Section 3 cookbook can be diverted directly into custom-dashboard widgets. Build one audit dashboard lining up "logins without MFA," "root usage," and "AccessDenied spikes," and daily checks are done without writing SQL each time.

### 4-2. Natural-language query generation is "GA," result summarization is "preview"

This is easy to confuse, so let me clearly separate the maturity.

| Feature | Maturity | Overview | Constraints / Regions |
| --- | --- | --- | --- |
| **Query generation** (natural language → SQL) | **GA (general availability)** | Generates executable SQL from an English prompt. The CLI is `generate-query` | Requires IAM `cloudtrail:GenerateQuery`, **English only, 3–500 characters, generation itself is free**, about 7 supported Regions |
| **Query result summarization** (result → natural language) | **preview (subject to change)** | Summarizes execution results in English natural language | Available Regions are **Tokyo, N. Virginia, Oregon** |

Query generation (GA) builds SQL from an English question like "what were the top errors last month?" even if you're not well-versed in SQL or CloudTrail fields. In the CLI you use it like this.

```bash
# 英語プロンプトからSQLを生成（生成は無料。実行時のみスキャン課金）
aws cloudtrail generate-query \
  --event-data-stores "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id" \
  --prompt "Show me all console login events for the past week"

# 返ってきた QueryAlias でそのまま実行
aws cloudtrail start-query --query-alias AWSCloudTrail-EXAMPLE-UUID
```

A generation example (official) goes like this. For the prompt "Show any events with access denied errors for the past three weeks," it returns SQL that includes `errorCode = 'AccessDenied'` and an `eventTime` range.

> Cautions (official): **don't include personal or confidential information** in the prompt. This is generative AI (an LLM), and it's recommended that **the returned SQL always be checked by a person** before execution. AI is strictly a tool to "speed up the draft," and the final check is done by a person — the same as the "fast with AI, verification by humans" principle I've held to on the payment platform. Remove `cloudtrail:GenerateQuery` from the IAM policy and you can explicitly opt out of the feature.

---

## 5. Lake vs. Athena thorough comparison: how do new builds ultimately choose?

This is **the main event for those about to build**. Since Lake can't be chosen for new builds, the real options become a choice between **"S3 + Athena (self-managed)"** and **"CloudWatch (the managed Lake successor)."** Let's pin down the decision axes in a table.

### 5-1. Decision matrix

| Aspect | CloudTrail Lake (existing customers only) | S3 + Athena | CloudWatch (the officially recommended migration destination) |
| --- | --- | --- | --- |
| Usable for new builds | **No (ended 2026/5/31)** | **Yes** | **Yes** |
| Degree of managedness | High (auto ingestion, conversion, retention) | Low (self-manage tables, partitions, lifecycle) | High |
| Immutability / tamper resistance | **The EDS is immutable** | Separately ensured via S3 Object Lock, etc. | Inherits CloudTrail safety features |
| Query engine | **Trino (all SELECT)** | Trino-family (Athena/Presto) | Logs QL / SQL / PPL (OpenSearch) |
| Immediacy of cross-cutting analysis | **Instant SQL** (ready once ingested) | Table DDL definition required | Analyze after ingestion |
| Scan cost | $0.005/GB | $5/TB (≒ $0.005/GB) + **reduce the range read with partition projection** | OpenSearch-analysis pricing model |
| Built-in dashboards | **14 managed + custom + Highlights** | Self-built (QuickSight, etc.) | OpenSearch / built-in analysis |
| Initial setup | Light (just create an EDS) | Medium (S3 delivery + Glue catalog + DDL) | Medium (pipeline configuration) |
| OCSF/OTel, third-party integration | Limited | Self-built | **Native support, abundant connectors** |

### 5-2. Recommendations by scenario

- **"S3 trail delivery already exists. I want to minimize cost. I can write SQL myself."** → **S3 + Athena**. You can analyze with no additional ingestion charges, and the greatest weapon is being able to physically reduce scan volume with partition projection. Small-to-medium-scale audit analysis is fully covered by this.
- **"I want to centralize security, operations, and compliance logs. I want OCSF normalization, OpenSearch-native analysis, and third-party connectors."** → **CloudWatch**. The path the official docs push first as Lake's successor.
- **"I have an existing Lake EDS and want to keep using data from 2023 onward."** → **continue with Lake** for now, while pilot-migrating to CloudWatch as needed. Keep data from before 2023 in Lake or move it to S3.

### 5-3. Implementation guidelines when choosing S3 + Athena (key points only)

There are three crux points when choosing Athena for a new build.

1. Use **partition projection** to narrow the range read with a `region/year/month/day` hierarchy. This is the lever that makes scan volume = cost matter the most.
2. Place table definitions in the **Glue Data Catalog** and manage the schema centrally.
3. Send old partitions to Glacier with a lifecycle, and **separate storage cost from analysis targets**.

> The complete version of the DDL for Athena tables (a `CREATE EXTERNAL TABLE` with partition projection for CloudTrail logs) is posted in the [CloudTrail audit-logging, governance, and security complete guide](/blog/aws-cloudtrail-audit-logging-governance-security-guide). Duplicating the full text in this article would double maintenance, so the DDL is unified in the pillar. The **details of cost calculation** for Lake and Athena (the actual $/TB scanned, the effect of lifecycle) are left to the [pricing & cost-optimization guide](/blog/aws-cloudtrail-pricing-cost-optimization-guide). This section's role is the decision of "**which to choose**."

---

## 6. External sources and federation: widening Lake's boundary

Lake isn't only for inside AWS. And Lake and Athena can be bridged.

### 6-1. Ingesting activity from outside AWS (PutAuditEvents)

You can ingest activity events from on-premises, SaaS, and hybrid environments into Lake through the **`PutAuditEvents` API and "channels."** Furthermore, more than a dozen partner integrations (direct integrations / solution integrations) are provided, letting you cross-analyze AWS audit logs and non-AWS logs with the **same EDS and the same Trino SQL**. Being able to follow "AWS operations" and "external SaaS operations" on one timeline is extremely powerful in incident investigation.

### 6-2. Federate the EDS to the Glue Data Catalog → query in Athena

This is the feature that **connects** Lake and Athena. **Federate an EDS to the AWS Glue Data Catalog** and the metadata is registered in Glue, letting you **query the same event data with SQL from Amazon Athena** via Lake Formation.

Let me precisely grasp the official behavior ([federation](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/query-federation.html)).

- When you enable federation, CloudTrail creates in the Glue Data Catalog a **managed database named `aws:cloudtrail`** (if none exists) and a **managed federated table named after the EDS's ID**.
- Lake Formation handles fine-grained access control of the federated resource. Delete the federation role or revoke permissions in Lake Formation, and you can no longer query from Athena.
- A user querying this data in Athena must be allowed **`lakeformation:GetDataAccess`** in IAM (included in the `AmazonAthenaFullAccess` managed policy).
- **Federation itself incurs no CloudTrail charge.** What is incurred is only the **Athena query charge**.
- Even if you disable federation, **none of Lake's data is deleted.** You just can't query from Athena anymore; in Lake you can keep querying.
- Caution: **an EDS with federation enabled can't be deleted.** To delete it, you first need to disable federation (and termination protection, if enabled).

The core of the minimal permissions needed by the federation role is read access to the EDS data (and decryption permission too if KMS-encrypted).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LakeFederationEDSDataAccess",
      "Effect": "Allow",
      "Action": "cloudtrail:GetEventDataStoreData",
      "Resource": "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id"
    },
    {
      "Sid": "LakeFederationKMSDecryptAccess",
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
      "Resource": "arn:aws:kms:ap-northeast-1:123456789012:key/EXAMPLE-key-id"
    }
  ]
}
```

> A hint on how to use them: **Lake's native Trino queries are for "the instant, dashboards-included," while Athena via federation is for "when you want to JOIN with other Athena tables (VPC Flow Logs, ALB logs, etc.)."** Federation can also be used as a "migration gradient" from Lake to Athena — you can take a migration design that first creates a state where both can query the same data, then gradually shifts analysis to the Athena side.

---

## 7. Summary: a Lake / Athena / CloudWatch usage cheat sheet

Finally, let me close with a quick-reference table usable for decisions starting tomorrow.

| Your question | Answer |
| --- | --- |
| I'm about to build audit-log analysis. Can I use Lake? | **No (new-customer onboarding ended 2026/5/31)** |
| For a new build, I want the cheapest, self-built | **S3 + Athena** (reduce scans with partition projection) |
| For a new build, I want managed / OCSF / OpenSearch | **CloudWatch** (the officially recommended migration destination) |
| I have an existing Lake EDS | **OK to continue using.** Use the Section 3 cookbook to the fullest |
| What to do with existing Lake going forward | Verify CloudWatch with a pilot migration. **Keep data from before 2023 in Lake/S3** |
| Planning to add a new account to the org (existing customer) | Check right now whether you have an **organization EDS** (without it, you can't onboard new accounts) |
| I want to JOIN Lake data with other Athena tables | **Federate to Glue** → query in Athena |
| I want members who can't write SQL to analyze too | **Natural-language query generation (GA)** + 14 managed dashboards |
| I want to aggregate >90 days, cross-attribute, with SQL (the original motive) | Not possible with event history. **Lake / Athena / CloudWatch — one of them — is required** |

The end of CloudTrail Lake's new-customer onboarding is not "the end," but **a good opportunity to rethink the design of audit-log analysis**. The essence never changes — **changing the structure of audit logs from "data you merely accumulate" into "an asset you can query cross-sectionally with SQL."** Whether with Lake, Athena, or CloudWatch, as long as you don't drop that single point, you can do an incident's first-pass triage and an answer to an audit fast and accurately by structure, not by gut.

On a serverless payment platform, I ensured the "correctness" of reliability and cost with the structure of code and maintained zero double charges in production. An audit-log platform can be designed with the same philosophy — by building in "an analyzable structure" from the start, it becomes a log platform strong in emergencies and cheap in peacetime.

> **For those who want to redesign audit logs into an "analyzable asset."** From mastering an existing Lake, to migration design to Athena/CloudWatch, to aggregating multi-account auditing, to a design that holds cost down with structure, I accompany you, going deep into the on-the-ground implementation. Feel free to [get in touch](/contact).

---

### Primary sources (official documentation)

- [CloudTrail Lake availability change (new-customer onboarding end / migration)](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake-service-availability-change.html)
- [Working with AWS CloudTrail Lake](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake.html)
- [CloudTrail Lake SQL constraints (Trino / JOIN / constraints)](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/query-limitations.html)
- [Create queries from natural language prompts (query generation: GA)](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/lake-query-generator.html)
- [Federate an event data store (Glue/Athena integration)](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/query-federation.html)
- [CloudTrail Lake sample queries (GitHub: aws-samples)](https://github.com/aws-samples/cloud-trail-lake-query-samples)
- [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/)
