AWS CloudTrail Lake Practical Guide (2026 Edition): Analyzing Events with Trino SQL and How to Choose Between It and Athena+S3 — The Realistic Answer After New-Customer Onboarding Ended

"I want to aggregate audit logs older than 90 days across attributes with SQL," "When an incident occurs, I want to narrow down in one go, with multiple conditions, who did what to which resource, when, and from which IP" — these are walls you always hit in the audit and security field.

But CloudTrail's event history (the thing you can view without a trail) has the constraints of 90 days, management events only, a single Region, and single-attribute search, and it's utterly no match for such cross-cutting analysis. That's exactly why AWS provides separate mechanisms for analyzing audit logs with SQL — CloudTrail Lake and Athena.

I led the reliability layer of a serverless payment platform and kept double charges in production at zero. What I keenly felt there is that audit logs don't become an asset just by "accumulating" them. The first-pass triage of an incident, and answering a compliance audit, ultimately come down to "whether the structure lets you query the logs cross-sectionally with SQL." This article is a practical guide to turning CloudTrail Lake into that "analyzable asset."

That said, in 2026, this article has an inconvenient truth to convey first.

CloudTrail Lake ended new-customer onboarding as of May 31, 2026. Existing customers can continue using it as before (CloudTrail Lake availability change). That is, this article is written so it can be read two ways: (a) for those already using Lake to master it, and (b) for those about to build an audit-log analysis platform to choose the optimal answer on the premise that "Lake can't be chosen." I judged that placing this reality first is the most honest and the most useful.

0. Mental model: Lake is "an immutable audit data lake you can analyze cross-sectionally with SQL"

Before getting into the fine features, let's pin down the essence in one picture. What is CloudTrail Lake? Borrowing the official words, it's "an audit solution that complements your compliance stack and supports near-real-time troubleshooting." In substance, it boils down to these three points.

Accumulate AWS / non-AWS activity events in an immutable event data store (EDS).
Accumulated events are internally converted from row-based JSON to Apache ORC (columnar) and can be retrieved fast.
On top of that, you can do SQL analysis that cuts across fields using all valid Trino SELECT statements and functions.

[AWS / non-AWS events]
        │  Selected/filtered by advanced event selectors
        ▼
┌─────────────────────────────┐
│  Event Data Store (EDS)     │  ← Immutable, ORC columnar, KMS-encrypted, retention tiers
│  1 category/EDS, org-EDS OK │
└─────────────────────────────┘
        │  All Trino SELECT, JOIN across multiple EDS
        ▼
[Trino SQL query] → results (retained up to 7 days / can save to S3)
        │
        ├─→ 14 managed dashboards / custom / Highlights
        ├─→ Natural language → SQL (generative-AI query generation: GA)
        └─→ Federate to Glue Data Catalog → query in Athena

This article is the CloudTrail cluster's "Lake / SQL-analysis" piece (a spoke). The trail-creation procedure, the overall picture of event types, the basic security design, and the DDL for Athena tables (partition projection) are consolidated in the CloudTrail audit-logging, governance, and security complete guide (the pillar article). This article doesn't reprint the DDL in full and instead digs into Lake itself and "how to choose between Lake and Athena."

1. The reality of the onboarding end, and the first decision

First, let's settle your standing. Which side are you on right now?

Your situation	Conclusion	The path to take
Already operating a Lake EDS	Can continue using it. Use Sections 2–4 of this article as-is	Use the existing EDS to the fullest. Consider migration too if needed (described later)
About to newly build an audit-log analysis platform	Lake can't be chosen. Athena+S3 or CloudWatch is the realistic answer	Choose with the decision matrix in Section 5
An existing customer planning to add member accounts to the org	Check right now whether you have an organization EDS	With an organization EDS, new accounts are auto-covered. With account EDS only, new accounts can't be onboarded

1-1. What existing customers "can / can't continue" (official)

The official availability change clearly divides the support scope for existing customers by EDS type. There's a lot of misunderstanding here, so let me quote precisely.

If you have an organization event data store (organization EDS): Lake functions as before. Support for member accounts newly added to the org, and expansion to additional Regions, also continue.
If you have only an account event data store (account EDS): existing accounts continue to be supported, and you can expand to new Regions. However, onboarding accounts newly added to the org is not supported. If you want to cover new accounts too, you need to create an organization EDS or migrate to CloudWatch.
Note that Lake enters a state of receiving only critical bug fixes and security updates. Don't expect new features.

Important: this is not a story of "Lake stopping." The official docs explicitly state "AWS CloudTrail continues to be fully supported. Only CloudTrail Lake ended new-customer onboarding. Trails, Insights, and aggregated events are unaffected." There's no need to panic and rebuild even your trails.

1-2. The migration destination AWS recommends is CloudWatch (not only Athena+S3)

Accuracy is everything here, so let me separate my opinion from the official line. The migration destination the official docs recommend first is Amazon CloudWatch. A mechanism to directly import a CloudTrail Lake EDS into CloudWatch (the console's Export to CloudWatch, or the CLI's aws logs create-import-task) is provided, carrying features Lake customers had strongly requested — OCSF/OTel support, OpenSearch-native analysis, and third-party connectors.

# 既存EDSをCloudWatchへインポート（CLI例：公式の create-import-task）
aws logs create-import-task \
  --import-source-arn "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id" \
  --import-role-arn "arn:aws:iam::123456789012:role/CloudTrailLakeExportRole" \
  --import-filter '{"startEventTime": 1704067200, "endEventTime": 1706745600}'

# 非同期処理なので進捗を確認
aws logs describe-import-tasks --import-id "EXAMPLE-import-id"

A migration pitfall (official note): data from before 2023 is not migrated from CloudTrail Lake to CloudWatch. If you need events from before 2023, keep querying them directly in Lake, or move them to an S3 bucket. That's exactly why, rather than streaming the whole volume at once, the official best practice is to first pilot-migrate a narrow range like 24 hours, verify the schema, queries, dashboards, and IAM roles work as expected, and then migrate to production.

So why does this article also place Athena+S3 in a leading role? There are three reasons.

Many teams already have an S3 destination for trail delivery, separate from Lake/CloudWatch. Just by defining an Athena table there, they can start SQL analysis with no additional ingestion cost.
The scan unit price is cheap (Lake is $0.005/GB scanned, Athena is $5/TB ≒ $0.005/GB, but Athena can physically reduce the range it reads with partition projection), so you can beat cost down with structure.
Lake and Athena can be bridged with federation (Section 6). Not an either-or — you can take both concurrent use and a migration gradient.

In conclusion, for a new build, place "trail → S3 → Athena" as the foundation, and choose CloudWatch if you want a managed orientation / OCSF / OpenSearch — that's the realistic answer in 2026. The fit and unfit of each are tabulated in Section 5.

2. How Lake works: correctly understanding the event data store (EDS)

The key to mastering Lake is accurately grasping the nature of the EDS, its foundation. Use it while leaving this vague, and you'll later have irreversible accidents like "I got the retention period wrong" or "I can't detach the KMS key."

2-1. The five essentials of an EDS

Property	Content	Design implication
Immutable	An EDS is an immutable collection of events. It can't be rewritten or tampered with afterward	Trustworthy as an audit trail. Becomes a forensic foundation
1 category/EDS	One EDS holds only one event category (management / data / network, etc.)	Separate management-event and data-event ones. Cross them with JOIN
Selected with advanced event selectors	Precisely control what to ingest with advanced event selectors	Ingestion volume = cost. Narrow it surgically from the start
Multi-Region / multi-account	With an organization EDS, aggregate multiple accounts and multiple Regions into one	Cut across the whole org's audit logs in one EDS with SQL
KMS-encrypted by default	CloudTrail encrypts by default. Optionally, you can also specify a customer KMS key	If you specify a customer KMS key, you can't detach or change it later. Be careful

2-2. The retention tier is a choice between "extendable one year" or "seven years"

The most important parameter when creating an EDS is the retention period. The official docs have retention types corresponding to two billing models.

Retention type	Default	Maximum	Use-case guideline
One-year-extendable	366 days	3,653 days (about 10 years)	Flexibly, from ordinary operational analysis to long-term retention
Seven-year	About 2,557 days (about 7 years)	About 2,557 days (about 7 years, fixed)	Compliance uses where 7-year retention is a fixed requirement

As intuition: "If you just want to analyze for now, One-year-extendable; if 7-year retention is fixed by law or contract, Seven-year." The longer you make retention, the more storage charges (the one-year-extendable portion is $0.023/GB/month) kick in. I leave the deep dive on pricing to the CloudTrail pricing & cost-optimization guide and here just carve in the single point "retention = cost."

2-3. You can "copy-ingest" existing trail events

An EDS doesn't only accumulate future events; you can copy events already accumulated in an existing trail into the EDS, creating a snapshot up to some past point. This pays off in the field where "I adopted Lake later but want to analyze past logs with SQL too." Since it's ingestion of existing logs rather than new ingestion, you can rehouse past incident investigations onto Lake's SQL.

2-4. The organization EDS: the whole company's audit on one SQL surface

If you bundle multiple accounts with AWS Organizations, creating an organization EDS lets you aggregate all member accounts' events into one EDS. As we saw in 1-1, even after the onboarding end, the organization EDS automatically covers new member accounts. The design of multi-account auditing itself is detailed in the organization trail / multi-account audit guide, so refer to that too on the premise of combining it with Lake.

3. The Trino SQL cookbook: a set of practical, copy-paste-ready queries

This is the leading role of this article. CloudTrail Lake supports all valid Trino SELECT statements and functions (SQL constraints). Only SELECT — no query that modifies or tampers with data passes at all. The queries below are faithful to the field paths of the official sample query set and query-generation examples.

Usage premises (three)

In FROM, write the EDS's ID (the tail of the ARN). In this article I denote it $EDS_ID. Replace it with your ID at execution time.

Always include an eventTime range. Query billing is incurred against the amount of data scanned ($0.005/GB). Narrowing the period is itself cost reduction.

Access nested fields with dot notation (userIdentity.arn), and map types with element_at(map, 'key').

3-1. Top API callers (who is calling which API)

A basic operational query that surfaces "which principal is calling IAM the most." Change the service and it's usable generically.

SELECT
    COUNT(*) AS apiCount,
    eventName,
    recipientAccountId,
    userIdentity.principalId
FROM $EDS_ID
WHERE userIdentity.principalId IS NOT NULL
    AND eventSource = 'iam.amazonaws.com'
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY eventName, recipientAccountId, userIdentity.principalId
ORDER BY apiCount DESC
LIMIT 50;

3-2. Detecting AccessDenied spikes by principal

A surge of authorization errors is either a misconfiguration or a sign of an attacker probing permissions. Aggregate who is being denied on which actions.

SELECT
    userIdentity.arn AS principal,
    eventName,
    eventSource,
    COUNT(*) AS deniedCount
FROM $EDS_ID
WHERE (errorCode = 'AccessDenied' OR errorMessage = 'Access Denied')
    AND eventTime >= '2026-06-20 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY userIdentity.arn, eventName, eventSource
ORDER BY deniedCount DESC
LIMIT 100;

Why look at both errorCode and errorMessage: depending on the service, it may be recorded as AccessDenied (errorCode) or as Access Denied (errorMessage). The official query-generation example also picks up both with an OR. I leave the forensic-perspective incident-response framework to the threat-detection / incident-response guide. This article's queries lean strictly toward "analysis / aggregation."

3-3. Surfacing root-user usage

Root-user operations are themselves the most important audit signal. Extract them with userIdentity.type = 'Root'.

SELECT
    eventTime,
    eventName,
    eventSource,
    sourceIPAddress,
    awsRegion,
    userIdentity.arn
FROM $EDS_ID
WHERE userIdentity.type = 'Root'
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;

3-4. Console logins without MFA

One of the most important security queries, straight from the official sample. Pick up sign-in events where MFAUsed is No.

SELECT
    eventTime,
    userIdentity.arn,
    sourceIPAddress,
    awsRegion,
    element_at(additionalEventData, 'MFAUsed') AS mfaUsed
FROM $EDS_ID
WHERE eventSource = 'signin.amazonaws.com'
    AND eventName = 'ConsoleLogin'
    AND element_at(additionalEventData, 'MFAUsed') = 'No'
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;

additionalEventData is a map type, so you extract the key value with element_at(additionalEventData, 'MFAUsed'). This is faithful to the field path of the official sample console-login-with-no-mfa.sql.

3-5. Cross-account access (from another account to your resources)

Events where "the caller's account ID (userIdentity.accountId) and the recipient's account ID (recipientAccountId) don't match" are cross-account access. Use this to detect unintended external access.

SELECT
    eventTime,
    userIdentity.principalId,
    eventName,
    eventSource,
    userIdentity.accountId   AS callerAccount,
    recipientAccountId       AS resourceAccount
FROM $EDS_ID
WHERE userIdentity.accountId != recipientAccountId
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;

3-6. A timeline of IAM policy changes

Permission changes are the core of auditing. Lay out change-type events like PutRolePolicy chronologically, together with the target role name and policy document. The point is to extract values from the map type requestParameters with element_at.

SELECT
    eventTime,
    recipientAccountId,
    eventName,
    userIdentity.arn,
    element_at(requestParameters, 'roleName')       AS roleName,
    element_at(requestParameters, 'policyDocument')  AS policyDocument
FROM $EDS_ID
WHERE eventName IN ('PutRolePolicy', 'AttachRolePolicy', 'PutUserPolicy', 'CreatePolicyVersion')
    AND eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
ORDER BY eventTime DESC;

3-7. Daily read/write trend

Split read-type and write-type with the readOnly flag and aggregate daily. This becomes a starting point for finding abnormal peaks in activity. A CASE aggregation faithful to the official query-generation example.

SELECT
    date(eventTime) AS eventDate,
    SUM(CASE WHEN readOnly = true  THEN 1 ELSE 0 END) AS readEvents,
    SUM(CASE WHEN readOnly = false THEN 1 ELSE 0 END) AS writeEvents
FROM $EDS_ID
WHERE eventTime >= '2026-06-01 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY date(eventTime)
ORDER BY eventDate ASC;

3-8. Activity by source IP

Find concentrated access from a suspicious single IP. Output the count and the number of unique principals by sourceIPAddress.

SELECT
    sourceIPAddress,
    COUNT(*) AS eventCount,
    COUNT(DISTINCT userIdentity.arn) AS distinctPrincipals
FROM $EDS_ID
WHERE eventTime >= '2026-06-20 00:00:00'
    AND eventTime <  '2026-06-27 00:00:00'
GROUP BY sourceIPAddress
ORDER BY eventCount DESC
LIMIT 50;

3-9. JOIN across multiple EDS (e.g. matching the production EDS and the audit EDS)

Lake supports advanced queries that span multiple EDS (multi-table query support). The supported JOIN-type operators are UNION / UNION ALL / EXCEPT / INTERSECT / LEFT JOIN / RIGHT JOIN / INNER JOIN.

-- 2つのEDS（本番=edsA, 監査=edsB）を eventId で突き合わせる
SELECT
    edsA.eventName  AS prodEvent,
    edsB.eventName  AS auditEvent,
    edsA.eventTime
FROM eds1_id AS edsA
LEFT JOIN eds2_id AS edsB
    ON edsA.eventId = edsB.eventId
WHERE edsA.eventTime >= '2026-06-01 00:00:00'
    AND edsA.eventTime <  '2026-06-27 00:00:00'
ORDER BY edsA.eventTime DESC;

-- 3つのEDSのイベントを縦に連結して横断的に並べる
SELECT eventId, eventName FROM eds1_id
UNION
SELECT eventId, eventName FROM eds2_id
UNION ALL
SELECT eventId, eventName FROM eds3_id
ORDER BY eventId
LIMIT 100;

Query results are retained for up to 7 days and, if needed, can be saved to an S3 bucket (the CLI's start-query --delivery-s3uri). Queries running for a long time (over an hour) may time out, but even then you can retrieve the processed partial results. That's exactly why narrowing the range with eventTime pays off on both cost and completion-rate fronts.

4. Dashboards and generative AI: visualization and "analysis without writing SQL"

If only people who can write SQL can analyze, the org's auditing doesn't run. Lake has dashboards and generative-AI assistance.

4-1. Three kinds of dashboards

Kind	Content	Editing	Update
Managed dashboards	14 ready-to-use dashboards provided by CloudTrail	Not editable (but can be saved as custom)	—
Custom dashboards	You build them yourself. Up to 10 widgets, each widget = one SQL query	Editable	Manual or scheduled update
Highlights dashboard	A managed "at-a-glance" summary. Updates every 6 hours and shows the most recent 24 hours	—	Automatic every 6 hours

The point is that the substance of each widget is one SQL query. That is, the Section 3 cookbook can be diverted directly into custom-dashboard widgets. Build one audit dashboard lining up "logins without MFA," "root usage," and "AccessDenied spikes," and daily checks are done without writing SQL each time.

4-2. Natural-language query generation is "GA," result summarization is "preview"

This is easy to confuse, so let me clearly separate the maturity.

Feature	Maturity	Overview	Constraints / Regions
Query generation (natural language → SQL)	GA (general availability)	Generates executable SQL from an English prompt. The CLI is `generate-query`	Requires IAM `cloudtrail:GenerateQuery`, English only, 3–500 characters, generation itself is free, about 7 supported Regions
Query result summarization (result → natural language)	preview (subject to change)	Summarizes execution results in English natural language	Available Regions are Tokyo, N. Virginia, Oregon

Query generation (GA) builds SQL from an English question like "what were the top errors last month?" even if you're not well-versed in SQL or CloudTrail fields. In the CLI you use it like this.

# 英語プロンプトからSQLを生成（生成は無料。実行時のみスキャン課金）
aws cloudtrail generate-query \
  --event-data-stores "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id" \
  --prompt "Show me all console login events for the past week"

# 返ってきた QueryAlias でそのまま実行
aws cloudtrail start-query --query-alias AWSCloudTrail-EXAMPLE-UUID

A generation example (official) goes like this. For the prompt "Show any events with access denied errors for the past three weeks," it returns SQL that includes errorCode = 'AccessDenied' and an eventTime range.

Cautions (official): don't include personal or confidential information in the prompt. This is generative AI (an LLM), and it's recommended that the returned SQL always be checked by a person before execution. AI is strictly a tool to "speed up the draft," and the final check is done by a person — the same as the "fast with AI, verification by humans" principle I've held to on the payment platform. Remove cloudtrail:GenerateQuery from the IAM policy and you can explicitly opt out of the feature.

5. Lake vs. Athena thorough comparison: how do new builds ultimately choose?

This is the main event for those about to build. Since Lake can't be chosen for new builds, the real options become a choice between "S3 + Athena (self-managed)" and "CloudWatch (the managed Lake successor)." Let's pin down the decision axes in a table.

5-1. Decision matrix

Aspect	CloudTrail Lake (existing customers only)	S3 + Athena	CloudWatch (the officially recommended migration destination)
Usable for new builds	No (ended 2026/5/31)	Yes	Yes
Degree of managedness	High (auto ingestion, conversion, retention)	Low (self-manage tables, partitions, lifecycle)	High
Immutability / tamper resistance	The EDS is immutable	Separately ensured via S3 Object Lock, etc.	Inherits CloudTrail safety features
Query engine	Trino (all SELECT)	Trino-family (Athena/Presto)	Logs QL / SQL / PPL (OpenSearch)
Immediacy of cross-cutting analysis	Instant SQL (ready once ingested)	Table DDL definition required	Analyze after ingestion
Scan cost	$0.005/GB	$5/TB (≒ $0.005/GB) + reduce the range read with partition projection	OpenSearch-analysis pricing model
Built-in dashboards	14 managed + custom + Highlights	Self-built (QuickSight, etc.)	OpenSearch / built-in analysis
Initial setup	Light (just create an EDS)	Medium (S3 delivery + Glue catalog + DDL)	Medium (pipeline configuration)
OCSF/OTel, third-party integration	Limited	Self-built	Native support, abundant connectors

5-2. Recommendations by scenario

"S3 trail delivery already exists. I want to minimize cost. I can write SQL myself." → S3 + Athena. You can analyze with no additional ingestion charges, and the greatest weapon is being able to physically reduce scan volume with partition projection. Small-to-medium-scale audit analysis is fully covered by this.
"I want to centralize security, operations, and compliance logs. I want OCSF normalization, OpenSearch-native analysis, and third-party connectors." → CloudWatch. The path the official docs push first as Lake's successor.
"I have an existing Lake EDS and want to keep using data from 2023 onward." → continue with Lake for now, while pilot-migrating to CloudWatch as needed. Keep data from before 2023 in Lake or move it to S3.

5-3. Implementation guidelines when choosing S3 + Athena (key points only)

There are three crux points when choosing Athena for a new build.

Use partition projection to narrow the range read with a region/year/month/day hierarchy. This is the lever that makes scan volume = cost matter the most.
Place table definitions in the Glue Data Catalog and manage the schema centrally.
Send old partitions to Glacier with a lifecycle, and separate storage cost from analysis targets.

The complete version of the DDL for Athena tables (a CREATE EXTERNAL TABLE with partition projection for CloudTrail logs) is posted in the CloudTrail audit-logging, governance, and security complete guide. Duplicating the full text in this article would double maintenance, so the DDL is unified in the pillar. The details of cost calculation for Lake and Athena (the actual $/TB scanned, the effect of lifecycle) are left to the pricing & cost-optimization guide. This section's role is the decision of "which to choose."

6. External sources and federation: widening Lake's boundary

Lake isn't only for inside AWS. And Lake and Athena can be bridged.

6-1. Ingesting activity from outside AWS (PutAuditEvents)

You can ingest activity events from on-premises, SaaS, and hybrid environments into Lake through the PutAuditEvents API and "channels." Furthermore, more than a dozen partner integrations (direct integrations / solution integrations) are provided, letting you cross-analyze AWS audit logs and non-AWS logs with the same EDS and the same Trino SQL. Being able to follow "AWS operations" and "external SaaS operations" on one timeline is extremely powerful in incident investigation.

6-2. Federate the EDS to the Glue Data Catalog → query in Athena

This is the feature that connects Lake and Athena. Federate an EDS to the AWS Glue Data Catalog and the metadata is registered in Glue, letting you query the same event data with SQL from Amazon Athena via Lake Formation.

Let me precisely grasp the official behavior (federation).

When you enable federation, CloudTrail creates in the Glue Data Catalog a managed database named aws:cloudtrail (if none exists) and a managed federated table named after the EDS's ID.
Lake Formation handles fine-grained access control of the federated resource. Delete the federation role or revoke permissions in Lake Formation, and you can no longer query from Athena.
A user querying this data in Athena must be allowed lakeformation:GetDataAccess in IAM (included in the AmazonAthenaFullAccess managed policy).
Federation itself incurs no CloudTrail charge. What is incurred is only the Athena query charge.
Even if you disable federation, none of Lake's data is deleted. You just can't query from Athena anymore; in Lake you can keep querying.
Caution: an EDS with federation enabled can't be deleted. To delete it, you first need to disable federation (and termination protection, if enabled).

The core of the minimal permissions needed by the federation role is read access to the EDS data (and decryption permission too if KMS-encrypted).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LakeFederationEDSDataAccess",
      "Effect": "Allow",
      "Action": "cloudtrail:GetEventDataStoreData",
      "Resource": "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id"
    },
    {
      "Sid": "LakeFederationKMSDecryptAccess",
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
      "Resource": "arn:aws:kms:ap-northeast-1:123456789012:key/EXAMPLE-key-id"
    }
  ]
}

A hint on how to use them: Lake's native Trino queries are for "the instant, dashboards-included," while Athena via federation is for "when you want to JOIN with other Athena tables (VPC Flow Logs, ALB logs, etc.)." Federation can also be used as a "migration gradient" from Lake to Athena — you can take a migration design that first creates a state where both can query the same data, then gradually shifts analysis to the Athena side.

7. Summary: a Lake / Athena / CloudWatch usage cheat sheet

Finally, let me close with a quick-reference table usable for decisions starting tomorrow.

Your question	Answer
I'm about to build audit-log analysis. Can I use Lake?	No (new-customer onboarding ended 2026/5/31)
For a new build, I want the cheapest, self-built	S3 + Athena (reduce scans with partition projection)
For a new build, I want managed / OCSF / OpenSearch	CloudWatch (the officially recommended migration destination)
I have an existing Lake EDS	OK to continue using. Use the Section 3 cookbook to the fullest
What to do with existing Lake going forward	Verify CloudWatch with a pilot migration. Keep data from before 2023 in Lake/S3
Planning to add a new account to the org (existing customer)	Check right now whether you have an organization EDS (without it, you can't onboard new accounts)
I want to JOIN Lake data with other Athena tables	Federate to Glue → query in Athena
I want members who can't write SQL to analyze too	Natural-language query generation (GA) + 14 managed dashboards
I want to aggregate >90 days, cross-attribute, with SQL (the original motive)	Not possible with event history. Lake / Athena / CloudWatch — one of them — is required

The end of CloudTrail Lake's new-customer onboarding is not "the end," but a good opportunity to rethink the design of audit-log analysis. The essence never changes — changing the structure of audit logs from "data you merely accumulate" into "an asset you can query cross-sectionally with SQL." Whether with Lake, Athena, or CloudWatch, as long as you don't drop that single point, you can do an incident's first-pass triage and an answer to an audit fast and accurately by structure, not by gut.

On a serverless payment platform, I ensured the "correctness" of reliability and cost with the structure of code and maintained zero double charges in production. An audit-log platform can be designed with the same philosophy — by building in "an analyzable structure" from the start, it becomes a log platform strong in emergencies and cheap in peacetime.

For those who want to redesign audit logs into an "analyzable asset." From mastering an existing Lake, to migration design to Athena/CloudWatch, to aggregating multi-account auditing, to a design that holds cost down with structure, I accompany you, going deep into the on-the-ground implementation. Feel free to get in touch.

AWS CloudTrail Lake Practical Guide (2026 Edition): Analyzing Events with Trino SQL and How to Choose Between It and Athena+S3 — The Realistic Answer After New-Customer Onboarding Ended

0. Mental model: Lake is "an immutable audit data lake you can analyze cross-sectionally with SQL"

1. The reality of the onboarding end, and the first decision

1-1. What existing customers "can / can't continue" (official)

1-2. The migration destination AWS recommends is CloudWatch (not only Athena+S3)

2. How Lake works: correctly understanding the event data store (EDS)

2-1. The five essentials of an EDS

2-2. The retention tier is a choice between "extendable one year" or "seven years"

2-3. You can "copy-ingest" existing trail events

2-4. The organization EDS: the whole company's audit on one SQL surface

3. The Trino SQL cookbook: a set of practical, copy-paste-ready queries

3-1. Top API callers (who is calling which API)

3-2. Detecting AccessDenied spikes by principal

3-3. Surfacing root-user usage

3-4. Console logins without MFA

3-5. Cross-account access (from another account to your resources)

3-6. A timeline of IAM policy changes

3-7. Daily read/write trend

3-8. Activity by source IP

3-9. JOIN across multiple EDS (e.g. matching the production EDS and the audit EDS)

4. Dashboards and generative AI: visualization and "analysis without writing SQL"

4-1. Three kinds of dashboards

4-2. Natural-language query generation is "GA," result summarization is "preview"

5. Lake vs. Athena thorough comparison: how do new builds ultimately choose?

5-1. Decision matrix

5-2. Recommendations by scenario

5-3. Implementation guidelines when choosing S3 + Athena (key points only)

6. External sources and federation: widening Lake's boundary

6-1. Ingesting activity from outside AWS (PutAuditEvents)

6-2. Federate the EDS to the Glue Data Catalog → query in Athena

7. Summary: a Lake / Athena / CloudWatch usage cheat sheet

Primary sources (official documentation)

The Complete AWS CloudTrail Guide (2026 Edition): Designing API Activity Auditing, Trails, CloudTrail Lake, Athena Analysis, and Real-Time Detection at Production Quality

Preparing for Compliance Audits with AWS CloudTrail (2026 Edition): Leaving the Audit Trail for PCI DSS, SOC 2, ISO 27001, and HIPAA as Tamper-Proof Evidence

Building a Company-Wide CloudTrail Audit Platform with AWS Organizations (2026 Edition): Aggregating Every Account's Trail with Organization Trails, Delegated Administrators, a Log Archive Account, SCPs, and Control Tower

AWS CloudTrail Pricing & Cost-Optimization Complete Guide (2026 Edition): The Free Boundary, the Double-Billing Trap, the Data-Event Explosion, and the Cost Design of CloudTrail Lake/Athena

Also worth reading

Echo file-upload production design: receiving safely with multipart, S3 streaming, presigned URLs, and validation

A thorough explanation of GuardDuty Extended Threat Detection and attack-sequence findings: reading and responding to weak-signal correlation, the 24-hour window, and Critical multi-stage attacks

Designing AWS Threat Detection in Production with Amazon GuardDuty: Protection Plans, Extended Threat Detection, Org-Wide Bulk Enablement, and EventBridge Automated Response, in Real Code

0. Mental model: Lake is "an immutable audit data lake you can analyze cross-sectionally with SQL"

1. The reality of the onboarding end, and the first decision

1-1. What existing customers "can / can't continue" (official)

1-2. The migration destination AWS recommends is CloudWatch (not only Athena+S3)

2. How Lake works: correctly understanding the event data store (EDS)

2-1. The five essentials of an EDS

2-2. The retention tier is a choice between "extendable one year" or "seven years"

2-3. You can "copy-ingest" existing trail events

2-4. The organization EDS: the whole company's audit on one SQL surface

3. The Trino SQL cookbook: a set of practical, copy-paste-ready queries

3-1. Top API callers (who is calling which API)

3-2. Detecting AccessDenied spikes by principal

3-3. Surfacing root-user usage

3-4. Console logins without MFA

3-5. Cross-account access (from another account to your resources)

3-6. A timeline of IAM policy changes

3-7. Daily read/write trend

3-8. Activity by source IP

3-9. JOIN across multiple EDS (e.g. matching the production EDS and the audit EDS)

4. Dashboards and generative AI: visualization and "analysis without writing SQL"

4-1. Three kinds of dashboards

4-2. Natural-language query generation is "GA," result summarization is "preview"

5. Lake vs. Athena thorough comparison: how do new builds ultimately choose?

5-1. Decision matrix

5-2. Recommendations by scenario

5-3. Implementation guidelines when choosing S3 + Athena (key points only)

6. External sources and federation: widening Lake's boundary

6-1. Ingesting activity from outside AWS (PutAuditEvents)

6-2. Federate the EDS to the Glue Data Catalog → query in Athena

7. Summary: a Lake / Athena / CloudWatch usage cheat sheet

Primary sources (official documentation)

Related articles

The Complete AWS CloudTrail Guide (2026 Edition): Designing API Activity Auditing, Trails, CloudTrail Lake, Athena Analysis, and Real-Time Detection at Production Quality

Preparing for Compliance Audits with AWS CloudTrail (2026 Edition): Leaving the Audit Trail for PCI DSS, SOC 2, ISO 27001, and HIPAA as Tamper-Proof Evidence

Building a Company-Wide CloudTrail Audit Platform with AWS Organizations (2026 Edition): Aggregating Every Account's Trail with Organization Trails, Delegated Administrators, a Log Archive Account, SCPs, and Control Tower

AWS CloudTrail Pricing & Cost-Optimization Complete Guide (2026 Edition): The Free Boundary, the Double-Billing Trap, the Data-Event Explosion, and the Cost Design of CloudTrail Lake/Athena

Also worth reading

Echo file-upload production design: receiving safely with multipart, S3 streaming, presigned URLs, and validation

A thorough explanation of GuardDuty Extended Threat Detection and attack-sequence findings: reading and responding to weak-signal correlation, the 24-hour window, and Critical multi-stage attacks

Designing AWS Threat Detection in Production with Amazon GuardDuty: Protection Plans, Extended Threat Detection, Org-Wide Bulk Enablement, and EventBridge Automated Response, in Real Code