"I want to aggregate audit logs older than 90 days across attributes with SQL," "When an incident occurs, I want to narrow down in one go, with multiple conditions, who did what to which resource, when, and from which IP" — these are walls you always hit in the audit and security field.
But CloudTrail's event history (the thing you can view without a trail) has the constraints of 90 days, management events only, a single Region, and single-attribute search, and it's utterly no match for such cross-cutting analysis. That's exactly why AWS provides separate mechanisms for analyzing audit logs with SQL — CloudTrail Lake and Athena.
I led the reliability layer of a serverless payment platform and kept double charges in production at zero. What I keenly felt there is that audit logs don't become an asset just by "accumulating" them. The first-pass triage of an incident, and answering a compliance audit, ultimately come down to "whether the structure lets you query the logs cross-sectionally with SQL." This article is a practical guide to turning CloudTrail Lake into that "analyzable asset."
That said, in 2026, this article has an inconvenient truth to convey first.
CloudTrail Lake ended new-customer onboarding as of May 31, 2026. Existing customers can continue using it as before (CloudTrail Lake availability change). That is, this article is written so it can be read two ways: (a) for those already using Lake to master it, and (b) for those about to build an audit-log analysis platform to choose the optimal answer on the premise that "Lake can't be chosen." I judged that placing this reality first is the most honest and the most useful.
0. Mental model: Lake is "an immutable audit data lake you can analyze cross-sectionally with SQL"
Before getting into the fine features, let's pin down the essence in one picture. What is CloudTrail Lake? Borrowing the official words, it's "an audit solution that complements your compliance stack and supports near-real-time troubleshooting." In substance, it boils down to these three points.
- Accumulate AWS / non-AWS activity events in an immutable event data store (EDS).
- Accumulated events are internally converted from row-based JSON to Apache ORC (columnar) and can be retrieved fast.
- On top of that, you can do SQL analysis that cuts across fields using all valid Trino
SELECTstatements and functions.
[AWS / non-AWS events]
│ Selected/filtered by advanced event selectors
▼
┌─────────────────────────────┐
│ Event Data Store (EDS) │ ← Immutable, ORC columnar, KMS-encrypted, retention tiers
│ 1 category/EDS, org-EDS OK │
└─────────────────────────────┘
│ All Trino SELECT, JOIN across multiple EDS
▼
[Trino SQL query] → results (retained up to 7 days / can save to S3)
│
├─→ 14 managed dashboards / custom / Highlights
├─→ Natural language → SQL (generative-AI query generation: GA)
└─→ Federate to Glue Data Catalog → query in Athena
This article is the CloudTrail cluster's "Lake / SQL-analysis" piece (a spoke). The trail-creation procedure, the overall picture of event types, the basic security design, and the DDL for Athena tables (partition projection) are consolidated in the CloudTrail audit-logging, governance, and security complete guide (the pillar article). This article doesn't reprint the DDL in full and instead digs into Lake itself and "how to choose between Lake and Athena."
1. The reality of the onboarding end, and the first decision
First, let's settle your standing. Which side are you on right now?
| Your situation | Conclusion | The path to take |
|---|---|---|
| Already operating a Lake EDS | Can continue using it. Use Sections 2–4 of this article as-is | Use the existing EDS to the fullest. Consider migration too if needed (described later) |
| About to newly build an audit-log analysis platform | Lake can't be chosen. Athena+S3 or CloudWatch is the realistic answer | Choose with the decision matrix in Section 5 |
| An existing customer planning to add member accounts to the org | Check right now whether you have an organization EDS | With an organization EDS, new accounts are auto-covered. With account EDS only, new accounts can't be onboarded |
1-1. What existing customers "can / can't continue" (official)
The official availability change clearly divides the support scope for existing customers by EDS type. There's a lot of misunderstanding here, so let me quote precisely.
- If you have an organization event data store (organization EDS): Lake functions as before. Support for member accounts newly added to the org, and expansion to additional Regions, also continue.
- If you have only an account event data store (account EDS): existing accounts continue to be supported, and you can expand to new Regions. However, onboarding accounts newly added to the org is not supported. If you want to cover new accounts too, you need to create an organization EDS or migrate to CloudWatch.
- Note that Lake enters a state of receiving only critical bug fixes and security updates. Don't expect new features.
Important: this is not a story of "Lake stopping." The official docs explicitly state "AWS CloudTrail continues to be fully supported. Only CloudTrail Lake ended new-customer onboarding. Trails, Insights, and aggregated events are unaffected." There's no need to panic and rebuild even your trails.
1-2. The migration destination AWS recommends is CloudWatch (not only Athena+S3)
Accuracy is everything here, so let me separate my opinion from the official line. The migration destination the official docs recommend first is Amazon CloudWatch. A mechanism to directly import a CloudTrail Lake EDS into CloudWatch (the console's Export to CloudWatch, or the CLI's aws logs create-import-task) is provided, carrying features Lake customers had strongly requested — OCSF/OTel support, OpenSearch-native analysis, and third-party connectors.
# 既存EDSをCloudWatchへインポート(CLI例:公式の create-import-task)
aws logs create-import-task \
--import-source-arn "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id" \
--import-role-arn "arn:aws:iam::123456789012:role/CloudTrailLakeExportRole" \
--import-filter '{"startEventTime": 1704067200, "endEventTime": 1706745600}'
# 非同期処理なので進捗を確認
aws logs describe-import-tasks --import-id "EXAMPLE-import-id"
A migration pitfall (official note): data from before 2023 is not migrated from CloudTrail Lake to CloudWatch. If you need events from before 2023, keep querying them directly in Lake, or move them to an S3 bucket. That's exactly why, rather than streaming the whole volume at once, the official best practice is to first pilot-migrate a narrow range like 24 hours, verify the schema, queries, dashboards, and IAM roles work as expected, and then migrate to production.
So why does this article also place Athena+S3 in a leading role? There are three reasons.
- Many teams already have an S3 destination for trail delivery, separate from Lake/CloudWatch. Just by defining an Athena table there, they can start SQL analysis with no additional ingestion cost.
- The scan unit price is cheap (Lake is $0.005/GB scanned, Athena is $5/TB ≒ $0.005/GB, but Athena can physically reduce the range it reads with partition projection), so you can beat cost down with structure.
- Lake and Athena can be bridged with federation (Section 6). Not an either-or — you can take both concurrent use and a migration gradient.
In conclusion, for a new build, place "trail → S3 → Athena" as the foundation, and choose CloudWatch if you want a managed orientation / OCSF / OpenSearch — that's the realistic answer in 2026. The fit and unfit of each are tabulated in Section 5.
2. How Lake works: correctly understanding the event data store (EDS)
The key to mastering Lake is accurately grasping the nature of the EDS, its foundation. Use it while leaving this vague, and you'll later have irreversible accidents like "I got the retention period wrong" or "I can't detach the KMS key."
2-1. The five essentials of an EDS
| Property | Content | Design implication |
|---|---|---|
| Immutable | An EDS is an immutable collection of events. It can't be rewritten or tampered with afterward | Trustworthy as an audit trail. Becomes a forensic foundation |
| 1 category/EDS | One EDS holds only one event category (management / data / network, etc.) | Separate management-event and data-event ones. Cross them with JOIN |
| Selected with advanced event selectors | Precisely control what to ingest with advanced event selectors | Ingestion volume = cost. Narrow it surgically from the start |
| Multi-Region / multi-account | With an organization EDS, aggregate multiple accounts and multiple Regions into one | Cut across the whole org's audit logs in one EDS with SQL |
| KMS-encrypted by default | CloudTrail encrypts by default. Optionally, you can also specify a customer KMS key | If you specify a customer KMS key, you can't detach or change it later. Be careful |
2-2. The retention tier is a choice between "extendable one year" or "seven years"
The most important parameter when creating an EDS is the retention period. The official docs have retention types corresponding to two billing models.
| Retention type | Default | Maximum | Use-case guideline |
|---|---|---|---|
| One-year-extendable | 366 days | 3,653 days (about 10 years) | Flexibly, from ordinary operational analysis to long-term retention |
| Seven-year | About 2,557 days (about 7 years) | About 2,557 days (about 7 years, fixed) | Compliance uses where 7-year retention is a fixed requirement |
As intuition: "If you just want to analyze for now, One-year-extendable; if 7-year retention is fixed by law or contract, Seven-year." The longer you make retention, the more storage charges (the one-year-extendable portion is $0.023/GB/month) kick in. I leave the deep dive on pricing to the CloudTrail pricing & cost-optimization guide and here just carve in the single point "retention = cost."
2-3. You can "copy-ingest" existing trail events
An EDS doesn't only accumulate future events; you can copy events already accumulated in an existing trail into the EDS, creating a snapshot up to some past point. This pays off in the field where "I adopted Lake later but want to analyze past logs with SQL too." Since it's ingestion of existing logs rather than new ingestion, you can rehouse past incident investigations onto Lake's SQL.
2-4. The organization EDS: the whole company's audit on one SQL surface
If you bundle multiple accounts with AWS Organizations, creating an organization EDS lets you aggregate all member accounts' events into one EDS. As we saw in 1-1, even after the onboarding end, the organization EDS automatically covers new member accounts. The design of multi-account auditing itself is detailed in the organization trail / multi-account audit guide, so refer to that too on the premise of combining it with Lake.
3. The Trino SQL cookbook: a set of practical, copy-paste-ready queries
This is the leading role of this article. CloudTrail Lake supports all valid Trino SELECT statements and functions (SQL constraints). Only SELECT — no query that modifies or tampers with data passes at all. The queries below are faithful to the field paths of the official sample query set and query-generation examples.
Usage premises (three)
- In
FROM, write the EDS's ID (the tail of the ARN). In this article I denote it$EDS_ID. Replace it with your ID at execution time.- Always include an
eventTimerange. Query billing is incurred against the amount of data scanned ($0.005/GB). Narrowing the period is itself cost reduction.- Access nested fields with dot notation (
userIdentity.arn), and map types withelement_at(map, 'key').
3-1. Top API callers (who is calling which API)
A basic operational query that surfaces "which principal is calling IAM the most." Change the service and it's usable generically.
SELECT
COUNT(*) AS apiCount,
eventName,
recipientAccountId,
userIdentity.principalId
FROM $EDS_ID
WHERE userIdentity.principalId IS NOT NULL
AND eventSource = 'iam.amazonaws.com'
AND eventTime >= '2026-06-01 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
GROUP BY eventName, recipientAccountId, userIdentity.principalId
ORDER BY apiCount DESC
LIMIT 50;
3-2. Detecting AccessDenied spikes by principal
A surge of authorization errors is either a misconfiguration or a sign of an attacker probing permissions. Aggregate who is being denied on which actions.
SELECT
userIdentity.arn AS principal,
eventName,
eventSource,
COUNT(*) AS deniedCount
FROM $EDS_ID
WHERE (errorCode = 'AccessDenied' OR errorMessage = 'Access Denied')
AND eventTime >= '2026-06-20 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
GROUP BY userIdentity.arn, eventName, eventSource
ORDER BY deniedCount DESC
LIMIT 100;
Why look at both
errorCodeanderrorMessage: depending on the service, it may be recorded asAccessDenied(errorCode) or asAccess Denied(errorMessage). The official query-generation example also picks up both with an OR. I leave the forensic-perspective incident-response framework to the threat-detection / incident-response guide. This article's queries lean strictly toward "analysis / aggregation."
3-3. Surfacing root-user usage
Root-user operations are themselves the most important audit signal. Extract them with userIdentity.type = 'Root'.
SELECT
eventTime,
eventName,
eventSource,
sourceIPAddress,
awsRegion,
userIdentity.arn
FROM $EDS_ID
WHERE userIdentity.type = 'Root'
AND eventTime >= '2026-06-01 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
3-4. Console logins without MFA
One of the most important security queries, straight from the official sample. Pick up sign-in events where MFAUsed is No.
SELECT
eventTime,
userIdentity.arn,
sourceIPAddress,
awsRegion,
element_at(additionalEventData, 'MFAUsed') AS mfaUsed
FROM $EDS_ID
WHERE eventSource = 'signin.amazonaws.com'
AND eventName = 'ConsoleLogin'
AND element_at(additionalEventData, 'MFAUsed') = 'No'
AND eventTime >= '2026-06-01 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
additionalEventDatais a map type, so you extract the key value withelement_at(additionalEventData, 'MFAUsed'). This is faithful to the field path of the official sampleconsole-login-with-no-mfa.sql.
3-5. Cross-account access (from another account to your resources)
Events where "the caller's account ID (userIdentity.accountId) and the recipient's account ID (recipientAccountId) don't match" are cross-account access. Use this to detect unintended external access.
SELECT
eventTime,
userIdentity.principalId,
eventName,
eventSource,
userIdentity.accountId AS callerAccount,
recipientAccountId AS resourceAccount
FROM $EDS_ID
WHERE userIdentity.accountId != recipientAccountId
AND eventTime >= '2026-06-01 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
3-6. A timeline of IAM policy changes
Permission changes are the core of auditing. Lay out change-type events like PutRolePolicy chronologically, together with the target role name and policy document. The point is to extract values from the map type requestParameters with element_at.
SELECT
eventTime,
recipientAccountId,
eventName,
userIdentity.arn,
element_at(requestParameters, 'roleName') AS roleName,
element_at(requestParameters, 'policyDocument') AS policyDocument
FROM $EDS_ID
WHERE eventName IN ('PutRolePolicy', 'AttachRolePolicy', 'PutUserPolicy', 'CreatePolicyVersion')
AND eventTime >= '2026-06-01 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
ORDER BY eventTime DESC;
3-7. Daily read/write trend
Split read-type and write-type with the readOnly flag and aggregate daily. This becomes a starting point for finding abnormal peaks in activity. A CASE aggregation faithful to the official query-generation example.
SELECT
date(eventTime) AS eventDate,
SUM(CASE WHEN readOnly = true THEN 1 ELSE 0 END) AS readEvents,
SUM(CASE WHEN readOnly = false THEN 1 ELSE 0 END) AS writeEvents
FROM $EDS_ID
WHERE eventTime >= '2026-06-01 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
GROUP BY date(eventTime)
ORDER BY eventDate ASC;
3-8. Activity by source IP
Find concentrated access from a suspicious single IP. Output the count and the number of unique principals by sourceIPAddress.
SELECT
sourceIPAddress,
COUNT(*) AS eventCount,
COUNT(DISTINCT userIdentity.arn) AS distinctPrincipals
FROM $EDS_ID
WHERE eventTime >= '2026-06-20 00:00:00'
AND eventTime < '2026-06-27 00:00:00'
GROUP BY sourceIPAddress
ORDER BY eventCount DESC
LIMIT 50;
3-9. JOIN across multiple EDS (e.g. matching the production EDS and the audit EDS)
Lake supports advanced queries that span multiple EDS (multi-table query support). The supported JOIN-type operators are UNION / UNION ALL / EXCEPT / INTERSECT / LEFT JOIN / RIGHT JOIN / INNER JOIN.
-- 2つのEDS(本番=edsA, 監査=edsB)を eventId で突き合わせる
SELECT
edsA.eventName AS prodEvent,
edsB.eventName AS auditEvent,
edsA.eventTime
FROM eds1_id AS edsA
LEFT JOIN eds2_id AS edsB
ON edsA.eventId = edsB.eventId
WHERE edsA.eventTime >= '2026-06-01 00:00:00'
AND edsA.eventTime < '2026-06-27 00:00:00'
ORDER BY edsA.eventTime DESC;
-- 3つのEDSのイベントを縦に連結して横断的に並べる
SELECT eventId, eventName FROM eds1_id
UNION
SELECT eventId, eventName FROM eds2_id
UNION ALL
SELECT eventId, eventName FROM eds3_id
ORDER BY eventId
LIMIT 100;
Query results are retained for up to 7 days and, if needed, can be saved to an S3 bucket (the CLI's
start-query --delivery-s3uri). Queries running for a long time (over an hour) may time out, but even then you can retrieve the processed partial results. That's exactly why narrowing the range witheventTimepays off on both cost and completion-rate fronts.
4. Dashboards and generative AI: visualization and "analysis without writing SQL"
If only people who can write SQL can analyze, the org's auditing doesn't run. Lake has dashboards and generative-AI assistance.
4-1. Three kinds of dashboards
| Kind | Content | Editing | Update |
|---|---|---|---|
| Managed dashboards | 14 ready-to-use dashboards provided by CloudTrail | Not editable (but can be saved as custom) | — |
| Custom dashboards | You build them yourself. Up to 10 widgets, each widget = one SQL query | Editable | Manual or scheduled update |
| Highlights dashboard | A managed "at-a-glance" summary. Updates every 6 hours and shows the most recent 24 hours | — | Automatic every 6 hours |
The point is that the substance of each widget is one SQL query. That is, the Section 3 cookbook can be diverted directly into custom-dashboard widgets. Build one audit dashboard lining up "logins without MFA," "root usage," and "AccessDenied spikes," and daily checks are done without writing SQL each time.
4-2. Natural-language query generation is "GA," result summarization is "preview"
This is easy to confuse, so let me clearly separate the maturity.
| Feature | Maturity | Overview | Constraints / Regions |
|---|---|---|---|
| Query generation (natural language → SQL) | GA (general availability) | Generates executable SQL from an English prompt. The CLI is generate-query | Requires IAM cloudtrail:GenerateQuery, English only, 3–500 characters, generation itself is free, about 7 supported Regions |
| Query result summarization (result → natural language) | preview (subject to change) | Summarizes execution results in English natural language | Available Regions are Tokyo, N. Virginia, Oregon |
Query generation (GA) builds SQL from an English question like "what were the top errors last month?" even if you're not well-versed in SQL or CloudTrail fields. In the CLI you use it like this.
# 英語プロンプトからSQLを生成(生成は無料。実行時のみスキャン課金)
aws cloudtrail generate-query \
--event-data-stores "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id" \
--prompt "Show me all console login events for the past week"
# 返ってきた QueryAlias でそのまま実行
aws cloudtrail start-query --query-alias AWSCloudTrail-EXAMPLE-UUID
A generation example (official) goes like this. For the prompt "Show any events with access denied errors for the past three weeks," it returns SQL that includes errorCode = 'AccessDenied' and an eventTime range.
Cautions (official): don't include personal or confidential information in the prompt. This is generative AI (an LLM), and it's recommended that the returned SQL always be checked by a person before execution. AI is strictly a tool to "speed up the draft," and the final check is done by a person — the same as the "fast with AI, verification by humans" principle I've held to on the payment platform. Remove
cloudtrail:GenerateQueryfrom the IAM policy and you can explicitly opt out of the feature.
5. Lake vs. Athena thorough comparison: how do new builds ultimately choose?
This is the main event for those about to build. Since Lake can't be chosen for new builds, the real options become a choice between "S3 + Athena (self-managed)" and "CloudWatch (the managed Lake successor)." Let's pin down the decision axes in a table.
5-1. Decision matrix
| Aspect | CloudTrail Lake (existing customers only) | S3 + Athena | CloudWatch (the officially recommended migration destination) |
|---|---|---|---|
| Usable for new builds | No (ended 2026/5/31) | Yes | Yes |
| Degree of managedness | High (auto ingestion, conversion, retention) | Low (self-manage tables, partitions, lifecycle) | High |
| Immutability / tamper resistance | The EDS is immutable | Separately ensured via S3 Object Lock, etc. | Inherits CloudTrail safety features |
| Query engine | Trino (all SELECT) | Trino-family (Athena/Presto) | Logs QL / SQL / PPL (OpenSearch) |
| Immediacy of cross-cutting analysis | Instant SQL (ready once ingested) | Table DDL definition required | Analyze after ingestion |
| Scan cost | $0.005/GB | $5/TB (≒ $0.005/GB) + reduce the range read with partition projection | OpenSearch-analysis pricing model |
| Built-in dashboards | 14 managed + custom + Highlights | Self-built (QuickSight, etc.) | OpenSearch / built-in analysis |
| Initial setup | Light (just create an EDS) | Medium (S3 delivery + Glue catalog + DDL) | Medium (pipeline configuration) |
| OCSF/OTel, third-party integration | Limited | Self-built | Native support, abundant connectors |
5-2. Recommendations by scenario
- "S3 trail delivery already exists. I want to minimize cost. I can write SQL myself." → S3 + Athena. You can analyze with no additional ingestion charges, and the greatest weapon is being able to physically reduce scan volume with partition projection. Small-to-medium-scale audit analysis is fully covered by this.
- "I want to centralize security, operations, and compliance logs. I want OCSF normalization, OpenSearch-native analysis, and third-party connectors." → CloudWatch. The path the official docs push first as Lake's successor.
- "I have an existing Lake EDS and want to keep using data from 2023 onward." → continue with Lake for now, while pilot-migrating to CloudWatch as needed. Keep data from before 2023 in Lake or move it to S3.
5-3. Implementation guidelines when choosing S3 + Athena (key points only)
There are three crux points when choosing Athena for a new build.
- Use partition projection to narrow the range read with a
region/year/month/dayhierarchy. This is the lever that makes scan volume = cost matter the most. - Place table definitions in the Glue Data Catalog and manage the schema centrally.
- Send old partitions to Glacier with a lifecycle, and separate storage cost from analysis targets.
The complete version of the DDL for Athena tables (a
CREATE EXTERNAL TABLEwith partition projection for CloudTrail logs) is posted in the CloudTrail audit-logging, governance, and security complete guide. Duplicating the full text in this article would double maintenance, so the DDL is unified in the pillar. The details of cost calculation for Lake and Athena (the actual $/TB scanned, the effect of lifecycle) are left to the pricing & cost-optimization guide. This section's role is the decision of "which to choose."
6. External sources and federation: widening Lake's boundary
Lake isn't only for inside AWS. And Lake and Athena can be bridged.
6-1. Ingesting activity from outside AWS (PutAuditEvents)
You can ingest activity events from on-premises, SaaS, and hybrid environments into Lake through the PutAuditEvents API and "channels." Furthermore, more than a dozen partner integrations (direct integrations / solution integrations) are provided, letting you cross-analyze AWS audit logs and non-AWS logs with the same EDS and the same Trino SQL. Being able to follow "AWS operations" and "external SaaS operations" on one timeline is extremely powerful in incident investigation.
6-2. Federate the EDS to the Glue Data Catalog → query in Athena
This is the feature that connects Lake and Athena. Federate an EDS to the AWS Glue Data Catalog and the metadata is registered in Glue, letting you query the same event data with SQL from Amazon Athena via Lake Formation.
Let me precisely grasp the official behavior (federation).
- When you enable federation, CloudTrail creates in the Glue Data Catalog a managed database named
aws:cloudtrail(if none exists) and a managed federated table named after the EDS's ID. - Lake Formation handles fine-grained access control of the federated resource. Delete the federation role or revoke permissions in Lake Formation, and you can no longer query from Athena.
- A user querying this data in Athena must be allowed
lakeformation:GetDataAccessin IAM (included in theAmazonAthenaFullAccessmanaged policy). - Federation itself incurs no CloudTrail charge. What is incurred is only the Athena query charge.
- Even if you disable federation, none of Lake's data is deleted. You just can't query from Athena anymore; in Lake you can keep querying.
- Caution: an EDS with federation enabled can't be deleted. To delete it, you first need to disable federation (and termination protection, if enabled).
The core of the minimal permissions needed by the federation role is read access to the EDS data (and decryption permission too if KMS-encrypted).
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "LakeFederationEDSDataAccess",
"Effect": "Allow",
"Action": "cloudtrail:GetEventDataStoreData",
"Resource": "arn:aws:cloudtrail:ap-northeast-1:123456789012:eventdatastore/EXAMPLE-eds-id"
},
{
"Sid": "LakeFederationKMSDecryptAccess",
"Effect": "Allow",
"Action": ["kms:Decrypt", "kms:GenerateDataKey"],
"Resource": "arn:aws:kms:ap-northeast-1:123456789012:key/EXAMPLE-key-id"
}
]
}
A hint on how to use them: Lake's native Trino queries are for "the instant, dashboards-included," while Athena via federation is for "when you want to JOIN with other Athena tables (VPC Flow Logs, ALB logs, etc.)." Federation can also be used as a "migration gradient" from Lake to Athena — you can take a migration design that first creates a state where both can query the same data, then gradually shifts analysis to the Athena side.
7. Summary: a Lake / Athena / CloudWatch usage cheat sheet
Finally, let me close with a quick-reference table usable for decisions starting tomorrow.
| Your question | Answer |
|---|---|
| I'm about to build audit-log analysis. Can I use Lake? | No (new-customer onboarding ended 2026/5/31) |
| For a new build, I want the cheapest, self-built | S3 + Athena (reduce scans with partition projection) |
| For a new build, I want managed / OCSF / OpenSearch | CloudWatch (the officially recommended migration destination) |
| I have an existing Lake EDS | OK to continue using. Use the Section 3 cookbook to the fullest |
| What to do with existing Lake going forward | Verify CloudWatch with a pilot migration. Keep data from before 2023 in Lake/S3 |
| Planning to add a new account to the org (existing customer) | Check right now whether you have an organization EDS (without it, you can't onboard new accounts) |
| I want to JOIN Lake data with other Athena tables | Federate to Glue → query in Athena |
| I want members who can't write SQL to analyze too | Natural-language query generation (GA) + 14 managed dashboards |
| I want to aggregate >90 days, cross-attribute, with SQL (the original motive) | Not possible with event history. Lake / Athena / CloudWatch — one of them — is required |
The end of CloudTrail Lake's new-customer onboarding is not "the end," but a good opportunity to rethink the design of audit-log analysis. The essence never changes — changing the structure of audit logs from "data you merely accumulate" into "an asset you can query cross-sectionally with SQL." Whether with Lake, Athena, or CloudWatch, as long as you don't drop that single point, you can do an incident's first-pass triage and an answer to an audit fast and accurately by structure, not by gut.
On a serverless payment platform, I ensured the "correctness" of reliability and cost with the structure of code and maintained zero double charges in production. An audit-log platform can be designed with the same philosophy — by building in "an analyzable structure" from the start, it becomes a log platform strong in emergencies and cheap in peacetime.
For those who want to redesign audit logs into an "analyzable asset." From mastering an existing Lake, to migration design to Athena/CloudWatch, to aggregating multi-account auditing, to a design that holds cost down with structure, I accompany you, going deep into the on-the-ground implementation. Feel free to get in touch.
Primary sources (official documentation)
- CloudTrail Lake availability change (new-customer onboarding end / migration)
- Working with AWS CloudTrail Lake
- CloudTrail Lake SQL constraints (Trino / JOIN / constraints)
- Create queries from natural language prompts (query generation: GA)
- Federate an event data store (Glue/Athena integration)
- CloudTrail Lake sample queries (GitHub: aws-samples)
- AWS CloudTrail Pricing