"The files users upload — who guarantees they're not malware?" — this is the first question I throw at a place where I'm consulted about the security of a SaaS with an upload feature.
The usual answer is silence, or "we reject by extension" or "we check the MIME type on the front end." But that's checking the name tag at the door, not inspecting the contents. An attacker can fake both the extension and the Content-Type. If an uploaded "PDF" is actually an executable and another user downloads and opens it, your app has become a distribution channel for malware. What makes this frightening is that it happens entirely through the legitimate user flow.
This article is an implementation guide to designing and building, at production quality, a mechanism that uses GuardDuty Malware Protection for S3 to automatically malware-scan objects uploaded to S3 and pass only those confirmed clean downstream. Throughout, I draw on my experience implementing IAM, observability, and DR across a serverless payment platform on multi-account AWS. On that platform, which handled real money, I used idempotency to guarantee that the same event received twice has a one-time side effect, and that idea carries over directly to safely handling at-least-once scan-result events.
The rule of this article: the specs, tag values, limits, pricing, and EventBridge event structure are based on the official AWS documentation (as of June 2026). Limits, pricing, and supported Regions get revised, so always confirm the latest values on the official quotas and pricing pages before going to production. One more thing: GuardDuty Malware Protection for S3 is not a full AV / EDR. It's one layer that detects known and some unknown malware with a scan engine, and it doesn't replace input validation, least-privilege IAM, encryption, or WAF. When you automate the handling of scan results, start with actions that are idempotent, narrowly scoped, and reversible.
0. Mental model: this is "a quarantine for uploads," not "a constant surveillance camera"
Before starting the design, let me separate two easily confused S3 security features. If you don't pin this down first, the requirement and the feature you enable won't match.
S3 Protection = monitor "access (API operations)" to S3 and detect suspicious behavior (CloudTrail data events). Malware Protection for S3 = malware-scan the "contents of files" uploaded to S3. The former is "who did what," the latter is "what came in."
From here, three consequences emerge. These are the foundation of the design decisions.
- What they look at differs. S3 Protection, included in GuardDuty threat detection, analyzes CloudTrail's S3 data events (GetObject/PutObject/ListObjects/DeleteObject, etc.) and detects "suspicious access or data exfiltration using legitimate credentials." Malware Protection for S3, on the other hand, downloads the contents of a newly uploaded object and malware-scans it. The former looks at behavior, the latter at contents; they are different things (I'll settle it with a table in Section 1).
- Malware Protection for S3 works without GuardDuty itself. This is the most distinctive property of this article's main feature. Without enabling the GuardDuty service, you can use just this feature standalone (independent). But in standalone mode, the account has no detector ID, so even if it detects malware no GuardDuty finding is generated. The result appears only on EventBridge's default bus, CloudWatch, and (optionally) object tags (Section 2).
- Detection alone isn't safe. You need the plumbing of "quarantine → isolate → promote." Even if you scan and learn "it's malware," it's meaningless if that contaminated object is still in a place readable downstream. The climax of this article is the plumbing that physically makes a contaminated object unreadable downstream and flows only clean ones (Section 5's secure upload pipeline + TBAC).
With these three points in hand, the work breaks down into three steps: ① don't mix up the two features → ② correctly enable the protection plan (standalone or with GuardDuty) → ③ turn scan results into quarantine / promote actions with idempotent plumbing. Let's build them in order.
1. A definitive comparison so you don't mix them up: S3 Protection vs. Malware Protection for S3
Because the names are similar, the most common accident in the field is "a mismatch between what you want to do and the feature you enabled." First, let me settle it head-on.
| Aspect | S3 Protection | Malware Protection for S3 |
|---|---|---|
| What it looks at | API access to S3 (CloudTrail data events) | The contents of an uploaded object (malware) |
| Threat it detects | Suspicious access, data exfiltration / destruction (misuse of leaked credentials, etc.) | Malware contained in an uploaded file |
| Trigger | Operations like GetObject / PutObject / ListObjects / DeleteObject | A new upload of an object (or a new version) |
| GuardDuty itself | Required (a feature of the protection plan) | OK even without it (standalone-operable) |
| Feature name / resource | feature S3_DATA_EVENTS (a detector feature) | Malware Protection plan (a per-bucket resource) |
| Result output | A GuardDuty finding | A finding (when with GuardDuty) / EventBridge + CloudWatch + tags |
| Billing unit | S3 data-event volume | Scanned GB + objects evaluated |
Borrowing the official wording, S3 Protection "helps you detect potential security risks for data, such as data exfiltration and destruction" — that is, detecting threats to data (exfiltration / destruction). Malware Protection for S3 "helps you detect potential presence of malware by scanning newly uploaded objects" — malware-scanning new uploads.
A design guideline: these two don't compete; they complement each other. For a bucket that accepts uploads and holds sensitive data, it's ideal to enable both. Malware Protection for S3 watches "the bad things coming in," and S3 Protection watches "exfiltration mixed into legitimate access." Note that S3 Protection is a GuardDuty protection plan, so for how to enable it, see the S3_DATA_EVENTS feature in the pillar article. From here on, this article narrows to Malware Protection for S3.
Beware a third, even more confusing feature: Malware Protection for EC2. It agentlessly scans the EBS volumes attached to EC2 instances and container workloads, and its target is completely different from the for-S3 feature this article covers. If you say just "Malware Protection," the three get mixed up, so always say "for S3" or "for EC2" explicitly.
2. How Malware Protection for S3 works: the decisive difference between standalone and "with GuardDuty"
2.1 The scan-on-upload model
The mechanism is simple. When an object is newly uploaded to a bucket with protection enabled (officially, a "protected bucket"), or a new version of an existing object is uploaded, GuardDuty automatically starts a malware scan.
The scan is triggered by S3's Object Created family of events: PutObject / POST Object / CopyObject / CompleteMultipartUpload. GuardDuty downloads the target object via AWS PrivateLink, then decrypts, reads, and scans it in an isolated environment (a VPC with no internet access) in the same Region. The temporary copy made during the scan is KMS-encrypted, and the downloaded copy is deleted after the scan completes. In other words, your data doesn't leave the Region for the scan, and only the result metadata remains.
2.2 The two enablement approaches — the core of this article
Malware Protection for S3 has two ways to enable it. This difference divides the operational design.
| | (a) Use it with GuardDuty | (b) Use it standalone (independent) |
|---|---|---|
| The GuardDuty service | Enabled (a detector ID exists) | OK disabled (no detector ID) |
| A finding on malware detection | A GuardDuty finding is generated | No finding is generated |
| Receiving the result | A finding (+ export to S3/EventBridge) | Only the EventBridge default bus + CloudWatch + (optional) object tags |
| Correlation with existing GuardDuty detections | Possible (lines up with ETD and other detections) | Not possible (an isolated single feature) |
| Suited case | You already operate GuardDuty company-wide | A minimal configuration of "just want to scan uploads" |
Be precise about the decisive property of standalone mode, which the official docs state explicitly:
"When you enable Malware Protection for S3 independently in an account, that account will not have an associated detector ID. ... when an S3 malware scan detects the presence of malware, no GuardDuty finding will get generated in your AWS account because all GuardDuty findings are associated with a detector ID."
In other words, in standalone mode, even if a scan finds malware, nothing appears on the GuardDuty dashboard. This is by design, not a defect. Because a GuardDuty finding is tied to a detector ID, no detector means no finding. Instead, the result appears on the EventBridge default event bus, in CloudWatch metrics, and, if enabled, in object tags.
So in a standalone-mode design, EventBridge and tags are the only exit for detection. If you build alerts on the assumption that a finding will arrive, they will never fire in standalone mode. The pipeline described later is built around this EventBridge event.
2.3 Trade-off: which to choose
- If you already operate GuardDuty in your organization → (a) with GuardDuty. The detection arrives as a finding and fits naturally into your existing incident-response plumbing (EventBridge → automated response). Having malware detections sit alongside other threat signals is a significant benefit.
- If "I want to scan just this upload bucket" or "I don't need all of GuardDuty right now" → (b) standalone. You can introduce just this feature, at minimal cost and with least privilege. If you enable GuardDuty itself later, findings will appear as well.
My recommendation: start small in standalone mode, receive scan results through EventBridge, and build the plumbing first. Rolling out GuardDuty company-wide is a big decision in its own right, so don't hold a single requirement like upload quarantine hostage to it. Even if you later move to (a), the plumbing built around EventBridge can be reused as-is (the finding route is simply added).
3. The enablement components: the Malware Protection plan, IAM role, prefixes, limits
3.1 What you can and can't do (fix the constraints first)
- Only your own account's buckets. Even a delegated GuardDuty administrator can't enable it on a member account's bucket (it's closed within the same account).
- Same Region only. A cross-Region bucket is out of scope.
- A per-bucket "Malware Protection plan" resource is created, with a unique plan ID. GuardDuty auto-creates and manages an EventBridge managed rule named DO-NOT-DELETE-AmazonGuardDutyMalwareProtectionS3* (don't delete it by hand).
- You can scope by prefix. Rather than the whole bucket, you can target only specific object prefixes (up to 5) for scanning. Effective for a design of "the upload receiver is only under uploads/."
- Supports KMS-encrypted buckets (decrypted inside the scan environment). But objects with SSE-C (customer-provided keys) can't be scanned (the later ACCESS_DENIED reason SSE_C_ENCRYPTED_OBJECT).
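The prefix scoping above can be sanity-checked before you wire anything up. The following is a minimal sketch, assuming scoping behaves as a plain string-prefix match on the object key; the function names are hypothetical and not part of any AWS SDK.

```python
from __future__ import annotations

# Limit quoted in this article (up to 5 prefixes per plan); verify against current docs.
MAX_PREFIXES = 5


def validate_prefixes(prefixes: list[str]) -> list[str]:
    """Enforce the 5-prefix limit before the list is handed to the plan."""
    if len(prefixes) > MAX_PREFIXES:
        raise ValueError(f"at most {MAX_PREFIXES} prefixes allowed, got {len(prefixes)}")
    return prefixes


def is_in_scan_scope(key: str, prefixes: list[str]) -> bool:
    """Assumed semantics: string-prefix match; an empty list means the whole bucket."""
    if not prefixes:
        return True
    return any(key.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    scoped = validate_prefixes(["uploads/"])
    for candidate in ("uploads/report.pdf", "tmp/cache.bin"):
        print(candidate, is_in_scan_scope(candidate, scoped))
```

A check like this is useful in tests for the upload path: if the application ever writes user files outside the scoped prefixes, those objects would not be scanned automatically.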
3.2 The IAM role: least privilege making GuardDuty act "on your behalf"
Malware Protection for S3 requires an IAM role that GuardDuty assumes to run scans in your account. The permissions this role needs fall roughly into the following three categories:
- Receive notification of new uploads (via the EventBridge managed rule)
- Read and decrypt the target object (s3:GetObject + kms:Decrypt if needed)
- (Optional) tag after the scan (s3:PutObjectTagging)
The role's trust policy allows the GuardDuty Malware Protection service principal to call sts:AssumeRole. The safe approach is to follow the official IAM policy template, and when you add buckets, the recommended practice is to add the new bucket names to the same role. Following the principle of least privilege, narrow the Resource to "only this bucket, this prefix" (the same shape as least-privilege thinking at the data layer).
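To make the shape of that role concrete, here is a partial sketch that builds the trust policy and the object-level statements as Python dicts. It rests on assumptions: the service principal name is the one I understand the official template to use, and the statements cover only the read and tag categories listed above. The managed-rule and KMS statements from the official template are deliberately omitted, so treat this as an illustration and start from the official template in practice.

```python
from __future__ import annotations

import json

# Assumption: service principal as given in the official template; verify before use.
SERVICE_PRINCIPAL = "malware-protection-plan.guardduty.amazonaws.com"


def build_trust_policy() -> dict:
    """Trust policy letting the Malware Protection service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SERVICE_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_object_statements(bucket: str, prefix: str, tagging: bool) -> list[dict]:
    """Object-level statements only, narrowed to one bucket and one prefix."""
    resource = f"arn:aws:s3:::{bucket}/{prefix}*"
    statements = [
        {"Sid": "ReadObjectsForScan", "Effect": "Allow",
         "Action": ["s3:GetObject"], "Resource": resource}
    ]
    if tagging:
        statements.append(
            {"Sid": "TagScanResult", "Effect": "Allow",
             "Action": ["s3:PutObjectTagging"], "Resource": resource}
        )
    return statements


if __name__ == "__main__":
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(build_object_statements("my-upload-landing", "uploads/", True), indent=2))
```

Generating the statements from a function keeps "which bucket, which prefix" in one place when buckets are added to the same role.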
3.3 Limits (the exact official values, verify the latest)
| Limit | Default value | Adjustable | Note |
|---|---|---|---|
| Max S3 object size | 100 GB | No | If you need a larger target, consult AWS Support |
| Extracted file count | 100,000 | No | The max number of files expandable/analyzable in an archive |
| Max nesting depth | 100 | No | The max levels of archive nesting |
| Max protected buckets | 25 | No | Per account × Region |
If an object exceeds these limits, the scan is skipped and the result becomes UNSUPPORTED (example reasons: OBJECT_SIZE_LIMIT_EXCEEDED / EXTRACTED_FILE_LIMIT_EXCEEDED / EXTRACTED_LEVEL_LIMIT_EXCEEDED / EXTRACTION_RATIO_LIMIT_EXCEEDED). The later plumbing (Section 5) must always treat "couldn't scan" as "not safe."
3.4 On-demand scanning: for existing objects / re-scanning
Automatic scanning runs against new uploads, but for objects that existed before you enabled protection or to re-scan something already scanned, use on-demand scanning.
# Scan an existing object (latest version) on demand.
# Prerequisites: Malware Protection for S3 is enabled on the target bucket, and the caller
# has the AWS managed policy AmazonGuardDutyFullAccess_v2 attached.
aws guardduty send-object-malware-scan \
  --s3-object '{"Bucket": "my-upload-landing", "Key": "uploads/legacy-file.pdf"}'
# To scan a specific version, pass VersionId.
aws guardduty send-object-malware-scan \
  --s3-object '{"Bucket": "my-upload-landing", "Key": "uploads/legacy-file.pdf", "VersionId": "d41d8cd9...EXAMPLE"}'
Cautions: on-demand scanning overrides the plan's prefix setting (you can target objects outside the prefix), and the same limits and pricing apply as for automatic scanning. Importantly, on-demand scanning is not in the free tier. A success response means only that the request was accepted, not that the scan completed, so always confirm the result via EventBridge / tags / CloudWatch.
4. Reading the scan result: tags, status values, EventBridge events
To build automated processing, you need to accurately read the structure of the result. There are 3 exits — object tags, EventBridge events, and CloudWatch metrics.
4.1 Scan-result tags (optional, enabling them is mandatory "before upload")
Enable tagging and after the scan GuardDuty attaches a predefined tag to the object. The key and value are fixed officially:
Key: GuardDutyMalwareScanStatus
Value: NO_THREATS_FOUND | THREATS_FOUND | UNSUPPORTED | ACCESS_DENIED | FAILED
| Result value | Meaning | Scan status |
|---|---|---|
| NO_THREATS_FOUND | No threats detected | Completed |
| THREATS_FOUND | A threat detected | Completed |
| UNSUPPORTED | Unscannable (password-protected, size/compression-ratio exceeded, unsupported S3 feature, etc.) | Skipped |
| ACCESS_DENIED | Can't access the object (IAM role permissions, SSE-C, etc.) | Skipped |
| FAILED | Couldn't scan due to an internal error | Failed |
A fatal pitfall: unless tagging is enabled "before" the object is uploaded, enabling it later won't tag that object. So the iron rule is the order "create the bucket → enable protection + tagging → then start accepting uploads." Also, the max tags attachable to an object is 10, and if the slots are full GuardDuty can't tag it, and instead a "post-scan tag failure" event appears on EventBridge.
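The 10-tag ceiling is easy to trip over when the uploader attaches its own tags. As a minimal sketch (the helper name is hypothetical), an upload path could check that one slot stays free for the scan-result tag before it accepts a tag set:

```python
from __future__ import annotations

SCAN_TAG_KEY = "GuardDutyMalwareScanStatus"
MAX_OBJECT_TAGS = 10  # S3 per-object tag limit cited above


def has_room_for_scan_tag(existing_tags: dict[str, str]) -> bool:
    """True if GuardDuty can still attach (or overwrite) its scan-result tag."""
    if SCAN_TAG_KEY in existing_tags:
        # Overwriting an existing key does not consume a new slot.
        return True
    return len(existing_tags) < MAX_OBJECT_TAGS


if __name__ == "__main__":
    uploader_tags = {f"app-tag-{i}": "x" for i in range(10)}
    print(has_room_for_scan_tag(uploader_tags))
```

Rejecting or trimming a full tag set at upload time avoids the situation where the scan succeeds but the tag the TBAC policy depends on can never be written.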
4.2 The EventBridge scan-result event (the lead of automated processing)
GuardDuty always publishes the scan result to the default EventBridge event bus (both standalone and with GuardDuty). This is the entrance to automated processing. The detail-type is GuardDuty Malware Protection Object Scan Result, and the source is aws.guardduty.
The NO_THREATS_FOUND event (official schema, excerpt):
{
"detail-type": "GuardDuty Malware Protection Object Scan Result",
"source": "aws.guardduty",
"account": "111122223333",
"region": "us-east-1",
"resources": ["arn:aws:guardduty:us-east-1:111122223333:malware-protection-plan/b4c7f464ab3a4EXAMPLE"],
"detail": {
"schemaVersion": "1.0",
"scanStatus": "COMPLETED",
"resourceType": "S3_OBJECT",
"s3ObjectDetails": {
"bucketName": "amzn-s3-demo-bucket",
"objectKey": "uploads/report.pdf",
"eTag": "ASIAI44QH8DHBEXAMPLE",
"versionId": "d41d8cd98f00b204e9800998eEXAMPLE",
"s3Throttled": false
},
"scanResultDetails": {
"scanResultStatus": "NO_THREATS_FOUND",
"threats": null,
"statusReasons": null
}
}
}
For THREATS_FOUND, scanResultDetails.threats holds the detection name (by default it reports the first detected one, and scanStatus is COMPLETED):
{
"detail": {
"scanStatus": "COMPLETED",
"s3ObjectDetails": { "bucketName": "amzn-s3-demo-bucket", "objectKey": "uploads/evil.bin", "versionId": "..." },
"scanResultDetails": {
"scanResultStatus": "THREATS_FOUND",
"threats": [ { "name": "EICAR-Test-File (not a virus)" } ],
"statusReasons": null
}
}
}
When the scan was skipped, it's scanStatus: "SKIPPED", with scanResultStatus being UNSUPPORTED or ACCESS_DENIED, and further statusReasons holding the concrete reason (PASSWORD_PROTECTED, SSE_C_ENCRYPTED_OBJECT, OBJECT_SIZE_LIMIT_EXCEEDED, etc.).
At-least-once (must read): the official docs state clearly — "GuardDuty uses at-least-once delivery, which means you might receive multiple scan results for the same object. We recommend designing your applications to handle duplicate results." That is, the scan result of the same object can arrive multiple times. With the same thinking as preventing double charges on a payment platform, the result handler must be idempotent (Section 5's Lambda builds that out). Note that billing is once per object even with duplicates.
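What "handle duplicate results" means in code can be shown in a few lines. This is a sketch only: it derives a deduplication key from the fields in the event excerpt above and tracks seen keys in memory. The in-memory set is purely illustrative; a real handler would need a durable check (for example, inspecting the destination object before acting), and the class and function names are hypothetical.

```python
from __future__ import annotations


def idempotency_key(event: dict) -> tuple[str, str, str]:
    """Identify 'the same object version' across redelivered scan-result events."""
    obj = event["detail"]["s3ObjectDetails"]
    # Prefer versionId; fall back to eTag when the bucket is unversioned.
    version = obj.get("versionId") or obj.get("eTag") or ""
    return (obj["bucketName"], obj["objectKey"], version)


class InMemoryDeduper:
    """Illustration only: state is lost between Lambda invocations."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def first_time(self, event: dict) -> bool:
        key = idempotency_key(event)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    sample = {"detail": {"s3ObjectDetails": {
        "bucketName": "amzn-s3-demo-bucket", "objectKey": "uploads/report.pdf",
        "versionId": "v1"}}}
    deduper = InMemoryDeduper()
    print(deduper.first_time(sample), deduper.first_time(sample))
```

The key includes the version so that a genuinely new upload under the same key is processed again, while a redelivery of the same result is not.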
4.3 A note on the status model: status (scanStatus) and result (scanResultStatus) are different
Let me make it explicit because it's easy to confuse. scanStatus is "the scan's status" (COMPLETED / SKIPPED / FAILED), and scanResultStatus is "the result" (the 5 values above). Even with COMPLETED, the result is one of THREATS_FOUND or NO_THREATS_FOUND. "SKIPPED / FAILED / UNSUPPORTED / ACCESS_DENIED do not mean 'safe'" — it just couldn't be scanned. The iron rule of design is "explicitly allow (allowlist) only NO_THREATS_FOUND, and fall everything else to the isolation side." This becomes the core of the next chapter's plumbing.
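The allowlist rule reduces to one small function. The sketch below takes the detail object of a scan-result event and returns a disposition; the function name and the "promote" / "quarantine" labels are my own, and requiring scanStatus to be COMPLETED in addition to the result value is a deliberately strict assumption.

```python
from __future__ import annotations


def decide(detail: dict) -> str:
    """Return 'promote' only for a completed, clean scan; otherwise 'quarantine'."""
    scan_status = detail.get("scanStatus")
    result = (detail.get("scanResultDetails") or {}).get("scanResultStatus")
    if scan_status == "COMPLETED" and result == "NO_THREATS_FOUND":
        return "promote"
    # THREATS_FOUND, UNSUPPORTED, ACCESS_DENIED, FAILED, missing or unknown values.
    return "quarantine"


if __name__ == "__main__":
    skipped = {"scanStatus": "SKIPPED",
               "scanResultDetails": {"scanResultStatus": "UNSUPPORTED"}}
    print(decide(skipped))
```

Because the function names the one allowed combination instead of listing the bad ones, a status value added to the service later falls to quarantine without a code change.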
5. A high-value application: a secure upload pipeline (landing → clean / quarantine)
This is the climax of this article. Scanning alone isn't safe. "Keep an object that might be contaminated in a state downstream absolutely can't read, and flow only those confirmed clean" — let's build that plumbing.
5.1 The big picture: 3 buckets + event-driven promotion
User
  │ Upload (presigned URL, etc.)
  ▼
landing bucket (Malware Protection for S3 enabled, tagging ON)
  │ - Downstream can't read (TBAC: DENY GetObject on objects without the clean tag)
  │ - GuardDuty scans automatically
  ▼
EventBridge (detail-type = "GuardDuty Malware Protection Object Scan Result")
  ▼
Scan-result Lambda (idempotent)
  ├─ NO_THREATS_FOUND → "promote" (copy) to the clean bucket. Downstream reads only this
  ├─ THREATS_FOUND → isolate in the quarantine bucket + security notification (to a human)
  └─ Anything else (UNSUPPORTED/ACCESS_DENIED/FAILED/SKIPPED) → quarantine + needs-investigation notification
       ("couldn't scan ≠ safe")
Downstream consumers
  └─ Read only the clean bucket (landing is physically unreadable via TBAC)
The crux of the design is a 2-stage defense:
- EventBridge-driven promotion: move only clean objects to clean (downstream sees only clean).
- TBAC (tag-based access control): even if someone tries to read landing, the S3 bucket policy DENYs GetObject on objects without the NO_THREATS_FOUND tag. Because it's sealed at the storage-layer boundary, not in app logic, a code bug can't break it (the same idea as making least privilege effective at the data layer).
5.2 The TBAC bucket policy: don't let it be read without the clean tag
A policy for the landing bucket of "can't be read unless NO_THREATS_FOUND," following the official template. Replace {{...}} with your values.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "NoReadUnlessClean",
"Effect": "Deny",
"NotPrincipal": {
"AWS": [
"arn:aws:sts::555555555555:assumed-role/IAM-role-name/GuardDutyMalwareProtection",
"arn:aws:iam::555555555555:role/IAM-role-name"
]
},
"Action": ["s3:GetObject", "s3:GetObjectVersion"],
"Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
"Condition": {
"StringNotEquals": {
"s3:ExistingObjectTag/GuardDutyMalwareScanStatus": "NO_THREATS_FOUND"
}
}
},
{
"Sid": "OnlyGuardDutyCanTagScanStatus",
"Effect": "Deny",
"NotPrincipal": {
"AWS": [
"arn:aws:sts::555555555555:assumed-role/IAM-role-name/GuardDutyMalwareProtection",
"arn:aws:iam::555555555555:role/IAM-role-name"
]
},
"Action": "s3:PutObjectTagging",
"Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
"Condition": {
"ForAnyValue:StringEquals": {
"s3:RequestObjectTagKeys": "GuardDutyMalwareScanStatus"
}
}
}
]
}
Reading this policy:
- NoReadUnlessClean: DENY reads of objects whose s3:ExistingObjectTag/GuardDutyMalwareScanStatus is not NO_THREATS_FOUND. An object that isn't tagged yet (= scan not complete) is naturally not NO_THREATS_FOUND either, so it can't be read. This guarantees, at the storage layer, that no one can read an object until the scan finishes and it's confirmed clean.
- Exclude only the GuardDuty role with NotPrincipal: exempt the scan-execution role (and the .../GuardDutyMalwareProtection session GuardDuty assumes) so that it can read and tag.
- OnlyGuardDutyCanTagScanStatus: a DENY that makes GuardDuty the only principal able to attach the GuardDutyMalwareScanStatus tag. Without this, someone could manually attach NO_THREATS_FOUND to a contaminated object and slip past the gate.
Additional defense in an organization: if you use AWS Organizations, use an SCP to enforce company-wide that the GuardDutyMalwareScanStatus tag can't be tampered with (the official docs point you to the EC2 example, adapted by replacing it with s3). If a tag becomes the basis of trust, preventing tag tampering is a prerequisite. Skip this and TBAC becomes "an unlocked safe."
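To reason about what the two Deny statements do, it can help to model them as plain functions. The sketch below is a simplified model of just these two statements, not of IAM policy evaluation as a whole (it ignores Allow statements, SCPs, and every other policy), and the function names are hypothetical.

```python
from __future__ import annotations

SCAN_TAG_KEY = "GuardDutyMalwareScanStatus"
CLEAN_VALUE = "NO_THREATS_FOUND"


def read_denied(principal_arn: str, object_tags: dict[str, str], exempt: set[str]) -> bool:
    """Model of NoReadUnlessClean: deny unless exempt or the object is tagged clean."""
    if principal_arn in exempt:
        return False
    # A missing tag (scan not finished) is 'not equal' to the clean value, so it is denied.
    return object_tags.get(SCAN_TAG_KEY) != CLEAN_VALUE


def tagging_denied(principal_arn: str, request_tag_keys: list[str], exempt: set[str]) -> bool:
    """Model of OnlyGuardDutyCanTagScanStatus: only exempt principals may set the scan tag."""
    if principal_arn in exempt:
        return False
    return SCAN_TAG_KEY in request_tag_keys


if __name__ == "__main__":
    exempt = {"arn:aws:iam::555555555555:role/IAM-role-name"}
    app = "arn:aws:iam::555555555555:role/downstream-app"  # hypothetical consumer role
    print(read_denied(app, {}, exempt))
    print(read_denied(app, {SCAN_TAG_KEY: CLEAN_VALUE}, exempt))
    print(tagging_denied(app, [SCAN_TAG_KEY], exempt))
```

Writing the cases out this way makes one consequence visible: any principal that must read non-clean objects from the bucket (for example, to copy them elsewhere) is denied unless it is among the exempt principals.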
5.3 The scan-result Lambda (Python, idempotent)
A Lambda that receives the EventBridge scan result, promotes only NO_THREATS_FOUND to clean, and isolates everything else in quarantine. It is idempotent because delivery is at-least-once.
"""Lambda that responds to GuardDuty Malware Protection for S3 scan results.

Design principles:
- Idempotent: EventBridge delivery is at-least-once. Receiving the result for the
  same object twice produces the side effect only once.
- Fail safe with an allowlist: promote only on 'NO_THREATS_FOUND'; quarantine everything else.
  (UNSUPPORTED/ACCESS_DENIED/FAILED/SKIPPED mean "could not scan", not "safe".)
- Reversible: promote/quarantine by copying, never deleting from landing, so the
  original survives a misjudgment.
- Observable: structured logs. Threat names go into the alert, but object contents
  are never read or emitted.
"""
from __future__ import annotations

import json
import logging
import os
from typing import Any, Final
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client("s3")
sns = boto3.client("sns")

CLEAN_BUCKET: Final[str] = os.environ["CLEAN_BUCKET"]
QUARANTINE_BUCKET: Final[str] = os.environ["QUARANTINE_BUCKET"]
ALERT_TOPIC_ARN: Final[str] = os.environ["ALERT_TOPIC_ARN"]

# The only result value that permits promotion. Everything else is quarantined (fail-closed).
CLEAN_STATUS: Final[str] = "NO_THREATS_FOUND"


def handler(event: dict[str, Any], _context: object) -> dict[str, str]:
    detail = event["detail"]
    obj = detail["s3ObjectDetails"]
    src_bucket: str = obj["bucketName"]
    # Keys in S3 events can be URL-encoded, so decode them.
    key: str = unquote_plus(obj["objectKey"])
    version_id: str | None = obj.get("versionId")
    # scanResultDetails may be absent or null; treat a missing result as FAILED.
    result_details = detail.get("scanResultDetails") or {}
    result: str = result_details.get("scanResultStatus", "FAILED")
    threats = result_details.get("threats")
    log = {"bucket": src_bucket, "key": key, "version": version_id, "result": result}

    if result == CLEAN_STATUS:
        dest = CLEAN_BUCKET
        disposition = "promoted"
    else:
        dest = QUARANTINE_BUCKET
        disposition = "quarantined"

    # Promote / quarantine (idempotent).
    # The destination key is deterministic, so a redelivered event copies onto the same
    # destination: running this any number of times yields the same result.
    dest_key = key
    moved = _idempotent_copy(src_bucket, key, version_id, dest, dest_key)

    # Notify a human about threats and unscannable objects (fail-closed check and triage).
    if result != CLEAN_STATUS:
        _alert(src_bucket, key, result, threats, disposition)

    logger.info(json.dumps({**log, "disposition": disposition, "copied": moved}))
    return {"disposition": disposition, "result": result}


def _idempotent_copy(
    src_bucket: str, key: str, version_id: str | None, dest_bucket: str, dest_key: str
) -> bool:
    """Copy from landing to dest (idempotent). No-op if the same version was already copied.

    Idempotency key: the source versionId is written to the destination as metadata,
    and a rerun that finds a match is skipped.
    The landing original is never deleted (leaves room to recover from a misjudgment).
    """
    # Check whether this version was already copied (skip the second time).
    try:
        head = s3.head_object(Bucket=dest_bucket, Key=dest_key)
        if head.get("Metadata", {}).get("source-version-id") == (version_id or ""):
            return False  # same version already processed: no-op
    except ClientError as exc:
        if exc.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise  # don't swallow unexpected errors

    copy_source: dict[str, str] = {"Bucket": src_bucket, "Key": key}
    if version_id:
        copy_source["VersionId"] = version_id
    s3.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource=copy_source,
        # Keep the source version as the idempotency key. REPLACE ensures it is written.
        Metadata={"source-version-id": version_id or ""},
        MetadataDirective="REPLACE",
    )
    return True


def _alert(
    bucket: str, key: str, result: str, threats: list[dict[str, str]] | None, disposition: str
) -> None:
    """Notify security staff of threats / unscannable objects. Contents are never read or included."""
    threat_names = ", ".join(t.get("name", "?") for t in (threats or [])) or "n/a"
    # SNS subjects must be ASCII and at most 100 characters, so sanitize and truncate.
    subject = f"[S3 Malware][{result}] {bucket}/{key}".encode("ascii", "replace").decode()[:100]
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject=subject,
        Message="\n".join(
            [
                f"bucket: {bucket}",
                f"key: {key}",
                f"scanResultStatus: {result}",
                f"threats: {threat_names}",
                f"disposition: {disposition}",
                "note: anything other than 'NO_THREATS_FOUND' is treated as unsafe and has been quarantined. Triage required.",
            ]
        ),
    )
Let me make explicit the design decisions of this code.
- Fail-closed (fall to the safe side): the only result that gets promoted is NO_THREATS_FOUND. UNSUPPORTED, ACCESS_DENIED, FAILED, and any unknown value all fall to quarantine. "Treat what couldn't be judged as dangerous" is the default of security.
- Idempotent: leave the original versionId as metadata on the destination, and from the second time onward detect it with head_object and no-op. Even if the same result arrives multiple times under at-least-once delivery, the copy happens once. This is the same shape as billing once when the same event is received twice on the payment platform.
- Reversible: promote / isolate by copying, without deleting the landing original. Even if a misjudgment (false positive) is found later, the original remains, so you can revert.
- Least privilege: this Lambda's execution role is limited to s3:GetObject* on landing, s3:PutObject* plus s3:GetObject and s3:ListBucket on clean/quarantine (head_object needs s3:GetObject, and without s3:ListBucket a missing key returns 403 rather than 404, which would break the "not copied yet" check), and sns:Publish, with Resource narrowed to each bucket ARN. Because it spans buckets, also explicitly allow this role on the bucket-policy side of each bucket. Note that an Allow cannot override the explicit Deny in 5.2: to copy non-clean objects out of landing into quarantine, this role must also be exempted from that Deny.
- Don't handle contents: the Lambda doesn't read the object's bytes (the copy is an S3 server-side copy_object). It puts the threat name in the notification, but doesn't emit the file contents to logs or the notification (reconciling observability and security).
Wiring the EventBridge rule and the Lambda: create a rule that matches detail-type = "GuardDuty Malware Protection Object Scan Result", set the Lambda as the target, and attach a retry policy and a DLQ (the same pattern as the retry_policy + dead_letter_config of the automated-response article). A non-empty DLQ means there's an object whose result couldn't be processed, which is a dangerous silence, so always alert on it.
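The wiring described above can be expressed as two small dictionaries. This sketch builds an event pattern and a target definition in the shapes that boto3's EventBridge client accepts for put_rule (EventPattern, as a JSON string) and put_targets (Targets); it makes no AWS calls itself. The retry values, the target Id, and the ARNs are illustrative assumptions, and the optional bucket filter is my own addition.

```python
from __future__ import annotations

import json

DETAIL_TYPE = "GuardDuty Malware Protection Object Scan Result"


def scan_result_event_pattern(bucket_name: str | None = None) -> dict:
    """Event pattern for scan results, optionally narrowed to one bucket."""
    pattern: dict = {"source": ["aws.guardduty"], "detail-type": [DETAIL_TYPE]}
    if bucket_name:
        pattern["detail"] = {"s3ObjectDetails": {"bucketName": [bucket_name]}}
    return pattern


def lambda_target(
    lambda_arn: str, dlq_arn: str, max_retries: int = 3, max_age_seconds: int = 3600
) -> dict:
    """Target entry with a retry policy and a dead-letter queue attached."""
    return {
        "Id": "scan-result-handler",  # hypothetical target ID
        "Arn": lambda_arn,
        "RetryPolicy": {
            "MaximumRetryAttempts": max_retries,
            "MaximumEventAgeInSeconds": max_age_seconds,
        },
        "DeadLetterConfig": {"Arn": dlq_arn},
    }


if __name__ == "__main__":
    # Intended use (not executed here):
    #   events.put_rule(Name=..., EventPattern=json.dumps(pattern), State="ENABLED")
    #   events.put_targets(Rule=..., Targets=[target])
    pattern = scan_result_event_pattern("my-upload-landing")
    target = lambda_target(
        "arn:aws:lambda:us-east-1:111122223333:function:scan-result",  # placeholder ARN
        "arn:aws:sqs:us-east-1:111122223333:scan-result-dlq",          # placeholder ARN
    )
    print(json.dumps(pattern, indent=2))
    print(json.dumps(target, indent=2))
```

Matching on both source and detail-type keeps unrelated GuardDuty events away from the handler, and narrowing by bucket is useful when several protected buckets share one account.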
5.4 Terraform: attach the Malware Protection plan to the landing bucket
With the aws_guardduty_malware_protection_plan resource, protect only the uploads/ prefix of the landing bucket and enable result tagging.
# Enable Malware Protection for S3 on the landing bucket.
# role = ARN of the IAM role GuardDuty assumes to scan and tag.
resource "aws_guardduty_malware_protection_plan" "landing" {
  role = aws_iam_role.gd_malware_s3.arn

  protected_resource {
    s3_bucket {
      bucket_name = aws_s3_bucket.landing.id
      # Limit scanning to the upload receiver (up to 5 prefixes).
      # Narrowing the receiver avoids scan charges for unrelated objects.
      object_prefixes = ["uploads/"]
    }
  }

  # Write the scan result to the object tag (GuardDutyMalwareScanStatus).
  # The TBAC policy in 5.2 depends on this tag, so ENABLED is required.
  actions {
    tagging {
      status = "ENABLED"
    }
  }

  tags = { ManagedBy = "terraform", Purpose = "upload-malware-scan" }
}
A standalone-operation note: this aws_guardduty_malware_protection_plan can be created even without aws_guardduty_detector. That is the Terraform expression of "standalone mode." If you want to run it with GuardDuty, separately enable aws_guardduty_detector (and the MALWARE_PROTECTION-family feature if you like), and detections will also appear as findings. For the trust policy and permissions of the IAM role passed to role (s3:GetObject / kms:Decrypt / s3:PutObjectTagging + those for the EventBridge managed rule), minimize them per the official template as in 3.2.
6. Cost: usage-based, the free tier, standalone vs. with GuardDuty
6.1 The billing model (verify the latest officially)
Malware Protection for S3's pricing is usage-based, different from other protection plans. Grasp the concept and you can read the budget.
| Billing target | Billing unit | In the free tier? |
|---|---|---|
| Scanned data volume | Per GB | Up to 1 GB per month free |
| Objects evaluated | Per request (object) | Up to 1,000 requests per month free |
| S3 object tagging | S3's tagging cost | Not in the free tier |
| The S3 APIs GuardDuty hits (GET/PUT etc.) | S3's API cost | (S3-side normal billing) |
The official free tier is "per account, per Region, up to 1,000 requests + 1 GB of data scanned per month free." Usage-based billing starts from the portion exceeding this. Note that on-demand scanning and tagging are not in the free tier.
Always confirm the monetary figures with the latest official values: in this article, I deliberately don't assert specific unit prices (USD/GB, USD/1,000 objects, etc.). Because pricing differs by Region and gets revised, it's correct to estimate the latest value for your Region on the GuardDuty pricing page. When you look at the US East (N. Virginia) figures too, treat them as a reference (verify needed). The cost-optimization story is dug into separately in the GuardDuty cost-optimization article.
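The free-tier arithmetic can be sketched without asserting any price. The function below takes unit prices as parameters (you supply the current values for your Region) and applies the monthly free tier described above. It assumes the per-object price is quoted per 1,000 objects, covers only automatic scans, and leaves out tagging and S3 API costs; the names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Free tier as described in this article (per account, per Region, per month).
FREE_REQUESTS = 1_000
FREE_GB = 1.0


@dataclass(frozen=True)
class Usage:
    scanned_gb: float
    objects_evaluated: int


def billable(usage: Usage) -> Usage:
    """Subtract the monthly free tier; never goes below zero."""
    return Usage(
        scanned_gb=max(0.0, usage.scanned_gb - FREE_GB),
        objects_evaluated=max(0, usage.objects_evaluated - FREE_REQUESTS),
    )


def estimate_monthly_cost(
    usage: Usage, price_per_gb: float, price_per_1k_objects: float
) -> float:
    """Scan charges only; prices are inputs looked up on the official pricing page."""
    b = billable(usage)
    return b.scanned_gb * price_per_gb + (b.objects_evaluated / 1000) * price_per_1k_objects


if __name__ == "__main__":
    month = Usage(scanned_gb=250.0, objects_evaluated=120_000)
    # Placeholder prices for illustration only, not real AWS prices.
    print(estimate_monthly_cost(month, price_per_gb=1.0, price_per_1k_objects=1.0))
```

Keeping the prices as arguments means the estimate stays valid when pricing is revised: only the inputs change.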
6.2 The cost implications of standalone vs. with GuardDuty
- Standalone mode: you pay only Malware Protection for S3's usage-based billing. No cost of GuardDuty itself (foundational detection, other protection plans) is incurred. You can answer the requirement of "just upload quarantine" at minimal cost.
- With GuardDuty: on top of Malware Protection for S3's usage-based billing, the cost of GuardDuty itself and other enabled plans rides on. But you gain the value of malware detection riding onto existing incident response / correlation as a finding.
The crux of cost design: scoping by prefix (just uploads/), and limiting the scan target to "the receiver where users actually upload," directly minimizes billing. Carelessly protect the whole bucket and it scans even temporary objects generated by internal processing, which can swell the object-evaluation count. Creating "billing proportional to assets" is the work of design here too.
7. Summary: a Malware Protection for S3 production cheat sheet
A quick-reference list for when you're unsure.
- Don't mix up the 2 features: S3 Protection = detect suspicious access / exfiltration with CloudTrail's S3 data events (GuardDuty required, feature S3_DATA_EVENTS). Malware Protection for S3 = malware-scan the contents of new uploads. "Who did what" vs. "what came in." And it's also different from for EC2 (EBS scanning).
- Standalone-operable: you can enable just Malware Protection for S3 without GuardDuty itself. But no detector ID = no GuardDuty finding generated on malware detection. The result is only the EventBridge default bus + CloudWatch + (optional) tags. A "finding-premised" alert doesn't fire in standalone mode.
- Mechanism: a per-bucket Malware Protection plan. Auto-scan on upload (PutObject, etc.). Own account, same Region only; even a delegated administrator can't do a member's bucket. Scope to up to 5 prefixes, KMS-supported (SSE-C unsupported). Existing/re-scan is SendObjectMalwareScan (on-demand, not in the free tier).
- Limits (verify the latest): max object 100 GB, extracted files 100,000, nesting 100, protected buckets 25/account/Region. Exceeding them gives UNSUPPORTED.
- Result: the tag GuardDutyMalwareScanStatus = NO_THREATS_FOUND / THREATS_FOUND / UNSUPPORTED / ACCESS_DENIED / FAILED. Tagging must be enabled before upload. EventBridge is detail-type = "GuardDuty Malware Protection Object Scan Result", at-least-once → idempotency required. scanStatus (status) and scanResultStatus (result) are different things.
- The safe pipeline: landing (protection + tags) → EventBridge → an idempotent Lambda promotes only NO_THREATS_FOUND to clean, and quarantines everything else (fail-closed). Downstream reads only clean. Use a TBAC bucket policy to "DENY the GetObject of objects without the clean tag," and forbid tag tampering by anyone but GuardDuty (+ an SCP if in an organization).
- Cost (verify the latest): usage-based on scanned GB + objects evaluated. 1,000 requests + 1 GB per month free (on-demand / tagging not in it). Scoping by prefix is itself the saving. Standalone mode has zero GuardDuty cost.
Malware Protection for S3 isn't "put it in the box and it's safe on its own"; its value is decided by "whether you can turn the scan result (especially other than NO_THREATS_FOUND) into plumbing that's idempotent, reversible, and fail-closed." The greatest leverage is in the design of the boundary (EventBridge + TBAC) that physically cuts off contamination from downstream and flows only clean, more than the detection itself.
On a multi-account serverless payment platform handling real money, carbon credits, and regional currencies, I implemented IAM, observability, and DR, and ensured correctness through the structure of the code and idempotency rather than operational vigilance. The idea of making the same event received twice have a one-time side effect in an at-least-once world applies directly to processing scan results. I don't claim to have operated Malware Protection for S3 in a specific client project. But based on that real experience, I can design, implement, and deliver this secure upload pipeline (standalone operation, TBAC gating, idempotent promotion / isolation).
"How do I build a malware-scanning quarantine into my company's upload feature — start standalone or put it on GuardDuty itself, how to protect downstream with TBAC, how to minimize cost." From the requirements-organizing stage through the implementation of Terraform / Lambda / bucket policies, I can accompany you fast and safely, one person × generative AI (Claude Code). Feel free to consult me.
Reference (official documentation)
- GuardDuty Malware Protection for S3: the feature overview, the two enablement approaches (with GuardDuty / standalone), the reason no finding is generated in standalone mode, the own-account & same-Region constraint
- How does Malware Protection for S3 work?: the Malware Protection plan, the IAM role, prefixes, KMS decryption, the tag key GuardDutyMalwareScanStatus, at-least-once delivery
- Monitoring S3 object scans in Malware Protection for S3: the exact values of scanStatus and scanResultStatus, the list of statusReasons
- Monitoring S3 object scans with Amazon EventBridge: the complete JSON schema of detail-type = "GuardDuty Malware Protection Object Scan Result"
- Using tag-based access control (TBAC) with Malware Protection for S3: the official template of the S3 bucket policy that DENYs unless NO_THREATS_FOUND, tag-tampering prevention
- On-demand S3 malware scan in GuardDuty: the SendObjectMalwareScan API, existing/re-scan, not in the free tier
- Quotas in Malware Protection for S3: max object 100 GB, extracted files 100,000, nesting 100, protected buckets 25
- Pricing and usage cost for Malware Protection for S3: the free tier (1,000 requests + 1 GB/month), tagging / on-demand not in it
- GuardDuty S3 Protection: CloudTrail S3 data-event monitoring (different from Malware Protection for S3)
- Terraform: aws_guardduty_malware_protection_plan: role / protected_resource { s3_bucket { bucket_name, object_prefixes } } / actions { tagging { status } }