友田 陽大

Auto-Scanning Uploaded Files with GuardDuty Malware Protection for S3: Standalone Operation, Scan-Result Gating, and the Difference from S3 Protection in Real Code

A production design guide to automatically malware-scanning objects uploaded to S3 with GuardDuty Malware Protection for S3, explained with real Terraform, Python, and bucket-policy code. It covers the difference from the easily confused S3 Protection (CloudTrail data-event monitoring); standalone operation without GuardDuty itself (no detector ID means no finding is generated); scan-result tags (GuardDutyMalwareScanStatus) and EventBridge events; and a secure upload pipeline that promotes only NO_THREATS_FOUND objects to a clean bucket and blocks all other reads with tag-based access control (TBAC).


"Who guarantees that the files your users upload aren't malware?" This is the first question I ask whenever I'm consulted about the security of a SaaS product with an upload feature.

The answer is usually silence, or "we reject by extension" or "we check the MIME type on the front end." But that is checking the name tag at the door, not scanning the contents. An attacker can fake both the extension and the Content-Type. If an uploaded "PDF" is actually an executable and another user downloads and opens it, your app has become a distribution channel for malware. What makes this frightening is that it happens through the legitimate user flow.

This article is a guide to designing and implementing, at production quality, a mechanism built on GuardDuty Malware Protection for S3 that automatically malware-scans objects uploaded to S3 and passes only those confirmed clean downstream. I draw on my experience implementing IAM, observability, and DR across a serverless payment platform on multi-account AWS. On a platform that handles real money, idempotency has to ensure that the same event received twice produces its side effect only once, and that is exactly the idea needed to handle at-least-once scan-result events safely.

The rule of this article: the specs, tag values, limits, pricing, and EventBridge event structure are based on the official AWS documentation (as of June 2026). Limits, pricing, and supported Regions get revised, so always confirm the latest values on the official quotas and pricing pages before going to production. One more point: GuardDuty Malware Protection for S3 is not a full AV or EDR product. It is one layer that detects known and some unknown malware with a scan engine, and it doesn't replace input validation, least-privilege IAM, encryption, or WAF. Start any automated processing of scan results with actions that are idempotent, narrowly scoped, and reversible.


0. Mental model: this is "a quarantine for uploads," not "a constant surveillance camera"

Before starting the design, let me separate, in one line, the two easily-confused S3 security features. Without fixing this first, the requirement and the feature pass each other by.

S3 Protection = monitor "access (API operations)" to S3 and detect suspicious behavior (CloudTrail data events). Malware Protection for S3 = malware-scan the "contents of files" uploaded to S3. The former is "who did what," the latter is "what came in."

From here, three consequences emerge. These are the foundation of the design decisions.

  1. What they look at differs. S3 Protection, included in GuardDuty threat detection, analyzes CloudTrail's S3 data events (GetObject / PutObject / ListObjects / DeleteObject, etc.) and detects "suspicious access or data exfiltration using legitimate credentials." On the other hand, Malware Protection for S3 downloads the contents of a newly uploaded object and malware-scans it. The former looks at behavior, the latter at contents — different things (I'll settle it with a table in Section 1).
  2. Malware Protection for S3 works without GuardDuty itself. This is the most distinctive property of the feature this article focuses on. You can use it standalone (independently) without enabling the GuardDuty service. But in standalone mode the account has no detector ID, so no GuardDuty finding is generated even when malware is detected. The result appears only on EventBridge's default bus, in CloudWatch, and (optionally) in object tags (Section 2).
  3. Detection alone isn't safe. You need the plumbing of "quarantine → isolate → promote." Even if you scan and learn "it's malware," it's meaningless if that contaminated object is still in a place readable downstream. The climax of this article is the plumbing that physically makes a contaminated object unreadable downstream and flows only clean ones (Section 5's secure upload pipeline + TBAC).

Grasp these three points and the work reduces to three steps: (1) don't mix up the two features, (2) correctly enable the protection plan (standalone or with GuardDuty), and (3) turn scan results into quarantine or promotion with idempotent plumbing. Let's build them in order.


1. A definitive comparison so you don't mix them up: S3 Protection vs. Malware Protection for S3

Because the names are similar, the most common accident in the field is "a mismatch between what you want to do and the feature you enabled." First, let me settle it head-on.

| Aspect | S3 Protection | Malware Protection for S3 |
|---|---|---|
| What it looks at | API access to S3 (CloudTrail data events) | The contents of an uploaded object (malware) |
| Threat it detects | Suspicious access, data exfiltration / destruction (misuse of leaked credentials, etc.) | Malware contained in an uploaded file |
| Trigger | Operations like GetObject / PutObject / ListObjects / DeleteObject | A new upload of an object (or a new version) |
| GuardDuty itself | Required (a feature of the protection plan) | OK even without it (standalone-operable) |
| Feature name / resource | feature S3_DATA_EVENTS (a detector feature) | Malware Protection plan (a per-bucket resource) |
| Result output | A GuardDuty finding | A finding (when with GuardDuty) / EventBridge + CloudWatch + tags |
| Billing unit | S3 data-event volume | Scanned GB + objects evaluated |

Borrowing the official wording, S3 Protection "helps you detect potential security risks for data, such as data exfiltration and destruction"; that is, it detects threats to the data itself (exfiltration and destruction). Malware Protection for S3 "helps you detect potential presence of malware by scanning newly uploaded objects"; that is, it malware-scans new uploads.

A design guideline: these two don't compete; they complement. For a bucket that "accepts uploads" and "holds sensitive data," it's ideal to enable both. Malware Protection for S3 watches "the bad things coming in," and S3 Protection watches "exfiltration mixed into legitimate access." Note that S3 Protection is a GuardDuty protection plan, so for how to enable it, see the S3_DATA_EVENTS feature in the pillar article. From here on, this article narrows to Malware Protection for S3.

Beware a third, even more confusing feature: "Malware Protection for EC2." It agentlessly scans the EBS volumes attached to EC2 instances and container workloads, and its target is completely different from the for-S3 feature this article covers. If people say just "Malware Protection," the three get confused, so always say "for S3" or "for EC2" explicitly to prevent accidents.


2. How Malware Protection for S3 works: the decisive difference between standalone and "with GuardDuty"

2.1 The scan-on-upload model

The mechanism is simple. When an object is newly uploaded to a bucket with protection enabled (officially a "protected bucket"), or a new version of an existing object is uploaded, GuardDuty automatically starts a malware scan.

What triggers the scan is S3's Object Created family of events: PutObject, POST Object, CopyObject, and CompleteMultipartUpload. GuardDuty downloads the target object via AWS PrivateLink, then decrypts, reads, and scans it in an isolated environment (a VPC with no internet access) in the same Region. The temporary copy used during the scan is KMS-encrypted, and the downloaded copy is deleted after the scan completes. In other words, the object is handled only inside that isolated, same-Region environment, and only the result metadata is retained.

2.2 The two enablement approaches — the core of this article

Malware Protection for S3 has two ways to enable it. This difference divides the operational design.

| Aspect | (a) Use it with GuardDuty | (b) Use it standalone (independent) |
|---|---|---|
| The GuardDuty service | Enabled (a detector ID exists) | OK disabled (no detector ID) |
| A finding on malware detection | A GuardDuty finding is generated | No finding is generated |
| Receiving the result | A finding (+ export to S3/EventBridge) | Only the EventBridge default bus + CloudWatch + (optional) object tags |
| Correlation with existing GuardDuty detections | Possible (lines up with ETD and other detections) | Not possible (an isolated single feature) |
| Suited case | You already operate GuardDuty company-wide | A minimal configuration of "just want to scan uploads" |

The official docs state the decisive property of standalone mode clearly, and it is worth reading precisely:

"When you enable Malware Protection for S3 independently in an account, that account will not have an associated detector ID. ... when an S3 malware scan detects the presence of malware, no GuardDuty finding will get generated in your AWS account because all GuardDuty findings are associated with a detector ID."

That is — in standalone mode, "even if it finds malware, nothing appears on the GuardDuty dashboard." This is not a defect but a design. Because a GuardDuty finding is tied to a detector ID, no detector means no finding. Instead, the result appears on the EventBridge default event bus, CloudWatch metrics, and, if enabled, object tags.

So in a standalone-mode design, EventBridge and tags are the only exit for detections. If you build alerts on the assumption that a finding will arrive, they will never fire in standalone mode. The pipeline described later is built around this EventBridge event.

2.3 Trade-off: which to choose

  • If you already operate GuardDuty in your organization → (a) with GuardDuty. It rides naturally onto the existing incident-response plumbing (EventBridge → automated response) as a finding. The value of malware detection lining up on the same playing field as other threat signals is large.
  • If "I want to scan just this upload bucket" / "I don't need all of GuardDuty right now" → (b) standalone. You can introduce the feature pinpoint, at minimal cost and least privilege. Enable GuardDuty itself later and findings will appear too.

My recommendation: first start small in standalone mode, receive scan results with EventBridge, and build the plumbing. The company-wide rollout of GuardDuty itself is a big decision in its own right, so don't take "upload quarantine," a single requirement, hostage to it. Even if you later promote to (a), the plumbing built around EventBridge can be reused as-is (the finding route is just added).


3. The enablement components: the Malware Protection plan, IAM role, prefixes, limits

3.1 What you can and can't do (fix the constraints first)

  • Only your own account's buckets. Even a delegated GuardDuty administrator can't enable it on a member account's bucket (it's closed within the same account).
  • Same Region only. A cross-Region bucket is out of scope.
  • A per-bucket "Malware Protection plan" resource is created, with a unique plan ID. GuardDuty auto-creates and manages an EventBridge managed rule named DO-NOT-DELETE-AmazonGuardDutyMalwareProtectionS3* (don't delete it by hand).
  • You can scope by prefix. Rather than the whole bucket, you can target only specific object prefixes (up to 5) for scanning. Effective for a design of "the upload receiver is only under uploads/."
  • Supports KMS-encrypted buckets (decrypted inside the scan environment). But objects with SSE-C (customer-provided keys) can't be scanned (the later ACCESS_DENIED reason SSE_C_ENCRYPTED_OBJECT).
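
The prefix constraint above can be sketched as a small helper you might run in tests or at deploy time. This is only an illustration: `validate_prefixes` and `in_scan_scope` are hypothetical names, and the sketch assumes scoping behaves as a plain string-prefix match on the object key.

```python
from __future__ import annotations

MAX_PREFIXES = 5  # per-plan prefix limit quoted in this section


def validate_prefixes(prefixes: list[str]) -> list[str]:
    """Reject prefix lists a plan could not hold (more than five, or empty strings)."""
    if len(prefixes) > MAX_PREFIXES:
        raise ValueError(f"at most {MAX_PREFIXES} prefixes per plan, got {len(prefixes)}")
    if any(p == "" for p in prefixes):
        raise ValueError("an empty prefix would match every key; omit prefixes instead")
    return prefixes


def in_scan_scope(key: str, prefixes: list[str]) -> bool:
    """True when automatic scanning would cover this key (no prefixes = whole bucket)."""
    return not prefixes or any(key.startswith(p) for p in prefixes)
```

A check like this is useful mainly for catching uploads that land outside `uploads/` and would therefore never be scanned automatically.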

3.2 The IAM role: least privilege making GuardDuty act "on your behalf"

Malware Protection for S3 requires an IAM role that GuardDuty assumes to run scans in your account. The permissions this role needs fall into roughly three categories:

  1. Receive notification of new uploads (via the EventBridge managed rule)
  2. Read and decrypt the target object (s3:GetObject + kms:Decrypt if needed)
  3. (Optional) tag after the scan (s3:PutObjectTagging)

The role's trust policy lets the GuardDuty Malware Protection service principal sts:AssumeRole. It's safe to follow the official IAM policy template, and the recommended operation is to add target bucket names to the same role when adding buckets. Per the principle of least privilege, narrow the Resource to "only this bucket, this prefix" (the same shape as the least-privilege thinking at the data layer).
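
As a rough illustration of narrowing Resource to one bucket and prefix, the sketch below builds only the read, decrypt, and tag statements named in this section. It is not the official template: the statements for the EventBridge managed rule and the trust policy are deliberately left out, and the function name and Sids are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def scan_role_statements(
    bucket: str, prefix: str = "", kms_key_arn: str | None = None, tagging: bool = True
) -> list[dict[str, Any]]:
    """Build read / decrypt / tag statements scoped to one bucket and prefix."""
    objects_arn = f"arn:aws:s3:::{bucket}/{prefix}*"
    statements: list[dict[str, Any]] = [
        {"Sid": "ReadObjectsForScan", "Effect": "Allow",
         "Action": ["s3:GetObject"], "Resource": objects_arn},
    ]
    if kms_key_arn:  # only needed when the bucket uses a KMS key
        statements.append({"Sid": "DecryptForScan", "Effect": "Allow",
                           "Action": ["kms:Decrypt"], "Resource": kms_key_arn})
    if tagging:  # optional: post-scan tagging
        statements.append({"Sid": "TagAfterScan", "Effect": "Allow",
                           "Action": ["s3:PutObjectTagging"], "Resource": objects_arn})
    return statements


if __name__ == "__main__":
    policy = {"Version": "2012-10-17",
              "Statement": scan_role_statements("my-upload-landing", "uploads/")}
    print(json.dumps(policy, indent=2))
```

Treat the output as a starting point to diff against the official template, not as a replacement for it.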

3.3 Limits (the exact official values, verify the latest)

| Limit | Default value | Adjustable | Note |
|---|---|---|---|
| Max S3 object size | 100 GB | No | If you need a larger target, consult AWS Support |
| Extracted file count | 100,000 | No | The max number of files expandable/analyzable in an archive |
| Max nesting depth | 100 | No | The max levels of archive nesting |
| Max protected buckets | 25 | No | Per account × Region |

Exceed these and the scan is skipped, and the result becomes UNSUPPORTED (example reasons: OBJECT_SIZE_LIMIT_EXCEEDED / EXTRACTED_FILE_LIMIT_EXCEEDED / EXTRACTED_LEVEL_LIMIT_EXCEEDED / EXTRACTION_RATIO_LIMIT_EXCEEDED). Because "couldn't scan" does not mean "safe," always handle this case explicitly in the later plumbing (Section 5).

3.4 On-demand scanning: for existing objects / re-scanning

Automatic scanning runs against new uploads, but for objects that existed before you enabled protection or to re-scan something already scanned, use on-demand scanning.

# Scan an existing object (latest version) on demand.
# Preconditions: Malware Protection for S3 is enabled on the target bucket, and the
#                caller has the AWS managed policy AmazonGuardDutyFullAccess_v2.
aws guardduty send-object-malware-scan \
  --s3-object '{"Bucket": "my-upload-landing", "Key": "uploads/legacy-file.pdf"}'

# To scan a specific version, pass VersionId.
aws guardduty send-object-malware-scan \
  --s3-object '{"Bucket": "my-upload-landing", "Key": "uploads/legacy-file.pdf", "VersionId": "d41d8cd9...EXAMPLE"}'

Cautions: on-demand scanning ignores the plan's prefix setting (you can target objects outside the configured prefixes), and the same limits and pricing apply as for automatic scanning. Importantly, on-demand scanning is not covered by the free tier. A success response means only that the request was accepted, not that the scan completed, so always confirm the result through EventBridge, tags, or CloudWatch.
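
The same request can be sketched in Python. This assumes your boto3 version exposes the SendObjectMalwareScan operation as `send_object_malware_scan(S3Object=...)`, mirroring the CLI call above; verify that against your installed SDK. The helper name `request_on_demand_scan` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def request_on_demand_scan(
    guardduty: Any, bucket: str, key: str, version_id: str | None = None
) -> dict[str, str]:
    """Ask GuardDuty to scan one object; returns the S3Object that was sent.

    `guardduty` is expected to be boto3.client("guardduty"). A successful call
    only means the request was accepted, not that the scan has finished.
    """
    s3_object: dict[str, str] = {"Bucket": bucket, "Key": key}
    if version_id:
        s3_object["VersionId"] = version_id
    # Assumption: method and parameter names follow boto3's usual mapping of
    # the SendObjectMalwareScan operation.
    guardduty.send_object_malware_scan(S3Object=s3_object)
    return s3_object


# Usage (requires AWS credentials):
#   import boto3
#   request_on_demand_scan(boto3.client("guardduty"),
#                          "my-upload-landing", "uploads/legacy-file.pdf")
```

Passing the client in as an argument keeps the helper testable with a stub and avoids creating a client at import time.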


4. Reading the scan result: tags, status values, EventBridge events

To build automated processing, you need to accurately read the structure of the result. There are 3 exits — object tags, EventBridge events, and CloudWatch metrics.

4.1 Scan-result tags (optional; must be enabled before upload)

Enable tagging and after the scan GuardDuty attaches a predefined tag to the object. The key and value are fixed officially:

Key:    GuardDutyMalwareScanStatus
Value:  NO_THREATS_FOUND | THREATS_FOUND | UNSUPPORTED | ACCESS_DENIED | FAILED

| Result value | Meaning | Scan status |
|---|---|---|
| NO_THREATS_FOUND | No threats detected | Completed |
| THREATS_FOUND | A threat detected | Completed |
| UNSUPPORTED | Unscannable (password-protected, size/compression-ratio exceeded, unsupported S3 feature, etc.) | Skipped |
| ACCESS_DENIED | Can't access the object (IAM role permissions, SSE-C, etc.) | Skipped |
| FAILED | Couldn't scan due to an internal error | Failed |

A fatal pitfall: tagging must be enabled before the object is uploaded. Enabling it later does not tag objects that were already scanned. So the iron rule is the order "create the bucket → enable protection and tagging → then start accepting uploads." Also, an object can carry at most 10 tags; if all slots are full, GuardDuty can't add its tag, and a "post-scan tag failure" event appears on EventBridge instead.
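
To make the tag contract concrete, here is a minimal sketch that reads the scan status out of a `TagSet` list shaped like the one `s3.get_object_tagging` returns, and checks whether an object still has a free tag slot. The helper names are hypothetical.

```python
from __future__ import annotations

SCAN_TAG_KEY = "GuardDutyMalwareScanStatus"
MAX_OBJECT_TAGS = 10  # S3 per-object tag limit mentioned above


def scan_status_from_tags(tag_set: list[dict[str, str]]) -> str | None:
    """Return the scan status tag value, or None if the object is not tagged yet."""
    for tag in tag_set:
        if tag.get("Key") == SCAN_TAG_KEY:
            return tag.get("Value")
    return None


def has_room_for_scan_tag(tag_set: list[dict[str, str]]) -> bool:
    """True if GuardDuty could still write its tag (an existing scan tag is overwritten)."""
    return scan_status_from_tags(tag_set) is not None or len(tag_set) < MAX_OBJECT_TAGS
```

A result of `None` means "not tagged yet" and should be treated the same as "not clean", never as a pass.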

4.2 The EventBridge scan-result event (the lead of automated processing)

GuardDuty always publishes the scan result to the default EventBridge event bus (both standalone and with GuardDuty). This is the entrance to automated processing. The detail-type is GuardDuty Malware Protection Object Scan Result, and the source is aws.guardduty.

The NO_THREATS_FOUND event (official schema, excerpt):

{
  "detail-type": "GuardDuty Malware Protection Object Scan Result",
  "source": "aws.guardduty",
  "account": "111122223333",
  "region": "us-east-1",
  "resources": ["arn:aws:guardduty:us-east-1:111122223333:malware-protection-plan/b4c7f464ab3a4EXAMPLE"],
  "detail": {
    "schemaVersion": "1.0",
    "scanStatus": "COMPLETED",
    "resourceType": "S3_OBJECT",
    "s3ObjectDetails": {
      "bucketName": "amzn-s3-demo-bucket",
      "objectKey": "uploads/report.pdf",
      "eTag": "ASIAI44QH8DHBEXAMPLE",
      "versionId": "d41d8cd98f00b204e9800998eEXAMPLE",
      "s3Throttled": false
    },
    "scanResultDetails": {
      "scanResultStatus": "NO_THREATS_FOUND",
      "threats": null,
      "statusReasons": null
    }
  }
}

For THREATS_FOUND, scanResultDetails.threats holds the detection name (by default it reports the first detected one, and scanStatus is COMPLETED):

{
  "detail": {
    "scanStatus": "COMPLETED",
    "s3ObjectDetails": { "bucketName": "amzn-s3-demo-bucket", "objectKey": "uploads/evil.bin", "versionId": "..." },
    "scanResultDetails": {
      "scanResultStatus": "THREATS_FOUND",
      "threats": [ { "name": "EICAR-Test-File (not a virus)" } ],
      "statusReasons": null
    }
  }
}

When the scan was skipped, it's scanStatus: "SKIPPED", with scanResultStatus being UNSUPPORTED or ACCESS_DENIED, and further statusReasons holding the concrete reason (PASSWORD_PROTECTED, SSE_C_ENCRYPTED_OBJECT, OBJECT_SIZE_LIMIT_EXCEEDED, etc.).

At-least-once (must read): the official docs state clearly — "GuardDuty uses at-least-once delivery, which means you might receive multiple scan results for the same object. We recommend designing your applications to handle duplicate results." That is, the scan result of the same object can arrive multiple times. With the same thinking as preventing double charges on a payment platform, the result handler must be idempotent (Section 5's Lambda builds that out). Note that billing is once per object even with duplicates.
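
One way to picture duplicate handling is to derive a stable key from each event and act only the first time that key is seen. The sketch below is an illustration under assumptions, not the approach used by the Lambda in Section 5 (which records the source version on the destination object). `idempotency_key` and `SeenOnce` are hypothetical names, and the in-memory set stands in for a durable store that a real Lambda would need.

```python
from __future__ import annotations

import hashlib
from typing import Any


def idempotency_key(event: dict[str, Any]) -> str:
    """Derive a stable key from the fields that identify one scan result."""
    detail = event["detail"]
    obj = detail["s3ObjectDetails"]
    result = (detail.get("scanResultDetails") or {}).get("scanResultStatus") or ""
    parts = [obj["bucketName"], obj["objectKey"], obj.get("versionId") or "",
             obj.get("eTag") or "", result]
    # The unit separator keeps ("a", "bc") and ("ab", "c") from colliding.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


class SeenOnce:
    """In-memory stand-in for a durable dedupe store (illustration only)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, key: str) -> bool:
        """Return True only on the first sighting of this key."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Including the result status in the key means a later re-scan that returns a different result is processed as a new event rather than dropped as a duplicate.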

4.3 A note on the status model: status (scanStatus) and result (scanResultStatus) are different

Let me make this explicit because the two are easy to confuse. scanStatus is the status of the scan itself (COMPLETED / SKIPPED / FAILED), and scanResultStatus is the result (the five values above). When scanStatus is COMPLETED, the result is either THREATS_FOUND or NO_THREATS_FOUND. SKIPPED, FAILED, UNSUPPORTED, and ACCESS_DENIED do not mean "safe"; they mean the object couldn't be scanned. The iron rule of the design is to explicitly allow (allowlist) only NO_THREATS_FOUND and send everything else to quarantine. This becomes the core of the next section's plumbing.
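
The allowlist rule can be written as a tiny pure function. This is a sketch; the name `disposition` and the two return labels are hypothetical. It promotes only when the scan completed and the result is exactly NO_THREATS_FOUND, and everything else, including missing or unknown values, falls through to quarantine.

```python
from __future__ import annotations

from typing import Any

PROMOTE = "promote"
QUARANTINE = "quarantine"


def disposition(detail: dict[str, Any]) -> str:
    """Allowlist decision over the event's `detail` object (fail-closed)."""
    result = (detail.get("scanResultDetails") or {}).get("scanResultStatus")
    if detail.get("scanStatus") == "COMPLETED" and result == "NO_THREATS_FOUND":
        return PROMOTE
    # SKIPPED / FAILED / UNSUPPORTED / ACCESS_DENIED / unknown / missing all land here.
    return QUARANTINE
```

Writing the rule as "promote if, otherwise quarantine" rather than a chain of per-status branches is what keeps a status value added in the future from being promoted by accident.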


5. A high-value application: a secure upload pipeline (landing → clean / quarantine)

This is the climax of this article. Scanning alone isn't safe. "Keep an object that might be contaminated in a state downstream absolutely can't read, and flow only those confirmed clean" — let's build that plumbing.

5.1 The big picture: 3 buckets + event-driven promotion

User
  │  upload (presigned URL, etc.)
  ▼
landing bucket (Malware Protection for S3 enabled, tagging ON)
  │  - downstream can't read (TBAC: DENY GetObject on objects without the clean tag)
  │  - GuardDuty scans automatically
  ▼
EventBridge (detail-type = "GuardDuty Malware Protection Object Scan Result")
  ▼
Scan-result Lambda (idempotent)
  ├─ NO_THREATS_FOUND → "promote" (copy) to the clean bucket; downstream reads only here
  ├─ THREATS_FOUND    → isolate in the quarantine bucket + security notification (to humans)
  └─ anything else (UNSUPPORTED/ACCESS_DENIED/FAILED/SKIPPED) → quarantine + needs-investigation notification
                                                          ("couldn't scan ≠ safe")
Downstream consumers
  └─ read only the clean bucket (landing is unreadable because of TBAC)

The crux of the design is a 2-stage defense:

  1. EventBridge-driven promotion — move only clean ones to clean (downstream sees only clean).
  2. TBAC (tag-based access control) — even if someone tries to read landing, DENY the GetObject of objects without the NO_THREATS_FOUND tag with the S3 bucket policy. Because it's sealed at the storage-layer boundary, not app logic, a code bug can't break it (the same idea as making least privilege effective at the data layer).

5.2 The TBAC bucket policy: don't let it be read without the clean tag

A policy for the landing bucket that enforces "can't be read unless NO_THREATS_FOUND," following the official template. Replace the account ID (555555555555), IAM-role-name, and amzn-s3-demo-bucket with your own values.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "NoReadUnlessClean",
      "Effect": "Deny",
      "NotPrincipal": {
        "AWS": [
          "arn:aws:sts::555555555555:assumed-role/IAM-role-name/GuardDutyMalwareProtection",
          "arn:aws:iam::555555555555:role/IAM-role-name"
        ]
      },
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:ExistingObjectTag/GuardDutyMalwareScanStatus": "NO_THREATS_FOUND"
        }
      }
    },
    {
      "Sid": "OnlyGuardDutyCanTagScanStatus",
      "Effect": "Deny",
      "NotPrincipal": {
        "AWS": [
          "arn:aws:sts::555555555555:assumed-role/IAM-role-name/GuardDutyMalwareProtection",
          "arn:aws:iam::555555555555:role/IAM-role-name"
        ]
      },
      "Action": "s3:PutObjectTagging",
      "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "s3:RequestObjectTagKeys": "GuardDutyMalwareScanStatus"
        }
      }
    }
  ]
}

Reading this policy:

  • NoReadUnlessClean: DENY the read of objects whose s3:ExistingObjectTag/GuardDutyMalwareScanStatus is not NO_THREATS_FOUND. An object that isn't tagged yet (= scan not complete) is naturally not NO_THREATS_FOUND either, so it can't be read. It guarantees, at the storage layer, "no one can read until the scan finishes and it's confirmed clean."
  • NotPrincipal exempts only the GuardDuty role: the scan-execution role, and the assumed-role session .../GuardDutyMalwareProtection that GuardDuty uses, are excluded from both Deny statements so that GuardDuty can still read and tag objects.
  • OnlyGuardDutyCanTagScanStatus: a DENY making only GuardDuty able to attach the GuardDutyMalwareScanStatus tag. Without this, someone could manually attach NO_THREATS_FOUND to a contaminated object and slip past the gate.
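
The read gate can be modeled in a few lines to show why an untagged object is unreadable. This is a simplified model of the NoReadUnlessClean statement only, not an IAM policy simulator; the function name and the example ARNs are hypothetical.

```python
from __future__ import annotations

SCAN_TAG_KEY = "GuardDutyMalwareScanStatus"


def read_denied_by_tbac(
    principal_arn: str, object_tags: dict[str, str], exempt_principals: set[str]
) -> bool:
    """Model of the Deny: True when the statement would block a GetObject."""
    if principal_arn in exempt_principals:
        return False  # listed in NotPrincipal, so the statement does not apply
    # StringNotEquals also matches when the tag is absent, so an object that has
    # not been scanned (and tagged) yet is denied just like a contaminated one.
    return object_tags.get(SCAN_TAG_KEY) != "NO_THREATS_FOUND"
```

Note that "not denied" is not the same as "allowed": a principal that passes this gate still needs an ordinary Allow for `s3:GetObject` from its own policy.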

Additional defense in an organization: if you use AWS Organizations, enforce company-wide with an SCP that "the GuardDutyMalwareScanStatus tag can't be tampered with" (the official docs guide you to use the EC2 example replaced with s3). If a tag becomes the basis of trust, preventing tag tampering is the prerequisite. Skip this and TBAC becomes "an unlocked safe."

5.3 The scan-result Lambda (Python, idempotent)

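A Lambda that receives the EventBridge scan result, promotes only NO_THREATS_FOUND objects to clean, and isolates everything else in quarantine. It is idempotent because delivery is at-least-once.

"""Lambda that responds to GuardDuty Malware Protection for S3 scan results.

Design principles:
  - Idempotent: EventBridge delivery is at-least-once. Receiving the result for the
    same object twice still produces the side effect once.
  - Allowlist, fail to the safe side: promote only on 'NO_THREATS_FOUND'; quarantine
    everything else. (UNSUPPORTED/ACCESS_DENIED/FAILED/SKIPPED mean "could not scan",
    not "safe".)
  - Reversible: promote/quarantine by copying without deleting from landing, so the
    original survives a misjudgment.
  - Observable: structured logs. Threat names go into the notification, but object
    contents are never read or emitted.
"""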
from __future__ import annotations

import json
import logging
import os
from typing import Any, Final
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client("s3")
sns = boto3.client("sns")

CLEAN_BUCKET: Final[str] = os.environ["CLEAN_BUCKET"]
QUARANTINE_BUCKET: Final[str] = os.environ["QUARANTINE_BUCKET"]
ALERT_TOPIC_ARN: Final[str] = os.environ["ALERT_TOPIC_ARN"]

# The only result value that allows promotion. Everything else is quarantined (fail-closed).
CLEAN_STATUS: Final[str] = "NO_THREATS_FOUND"


def handler(event: dict[str, Any], _context: object) -> dict[str, str]:
    detail = event["detail"]
    obj = detail["s3ObjectDetails"]
    src_bucket: str = obj["bucketName"]
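    # Keys in S3-related events can arrive URL-encoded, so decode them.
    # Assumption to verify against a real event: if objectKey arrives unencoded,
    # drop this decode, because it would corrupt keys that contain '+' or '%'.
    key: str = unquote_plus(obj["objectKey"])
    version_id: str | None = obj.get("versionId")
    # `or {}` also covers an explicit null, which .get(..., {}) alone would not.
    scan_details: dict[str, Any] = detail.get("scanResultDetails") or {}
    result: str = scan_details.get("scanResultStatus") or "FAILED"
    threats = scan_details.get("threats")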

    log = {"bucket": src_bucket, "key": key, "version": version_id, "result": result}

    if result == CLEAN_STATUS:
        dest = CLEAN_BUCKET
        disposition = "promoted"
    else:
        dest = QUARANTINE_BUCKET
        disposition = "quarantined"

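    # -- Promote / quarantine (idempotent) --
    # The destination key mirrors the source key, so a redelivered event targets the
    # same destination; _idempotent_copy then skips a version it has already copied.
    # Running this any number of times gives the same result (at-least-once safe).
    dest_key = key
    moved = _idempotent_copy(src_bucket, key, version_id, dest, dest_key)

    # Notify a human about threats and unscannable objects (fail-closed check + triage).
    # A redelivered event alerts again; a duplicate alert is preferred to a lost one.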
    if result != CLEAN_STATUS:
        _alert(src_bucket, key, result, threats, disposition)

    logger.info(json.dumps({**log, "disposition": disposition, "copied": moved}))
    return {"disposition": disposition, "result": result}


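def _idempotent_copy(
    src_bucket: str, key: str, version_id: str | None, dest_bucket: str, dest_key: str
) -> bool:
    """Copy from landing to dest (idempotent). No-op if this version was already copied.

    Idempotency key: the source versionId is written to the destination as metadata,
    and a re-run that finds a match is skipped. This assumes versioning is enabled on
    landing: without a versionId, a re-upload to the same key looks like a duplicate.
    The landing original is never deleted (reversible after a misjudgment).
    """
    # Check whether it was already copied (same source version: skip the second run).
    # HeadObject needs s3:GetObject on dest, plus s3:ListBucket so that a missing
    # key returns 404 instead of 403.
    try:
        head = s3.head_object(Bucket=dest_bucket, Key=dest_key)
        if head.get("Metadata", {}).get("source-version-id") == (version_id or ""):
            return False  # this version was already processed: no-op
    except ClientError as exc:
        if exc.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise  # never swallow an unexpected error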

    copy_source: dict[str, str] = {"Bucket": src_bucket, "Key": key}
    if version_id:
        copy_source["VersionId"] = version_id

    s3.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource=copy_source,
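        # Keep the source version as the idempotency key. MetadataDirective=REPLACE makes
        # S3 store this metadata instead of copying the source's; the source's other
        # metadata (including Content-Type) is not carried over unless passed here.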
        Metadata={"source-version-id": version_id or ""},
        MetadataDirective="REPLACE",
    )
    return True


def _alert(
    bucket: str, key: str, result: str, threats: list[dict[str, str]] | None, disposition: str
) -> None:
    """Notify the security team of threats and unscannable objects. Never read or include contents."""
    threat_names = ", ".join(t.get("name", "?") for t in (threats or [])) or "n/a"
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        # SNS subjects must be ASCII and under 100 characters. The uploader controls
        # the key, so it stays in the message body and out of the subject.
        Subject=f"[S3 Malware][{result}] {bucket}"[:99],
        Message="\n".join(
            [
                f"bucket: {bucket}",
                f"key: {key}",
                f"scanResultStatus: {result}",
                f"threats: {threat_names}",
                f"disposition: {disposition}",
                "note: anything other than 'NO_THREATS_FOUND' is treated as unsafe and has been quarantined. Triage required.",
            ]
        ),
    )

Let me make explicit the design decisions of this code.

  • fail-closed (fall to the safe side): what gets promoted is the single one of NO_THREATS_FOUND. UNSUPPORTED, ACCESS_DENIED, FAILED, and even if an unknown value comes, all fall to quarantine. "Treat what couldn't be judged as dangerous" — this is the default of security.
  • Idempotent: leave the original versionId as metadata on the destination, and detect with head_object on the second time onward to no-op. Even if the same result comes multiple times with at-least-once, the copy is one time's worth. This is the same shape as making the same event received twice billed once on the payment platform.
  • Reversible: promote / isolate by copying without deleting the landing original. Even if a misjudgment (false positive) is found later, the original remains, so you can revert.
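  • Least privilege: this Lambda's execution role is limited to s3:GetObject* on landing, s3:PutObject* on clean/quarantine, s3:GetObject on clean/quarantine for the head_object check (plus s3:ListBucket on those buckets so that a missing key returns 404 rather than 403), and sns:Publish, with Resource narrowed to each bucket ARN. Because the copy spans buckets, also make sure no bucket policy blocks this role. On landing that takes more than an Allow: NoReadUnlessClean is an explicit Deny, so the role has to be exempted from it (for example by adding it to NotPrincipal), or copying a non-clean object to quarantine fails with AccessDenied.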
  • Don't handle contents: the Lambda doesn't read the object's bytes (the copy is S3 server-side copy_object). It puts the threat name in the notification, but doesn't emit the file contents to logs or the notification (reconciling observability and security).

Wiring the EventBridge rule and the Lambda: create a rule that picks up detail-type = "GuardDuty Malware Protection Object Scan Result", put the Lambda as the target, and attach a retry + DLQ (the same pattern as the retry_policy + dead_letter_config of the automated-response article). A non-empty DLQ = there's an object whose result couldn't be processed = a dangerous silence, so always make it an alert target.
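
The rule's event pattern can be sketched as a small builder. This is a minimal illustration; `scan_result_event_pattern` is a hypothetical name. It deliberately does not filter on scanResultStatus, so that unknown or future status values still reach the fail-closed Lambda instead of being dropped at the rule.

```python
from __future__ import annotations

import json
from typing import Any

DETAIL_TYPE = "GuardDuty Malware Protection Object Scan Result"


def scan_result_event_pattern(landing_bucket: str | None = None) -> dict[str, Any]:
    """Build an EventBridge event pattern for scan-result events.

    Optionally narrows the match to one landing bucket.
    """
    pattern: dict[str, Any] = {
        "source": ["aws.guardduty"],
        "detail-type": [DETAIL_TYPE],
    }
    if landing_bucket:
        pattern["detail"] = {"s3ObjectDetails": {"bucketName": [landing_bucket]}}
    return pattern


if __name__ == "__main__":
    print(json.dumps(scan_result_event_pattern("my-upload-landing"), indent=2))
```

The JSON string from `json.dumps(...)` is what you would pass as `EventPattern` when creating the rule, for example with boto3's `events.put_rule` or the equivalent Terraform argument.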

5.4 Terraform: attach the Malware Protection plan to the landing bucket

With the aws_guardduty_malware_protection_plan resource, protect only the uploads/ prefix of the landing bucket and enable result tagging.

# Enable Malware Protection for S3 on the landing bucket.
# role = ARN of the IAM role that GuardDuty assumes to scan and tag.
resource "aws_guardduty_malware_protection_plan" "landing" {
  role = aws_iam_role.gd_malware_s3.arn

  protected_resource {
    s3_bucket {
      bucket_name = aws_s3_bucket.landing.id
      # Limit scanning to the upload entry point (up to 5 prefixes).
      # Narrowing the entry point avoids scan charges for unrelated objects.
      object_prefixes = ["uploads/"]
    }
  }

  # Write the scan result to the object tag (GuardDutyMalwareScanStatus).
  # The TBAC policy in 5.2 depends on this tag, so ENABLED is required.
  actions {
    tagging {
      status = "ENABLED"
    }
  }

  tags = { ManagedBy = "terraform", Purpose = "upload-malware-scan" }
}

A note on standalone operation: this aws_guardduty_malware_protection_plan can be created even without an aws_guardduty_detector; that is the Terraform expression of standalone mode. To run it with GuardDuty, enable aws_guardduty_detector separately (and the MALWARE_PROTECTION-family feature if you like), and detections will also appear as findings. For the trust policy and permissions of the IAM role passed to role (s3:GetObject, kms:Decrypt, s3:PutObjectTagging, plus the permissions for the EventBridge managed rule), minimize them according to the official template as described in 3.2.


6. Cost: usage-based, the free tier, standalone vs. with GuardDuty

6.1 The billing model (verify the latest officially)

Malware Protection for S3's pricing is usage-based, different from other protection plans. Grasp the concept and you can read the budget.

| Billing target | Billing unit | In the free tier? |
|---|---|---|
| Scanned data volume | Per GB | Up to 1 GB per month free |
| Objects evaluated | Per request (object) | Up to 1,000 requests per month free |
| S3 object tagging | S3's tagging cost | Not in the free tier |
| The S3 APIs GuardDuty hits (GET/PUT etc.) | S3's API cost | (S3-side normal billing) |

The official free tier is "per account, per Region, up to 1,000 requests + 1 GB of data scanned per month free." Usage-based billing starts from the portion exceeding this. Note that on-demand scanning and tagging are not in the free tier.
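
The free-tier arithmetic can be sketched as follows. The unit prices are parameters with no defaults on purpose, since this article does not assert them; the function names are hypothetical, and the sketch assumes automatic scans only (on-demand scans and tagging sit outside the free tier and are not modeled).

```python
from __future__ import annotations


def billable_usage(
    objects: int, scanned_gb: float, free_objects: int = 1000, free_gb: float = 1.0
) -> tuple[int, float]:
    """Monthly usage above the free tier, per account and Region: (objects, GB)."""
    return max(0, objects - free_objects), max(0.0, scanned_gb - free_gb)


def estimate_monthly_cost(
    objects: int, scanned_gb: float, usd_per_gb: float, usd_per_1000_objects: float
) -> float:
    """Estimate scan charges; look up both unit prices on the official pricing page."""
    extra_objects, extra_gb = billable_usage(objects, scanned_gb)
    return extra_gb * usd_per_gb + (extra_objects / 1000) * usd_per_1000_objects
```

For example, 1,500 objects totalling 3.5 GB in a month leaves 500 objects and 2.5 GB billable once the free tier is subtracted.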

Always confirm monetary figures against the latest official values: in this article I deliberately don't state specific unit prices (USD per GB, USD per 1,000 objects, and so on). Pricing differs by Region and gets revised, so estimate from the current value for your Region on the GuardDuty pricing page. Treat any figures you see quoted for US East (N. Virginia) as a reference only and verify them. Cost optimization is covered separately in the GuardDuty cost-optimization article.

6.2 The cost implications of standalone vs. with GuardDuty

  • Standalone mode: you pay only Malware Protection for S3's usage-based billing. No cost of GuardDuty itself (foundational detection, other protection plans) is incurred. You can answer the requirement of "just upload quarantine" at minimal cost.
  • With GuardDuty: on top of Malware Protection for S3's usage-based billing, the cost of GuardDuty itself and other enabled plans rides on. But you gain the value of malware detection riding onto existing incident response / correlation as a finding.

The crux of cost design: scoping by prefix (just uploads/) so that the scan target is limited to the location where users actually upload directly minimizes billing. If you carelessly protect the whole bucket, it also scans temporary objects generated by internal processing, which can inflate the object-evaluation count. Here too, keeping billing proportional to the assets you actually need to protect is a design task.
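To see how much a prefix scope would trim before you apply it, you can replay a bucket listing against the candidate prefixes. The following is a minimal sketch under stated assumptions: the function names are hypothetical, prefix matching is modelled as a plain string starts-with (the usual S3 prefix semantics), and an empty prefix list is taken to mean the whole bucket is protected. The limit of 5 prefixes is the figure quoted in this article.

```python
MAX_PREFIXES = 5  # per-plan prefix limit quoted in this article


def in_scan_scope(key: str, prefixes: list[str]) -> bool:
    """Return True if an object key would fall under the protected prefixes.

    Assumption: an empty prefix list means the whole bucket is in scope.
    """
    if len(prefixes) > MAX_PREFIXES:
        raise ValueError(f"at most {MAX_PREFIXES} prefixes are allowed")
    return not prefixes or any(key.startswith(p) for p in prefixes)


def count_in_scope(keys: list[str], prefixes: list[str]) -> int:
    """Count how many of the given keys would be evaluated by the plan."""
    return sum(1 for key in keys if in_scan_scope(key, prefixes))


# Example listing: two user uploads plus objects created by internal processing.
sample_keys = [
    "uploads/a.pdf",
    "uploads/b.png",
    "tmp/thumb-1.png",
    "tmp/thumb-2.png",
    "exports/report.csv",
]
whole_bucket = count_in_scope(sample_keys, [])        # every key is evaluated
uploads_only = count_in_scope(sample_keys, ["uploads/"])  # only user uploads
```

In this toy listing, protecting the whole bucket evaluates all five objects, while scoping to uploads/ evaluates only the two user uploads. Feeding the two counts into a cost estimate makes the saving concrete.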


7. Summary: a Malware Protection for S3 production cheat sheet

A quick-reference list for when you're unsure.

  • Don't mix up the two features: S3 Protection = detect suspicious access / exfiltration from CloudTrail S3 data events (GuardDuty required, feature S3_DATA_EVENTS). Malware Protection for S3 = malware-scan the contents of newly uploaded objects. "Who did what" vs. "what came in." Both are also distinct from Malware Protection for EC2 (EBS volume scanning).
  • Standalone-operable: you can enable just Malware Protection for S3 without GuardDuty itself. But no detector ID = no GuardDuty finding generated on malware detection. The result is only the EventBridge default bus + CloudWatch + (optional) tags. A "finding-premised" alert doesn't ring in standalone mode.
  • Mechanism: one Malware Protection plan per bucket. Scans run automatically on upload (PutObject, etc.). Only buckets in your own account and the same Region are eligible; even a delegated administrator can't protect a member account's bucket. Scope with up to 5 prefixes; KMS-encrypted objects are supported (SSE-C is not). Scanning existing objects or re-scanning uses SendObjectMalwareScan (on-demand, not in the free tier).
  • Limits (verify the latest): max object size 100 GB, 100,000 extracted files, nesting depth 100, 25 protected buckets per account per Region. Objects that exceed these limits come back as UNSUPPORTED.
  • Result: the tag GuardDutyMalwareScanStatus = NO_THREATS_FOUND / THREATS_FOUND / UNSUPPORTED / ACCESS_DENIED / FAILED. Tagging must be enabled before upload. EventBridge is detail-type="GuardDuty Malware Protection Object Scan Result", at-least-once → idempotency required. scanStatus (status) and scanResultStatus (result) are different things.
  • The safe pipeline: landing (protection + tags) → EventBridge → an idempotent Lambda promotes only NO_THREATS_FOUND to clean, and quarantines everything else (fail-closed). Downstream reads only clean. Use a TBAC bucket policy to "DENY the GetObject of objects without the clean tag," and forbid tag tampering by anyone but GuardDuty (+ an SCP if in an organization).
  • Cost (verify the latest): usage-based on scanned GB + objects evaluated. 1,000 requests + 1 GB per month are free (on-demand scanning and tagging are not included). Scoping by prefix is itself the saving. Standalone mode incurs no cost for GuardDuty itself.
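The "Result" and "safe pipeline" items above boil down to one decision function: promote only a completed scan with NO_THREATS_FOUND, quarantine everything else, and do nothing on a redelivered event. Here is a minimal sketch, assuming the event carries detail.scanStatus, detail.scanResultDetails.scanResultStatus, and detail.s3ObjectDetails (bucketName, objectKey, versionId, eTag); verify those field names and the COMPLETED value against the official event schema. The function names are hypothetical, and the in-memory set stands in for a durable store with a conditional write.

```python
from typing import Literal

Action = Literal["promote", "quarantine", "skip_duplicate"]
DETAIL_TYPE = "GuardDuty Malware Protection Object Scan Result"


def dedupe_key(event: dict) -> tuple:
    """Identify the scanned object version (assumed field names, see lead-in)."""
    obj = event["detail"]["s3ObjectDetails"]
    version = obj.get("versionId") or obj.get("eTag") or ""
    return (obj["bucketName"], obj["objectKey"], version)


def decide(event: dict, seen: set) -> Action:
    """Fail-closed routing: anything that is not provably clean is quarantined."""
    if event.get("detail-type") != DETAIL_TYPE:
        raise ValueError("not a scan-result event")
    key = dedupe_key(event)
    if key in seen:
        # At-least-once delivery: the same result may arrive twice.
        return "skip_duplicate"
    # Simplification: a real handler would record the key atomically in a
    # durable store, and a deliberate re-scan would need a different key.
    seen.add(key)
    detail = event["detail"]
    result = (detail.get("scanResultDetails") or {}).get("scanResultStatus")
    is_clean = detail.get("scanStatus") == "COMPLETED" and result == "NO_THREATS_FOUND"
    return "promote" if is_clean else "quarantine"
```

The design choice worth noting is that the clean path is the only explicit branch. THREATS_FOUND, UNSUPPORTED, ACCESS_DENIED, FAILED, and any missing or unknown value all fall through to quarantine, so a schema surprise can never promote an object by accident.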

Malware Protection for S3 isn't "put files in the bucket and they're safe on their own." Its value is decided by whether you can turn the scan result (especially anything other than NO_THREATS_FOUND) into plumbing that's idempotent, reversible, and fail-closed. The greatest leverage lies less in the detection itself than in the design of the boundary (EventBridge + TBAC) that physically cuts contamination off from downstream and lets only clean objects through.

On a multi-account serverless payment platform, I implemented IAM, observability, and DR across a platform handling actual money, carbon credits, and regional currencies, and ensured "correctness" through the structure of code and idempotency rather than operational vigilance. The idea of making the same event, received twice, produce its side effect only once in an at-least-once world carries over directly to processing scan results. I am not claiming that I have operated Malware Protection for S3 in a specific client project. But drawing on that real experience, I can design, implement, and deliver this "secure upload pipeline" (standalone operation, TBAC gating, idempotent promotion / isolation).

"How do I build a malware-scanning quarantine into my company's upload feature: start standalone or run it with GuardDuty itself, how to protect downstream with TBAC, how to minimize cost?" From organizing the requirements through implementing the Terraform / Lambda / bucket policies, I can work alongside you quickly and safely as one person × generative AI (Claude Code). Feel free to consult me.



友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

