AWS Lambda production-operation guide: firm up the execution model, idempotency, observability, security, and cost with the official spec

"I want to stop managing servers and write only the code that reacts to events." The motive for choosing AWS Lambda fits in one line. But the moment you take it to production, the questions multiply. What happens if the same event arrives twice? Why is startup slow (cold start)? Do I re-establish the DB connection on every invocation? Where do failed events go? Where do I put secrets? And why is the bill this high?

This article is an implementation guide for designing and operating AWS Lambda at production quality. As a running example, I draw on design decisions from a multi-tenant payment platform for the environmental sector, covering carbon credits and regional currencies (the reliability design of a serverless payment platform), which I built as one of three core developers. It is a real case in which idempotency, failure design, and observability kept double charges and balance inconsistencies at zero in production.

Ground rules for this article: Lambda's specifications, parameter names, defaults, and quotas follow the official AWS documentation (as of June 2026). Quotas and pricing get revised, and initial values differ by region and for new accounts, so always confirm the latest values on the official pages (see "References" at the end) before going to production. The most important premise: Lambda may invoke your function more than once for the same event ("at least once"). A retry is not an anomaly; it is part of the spec. Making the function idempotent is therefore the starting point of reliability design.

0. Mental model: Lambda = "event → a disposable execution environment → billed for what you use"

First, five mental models that run through this article. Once you internalize them, everything that follows is a consequence.

A function = an event handler. Lambda is a service that, on receiving a request, prepares an execution environment, runs the code, and freezes (or discards) the environment when done. The server is invisible.
The execution environment is disposable, but reused for a while. Each concurrent request gets its own execution environment (that count is the concurrency). When an invocation finishes, the environment is frozen and may serve the next invocation (warm start).
Outside the handler = once, inside the handler = every time. On a cold start, the code outside the handler (INIT) runs once; on subsequent warm invocations only the handler body runs. So set up connections and SDK clients outside the handler (chapter 1).
Retries are a given. Both asynchronous invocations and event sources are at-least-once. Make the function idempotent so that the result is the same even if an event arrives twice (chapter 6).
Billing is allocated memory × execution time (in 1ms units) + the number of requests. Making a function faster and making it cheaper mostly coincide (chapter 9).

Right tool for the job: Lambda is strong at event-driven, spike-following, scale-to-zero workloads, within the constraints that a standard function runs for at most 900 seconds (15 minutes) and a synchronous payload is at most 6MB. If you need long-running processing, a resident server, or arbitrary TCP/UDP, Fargate is the first choice. Choosing the compute platform is covered in the sister article AWS Fargate vs Lambda vs App Runner tech-selection guide; this piece concentrates on how to build for production once you have chosen Lambda.

1. The execution environment's lifecycle: the 3 phases and the meaning of "outside the handler"

What governs Lambda's behavior is the execution environment's lifecycle. The official documentation describes it in three phases.

Phase	What happens	When
Init	① Extension init (starting extensions) ② Runtime init (starting the runtime) ③ Function init (running the static code outside the handler)	Once, on a cold start
Invoke	Runs the handler function. The function timeout applies to this entire phase	Per request
Shutdown	After a period with no invocations, the environment is discarded; extensions get a grace period to shut down	After idle

Three facts that matter in production (all from the official documentation).

Init has a 10-second limit (standard on-demand functions). If initialization outside the handler does not finish within 10 seconds, Lambda treats it as a timeout and retries the Init phase at the first invocation. Beware of heavy initialization.
Cold starts typically account for under 1% of all invocations, and their duration ranges from under 100ms to over 1 second. The problem is not their frequency but the weight of the tail (P99). I dig deeper in the sister article Lambda cold-start optimization guide.
Since August 1, 2025, INIT time is billed uniformly. Previously INIT was free only for on-demand functions using a managed runtime with ZIP packaging; now INIT is included in Billed Duration for every configuration (custom runtimes, provisioned concurrency, and container images were already billed). Cold-start initialization is no longer free, which ties directly into the cost design in chapter 9.

1.1 "Set up connections outside the handler" is the most important pattern

A frozen execution environment is reused with its memory state intact. As the official documentation states, objects declared outside the handler survive between warm invocations. So initialize DB connections and SDK clients once, outside the handler, and reuse them. Skipping this re-establishes the connection on every invocation and invites latency, cost, and connection exhaustion.

// handler.ts: a "thin adapter". Heavy initialization happens once, outside the handler (INIT).
import { Logger } from "@aws-lambda-powertools/logger";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";
import type { SQSEvent, SQSBatchResponse, Context } from "aws-lambda";
import { processOrder } from "./process-order"; // hypothetical module that holds the business logic

// ── INIT phase: runs once on a cold start and is reused by warm invocations ──
const logger = new Logger({ serviceName: "orders-worker" });
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({})); // create the client (and its connections) here
const TABLE = process.env.ORDERS_TABLE; // read configuration here too, not on every invocation

// ── INVOKE phase: only this part runs per request ──
export const handler = async (
  event: SQSEvent,
  _context: Context,
): Promise<SQSBatchResponse> => {
  if (!TABLE) throw new Error("ORDERS_TABLE is not configured"); // validate startup configuration
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      await processOrder(JSON.parse(record.body), { ddb, table: TABLE, logger });
    } catch (err) {
      // Report only the failed messageId so that just that message is redelivered (partial batch failure, chapter 5)
      logger.error("record processing failed", { messageId: record.messageId, err });
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};

Note: state outside the handler is reused only sometimes; reuse is not guaranteed. It is reset on every cold start, and each concurrent environment has its own copy. So using a global variable as a cache is fine (a speedup), but relying on a global variable for uniqueness or consistency is not. Generate unique IDs and tokens fresh inside the handler on every invocation (especially important with SnapStart; see the sister article).

2. How to make the handler: a thin adapter + testability

Keep the handler a thin adapter that receives the event and passes it to the business logic. This follows SRP (the single-responsibility principle) and is the key to testability.

Each language has a fixed standard handler signature.

// Node.js (an async handler is recommended; from Node.js 24 on, callback-style handlers are not supported)
export const handler = async (event: MyEvent, context: Context): Promise<MyResult> => { /* ... */ };

# Python (async handlers are not supported; the arguments are always the two: event, context)
def lambda_handler(event, context):
    ...

context carries information that matters in operations. Attach context.aws_request_id (Python) / context.awsRequestId (Node.js) to every log line as the correlation ID. get_remaining_time_in_millis() / getRemainingTimeInMillis() returns the time left before the timeout, which lets you wrap up safely just before it fires.
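As a rough sketch of the "wrap up before the timeout" idea, the loop below stops taking new work once the remaining time drops under a safety margin. The margin value, the `process_until_deadline` helper, and the `FakeContext` test double are assumptions made for illustration; only `get_remaining_time_in_millis()` and `aws_request_id` come from the real context object.

```python
SAFETY_MARGIN_MS = 2_000  # assumed margin; tune it to how long your cleanup takes


def process_until_deadline(items, context, handle_item):
    """Process items until the remaining time drops below the safety margin.

    Returns (processed_count, unprocessed_items) so the caller can re-queue the rest.
    """
    processed = 0
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return processed, list(items[index:])
        handle_item(item)
        processed += 1
    return processed, []


class FakeContext:
    """Test double: reports a budget that shrinks by `step_ms` on every check."""

    aws_request_id = "test-request-id"

    def __init__(self, budget_ms: int, step_ms: int):
        self._remaining = budget_ms
        self._step = step_ms

    def get_remaining_time_in_millis(self) -> int:
        remaining = self._remaining
        self._remaining -= self._step
        return remaining
```

Because the context is passed in as a plain argument, the same loop runs unchanged under a unit test with `FakeContext` and in Lambda with the real context.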

2.1 Carve the business logic into "a function that doesn't know Lambda"

Handler = adapter, logic = pure function. Splitting them this way lets you unit-test the logic without the Lambda runtime (chapter 10). AWS's testing guide likewise recommends making the handler a slim adapter, separating the business logic, and unit-testing it.

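# domain.py: pure logic that imports nothing from Lambda, so it is easy to unit-test.
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

@dataclass(frozen=True)
class Order:
    id: str
    amount: int  # smallest currency unit (integer); never use float for money

def total_with_tax(order: Order, tax_rate: float) -> int:
    if order.amount < 0:
        raise ValueError("amount must be non-negative")  # validate at the boundary
    # Go through Decimal to avoid binary-float error; built-in round() rounds half to even.
    total = Decimal(order.amount) * (Decimal(1) + Decimal(str(tax_rate)))
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))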

# handler.py: a thin adapter that only validates at the boundary and calls the logic.
import json
from domain import Order, total_with_tax

def lambda_handler(event, context):
    body = json.loads(event["body"])                 # parse external input at the boundary
    order = Order(id=str(body["id"]), amount=int(body["amount"]))
    return {"statusCode": 200, "body": json.dumps({"total": total_with_tax(order, 0.10)})}

3. Packaging: ZIP / container / layer

There are two deployment methods and one way to share dependencies. The table below lays out the decision axes.

Method	Size limit (official)	Best suited for
ZIP archive	50MB direct upload / 250MB unzipped (including layers and custom runtime)	General-purpose functions. Fastest deploys. Layers available
Container image (OCI)	up to 10GB	Large dependencies (ML libraries, etc.), a custom OS/runtime, integration with existing container CI
Layer	up to 5 per function, extracted to `/opt`, counts toward the 250MB-unzipped quota	Sharing common dependencies and utilities across a group of ZIP functions

Practical guidance:

Start with ZIP. Deploys are fast, and you can factor common dependencies out into layers. When you hit the 250MB-unzipped cap, move to a container.
Containers are for large, specialized, or pre-existing assets. You can pack up to 10GB, but cold starts and operations tend to get heavier. Layers cannot be used (dependencies are bundled into the image).
Layers are not recommended for Go / Rust (per the official documentation). Bundling dependencies into the executable is better for cold starts.

Anti-pattern: bundling the whole node_modules. With AWS SDK v3, import only the sub-clients you need and remove unused dependencies; the package shrinks, and both download time and cold start improve. Tree-shaking with a bundler (esbuild, etc.) is standard practice.

4. Concurrency and scaling: reserved, provisioned, throttling

Understand Lambda's scaling in numbers.

Concurrency = the number of requests being processed at the same time. Each in-flight request occupies one execution environment.
Little's Law: concurrency = average requests/sec × average processing time (sec). Example: 100 req/s × 0.5s = 50. Use this formula to estimate the concurrency you need.
The account's default concurrency limit is 1,000 (per region; it can be raised on request). In addition, the requests-per-second limit is 10× the account concurrency (10,000 RPS by default), which matters for functions that finish in a few dozen milliseconds.
Scaling speed is 1,000 additional environments every 10 seconds, per function. Each function can burst at this rate independently (the old region-wide shared burst quota has been removed).
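The estimate above is simple enough to script. The sketch below applies Little's Law and checks which of the two quotas (concurrency or the 10× RPS limit) a workload would hit first. The function names are my own, and the default limit of 1,000 is just the article's default; substitute your account's actual quota.

```python
import math

ACCOUNT_CONCURRENCY_LIMIT = 1_000  # default per region, per the article; check your account
RPS_LIMIT_MULTIPLIER = 10          # requests/sec limit = 10 x account concurrency


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Little's Law: concurrency = arrival rate x average processing time."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def first_limit_hit(requests_per_second: float, avg_duration_seconds: float,
                    concurrency_limit: int = ACCOUNT_CONCURRENCY_LIMIT) -> str:
    """Return which quota throttles first: 'none', 'concurrency', or 'rps'."""
    if requests_per_second > concurrency_limit * RPS_LIMIT_MULTIPLIER:
        return "rps"
    if required_concurrency(requests_per_second, avg_duration_seconds) > concurrency_limit:
        return "concurrency"
    return "none"
```

With the article's example, `required_concurrency(100, 0.5)` gives 50. A very short function at 12,000 req/s and 20ms per request would need only a few hundred environments yet still exceeds the default RPS limit, which is why the 10× rule matters for fast functions.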

There are two levers for controlling concurrency. The table below keeps them apart.

Lever	What it does	Billing	Main use
Reserved concurrency	Caps the function's concurrency and reserves that capacity for it; other functions cannot use it	No additional charge (but consumes the account quota)	Protecting downstream systems (RDS, etc.), containing a runaway function, emergency stop with `0` (kill switch)
Provisioned concurrency	Pre-initializes environments to eliminate cold starts	Additional charge	Latency-sensitive APIs that must respond in double-digit milliseconds

Throttling: when the concurrency limit (or the 10× RPS limit) is exceeded, a synchronous invocation returns HTTP 429 (TooManyRequestsException). Setting reserved concurrency to 0 rejects every request with 429, so it works as a kill switch during an incident. Throttled asynchronous invocations are held in Lambda's internal queue and retried, and event source mappings retry according to the source (chapter 5).

The trade-offs and cost-effectiveness of provisioned concurrency, SnapStart, and cold starts deserve an article of their own; the details are in the Lambda cold-start optimization guide.

5. Invocation models and handling failures: sync / async / event source

"Who retries" is decided by the invocation model. Get this wrong and failures either vanish silently or block processing indefinitely.

Model	Example	Who retries
Synchronous (RequestResponse)	API Gateway, Function URL, direct Invoke	Lambda does not auto-retry. The caller decides
Asynchronous (Event)	S3 notification, SNS, EventBridge rule	Lambda retries from its internal queue (2 retries by default, 3 attempts total)
Event source mapping	SQS, Kinesis, DynamoDB Streams	Lambda polls and invokes synchronously. Retries follow the source's rules

5.1 Asynchronous invocation: 2 retries by default, receive the trace with Destinations

An asynchronous invocation enters an internal queue. On a function error, Lambda retries twice by default (three attempts in total), waiting 1 minute before the first retry and 2 minutes before the second. Two parameters control this:

MaximumRetryAttempts: 0–2 (default 2).
MaximumEventAgeInSeconds: 60–21600 (up to 6 hours). Events older than this are discarded.

For failed events, prefer On-failure Destinations over the legacy DLQ (SQS/SNS only). Destinations can target SQS, SNS, Lambda, or EventBridge (and S3 for failures), and the record includes the function's response details. You can also specify a destination for successes (on-success).

# Terraform: retry limit, event age, and failure destination for asynchronous invocations
resource "aws_lambda_function_event_invoke_config" "orders" {
  function_name                = aws_lambda_function.orders.function_name
  maximum_retry_attempts       = 2      # 0 to 2 (default 2)
  maximum_event_age_in_seconds = 3600   # 60 to 21600; drop failed events older than 1 hour

  destination_config {
    on_failure {
      destination = aws_sqs_queue.orders_dlq.arn # a quarantine queue you can inspect and replay from
    }
  }
}
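Once failures land in the queue, something has to read them. The sketch below parses one on-failure record from an SQS message body. The field names (`requestContext.condition`, `approximateInvokeCount`, `requestPayload`, `responsePayload`) follow the invocation record format as I understand it from the Lambda documentation; treat them as an assumption and check them against a real message from your queue. The `FailedInvocation` type is my own.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedInvocation:
    request_id: str
    condition: str        # for example "RetriesExhausted" or "EventAgeExceeded"
    attempts: int
    error_type: str
    original_event: dict  # the payload you would replay


def parse_failure_record(sqs_body: str) -> FailedInvocation:
    """Parse one on-failure destination record delivered to SQS."""
    record = json.loads(sqs_body)
    request_context = record["requestContext"]
    # The response payload may be missing, e.g. when the event aged out before running.
    response_payload = record.get("responsePayload") or {}
    return FailedInvocation(
        request_id=request_context["requestId"],
        condition=request_context["condition"],
        attempts=request_context["approximateInvokeCount"],
        error_type=response_payload.get("errorType", "unknown"),
        original_event=record["requestPayload"],
    )
```

Keeping `original_event` around is the point: it is what a replay tool would send back to the function after the bug is fixed, and idempotency (chapter 6) is what makes that replay safe.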

5.2 Event source (SQS/streams): prevent "redo everything" with partial batch failure

An event source mapping invokes the function with a batch. Here is the trap beginners reliably fall into: if you throw an exception during batch processing, the entire batch, including records that already succeeded, is reprocessed.

What prevents this is the partial batch response. Set FunctionResponseTypes=ReportBatchItemFailures, and have the function return only the failed IDs in batchItemFailures (see the TypeScript code in chapter 1). The mechanism is common to SQS, Kinesis, and DynamoDB Streams. For streams, BisectBatchOnFunctionError (split the batch in two on failure and retry), MaximumRetryAttempts / MaximumRecordAgeInSeconds, and an On-failure destination prevent the accident of a poison record clogging a shard for up to 7 days.

The details of an idempotent SQS × Lambda consumer, the visibility timeout, and DLQ design are in building idempotent async processing with SQS + Lambda + EventBridge; CDC and event-driven design starting from DynamoDB Streams are in DynamoDB Streams × Lambda event-driven architecture. Each has a dedicated article.
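For streams the identifier is a sequence number rather than a message ID, and because records are ordered, a common approach is to report the first failure and stop. The sketch below shows that shape for Kinesis. `handle_record` and its negative-amount rule are hypothetical stand-ins for real business logic, and the handler assumes ReportBatchItemFailures is enabled on the event source mapping.

```python
import base64
import json


def handle_record(payload: dict) -> None:
    """Hypothetical business logic; raises on a record it cannot process."""
    if payload.get("amount", 0) < 0:
        raise ValueError("amount must be non-negative")


def lambda_handler(event, context):
    for record in event["Records"]:
        sequence_number = record["kinesis"]["sequenceNumber"]
        try:
            # Kinesis delivers the record data base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            handle_record(payload)
        except Exception:
            # Report the first failure and stop. Lambda retries from this sequence
            # number onward, so the records before it are not reprocessed.
            return {"batchItemFailures": [{"itemIdentifier": sequence_number}]}
    return {"batchItemFailures": []}
```

Records after the failed one will be delivered again on the retry, which is one more reason the processing itself has to be idempotent.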

6. Idempotency: turn retries into the "normal path"

In an at-least-once world, idempotency is the foundation of reliability. The goal: even if the same event arrives twice, the charge happens once and the inventory allocation happens once. If you build this yourself, you must correctly implement the conditional insert of the idempotency key, the TTL, and an in-progress lock, and race conditions tend to open holes.

For a new implementation, the idempotency utility of Powertools for AWS Lambda is the standard choice. It derives an idempotency key from the payload (or a specific field) and records the processing state in DynamoDB. While processing, it holds a lock with an INPROGRESS record and rejects concurrent duplicate invocations. After completion, it caches the result and returns it for duplicates.
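To make the three moving parts concrete (conditional insert, TTL, in-progress lock), here is a deliberately simplified in-memory sketch. It is not how the payment platform or any library implements it: `InMemoryIdempotencyStore` and `run_once` are hypothetical names, and a real store must perform the check-and-write in `try_acquire` atomically (for example with a conditional write), which a Python dict only gets for free because this runs in one process.

```python
from typing import Any, Callable, Optional

INPROGRESS, COMPLETED = "INPROGRESS", "COMPLETED"


class InMemoryIdempotencyStore:
    """Stand-in for a table with conditional writes. Single process only."""

    def __init__(self, ttl_seconds: int = 3600):
        self._ttl = ttl_seconds
        self._records: dict = {}

    def try_acquire(self, key: str, now: float) -> Optional[dict]:
        """Conditional insert. Returns None when the lock was taken,
        or the live existing record on conflict."""
        existing = self._records.get(key)
        if existing is not None and existing["expires_at"] > now:
            return existing
        self._records[key] = {"status": INPROGRESS, "expires_at": now + self._ttl}
        return None

    def complete(self, key: str, result: Any, now: float) -> None:
        self._records[key] = {"status": COMPLETED, "result": result,
                              "expires_at": now + self._ttl}

    def release(self, key: str) -> None:
        self._records.pop(key, None)


def run_once(store: InMemoryIdempotencyStore, key: str,
             operation: Callable[[], Any], now: float) -> Any:
    existing = store.try_acquire(key, now)
    if existing is None:
        try:
            result = operation()
        except Exception:
            store.release(key)  # failed: let a retry run the operation again
            raise
        store.complete(key, result, now)
        return result
    if existing["status"] == COMPLETED:
        return existing["result"]  # duplicate: return the cached result, do not re-run
    raise RuntimeError(f"operation {key!r} is already in progress")
```

Even this toy version has four paths (first run, cached duplicate, concurrent duplicate, failure), which is a fair indication of why a maintained library is the safer default.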

# Python: Powertools idempotency. One decorator turns "arrives twice" into "runs once".
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer, idempotent_function, IdempotencyConfig,
)

persistence = DynamoDBPersistenceLayer(table_name="idempotency")
# Key on the order ID only, not the whole payload (a resend with a slightly different payload still matches)
config = IdempotencyConfig(event_key_jmespath="id", expires_after_seconds=3600)

@idempotent_function(data_keyword_argument="order", persistence_store=persistence, config=config)
def charge(order: dict) -> dict:
    # Even if this is called twice, the charge runs only once.
    # Call it with a keyword argument, charge(order=...), so the decorator can find the data.
    return run_charge(order)  # run_charge: the actual charging logic, defined elsewhere

// TypeScript: @aws-lambda-powertools/idempotency (peer dependencies: @aws-sdk/client-dynamodb and @aws-sdk/lib-dynamodb)
import { makeIdempotent } from "@aws-lambda-powertools/idempotency";
import { DynamoDBPersistenceLayer } from "@aws-lambda-powertools/idempotency/dynamodb";

const persistenceStore = new DynamoDBPersistenceLayer({ tableName: "idempotency" });
// runCharge: the actual charging logic, defined elsewhere
export const handler = makeIdempotent(async (event: { id: string }) => runCharge(event), {
  persistenceStore,
});

The design crux: choose the idempotency key so that the same business operation always yields the same value (order ID, billing period, etc.). Keying on the whole payload lets a timestamp or similar field that changes on retry slip in, and the retry is misidentified as a different operation. The TTL (default 3600 seconds) should be longer than the window in which a resend can arrive. The reason I achieved zero double charges on the payment platform is that I matched the idempotency-key design to the business invariant. For the theoretical background, see also idempotent payment design (double-charge prevention).
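The difference between the two key choices is easy to demonstrate. In this sketch the field names (`order_id`, `billing_period`, `sent_at`) and the key format are hypothetical; the point is only that a whole-payload hash changes when any field changes, while a business key does not.

```python
import hashlib
import json


def payload_key(payload: dict) -> str:
    """Key derived from the whole payload: any changed field changes the key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def business_key(payload: dict) -> str:
    """Key derived from the business operation only (hypothetical field names)."""
    return f"charge#{payload['order_id']}#{payload['billing_period']}"


first_attempt = {"order_id": "o-42", "billing_period": "2026-05", "amount": 1200,
                 "sent_at": "2026-06-01T00:00:00Z"}
# The sender retries the same charge a minute later with a fresh timestamp.
retry = {**first_attempt, "sent_at": "2026-06-01T00:01:00Z"}
```

Here `payload_key(first_attempt)` and `payload_key(retry)` differ, so a payload-keyed store would treat the retry as a new charge, while `business_key` returns the same value for both and the duplicate is caught.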

7. Observability: zero-code-change structured logs + metrics + traces

Aim for a state where you can trace a stalled process at a glance. Lambda sends logs to CloudWatch Logs automatically (the execution role needs the permissions in AWSLambdaBasicExecutionRole). In production, build on three pieces.

JSON structured logs + log-level control (no code change). With Lambda's advanced logging controls, you can switch the log format (Text/JSON) and the application log level (TRACE–FATAL, default INFO) in the function configuration (log-level control requires the JSON format). This makes console.log output searchable and aggregatable as JSON, and lets you run at INFO in production and raise to DEBUG only during an investigation, without a deploy.
Powertools Logger / Tracer / Metrics. Logger emits structured logs with a correlation ID, Tracer is a thin wrapper over X-Ray, and Metrics writes metrics to standard output in EMF (Embedded Metric Format), which CloudWatch extracts asynchronously (there is no synchronous PutMetricData call, so it is fast and cheap).
X-Ray active tracing. Setting tracing to Active generates traces for sampled requests automatically. It requires permissions such as xray:PutTraceSegments (AWSXRayDaemonWriteAccess).

# Powertools: structured logs, traces, and EMF metrics with minimal code
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger(service="orders")     # JSON structured logs + correlation ID
tracer = Tracer(service="orders")     # X-Ray segments
metrics = Metrics(namespace="Orders") # CloudWatch metrics via EMF (no synchronous API call)

@metrics.log_metrics            # flush the collected metrics
@tracer.capture_lambda_handler  # trace the handler
@logger.inject_lambda_context   # add request_id etc. to every log line
def lambda_handler(event, context):
    metrics.add_metric(name="OrdersProcessed", unit=MetricUnit.Count, value=1)
    logger.info("processing order", extra={"order_id": event.get("id")})
    return {"ok": True}
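For intuition about what "writing metrics to standard output in EMF" means, the sketch below builds one EMF log line by hand using only the standard library. The `emf_record` helper and the single `service` dimension are my own choices for illustration; in real code the Powertools Metrics utility shown above does this for you, with more validation.

```python
import json
import time


def emf_record(namespace: str, service: str, name: str, value: float,
               unit: str = "Count") -> str:
    """Build one Embedded Metric Format log line for a single metric."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["service"]],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        "service": service,  # dimension value, referenced by name above
        name: value,         # metric value, referenced by name above
    })
```

Printing the returned string from a handler, for example `print(emf_record("Orders", "orders", "OrdersProcessed", 1))`, is all it takes: the line goes to CloudWatch Logs like any other log, and the metric is extracted from it without an API call in the request path.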

Alert on symptoms: Errors / Throttles / Duration (P99) / ConcurrentExecutions / the message count in the Destinations failure queue (notify immediately at 1). Adding Lambda Insights (an extension layer) also captures CPU, memory, and cold starts as one event per invocation. For observability as a whole, see building production observability with OpenTelemetry.

8. Security: least privilege, secrets, VPC, public endpoints

Lambda security starts with understanding two policies separately.

Execution role: the IAM role the function assumes. It holds the function's access rights to AWS services. Apply least privilege: start from AWSLambdaBasicExecutionRole (log permissions only) and add only the operations you need. Generating the policy from CloudTrail activity with IAM Access Analyzer is a safe way to do this.
Resource-based policy: who may invoke this function. When an AWS service invokes the function, this is the policy that is evaluated.

Example of a least-privilege execution role: only the required operations on the one table this function touches, with no wildcard '*'.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:PutItem", "dynamodb:GetItem"],
    "Resource": "arn:aws:dynamodb:ap-northeast-1:123456789012:table/orders"
  }]
}

Secrets: environment variables are encrypted at rest with KMS (by default with an AWS managed key; use a customer managed key if requirements call for it). Even so, putting an API key or DB credentials directly in an environment variable is not recommended. The official guidance is to store them in Secrets Manager / SSM Parameter Store and fetch and cache them with the AWS Parameters and Secrets Lambda Extension (default TTL 300 seconds). The app simply reads from the extension's local endpoint, and it picks up rotated values as the cache expires.
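"Reads from the extension's local endpoint" looks roughly like the sketch below, using only the standard library. The port (2773 unless overridden by PARAMETERS_SECRETS_EXTENSION_HTTP_PORT), the `/secretsmanager/get` path, and the `X-Aws-Parameters-Secrets-Token` header are how I understand the extension's documented interface; verify them against the current documentation. The function names and the 2-second timeout are my own.

```python
import json
import os
import urllib.parse
import urllib.request

EXTENSION_PORT = os.environ.get("PARAMETERS_SECRETS_EXTENSION_HTTP_PORT", "2773")


def build_secret_request(secret_id: str, session_token: str) -> urllib.request.Request:
    """Build the GET request for the extension's local Secrets Manager endpoint."""
    query = urllib.parse.urlencode({"secretId": secret_id})
    url = f"http://localhost:{EXTENSION_PORT}/secretsmanager/get?{query}"
    return urllib.request.Request(
        url, headers={"X-Aws-Parameters-Secrets-Token": session_token}
    )


def get_secret(secret_id: str) -> str:
    """Fetch a secret via the extension. Works only inside Lambda with the layer attached."""
    request = build_secret_request(secret_id, os.environ["AWS_SESSION_TOKEN"])
    with urllib.request.urlopen(request, timeout=2) as response:
        return json.loads(response.read())["SecretString"]
```

Because caching lives in the extension, the handler can call `get_secret` on every invocation without hitting Secrets Manager each time; the execution role still needs permission to read the secret.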

VPC: to connect to a private RDS instance or similar, attach the function to the VPC. Outbound internet access then requires a NAT gateway (placing the function in a private subnet alone does not reach the internet). The ENIs Lambda creates are Hyperplane ENIs, shared and reused among functions with the same subnet + security group combination, so scaling is fast even in a VPC. AWS services can be reached through VPC endpoints without a NAT.

Function URL: an HTTPS endpoint attached directly to the function. The auth type is AWS_IAM or NONE. Choose AWS_IAM for anything that is not a public API. If you set NONE (public), at minimum tighten it with WAF (placed on CloudFront in front of the URL), rate limiting (reserved concurrency), and the lambda:InvokedViaFunctionUrl condition key. Since October 2025, a new Function URL requires both the lambda:InvokeFunctionUrl and lambda:InvokeFunction permissions.

9. Cost optimization: understand the bill's formula and make it cheap with speed

Lambda pricing has two components. Once you know the formula, you can read the bill.

monthly ≈ request charge + execution-time charge
  request charge        = total requests × $0.20 / 1M
  execution-time charge = allocated memory (GB) × execution time (sec, rounded up in 1ms units) × unit price (GB-sec)

The x86 execution-time unit price is about $0.0000166667 / GB-sec (first tier, US East). The Arm64 (Graviton2) execution charge is 20% lower than x86. If your code is compatible, Arm64 is generally the better deal.
Free tier: 1 million requests + 400,000 GB-sec per month.
Execution time is rounded up in 1ms units. Ephemeral storage (/tmp) above 512MB and provisioned concurrency are charged separately.
Since August 2025, INIT time is billed as well (now including on-demand functions on a managed runtime with ZIP packaging). Heavy initialization adds to cost too.
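The formula above translates directly into a small estimator. This is a back-of-the-envelope sketch: the prices are the ones quoted in this article (first tier, US East), the Arm64 price is derived from the "20% lower" figure rather than taken from the price list, and it ignores provisioned concurrency, extra /tmp storage, and tiered discounts. Confirm current prices on the pricing page before relying on the numbers.

```python
REQUEST_PRICE_PER_MILLION = 0.20        # USD
X86_PRICE_PER_GB_SECOND = 0.0000166667  # USD, first tier, US East
ARM_DISCOUNT = 0.20                     # derived from "20% lower"; an approximation
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def monthly_cost(requests: int, memory_mb: int, avg_billed_ms: float,
                 arm64: bool = False, apply_free_tier: bool = True) -> float:
    """Estimate the monthly bill in USD for one function."""
    gb_seconds = requests * (avg_billed_ms / 1000) * (memory_mb / 1024)
    billable_requests = requests
    if apply_free_tier:
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
        billable_requests = max(0, requests - FREE_REQUESTS)
    unit_price = X86_PRICE_PER_GB_SECOND * ((1 - ARM_DISCOUNT) if arm64 else 1)
    request_charge = billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    return round(request_charge + gb_seconds * unit_price, 2)
```

For example, 10 million requests a month at 512MB and 100ms of billed time is 500,000 GB-sec, or about $8.33 of execution time plus $2.00 of requests before the free tier. Note that `avg_billed_ms` should include INIT time on cold starts, since that is now billed too.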

9.1 Memory tuning is the single knob of "speed = cost"

CPU is allocated in proportion to memory, and 1,769MB corresponds to 1 vCPU. Raising memory therefore raises CPU as well, and CPU-bound processing often finishes sooner and costs less in total (the reversal where finishing fast at 1024MB is cheaper than running slowly at 128MB). The iron rule is to choose memory by measurement, not by guessing. Find the cost-minimum point with AWS Lambda Power Tuning (an open-source tool that the AWS documentation recommends).
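The reversal is just arithmetic on memory × time. The sketch below compares the execution-time charge at several memory settings. The durations in `measurements` are invented numbers for a hypothetical CPU-bound function, chosen to illustrate the shape of the curve; real values have to come from measurement.

```python
X86_PRICE_PER_GB_SECOND = 0.0000166667  # USD, first tier, US East (from the article)


def cost_per_million(memory_mb: int, billed_ms: float) -> float:
    """Execution-time charge in USD for one million invocations (request fee excluded)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000) * 1_000_000
    return gb_seconds * X86_PRICE_PER_GB_SECOND


# Hypothetical measurements: memory (MB) -> billed duration (ms).
measurements = {128: 2400, 512: 520, 1024: 250, 2048: 240}

costs = {memory: cost_per_million(memory, ms) for memory, ms in measurements.items()}
cheapest_memory = min(costs, key=costs.get)
```

With these made-up figures, 128MB costs about $5.00 per million invocations and 1024MB about $4.17, while also being nearly ten times faster. At 2048MB the duration barely improves, so the cost roughly doubles: more memory helps only while the function is still CPU-bound.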

How to make it cheaper	Why it works
Switch to Arm64	Execution charge 20% lower. On many runtimes a rebuild is all it takes
Optimize memory by measurement	CPU-bound work often finishes sooner with more memory, so the total decreases
Make the package small	Shortens the cold start, which reduces the INIT charge and latency
Reuse connections outside the handler	Shortens the execution time of warm invocations
Eliminate unnecessary synchronous waits	You are billed while waiting on an external API; parallelize or go asynchronous

10. Testability: logic on your machine, integration in the cloud

Think of Lambda testing in two layers.

Unit tests: keep the handler thin (chapter 2) and test the business logic as Lambda-independent pure functions with ordinary unit tests. The context can be a mock. Catch most bugs here, where it is fastest and cheapest.
Integration tests: AWS explicitly recommends prioritizing testing in the cloud. Emulators (sam local invoke / start-api, LocalStack, etc.) are fast, but they cannot fully reproduce IAM, service configuration, quotas, and API differences, which produces "passes on my machine but fails in the cloud." Use emulators sparingly and run integration/E2E tests against the real services in the cloud; that is the official recommendation.

# Unit tests: pure logic can be tested immediately, without Lambda (domain.py from chapter 2)
import pytest
from domain import Order, total_with_tax

def test_total_with_tax_applies_rate():
    assert total_with_tax(Order(id="o1", amount=1000), 0.10) == 1100

def test_negative_amount_rejected():
    with pytest.raises(ValueError):
        total_with_tax(Order(id="o2", amount=-1), 0.10)

# Entry point to integration: invoke it once locally (requires Docker). Do the real verification by deploying to the cloud
sam local invoke OrdersFunction --event events/order.json

11. Summary: a production-Lambda cheat sheet

Finally, a quick reference for when you're lost.

Execution model: Init→Invoke→Shutdown. Create connections and SDK clients once, outside the handler. Generate unique IDs fresh inside the handler.
Handler: a thin adapter. Carve the business logic into Lambda-independent pure functions and test them.
Package: ZIP + layers first, container when over 250MB. Don't bundle the whole node_modules; tree-shake.
Scale: needed concurrency = RPS × processing seconds. Protect downstream with reserved concurrency, guarantee latency with provisioned concurrency, emergency stop with reserved=0.
Failure design: sync has no auto-retry. Async retries twice by default; add On-failure Destinations. For event sources, partial batch failure prevents redoing the whole batch.
Idempotency: retries are part of the spec. Powertools idempotency + a business-aligned idempotency key + TTL.
Observability: JSON structured logs + log-level control (zero code change), Powertools Metrics (EMF), X-Ray. Alert on symptoms.
Security: least privilege for the execution role. Secrets in Secrets Manager, cached through the extension. In a VPC, use NAT or VPC endpoints. Function URLs use AWS_IAM in principle.
Cost: memory × time (1ms) + requests. 20% off with Arm64, optimize memory by measurement, INIT is billed too.

Lambda isn't "easy because it's serverless"; it is the work of designing idempotency, observability, and cost on top of the constraints of at-least-once delivery, cold starts, least privilege, and the billing formula. On a multi-tenant payment platform in the environmental field, I hardened monthly bulk billing and CO2 aggregation with idempotency, automatic reprocessing, least privilege, and symptom-based alerts, and achieved zero double charges and balance inconsistencies in production.

"I want to make this workload of ours fast, cheap, and self-recovering when it crashes, using Lambda." From design through implementation, operation, and cost optimization, I work alongside you end to end, at the speed of one person × generative AI (Claude Code). Feel free to get in touch even at the compute-platform selection stage (Lambda or Fargate).

References (official documentation)

Lambda execution environment lifecycle — the 3 phases Init/Invoke/Shutdown, cold start, the reuse of objects outside the handler
AWS Lambda standardizes billing for the INIT phase — the INIT-billing unification from August 1, 2025
Lambda quotas — memory 128MB–10,240MB (1 vCPU at 1,769MB), timeout 900 seconds, payload, /tmp, package size, concurrency 1,000
Lambda runtimes — supported runtimes and the deprecation schedule, x86_64/arm64
Building Lambda functions with Node.js / Python — handler and context
Lambda concurrency / Scaling behavior — concurrency, Little's Law, reserved/provisioned, 1,000 environments per 10 seconds per function
Asynchronous invocation / Error handling and retries — 2 retries by default, MaximumRetryAttempts, MaximumEventAge, Destinations
Idempotency - Powertools for AWS Lambda — @idempotent / IdempotencyConfig / DynamoDBPersistenceLayer
Advanced logging controls — JSON format, log-level control (no code change)
Lambda security & permissions / Env var encryption / VPC — execution role, resource-based policy, KMS, Hyperplane ENI
Testing Lambda functions — the cloud-first testing strategy
AWS Lambda Pricing — request/execution-time charge, free tier, Arm/provisioned//tmp pricing

Related articles

Connecting from Lambda to RDS/Aurora: RDS Proxy, Data API, VPC design to prevent connection exhaustion, and cost optimization

Lambda testing strategy: designing unit/integration/E2E, SDK mocking, sam local, and verifying in the cloud

Building a production HTTP API with Lambda: choosing among API Gateway (REST/HTTP API), Function URLs, and ALB, plus auth, validation, and error design

Safe Lambda deployment: versions, aliases, canary releases (CodeDeploy), and SAM/CDK/Terraform selection

Also worth reading

DynamoDB Streams × Event-Driven Architecture / CDC Complete Guide (2026 Edition): Safely Propagating Change Data with Lambda and EventBridge Pipes

Building Idempotent Async Processing with SQS + Lambda + EventBridge: Duplicate, Ordering, and DLQ Design on the At-Least-Once Premise

DynamoDB Single-Table Design & Production Reliability Patterns — The Complete Guide (2026 Edition): Idempotency, Conditional Writes, and Transactions in Real Code