A serverless payments platform in the environmental sector (full-stack development; led the payment-reliability layer)
Implemented across the 4 surfaces (customer / merchant / admin / in-store terminal — backend & frontend) plus shared foundation, CI/CD, and observability / DR / IAM | achieved 0 double charges in production via idempotency, atomic transactions, and zero-downtime migration
Client
A multi-tenant payments platform handling environmental / carbon credits (J-Credits) / regional currencies / points / e-commerce (an AWS serverless foundation) | Setup: team development (3 main developers). As a core engineer responsible for ~60% of the repo's commits (403 of 694), implemented across the customer, merchant, admin, and in-store-terminal frontends/backends plus the shared foundation and infrastructure (observability, DR, IAM, CI/CD), and especially designed and led the payment-reliability layer (idempotency, atomic balance updates, zero-downtime migration).
My role
As a core engineer on a team (3 main developers), responsible for ~60% of the repo's commits (403 of 694). Implemented across the customer app (React 19 / Vite / MUI / TanStack Query), the merchant and admin frontends, the internal dashboard (Next.js / Mantine), the 4 Python serverless backends (customer, merchant, admin, in-store terminal / AWS SAM), the shared Lambda Layer, auth (Cognito custom auth, card PIN), observability (CloudWatch alarms, Slack notifications, structured logs), DR (AWS Backup, Vault Lock, PITR), IAM, and CI/CD (GitHub Actions, mypy strict). Designed and led the payment-reliability layer in particular (idempotency, atomic balance updates, zero-downtime migration via mirror writes).
Challenge (Situation & Task)
Real money, points, carbon credits (J-Credits), and regional currency had to be handled — serverlessly — across multiple user surfaces: customer app, merchant, admin, and in-store terminal, on a multi-tenant payments foundation. Double charges and balance inconsistencies were never permitted, and the data model had to keep evolving without ever stopping production. It also had to satisfy production-grade observability, resilience (DR), and security at the same time. The requirement across the whole design was to guarantee "correctness" not with operational carefulness, but with the structure of the code and DynamoDB's consistency primitives.
It had to satisfy often-conflicting requirements at once: financial-grade accuracy, evolution without downtime, and production-grade quality.
- Multi-surface × multi-value: four user surfaces (customer / merchant / admin / in-store terminal) × multiple values (cash-equivalent, points, J-Credits, regional currency) had to be handled with a consistent balance/transaction model. Because duplicate implementations breed inconsistency, consolidating shared logic (SSoT) was essential.
- Eliminating double charges (exactly-once): mobile-network timeouts and Lambda / API Gateway retries make the same payment request arrive multiple times. Retries themselves had to be accepted as the normal path while charges converged to exactly once.
- Balance consistency under contention (atomicity): even with concurrent operations on the same card / customer (payment, charge, refund), the balance must not go negative or be deducted twice. Because a typical read-modify-write creates races, they had to be structurally excluded.
- Zero-downtime migration: an old model where balance, points, J-Credits, and profile lived in one giant record (a God Record) had to be migrated to a new schema split by concern, without stopping production.
- Production-grade quality: observability to detect failure precursors, DR (backup/recovery) against data loss, and security protecting PINs and personal information had to be built in at the same priority as features.
Why these technologies (Rationale)
AWS Lambda + DynamoDB + SAM (serverless × IaC): adopted as a payments foundation that follows demand while keeping server-operations load low. All 4 stacks are codified with AWS SAM (CloudFormation) to ensure reproducibility and reviewability.
A shared Lambda Layer (SSoT / DRY): core logic — balance, idempotency, transaction history, auth attributes — is consolidated in a shared Layer for common use across the 4 backends (customer / merchant / admin / terminal). A "fix once, reflected in every Lambda" structure eliminates inconsistency and duplication.
DynamoDB TransactWriteItems (atomic ADD + condition expressions): adopted to achieve contention-resilient balance updates without locks. Atomic ADD increments/decrements + ConditionExpression (balance floor, charge cap) exclude read-modify-write races at design time.
A pure-function transaction builder: designed to have no side effects (DB I/O) and return only TransactItem dicts, so payment logic is unit-testable and golden-vector-fixable without DynamoDB.
Idempotency key + attribute_not_exists + TTL: a client-issued key is appended to the sort key, and conditional insertion blocks double execution. A TTL (default 90 days) auto-expires it to optimize storage cost.
React 19 + Vite + MUI + TanStack Query (4 frontends): built the customer, merchant, and admin UIs with type safety (TypeScript), API-cache efficiency, and a11y. The internal dashboard is Next.js + Mantine, with Sentry error observability on all frontends.
Cognito custom auth + card PIN (PBKDF2) + least-privilege IAM: realized diverse sign-in via CUSTOM_AUTH challenges (LINE / email OTP, etc.), with card PINs hashed by PBKDF2-HMAC (100,000 iterations, 32-byte salt, constant-time comparison). IAM roles are separated to least privilege per stack.
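The PIN-hashing parameters above (100,000 iterations, 32-byte salt, constant-time comparison) can be sketched with the Python standard library alone. This is a minimal illustration, not the project's code: the function names are hypothetical, and the digest (SHA-256) is an assumption because the text does not name one.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # from the text
SALT_BYTES = 32       # from the text
HASH_NAME = "sha256"  # assumption: the text does not name the digest


def hash_pin(pin: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for a card PIN (hypothetical helper)."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # CSPRNG-generated salt
    derived = hashlib.pbkdf2_hmac(HASH_NAME, pin.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_pin(pin: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_pin(pin, salt)
    return hmac.compare_digest(candidate, expected)
```

Only the salt and the derived key would be stored; `hmac.compare_digest` avoids leaking, through timing, how many leading bytes matched.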
What I did (Action)
[Full-stack cross-cutting implementation] Implemented 4 frontends — the customer app (React 19 / Vite / MUI 7 / TanStack Query / AWS Amplify), merchant frontend (React 19 / MUI 6), admin frontend (React 18 / MUI 5), and internal dashboard (Next.js 16 / Mantine 8) — and 4 Python serverless backends (customer, merchant, admin, in-store terminal / AWS SAM, OpenAPI definitions). Owned feature groups across card payments, charging (with expiry), e-commerce (cart / order / shipping), J-Credits, regional currency, points, stamp rallies, lotteries, rankings, and coupons.
[Double-charge prevention via idempotency] A client-issued idempotency key is appended to the sort key (card_op_idem#<op>#<key>), and conditional insertion with attribute_not_exists structurally excludes duplicate payments. Keys auto-expire via TTL (default 90 days). The Stripe webhook handler writes an event-ID-based idempotency marker within the same transaction, so if the handler fails the marker isn't written either, and Stripe's re-send is reprocessed correctly.
[Atomic balance updates] Eliminated read-modify-write and prevented balance inconsistency under contention with ADD + ConditionExpression. The balance floor (balance ≥ amount) and charge cap (balance + amount ≤ cap) are expressed as condition expressions; on failure the whole transaction is rolled back and the cause is classified into INSUFFICIENT / CAP_EXCEEDED / CONFLICT / THROTTLED and mapped to HTTP statuses.
[Retrying only transient conflicts] Only TransactionConflict from optimistic concurrency control is retried, up to 3 times, with exponential backoff (base 50ms × 2^n) + jitter (±50%). ConditionalCheckFailed (semantic failures such as insufficient balance or idempotency collision) is propagated immediately without retry. Reason-code judgment is SSoT'd in a shared module, and on conflict a CloudWatch metric fires so an alarm detects spikes.
[Zero-downtime migration (mirror writes)] Split the God Record in stages (dual write → read-switch and dedup → remove old data), completing 13+ phases without downtime. Each builder operates idempotently with atomic ADD / if_not_exists, and deletion intent is expressed with an explicit CLEAR sentinel distinct from None. When no write is needed, a builder returns an empty array as a no-op.
[Amount precision] Amounts and CO2 conversion (the rate is Decimal('0.01')) are processed consistently with Decimal, rejecting float at both the type and runtime levels. This eliminates accumulated rounding error and is verified with unit tests that include boundary conditions.
[Auth & security] Implemented Cognito custom auth (sign-up / post-confirmation hook / CUSTOM_AUTH challenges), with card PINs hashed by PBKDF2-HMAC (100,000 iterations, 32-byte CSPRNG salt, constant-time comparison). Input is validated at the boundary, and IAM is separated to least privilege per stack. Logs mask email and phone and never output PINs or tokens.
[Observability, resilience, idempotent async processing] Made production observable with 20+ CloudWatch alarms + composite alarms and structured logs (Slack notifications by severity). DR is multi-layered with AWS Backup + Vault Lock + PITR + a dedicated DR vault. Monthly bulk billing of employee cards and CO2 aggregation guarantee ordering, idempotency, and auto-reprocessing with SQS FIFO + a dead-letter queue.
[CI/CD, type safety, quality gates] GitHub Actions runs ruff / black / mypy --strict (disallow_untyped_defs) / pytest automatically, with pre-commit and a local CI script reproducing the same gate on dev machines. The backend eliminates the Any type and prevents regressions with golden-vector tests.
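The [Amount precision] item above can be illustrated with a small runtime guard. This is a sketch under assumptions: `to_amount` and `co2_for` are hypothetical names, and the two-decimal quantization with half-up rounding is an illustrative choice that the text does not specify. Only the rate `Decimal('0.01')` comes from the text.

```python
from decimal import Decimal, ROUND_HALF_UP

CO2_RATE = Decimal("0.01")  # rate from the text; its unit is not stated there


def to_amount(value: object) -> Decimal:
    """Accept int, str, or Decimal; reject float (and bool) at runtime."""
    if isinstance(value, (bool, float)):
        raise TypeError("amounts must be int, str, or Decimal, never float")
    if isinstance(value, (int, str, Decimal)):
        return Decimal(value)
    raise TypeError(f"unsupported amount type: {type(value).__name__}")


def co2_for(amount: object) -> Decimal:
    """Convert an amount to CO2 using exact decimal arithmetic.

    Assumption: results are kept to two decimal places, rounding half up.
    """
    product = to_amount(amount) * CO2_RATE
    return product.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Ten additions of 0.1 are exact with Decimal but drift with float.
exact_total = sum(to_amount("0.1") for _ in range(10))
float_total = sum(0.1 for _ in range(10))
```

Here `exact_total` equals `Decimal("1.0")`, while `float_total` does not equal `1.0`, which is the accumulated error the Decimal-only rule is meant to rule out.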
The design philosophy running through the whole product was to guarantee the "correctness" of payments with the structure of the code and the platform's consistency primitives — not with operational rules or review carefulness.
Consolidation via cross-cutting implementation and a shared Layer (SSoT / DRY): For the four user surfaces — customer, merchant, admin, in-store terminal — implemented across the board from frontend (React / Next.js) to backend (Python / AWS SAM). The balance, idempotency, transaction-history, and auth-attribute logic the 4 backends share is consolidated in a shared Lambda Layer as SSoT. This creates a "fix once, reflected in every Lambda" structure, laying a foundation that doesn't produce inconsistency even across multiple surfaces.
Binding idempotency and atomicity into one transaction:
Payment processing is assembled as side-effect-free pure builder functions (idempotency-marker insertion, balance ADD, history record, metrics update) and committed atomically in a single TransactWriteItems. Because the idempotency key's attribute_not_exists insertion and the balance ADD live in the same transaction, double charges on retry and balance inconsistency under contention are excluded at design time, not at runtime. As a result, double charges and balance inconsistencies in production have remained at 0.
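A minimal sketch of such a builder follows, to show how the idempotency marker and the balance decrement can share one transaction. Everything not named in the text is an assumption made for illustration: the table name, the `pk` / `sk` / `expires_at` / `balance` attribute names, the `pay` operation label, and integer amounts in minor units.

```python
from typing import Any

TABLE = "payments"  # hypothetical table name


def build_payment_items(
    card_id: str, amount: int, idem_key: str, expires_at: int
) -> list[dict[str, Any]]:
    """Pure function: returns TransactItems dicts and performs no I/O."""
    if amount <= 0:
        return []  # nothing to write
    pk = {"S": f"card#{card_id}"}
    marker = {
        "Put": {
            "TableName": TABLE,
            "Item": {
                "pk": pk,
                "sk": {"S": f"card_op_idem#pay#{idem_key}"},
                "expires_at": {"N": str(expires_at)},  # TTL attribute (epoch seconds)
            },
            # Fails if this idempotency key was already recorded.
            "ConditionExpression": "attribute_not_exists(pk)",
        }
    }
    debit = {
        "Update": {
            "TableName": TABLE,
            "Key": {"pk": pk, "sk": {"S": "balance"}},
            # Atomic decrement guarded by the balance floor.
            "UpdateExpression": "ADD #bal :delta",
            "ConditionExpression": "#bal >= :amount",
            "ExpressionAttributeNames": {"#bal": "balance"},
            "ExpressionAttributeValues": {
                ":delta": {"N": str(-amount)},
                ":amount": {"N": str(amount)},
            },
        }
    }
    return [marker, debit]
```

Because the function only returns dicts, a test can compare its output against a stored golden vector without DynamoDB. In production the list would be passed to `transact_write_items(TransactItems=...)` on a boto3 DynamoDB client, so a replayed key fails the marker's condition and cancels the debit with it.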
Distinguishing transient conflicts from semantic failures:
The only thing safe to retry is a transient TransactionConflict from optimistic locking. ConditionalCheckFailed — insufficient balance, charge-cap exceeded, idempotency collision — "won't change on retry," so it's propagated immediately, classified into a typed enum, and mapped to an API error.
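The retry rule can be sketched as follows. The exception class, enum, and function names are hypothetical; the reason-code strings are the ones DynamoDB reports in a cancelled transaction's cancellation reasons. Two simplifications are assumed: "up to 3 times" is read as three retries after the first attempt, and the sketch collapses the INSUFFICIENT / CAP_EXCEEDED distinction into one SEMANTIC case.

```python
import random
import time
from enum import Enum
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class Failure(Enum):
    SEMANTIC = "semantic"    # ConditionalCheckFailed: a retry cannot change the outcome
    CONFLICT = "conflict"    # TransactionConflict: transient, safe to retry
    THROTTLED = "throttled"
    OTHER = "other"


class TransactionCanceled(Exception):
    """Hypothetical wrapper carrying the per-item cancellation reason codes."""

    def __init__(self, codes: Sequence[str]) -> None:
        super().__init__(", ".join(codes))
        self.codes = list(codes)


def classify(codes: Sequence[str]) -> Failure:
    if "ConditionalCheckFailed" in codes:
        return Failure.SEMANTIC
    if "TransactionConflict" in codes:
        return Failure.CONFLICT
    if "ThrottlingError" in codes or "ProvisionedThroughputExceeded" in codes:
        return Failure.THROTTLED
    return Failure.OTHER


def backoff_delay(attempt: int, base: float = 0.05) -> float:
    """base x 2^attempt seconds, scaled by a jitter factor in [0.5, 1.5)."""
    return base * (2 ** attempt) * (0.5 + random.random())


def run_with_retry(
    op: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    attempt = 0
    while True:
        try:
            return op()
        except TransactionCanceled as exc:
            # Only transient conflicts are retried; everything else propagates.
            if classify(exc.codes) is not Failure.CONFLICT or attempt >= max_retries:
                raise
            sleep(backoff_delay(attempt))
            attempt += 1
```

With these defaults the waits fall in roughly 25 to 75 ms, 50 to 150 ms, and 100 to 300 ms. Injecting `sleep` keeps the wrapper testable without real delays.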
Swapping the engine while running (zero-downtime migration):
Splitting the balance model was decomposed into idempotent phases — dual write → read-switch (dedup) → remove old data. Because each builder is composed of atomic ADD and if_not_exists, even backfill re-runs or partial failures converge to a unique final state. This completed 13+ phases of schema evolution without stopping live payments for a single second.
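As a rough illustration of the builder style described above, the sketch below shows a mirror-write builder that distinguishes "leave untouched" (None) from "delete" (a CLEAR sentinel), and a backfill builder that is safe to re-run because it uses if_not_exists. Table, key, and attribute names are hypothetical, and values are stored as strings purely to keep the example short.

```python
from typing import Any


class _Clear:
    """Sentinel type: 'delete this attribute', distinct from None ('no change')."""

    def __repr__(self) -> str:
        return "CLEAR"


CLEAR = _Clear()
NEW_TABLE = "customer_profile_v2"  # hypothetical name for the split-out schema


def build_mirror_update(customer_id: str, changes: dict[str, Any]) -> list[dict[str, Any]]:
    """Pure builder: mirror a write onto the new record, or return [] as a no-op."""
    sets: list[str] = []
    removes: list[str] = []
    names: dict[str, str] = {}
    values: dict[str, Any] = {}
    for i, (field, value) in enumerate(sorted(changes.items())):
        if value is None:
            continue  # None means "leave as is"
        names[f"#f{i}"] = field
        if value is CLEAR:
            removes.append(f"#f{i}")
        else:
            sets.append(f"#f{i} = :v{i}")
            values[f":v{i}"] = {"S": str(value)}
    if not sets and not removes:
        return []
    clauses = [
        "SET " + ", ".join(sets) if sets else "",
        "REMOVE " + ", ".join(removes) if removes else "",
    ]
    update: dict[str, Any] = {
        "TableName": NEW_TABLE,
        "Key": {"pk": {"S": f"customer#{customer_id}"}},
        "UpdateExpression": " ".join(c for c in clauses if c),
        "ExpressionAttributeNames": names,
    }
    if values:  # DynamoDB rejects an empty ExpressionAttributeValues map
        update["ExpressionAttributeValues"] = values
    return [{"Update": update}]


def build_backfill(customer_id: str, legacy_balance: int) -> list[dict[str, Any]]:
    """Copy the legacy balance only if the new record has none yet (re-run safe)."""
    return [{"Update": {
        "TableName": NEW_TABLE,
        "Key": {"pk": {"S": f"customer#{customer_id}"}},
        "UpdateExpression": "SET #bal = if_not_exists(#bal, :legacy)",
        "ExpressionAttributeNames": {"#bal": "balance"},
        "ExpressionAttributeValues": {":legacy": {"N": str(legacy_balance)}},
    }}]
```

Running `build_backfill` twice for the same customer produces the same request, and the second application leaves an already-populated balance untouched, which is the convergence property the paragraph describes.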
Building production-grade quality at the same priority as features: Observability (CloudWatch alarms, Slack notifications, structured logs), resilience (multi-layered DR with AWS Backup, Vault Lock, PITR), security (Cognito auth, PBKDF2 PIN hashing, least-privilege IAM, PII masking), and idempotent async processing (SQS FIFO + DLQ) were implemented while guarded by CI/CD (GitHub Actions, mypy strict, pre-commit). Through these, I held responsibility end-to-end — from frontend UX/a11y to payment accuracy and production observability/resilience.
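The PII-masking rule for structured logs can be sketched like this. The regular expressions, field names, and masking format are assumptions made for illustration; a real implementation would need patterns tuned to its own data so that, for example, long numeric IDs are not mistaken for phone numbers.

```python
import json
import re

# Illustrative patterns only; they are not taken from the project.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+)")
PHONE_RE = re.compile(r"\+?\d[\d-]{7,}\d")
SECRET_KEYS = {"pin", "token"}  # never written to logs, even masked


def mask_text(text: str) -> str:
    """Mask email local parts and all but the last four characters of phone numbers."""
    text = EMAIL_RE.sub(r"\1***@\2", text)
    return PHONE_RE.sub(lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], text)


def log_line(level: str, message: str, **fields: object) -> str:
    """Build one JSON log line with secrets dropped and PII masked."""
    safe = {
        key: mask_text(str(value))
        for key, value in fields.items()
        if key.lower() not in SECRET_KEYS
    }
    record = {"level": level, "message": mask_text(message), **safe}
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Dropping secret fields entirely, rather than masking them, means a PIN or token can never be partially reconstructed from log output.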
Key technical decisions
Consolidating into a shared Lambda Layer as SSoT: unifying core logic across the 4 backends (DRY)
Idempotency key + attribute_not_exists + TTL: structurally preventing double charges on retry
atomic ADD + ConditionExpression + transactions: balance consistency under contention
Retrying only TransactionConflict, propagating semantic failures immediately: eliminating wasted retries
Pure-function builders + mypy strict: testable without a DB, eliminating the Any type
AWS SAM / CloudWatch / AWS Backup: ensuring observability and resilience as code
Responsibilities
- Customer, merchant, and admin frontends (React / MUI / TanStack Query) and the internal dashboard (Next.js / Mantine)
- The 4 Python serverless backends (customer, merchant, admin, in-store terminal / AWS SAM / OpenAPI)
- Designing and leading the payment-reliability layer (idempotency, atomic balance updates, mirror-writes zero-downtime migration)
- Consolidating into the shared Lambda Layer as SSoT and making payment logic pure functions
- Auth (Cognito custom auth), card PIN (PBKDF2), least-privilege IAM, input validation
- Observability (CloudWatch alarms, Slack notifications, structured logs) and DR (AWS Backup, Vault Lock, PITR)
- Idempotent async processing of employee-card monthly billing and CO2 aggregation (SQS FIFO + DLQ)
- CI/CD (GitHub Actions, mypy strict, ruff, pytest, pre-commit) and unit tests (boundary conditions, golden vectors)
Results in numbers
- Double charges in production: 0 cases. Structurally excluded via idempotency + atomic transactions (during live production).
- Zero-downtime migration: 13+ phases. Staged schema evolution without downtime.
- Commits owned: 403 commits. A core engineer responsible for ~60% of all repo commits (403 of 694).
- Apps implemented across: 8 apps. Implemented across 4 backends + 4 frontends.
Results
- [The biggest outcome] Maintained 0 double charges / balance inconsistencies during live production (structurally excluded via idempotency + atomic transactions)
- Implemented the 4 surfaces (customer / merchant / admin / in-store terminal — backend & frontend) plus the shared foundation across the board, leading development as a core engineer responsible for ~60% of all repo commits (403 of 694)
- Structurally prevented double charges on retry with an idempotency key + attribute_not_exists condition (for both customer payments and Stripe webhooks)
- Achieved payment processing that doesn't produce negative balances or inconsistencies under contention, via atomic ADD + condition expressions + transactions
- Retried only TransactionConflict with exponential backoff + jitter up to 3 times and propagated semantic failures immediately, absorbing transient conflicts while eliminating wasted retries
- Evolved the payment data model without stopping the live service for a single second (13+ phases) via staged migration with dual writes
- Processed amounts and CO2 conversion consistently with Decimal, eliminating accumulated rounding error
- Hashed card PINs with PBKDF2-HMAC (100,000 iterations, 32-byte salt, constant-time comparison) and excluded PII / secrets from logs
- Ensured production-grade quality with 20+ CloudWatch alarms + structured logs (Slack notifications) and multi-layered DR (AWS Backup, Vault Lock, PITR)
- Made employee-card monthly billing and CO2 aggregation idempotent, order-guaranteed, and auto-reprocessing with SQS FIFO + DLQ
- Made payment logic pure functions and pinned its behavior with DB-free unit tests + golden vectors. Eliminated the Any type and prevented regressions with mypy strict enforced in GitHub Actions