# The complete TCP/IP guide: turning the 4-layer model, IP, TCP, and UDP into production design with RFCs and real code

> An implementation guide that explains TCP/IP in a form you can use for production design. Staying faithful to the IETF primary sources (RFC 1122, 791, 8200, 9293, 768), it organizes, more clearly than the official docs, the 4-layer model and encapsulation, IP addressing and CIDR, the difference between TCP and UDP, working Node.js/TypeScript code, production-relevant behavior such as TIME_WAIT, Keep-Alive, and MTU, and observation and debugging.

- Published: 2026-06-28
- Author: 友田 陽大
- Tags: TCP/IP, Networking, TCP, UDP, Architecture design, Observability
- URL: https://tomodahinata.com/en/blog/tcp-ip-protocol-suite-fundamentals-complete-guide
- Category: TCP/IP and networking

## Key points

- 'TCP/IP' is not a single standard but the collective name for the protocol suite that runs the internet. The IETF's RFC 1122 describes it as four layers: Link, Internet, Transport, and Application.
- IP (RFC 791; RFC 8200 for IPv6) is best-effort: it guarantees neither delivery, nor ordering, nor deduplication. On top of it, TCP (RFC 9293) adds reliability, ordering, flow control, and congestion control, while UDP (RFC 768) adds only minimal multiplexing.
- TCP builds reliability on retransmission, and that reliability holds only within a single connection. Once a connection drops and the client or an intermediary retries, the same request can reach the app more than once (at-least-once). In a domain like payments, the professional design is to not lean entirely on TCP's reliability and to build exactly-once behavior with idempotency.
- Perceived quality in production is decided by operational knowledge beyond the RFCs: TIME_WAIT, Keep-Alive, Nagle/delayed ACK, connection pools, MTU/MSS, and backlog. I make each concrete with code examples.
- Failure analysis means reading TCP state transitions with ss / tcpdump / ip. If you know the three-way handshake and the 11-state state machine, you can triage 'it won't connect' from facts, not hunches.

---

"Just leave the network to someone who knows it well." You can keep thinking that only while production runs smoothly. `ECONNRESET` floods the logs. Connections clog only right after a deploy. An API behind an L7 load balancer times out at 30 seconds. A payment request comes back in a state where you can't tell whether it succeeded or failed. **None of these are app bugs; they are the behavior of TCP/IP itself.** Without knowing the mechanism, the logs stay incantations, and every retry you bolt on as a symptomatic fix makes the situation worse.

This article is an implementation guide for turning TCP/IP from something you memorize for a certification exam into **knowledge you can use for production design decisions.** It pins down what each layer does and does not guarantee using the IETF primary sources (RFCs), runs the ideas as working Node.js/TypeScript code, and carries straight through to the operational knowledge that the official docs leave out but that reliably matters in production (TIME_WAIT, Keep-Alive, MTU, connection pools). As subject matter, I draw on decisions from the [serverless payment platform](/case-studies/payment-platform-reliability) where I designed and led the payment-reliability layer (assuming mobile-network timeouts and retransmissions, and achieving **0 double charges in production** through idempotency).

> **The rules of this article**: protocol behavior is grounded in **IETF primary sources (RFCs).** The core ones are **TCP = [RFC 9293](https://www.rfc-editor.org/rfc/rfc9293.html) (August 2022, obsoletes RFC 793 and others)**, **UDP = [RFC 768](https://www.rfc-editor.org/rfc/rfc768.txt) (1980)**, **IPv4 = [RFC 791](https://www.rfc-editor.org/rfc/rfc791)**, **IPv6 = [RFC 8200](https://www.rfc-editor.org/rfc/rfc8200)**, and **host requirements and layering = [RFC 1122](https://www.rfc-editor.org/rfc/rfc1122)**. Because RFCs get revised and obsoleted, always confirm the latest version at [rfc-editor.org](https://www.rfc-editor.org/) before production design. The code runs on Node.js's standard library (`net`, `dgram`); **ports, hosts, and timeout values are meant to come from environment variables,** and in production you should always tune them against observed values.

---

## 0. First, where we stand: "TCP/IP" isn't a single standard

Before getting into design, let's pin down what the term really means in three points.

- **TCP/IP is the collective name for a protocol suite.** The name borrows from its two best-known protocols (TCP and IP), but in substance it is a collection of many: IP, TCP, UDP, ICMP, ARP, DNS, and more.
- **The IETF describes it as layers.** [RFC 1122](https://www.rfc-editor.org/rfc/rfc1122) (Host Requirements) divides internet communication into four layers: **Application / Transport / Internet / Link.** The OSI reference model (7 layers) taught in school is a tool for organizing concepts; the four-layer model is the one implementations follow.
- **Each layer is stacked on the premise of not trusting the layer below.** IP doesn't guarantee delivery, so TCP builds reliability on top of it. Understanding this boundary of guarantees is the single most important point in learning TCP/IP.

This article proceeds from bottom to top: the Internet layer (IP), then the Transport layer (TCP/UDP), then implementation and operation in the app, all along the axis of what each layer does and does not guarantee.

---

## 1. The 4-layer model and encapsulation: how data is wrapped and sent

### 1.1 The 4-layer model and the correspondence with OSI

| RFC 1122 layer | Role (what it guarantees) | Representative protocols | Address unit | Data name |
| --- | --- | --- | --- | --- |
| Application | Meaning between apps (HTTP, gRPC, etc.) | HTTP, DNS, TLS, SMTP | — | message |
| Transport | Inter-process multiplexing, (TCP) reliability | **TCP, UDP** | port number | segment / datagram |
| Internet | Host-to-host end-to-end delivery | **IP, ICMP** | IP address | packet |
| Link | Node-to-node transfer within the same link | Ethernet, Wi-Fi, ARP | MAC address | frame |

The correspondence with OSI's 7 layers is roughly as follows. The concerns corresponding to OSI's "session layer and presentation layer" (TLS, character encoding, compression) are, in TCP/IP, handled collectively by the Application layer.

```text
OSI 7 layers       TCP/IP 4 layers (RFC 1122)
Application  ┐
Presentation ├──►  Application   (HTTP, TLS, DNS, gRPC)
Session      ┘
Transport    ───►  Transport     (TCP, UDP)
Network      ───►  Internet      (IP, ICMP)
Data Link    ┐
Physical     ┴──►  Link          (Ethernet, Wi-Fi)
```

### 1.2 Encapsulation — "nested envelopes"

When sending, each layer wraps the data from the layer above in its own header, from top to bottom. This is **encapsulation.** By the time an HTTP `GET /` request is physically on the wire, it is wrapped like this.

```text
[ Ethernet header [ IP header [ TCP header [ HTTP data ] ] ] FCS ]
   └ Link layer    └ Internet layer └ Transport layer └ Application layer
```

The receiving side unwraps in reverse order (**decapsulation**), stripping each layer's header and passing the rest up. **The important point is that each layer looks only at its own header.** The IP layer doesn't know what TCP carries, and the TCP layer doesn't know what HTTP means. This **separation of concerns** is at the core of the design that has let TCP/IP scale for half a century (a good example of the single responsibility principle at work in protocol design).

> **Implication for design**: this nesting means each layer can evolve independently. The move from HTTP/1.1 to HTTP/2 happened without changing TCP, while HTTP/3 went the other way and replaced the transport, swapping TCP for QUIC (built on UDP). Being aware of which layer you are replacing, and what changes as a result, sharpens your technology choices.
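
To make the nesting concrete, here is a minimal sketch that adds up what each layer contributes to the size on the wire. The header sizes are the usual minimums and are my assumptions (IPv4 and TCP without options, Ethernet without a VLAN tag); `encapsulatedSizes` is a name made up for this illustration.

```ts
// Typical minimum header sizes in bytes (assumptions: no IP/TCP options, no VLAN tag).
const ETHERNET_HEADER_BYTES = 14;
const ETHERNET_FCS_BYTES = 4;
const IPV4_HEADER_BYTES = 20;
const TCP_HEADER_BYTES = 20;

interface EncapsulatedSizes {
  message: number; // Application layer payload
  segment: number; // Transport layer: TCP header + message
  packet: number; // Internet layer: IP header + segment
  frame: number; // Link layer: Ethernet header + packet + FCS
}

// Wraps a message of the given size layer by layer and reports each layer's total.
function encapsulatedSizes(messageBytes: number): EncapsulatedSizes {
  const segment = TCP_HEADER_BYTES + messageBytes;
  const packet = IPV4_HEADER_BYTES + segment;
  const frame = ETHERNET_HEADER_BYTES + packet + ETHERNET_FCS_BYTES;
  return { message: messageBytes, segment, packet, frame };
}

console.log(encapsulatedSizes(100));
// { message: 100, segment: 120, packet: 140, frame: 158 }
```

Each function line only knows the size handed down from the layer above, which mirrors how each layer reads only its own header.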

---

## 2. The Internet layer: IP is "best-effort" — it doesn't guarantee delivery

### 2.1 What IP guarantees and doesn't guarantee

The role of IP defined by [RFC 791](https://www.rfc-editor.org/rfc/rfc791) (IPv4) and [RFC 8200](https://www.rfc-editor.org/rfc/rfc8200) (IPv6) is **"to do its best to deliver a packet to the destination IP address,"** and nothing more. Concretely, it does **not guarantee** the following.

- **No delivery guarantee**: a packet can be discarded along the way (congestion, TTL expiry, routing failure).
- **No ordering guarantee**: packets that take different routes can arrive out of order.
- **No deduplication**: the same packet can arrive more than once.
- **Only a limited integrity guarantee**: the IPv4 header has a checksum, but it covers **the header only.** The integrity of the payload is left to the Transport layer (the TCP/UDP checksum). IPv6 removed the header checksum altogether.

In other words, IP is like ordinary mail: it accepts the letter but doesn't guarantee delivery. Because IP gives up on this by design, the responsibility of deciding where reliability gets built moves up to TCP and the layers above it.

### 2.2 IP addresses and CIDR — narrowed to the knowledge you definitely use in production

IPv4 is 32 bits (e.g., `192.0.2.10`), IPv6 is 128 bits (e.g., `2001:db8::1`). What's essential in production infrastructure design is the following three points.

**(1) CIDR notation** ([RFC 4632](https://www.rfc-editor.org/rfc/rfc4632)): the `/16` of `10.0.0.0/16` means "the upper 16 bits are the network portion." VPC subnet design can't start without being able to read this.

```text
10.0.0.0/16   → 10.0.0.0 - 10.0.255.255   (65,536 addresses, 65,534 hosts)
10.0.1.0/24   → 10.0.1.0 - 10.0.1.255     (256 addresses, 254 hosts)
              * In each subnet, two addresses (the network address and the broadcast address) are reserved
```
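
The ranges above can be reproduced with a few lines of arithmetic. The following is a minimal sketch for IPv4 only; `cidrRange` and its helpers are illustrative names, not part of any library, and the "usable hosts" figure follows the traditional rule of subtracting the network and broadcast addresses (cloud providers often reserve more).

```ts
interface CidrRange {
  network: string;
  broadcast: string;
  totalAddresses: number;
  usableHosts: number;
}

// Converts dotted-quad IPv4 text into an unsigned 32-bit value.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Converts an unsigned 32-bit value back into dotted-quad text.
function intToIpv4(value: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(value / 2 ** shift) % 256).join(".");
}

// Computes the first address, last address, and size of a CIDR block.
function cidrRange(cidr: string): CidrRange {
  const [ip, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  if (ip === undefined || prefixText === undefined || !Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  const size = 2 ** (32 - prefix); // number of addresses in the block
  const network = Math.floor(ipv4ToInt(ip) / size) * size; // clear the host bits
  return {
    network: intToIpv4(network),
    broadcast: intToIpv4(network + size - 1),
    totalAddresses: size,
    usableHosts: Math.max(size - 2, 0), // minus network and broadcast addresses
  };
}

console.log(cidrRange("10.0.0.0/16"));
// { network: '10.0.0.0', broadcast: '10.0.255.255', totalAddresses: 65536, usableHosts: 65534 }
```

The sketch uses plain multiplication and division instead of bitwise operators because JavaScript's bitwise operators work on signed 32-bit integers, which gets awkward for addresses above `127.255.255.255`.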

**(2) Private addresses** ([RFC 1918](https://www.rfc-editor.org/rfc/rfc1918)): reserved ranges for closed networks that don't go out to the internet. VPCs and intranets are carved from here.

```text
10.0.0.0/8        (10.x.x.x)            largest scale
172.16.0.0/12     (172.16-172.31.x.x)  medium scale, Docker's default
192.168.0.0/16    (192.168.x.x)         home, small scale
```
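
A check like "is this address private?" comes up in log analysis and allow-list code. Here is a minimal sketch, assuming IPv4 text input; `isRfc1918` is an illustrative name, and the three blocks are the ones listed above.

```ts
// The three RFC 1918 blocks as [base address, prefix length].
const RFC1918_BLOCKS: ReadonlyArray<readonly [string, number]> = [
  ["10.0.0.0", 8],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
];

// Converts dotted-quad IPv4 text into an unsigned 32-bit value.
function toUint32(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when the address falls inside any RFC 1918 block.
function isRfc1918(ip: string): boolean {
  const address = toUint32(ip);
  return RFC1918_BLOCKS.some(([base, prefix]) => {
    const size = 2 ** (32 - prefix);
    // Two addresses share a block when their network portions are equal.
    return Math.floor(address / size) === Math.floor(toUint32(base) / size);
  });
}

console.log(isRfc1918("172.31.255.255")); // true  (last address of 172.16.0.0/12)
console.log(isRfc1918("172.32.0.0")); // false (just outside the /12)
console.log(isRfc1918("192.0.2.10")); // false (documentation range, not private)
```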

**(3) MTU and MSS**: the largest packet one link can carry (MTU) is usually **1500 bytes** on Ethernet. Subtracting the IP and TCP headers from it gives the **MSS (Maximum Segment Size)**, the upper limit on the data TCP sends in one segment (typically 1460 bytes over IPv4: 1500 - 20 - 20). **When the MTU shrinks across a VPN, a tunnel, or an inter-cloud link, large packets are silently dropped, which produces the nightmare failure where "only traffic above a certain size gets stuck."** On routes where Path MTU Discovery ([RFC 1191](https://www.rfc-editor.org/rfc/rfc1191) for IPv4, [RFC 8201](https://www.rfc-editor.org/rfc/rfc8201) for IPv6) doesn't work, this is the first place to look.
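
The arithmetic behind the MSS is simple enough to write down. This is a minimal sketch, assuming headers without options (20 bytes for IPv4, 40 for IPv6, 20 for TCP); `mssFor` and the 50-byte tunnel overhead in the last call are illustrative values, not measurements from a specific product.

```ts
const IPV4_HEADER_BYTES = 20; // without options
const IPV6_HEADER_BYTES = 40; // fixed header
const TCP_HEADER_BYTES = 20; // without options

// MSS = MTU minus any tunnel overhead, minus the IP and TCP headers.
function mssFor(mtu: number, ipVersion: 4 | 6, tunnelOverheadBytes = 0): number {
  const ipHeader = ipVersion === 4 ? IPV4_HEADER_BYTES : IPV6_HEADER_BYTES;
  const mss = mtu - tunnelOverheadBytes - ipHeader - TCP_HEADER_BYTES;
  if (mss <= 0) {
    throw new RangeError(`MTU ${mtu} leaves no room for TCP payload`);
  }
  return mss;
}

console.log(mssFor(1500, 4)); // 1460
console.log(mssFor(1500, 6)); // 1440
console.log(mssFor(1500, 4, 50)); // 1410 (with a hypothetical 50-byte tunnel overhead)
```

If an endpoint keeps sending 1460-byte segments over a path whose effective limit is lower, and the ICMP feedback that Path MTU Discovery relies on never arrives, only the large segments vanish, which matches the "specific size gets stuck" symptom.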

### 2.3 The reading points of the IPv4 header

You don't need to memorize every field. In failure analysis, these are the only ones you actually look at.

- **TTL (Time To Live)**: decremented by 1 at each router and discarded when it reaches 0. It prevents loops and is also the principle behind `traceroute`.
- **Protocol**: the upper-layer protocol. **6 = TCP**, **17 = UDP**, **1 = ICMP**. Checking this value in `tcpdump` tells you what the packet carries.
- **Source / Destination Address**: the source and destination IP. Note that the source is rewritten when the packet crosses NAT.
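
To see where these fields sit in the bytes, here is a minimal parsing sketch based on the RFC 791 header layout. `parseIpv4Header` is an illustrative name, the sample bytes are hand-built with documentation addresses, and the header checksum is left at zero and not verified.

```ts
import { Buffer } from "node:buffer";

interface Ipv4HeaderSummary {
  version: number;
  headerLengthBytes: number;
  ttl: number;
  protocol: string;
  source: string;
  destination: string;
}

const PROTOCOL_NAMES = new Map<number, string>([
  [1, "ICMP"],
  [6, "TCP"],
  [17, "UDP"],
]);

// Reads only the fields that matter in failure analysis.
function parseIpv4Header(packet: Buffer): Ipv4HeaderSummary {
  if (packet.length < 20) throw new Error("shorter than the minimum IPv4 header");
  const firstByte = packet.readUInt8(0);
  const version = firstByte >> 4; // upper 4 bits
  const headerLengthBytes = (firstByte & 0x0f) * 4; // IHL counts 32-bit words
  if (version !== 4) throw new Error(`not an IPv4 packet (version ${version})`);
  if (headerLengthBytes < 20 || packet.length < headerLengthBytes) {
    throw new Error("invalid header length");
  }
  const protocolNumber = packet.readUInt8(9);
  return {
    version,
    headerLengthBytes,
    ttl: packet.readUInt8(8),
    protocol: PROTOCOL_NAMES.get(protocolNumber) ?? `unknown(${protocolNumber})`,
    source: Array.from(packet.subarray(12, 16)).join("."),
    destination: Array.from(packet.subarray(16, 20)).join("."),
  };
}

// Hand-built sample: version 4, IHL 5, TTL 64, protocol 6, 192.0.2.10 -> 198.51.100.7.
const sample = Buffer.from([
  0x45, 0x00, 0x00, 0x28, 0x00, 0x00, 0x40, 0x00,
  0x40, 0x06, 0x00, 0x00, 192, 0, 2, 10,
  198, 51, 100, 7,
]);
console.log(parseIpv4Header(sample));
```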

---

## 3. The Transport layer: TCP and UDP — where to create "reliability"

Since the Internet layer has settled on being best-effort, **if reliability is needed, someone has to build it.** TCP takes on that responsibility in full; UDP deliberately declines it and stays minimal.

### 3.1 Port numbers — multiplexing multiple processes on one host

The IP address identifies which host, and the **port number (16 bits, 0-65535)** identifies which process on that host. A single connection is uniquely identified by the **5-tuple** `(source IP, source port, destination IP, destination port, protocol)`.

- **Well-known ports (0-1023)**: HTTP=80, HTTPS=443, SSH=22, DNS=53, etc.
- **Ephemeral ports (49152-65535, IANA-recommended)**: temporary ports a client picks dynamically for each connection. Operating systems may use a different range (Linux defaults to 32768-60999). **Exhausting this range is the essence of the TIME_WAIT problem described later.**

### 3.2 The four guarantees TCP (RFC 9293) provides

According to [RFC 9293](https://www.rfc-editor.org/rfc/rfc9293.html), TCP provides a **connection-oriented reliable stream.** Concretely, the following four.

1. **Reliability**: with sequence numbers, ACKs, and **retransmission,** it recovers lost data.
2. **Ordering**: the receiving side reorders by sequence number and passes it to the app in order.
3. **Flow control**: with the receiving side's **Window field** (the number of bytes it can receive), it controls so that a fast sender doesn't overflow a slow receiver.
4. **Congestion control**: it detects network congestion and autonomously throttles the sending rate (the slow start and congestion avoidance of [RFC 5681](https://www.rfc-editor.org/rfc/rfc5681)). RFC 9293 makes the implementation of congestion control **mandatory.**

These come at a cost: the round trip needed to establish a connection (the 3-way handshake), per-connection state, and head-of-line blocking. **The mechanism itself (handshake, the 11-state state machine, retransmission, congestion control) is covered in depth in a dedicated article** (this cluster's "TCP mechanism complete explanation").
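
As a feel for guarantees 1 and 2, here is a toy receiver that reorders segments by sequence number and drops duplicates. It is a deliberately simplified model under my own assumptions, not how a real TCP stack is written: there are no ACKs, windows, sequence wraparound, or partially overlapping segments, and the payloads are ASCII so that string length equals byte count.

```ts
interface Segment {
  seq: number; // byte offset of the first byte in this segment
  data: string; // ASCII payload
}

class ToyReceiver {
  private nextSeq = 0; // next byte offset the app is waiting for
  private readonly pending = new Map<number, string>(); // out-of-order segments by seq

  /** Accepts a segment in any order and returns the data that became deliverable in order. */
  receive(segment: Segment): string {
    if (segment.seq < this.nextSeq) return ""; // already delivered: a retransmitted duplicate
    this.pending.set(segment.seq, segment.data);

    let delivered = "";
    let next = this.pending.get(this.nextSeq);
    while (next !== undefined) {
      this.pending.delete(this.nextSeq);
      this.nextSeq += next.length;
      delivered += next;
      next = this.pending.get(this.nextSeq);
    }
    return delivered;
  }
}

const receiver = new ToyReceiver();
console.log(JSON.stringify(receiver.receive({ seq: 2, data: "CD" }))); // "" (gap: bytes 0-1 missing)
console.log(JSON.stringify(receiver.receive({ seq: 0, data: "AB" }))); // "ABCD" (gap filled)
console.log(JSON.stringify(receiver.receive({ seq: 0, data: "AB" }))); // "" (duplicate dropped)
```

The second call also illustrates head-of-line blocking: "CD" had already arrived but could not be handed to the app until "AB" filled the gap.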

### 3.3 UDP (RFC 768) — the decisiveness of deliberately doing nothing

The UDP header in [RFC 768](https://www.rfc-editor.org/rfc/rfc768.txt) is a mere **8 bytes** (source port, destination port, length, checksum). The spec states plainly that "**delivery and duplicate protection are not guaranteed.**" There is no connection establishment; a datagram is simply sent with no prior setup.

There are domains where this minimalism becomes a weapon: real-time audio and video, DNS, games, and QUIC (the foundation of HTTP/3). **How to choose between TCP and UDP, including QUIC and HTTP/3, is covered with concrete decision criteria in this cluster's "The difference between TCP and UDP and how to use them."**
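
To show how little there is in those 8 bytes, here is a minimal sketch that builds and parses a UDP header by hand. In practice the operating system does this for you; `buildUdpDatagram` and `parseUdpHeader` are illustrative names, and the checksum is written as 0, which RFC 768 defines as "no checksum generated" (real stacks normally compute it).

```ts
import { Buffer } from "node:buffer";

interface UdpHeader {
  sourcePort: number;
  destinationPort: number;
  length: number; // header + payload, in bytes
  checksum: number;
}

// Lays out the four 16-bit big-endian fields, followed by the payload.
function buildUdpDatagram(sourcePort: number, destinationPort: number, payload: Buffer): Buffer {
  const header = Buffer.alloc(8);
  header.writeUInt16BE(sourcePort, 0);
  header.writeUInt16BE(destinationPort, 2);
  header.writeUInt16BE(8 + payload.length, 4);
  header.writeUInt16BE(0, 6); // 0 = sender generated no checksum (simplification)
  return Buffer.concat([header, payload]);
}

function parseUdpHeader(datagram: Buffer): UdpHeader {
  if (datagram.length < 8) throw new Error("shorter than the 8-byte UDP header");
  return {
    sourcePort: datagram.readUInt16BE(0),
    destinationPort: datagram.readUInt16BE(2),
    length: datagram.readUInt16BE(4),
    checksum: datagram.readUInt16BE(6),
  };
}

const datagram = buildUdpDatagram(53000, 53, Buffer.from("hi"));
console.log(parseUdpHeader(datagram));
// { sourcePort: 53000, destinationPort: 53, length: 10, checksum: 0 }
```

There is no sequence number and no acknowledgment field anywhere in the header, which is exactly why ordering and delivery cannot be guaranteed at this layer.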

---

## 4. Real code: understand TCP / UDP by "touching" them in Node.js

Theory sticks once you get hands-on. Using only Node.js's standard library, you can feel the difference between TCP and UDP for yourself. The examples are typed with TypeScript.

### 4.1 TCP echo server (the `net` module)

```ts
import net from "node:net";

const PORT = Number(process.env.TCP_PORT ?? 9000);

const server = net.createServer((socket: net.Socket) => {
  // The remote address and port are half of the 5-tuple that identifies this connection
  const peer = `${socket.remoteAddress}:${socket.remotePort}`;
  console.log(`[open] ${peer}`);

  // TCP is a byte stream: data does not arrive in "message units" (see 4.2)
  socket.on("data", (chunk: Buffer) => {
    console.log(`[recv] ${peer} ${chunk.length} bytes`);
    socket.write(chunk); // echo it back unchanged
  });

  socket.on("end", () => console.log(`[end]  ${peer}`)); // the peer sent a FIN
  socket.on("error", (err) => console.error(`[err]  ${peer} ${err.message}`)); // RST, etc.
});

server.listen(PORT, () => console.log(`TCP echo listening on :${PORT}`));
```

```ts
// Client
import net from "node:net";

const socket = net.createConnection(
  { host: "127.0.0.1", port: Number(process.env.TCP_PORT ?? 9000) },
  () => socket.write("hello tcp"), // runs once the connection is established (3-way handshake done)
);

socket.on("data", (data: Buffer) => {
  console.log("echo:", data.toString());
  socket.end(); // send a FIN for a graceful close
});
socket.on("error", (err) => console.error("[err]", err.message)); // e.g. ECONNREFUSED, or the idle timeout below
socket.setTimeout(5_000, () => socket.destroy(new Error("idle timeout")));
```

### 4.2 The fatal pitfall: "TCP doesn't preserve message boundaries"

In the server above, even if the client calls `socket.write("AB")` and `socket.write("CD")` back to back, the server's `data` event might fire **once with `"ABCD"`,** or **twice, with `"A"` and then `"BCD"`.** **TCP is a byte stream, and nothing guarantees that one `write` corresponds to one `data` event.** This is a bug nearly every beginner runs into.

So you need to **implement framing (message delimiting) yourself at the app layer.** The most common approach is a length prefix.

```ts
// Reassembles frames of the form: 4-byte big-endian length + body
class LengthPrefixedDecoder {
  private buf = Buffer.alloc(0);

  /** Pushes a chunk and returns only the completed messages (a total function: returns [] when none is complete yet) */
  push(chunk: Buffer): Buffer[] {
    this.buf = Buffer.concat([this.buf, chunk]);
    const out: Buffer[] = [];
    while (this.buf.length >= 4) {
      const len = this.buf.readUInt32BE(0);
      if (this.buf.length < 4 + len) break; // the body has not fully arrived yet
      out.push(this.buf.subarray(4, 4 + len));
      this.buf = this.buf.subarray(4 + len); // drop the bytes we consumed
    }
    return out;
  }
}
```

> HTTP and gRPC look "message-oriented" because this kind of framing (for HTTP, `Content-Length` or chunked encoding; for gRPC, a 5-byte prefix) is **implemented in the upper layer.** When you use raw TCP, remember that you have to build this layer yourself.
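
To see the decoder idea survive arbitrary chunking, here is a minimal self-contained sketch of the sending side plus a function-style decoder, fed with two frames that are deliberately cut at awkward positions. `encodeFrame`, `createDecoder`, and the 1 MiB `MAX_FRAME_BYTES` cap are my own additions for illustration; the cap guards against a peer announcing a huge length and making the receiver buffer without bound.

```ts
import { Buffer } from "node:buffer";

const MAX_FRAME_BYTES = 1024 * 1024; // hypothetical upper bound; tune to your protocol

// Prepends a 4-byte big-endian length to the body.
function encodeFrame(body: Buffer): Buffer {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Returns a stateful decode function that yields only completed messages.
function createDecoder(): (chunk: Buffer) => Buffer[] {
  let pending: Buffer = Buffer.alloc(0);
  return (chunk: Buffer): Buffer[] => {
    pending = Buffer.concat([pending, chunk]);
    const messages: Buffer[] = [];
    while (pending.length >= 4) {
      const length = pending.readUInt32BE(0);
      if (length > MAX_FRAME_BYTES) throw new RangeError(`frame of ${length} bytes exceeds the cap`);
      if (pending.length < 4 + length) break; // body not complete yet
      messages.push(pending.subarray(4, 4 + length));
      pending = pending.subarray(4 + length);
    }
    return messages;
  };
}

// Two frames on the wire (12 bytes), delivered as three chunks that ignore frame boundaries.
const wire = Buffer.concat([encodeFrame(Buffer.from("AB")), encodeFrame(Buffer.from("CD"))]);
const chunks = [wire.subarray(0, 3), wire.subarray(3, 7), wire.subarray(7)];

const decode = createDecoder();
const received: string[] = [];
for (const chunk of chunks) {
  for (const message of decode(chunk)) received.push(message.toString());
}
console.log(received); // [ 'AB', 'CD' ]
```

The first chunk ends inside the length prefix and the second ends inside the next prefix, yet the messages come out intact, which is the property raw TCP does not give you for free.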

### 4.3 UDP (the `dgram` module) — boundaries are preserved, but delivery isn't guaranteed

```ts
import dgram from "node:dgram";

const PORT = Number(process.env.UDP_PORT ?? 9001);
const server = dgram.createSocket("udp4");

// In UDP, one send is one message, so boundaries are preserved. Neither order nor delivery is guaranteed
server.on("message", (msg: Buffer, rinfo) => {
  console.log(`[recv] ${rinfo.address}:${rinfo.port} "${msg}"`);
  server.send(msg, rinfo.port, rinfo.address); // echo (if it never arrives, nobody notices)
});
server.bind(PORT, () => console.log(`UDP listening on :${PORT}`));
```

**The essential difference between TCP and UDP shows up in the code**: TCP sets up a connection and streams bytes, but doesn't preserve message boundaries. UDP sends one message at a time with no connection, but gives no delivery guarantee. **This asymmetry is the root of every design decision that follows.**

---

## 5. TCP behavior that matters in production — the "operational knowledge" beyond the RFCs

From here on is territory that never appears on certification exams but **reliably matters in production.** I focus on the points I actually ran into on a payment platform and addressed through design.

### 5.1 TIME_WAIT — the true nature of "I closed the connection but can't connect"

The side that **actively closes** a TCP connection (often the client, or an L7 proxy) stays in the **TIME_WAIT state** for **2×MSL (Maximum Segment Lifetime; typically around 60 seconds in total)** after closing. This is a safety mechanism that keeps a delayed packet from an old connection from polluting a new connection with the same 5-tuple, and it is correct behavior specified by RFC 9293.

The trouble starts when you **open a large number of short-lived connections at high frequency** (for example, a proxy that opens a fresh connection to the upstream API for every request). As TIME_WAIT sockets pile up, **ephemeral ports run out,** and new connections fail with `EADDRNOTAVAIL`. The countermeasures, in priority order:

1. **Reuse connections (most important)**: Keep-Alive and connection pools. If you never close the connection in the first place, TIME_WAIT never arises.
2. Tuning kernel parameters (`net.ipv4.tcp_tw_reuse`, etc.) is a **last resort that only relieves symptoms.** Reducing the number of connections through app design comes first.
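
A back-of-the-envelope estimate shows how quickly the limit arrives. This is a rough sketch under stated assumptions: every connection goes from one source IP to the same destination IP and port, each closed connection holds its local port for the full TIME_WAIT period, and that period is 60 seconds. `maxSustainedConnectionsPerSecond` is an illustrative name.

```ts
// Upper bound on new connections per second that one client IP can sustain
// toward a single destination (IP + port) without reusing connections.
function maxSustainedConnectionsPerSecond(
  firstPort: number,
  lastPort: number,
  timeWaitSeconds: number,
): number {
  if (lastPort < firstPort || timeWaitSeconds <= 0) {
    throw new RangeError("invalid port range or TIME_WAIT duration");
  }
  const availablePorts = lastPort - firstPort + 1;
  // Each port is unavailable for timeWaitSeconds after its connection closes.
  return Math.floor(availablePorts / timeWaitSeconds);
}

console.log(maxSustainedConnectionsPerSecond(49152, 65535, 60)); // 273 (IANA range)
console.log(maxSustainedConnectionsPerSecond(32768, 60999, 60)); // 470 (Linux default range)
```

A few hundred new connections per second is easy to exceed for a busy proxy, which is why connection reuse sits at the top of the priority list.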

### 5.2 Keep-Alive and connection pools: "don't close" is the right default

```ts
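import net from "node:net";

// Assumption: the upstream address comes from environment variables (the names are illustrative)
const host = process.env.UPSTREAM_HOST ?? "127.0.0.1";
const port = Number(process.env.UPSTREAM_PORT ?? 9000);

const socket = net.createConnection({ host, port });

// TCP Keep-Alive: periodically checks whether an idle connection is still alive
socket.setKeepAlive(true, 30_000); // start keepalive probes after 30 seconds of idle

// Disable the Nagle algorithm: send small packets immediately (see 5.3)
socket.setNoDelay(true);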
```

For an HTTP client, **always use a connection pool.** Node.js's `undici` (the engine behind `fetch`) and AWS SDK v3 pool connections by default. The iron rule is to set the pool size explicitly and to keep **the client's idle timeout shorter than the upstream's** (if the upstream closes first, you pick up a dead connection from the pool and hit `ECONNRESET`).
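
As one way to apply that rule with only the standard library, here is a minimal sketch using `http.Agent`. The environment variable name, the pool sizes, and the one-second safety margin are all assumptions for illustration; I am also relying on the agent's `timeout` option (a socket inactivity timeout) to bound how long an idle pooled socket is kept, so verify that behavior against your Node.js version before depending on it.

```ts
import http from "node:http";

// Derives the client-side idle timeout so that the client always gives up first.
function clientIdleTimeoutMs(upstreamIdleTimeoutMs: number, safetyMarginMs = 1_000): number {
  const value = upstreamIdleTimeoutMs - safetyMarginMs;
  if (value <= 0) {
    throw new RangeError("upstream idle timeout is too short for the chosen margin");
  }
  return value;
}

// Assumption: the upstream (LB or server) idle timeout is known and supplied via an env var.
const UPSTREAM_IDLE_TIMEOUT_MS = Number(process.env.UPSTREAM_IDLE_TIMEOUT_MS ?? 60_000);

const agent = new http.Agent({
  keepAlive: true, // reuse connections instead of closing after each request
  maxSockets: 50, // pool size per host (illustrative value)
  maxFreeSockets: 10, // idle sockets kept around per host (illustrative value)
  timeout: clientIdleTimeoutMs(UPSTREAM_IDLE_TIMEOUT_MS),
});

// Usage: pass the agent on each request, e.g. http.get({ host, port, path, agent }, callback).
console.log(`client idle timeout: ${clientIdleTimeoutMs(UPSTREAM_IDLE_TIMEOUT_MS)} ms`);
```

Keeping the derivation in a small function makes the relationship between the two timeouts explicit, so a later change to the upstream setting cannot silently invert it.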

### 5.3 The Nagle algorithm and delayed ACK — the worst-compatibility combination

- **The Nagle algorithm**: holds back small writes and sends them together (saving bandwidth).
- **Delayed ACK**: holds the ACK back briefly so it can be batched (also to save bandwidth).

When the two interlock, **the sender waits for an ACK while the receiver waits for more data,** and a deadlock-like delay of up to several hundred ms occurs. **If you want small interactive RPCs to flow with low latency, the standard move is to turn Nagle off with `setNoDelay(true)` (`TCP_NODELAY`).** For throughput-oriented bulk transfer, on the other hand, leaving it on can sometimes be better. The rule is not "always turn it off" but **decide by workload.**

### 5.4 backlog and SYN — "clogs only right after a deploy"

The `backlog` in `listen(backlog)` is **the length of the queue that holds established connections the app has not yet accepted.** If it is too small, connections are **silently dropped or delayed** right after startup or during a spike. On Linux the kernel also caps the value at `net.core.somaxconn`, so raising it only in the app may have no effect. Node.js's default (511) is rarely insufficient, but you need to design it together with **the backlog and connection limits of the reverse proxy or LB in front.**

### 5.5 Distinguish "three kinds" of timeout

If you lump everything together as "timeout," the design goes wrong. Separate at least these three.

| Kind | What it waits for | If too short | If too long |
| --- | --- | --- | --- |
| Connection timeout | Completion of the 3-way handshake | Gives up on healthy but slow connections | Keeps waiting on a dead destination |
| Idle/receive timeout | A period with no incoming data | Cuts off normal slow responses | Fails to detect a hang |
| Overall (request) timeout | The total time of one request | Triggers a retry storm | Ties up resources for a long time |

On the payment platform, I designed these as a staircase, **"the further out, the longer; the further in, the shorter,"** from the outermost caller down to the innermost dependency (a timeout budget). If an inner timeout is longer than an outer one, the inner work keeps running after the outer caller has given up, which is the worst state: **doing work nobody is waiting for while eating up resources.**
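
The staircase rule is easy to check mechanically, for example in a config test. Here is a minimal sketch; the hop names and millisecond values are hypothetical and are not the actual settings of the payment platform.

```ts
interface Hop {
  name: string;
  timeoutMs: number;
}

/** Returns one message per hop whose timeout is not strictly shorter than the hop outside it. */
function findBudgetViolations(hopsOutsideIn: readonly Hop[]): string[] {
  const violations: string[] = [];
  for (let i = 1; i < hopsOutsideIn.length; i++) {
    const outer = hopsOutsideIn[i - 1];
    const inner = hopsOutsideIn[i];
    if (outer === undefined || inner === undefined) continue;
    if (inner.timeoutMs >= outer.timeoutMs) {
      violations.push(
        `${inner.name} (${inner.timeoutMs} ms) must be shorter than ${outer.name} (${outer.timeoutMs} ms)`,
      );
    }
  }
  return violations;
}

// Hypothetical chain, listed from the outermost caller to the innermost dependency.
const staircase: Hop[] = [
  { name: "mobile client", timeoutMs: 30_000 },
  { name: "API gateway", timeoutMs: 25_000 },
  { name: "function handler", timeoutMs: 20_000 },
  { name: "database call", timeoutMs: 5_000 },
];
console.log(findBudgetViolations(staircase)); // []

// Same chain, but the database call outlives the handler that is waiting for it.
const inverted: Hop[] = [
  { name: "function handler", timeoutMs: 20_000 },
  { name: "database call", timeoutMs: 30_000 },
];
console.log(findBudgetViolations(inverted));
// [ 'database call (30000 ms) must be shorter than function handler (20000 ms)' ]
```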

---

## 6. Observation and debugging — triage "it won't connect" by fact

Once you can read TCP's state transitions, failure triage turns from guesswork into observation. Here are the practical tools on Linux.

```bash
# 1) List current TCP connections and their states (ss is the faster successor to netstat)
ss -tan
#   State      Recv-Q Send-Q   Local Address:Port   Peer Address:Port
#   ESTAB      0      0        10.0.1.5:443         10.0.2.9:51324
#   TIME-WAIT  0      0        10.0.1.5:51200       10.0.3.4:5432   ← if there are many, suspect 5.1
#   SYN-SENT   0      1        10.0.1.5:40012       10.0.9.9:80     ← the peer has not returned a SYN-ACK

# 2) Count connections per state (useful for spotting TIME-WAIT buildup)
ss -tan | awk 'NR>1{print $1}' | sort | uniq -c | sort -rn

# 3) Look at the packets directly (is the handshake completing?)
sudo tcpdump -ni eth0 'tcp port 443 and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)'

# 4) Find where along the path traffic stalls (raise the TTL one hop at a time and see who answers)
traceroute -T -p 443 api.example.com   # -T uses TCP SYN probes
```

**A quick reference for deriving the cause from the state**:

- `SYN-SENT` lingers → the SYN went out but **no SYN-ACK comes back.** Suspect a firewall, security group, routing, or a destination host that is down. (If the host is up and merely has no listener on the port, it normally answers with RST and you see `ECONNREFUSED` instead.)
- Many `SYN-RECV` → a **SYN flood** or an exhausted backlog. Check SYN cookies.
- Many `TIME-WAIT` → too many short-lived connections are being created (§5.1). Switch to Keep-Alive.
- `CLOSE-WAIT` lingers → **the app isn't calling `close()`** (a bug in your own code). The typical resource leak.
- Stays `ESTAB` with no response while `Recv-Q` grows → the app isn't reading the data (processing is stuck).

> A lingering `CLOSE-WAIT` is the one case that is **almost certainly a bug in your own code** (a FIN was received but the socket was never closed). The others can be caused by the peer or the network, but here, suspect the app.
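
If you would rather do this triage from a script than from `awk`, the same counting can be sketched in TypeScript. This assumes the `ss -tan` column layout shown earlier (one header line, state in the first column); the sample text is made up, and the hints simply restate the quick reference above.

```ts
const HINTS = new Map<string, string>([
  ["SYN-SENT", "no SYN-ACK coming back: check firewall, security group, routing"],
  ["SYN-RECV", "possible SYN flood or exhausted backlog"],
  ["TIME-WAIT", "too many short-lived connections: reuse them"],
  ["CLOSE-WAIT", "the app is not closing sockets: likely a bug in your own code"],
]);

// Counts connections per state from the text output of `ss -tan`.
function countTcpStates(ssOutput: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of ssOutput.split("\n").slice(1)) { // skip the header line
    const state = line.trim().split(/\s+/)[0];
    if (!state) continue; // blank line
    counts.set(state, (counts.get(state) ?? 0) + 1);
  }
  return counts;
}

// Formats the counts, most frequent state first, with a hint where one is known.
function report(counts: Map<string, number>): string[] {
  return Array.from(counts)
    .sort((a, b) => b[1] - a[1])
    .map(([state, count]) => `${count}\t${state}\t${HINTS.get(state) ?? ""}`);
}

// Made-up sample in the same shape as the output shown above.
const SAMPLE = [
  "State      Recv-Q Send-Q   Local Address:Port   Peer Address:Port",
  "ESTAB      0      0        10.0.1.5:443         10.0.2.9:51324",
  "TIME-WAIT  0      0        10.0.1.5:51200       10.0.3.4:5432",
  "TIME-WAIT  0      0        10.0.1.5:51201       10.0.3.4:5432",
  "CLOSE-WAIT 1      0        10.0.1.5:443         10.0.2.7:40110",
].join("\n");

console.log(report(countTcpStates(SAMPLE)).join("\n"));
```

In a real script the input would come from running `ss -tan` (for example via `node:child_process`) instead of the hard-coded sample.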

---

## 7. Reliability design: don't fully rely on TCP's "reliability"

Finally, the most practical lesson. **TCP builds reliability on retransmission,** and within a single connection it discards duplicate segments by sequence number, so the byte stream itself arrives exactly once. That guarantee ends at the connection boundary. When a connection drops before the response arrives and the request is sent again, **delivery as seen by the app is at-least-once.** And in the world above TCP (load balancers, API gateways, mobile networks, Lambda retries), **the same request arriving multiple times isn't an anomaly but an everyday event.**

This is exactly where the core of my design for the [payment platform](/case-studies/payment-platform-reliability) sat. The challenge was that, because of mobile-network timeouts and API Gateway/Lambda retries, the same payment request arrives multiple times. In response, I chose to:

- **Accept retries as part of the normal path** (admit that the network will resend).
- On top of that, **make the charge converge to exactly one.** The client-issued **idempotency key** is limited to a single use with a conditional write (`attribute_not_exists`), and the balance update is made atomic with a DynamoDB transaction.

As a result, even when resends happen at every layer above TCP, **the app-level outcome is exactly-once,** and I achieved **0 double charges in production.**
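
The semantics of "record the key only if it is new, and charge in the same atomic step" can be sketched without a database. The following is an in-memory stand-in that I wrote for illustration, not the platform's code: a `Set` plays the role of the conditional write, and the synchronous method body plays the role of the transaction. In a real system both must live in the datastore so that they hold across processes.

```ts
interface ChargeResult {
  status: "charged" | "duplicate";
  balance: number;
}

class InMemoryLedger {
  private readonly usedKeys = new Set<string>();
  private balance: number;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  /** Applies the charge at most once per idempotency key, however often it is retried. */
  charge(idempotencyKey: string, amount: number): ChargeResult {
    // Equivalent in spirit to a conditional write that fails when the key already exists.
    if (this.usedKeys.has(idempotencyKey)) {
      return { status: "duplicate", balance: this.balance };
    }
    if (amount <= 0 || amount > this.balance) {
      throw new RangeError("invalid amount or insufficient balance");
    }
    // Key registration and balance update happen together; nothing can interleave here.
    this.usedKeys.add(idempotencyKey);
    this.balance -= amount;
    return { status: "charged", balance: this.balance };
  }
}

const ledger = new InMemoryLedger(1_000);
console.log(ledger.charge("order-1", 300)); // { status: 'charged', balance: 700 }
console.log(ledger.charge("order-1", 300)); // { status: 'duplicate', balance: 700 } (retry of the same request)
console.log(ledger.charge("order-2", 200)); // { status: 'charged', balance: 500 }
```

The retry is not rejected as an error; it is answered with the already-settled state, which is what lets clients retry freely after a timeout.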

Generalizing the lesson:

> **TCP's reliability only repairs the loss of bytes within a single connection. It does nothing about the real sources of duplicates: a connection that drops and is re-established, or an upper layer that retries. So in domains like payments, inventory, and messaging, don't hand reliability over wholesale to the network. Design it as idempotency at the app layer.**

This is a good example of TCP/IP knowledge being more than low-level background: it **ties directly to getting money right in production.**

---

## 8. Summary: a checklist to turn TCP/IP into "design judgment"

- [ ] **You can state the boundary of guarantees**: IP is best-effort (it guarantees neither delivery, nor ordering, nor deduplication). TCP adds reliability; UDP adds only minimal multiplexing.
- [ ] You can explain the **nesting of encapsulation** (each layer looks only at its own header = separation of concerns).
- [ ] You can read **CIDR / RFC 1918 / MTU and MSS** (for VPC design and for triaging MTU-related failures).
- [ ] **TCP is a byte stream**: `write` and `data` are not 1-to-1. With raw TCP, you implement framing yourself.
- [ ] You design **TIME_WAIT, Keep-Alive, Nagle, backlog, and the three kinds of timeout** from an operational viewpoint.
- [ ] You **read TCP states with `ss`** and can derive the cause from `SYN-SENT`/`CLOSE-WAIT`/`TIME-WAIT`.
- [ ] You **don't hand reliability over wholesale to the network**; you design the app to be idempotent on the assumption that duplicates will happen.

TCP/IP isn't a classic to memorize but **living design knowledge that determines latency, cost, and correctness in production.** Next, I dig into its heart (TCP's state machine, retransmission, and congestion control) following RFC 9293.

---

I (Yudai Tomoda) design and implement, as one person working with generative AI (Claude Code), backends that are **crash-resistant, traceable, and correct,** with low-layer behavior taken into account. "`ECONNRESET` won't stop," "ports dry up because of TIME_WAIT," "I don't understand how to design timeouts and retries," "I'm afraid of double processing in payments or inventory": for network and reliability problems like these, I identify the cause by reproducing and observing the phenomenon, then fix it at the root with idempotency and appropriate timeout design. Please feel free to get in touch.
