1

WAL — Write Ahead Log

The foundation everything else is built on

What is WAL?

Before PostgreSQL writes anything to the actual data files on disk, it first writes a record of what it's about to do in a special log file. That log is called the Write Ahead Log (WAL). It's the "paper trail" of every change.

📖
Real-world analogy: Think of a bank teller who writes down every transaction in a ledger before touching the cash. If the power goes out mid-transaction, they read the ledger to know exactly what happened and what didn't.

WAL serves two major purposes:

Crash Recovery
If Postgres crashes, it replays WAL records on restart to restore a consistent state — no data loss.
Replication
WAL records are streamed to replica servers, which replay them to stay in sync with the primary.

WAL Segments

WAL is not one giant file — it's divided into fixed-size files called WAL segments. Each segment is 16 MB by default.

WAL Segment files on disk
$PGDATA/pg_wal/ ├── 000000010000000000000001 ← oldest segment ├── 000000010000000000000002 ├── 000000010000000000000003 └── 000000010000000000000004 ← current (being written)

The filename encodes three things: timeline, log file number, and segment number. You don't need to decode it manually — PostgreSQL manages these automatically.

-- Check current WAL location SELECT pg_current_wal_lsn(); -- Returns: 0/3A2F1B8 (LSN = Log Sequence Number) -- How much WAL has been generated? SELECT pg_size_pretty(pg_wal_lsn_diff( pg_current_wal_lsn(), '0/0'::pg_lsn ));
⚠️
LSN (Log Sequence Number) is a position in the WAL stream, like a byte offset. Every WAL record has an LSN. Replicas use their LSN to know where to resume reading from.

Checkpoints

A checkpoint is when PostgreSQL flushes all dirty (modified) pages from memory to the actual data files on disk, and records a checkpoint LSN in WAL. After a checkpoint, WAL segments older than that LSN can safely be recycled.

1
Dirty pages accumulate Changes happen in memory (shared_buffers). Data files are NOT yet updated.
2
Checkpoint triggered Either by time (checkpoint_timeout, default 5 min) or WAL size (max_wal_size, default 1 GB).
3
Dirty pages flushed to disk All modified data pages are written from memory to the actual .dat files.
4
Checkpoint record written to WAL WAL now has a safe restart point. Old WAL segments before this point can be removed.
-- Force a checkpoint (useful before backups) CHECKPOINT; -- See checkpoint activity SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time FROM pg_stat_bgwriter;
2

Streaming Replication

Primary → Replica in near real-time

How Streaming Replication Works

In streaming replication, the replica server connects to the primary over a TCP connection and continuously receives WAL records as they are generated — just like a live stream. The replica replays those records to keep its data identical to the primary.

Streaming Replication Architecture
[ PRIMARY SERVER ] │ App writes INSERT/UPDATE/DELETE │ ▼ WAL Buffer → WAL Segment Files │ │ WAL Sender Process (on primary) │ ─────────────────────────────► │ TCP / network │ ◄───────────────────────────── │ Standby reports its LSN │ [ REPLICA SERVER ]WAL Receiver Process (on replica) │ receives WAL records ▼ Startup Process replays WAL → replica data = primary data
Result: The replica is a byte-for-byte copy of the primary. It can be promoted to primary in seconds if the primary fails (high availability).

Primary Configuration

On the primary server, you need to enable WAL streaming and create a replication user.

-- postgresql.conf on PRIMARY wal_level = replica -- must be 'replica' or 'logical' max_wal_senders = 10 -- max concurrent replica connections wal_keep_size = 1024 -- keep 1 GB of WAL for slow replicas (MB) -- pg_hba.conf on PRIMARY (allow replica to connect) -- TYPE DATABASE USER ADDRESS METHOD host replication replicator 10.0.0.2/32 md5
-- Create replication user on PRIMARY CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'strongpassword';

Replica Configuration

The replica needs to know where the primary is. This is done via a connection string.

-- Step 1: Take a base backup from primary (run on replica machine) pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/data -P -R -- -R flag auto-creates standby.signal + postgresql.auto.conf -- Step 2: The -R flag writes this into postgresql.auto.conf on replica: primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=strongpassword' -- Step 3: Start the replica PostgreSQL service -- That's it! It connects to primary and starts streaming.
💡
standby.signal is just an empty file in the data directory. Its presence tells PostgreSQL to start in standby (replica) mode instead of primary mode. PostgreSQL 12+ uses this file instead of the old recovery.conf.

Monitoring Replication

Always monitor replication lag — if the replica is falling behind, it might not be safe to failover to it.

-- On PRIMARY: see all connected replicas SELECT client_addr, state, -- 'streaming', 'catchup', 'backup' sent_lsn, -- WAL position sent to replica write_lsn, -- WAL written to replica disk flush_lsn, -- WAL flushed on replica disk replay_lsn, -- WAL applied (replayed) on replica pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replication_lag FROM pg_stat_replication; -- On REPLICA: check replay lag in seconds SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;

Synchronous vs Asynchronous Replication

By default, streaming replication is asynchronous — the primary confirms a transaction commit without waiting for the replica. This is fast but means a small window of potential data loss on failover.

✅ Synchronous
-- Primary WAITS for replica to confirm
-- before returning success to client.

synchronous_standby_names = 'replica1'

Pros: Zero data loss on failover
Cons: Latency increases (network RTT added)
Use: Banking, payment systems
⚡ Asynchronous (default)
-- Primary does NOT wait for replica.
-- Returns success immediately.

synchronous_standby_names = ''  (default)

Pros: No latency added
Cons: Small risk of losing last few
      transactions on failover
Use: Most web apps, read scaling
-- Enable synchronous replication in postgresql.conf synchronous_standby_names = 'FIRST 1 (replica1, replica2)' -- FIRST 1 = wait for at least 1 replica to confirm -- Per-transaction override (useful for critical transactions) SET synchronous_commit = on; -- wait for replica flush SET synchronous_commit = off; -- don't wait (async)

Replication Slots

A replication slot ensures PostgreSQL doesn't delete WAL segments until the replica has consumed them. This is crucial for slow replicas that might otherwise be "left behind" when old WAL is recycled.

-- Create a replication slot on PRIMARY SELECT pg_create_physical_replication_slot('replica1_slot'); -- View slots SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots; -- On REPLICA: configure to use the slot primary_slot_name = 'replica1_slot'
🚨
Danger: If a replica with a slot goes offline for a long time, the primary will keep accumulating WAL forever (to avoid losing the replica's position). This can fill up your disk! Always monitor slot lag and drop unused slots.
3

Logical Replication

Replicate individual tables — not the whole cluster

Streaming vs Logical — What's the Difference?

Feature Streaming Replication Logical Replication
What it replicates Entire PostgreSQL cluster (all databases) Selected tables within a database
Replica must be same PG version Yes No (can be different PG version)
Replica can have extra indexes/tables No Yes
Replica can be written to No (read-only) Yes (other tables)
Use case HA failover, read replicas Cross-version migrations, analytics, partial replication
Performance overhead Low Slightly higher (decoding WAL)

Publications

A publication is defined on the publisher (source) side. It specifies which tables and operations (INSERT, UPDATE, DELETE) to replicate.

-- On PUBLISHER (source DB): enable logical decoding -- In postgresql.conf: wal_level = logical -- must be 'logical' (stronger than 'replica') -- Create a publication for ALL tables CREATE PUBLICATION my_pub FOR ALL TABLES; -- Create a publication for SPECIFIC tables CREATE PUBLICATION orders_pub FOR TABLE orders, order_items; -- Create a publication for specific operations only CREATE PUBLICATION inserts_only_pub FOR TABLE events WITH (publish = 'insert'); -- only INSERT, no UPDATE/DELETE -- View publications SELECT * FROM pg_publication; SELECT * FROM pg_publication_tables;

Subscriptions

A subscription is defined on the subscriber (destination) side. It connects to the publisher and pulls the changes.

-- On SUBSCRIBER (destination DB): create subscription CREATE SUBSCRIPTION my_sub CONNECTION 'host=10.0.0.1 dbname=mydb user=replicator password=pass' PUBLICATION my_pub; -- This automatically: -- 1. Copies all existing data (initial snapshot) -- 2. Then streams ongoing changes -- Check subscription status SELECT subname, subenabled, received_lsn FROM pg_stat_subscription; -- Disable/enable a subscription ALTER SUBSCRIPTION my_sub DISABLE; ALTER SUBSCRIPTION my_sub ENABLE; -- Drop subscription DROP SUBSCRIPTION my_sub;
⚠️
Tables must exist on subscriber: Logical replication does NOT create tables. You must create them manually (same schema) on the subscriber before subscribing.

Real-World Use Case: Zero-Downtime Major Version Upgrade

This is the most common practical use of logical replication — upgrading from PostgreSQL 14 → 16 with zero downtime.

1
Spin up a new PG 16 server Create all schemas and tables (you can use pg_dump --schema-only for this).
2
Create publication on PG 14 (old) CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
3
Create subscription on PG 16 (new) PG 16 copies all data, then streams changes from PG 14 in real-time.
4
Wait for lag = 0 Monitor pg_stat_subscription until the new server is fully caught up.
5
Switch app connection string Point your app to the new PG 16 server. Drop the subscription. Old server decommissioned.
4

Replication Decision Tool

Which type of replication should you use?

Interactive Advisor

Answer a few questions and get a replication recommendation with the exact SQL to use.

🔧 Replication Advisor Interactive
Fill in the options above and click "Get Recommendation"

WAL Level Quick Reference

wal_levelSupportsWhen to use
minimal Crash recovery only Standalone server, no replication
replica Streaming replication HA setup, read replicas
logical Streaming + Logical replication Cross-version migration, CDC, partial replication
💡
Best practice: Most production setups use wal_level = replica by default. Only switch to logical when you need logical replication or CDC (Change Data Capture) tools like Debezium.
5

Full Setup Walkthrough

Complete primary + replica setup from scratch

Complete Streaming Replication Setup

Here is the full step-by-step to set up streaming replication between two servers: Primary: 10.0.0.1 and Replica: 10.0.0.2

🖥️
Environment: Both servers run PostgreSQL 16, Ubuntu 22.04. Replace IPs with your actual server addresses.
═══════════════════════════════════════════ STEP 1: Configure PRIMARY (10.0.0.1) ═══════════════════════════════════════════ -- Edit postgresql.conf listen_addresses = '*' wal_level = replica max_wal_senders = 5 wal_keep_size = 256 -- keep 256 MB of WAL -- Edit pg_hba.conf (add this line) host replication replicator 10.0.0.2/32 scram-sha-256 -- Create replication user (run in psql) CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'SecurePass123!'; -- Reload config SELECT pg_reload_conf();
═══════════════════════════════════════════ STEP 2: Initialize REPLICA (10.0.0.2) (Run on the replica machine shell) ═══════════════════════════════════════════ # Stop replica postgres if running sudo systemctl stop postgresql # Remove existing data directory sudo rm -rf /var/lib/postgresql/16/main # Take base backup from primary sudo -u postgres pg_basebackup \ -h 10.0.0.1 \ -U replicator \ -D /var/lib/postgresql/16/main \ -P \ # show progress -R \ # auto-create standby.signal & connstring --wal-method=stream # Start replica sudo systemctl start postgresql
═══════════════════════════════════════════ STEP 3: Verify Replication is Working ═══════════════════════════════════════════ -- On PRIMARY: check replica is connected SELECT client_addr, state, replay_lsn, pg_size_pretty(pg_wal_lsn_diff( pg_current_wal_lsn(), replay_lsn )) AS lag FROM pg_stat_replication; -- Should show: 10.0.0.2 | streaming | 0/3A00000 | 0 bytes -- On PRIMARY: insert a test row CREATE TABLE repl_test (id SERIAL, msg TEXT); INSERT INTO repl_test (msg) VALUES ('hello replica!'); -- On REPLICA: verify it appeared SELECT * FROM repl_test; -- Should return: 1 | hello replica! -- On REPLICA: confirm it is in standby mode SELECT pg_is_in_recovery(); -- returns TRUE on replica

Failover — Promoting a Replica to Primary

When the primary fails, you need to promote the replica to become the new primary.

═══════════════════════════════════════════ PROMOTE REPLICA TO PRIMARY (Run on 10.0.0.2 machine shell) ═══════════════════════════════════════════ # Option 1: pg_ctl (recommended) sudo -u postgres pg_ctl promote -D /var/lib/postgresql/16/main # Option 2: via SQL (PostgreSQL 12+) SELECT pg_promote(); -- Verify promotion succeeded SELECT pg_is_in_recovery(); -- now returns FALSE (it's a primary!) -- Now update your app's connection string to point to 10.0.0.2
⚠️
After failover: If the old primary comes back online, do NOT let it rejoin as primary — it is now out of date. Rebuild it as a replica using pg_basebackup from the new primary, or use a tool like Patroni which manages this automatically.

Production Tools for Replication Management

In real production systems, you rarely manage replication manually. These tools automate failover, leader election, and replica management:

Patroni
Most popular HA solution. Uses etcd/ZooKeeper to elect a leader. Handles automatic failover. Used by Zalando, GitLab.
pg_auto_failover
Simpler than Patroni. Microsoft-maintained. Good for smaller setups. Monitor + 2 data nodes = HA cluster.
repmgr
2ndQuadrant's replication manager. Handles standby registration, monitoring, failover, and switchover.
pgEdge / Citus
For distributed/multi-primary PostgreSQL — goes beyond basic replication into sharding territory.
6

Common Production Patterns

How real companies architect PostgreSQL replication

Pattern 1: Primary + Hot Standby (HA)

Most common pattern. One primary handles all writes. One or more hot standbys are ready to take over on failure.

High Availability Setup
Application ────► [ PRIMARY ] (writes + reads)WAL streaming │ ▼ [ HOT STANDBY ] (read-only + ready to promote) Patroni / pg_auto_failover monitors both. If primary dies → standby promoted in ~30 seconds.

Pattern 2: Read Scaling with Multiple Replicas

Distribute read-heavy queries (reports, dashboards) across replicas to offload the primary.

Read Scaling Setup
Writes ──► [ PRIMARY ]WAL streaming ┌────────────────┴───────────────────┐ ▼ ▼ [ REPLICA 1 ] [ REPLICA 2 ] App reads ◄─ Reports ◄── PgBouncer or HAProxy round-robins read queries across replicas.

Pattern 3: Logical Replication for CDC / Analytics

Stream changes from your OLTP PostgreSQL into a data warehouse or Kafka, without impacting the primary.

Change Data Capture (CDC)
[ OLTP PRIMARY ] │ wal_level = logical │ ▼ Debezium / pglogical (reads WAL via logical replication slot) │ ├──► Kafka topic: postgres.orders ├──► Elasticsearch index └──► Snowflake / BigQuery (analytics)
This is how modern data platforms work. Your PostgreSQL is the source of truth. Changes flow via WAL to Kafka, then to analytics DBs, search indexes, caches — all without touching your OLTP database.
Topic 14 Summary
WAL
The change log written before data files are touched. Foundation of both crash recovery and replication.
WAL Segments
16 MB files in pg_wal/. Recycled after checkpoints confirm data is safely on disk.
Streaming Replication
Full cluster copy. Primary → Replica via WAL. Best for HA and read scaling. Replica is read-only.
Logical Replication
Table-level, cross-version, writable replicas. Uses Publications + Subscriptions. Best for migrations and CDC.
Next up: Topic 15 — Backup & Recovery (pg_dump, PITR, WAL archiving) — reply OK / Next when ready!