PostgreSQL — Topic 14: Replication

1

WAL — Write Ahead Log

The foundation everything else is built on

▾

What is WAL?

Before PostgreSQL writes anything to the actual data files on disk, it first writes a record of what it's about to do in a special log file. That log is called the Write Ahead Log (WAL). It's the "paper trail" of every change.

📖

Real-world analogy: Think of a bank teller who writes down every transaction in a ledger before touching the cash. If the power goes out mid-transaction, they read the ledger to know exactly what happened and what didn't.

WAL serves two major purposes:

Crash Recovery

If Postgres crashes, it replays WAL records on restart to restore a consistent state — no data loss.

Replication

WAL records are streamed to replica servers, which replay them to stay in sync with the primary.

WAL Segments

WAL is not one giant file — it's divided into fixed-size files called WAL segments. Each segment is 16 MB by default.

WAL Segment files on disk

$PGDATA/pg_wal/ ├── 000000010000000000000001 ← oldest segment ├── 000000010000000000000002 ├── 000000010000000000000003 └── 000000010000000000000004 ← current (being written)

The filename encodes three things: timeline, log file number, and segment number. You don't need to decode it manually — PostgreSQL manages these automatically.

            
-- Check current WAL location
SELECT pg_current_wal_lsn();
-- Returns: 0/3A2F1B8  (LSN = Log Sequence Number)

-- How much WAL has been generated?
SELECT pg_size_pretty(pg_wal_lsn_diff(
  pg_current_wal_lsn(),
  '0/0'::pg_lsn
));
          

⚠️

LSN (Log Sequence Number) is a position in the WAL stream, like a byte offset. Every WAL record has an LSN. Replicas use their LSN to know where to resume reading from.

Checkpoints

A checkpoint is when PostgreSQL flushes all dirty (modified) pages from memory to the actual data files on disk, and records a checkpoint LSN in WAL. After a checkpoint, WAL segments older than that LSN can safely be recycled.

1

Dirty pages accumulate Changes happen in memory (shared_buffers). Data files are NOT yet updated.

2

Checkpoint triggered Either by time (checkpoint_timeout, default 5 min) or WAL size (max_wal_size, default 1 GB).

3

Dirty pages flushed to disk All modified data pages are written from memory to the actual .dat files.

4

Checkpoint record written to WAL WAL now has a safe restart point. Old WAL segments before this point can be removed.

            
-- Force a checkpoint (useful before backups)
CHECKPOINT;

-- See checkpoint activity
SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time
FROM   pg_stat_bgwriter;
          

2

Streaming Replication

Primary → Replica in near real-time

▾

How Streaming Replication Works

In streaming replication, the replica server connects to the primary over a TCP connection and continuously receives WAL records as they are generated — just like a live stream. The replica replays those records to keep its data identical to the primary.

Streaming Replication Architecture

[ PRIMARY SERVER ] │ App writes INSERT/UPDATE/DELETE │ ▼ WAL Buffer → WAL Segment Files │ │ WAL Sender Process (on primary) │ ─────────────────────────────► │ TCP / network │ ◄───────────────────────────── │ Standby reports its LSN │ [ REPLICA SERVER ] │ WAL Receiver Process (on replica) │ receives WAL records ▼ Startup Process replays WAL → replica data = primary data

✅

Result: The replica is a byte-for-byte copy of the primary. It can be promoted to primary in seconds if the primary fails (high availability).

Primary Configuration

On the primary server, you need to enable WAL streaming and create a replication user.

            
-- postgresql.conf on PRIMARY
wal_level = replica          -- must be 'replica' or 'logical'
max_wal_senders = 10        -- max concurrent replica connections
wal_keep_size = 1024        -- keep 1 GB of WAL for slow replicas (MB)

-- pg_hba.conf on PRIMARY (allow replica to connect)
-- TYPE  DATABASE        USER            ADDRESS         METHOD
host  replication     replicator      10.0.0.2/32     md5
          

            
-- Create replication user on PRIMARY
CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'strongpassword';

Replica Configuration

The replica needs to know where the primary is. This is done via a connection string.

            
-- Step 1: Take a base backup from primary (run on replica machine)
pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/data -P -R
-- -R flag auto-creates standby.signal + postgresql.auto.conf

-- Step 2: The -R flag writes this into postgresql.auto.conf on replica:
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=strongpassword'

-- Step 3: Start the replica PostgreSQL service
-- That's it! It connects to primary and starts streaming.
          

💡

standby.signal is just an empty file in the data directory. Its presence tells PostgreSQL to start in standby (replica) mode instead of primary mode. PostgreSQL 12+ uses this file instead of the old recovery.conf.

Monitoring Replication

Always monitor replication lag — if the replica is falling behind, it might not be safe to failover to it.

            
-- On PRIMARY: see all connected replicas
SELECT
  client_addr,
  state,             -- 'streaming', 'catchup', 'backup'
  sent_lsn,          -- WAL position sent to replica
  write_lsn,         -- WAL written to replica disk
  flush_lsn,         -- WAL flushed on replica disk
  replay_lsn,        -- WAL applied (replayed) on replica
  pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replication_lag
FROM pg_stat_replication;

-- On REPLICA: check replay lag in seconds
SELECT
  now() - pg_last_xact_replay_timestamp() AS replication_delay;
          

Synchronous vs Asynchronous Replication

By default, streaming replication is asynchronous — the primary confirms a transaction commit without waiting for the replica. This is fast but means a small window of potential data loss on failover.

✅ Synchronous

-- Primary WAITS for replica to confirm
-- before returning success to client.

synchronous_standby_names = 'replica1'

Pros: Zero data loss on failover
Cons: Latency increases (network RTT added)
Use: Banking, payment systems

⚡ Asynchronous (default)

-- Primary does NOT wait for replica.
-- Returns success immediately.

synchronous_standby_names = ''  (default)

Pros: No latency added
Cons: Small risk of losing last few
      transactions on failover
Use: Most web apps, read scaling

            
-- Enable synchronous replication in postgresql.conf
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
-- FIRST 1 = wait for at least 1 replica to confirm

-- Per-transaction override (useful for critical transactions)
SET synchronous_commit = on;   -- wait for replica flush
SET synchronous_commit = off;  -- don't wait (async)
          

Replication Slots

A replication slot ensures PostgreSQL doesn't delete WAL segments until the replica has consumed them. This is crucial for slow replicas that might otherwise be "left behind" when old WAL is recycled.

            
-- Create a replication slot on PRIMARY
SELECT pg_create_physical_replication_slot('replica1_slot');

-- View slots
SELECT slot_name, slot_type, active, restart_lsn
FROM  pg_replication_slots;

-- On REPLICA: configure to use the slot
primary_slot_name = 'replica1_slot'
          

🚨

Danger: If a replica with a slot goes offline for a long time, the primary will keep accumulating WAL forever (to avoid losing the replica's position). This can fill up your disk! Always monitor slot lag and drop unused slots.

3

Logical Replication

Replicate individual tables — not the whole cluster

▾

Streaming vs Logical — What's the Difference?

Feature	Streaming Replication	Logical Replication
What it replicates	Entire PostgreSQL cluster (all databases)	Selected tables within a database
Replica must be same PG version	Yes	No (can be different PG version)
Replica can have extra indexes/tables	No	Yes
Replica can be written to	No (read-only)	Yes (other tables)
Use case	HA failover, read replicas	Cross-version migrations, analytics, partial replication
Performance overhead	Low	Slightly higher (decoding WAL)

Publications

A publication is defined on the publisher (source) side. It specifies which tables and operations (INSERT, UPDATE, DELETE) to replicate.

            
-- On PUBLISHER (source DB): enable logical decoding
-- In postgresql.conf:
wal_level = logical   -- must be 'logical' (stronger than 'replica')

-- Create a publication for ALL tables
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- Create a publication for SPECIFIC tables
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items;

-- Create a publication for specific operations only
CREATE PUBLICATION inserts_only_pub
  FOR TABLE events
  WITH (publish = 'insert');   -- only INSERT, no UPDATE/DELETE

-- View publications
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;
          

Subscriptions

A subscription is defined on the subscriber (destination) side. It connects to the publisher and pulls the changes.

            
-- On SUBSCRIBER (destination DB): create subscription
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=10.0.0.1 dbname=mydb user=replicator password=pass'
  PUBLICATION my_pub;

-- This automatically:
-- 1. Copies all existing data (initial snapshot)
-- 2. Then streams ongoing changes

-- Check subscription status
SELECT subname, subenabled, received_lsn
FROM  pg_stat_subscription;

-- Disable/enable a subscription
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub ENABLE;

-- Drop subscription
DROP SUBSCRIPTION my_sub;
          

⚠️

Tables must exist on subscriber: Logical replication does NOT create tables. You must create them manually (same schema) on the subscriber before subscribing.

Real-World Use Case: Zero-Downtime Major Version Upgrade

This is the most common practical use of logical replication — upgrading from PostgreSQL 14 → 16 with zero downtime.

1

Spin up a new PG 16 server Create all schemas and tables (you can use pg_dump --schema-only for this).

2

Create publication on PG 14 (old) CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

3

Create subscription on PG 16 (new) PG 16 copies all data, then streams changes from PG 14 in real-time.

4

Wait for lag = 0 Monitor pg_stat_subscription until the new server is fully caught up.

5

Switch app connection string Point your app to the new PG 16 server. Drop the subscription. Old server decommissioned.

4

Replication Decision Tool

Which type of replication should you use?

▾

Interactive Advisor

Answer a few questions and get a replication recommendation with the exact SQL to use.

🔧 Replication Advisor Interactive

Goal:

Data loss tolerance:

Same PG version?

Fill in the options above and click "Get Recommendation"

WAL Level Quick Reference

wal_level	Supports	When to use
`minimal`	Crash recovery only	Standalone server, no replication
`replica`	Streaming replication	HA setup, read replicas
`logical`	Streaming + Logical replication	Cross-version migration, CDC, partial replication

💡

Best practice: Most production setups use wal_level = replica by default. Only switch to logical when you need logical replication or CDC (Change Data Capture) tools like Debezium.

5

Full Setup Walkthrough

Complete primary + replica setup from scratch

▾

Complete Streaming Replication Setup

Here is the full step-by-step to set up streaming replication between two servers: Primary: 10.0.0.1 and Replica: 10.0.0.2

🖥️

Environment: Both servers run PostgreSQL 16, Ubuntu 22.04. Replace IPs with your actual server addresses.

            
═══════════════════════════════════════════
  STEP 1: Configure PRIMARY (10.0.0.1)
═══════════════════════════════════════════

-- Edit postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 5
wal_keep_size = 256        -- keep 256 MB of WAL

-- Edit pg_hba.conf (add this line)
host  replication  replicator  10.0.0.2/32  scram-sha-256

-- Create replication user (run in psql)
CREATE USER replicator
  WITH REPLICATION
  LOGIN
  PASSWORD 'SecurePass123!';

-- Reload config
SELECT pg_reload_conf();
          

            
═══════════════════════════════════════════
  STEP 2: Initialize REPLICA (10.0.0.2)
  (Run on the replica machine shell)
═══════════════════════════════════════════

# Stop replica postgres if running
sudo systemctl stop postgresql

# Remove existing data directory
sudo rm -rf /var/lib/postgresql/16/main

# Take base backup from primary
sudo -u postgres pg_basebackup \
  -h 10.0.0.1 \
  -U replicator \
  -D /var/lib/postgresql/16/main \
  -P \          # show progress
  -R \          # auto-create standby.signal & connstring
  --wal-method=stream

# Start replica
sudo systemctl start postgresql
          

            
═══════════════════════════════════════════
  STEP 3: Verify Replication is Working
═══════════════════════════════════════════

-- On PRIMARY: check replica is connected
SELECT client_addr, state, replay_lsn,
       pg_size_pretty(pg_wal_lsn_diff(
         pg_current_wal_lsn(), replay_lsn
       )) AS lag
FROM pg_stat_replication;
-- Should show: 10.0.0.2 | streaming | 0/3A00000 | 0 bytes

-- On PRIMARY: insert a test row
CREATE TABLE repl_test (id SERIAL, msg TEXT);
INSERT INTO repl_test (msg) VALUES ('hello replica!');

-- On REPLICA: verify it appeared
SELECT * FROM repl_test;
-- Should return: 1 | hello replica!

-- On REPLICA: confirm it is in standby mode
SELECT pg_is_in_recovery();   -- returns TRUE on replica
          

Failover — Promoting a Replica to Primary

When the primary fails, you need to promote the replica to become the new primary.

            
═══════════════════════════════════════════
  PROMOTE REPLICA TO PRIMARY
  (Run on 10.0.0.2 machine shell)
═══════════════════════════════════════════

# Option 1: pg_ctl (recommended)
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/16/main

# Option 2: via SQL (PostgreSQL 12+)
SELECT pg_promote();

-- Verify promotion succeeded
SELECT pg_is_in_recovery();   -- now returns FALSE (it's a primary!)

-- Now update your app's connection string to point to 10.0.0.2
          

⚠️

After failover: If the old primary comes back online, do NOT let it rejoin as primary — it is now out of date. Rebuild it as a replica using pg_basebackup from the new primary, or use a tool like Patroni which manages this automatically.

Production Tools for Replication Management

In real production systems, you rarely manage replication manually. These tools automate failover, leader election, and replica management:

Patroni

Most popular HA solution. Uses etcd/ZooKeeper to elect a leader. Handles automatic failover. Used by Zalando, GitLab.

pg_auto_failover

Simpler than Patroni. Microsoft-maintained. Good for smaller setups. Monitor + 2 data nodes = HA cluster.

repmgr

2ndQuadrant's replication manager. Handles standby registration, monitoring, failover, and switchover.

pgEdge / Citus

For distributed/multi-primary PostgreSQL — goes beyond basic replication into sharding territory.

6

Common Production Patterns

How real companies architect PostgreSQL replication

▾

Pattern 1: Primary + Hot Standby (HA)

Most common pattern. One primary handles all writes. One or more hot standbys are ready to take over on failure.

High Availability Setup

Application ────► [ PRIMARY ] (writes + reads) │ WAL streaming │ ▼ [ HOT STANDBY ] (read-only + ready to promote) Patroni / pg_auto_failover monitors both. If primary dies → standby promoted in ~30 seconds.

Pattern 2: Read Scaling with Multiple Replicas

Distribute read-heavy queries (reports, dashboards) across replicas to offload the primary.

Read Scaling Setup

Writes ──► [ PRIMARY ] │ WAL streaming ┌────────────────┴───────────────────┐ ▼ ▼ [ REPLICA 1 ] [ REPLICA 2 ] App reads ◄─ Reports ◄── PgBouncer or HAProxy round-robins read queries across replicas.

Pattern 3: Logical Replication for CDC / Analytics

Stream changes from your OLTP PostgreSQL into a data warehouse or Kafka, without impacting the primary.

Change Data Capture (CDC)

[ OLTP PRIMARY ] │ wal_level = logical │ ▼ Debezium / pglogical (reads WAL via logical replication slot) │ ├──► Kafka topic: postgres.orders ├──► Elasticsearch index └──► Snowflake / BigQuery (analytics)

✅

This is how modern data platforms work. Your PostgreSQL is the source of truth. Changes flow via WAL to Kafka, then to analytics DBs, search indexes, caches — all without touching your OLTP database.