PostgreSQL — Topic 15: Backup & Recovery

pg_dump — Logical Backups

Export your database as SQL or a custom archive

▾

What is pg_dump?

pg_dump is PostgreSQL's built-in tool to export a database into a file you can restore later. It produces a logical backup — SQL statements or a binary archive that recreates your schema and data.

📦

Analogy: pg_dump is like a "Save As" for your entire database. It reads every row and writes it out as INSERT statements (or a compressed archive). You can restore it on any machine with any compatible PostgreSQL version.

pg_dump

Backs up a single database. Most commonly used. Run from any machine that can connect to PostgreSQL.

pg_dumpall

Backs up all databases in the cluster, plus global objects (users, roles). Only outputs plain SQL.

pg_restore

Restores from pg_dump's custom/directory/tar format. Cannot restore plain SQL files (use psql for those).

Full Backup — All Output Formats

pg_dump supports four output formats. The custom format (-Fc) is the most flexible and is recommended for most cases.

          
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  FORMAT 1: Plain SQL  (-Fp or default)
  Output: readable .sql file
  Restore: psql -d mydb -f backup.sql
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
pg_dump -h localhost -U postgres -d mydb -Fp -f backup.sql

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  FORMAT 2: Custom (RECOMMENDED) (-Fc)
  Output: compressed binary .dump file
  Restore: pg_restore -d mydb backup.dump
  Pros: compressed, parallel restore, selective restore
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
pg_dump -h localhost -U postgres -d mydb -Fc -f backup.dump

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  FORMAT 3: Directory (-Fd)
  Output: folder with one file per table
  Restore: pg_restore -d mydb -j 4 ./backup_dir/
  Pros: supports parallel backup AND restore (-j flag)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
pg_dump -h localhost -U postgres -d mydb -Fd -f ./backup_dir/ -j 4

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  FORMAT 4: Tar (-Ft)
  Output: .tar file
  Restore: pg_restore -d mydb backup.tar
  Note: cannot be compressed further easily
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
pg_dump -h localhost -U postgres -d mydb -Ft -f backup.tar
        

Schema-Only Backup

Sometimes you want only the structure (tables, indexes, functions) without any data — useful for setting up a new environment or migrations.

          
-- Schema ONLY (no data)
pg_dump -h localhost -U postgres -d mydb --schema-only -f schema_only.sql

-- Data ONLY (no CREATE TABLE etc.)
pg_dump -h localhost -U postgres -d mydb --data-only -f data_only.sql

-- Specific TABLE backup
pg_dump -h localhost -U postgres -d mydb -t orders -t order_items -Fc -f orders_backup.dump

-- Specific SCHEMA (namespace) backup
pg_dump -h localhost -U postgres -d mydb -n public -Fc -f public_schema.dump

-- Exclude a table (e.g. logs — too large)
pg_dump -h localhost -U postgres -d mydb --exclude-table=audit_logs -Fc -f backup.dump
        

Restoring with pg_restore

Use pg_restore for custom/directory/tar backups. For plain SQL, use psql.

          
-- Restore custom format into an existing (empty) database
-- First create the target database:
createdb -h localhost -U postgres mydb_restored

-- Then restore:
pg_restore -h localhost -U postgres -d mydb_restored backup.dump

-- Parallel restore (4 workers — much faster for large DBs)
pg_restore -h localhost -U postgres -d mydb_restored -j 4 backup.dump

-- Restore ONLY a specific table
pg_restore -h localhost -U postgres -d mydb_restored -t orders backup.dump

-- List what's inside a backup (without restoring)
pg_restore --list backup.dump

-- Restore plain SQL backup
psql -h localhost -U postgres -d mydb_restored -f backup.sql

-- Restore globals (roles, tablespaces) — from pg_dumpall
pg_dumpall -h localhost -U postgres --globals-only -f globals.sql
psql -h localhost -U postgres -f globals.sql
        

⚠️

Common mistake: pg_dump does NOT lock your database or stop writes. It takes a consistent snapshot at the moment it starts. Any changes made during the backup are included if they were committed before the backup started — this is safe for live production backups.

Automating Backups with a Shell Script

In production you always automate backups on a schedule (cron job).

          
#!/bin/bash
# /opt/scripts/pg_backup.sh
# Schedule: 0 2 * * *  (every day at 2 AM via cron)

DB_NAME="mydb"
DB_USER="postgres"
DB_HOST="localhost"
BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.dump"

# Run backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME -Fc -f $BACKUP_FILE

# Check success
if [ $? -eq 0 ]; then
  echo "✅ Backup succeeded: $BACKUP_FILE"
else
  echo "❌ Backup FAILED" | mail -s "PG Backup Failed" ops@company.com
  exit 1
fi

# Delete backups older than 7 days
find $BACKUP_DIR -name "*.dump" -mtime +7 -delete

# Upload to S3 (optional)
aws s3 cp $BACKUP_FILE s3://my-bucket/postgres-backups/
        

PITR — Point In Time Recovery

"Undo" any mistake by rewinding to any moment in the past

▾

What is PITR and Why Does It Exist?

pg_dump gives you a snapshot of a specific moment. But what if someone ran DELETE FROM orders; at 3:47 PM and you don't notice until 4:30 PM? Your last nightly backup was at 2 AM — you'd lose 13+ hours of data.

PITR (Point In Time Recovery) solves this. By continuously archiving WAL files, you can restore your database to any second in the past — not just backup times.

⏱️

Real scenario: "Someone ran DROP TABLE users at 3:47:22 PM. Restore to 3:47:21 PM." PITR can do exactly that — second-level precision.

📖

How it works: PITR = Base Backup + Archived WAL files. PostgreSQL replays WAL from the base backup forward, stopping at the exact time you specify.

How PITR Works — The Big Picture

PITR Architecture

DAILY BASE BACKUP (full pg_basebackup at 2 AM) │ ▼ [Base Backup] ──────── stored on S3 / backup server │ │ then continuously... │ ▼ WAL files generated every minute │ 00000001000000000000001A (2 AM) │ 00000001000000000000001B (2:05 AM) │ ... │ 000000010000000000000041 (3:47 PM) ← mistake here │ 000000010000000000000042 (3:52 PM) │ ──────────────────────────────────────► S3 archive RECOVERY: Base Backup (2 AM) + replay WAL: 2 AM → 3:47:21 PM ← STOP HERE = Database restored to 3:47:21 PM ✅

Step 1: Enable WAL Archiving

WAL archiving tells PostgreSQL to copy each completed WAL segment to a safe location (S3, NFS, another server) before recycling it.

          
-- postgresql.conf settings for WAL archiving
wal_level = replica          -- minimum for archiving
archive_mode = on            -- enable archiving
archive_command = 'aws s3 cp %p s3://my-bucket/wal-archive/%f'
--  %p = full path to WAL file
--  %f = filename only
-- Command must return 0 on success, non-zero on failure

-- Alternative: copy to local archive directory
archive_command = 'test ! -f /mnt/wal-archive/%f && cp %p /mnt/wal-archive/%f'

-- How long to wait before forcing an archive even if WAL isn't full
archive_timeout = 60   -- archive at least every 60 seconds (RPO = 60s)
        

⚠️

RPO (Recovery Point Objective): With archive_timeout = 60, you can lose at most 60 seconds of data. Set it lower for more critical data. The tradeoff is more archive storage used.

Step 2: Take a Base Backup

A base backup is the starting point. PITR replays WAL on top of this base backup. You need a fresh base backup regularly (daily is common).

          
-- Take a base backup (run on the server or from a replica)
pg_basebackup \
  -h localhost \
  -U postgres \
  -D /backups/basebackup_$(date +%Y%m%d) \
  -Ft \               # tar format
  -z \                # gzip compress
  -P \                # show progress
  --wal-method=stream # include WAL generated during backup

-- Upload base backup to S3
aws s3 sync /backups/basebackup_20240115/ s3://my-bucket/basebackups/20240115/
        

Step 3: Perform Point In Time Recovery

When disaster strikes, this is how you recover to a specific moment:

Restore the base backup to a new data directory Download from S3 and extract it. This is your starting point.

Create a recovery config file Tell PostgreSQL where to find WAL archives and when to stop replaying.

Start PostgreSQL — it replays WAL automatically PostgreSQL enters recovery mode, fetches WAL files, replays them, then stops at your target time.

Verify data, then promote Check the database looks correct. Run SELECT pg_wal_replay_resume(); to complete recovery.

          
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  STEP 1: Restore base backup
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main
sudo mkdir /var/lib/postgresql/16/main
sudo chown postgres:postgres /var/lib/postgresql/16/main

# Download and extract base backup from S3
aws s3 sync s3://my-bucket/basebackups/20240115/ /tmp/basebackup/
sudo -u postgres tar -xzf /tmp/basebackup/base.tar.gz \
  -C /var/lib/postgresql/16/main/

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  STEP 2: Create recovery config
  (postgresql.conf or postgresql.auto.conf)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

# Add to postgresql.conf:
restore_command = 'aws s3 cp s3://my-bucket/wal-archive/%f %p'
# %f = WAL filename, %p = destination path on this server

recovery_target_time = '2024-01-15 15:47:21'  -- STOP HERE
recovery_target_action = 'promote'  -- auto-promote after recovery

# Create the signal file to enter recovery mode
sudo touch /var/lib/postgresql/16/main/recovery.signal

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  STEP 3: Start PostgreSQL — it auto-recovers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
sudo systemctl start postgresql

# Watch the logs
sudo tail -f /var/log/postgresql/postgresql-16-main.log
# You'll see: "recovery stopping before commit..."
# Then: "database system is ready to accept connections"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  STEP 4: Verify and confirm
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SELECT pg_is_in_recovery();    -- TRUE = still paused in recovery
SELECT count(*) FROM orders;   -- verify data looks correct

-- If recovery_target_action = 'pause', manually promote:
SELECT pg_wal_replay_resume();
        

Recovery Target Options

You can recover to different types of targets — not just a timestamp.

Target Type	Setting	Example
Specific timestamp	`recovery_target_time`	`'2024-01-15 15:47:21 UTC'`
Specific transaction ID	`recovery_target_xid`	`'1234567'` — stop just before this txn
Named restore point	`recovery_target_name`	`'before_migration'`
Specific LSN	`recovery_target_lsn`	`'0/4A000028'`
Latest possible	`recovery_target = 'immediate'`	Stop as soon as consistent (fastest)

          
-- Create a named restore point BEFORE a risky migration
-- (Run this on the live DB before you run migrations)
SELECT pg_create_restore_point('before_v2_migration');

-- Then in recovery.conf if migration goes wrong:
recovery_target_name = 'before_v2_migration'
recovery_target_action = 'promote'
        

✅

Best practice: Always call pg_create_restore_point('before_<migration_name>') before running major schema migrations in production. It gives you a clean rollback target if something goes wrong.

WAL Recovery

How PostgreSQL auto-heals after a crash

▾

Crash Recovery vs Point-in-Time Recovery

These are two different things that both use WAL:

⚡ Crash Recovery (Automatic)

What: Postgres crashed mid-write
      (power outage, OOM kill, etc.)

How: On next startup, PostgreSQL
     reads WAL and replays any
     committed transactions that
     didn't make it to data files.

You do: Nothing. Automatic.
Time: Seconds to minutes.

Goal: Consistency — no partial
      transactions on disk.

🕐 PITR (Manual)

What: Human error, logical corruption
      (accidental DELETE, DROP TABLE)

How: You restore a base backup +
     replay archived WAL files
     to a specific moment.

You do: Everything (as above).
Time: Minutes to hours.

Goal: Roll back to a point BEFORE
      the mistake was made.

What Happens During Crash Recovery

When PostgreSQL starts after a crash, it automatically goes through a recovery sequence:

Find the last checkpoint PostgreSQL reads the control file (pg_control) to find the most recent checkpoint LSN — this is the safe starting point.

Replay WAL forward from the checkpoint All WAL records after the checkpoint are replayed. Committed transactions are applied; aborted ones are ignored.

Reach end of WAL Recovery stops when it hits the end of available WAL. The database is now in a consistent state.

Open for connections PostgreSQL starts accepting queries. The whole process happens transparently — users just see a brief restart delay.

          
-- See recovery status in the logs after startup:
LOG:  database system was shut down at 2024-01-15 15:40:11 UTC
LOG:  entering standby mode
LOG:  redo starts at 0/3F000028
LOG:  consistent recovery state reached at 0/3F0000F0
LOG:  database system is ready to accept read only connections
LOG:  recovery stopping before commit of transaction 4892, time 2024-01-15 15:47:22

-- Check the control file for database state
pg_controldata /var/lib/postgresql/16/main | grep -E 'state|checkpoint'

-- Typical output:
-- Database cluster state:               in production
-- Latest checkpoint location:           0/3F000028
        

WAL Archiving Tools for Production

Writing your own archiving scripts works, but these battle-tested tools are preferred in production:

Tool	What it does	Used by
pgBackRest	Full backup solution: base backup, WAL archiving, PITR, compression, S3/GCS/Azure support, parallel restore	Most production setups — Recommended
Barman	Backup and Recovery Manager by 2ndQuadrant. Good UI, streaming + archiving support, retention policies	Enterprise PostgreSQL setups
WAL-G	Lightweight Go tool for WAL archiving to cloud storage (S3, GCS, Azure). Used by Heroku Postgres.	Cloud-native setups
pg_dump (cron)	Simple logical backups on a schedule. Easy but no PITR.	Small apps, development No PITR

          
-- pgBackRest quick start (most common production tool)

# Install
sudo apt install pgbackrest

# /etc/pgbackrest.conf
[global]
repo1-path=/backups/pgbackrest
repo1-retention-full=2        # keep last 2 full backups
start-fast=y

[mydb]
pg1-path=/var/lib/postgresql/16/main

# postgresql.conf additions
archive_mode = on
archive_command = 'pgbackrest --stanza=mydb archive-push %p'
restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"'

# Initialize pgBackRest
sudo -u postgres pgbackrest --stanza=mydb stanza-create

# Take a full backup
sudo -u postgres pgbackrest --stanza=mydb --type=full backup

# Restore to specific time
sudo -u postgres pgbackrest --stanza=mydb \
  --target="2024-01-15 15:47:21" \
  --target-action=promote \
  restore
        

Backup Strategy Builder

Get the right backup plan for your scenario

▾

What Backup Do You Need?

🛡️ Backup Strategy Advisor Interactive

App type:

DB size:

Max data loss (RPO):

Max recovery time (RTO):

Select your requirements and click "Build Strategy"

Backup Method Comparison

Method	Granularity	PITR?	Speed	Best for
`pg_dump`	Database / table	No	Slow for large DBs	Small DBs, dev, schema migrations
`pg_basebackup` only	Full cluster	No	Fast (file copy)	Replication setup, base for PITR
pg_basebackup + WAL archive	Full cluster	Yes	Fast restore	Production — most common
pgBackRest	Full cluster + incremental	Yes	Fastest (parallel)	Large production DBs
Managed cloud (RDS, Supabase)	Full cluster	Yes	Managed for you	Cloud-hosted apps — easiest

Backup Golden Rules & Testing

A backup you haven't tested is not a backup

▾

The 3-2-1 Backup Rule

Industry standard for reliable backups:

3 Copies

Keep 3 copies of your data — the original + 2 backups.

2 Different Media

Store on 2 different types of media (local disk + cloud S3).

1 Offsite

Keep 1 copy offsite — different cloud region or physical location.

Always Test Your Restores

🚨

The #1 backup mistake: Never testing restores. Companies discover their backups are corrupt or incomplete only when disaster strikes. Test your restore process every month in a staging environment.

          
-- Monthly restore test script (automate this!)

# 1. Pick the latest backup
LATEST=$(aws s3 ls s3://my-bucket/postgres-backups/ | sort | tail -n1 | awk '{print $4}')

# 2. Restore to a test database
aws s3 cp s3://my-bucket/postgres-backups/$LATEST /tmp/test_restore.dump
createdb -U postgres restore_test
pg_restore -U postgres -d restore_test /tmp/test_restore.dump

# 3. Verify critical tables exist and have data
psql -U postgres -d restore_test -c "
  SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass))
  FROM pg_tables
  WHERE schemaname = 'public'
  ORDER BY pg_total_relation_size(tablename::regclass) DESC
  LIMIT 10;
"

# 4. Check row counts match expected
psql -U postgres -d restore_test -c "SELECT COUNT(*) FROM users;"
psql -U postgres -d restore_test -c "SELECT COUNT(*) FROM orders;"

# 5. Clean up
dropdb -U postgres restore_test
echo "✅ Restore test completed successfully"
        

Production Checklist

Task	Frequency	Why
pg_dump (logical backup)	Daily	Table-level recovery, cross-version portability
pg_basebackup (full)	Daily or weekly	PITR starting point
WAL archiving	Continuous	PITR between backups
Test restore	Monthly	Verify backups actually work
Verify archive_command success	Daily alert	Detect silent failures
Check backup storage size	Weekly	Avoid running out of space
Rotate old backups	Per retention policy	Cost control

✅

Cloud managed databases (AWS RDS, Supabase, Neon, Google Cloud SQL) handle all of this for you automatically — they do continuous WAL archiving and provide PITR out of the box. Use them unless you have specific reasons for self-hosted PostgreSQL.

Topic 15 Summary

pg_dump

Logical backup — SQL or compressed archive. Use -Fc format. Restores with pg_restore. No PITR.

Full Backup

pg_basebackup copies the entire cluster. Foundation for PITR. Use -R flag for replica or -Ft -z for compressed archive.

PITR

Base Backup + continuous WAL archiving = recover to any second. Set archive_mode=on, archive_command, restore_command.

WAL Recovery

Crash recovery is automatic — PostgreSQL replays WAL on startup. PITR is manual — you restore base backup + replay WAL to a target time.

✅ Next up: Topic 16 — Vacuum (dead tuples, autovacuum, VACUUM FULL, freeze) — reply OK / Next when ready!