1

pg_dump — Logical Backups

Export your database as SQL or a custom archive

What is pg_dump?

pg_dump is PostgreSQL's built-in tool to export a database into a file you can restore later. It produces a logical backup — SQL statements or a binary archive that recreates your schema and data.

📦
Analogy: pg_dump is like a "Save As" for your entire database. It reads every row and writes it out as INSERT statements (or a compressed archive). You can restore it on any machine with any compatible PostgreSQL version.
pg_dump
Backs up a single database. Most commonly used. Run from any machine that can connect to PostgreSQL.
pg_dumpall
Backs up all databases in the cluster, plus global objects (users, roles). Only outputs plain SQL.
pg_restore
Restores from pg_dump's custom/directory/tar format. Cannot restore plain SQL files (use psql for those).

Full Backup — All Output Formats

pg_dump supports four output formats. The custom format (-Fc) is the most flexible and is recommended for most cases.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMAT 1: Plain SQL (-Fp or default) Output: readable .sql file Restore: psql -d mydb -f backup.sql ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ pg_dump -h localhost -U postgres -d mydb -Fp -f backup.sql ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMAT 2: Custom (RECOMMENDED) (-Fc) Output: compressed binary .dump file Restore: pg_restore -d mydb backup.dump Pros: compressed, parallel restore, selective restore ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ pg_dump -h localhost -U postgres -d mydb -Fc -f backup.dump ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMAT 3: Directory (-Fd) Output: folder with one file per table Restore: pg_restore -d mydb -j 4 ./backup_dir/ Pros: supports parallel backup AND restore (-j flag) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ pg_dump -h localhost -U postgres -d mydb -Fd -f ./backup_dir/ -j 4 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMAT 4: Tar (-Ft) Output: .tar file Restore: pg_restore -d mydb backup.tar Note: cannot be compressed further easily ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ pg_dump -h localhost -U postgres -d mydb -Ft -f backup.tar

Schema-Only Backup

Sometimes you want only the structure (tables, indexes, functions) without any data — useful for setting up a new environment or migrations.

-- Schema ONLY (no data) pg_dump -h localhost -U postgres -d mydb --schema-only -f schema_only.sql -- Data ONLY (no CREATE TABLE etc.) pg_dump -h localhost -U postgres -d mydb --data-only -f data_only.sql -- Specific TABLE backup pg_dump -h localhost -U postgres -d mydb -t orders -t order_items -Fc -f orders_backup.dump -- Specific SCHEMA (namespace) backup pg_dump -h localhost -U postgres -d mydb -n public -Fc -f public_schema.dump -- Exclude a table (e.g. logs — too large) pg_dump -h localhost -U postgres -d mydb --exclude-table=audit_logs -Fc -f backup.dump

Restoring with pg_restore

Use pg_restore for custom/directory/tar backups. For plain SQL, use psql.

-- Restore custom format into an existing (empty) database -- First create the target database: createdb -h localhost -U postgres mydb_restored -- Then restore: pg_restore -h localhost -U postgres -d mydb_restored backup.dump -- Parallel restore (4 workers — much faster for large DBs) pg_restore -h localhost -U postgres -d mydb_restored -j 4 backup.dump -- Restore ONLY a specific table pg_restore -h localhost -U postgres -d mydb_restored -t orders backup.dump -- List what's inside a backup (without restoring) pg_restore --list backup.dump -- Restore plain SQL backup psql -h localhost -U postgres -d mydb_restored -f backup.sql -- Restore globals (roles, tablespaces) — from pg_dumpall pg_dumpall -h localhost -U postgres --globals-only -f globals.sql psql -h localhost -U postgres -f globals.sql
⚠️
Common mistake: pg_dump does NOT lock your database or stop writes. It takes a consistent snapshot at the moment it starts. Any changes made during the backup are included if they were committed before the backup started — this is safe for live production backups.

Automating Backups with a Shell Script

In production you always automate backups on a schedule (cron job).

#!/bin/bash # /opt/scripts/pg_backup.sh # Schedule: 0 2 * * * (every day at 2 AM via cron) DB_NAME="mydb" DB_USER="postgres" DB_HOST="localhost" BACKUP_DIR="/backups/postgres" TIMESTAMP=$(date +"%Y%m%d_%H%M%S") BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.dump" # Run backup pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME -Fc -f $BACKUP_FILE # Check success if [ $? -eq 0 ]; then echo "✅ Backup succeeded: $BACKUP_FILE" else echo "❌ Backup FAILED" | mail -s "PG Backup Failed" ops@company.com exit 1 fi # Delete backups older than 7 days find $BACKUP_DIR -name "*.dump" -mtime +7 -delete # Upload to S3 (optional) aws s3 cp $BACKUP_FILE s3://my-bucket/postgres-backups/
2

PITR — Point In Time Recovery

"Undo" any mistake by rewinding to any moment in the past

What is PITR and Why Does It Exist?

pg_dump gives you a snapshot of a specific moment. But what if someone ran DELETE FROM orders; at 3:47 PM and you don't notice until 4:30 PM? Your last nightly backup was at 2 AM — you'd lose 13+ hours of data.

PITR (Point In Time Recovery) solves this. By continuously archiving WAL files, you can restore your database to any second in the past — not just backup times.

⏱️
Real scenario: "Someone ran DROP TABLE users at 3:47:22 PM. Restore to 3:47:21 PM." PITR can do exactly that — second-level precision.
📖
How it works: PITR = Base Backup + Archived WAL files. PostgreSQL replays WAL from the base backup forward, stopping at the exact time you specify.

How PITR Works — The Big Picture

PITR Architecture
DAILY BASE BACKUP (full pg_basebackup at 2 AM) │ ▼ [Base Backup] ──────── stored on S3 / backup server │ │ then continuously... │ ▼ WAL files generated every minute │ 00000001000000000000001A (2 AM) │ 00000001000000000000001B (2:05 AM) │ ... │ 000000010000000000000041 (3:47 PM) ← mistake here │ 000000010000000000000042 (3:52 PM) │ ──────────────────────────────────────► S3 archive RECOVERY: Base Backup (2 AM) + replay WAL: 2 AM → 3:47:21 PM ← STOP HERE = Database restored to 3:47:21 PM ✅

Step 1: Enable WAL Archiving

WAL archiving tells PostgreSQL to copy each completed WAL segment to a safe location (S3, NFS, another server) before recycling it.

-- postgresql.conf settings for WAL archiving wal_level = replica -- minimum for archiving archive_mode = on -- enable archiving archive_command = 'aws s3 cp %p s3://my-bucket/wal-archive/%f' -- %p = full path to WAL file -- %f = filename only -- Command must return 0 on success, non-zero on failure -- Alternative: copy to local archive directory archive_command = 'test ! -f /mnt/wal-archive/%f && cp %p /mnt/wal-archive/%f' -- How long to wait before forcing an archive even if WAL isn't full archive_timeout = 60 -- archive at least every 60 seconds (RPO = 60s)
⚠️
RPO (Recovery Point Objective): With archive_timeout = 60, you can lose at most 60 seconds of data. Set it lower for more critical data. The tradeoff is more archive storage used.

Step 2: Take a Base Backup

A base backup is the starting point. PITR replays WAL on top of this base backup. You need a fresh base backup regularly (daily is common).

-- Take a base backup (run on the server or from a replica) pg_basebackup \ -h localhost \ -U postgres \ -D /backups/basebackup_$(date +%Y%m%d) \ -Ft \ # tar format -z \ # gzip compress -P \ # show progress --wal-method=stream # include WAL generated during backup -- Upload base backup to S3 aws s3 sync /backups/basebackup_20240115/ s3://my-bucket/basebackups/20240115/

Step 3: Perform Point In Time Recovery

When disaster strikes, this is how you recover to a specific moment:

1
Restore the base backup to a new data directory Download from S3 and extract it. This is your starting point.
2
Create a recovery config file Tell PostgreSQL where to find WAL archives and when to stop replaying.
3
Start PostgreSQL — it replays WAL automatically PostgreSQL enters recovery mode, fetches WAL files, replays them, then stops at your target time.
4
Verify data, then promote Check the database looks correct. Run SELECT pg_wal_replay_resume(); to complete recovery.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 1: Restore base backup ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ sudo systemctl stop postgresql sudo rm -rf /var/lib/postgresql/16/main sudo mkdir /var/lib/postgresql/16/main sudo chown postgres:postgres /var/lib/postgresql/16/main # Download and extract base backup from S3 aws s3 sync s3://my-bucket/basebackups/20240115/ /tmp/basebackup/ sudo -u postgres tar -xzf /tmp/basebackup/base.tar.gz \ -C /var/lib/postgresql/16/main/ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 2: Create recovery config (postgresql.conf or postgresql.auto.conf) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ # Add to postgresql.conf: restore_command = 'aws s3 cp s3://my-bucket/wal-archive/%f %p' # %f = WAL filename, %p = destination path on this server recovery_target_time = '2024-01-15 15:47:21' -- STOP HERE recovery_target_action = 'promote' -- auto-promote after recovery # Create the signal file to enter recovery mode sudo touch /var/lib/postgresql/16/main/recovery.signal ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 3: Start PostgreSQL — it auto-recovers ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ sudo systemctl start postgresql # Watch the logs sudo tail -f /var/log/postgresql/postgresql-16-main.log # You'll see: "recovery stopping before commit..." # Then: "database system is ready to accept connections" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 4: Verify and confirm ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SELECT pg_is_in_recovery(); -- TRUE = still paused in recovery SELECT count(*) FROM orders; -- verify data looks correct -- If recovery_target_action = 'pause', manually promote: SELECT pg_wal_replay_resume();

Recovery Target Options

You can recover to different types of targets — not just a timestamp.

Target TypeSettingExample
Specific timestamp recovery_target_time '2024-01-15 15:47:21 UTC'
Specific transaction ID recovery_target_xid '1234567' — stop just before this txn
Named restore point recovery_target_name 'before_migration'
Specific LSN recovery_target_lsn '0/4A000028'
Latest possible recovery_target = 'immediate' Stop as soon as consistent (fastest)
-- Create a named restore point BEFORE a risky migration -- (Run this on the live DB before you run migrations) SELECT pg_create_restore_point('before_v2_migration'); -- Then in recovery.conf if migration goes wrong: recovery_target_name = 'before_v2_migration' recovery_target_action = 'promote'
Best practice: Always call pg_create_restore_point('before_<migration_name>') before running major schema migrations in production. It gives you a clean rollback target if something goes wrong.
3

WAL Recovery

How PostgreSQL auto-heals after a crash

Crash Recovery vs Point-in-Time Recovery

These are two different things that both use WAL:

⚡ Crash Recovery (Automatic)
What: Postgres crashed mid-write
      (power outage, OOM kill, etc.)

How: On next startup, PostgreSQL
     reads WAL and replays any
     committed transactions that
     didn't make it to data files.

You do: Nothing. Automatic.
Time: Seconds to minutes.

Goal: Consistency — no partial
      transactions on disk.
🕐 PITR (Manual)
What: Human error, logical corruption
      (accidental DELETE, DROP TABLE)

How: You restore a base backup +
     replay archived WAL files
     to a specific moment.

You do: Everything (as above).
Time: Minutes to hours.

Goal: Roll back to a point BEFORE
      the mistake was made.

What Happens During Crash Recovery

When PostgreSQL starts after a crash, it automatically goes through a recovery sequence:

1
Find the last checkpoint PostgreSQL reads the control file (pg_control) to find the most recent checkpoint LSN — this is the safe starting point.
2
Replay WAL forward from the checkpoint All WAL records after the checkpoint are replayed. Committed transactions are applied; aborted ones are ignored.
3
Reach end of WAL Recovery stops when it hits the end of available WAL. The database is now in a consistent state.
4
Open for connections PostgreSQL starts accepting queries. The whole process happens transparently — users just see a brief restart delay.
-- See recovery status in the logs after startup: LOG: database system was shut down at 2024-01-15 15:40:11 UTC LOG: entering standby mode LOG: redo starts at 0/3F000028 LOG: consistent recovery state reached at 0/3F0000F0 LOG: database system is ready to accept read only connections LOG: recovery stopping before commit of transaction 4892, time 2024-01-15 15:47:22 -- Check the control file for database state pg_controldata /var/lib/postgresql/16/main | grep -E 'state|checkpoint' -- Typical output: -- Database cluster state: in production -- Latest checkpoint location: 0/3F000028

WAL Archiving Tools for Production

Writing your own archiving scripts works, but these battle-tested tools are preferred in production:

ToolWhat it doesUsed by
pgBackRest Full backup solution: base backup, WAL archiving, PITR, compression, S3/GCS/Azure support, parallel restore Most production setups — Recommended
Barman Backup and Recovery Manager by 2ndQuadrant. Good UI, streaming + archiving support, retention policies Enterprise PostgreSQL setups
WAL-G Lightweight Go tool for WAL archiving to cloud storage (S3, GCS, Azure). Used by Heroku Postgres. Cloud-native setups
pg_dump (cron) Simple logical backups on a schedule. Easy but no PITR. Small apps, development No PITR
-- pgBackRest quick start (most common production tool) # Install sudo apt install pgbackrest # /etc/pgbackrest.conf [global] repo1-path=/backups/pgbackrest repo1-retention-full=2 # keep last 2 full backups start-fast=y [mydb] pg1-path=/var/lib/postgresql/16/main # postgresql.conf additions archive_mode = on archive_command = 'pgbackrest --stanza=mydb archive-push %p' restore_command = 'pgbackrest --stanza=mydb archive-get %f "%p"' # Initialize pgBackRest sudo -u postgres pgbackrest --stanza=mydb stanza-create # Take a full backup sudo -u postgres pgbackrest --stanza=mydb --type=full backup # Restore to specific time sudo -u postgres pgbackrest --stanza=mydb \ --target="2024-01-15 15:47:21" \ --target-action=promote \ restore
4

Backup Strategy Builder

Get the right backup plan for your scenario

What Backup Do You Need?

🛡️ Backup Strategy Advisor Interactive
Select your requirements and click "Build Strategy"

Backup Method Comparison

Method Granularity PITR? Speed Best for
pg_dump Database / table No Slow for large DBs Small DBs, dev, schema migrations
pg_basebackup only Full cluster No Fast (file copy) Replication setup, base for PITR
pg_basebackup + WAL archive Full cluster Yes Fast restore Production — most common
pgBackRest Full cluster + incremental Yes Fastest (parallel) Large production DBs
Managed cloud (RDS, Supabase) Full cluster Yes Managed for you Cloud-hosted apps — easiest
5

Backup Golden Rules & Testing

A backup you haven't tested is not a backup

The 3-2-1 Backup Rule

Industry standard for reliable backups:

3 Copies
Keep 3 copies of your data — the original + 2 backups.
2 Different Media
Store on 2 different types of media (local disk + cloud S3).
1 Offsite
Keep 1 copy offsite — different cloud region or physical location.

Always Test Your Restores

🚨
The #1 backup mistake: Never testing restores. Companies discover their backups are corrupt or incomplete only when disaster strikes. Test your restore process every month in a staging environment.
-- Monthly restore test script (automate this!) # 1. Pick the latest backup LATEST=$(aws s3 ls s3://my-bucket/postgres-backups/ | sort | tail -n1 | awk '{print $4}') # 2. Restore to a test database aws s3 cp s3://my-bucket/postgres-backups/$LATEST /tmp/test_restore.dump createdb -U postgres restore_test pg_restore -U postgres -d restore_test /tmp/test_restore.dump # 3. Verify critical tables exist and have data psql -U postgres -d restore_test -c " SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass)) FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(tablename::regclass) DESC LIMIT 10; " # 4. Check row counts match expected psql -U postgres -d restore_test -c "SELECT COUNT(*) FROM users;" psql -U postgres -d restore_test -c "SELECT COUNT(*) FROM orders;" # 5. Clean up dropdb -U postgres restore_test echo "✅ Restore test completed successfully"

Production Checklist

TaskFrequencyWhy
pg_dump (logical backup)DailyTable-level recovery, cross-version portability
pg_basebackup (full)Daily or weeklyPITR starting point
WAL archivingContinuousPITR between backups
Test restoreMonthlyVerify backups actually work
Verify archive_command successDaily alertDetect silent failures
Check backup storage sizeWeeklyAvoid running out of space
Rotate old backupsPer retention policyCost control
Cloud managed databases (AWS RDS, Supabase, Neon, Google Cloud SQL) handle all of this for you automatically — they do continuous WAL archiving and provide PITR out of the box. Use them unless you have specific reasons for self-hosted PostgreSQL.
Topic 15 Summary
pg_dump
Logical backup — SQL or compressed archive. Use -Fc format. Restores with pg_restore. No PITR.
Full Backup
pg_basebackup copies the entire cluster. Foundation for PITR. Use -R flag for replica or -Ft -z for compressed archive.
PITR
Base Backup + continuous WAL archiving = recover to any second. Set archive_mode=on, archive_command, restore_command.
WAL Recovery
Crash recovery is automatic — PostgreSQL replays WAL on startup. PITR is manual — you restore base backup + replay WAL to a target time.
Next up: Topic 16 — Vacuum (dead tuples, autovacuum, VACUUM FULL, freeze) — reply OK / Next when ready!