1

Dead Tuples

Why PostgreSQL leaves "ghost" rows behind — and what that costs you

What is a Dead Tuple?

Remember MVCC (Topic 8)? When you UPDATE or DELETE a row, PostgreSQL does not overwrite or remove the old data immediately. Instead it marks the old row version as no longer visible by stamping a transaction ID on xmax. That obsolete row version is called a dead tuple.

Table: orders (heap storage) ┌─────────────────────────────────────────────────────┐ │ Page 1 │ │ │ │ Tuple 1 xmin=100 xmax=0 → LIVE (current row) │ │ Tuple 2 xmin=95 xmax=110 → DEAD (old version) │ │ Tuple 3 xmin=88 xmax=102 → DEAD (deleted row) │ │ Tuple 4 xmin=112 xmax=0 → LIVE (current row) │ │ │ │ Free space wasted by dead tuples: ~40% │ └─────────────────────────────────────────────────────┘

Dead tuples take up real disk space and slow down every sequential scan because PostgreSQL must read them and then discard them while looking for live rows.

How Dead Tuples Accumulate

Any write operation creates dead tuples:

OperationWhat happensDead tuple created?
INSERTAdds a new live tupleNo
UPDATEInserts a new version + marks old as deadYes
DELETEMarks the row as deadYes
ROLLBACKAny inserted rows become deadYes

Seeing Dead Tuples

-- Create a test table CREATE TABLE orders ( id serial PRIMARY KEY, status text ); INSERT INTO orders (status) SELECT 'pending' FROM generate_series(1,10000); -- Now update half of them → 10000 dead tuples created UPDATE orders SET status = 'completed' WHERE id <= 5000; -- Check dead tuples in pg_stat_user_tables SELECT relname AS table_name, n_live_tup AS live_rows, n_dead_tup AS dead_rows, last_vacuum, last_autovacuum FROM pg_stat_user_tables WHERE relname = 'orders'; -- Result will show: -- live_rows = 10000 -- dead_rows = 5000 ← these need to be cleaned up

Why Dead Tuples are Dangerous

Bloat
Tables grow larger than necessary. A 100 MB table can balloon to 500 MB if vacuumed poorly.
Slow Scans
Sequential scans read dead tuples and throw them away — CPU and I/O wasted on every query.
Index Bloat
Index entries pointing to dead heap tuples remain until VACUUM cleans them up.
XID Wraparound
Transaction IDs are 32-bit. Without VACUUM, they can wrap around causing data loss. More on this in section 5.
2

VACUUM

The standard vacuum — non-blocking cleanup of dead tuples

What VACUUM Does

VACUUM scans pages, finds dead tuples that are no longer visible to any active transaction, and marks that space as reusable. It does not return space to the OS — that space stays in the table file but PostgreSQL can reuse it for future inserts/updates.

Non-blocking: Regular VACUUM runs alongside normal reads and writes. It does NOT lock your table.
1
Scan heap pages — identify dead tuples whose xmax is older than the oldest active transaction.
2
Remove index entries — clean up index pointers that reference dead heap tuples.
3
Mark space free — update the Free Space Map (FSM) so new rows can reuse these pages.
4
Update visibility map — mark pages as "all-visible" if every tuple on them is live. Speeds up index-only scans.
5
Advance relfrozenxid — freezes old transaction IDs to prevent wraparound (see section 5).

Running VACUUM

-- Vacuum a specific table VACUUM orders; -- Vacuum with verbose output (see what it's doing) VACUUM (VERBOSE) orders; -- Vacuum and update statistics for the query planner VACUUM ANALYZE orders; -- Vacuum the entire database VACUUM; -- Verbose output from VACUUM shows: -- INFO: vacuuming "public.orders" -- INFO: scanned index "orders_pkey" to remove 5000 row versions -- INFO: "orders": removed 5000 row versions in 28 pages -- INFO: "orders": found 5000 removable, 10000 nonremovable row versions -- INFO: free space: ...

What VACUUM Does NOT Do

⚠️
Space is not returned to OS: VACUUM marks dead space as reusable inside the file, but the file size on disk does NOT shrink. For that, you need VACUUM FULL.
💡
Analogy: Think of a notebook. VACUUM erases the old text (dead tuples) so you can write new text in those pages. But the notebook itself stays the same size — it doesn't shrink. VACUUM FULL would be like rewriting the whole notebook into a smaller one.

Before vs After VACUUM

BEFORE VACUUM Page 1: [LIVE][DEAD][DEAD][LIVE][DEAD] ← 3 wasted slots Page 2: [DEAD][LIVE][DEAD][DEAD][LIVE] ← 3 wasted slots Page 3: [LIVE][LIVE][DEAD][LIVE][DEAD] ← 2 wasted slots AFTER VACUUM Page 1: [LIVE][FREE][FREE][LIVE][FREE] ← 3 reusable slots Page 2: [FREE][LIVE][FREE][FREE][LIVE] ← 3 reusable slots Page 3: [LIVE][LIVE][FREE][LIVE][FREE] ← 2 reusable slots File size on disk: SAME (space reclaimed internally, not from OS)
3

VACUUM FULL

Aggressive space reclamation — use with caution

What VACUUM FULL Does

VACUUM FULL rewrites the entire table into a new file on disk, removing all dead tuples and compacting the live rows. The result: the table file on disk actually shrinks and disk space is returned to the OS.

BEFORE VACUUM FULL orders.dat (500 MB) ┌──────────────────────────────────────────┐ │ LIVE │ DEAD │ DEAD │ LIVE │ DEAD │ LIVE │ │ DEAD │ LIVE │ DEAD │ DEAD │ LIVE │ DEAD │ │ ... 500 MB of mixed live/dead ... │ └──────────────────────────────────────────┘ AFTER VACUUM FULL orders.dat (120 MB) ┌──────────────────────┐ │ LIVE │ LIVE │ LIVE │ │ LIVE │ LIVE │ LIVE │ │ ... 120 MB compact │ └──────────────────────┘ 380 MB returned to OS ✓

Running VACUUM FULL

-- VACUUM FULL on a specific table VACUUM FULL orders; -- With verbose output VACUUM FULL VERBOSE orders; -- Check table size before and after SELECT pg_size_pretty(pg_total_relation_size('orders')) AS total_size, pg_size_pretty(pg_relation_size('orders')) AS table_size, pg_size_pretty(pg_indexes_size('orders')) AS index_size;

The Big Problem: Exclusive Lock

🚨
VACUUM FULL acquires an ACCESS EXCLUSIVE lock. This blocks ALL reads and writes to the table while it runs. On a 10 GB table, this could take 10–30 minutes of downtime. Never run this on a production table during peak hours without a plan!
VACUUMVACUUM FULL
Reclaims dead space?Yes (internally)Yes (to OS)
Shrinks file on disk?NoYes
Locks table?No (non-blocking)Yes (full lock)
Rewrites table?NoYes
Rebuilds indexes?NoYes
Safe in production?YesUse with care

Alternative: pg_repack

If you need the space-reclamation of VACUUM FULL without the lock, use the pg_repack extension. It rebuilds the table in the background and then swaps it in with a very brief lock.

-- Install pg_repack (system level) -- sudo apt-get install postgresql-14-repack -- Enable extension CREATE EXTENSION pg_repack; -- Run pg_repack (from terminal, not psql) -- pg_repack -d mydb -t orders -- This repacks 'orders' online with minimal locking!
4

Autovacuum

PostgreSQL's built-in background janitor — how it works and how to tune it

What is Autovacuum?

Autovacuum is a background process (actually a launcher + workers) that automatically runs VACUUM and ANALYZE when a table accumulates enough dead tuples or stale statistics. It's enabled by default and is the primary reason most PostgreSQL databases don't need manual vacuuming in day-to-day operations.

Autovacuum Architecture PostgreSQL ├── Postmaster (main process) │ ├── autovacuum launcher ← wakes every autovacuum_naptime (default 1 min) │ │ └── Monitors all tables in pg_stat_user_tables │ │ If dead tuples exceed threshold → spawns a worker │ │ │ ├── autovacuum worker 1 ← VACUUMs table A │ ├── autovacuum worker 2 ← VACUUMs table B │ └── autovacuum worker 3 ← ANALYZEs table C │ │ Max concurrent workers: autovacuum_max_workers (default 3)

When Does Autovacuum Trigger?

Autovacuum fires on a table when dead tuples exceed this threshold:

VACUUM trigger formula: threshold = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor × n_live_tup) Default values: autovacuum_vacuum_threshold = 50 (minimum 50 dead tuples) autovacuum_vacuum_scale_factor = 0.2 (20% of table size) Example for a 10,000 row table: threshold = 50 + (0.2 × 10,000) = 2,050 dead tuples → When orders has 2,050 dead tuples, autovacuum kicks in. ANALYZE trigger formula: threshold = autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor × n_live_tup) threshold = 50 + (0.1 × 10,000) = 1,050 changed tuples

Key Autovacuum Settings (postgresql.conf)

-- View current autovacuum settings SHOW autovacuum; SHOW autovacuum_max_workers; SHOW autovacuum_naptime; SHOW autovacuum_vacuum_threshold; SHOW autovacuum_vacuum_scale_factor; -- Or see all at once SELECT name, setting, unit FROM pg_settings WHERE name LIKE 'autovacuum%' ORDER BY name;
SettingDefaultMeaning
autovacuumonEnable autovacuum. Never turn this off!
autovacuum_max_workers3Max concurrent autovacuum workers
autovacuum_naptime1minHow often the launcher checks for work
autovacuum_vacuum_threshold50Minimum dead tuples before vacuum
autovacuum_vacuum_scale_factor0.2% of table that must be dead before vacuum
autovacuum_analyze_threshold50Min changed tuples before ANALYZE
autovacuum_analyze_scale_factor0.1% of table that triggers ANALYZE
autovacuum_vacuum_cost_delay2msThrottle speed to reduce I/O impact

Per-Table Autovacuum Tuning

For high-write tables (e.g., an events log that gets millions of inserts per day), the default 20% scale factor is too high. Tune it per-table:

-- For a high-churn table like 'events' (millions of rows), -- trigger vacuum when 1000 dead tuples accumulate (not 20%) ALTER TABLE events SET ( autovacuum_vacuum_threshold = 1000, autovacuum_vacuum_scale_factor = 0.01, -- 1% instead of 20% autovacuum_analyze_threshold = 500, autovacuum_analyze_scale_factor = 0.005 ); -- For a rarely-updated reference table like 'countries', -- no need to vacuum often ALTER TABLE countries SET ( autovacuum_vacuum_scale_factor = 0.5, -- 50% before triggering autovacuum_analyze_scale_factor = 0.3 ); -- See per-table settings SELECT relname, reloptions FROM pg_class WHERE reloptions IS NOT NULL;

Monitoring Autovacuum Activity

-- See tables most in need of vacuuming SELECT relname AS table_name, n_dead_tup AS dead_rows, n_live_tup AS live_rows, round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 2) AS dead_pct, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 20; -- See currently running autovacuum workers SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';

Autovacuum vs Manual VACUUM — When to Use Each

SituationRecommended Action
Normal day-to-day operationAutovacuum handles it automatically
Just ran a massive DELETE / bulk loadManual VACUUM ANALYZE
Table is severely bloated (need OS space back)Schedule VACUUM FULL during maintenance window
Statistics are stale after large data importManual ANALYZE
Approaching XID wraparound (age > 1 billion)Urgent: Manual VACUUM FREEZE
5

Freeze (XID Wraparound)

The most critical vacuum job — preventing transaction ID exhaustion

The Transaction ID Problem

PostgreSQL uses a 32-bit integer as a transaction ID (XID). That gives ~2.1 billion unique transaction IDs. In a busy system doing 1000 transactions/second, you'd exhaust all IDs in about 25 days. When IDs wrap around, PostgreSQL can no longer tell which rows are "older" than others — leading to data corruption or an emergency shutdown.

Transaction ID Space (32-bit = 2,147,483,648 IDs) 0 500M 1B 1.5B 2B Wraparound |─────────|──────────|──────────|──────────|──────────▶ ↑ DANGER ZONE: past this point, old rows look "in the future" and become INVISIBLE → data loss! PostgreSQL actually uses a circular model: - Half the space (1B) is considered "past" (visible) - Half (1B) is "future" (invisible) - After 2B transactions, XIDs wrap and old rows become "future"

What Freezing Does

Freezing solves wraparound by replacing a tuple's xmin with the special value FrozenTransactionId (XID 2). A frozen tuple is always visible to all transactions, regardless of what XIDs are in use — it has permanently escaped the wraparound problem.

BEFORE freeze: Tuple: xmin = 500 ← a real XID from 3 years ago → After 2B more transactions, this could wrap around AFTER freeze: Tuple: xmin = FrozenXID (2) ← special constant, always visible → Safe forever, no wraparound risk ✓

Monitoring Wraparound Risk

-- Check how close each table is to wraparound -- age() returns: current_xid - relfrozenxid -- If age > 1 billion → urgent vacuum needed! -- If age > 2 billion → PostgreSQL shuts down in self-defense SELECT relname AS table_name, age(relfrozenxid) AS xid_age, pg_size_pretty(pg_relation_size(oid)) AS table_size FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 20; -- Check database-level wraparound risk SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database ORDER BY age(datfrozenxid) DESC; -- Safe zone: age < 200,000,000 (200M) -- Warning: age > 500,000,000 (500M) -- Danger: age > 1,500,000,000 (1.5B) -- Emergency: age > 2,100,000,000 (database shuts down!)

Manual Freeze

-- Force freeze all tuples old enough (standard freeze) VACUUM FREEZE orders; -- Most aggressive: freeze everything in the entire database -- (use during maintenance window only) VACUUM FREEZE; -- autovacuum handles freezing automatically based on: -- vacuum_freeze_min_age (default 50M XIDs old) -- vacuum_freeze_table_age (default 150M XIDs old → full table scan for freeze) SHOW vacuum_freeze_min_age; SHOW vacuum_freeze_table_age;
🚨
Real incident: If you see the PostgreSQL log saying "database is not accepting commands to avoid wraparound data loss" — your DB has shut itself down. You must start it in single-user mode and run VACUUM FREEZE to recover. Always monitor XID age in production!

Autovacuum and Freezing

Autovacuum automatically handles freezing based on two thresholds:

-- autovacuum freezes tuples older than vacuum_freeze_min_age (50M XIDs) -- autovacuum does a full-table freeze scan when table age exceeds -- autovacuum_freeze_max_age (default 200M XIDs) SHOW autovacuum_freeze_max_age; -- 200000000 (200M) -- For safety, autovacuum will force-vacuum a table when its age -- approaches autovacuum_freeze_max_age, even if it doesn't -- have enough dead tuples to normally trigger vacuum. -- This is called an "anti-wraparound vacuum". SELECT relname, age(relfrozenxid) AS table_age, 200000000 - age(relfrozenxid) AS transactions_until_antiwrap_vacuum FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC;
6

Real-World Vacuum Patterns

Practical playbooks for production systems

Playbook 1 — After a Bulk DELETE

You delete 5 million old records from an archive table. Autovacuum won't catch up fast enough. Manually run vacuum immediately:

-- Step 1: Delete old data DELETE FROM events WHERE created_at < NOW() - INTERVAL '1 year'; -- Step 2: Immediately vacuum + analyze so: -- a) dead tuples are reclaimed -- b) query planner gets fresh statistics VACUUM ANALYZE events; -- Optional: if disk space is critical, reclaim space to OS -- (only during a maintenance window!) VACUUM FULL events;

Playbook 2 — High-Traffic Tables

An orders table getting 50,000 updates/minute. Fine-tune autovacuum to be more aggressive:

-- Lower the trigger threshold so vacuum runs more often ALTER TABLE orders SET ( autovacuum_vacuum_threshold = 500, -- trigger after 500 dead tuples autovacuum_vacuum_scale_factor = 0.02, -- or 2% of table size autovacuum_vacuum_cost_delay = 2, -- ms of sleep between cost bursts (reduce I/O) autovacuum_vacuum_cost_limit = 400 -- cost units before sleeping (higher = faster vacuum) );

Playbook 3 — Bloat Detection Query

-- Estimate table bloat (how much of the table is wasted space) SELECT schemaname, tablename, pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS actual_size, n_dead_tup, n_live_tup, round( 100 * n_dead_tup::numeric / nullif(n_dead_tup + n_live_tup, 0), 2 ) AS bloat_pct FROM pg_stat_user_tables WHERE n_live_tup > 0 ORDER BY bloat_pct DESC LIMIT 10; -- Rule of thumb: -- bloat_pct < 20% → healthy -- bloat_pct 20-50% → consider tuning autovacuum -- bloat_pct > 50% → urgent vacuum needed

Playbook 4 — Weekly Health Check Script

-- Run this weekly to catch problems early SELECT '=== VACUUM HEALTH CHECK ===' AS report; -- 1. Tables not vacuumed in > 24 hours SELECT relname, last_autovacuum, n_dead_tup FROM pg_stat_user_tables WHERE (last_autovacuum < NOW() - INTERVAL '1 day' OR last_autovacuum IS NULL) AND n_dead_tup > 1000; -- 2. XID wraparound risk SELECT relname, age(relfrozenxid) AS xid_age FROM pg_class WHERE relkind = 'r' AND age(relfrozenxid) > 500000000 -- 500M threshold ORDER BY age(relfrozenxid) DESC; -- 3. Tables with high bloat SELECT relname, round(n_dead_tup::numeric / nullif(n_live_tup,0)*100,2) AS bloat_pct FROM pg_stat_user_tables WHERE n_live_tup > 1000 ORDER BY bloat_pct DESC LIMIT 5;

Summary — Everything in One View

ConceptWhat It IsKey Action
Dead TupleOld row version left by UPDATE/DELETE via MVCCCleaned by VACUUM
VACUUMMarks dead space as reusable; non-blockingRun after big writes, or let autovacuum handle it
VACUUM FULLRewrites table to disk; shrinks file sizeOnly during maintenance windows (locks table!)
AutovacuumBackground daemon that auto-triggers VACUUM + ANALYZETune per-table for high-churn tables
FreezeReplaces old XID with FrozenXID to prevent wraparoundMonitor age(relfrozenxid) in production

Topic 16 of 18 · Next: Security & RLS