Vacuum
How PostgreSQL keeps tables healthy, reclaims space, prevents transaction ID wraparound, and lets autovacuum do the heavy lifting automatically.
Dead Tuples
Why PostgreSQL leaves "ghost" rows behind — and what that costs you
What is a Dead Tuple?
Remember MVCC (Topic 8)? When you UPDATE or DELETE a row,
PostgreSQL does not overwrite or remove the old data immediately.
Instead it marks the old row version as no longer visible by stamping a
transaction ID on xmax. That obsolete row version is called a
dead tuple.
Dead tuples take up real disk space and slow down every sequential scan because PostgreSQL must read them and then discard them while looking for live rows.
How Dead Tuples Accumulate
Any write operation creates dead tuples:
| Operation | What happens | Dead tuple created? |
|---|---|---|
INSERT | Adds a new live tuple | No |
UPDATE | Inserts a new version + marks old as dead | Yes |
DELETE | Marks the row as dead | Yes |
ROLLBACK | Any inserted rows become dead | Yes |
Seeing Dead Tuples
Why Dead Tuples are Dangerous
VACUUM
The standard vacuum — non-blocking cleanup of dead tuples
What VACUUM Does
VACUUM scans pages, finds dead tuples that are no longer visible to any
active transaction, and marks that space as reusable. It does
not return space to the OS — that space stays in the table file but
PostgreSQL can reuse it for future inserts/updates.
xmax is older than the oldest active transaction.Running VACUUM
What VACUUM Does NOT Do
VACUUM FULL.Before vs After VACUUM
VACUUM FULL
Aggressive space reclamation — use with caution
What VACUUM FULL Does
VACUUM FULL rewrites the entire table into a new file on disk, removing
all dead tuples and compacting the live rows. The result: the table file on disk
actually shrinks and disk space is returned to the OS.
Running VACUUM FULL
The Big Problem: Exclusive Lock
| VACUUM | VACUUM FULL | |
|---|---|---|
| Reclaims dead space? | Yes (internally) | Yes (to OS) |
| Shrinks file on disk? | No | Yes |
| Locks table? | No (non-blocking) | Yes (full lock) |
| Rewrites table? | No | Yes |
| Rebuilds indexes? | No | Yes |
| Safe in production? | Yes | Use with care |
Alternative: pg_repack
If you need the space-reclamation of VACUUM FULL without the lock, use the
pg_repack extension. It rebuilds the table in the background and then swaps
it in with a very brief lock.
Autovacuum
PostgreSQL's built-in background janitor — how it works and how to tune it
What is Autovacuum?
Autovacuum is a background process (actually a launcher + workers) that
automatically runs VACUUM and ANALYZE when a table accumulates
enough dead tuples or stale statistics. It's enabled by default and is the primary
reason most PostgreSQL databases don't need manual vacuuming in day-to-day operations.
When Does Autovacuum Trigger?
Autovacuum fires on a table when dead tuples exceed this threshold:
Key Autovacuum Settings (postgresql.conf)
| Setting | Default | Meaning |
|---|---|---|
autovacuum | on | Enable autovacuum. Never turn this off! |
autovacuum_max_workers | 3 | Max concurrent autovacuum workers |
autovacuum_naptime | 1min | How often the launcher checks for work |
autovacuum_vacuum_threshold | 50 | Minimum dead tuples before vacuum |
autovacuum_vacuum_scale_factor | 0.2 | % of table that must be dead before vacuum |
autovacuum_analyze_threshold | 50 | Min changed tuples before ANALYZE |
autovacuum_analyze_scale_factor | 0.1 | % of table that triggers ANALYZE |
autovacuum_vacuum_cost_delay | 2ms | Throttle speed to reduce I/O impact |
Per-Table Autovacuum Tuning
For high-write tables (e.g., an events log that gets millions of inserts per day),
the default 20% scale factor is too high. Tune it per-table:
Monitoring Autovacuum Activity
Autovacuum vs Manual VACUUM — When to Use Each
| Situation | Recommended Action |
|---|---|
| Normal day-to-day operation | Autovacuum handles it automatically |
| Just ran a massive DELETE / bulk load | Manual VACUUM ANALYZE |
| Table is severely bloated (need OS space back) | Schedule VACUUM FULL during maintenance window |
| Statistics are stale after large data import | Manual ANALYZE |
| Approaching XID wraparound (age > 1 billion) | Urgent: Manual VACUUM FREEZE |
Freeze (XID Wraparound)
The most critical vacuum job — preventing transaction ID exhaustion
The Transaction ID Problem
PostgreSQL uses a 32-bit integer as a transaction ID (XID). That gives ~2.1 billion unique transaction IDs. In a busy system doing 1000 transactions/second, you'd exhaust all IDs in about 25 days. When IDs wrap around, PostgreSQL can no longer tell which rows are "older" than others — leading to data corruption or an emergency shutdown.
What Freezing Does
Freezing solves wraparound by replacing a tuple's xmin
with the special value FrozenTransactionId (XID 2). A frozen tuple is
always visible to all transactions, regardless of what XIDs are in use — it
has permanently escaped the wraparound problem.
Monitoring Wraparound Risk
Manual Freeze
Autovacuum and Freezing
Autovacuum automatically handles freezing based on two thresholds:
Real-World Vacuum Patterns
Practical playbooks for production systems
Playbook 1 — After a Bulk DELETE
You delete 5 million old records from an archive table. Autovacuum won't catch up fast enough. Manually run vacuum immediately:
Playbook 2 — High-Traffic Tables
An orders table getting 50,000 updates/minute. Fine-tune autovacuum
to be more aggressive:
Playbook 3 — Bloat Detection Query
Playbook 4 — Weekly Health Check Script
Summary — Everything in One View
| Concept | What It Is | Key Action |
|---|---|---|
| Dead Tuple | Old row version left by UPDATE/DELETE via MVCC | Cleaned by VACUUM |
| VACUUM | Marks dead space as reusable; non-blocking | Run after big writes, or let autovacuum handle it |
| VACUUM FULL | Rewrites table to disk; shrinks file size | Only during maintenance windows (locks table!) |
| Autovacuum | Background daemon that auto-triggers VACUUM + ANALYZE | Tune per-table for high-churn tables |
| Freeze | Replaces old XID with FrozenXID to prevent wraparound | Monitor age(relfrozenxid) in production |