Full Text Search
Search inside text like a search engine — no external tools, built right into PostgreSQL.
What Is Full Text Search?
LIKE vs FTS — and how PostgreSQL thinks about text
🤔 The Problem with LIKE
The obvious way to search text is LIKE — but it has serious limitations in production:
SELECT * FROM articles WHERE body LIKE '%running%'; Problems: • Can't use indexes (full table scan) • Won't match "run", "runs", "runner" • Case sensitive by default • No relevance ranking • Extremely slow on large tables
SELECT * FROM articles
WHERE to_tsvector('english', body)
@@ to_tsquery('english', 'running');
Benefits:
• GIN index — lightning fast
• Matches "run", "runs", "runner"
• Language-aware stemming
• Relevance ranking (ts_rank)
• Scales to millions of rows
🧠 How PostgreSQL FTS Works
PostgreSQL FTS has two main building blocks:
& AND, | OR, ! NOT, <-> phrase). This is what the user searches for.
RAW TEXT: "PostgreSQL is running fast and Postgres runs well"
│
▼ to_tsvector('english', ...)
│
┌────────────────────────────────────────────┐
│ Parser: split into tokens │
│ "PostgreSQL" "is" "running" "fast" ... │
│ │ │
│ Stop words removed: "is", "and", "well" │
│ │ │
│ Stemmer: "running" → "run" │
│ "runs" → "run" │
│ "fast" → "fast" │
│ │ │
│ RESULT tsvector: │
│ 'fast':4 'postg':1,7 'run':3,6 │
└────────────────────────────────────────────┘
│
▼ @@ (match operator)
│
tsquery: to_tsquery('english', 'run & fast')
= 'run' & 'fast'
│
▼
MATCH! ✅ (both lexemes appear in tsvector)
🔑 Key Concepts at a Glance
| Concept | What It Is | Example |
|---|---|---|
| Lexeme | Normalised word stem — the dictionary form | "running" → run |
| Stop word | Common word filtered out (carries no search value) | "is", "the", "and", "a" |
| Stemming | Reducing a word to its root form | "quickly" → quick |
| tsvector | Preprocessed, indexed form of a document | 'fast':2 'run':1,5 |
| tsquery | A search expression with boolean logic | 'run' & 'fast' |
| @@ | The match operator — does tsvector match tsquery? | tsvector @@ tsquery |
| ts_rank | Ranks results by relevance (0.0–1.0) | ts_rank(vec, query) |
| GIN index | Index type for fast FTS queries | CREATE INDEX ... USING GIN |
to_tsvector
Converting text into a searchable document
📖 Syntax & Basic Usage
to_tsvector(language, text) converts raw text into a tsvector. The language controls the stop-word list and stemmer.
-- Basic conversion SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog'); -- Result: -- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2 -- Notice: "The", "over" removed (stop words) -- "jumps" → "jump", "lazy" → "lazi" (stemmed) -- Each word gets a position number (used for phrase search) SELECT to_tsvector('english', 'PostgreSQL runs fast. PostgreSQL is powerful.'); -- 'fast':3 'postgresql':1,5 'powerful':6 'run':2 -- "postgresql" appears at positions 1 and 5 -- Without language — uses default_text_search_config SELECT to_tsvector('Databases are amazing tools for data storage'); -- 'amaz':3 'data':6 'databas':1 'storag':7 'tool':4
🔤 What Happens to Each Word
Let's trace the sentence "The users are quickly running their PostgreSQL databases" step by step:
-- Verify the stemming yourself: SELECT to_tsvector('english', 'The users are quickly running their PostgreSQL databases'); -- Result: 'databas':8 'postgresql':7 'quick':4 'run':5 'user':2 -- Positions: user=2, quick=4, run=5, postgresql=7, databas=8
🗄️ Storing tsvector in a Table Column
For production, don't call to_tsvector() on every query — store the result in a dedicated column and keep it updated with a trigger or generated column.
-- Option 1: Generated column (PG 12+) — auto-updated on INSERT/UPDATE CREATE TABLE articles ( id BIGSERIAL PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL, author TEXT, created_at TIMESTAMPTZ DEFAULT NOW(), -- Auto-generated tsvector: weighted title (A) + body (B) search_vec TSVECTOR GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B') ) STORED ); -- Option 2: Trigger-based (PG 11 and earlier) CREATE TABLE articles ( id BIGSERIAL PRIMARY KEY, title TEXT, body TEXT, search_vec TSVECTOR -- manually maintained ); -- Trigger to keep search_vec up to date CREATE OR REPLACE FUNCTION articles_search_update() RETURNS TRIGGER AS $$ BEGIN NEW.search_vec := setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') || setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B'); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER articles_search_trigger BEFORE INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE FUNCTION articles_search_update();
ts_rank. This is how Google-style "title match beats body match" works.🌍 Supported Languages
-- List all available text search configurations (languages) SELECT cfgname FROM pg_ts_config; -- Common ones: -- english, spanish, french, german, portuguese, -- italian, dutch, russian, simple -- 'simple' — no stemming, no stop words (good for codes/IDs) SELECT to_tsvector('simple', 'The quick brown fox'); -- 'brown':3 'fox':4 'quick':2 'the':1 (all words kept!) -- Check current default SHOW default_text_search_config; -- usually 'pg_catalog.english' -- Set default for session SET default_text_search_config = 'english';
to_tsquery
Building search queries with boolean logic
🔍 Query Functions
PostgreSQL provides three functions to build a tsquery, each with different strictness:
| Function | Behaviour | Best For |
|---|---|---|
to_tsquery() |
Each word must be a valid lexeme. Errors on stop words. | Internal/programmatic queries where you control input |
plainto_tsquery() |
Plain text → AND of all words. Stop words silently ignored. | Simple user search box — "postgres fast query" |
phraseto_tsquery() |
Words must appear in order, adjacent. | Exact phrase search — "machine learning" |
websearch_to_tsquery() |
Google-style: AND/OR/NOT/phrases from natural input. | User-facing search with "quotes" and -exclusions |
🛠️ to_tsquery — Full Boolean Control
-- AND: both words must appear SELECT to_tsquery('english', 'postgresql & performance'); -- 'postgresql' & 'perform' (both must match) -- OR: either word SELECT to_tsquery('english', 'postgresql | mysql'); -- 'postgresql' | 'mysql' -- NOT: word must NOT appear SELECT to_tsquery('english', 'database & !nosql'); -- 'databas' & !'nosql' -- Phrase: words adjacent in order (distance operator) SELECT to_tsquery('english', 'machine <-> learning'); -- 'machin' <1> 'learn' (must appear next to each other) -- Custom distance: within N words SELECT to_tsquery('english', 'query <2> optimization'); -- must appear within 2 positions of each other -- Prefix matching with :* SELECT to_tsquery('english', 'postg:*'); -- matches: postgres, postgresql, postgreSQL ✅ -- Combined complex query SELECT to_tsquery('english', '(postgresql | mysql) & (index | performance) & !slow'); -- Reads: (postgresql or mysql) AND (index or performance) AND NOT slow
💬 plainto_tsquery — Best for User Input
-- Just paste raw user text — it handles everything SELECT plainto_tsquery('english', 'the quick brown fox'); -- 'quick' & 'brown' & 'fox' ("the" removed as stop word) SELECT plainto_tsquery('english', 'postgres query optimization tips'); -- 'postgr' & 'queri' & 'optim' & 'tip' -- No error on stop words (unlike to_tsquery) SELECT plainto_tsquery('english', 'the is are'); -- '' (empty — all stop words, no error) ⚠️ check for empty result
🌐 websearch_to_tsquery — Google-Style
-- Double quotes = phrase search SELECT websearch_to_tsquery('english', '"machine learning" database'); -- 'machin' <-> 'learn' & 'databas' -- Minus = NOT SELECT websearch_to_tsquery('english', 'postgresql -mysql'); -- 'postgresql' & !'mysql' -- OR keyword SELECT websearch_to_tsquery('english', 'postgres OR mysql fast'); -- ('postgresql' | 'mysql') & 'fast' -- Real-world: build a search API endpoint SELECT id, title FROM articles WHERE search_vec @@ websearch_to_tsquery('english', 'postgresql "query planner" -slow') ORDER BY ts_rank(search_vec, websearch_to_tsquery('english', 'postgresql "query planner" -slow')) DESC LIMIT 10;
⚡ The Match Operator @@
-- @@ returns TRUE if tsvector matches tsquery SELECT to_tsvector('english', 'PostgreSQL is fast and powerful') @@ to_tsquery('english', 'fast & powerful') AS match; -- Result: true ✅ SELECT to_tsvector('english', 'PostgreSQL is fast and powerful') @@ to_tsquery('english', 'fast & mysql') AS match; -- Result: false ❌ (mysql not in text) -- Using in a WHERE clause — this is the core FTS query pattern SELECT id, title FROM articles WHERE search_vec @@ plainto_tsquery('english', 'postgres index performance');
Ranking with ts_rank
Sorting results by relevance — like a real search engine
📊 What Is ts_rank?
ts_rank(tsvector, tsquery) returns a float between 0 and 1 indicating how relevant a document is to the search query. Higher = more relevant. It considers frequency of the matching lexemes and their weights (A > B > C > D).
-- Basic ranking: sort by relevance score descending SELECT id, title, ts_rank(search_vec, to_tsquery('english', 'postgresql & index')) AS rank FROM articles WHERE search_vec @@ to_tsquery('english', 'postgresql & index') ORDER BY rank DESC LIMIT 10;
⚖️ ts_rank vs ts_rank_cd
| Function | Algorithm | Use When |
|---|---|---|
ts_rank() |
Based on frequency of matching lexemes. The more times a word appears, the higher the rank. | General purpose. Short documents (articles, descriptions). |
ts_rank_cd() |
Cover Density ranking — also considers proximity of matched terms to each other. | Long documents where clustering of terms matters (books, manuals). |
-- ts_rank_cd — better for long documents SELECT id, title, ts_rank_cd(search_vec, query) AS rank FROM articles, to_tsquery('english', 'postgresql & performance') query WHERE search_vec @@ query ORDER BY rank DESC;
🏷️ Weight Ranking — Title Beats Body
When you use setweight(), words from title (A) are ranked higher than words from body (B). You can tune the weight multipliers:
-- Default weights: D=0.1, C=0.2, B=0.4, A=1.0 -- Pass custom weights array: {D, C, B, A} SELECT id, title, ts_rank( search_vec, to_tsquery('english', 'postgresql'), 32 -- normalization: divide rank by document length ) AS rank FROM articles WHERE search_vec @@ to_tsquery('english', 'postgresql') ORDER BY rank DESC; -- Normalization values (can be OR'd together): -- 0 = ignore document length (default) -- 1 = divide by 1 + log(length) -- 2 = divide by document length -- 4 = divide by mean harmonic distance between extents -- 8 = divide by number of unique words -- 16 = divide by 1 + log(unique word count) -- 32 = divide by rank + 1 (most common normalization)
📌 ts_headline — Highlight Matches
ts_headline() wraps matching words in your text with HTML tags — essential for showing search snippets like Google does.
-- Generate highlighted snippet SELECT id, title, ts_headline( 'english', body, to_tsquery('english', 'postgresql & index'), 'MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=false, MaxFragments=2, StartSel=<b>, StopSel=</b>' ) AS snippet FROM articles WHERE search_vec @@ to_tsquery('english', 'postgresql & index') ORDER BY ts_rank(search_vec, to_tsquery('english', 'postgresql & index')) DESC LIMIT 5; -- Example output for snippet: -- "...using <b>PostgreSQL</b> you can create a GIN <b>index</b> -- on any tsvector column for fast full text..." -- Options explained: -- MaxWords/MinWords: size of the snippet -- MaxFragments: how many text fragments to return -- StartSel/StopSel: tags to wrap matched words in -- HighlightAll: if true, highlight entire document
WHERE filter first — only highlight the top results you're returning, not the whole table.📈 Ranking in Action — Visual Example
Three documents, same query 'postgresql & index' — rank varies by frequency and weight:
GIN Indexes
Making full text search lightning fast at scale
⚡ Why GIN?
GIN (Generalized Inverted Index) is the right index type for FTS. It stores a posting list — for each lexeme, a list of document IDs that contain it. This lets PostgreSQL find all matching documents for a lexeme in O(1) time.
Lexeme │ Document IDs (posting list) ──────────────┼────────────────────────────────────── 'databas' │ [1, 5, 12, 44, 203, 901 ...] 'fast' │ [2, 5, 7, 44, 100, 203 ...] 'index' │ [1, 2, 5, 19, 44, 55, 203 ...] 'postgresql' │ [1, 2, 5, 7, 12, 19, 44, 203 ...] 'run' │ [3, 8, 44, 100 ...] Query: 'postgresql' & 'index' → Lookup 'postgresql' posting list: [1,2,5,7,12,19,44,203...] → Lookup 'index' posting list: [1,2,5,19,44,55,203...] → Intersect: [1,2,5,19,44,203...] → Fetch only those rows ✅ (no full scan!)
🛠️ Creating GIN Indexes
-- Option A: Index the stored tsvector column (BEST — fastest queries) CREATE INDEX idx_articles_search ON articles USING GIN (search_vec); -- Option B: Functional index — no extra column needed -- (slightly slower on INSERT but no extra column storage) CREATE INDEX idx_articles_fts ON articles USING GIN ( to_tsvector('english', title || ' ' || body) ); -- Option C: Separate indexes per column (when weights matter) CREATE INDEX idx_articles_title_gin ON articles USING GIN (to_tsvector('english', title)); CREATE INDEX idx_articles_body_gin ON articles USING GIN (to_tsvector('english', body)); -- Build concurrently on large tables (no lock) CREATE INDEX CONCURRENTLY idx_articles_search_gin ON articles USING GIN (search_vec);
tsvector in a generated column and create the GIN index on that column. This gives the fastest query performance and keeps the index up-to-date automatically.📊 GIN vs GiST for FTS
| Index | Query Speed | Build Speed | Update Speed | Space | Use For FTS? |
|---|---|---|---|---|---|
| GIN | Fast | Slow | Slower | Larger | ✅ Preferred |
| GiST | Moderate | Fast | Faster | Smaller | Only if heavy writes |
✅ Verify Index Is Being Used
-- Run EXPLAIN to verify GIN is used EXPLAIN (ANALYZE, BUFFERS) SELECT id, title FROM articles WHERE search_vec @@ to_tsquery('english', 'postgresql & index'); -- Look for in EXPLAIN output: -- Bitmap Index Scan on idx_articles_search ← GIN used ✅ -- Index Cond: (search_vec @@ '''postgresql'' & ''index''') -- If you see Seq Scan — index not used ❌ -- Reasons: table too small (planner prefers seq scan), -- or functional index language mismatch
Interactive: FTS Playground
See tsvector, tsquery, and results in action
🧪 tsvector Preview
Type any text and see what PostgreSQL's lexer produces (stop words removed, words stemmed).
📄 to_tsvector Simulator
🔎 Live Search Simulator
Search across sample articles and see ranked results with highlighted snippets.
🗞️ Article Search Demo
⚙️ Full Production Query Builder
🏗️ Generate FTS SQL
Real-World FTS Patterns
Complete search implementations for production
📰 Blog / Document Search (Complete Setup)
-- 1. Create the table with generated tsvector CREATE TABLE articles ( id BIGSERIAL PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL, tags TEXT[], author_id BIGINT NOT NULL, published BOOLEAN DEFAULT false, created_at TIMESTAMPTZ DEFAULT NOW(), -- Weighted vector: title (A) > body (B) search_vec TSVECTOR GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B') ) STORED ); -- 2. GIN index on the search vector CREATE INDEX idx_articles_search ON articles USING GIN(search_vec); -- 3. The search query with ranking + headline SELECT id, title, author_id, created_at, ts_rank(search_vec, query) AS rank, ts_headline('english', body, query, 'MaxWords=30, MinWords=15, MaxFragments=2') AS snippet FROM articles, websearch_to_tsquery('english', 'postgresql indexing') query WHERE search_vec @@ query AND published = true -- only show published ORDER BY rank DESC LIMIT 10;
💬 AI Chat: Message Search
-- Search inside chat history for a specific user CREATE TABLE messages ( id BIGSERIAL PRIMARY KEY, conversation_id BIGINT NOT NULL, user_id BIGINT NOT NULL, role TEXT NOT NULL, -- 'user' | 'assistant' content TEXT NOT NULL, created_at TIMESTAMPTZ DEFAULT NOW(), search_vec TSVECTOR GENERATED ALWAYS AS ( to_tsvector('english', content) ) STORED ); CREATE INDEX idx_messages_search ON messages USING GIN(search_vec); CREATE INDEX idx_messages_user ON messages(user_id, created_at DESC); -- Find messages mentioning "refund" for a specific user SELECT m.id, m.conversation_id, m.role, m.created_at, ts_headline('english', m.content, plainto_tsquery('english', 'refund payment')) AS snippet FROM messages m WHERE m.user_id = 42 AND m.search_vec @@ plainto_tsquery('english', 'refund payment') ORDER BY m.created_at DESC LIMIT 20;
🛒 Product Search with Filters
CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, description TEXT, category TEXT, price NUMERIC(10,2), in_stock BOOLEAN DEFAULT TRUE, search_vec TSVECTOR GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(name, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(category, '')), 'C') ) STORED ); CREATE INDEX idx_products_search ON products USING GIN(search_vec); -- Search products: text + category filter + price range SELECT id, name, price, category, ts_rank(search_vec, websearch_to_tsquery('english', 'wireless headphones')) AS rank FROM products WHERE search_vec @@ websearch_to_tsquery('english', 'wireless headphones') AND category = 'Electronics' AND price BETWEEN 500 AND 5000 AND in_stock = true ORDER BY rank DESC LIMIT 20;
💡 FTS Best Practices
to_tsvector() at query time skips the index. Pre-compute and store it.WHERE + ORDER BY rank LIMIT first, then apply ts_headline to those rows only.WHERE price < 500 AND category = 'X' work together. PostgreSQL uses bitmap AND to combine the GIN index with B-tree indexes.to_tsquery (strict) · plainto_tsquery (plain) · websearch_to_tsquery (google-style)ts_rank() by frequency · ts_rank_cd() by density · ts_headline() for snippetsCREATE INDEX USING GIN on it.