PostgreSQL — Topic 13: JSONB Deep Dive

💡

Why JSONB? JSON vs JSONB

What makes JSONB special — and when to use it

▾

JSON vs JSONB — Core Difference

PostgreSQL has two JSON types. Understanding the difference is the foundation of everything in this topic.

Feature	`json`	`jsonb`
Storage format	Plain text, stored as-is	Binary (decomposed & parsed)
Write speed	⚡ Faster (just stores text)	Slightly slower (parses on insert)
Read/Query speed	🐢 Slow (parses on every read)	⚡ Very fast (already parsed)
Indexing	❌ No GIN support	✅ Full GIN indexing
Key order preserved	✅ Yes	❌ No (sorted internally)
Duplicate keys	✅ Kept (last one wins on parse)	❌ Last value kept
Use in production	Rarely	✅ Always prefer this

⚠️

Rule: Always use jsonb unless you specifically need to preserve key order or exact formatting. In production, jsonb is the standard.

When Should You Use JSONB?

JSONB is powerful — but not a replacement for relational design. Use it when the data is truly semi-structured or variable.

✅ Good Fit

Product attributes that vary per category (size, color, wattage, material…)

✅ Good Fit

User preferences / settings object unique per user

✅ Good Fit

External API payloads / webhook logs you want to store raw

✅ Good Fit

Audit log metadata, event properties (analytics events)

❌ Bad Fit

Data you JOIN on, filter heavily, or report on across many rows

❌ Bad Fit

Relationship data that should be a foreign key (user_id, order_id)

Creating a Table with JSONB

            
-- E-commerce product with variable attributes
CREATE TABLE products (
  id          SERIAL PRIMARY KEY,
  name        TEXT NOT NULL,
  price       NUMERIC(10,2),
  category    TEXT,
  attributes  JSONB  -- variable per category
);

-- Insert a laptop
INSERT INTO products (name, price, category, attributes)
VALUES (
  'ThinkPad X1', 1299.99, 'laptop',
  '{"ram": "16GB", "cpu": "i7", "storage": "512GB SSD", "ports": ["USB-C", "HDMI", "USB-A"]}'
);

-- Insert a T-shirt (totally different attributes)
INSERT INTO products (name, price, category, attributes)
VALUES (
  'Classic Tee', 29.99, 'clothing',
  '{"color": "navy", "sizes": ["S","M","L","XL"], "material": "100% Cotton"}'
);
          

✅

See the power: Two products in one table, each with completely different attributes — no schema change needed.

1

JSONB Operators

-> ->> #> #>> and more

▾

The Four Core Extraction Operators

These operators let you dig into a JSONB value to retrieve nested fields.

->

Get JSON value (keeps JSON type)

Extracts a field by key. Returns JSONB — use when you want to pass result to another operator.

attributes -> 'ram'

→ returns: "16GB" (as jsonb)

->>

Get TEXT value (unwraps quotes)

Same as -> but returns TEXT. Use when comparing, filtering, or displaying values.

attributes ->> 'ram'

→ returns: 16GB (plain text, no quotes)

#>

Get JSON at path (nested)

Extracts nested value at a path array. Returns JSONB.

attributes #> '{ports,0}'

→ returns: "USB-C" (as jsonb)

#>>

Get TEXT at path (nested)

Same as #> but returns TEXT. Most useful for deeply nested leaf values.

attributes #>> '{ports,0}'

→ returns: USB-C (plain text)

💡

Mental model: Single arrow (->) = one key. Double hash-arrow (#>) = path of many keys. Single tail (->) = JSONB result. Double tail (->>) = TEXT result.

Using Operators in Real Queries

            
-- Get the RAM of all laptops (as TEXT for display)
SELECT name, attributes ->> 'ram' AS ram
FROM products
WHERE category = 'laptop';
-- result: ThinkPad X1 | 16GB

-- Filter: products where RAM = '16GB'
SELECT name
FROM products
WHERE attributes ->> 'ram' = '16GB';

-- Get first port (index 0 from array)
SELECT attributes #>> '{ports,0}' AS first_port
FROM products
WHERE name = 'ThinkPad X1';
-- result: USB-C

-- Numeric comparison (cast to numeric)
SELECT name
FROM products
WHERE (attributes ->> 'price_discount')::numeric > 10;
          

Containment and Existence Operators

These operators let you check whether a JSONB value contains another, or whether a key exists. Critical for filtering and GIN index usage.

Operator	Meaning	Example
`@>`	Left contains right (containment)	`attributes @> '{"ram":"16GB"}'`
`<@`	Left is contained by right	`'{"ram":"16GB"}' <@ attributes`
`?`	Key exists at top level	`attributes ? 'ram'`
`?\|`	Any of the keys exist	`attributes ?\| array['ram','cpu']`
`?&`	All of the keys exist	`attributes ?& array['ram','cpu']`

            
-- Find products with RAM = 16GB (containment — uses GIN index!)
SELECT name
FROM products
WHERE attributes @> '{"ram": "16GB"}';

-- Find products that HAVE a 'color' key
SELECT name
FROM products
WHERE attributes ? 'color';

-- Find products that have BOTH 'ram' and 'cpu' keys
SELECT name
FROM products
WHERE attributes ?& array['ram', 'cpu'];

-- Array containment: find products that have 'HDMI' in ports array
SELECT name
FROM products
WHERE attributes @> '{"ports": ["HDMI"]}';
          

🚀

GIN Index Tip: The @>, ?, ?|, ?& operators are all supported by GIN indexes. This makes them extremely fast on large tables. We'll set this up in Section 2.

Other Useful Operators

Operator / Function	Purpose	Example
`\|\|`	Concatenate / merge two JSONB objects	`'{"a":1}' \|\| '{"b":2}'` → `{"a":1,"b":2}`
`-` (minus key)	Delete a key from JSONB	`attributes - 'ram'`
`#-`	Delete at path	`attributes #- '{ports,0}'`
`jsonb_set()`	Set/update a value at path	`jsonb_set(attributes, '{ram}', '"32GB"')`
`jsonb_pretty()`	Format JSONB as readable string	`jsonb_pretty(attributes)`
`jsonb_each()`	Expand object to (key, value) rows	`SELECT * FROM jsonb_each(attributes)`
`jsonb_array_elements()`	Expand JSON array to rows	`SELECT * FROM jsonb_array_elements(attributes->'ports')`

            
-- Merge extra data into existing attributes
UPDATE products
SET attributes = attributes || '{"weight_kg": 1.4}'
WHERE name = 'ThinkPad X1';

-- Update just one nested key
UPDATE products
SET attributes = jsonb_set(attributes, '{ram}', '"32GB"')
WHERE name = 'ThinkPad X1';

-- Delete a key
UPDATE products
SET attributes = attributes - 'storage'
WHERE id = 1;

-- Expand ports array to individual rows
SELECT name, jsonb_array_elements_text(attributes -> 'ports') AS port
FROM products
WHERE category = 'laptop';
-- result:
-- ThinkPad X1 | USB-C
-- ThinkPad X1 | HDMI
-- ThinkPad X1 | USB-A
          

🧪 Interactive: Operator Explorer

Try JSONB Operators Simulated

Select an operator to see what it does with the sample product data below.

{
  "ram": "16GB",
  "cpu": "i7",
  "storage": "512GB SSD",
  "ports": ["USB-C", "HDMI", "USB-A"],
  "in_stock": true,
  "rating": 4.5
}

Choose operator:

Select an operator above and click Run.

2

JSONB Indexing

GIN indexes — the secret to fast JSONB queries at scale

▾

Why JSONB Needs Indexes

Without an index, filtering on a JSONB column does a full sequential scan — reads every row and inspects the JSON. On a table with 10M rows, this is catastrophically slow.

🐢

No index: WHERE attributes ->> 'ram' = '16GB' on 1M rows = full table scan, ~2-5 seconds. With GIN index = <10ms.

GIN Index — The Default Choice

GIN (Generalized Inverted Index) indexes every key-value pair inside the JSONB document. Think of it like a book index — it builds a map of "where does key X with value Y appear?"

            
-- Create a GIN index on the whole JSONB column
-- This indexes ALL keys and values inside the document
CREATE INDEX idx_products_attributes
  ON products USING GIN (attributes);

-- Now these queries use the index:
SELECT * FROM products WHERE attributes @> '{"ram": "16GB"}';
SELECT * FROM products WHERE attributes ? 'color';
SELECT * FROM products WHERE attributes ?& array['ram', 'cpu'];
          

GIN Operator Classes

PostgreSQL has two GIN operator classes for JSONB. Choose based on your use case:

Operator Class	Supports	Index Size	Best For
`jsonb_ops` (default)	`@> ? ?\| ?&`	Larger	General JSONB queries — containment & key existence
`jsonb_path_ops`	`@>` only	Smaller ✅	When you only need `@>` (containment). Faster & smaller.

            
-- Default (jsonb_ops) — all operators supported
CREATE INDEX idx_attrs_full
  ON products USING GIN (attributes);

-- jsonb_path_ops — smaller index, only @> supported
-- Use when you ONLY ever use @> containment queries
CREATE INDEX idx_attrs_path
  ON products USING GIN (attributes jsonb_path_ops);
          

Expression Indexes on JSONB (B-Tree)

If you always query a specific key inside JSONB, a regular B-Tree expression index is often faster and smaller than GIN.

            
-- B-Tree index on a specific JSONB key (extracted as text)
CREATE INDEX idx_products_ram
  ON products ((attributes ->> 'ram'));

-- Now this query uses the B-Tree index (much faster for equality)
SELECT * FROM products
WHERE attributes ->> 'ram' = '16GB';

-- Partial index: only index in-stock laptops with known RAM
CREATE INDEX idx_instock_ram
  ON products ((attributes ->> 'ram'))
  WHERE category = 'laptop'
    AND (attributes ->> 'in_stock')::boolean = TRUE;
          

Choosing the Right Index Strategy

GIN (full)

When you query many different keys, use ?, or don't know which key up front

GIN (path_ops)

When you only ever use @> containment. Smaller & faster for that case

B-Tree expression

When you always query the same known key. Best equality performance

Partial B-Tree

Combine a WHERE clause + expression index for hyper-targeted queries

⚠️

Note: GIN indexes do NOT help with attributes ->> 'key' = 'value' (arrow operator). Use @> (containment) for GIN, or create a B-Tree expression index for specific key comparisons.

Verify Index is Being Used

            
EXPLAIN ANALYZE
SELECT * FROM products
WHERE attributes @> '{"ram": "16GB"}';

-- You should see: Bitmap Index Scan on idx_products_attributes
-- NOT: Seq Scan on products
          

3

Hybrid Relational + JSON Design

The most powerful pattern — very common in SaaS products

▾

What Is the Hybrid Pattern?

The core idea: use relational columns for structured, known, frequently-queried data, and JSONB for flexible, variable, or extensible data. Never go full-JSON when you have structured fields.

❌ Anti-Pattern — All JSON

-- Don't put everything in JSON
CREATE TABLE users (
  id   SERIAL PRIMARY KEY,
  data JSONB  -- name, email, role all in here!
);

-- Problems:
-- Can't index email properly
-- Can't enforce NOT NULL on name
-- Can't use foreign keys
-- Poor query performance

✅ Hybrid Pattern

-- Fixed fields → columns
-- Variable fields → JSONB
CREATE TABLE users (
  id          SERIAL PRIMARY KEY,
  email       TEXT UNIQUE NOT NULL,
  name        TEXT NOT NULL,
  role        TEXT NOT NULL,
  preferences JSONB  -- theme, notifications, etc.
);

-- Best of both worlds!

Real-World Example: SaaS User with Preferences

            
-- HYBRID: relational columns + jsonb for extensible settings
CREATE TABLE users (
  id           BIGSERIAL PRIMARY KEY,
  email        TEXT UNIQUE NOT NULL,
  name         TEXT NOT NULL,
  role         TEXT NOT NULL DEFAULT 'member',
  created_at   TIMESTAMPTZ DEFAULT NOW(),
  preferences  JSONB DEFAULT '{}'
);

-- Insert user with preferences
INSERT INTO users (email, name, role, preferences)
VALUES (
  'alice@example.com', 'Alice', 'admin',
  '{
    "theme": "dark",
    "language": "en",
    "notifications": {
      "email": true,
      "sms": false,
      "push": true
    },
    "dashboard_widgets": ["revenue", "users", "events"]
  }'
);

-- Query: find all users with dark theme
SELECT name, email
FROM users
WHERE preferences @> '{"theme": "dark"}';

-- Query: find users with email notifications enabled
SELECT name
FROM users
WHERE preferences #>> '{notifications,email}' = 'true';

-- Update: add a new widget to the dashboard without touching other prefs
UPDATE users
SET preferences = jsonb_set(
  preferences,
  '{dashboard_widgets}',
  (preferences -> 'dashboard_widgets') || '"alerts"'
)
WHERE email = 'alice@example.com';
          

Real-World Example: E-Commerce Product Catalog

The classic use case — every category has different attributes but shares a common structure.

            
CREATE TABLE products (
  id           BIGSERIAL PRIMARY KEY,
  sku          TEXT UNIQUE NOT NULL,
  name         TEXT NOT NULL,
  price        NUMERIC(10,2) NOT NULL,
  category_id  INT REFERENCES categories(id),
  is_active    BOOLEAN DEFAULT TRUE,
  attributes   JSONB DEFAULT '{}'  -- variable per category
);

-- Laptop
INSERT INTO products (sku, name, price, category_id, attributes)
VALUES ('LP001', 'ThinkPad X1', 1299.99, 1,
  '{"ram":"16GB","cpu":"i7","screen_inch":14,"weight_kg":1.4}');

-- T-Shirt
INSERT INTO products (sku, name, price, category_id, attributes)
VALUES ('TS001', 'Classic Tee', 29.99, 2,
  '{"color":"navy","sizes":["S","M","L","XL"],"material":"Cotton"}');

-- Index for fast attribute search
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);

-- Find all products available in size 'L'
SELECT name, price
FROM products
WHERE attributes @> '{"sizes": ["L"]}';

-- Find laptops with 16GB RAM under $1500
SELECT name, price
FROM products
WHERE category_id = 1
  AND attributes @> '{"ram": "16GB"}'
  AND price < 1500;
          

Real-World Example: Audit Log / Event Tracking

Store events with a fixed schema + flexible payload — perfect for analytics, audit trails, and webhook logs.

            
CREATE TABLE events (
  id          BIGSERIAL PRIMARY KEY,
  user_id     BIGINT REFERENCES users(id),
  event_type  TEXT NOT NULL,
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  metadata    JSONB DEFAULT '{}'
);

-- Different events with different payloads
INSERT INTO events (user_id, event_type, metadata) VALUES
(1, 'login',         '{"ip":"1.2.3.4","device":"chrome"}'),
(1, 'purchase',      '{"order_id":42,"amount":99.99,"items":3}'),
(1, 'profile_update','{"changed_fields":["name","avatar"]}');

-- Query: all purchase events over $50
SELECT user_id, created_at,
       (metadata ->> 'amount')::numeric AS amount
FROM events
WHERE event_type = 'purchase'
  AND (metadata ->> 'amount')::numeric > 50;
          

Real-World Example: AI / RAG System with JSONB Metadata

A common pattern in modern AI platforms — documents with metadata stored as JSONB.

            
CREATE TABLE documents (
  id          BIGSERIAL PRIMARY KEY,
  org_id      INT NOT NULL,
  title       TEXT NOT NULL,
  content     TEXT,
  embedding   vector(1536),          -- pgvector
  metadata    JSONB DEFAULT '{}',   -- source, author, tags, etc.
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Insert a document with rich metadata
INSERT INTO documents (org_id, title, metadata)
VALUES (1, 'Q4 Earnings Report', '{
  "source": "google_drive",
  "author": "finance_team",
  "tags": ["earnings","finance","Q4"],
  "confidential": true,
  "department": "finance"
}');

-- Find finance department docs tagged with 'earnings'
SELECT title
FROM documents
WHERE org_id = 1
  AND metadata @> '{"department":"finance","tags":["earnings"]}';

-- GIN index to make this fast
CREATE INDEX idx_doc_metadata ON documents USING GIN(metadata);
          

Decision Framework — Column vs JSONB

Ask these questions for each field:

▶ Is this field always present? → Use a column
▶ Do you JOIN on this field? → Use a column
▶ Do you ORDER BY or GROUP BY on it? → Use a column
▶ Is this a foreign key? → Use a column
▶ Is this field optional / varies per row? → Use JSONB
▶ Is this a bag of user settings / preferences? → Use JSONB
▶ Is this an external API payload you want to keep raw? → Use JSONB
▶ Does the structure change per category / type? → Use JSONB

🧪 Interactive: Hybrid Design Advisor

Should this field be a column or JSONB? Simulated

Field name:

Always present?

Used in JOIN?

Used in ORDER/GROUP BY?

Varies per row?

Fill in the details and click Get Recommendation.

4

Advanced JSONB Patterns

Aggregation, validation, generated columns & more

▾

Building JSON from Relational Data

PostgreSQL can build JSON on the fly from relational tables — perfect for APIs.

            
-- Build JSON object per row
SELECT json_build_object(
  'id', id,
  'name', name,
  'price', price,
  'attributes', attributes
) AS product_json
FROM products;

-- Aggregate all products into a JSON array
SELECT json_agg(
  json_build_object('id', id, 'name', name, 'price', price)
) AS products_list
FROM products
WHERE is_active = TRUE;

-- Nest related data — orders with their items
SELECT
  o.id,
  o.total,
  jsonb_agg(jsonb_build_object(
    'product', p.name,
    'qty', oi.quantity,
    'price', oi.unit_price
  )) AS items
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p    ON oi.product_id = p.id
GROUP BY o.id, o.total;
          

Generated Columns from JSONB

Extract a JSONB key into a real column automatically — best of both worlds: JSONB flexibility + column performance.

            
-- Generated column: auto-populated from JSONB field
-- Now you can index and query it as a plain column!
ALTER TABLE products
  ADD COLUMN ram_gen TEXT
    GENERATED ALWAYS AS (attributes ->> 'ram') STORED;

-- Index the generated column like any normal column
CREATE INDEX idx_products_ram_gen ON products(ram_gen);

-- Now this is a fast B-Tree lookup — no JSON overhead
SELECT * FROM products WHERE ram_gen = '16GB';
          

✅

Pro technique: Use generated columns when a JSONB key is queried very frequently. It turns a JSON lookup into a direct column access with a B-Tree index — far more efficient.

JSONB with CHECK Constraints (Basic Validation)

            
-- Ensure 'theme' is always 'light' or 'dark' if present
ALTER TABLE users
  ADD CONSTRAINT chk_theme
  CHECK (
    preferences ->> 'theme' IS NULL
    OR preferences ->> 'theme' IN ('light', 'dark')
  );

-- Ensure attributes is always an object (not an array or string)
ALTER TABLE products
  ADD CONSTRAINT chk_attrs_object
  CHECK (jsonb_typeof(attributes) = 'object');
          

Querying Across Nested Arrays (Lateral Join)

When JSONB contains an array and you need to filter or join on elements inside it, use LATERAL with jsonb_array_elements.

            
-- Find all products and list each tag as a separate row
SELECT p.name, tag.value AS tag
FROM documents p,
     LATERAL jsonb_array_elements_text(p.metadata -> 'tags') AS tag(value)
WHERE tag.value = 'finance';

-- Count products per tag
SELECT tag.value AS tag, COUNT(*) AS count
FROM documents d,
     LATERAL jsonb_array_elements_text(d.metadata -> 'tags') AS tag(value)
GROUP BY tag.value
ORDER BY count DESC;
          

JSONB Path Queries (jsonpath — PostgreSQL 12+)

PostgreSQL 12 added SQL/JSON path language — a powerful way to query deep into JSONB with conditions.

            
-- jsonb_path_exists: does any port start with 'USB'?
SELECT name
FROM products
WHERE jsonb_path_exists(
  attributes,
  '$.ports[*] ? (@ starts with "USB")'
);

-- jsonb_path_query: extract all ports containing 'USB'
SELECT name,
  jsonb_path_query_array(attributes, '$.ports[*] ? (@ starts with "USB")')
    AS usb_ports
FROM products
WHERE category = 'laptop';

-- Filter: products where rating > 4
SELECT name
FROM products
WHERE jsonb_path_exists(attributes, '$.rating ? (@ > 4)');
          

⚠️

Common JSONB Mistakes

Pitfalls that kill performance in production

▾

Mistake 1 — Using ->> Instead of @> for GIN Queries

❌ Won't Use GIN Index

-- ->> returns text, GIN can't help here
-- Results in a sequential scan!
SELECT * FROM products
WHERE attributes->>'ram' = '16GB';

✅ Uses GIN Index

-- @> containment uses GIN index
SELECT * FROM products
WHERE attributes @> '{"ram":"16GB"}';

Mistake 2 — Putting Relational Data in JSONB

❌ Wrong

-- user_id in JSONB = can't use FK, slow JOINs
INSERT INTO orders (data) VALUES (
  '{"user_id": 42, "total": 99.99}'
);

✅ Correct

-- user_id as a proper FK column
INSERT INTO orders (user_id, total, metadata)
VALUES (42, 99.99,
  '{"promo_code":"SAVE10"}');

Mistake 3 — Forgetting to Cast JSONB Numbers

❌ Wrong — String Comparison

-- "9" > "10" as text! Wrong result.
WHERE attributes->>'count' > '10'

✅ Correct — Numeric Cast

-- Cast to numeric first
WHERE (attributes->>'count')::numeric > 10

Mistake 4 — No Index on Frequently Queried JSONB

🚨

Critical: A table with 1M rows and no GIN index on JSONB will do a full sequential scan on every query. Always add at minimum CREATE INDEX ... USING GIN (attributes) if you query JSONB columns.

Mistake 5 — Over-using JSONB (JSON for Everything)

⚠️

Remember: JSONB is a tool, not a replacement for relational design. If you catch yourself putting IDs, foreign keys, timestamps, or any field you regularly JOIN or sort on into JSONB — stop and use a proper column. JSONB is for the variable part of your data.

🎯 Quick Check — Test Your Understanding

JSONB Quiz 5 Questions

📋

Summary & Cheatsheet

Everything you need to remember from Topic 13

▾

Operator Cheatsheet

Operator	Returns	Use When
`-> 'key'`	JSONB	Pass to another operator / chain access
`->> 'key'`	TEXT	Display, compare, or cast value
`#> '{a,b}'`	JSONB	Nested path access (keep as JSONB)
`#>> '{a,b}'`	TEXT	Nested path access (get as text)
`@> '{...}'`	BOOLEAN	Containment check — uses GIN index ✅
`? 'key'`	BOOLEAN	Key existence — uses GIN index ✅
`?\| array[...]`	BOOLEAN	Any key exists — uses GIN index ✅
`?& array[...]`	BOOLEAN	All keys exist — uses GIN index ✅
`\|\|`	JSONB	Merge two JSONB objects
`- 'key'`	JSONB	Delete a key

Key Rules to Remember

Always use JSONB

Never use json type in production. JSONB is faster to read and supports GIN indexing.

Use @> for GIN

Only containment (@>) and existence (?) operators use GIN. Arrow operators need B-Tree expression index.

Cast numbers

Always cast: (col->>'price')::numeric. Raw ->> returns text — numeric comparison will break.

Hybrid design

Columns for fixed, queryable, FK data. JSONB for variable, optional, or extensible fields.

Index always

Always add GIN index on any JSONB column used in WHERE clauses, even during development.

Generated columns

For hot JSONB keys queried every request, extract to a generated stored column with B-Tree index.

What's Next?

🚀

Topic 14 — Replication: How PostgreSQL replicates data across servers for high availability and read scaling. You'll learn WAL, streaming replication, and logical replication.