PostgreSQL — Topic 1: Database Design

1.1

Data Modeling

Entities, attributes, relationships & cardinality

▾

Entities

An entity is a real-world thing your database needs to track. Think of it as a noun — a person, a product, an order.

💡

Rule of thumb: If you find yourself asking "does this thing have its own identity?" — it's probably an entity. A customer has their own identity. But a customer's address might just be an attribute (unless addresses are shared across customers, in which case it becomes its own entity).

Entity Identification

Name the real-world objects: User, Product, Order, Payment

Business Objects

Things the business cares about: invoices, customers, employees

Entity Lifecycle

Created → Updated → Soft-deleted. Track state changes over time

-- Example: Identifying entities for an e-commerce app
-- Ask yourself: what are the "nouns" in our system?

-- Entity 1: Users (customers who buy)
CREATE TABLE users (
    user_id    BIGINT      PRIMARY KEY,
    email      VARCHAR(255),
    created_at TIMESTAMPTZ
);

-- Entity 2: Products (things being sold)
CREATE TABLE products (
    product_id  BIGINT      PRIMARY KEY,
    name        VARCHAR(500),
    price       NUMERIC(10,2)
);

-- Entity 3: Orders (a transaction between user and products)
CREATE TABLE orders (
    order_id   BIGINT      PRIMARY KEY,
    user_id    BIGINT,      -- links to users
    total      NUMERIC(10,2),
    created_at TIMESTAMPTZ
);
          

Attributes

Attributes are the properties or columns of an entity. There are three kinds:

Required Attributes

Must always have a value. Use NOT NULL. Example: email for a user

Optional Attributes

Can be null. Example: middle_name, phone — not everyone provides these

Derived Attributes

Computed from other columns. Example: age derived from date_of_birth

CREATE TABLE users (
    user_id       BIGINT       PRIMARY KEY,

    -- Required attributes (NOT NULL)
    email         VARCHAR(255) NOT NULL,
    first_name    VARCHAR(100) NOT NULL,

    -- Optional attributes (allow NULL)
    middle_name   VARCHAR(100),
    phone         VARCHAR(20),

    -- Derived attribute (compute age from dob, don't store age!)
    date_of_birth DATE,
    -- age = EXTRACT(YEAR FROM AGE(date_of_birth)) ← compute on query

    created_at    TIMESTAMPTZ DEFAULT NOW()
);

-- Querying a derived attribute on the fly:
SELECT
    first_name,
    EXTRACT(YEAR FROM AGE(date_of_birth)) AS age
FROM users;
          

⚠️

Don't store derived attributes! If you store age, it gets stale. Compute it at query time from date_of_birth instead.

Relationships

Relationships describe how entities connect to each other. There are three types:

-- ONE-TO-ONE: each user has at most one profile
-- Example: User ↔ UserProfile

CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    email   VARCHAR(255) NOT NULL
);

CREATE TABLE user_profiles (
    profile_id BIGINT PRIMARY KEY,
    user_id    BIGINT UNIQUE REFERENCES users(user_id),
    -- UNIQUE ensures max one profile per user
    bio        TEXT,
    avatar_url TEXT
);
            

✓

Use 1:1 to split large tables — keep hot columns in the main table, cold/optional columns in a separate table.

-- ONE-TO-MANY: one user has many orders
-- The "many" side holds the foreign key

CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    email   VARCHAR(255)
);

CREATE TABLE orders (
    order_id   BIGINT PRIMARY KEY,
    user_id    BIGINT REFERENCES users(user_id),
    -- ^ This is the FK on the "many" side
    total      NUMERIC(10,2),
    created_at TIMESTAMPTZ
);

-- Find all orders for a user:
SELECT * FROM orders WHERE user_id = 42;
            

✓

Most common relationship. The foreign key always goes on the "many" side.

-- MANY-TO-MANY: products ↔ categories
-- Needs a JUNCTION (bridge) table in between

CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    name       VARCHAR(500)
);

CREATE TABLE categories (
    category_id BIGINT PRIMARY KEY,
    name        VARCHAR(200)
);

-- Junction table: holds the relationship
CREATE TABLE product_categories (
    product_id  BIGINT REFERENCES products(product_id),
    category_id BIGINT REFERENCES categories(category_id),
    PRIMARY KEY (product_id, category_id) -- composite PK
);

-- Find all categories for a product:
SELECT c.name
FROM categories c
JOIN product_categories pc ON c.category_id = pc.category_id
WHERE pc.product_id = 10;
            

Cardinality

Cardinality defines how many instances of one entity relate to another. It also tells you whether a relationship is mandatory or optional.

Type	Notation	Example	DB Implementation
Mandatory 1:1	\|\|—\|\|	Employee must have exactly 1 contract	NOT NULL + UNIQUE FK
Optional 1:1	\|o—\|\|	User may have a profile (optional)	UNIQUE FK (nullable)
Mandatory 1:N	\|\|—<	Order must belong to a user	NOT NULL FK
Optional 1:N	\|o—<	Category may have 0 products	Nullable FK
M:N	>—<	Products ↔ Tags	Junction table

1.2

Keys

Primary, foreign, candidate, alternate & composite keys

▾

Primary Keys

A primary key uniquely identifies every row. There are several strategies:

Natural Key

A real-world unique value like email, passport_number, ISBN

Surrogate Key

System-generated integer (SERIAL / BIGSERIAL). No business meaning

UUID Key

Universally unique 128-bit ID. Great for distributed systems

Identity Column

SQL standard auto-increment. Preferred over SERIAL in modern PG

-- Strategy 1: Natural Key (use email as PK)
CREATE TABLE users_natural (
    email      VARCHAR(255) PRIMARY KEY, -- risky! email can change
    first_name TEXT
);

-- Strategy 2: Surrogate Key (auto-increment integer)
CREATE TABLE users_surrogate (
    user_id    BIGSERIAL   PRIMARY KEY, -- auto-generates: 1, 2, 3...
    email      TEXT        UNIQUE NOT NULL
);

-- Strategy 3: UUID Key (best for distributed / microservices)
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE users_uuid (
    user_id    UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    email      TEXT        UNIQUE NOT NULL
);

-- Strategy 4: Identity Column (SQL standard, PG 10+)
CREATE TABLE users_identity (
    user_id    BIGINT      GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      TEXT        UNIQUE NOT NULL
);
          

💡

When to use what? For most OLTP apps, use BIGINT GENERATED ALWAYS AS IDENTITY. For microservices or APIs where IDs leak to clients, use UUID. Avoid natural keys as PKs unless the value will truly never change.

Foreign Keys

A foreign key enforces referential integrity — it ensures a child row can't exist without a valid parent.

-- Referential Integrity + Cascade options

CREATE TABLE users (
    user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email   TEXT   UNIQUE NOT NULL
);

CREATE TABLE orders (
    order_id BIGINT        GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id  BIGINT        NOT NULL
                            REFERENCES users(user_id)
                            ON DELETE CASCADE   -- delete user → delete their orders
                            ON UPDATE CASCADE,  -- if user_id changes, update here too
    total    NUMERIC(10,2)
);

-- Other cascade options:
-- ON DELETE SET NULL    → sets user_id to NULL when user deleted
-- ON DELETE RESTRICT    → prevents deletion if orders exist
-- ON DELETE NO ACTION   → default, same as RESTRICT (deferred)

-- Test: try inserting an order with a non-existent user_id
INSERT INTO orders (user_id, total)
VALUES (9999, 150.00);
-- ERROR: insert or update on table "orders" violates foreign key constraint
          

⚠️

CASCADE DELETE is powerful but dangerous. Deleting a user will delete all their orders silently. Prefer soft deletes (adding is_deleted = true) over physical deletes in production.

Candidate, Alternate & Composite Keys

-- CANDIDATE KEY: any column(s) that COULD be the primary key
-- In users: both user_id and email could uniquely identify a row
--   → user_id is chosen as PK (surrogate)
--   → email becomes the ALTERNATE KEY

CREATE TABLE users (
    user_id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- PK
    email          TEXT   UNIQUE NOT NULL,                            -- Alternate Key
    passport_num   TEXT   UNIQUE                                      -- another Alternate Key
);

-- COMPOSITE KEY: PK made of multiple columns
-- Use when no single column is unique, but combination is

CREATE TABLE order_items (
    order_id   BIGINT REFERENCES orders(order_id),
    product_id BIGINT REFERENCES products(product_id),
    quantity   INT    NOT NULL DEFAULT 1,
    price      NUMERIC(10,2) NOT NULL,

    PRIMARY KEY (order_id, product_id)  -- composite PK
    -- No separate order_item_id needed — the combo is unique
);
          

Key Type	Definition	Example
Candidate Key	Any minimal set of columns that uniquely identifies a row	email, passport_num, user_id
Primary Key	The chosen candidate key — only one per table	user_id
Alternate Key	Candidate keys not chosen as PK (enforced via UNIQUE)	email
Composite Key	PK or UNIQUE constraint spanning multiple columns	(order_id, product_id)

1.3

Normalization

1NF → 2NF → 3NF → BCNF → 4NF → 5NF

▾

Normalization is the process of organizing tables to eliminate redundancy and ensure data integrity. Each normal form adds stricter rules. Click each form below to explore it:

1NF

Atomic values

2NF

No partial deps

3NF

No transitive deps

BCNF

Strict 3NF

4NF

No multi-valued

5NF

Join dependency

1NF — First Normal Form: Atomic Values & No Repeating Groups

Every column must contain a single, indivisible value. No lists, no arrays as column values, no duplicate rows.

-- ❌ VIOLATES 1NF: storing multiple phone numbers in one column
-- and repeating rows for the same order
orders_bad
┌──────────┬─────────────┬──────────────────────────┐
│ order_id │ customer    │ phone                    │
├──────────┼─────────────┼──────────────────────────┤
│ 1        │ Alice       │ 555-1234, 555-5678       │  ← two phones in one cell!
│ 1        │ Alice       │ 555-1234, 555-5678       │  ← duplicate row!
└──────────┴─────────────┴──────────────────────────┘

-- ✅ 1NF COMPLIANT: atomic values, one phone per row
CREATE TABLE customer_phones (
    customer_id BIGINT,
    phone       VARCHAR(20),   -- one value per cell
    PRIMARY KEY (customer_id, phone)
);
          

2NF — Second Normal Form: No Partial Dependencies

Applies when you have a composite primary key. Every non-key column must depend on the entire composite key, not just part of it.

-- ❌ VIOLATES 2NF: product_name depends only on product_id,
--    NOT on the full composite key (order_id, product_id)
order_items_bad
┌──────────┬────────────┬──────────────────┬──────────┐
│ order_id │ product_id │ product_name     │ quantity │
│    PK    │    PK      │  ← partial dep!  │          │
└──────────┴────────────┴──────────────────┴──────────┘

-- ✅ 2NF COMPLIANT: move product_name to its own table
CREATE TABLE products (
    product_id   BIGINT PRIMARY KEY,
    product_name TEXT          -- depends on product_id only → correct!
);

CREATE TABLE order_items (
    order_id   BIGINT,
    product_id BIGINT REFERENCES products(product_id),
    quantity   INT,           -- depends on (order_id, product_id) → correct!
    PRIMARY KEY (order_id, product_id)
);
          

3NF — Third Normal Form: No Transitive Dependencies

No non-key column should depend on another non-key column. All columns must depend directly on the primary key.

-- ❌ VIOLATES 3NF: city depends on zip_code, not on user_id
-- zip_code → city is a TRANSITIVE dependency
users_bad
┌─────────┬──────────┬──────────┬──────────┐
│ user_id │  email   │ zip_code │   city   │
│   PK    │          │          │ ← trans! │
└─────────┴──────────┴──────────┴──────────┘

-- ✅ 3NF COMPLIANT: extract zip → city to its own table
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city     TEXT,
    state    TEXT
);

CREATE TABLE users (
    user_id  BIGINT       PRIMARY KEY,
    email    TEXT         UNIQUE NOT NULL,
    zip_code VARCHAR(10) REFERENCES zip_codes(zip_code)
    -- city is NOT stored here — get it by joining zip_codes
);
          

BCNF — Boyce-Codd Normal Form: Strict 3NF

Every determinant (anything that determines another column's value) must be a candidate key. Catches edge cases that 3NF misses when there are multiple overlapping candidate keys.

-- Scenario: Students can be tutored by teachers.
-- A student has ONE teacher per subject.
-- Each teacher teaches ONLY ONE subject.

-- ❌ VIOLATES BCNF:
tutoring_bad
┌─────────┬─────────┬───────────┐
│ student │ subject │  teacher  │
│   PK    │   PK    │           │
└─────────┴─────────┴───────────┘
-- Problem: teacher → subject (teacher determines subject)
-- but teacher is NOT a candidate key

-- ✅ BCNF COMPLIANT: decompose
CREATE TABLE teacher_subjects (
    teacher TEXT PRIMARY KEY,
    subject TEXT
);

CREATE TABLE student_tutoring (
    student TEXT,
    teacher TEXT REFERENCES teacher_subjects(teacher),
    PRIMARY KEY (student, teacher)
);
          

💡

In practice, most well-designed databases in 3NF are also in BCNF. You'll mainly need BCNF for complex academic scenarios.

4NF — Fourth Normal Form: No Multi-Valued Dependencies

A table must not have more than one independent multi-valued fact about an entity.

-- Scenario: A person can have multiple hobbies AND multiple phones
-- These are INDEPENDENT facts

-- ❌ VIOLATES 4NF: mixing two independent multi-valued facts
person_data_bad
┌────────┬──────────┬───────────────┐
│ person │  hobby   │     phone     │
├────────┼──────────┼───────────────┤
│ Alice  │ Cooking  │ 555-1234      │
│ Alice  │ Cooking  │ 555-5678      │  ← duplicate hobby!
│ Alice  │ Gaming   │ 555-1234      │  ← duplicate phone!
│ Alice  │ Gaming   │ 555-5678      │
└────────┴──────────┴───────────────┘

-- ✅ 4NF COMPLIANT: separate tables for each multi-valued fact
CREATE TABLE person_hobbies (
    person TEXT,
    hobby  TEXT,
    PRIMARY KEY (person, hobby)
);

CREATE TABLE person_phones (
    person TEXT,
    phone  TEXT,
    PRIMARY KEY (person, phone)
);
          

5NF — Fifth Normal Form: No Join Dependencies

A table must not be decomposable into smaller tables without losing information, unless the join dependencies are implied by the candidate keys. This is rare in practice.

-- Scenario: Agents sell products in cities.
-- A tuple (agent, product, city) means: this agent sells this product in this city
-- But not every combination is valid (not every agent sells every product in every city)

-- ✅ 5NF: if the table CANNOT be losslessly decomposed,
--    keep it as one table

CREATE TABLE agent_product_city (
    agent_id   BIGINT,
    product_id BIGINT,
    city_id    BIGINT,
    PRIMARY KEY (agent_id, product_id, city_id)
);

-- 5NF is mostly theoretical. In production, stop at 3NF/BCNF.
-- Only apply 4NF and 5NF for truly complex domains.
          

⚠️

Real-world advice: Design to 3NF in day-to-day work. Understand BCNF and 4NF conceptually. 5NF is academic — you won't encounter it in most production systems.

1.4

Denormalization

Controlled redundancy for read performance

▾

💡

Denormalization is intentional. After normalizing your database, you sometimes deliberately introduce redundancy to speed up reads. The trade-off: faster reads, more complex writes and storage.

Read Optimization Patterns

There are three common denormalization patterns used in production:

-- AGGREGATION TABLE: store pre-computed totals
-- Problem: COUNT(*) over millions of orders is slow
-- Solution: maintain a counter table

CREATE TABLE user_order_stats (
    user_id       BIGINT       PRIMARY KEY REFERENCES users(user_id),
    total_orders  INT          DEFAULT 0,
    total_spent   NUMERIC(12,2) DEFAULT 0,
    last_order_at TIMESTAMPTZ
);

-- Update this whenever an order is placed (via trigger or app code)
UPDATE user_order_stats
SET
    total_orders  = total_orders + 1,
    total_spent   = total_spent + 150.00,
    last_order_at = NOW()
WHERE user_id = 42;

-- Now reading a user's stats is instant (no aggregation needed):
SELECT total_orders, total_spent FROM user_order_stats WHERE user_id = 42;
            

-- SUMMARY TABLE: daily/hourly rollups for dashboards
-- Instead of scanning millions of order rows for a report,
-- pre-aggregate by day

CREATE TABLE daily_sales_summary (
    summary_date  DATE         PRIMARY KEY,
    total_revenue NUMERIC(15,2) DEFAULT 0,
    order_count   INT          DEFAULT 0,
    new_users     INT          DEFAULT 0
);

-- Populate with a nightly job:
INSERT INTO daily_sales_summary (summary_date, total_revenue, order_count)
SELECT
    DATE_TRUNC('day', created_at)::date AS summary_date,
    SUM(total)                          AS total_revenue,
    COUNT(*)                             AS order_count
FROM orders
WHERE created_at >= NOW() - INTERVAL '1 day'
GROUP BY 1
ON CONFLICT (summary_date) DO UPDATE SET
    total_revenue = EXCLUDED.total_revenue,
    order_count   = EXCLUDED.order_count;
            

-- MATERIALIZED VIEW: PostgreSQL's built-in summary caching
-- Like a regular view, but stores the result physically

CREATE MATERIALIZED VIEW product_sales_summary AS
SELECT
    p.product_id,
    p.name,
    SUM(oi.quantity)          AS units_sold,
    SUM(oi.quantity * oi.price) AS revenue
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name
WITH DATA;  -- populate immediately

-- Query it like a table (super fast):
SELECT * FROM product_sales_summary ORDER BY revenue DESC LIMIT 10;

-- Refresh when data changes:
REFRESH MATERIALIZED VIEW product_sales_summary;

-- Or refresh concurrently (no lock, needs UNIQUE index):
CREATE UNIQUE INDEX ON product_sales_summary (product_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY product_sales_summary;
            

✓

Materialized Views are the cleanest denormalization tool in PostgreSQL. Use them for reports, dashboards, and heavy analytical queries. Refresh on a schedule or after bulk updates.

When to Normalize vs Denormalize

	Normalized (3NF)	Denormalized
Read speed	Slower (JOINs needed)	Faster (pre-computed)
Write speed	Faster (one place)	Slower (update multiple)
Storage	Less	More
Consistency	Automatic	Manual maintenance
Best for	OLTP (transactions)	Reporting, dashboards

Topic 1: Database Design — all 4 sub-sections covered.

Ready to move on to Topic 2: OLTP Database Design?