Why Your Query Is Slow: A Practical Guide to Indexes

You wrote a query. It flew through your tests. Then it hit production, where the table has two million rows instead of fifty, and the page that used to load instantly now takes eight seconds. Same query, same code. The only thing that changed is the amount of data, and the reason it falls apart at scale almost always comes down to one thing: indexes.

Indexes are the highest-leverage performance tool most developers underuse. You don't need to be a database administrator to get them right. You need a working mental model and the willingness to read what the database is already trying to tell you. Here is the practical version I wish someone had handed me earlier.

What an index actually is

Imagine a 900-page textbook with no index at the back. To find every mention of "deadlock", you read all 900 pages. That is what your database does without an index: it reads every row in the table to find the ones you asked for. This is called a sequential scan, or a full table scan, and it gets linearly slower as the table grows.

An index is the index at the back of the book. It is a separate, sorted data structure (usually a B-tree) that maps column values to the rows that contain them. Because it is sorted, the database can jump straight to what it needs in roughly O(log n) time instead of scanning all n rows.

CREATE INDEX idx_users_email ON users (email);

The tradeoff is real and worth saying out loud: reads get dramatically faster, but every INSERT, UPDATE, and DELETE now has to keep the index up to date, and the index takes disk space. So you index deliberately, for the queries you actually run, not on every column "just in case".

First, make the database show you

Before you add anything, stop guessing and ask the database what it is doing. In Postgres, that is EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

You are looking for one line in the output:

Seq Scan on orders means a full table scan. On a large table for a frequent query, this is your smoking gun.
Index Scan or Index Only Scan means it is using an index.

Pay attention to the actual time and the row counts too. MongoDB has the same idea:

db.orders.find({ customerId: 42 }).explain("executionStats")

There, COLLSCAN is the full scan and IXSCAN means an index was used.

The rule of thumb: if a query you run often shows a sequential scan over a big table, you have found the thing to fix.

The fix: index what you filter on

CREATE INDEX idx_orders_customer ON orders (customer_id);

Run the EXPLAIN again. The Seq Scan becomes an Index Scan, and the time drops from seconds to milliseconds. That is the whole loop: find the scan, index the filtered column, confirm the plan changed.

Composite indexes and why column order matters

Most real queries filter on more than one thing:

SELECT * FROM orders
WHERE customer_id = 42 AND status = 'paid'
ORDER BY created_at DESC;

For this you want a composite (multi-column) index:

CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);

Here is the part people miss: the order of columns in the index matters, because of the leftmost-prefix rule. An index on (a, b, c) can serve queries that filter on a, on a and b, or on a, b, and c, but it is useless for a query that filters only on b or only on c.

Think of a phone book sorted by last name, then first name. It is perfect for finding "Agbo, Sixtus" and fine for finding everyone named "Agbo", but it is no help at all if all you know is the first name. The practical guideline: put the columns you match by equality first, then the column you use for ranges or sorting.

Covering indexes: let the index answer the whole query

If an index already contains every column a query needs, the database can answer entirely from the index and never touch the table at all. Postgres calls this an index-only scan.

CREATE INDEX idx_orders_customer_covering
ON orders (customer_id) INCLUDE (status, total);

A query that selects only status and total for a customer is now served straight from the index. This is a great trick for hot read paths, used carefully.

When indexes don't help, or actively hurt

Indexes are not free wins. They can do nothing, or make things worse:

Low selectivity. Indexing a boolean, or a column where 90% of rows share the same value, often gets ignored. The planner correctly decides scanning is cheaper. Index columns with many distinct values.
Non-sargable predicates. Wrap an indexed column in a function and the index can't be used:
```
WHERE LOWER(email) = 'a@b.com'   -- skips the index on email
```
Fix it by normalizing the data on write, or by creating a functional index: CREATE INDEX ON users (LOWER(email)).
Leading wildcards. WHERE name LIKE '%smith' can't use a normal B-tree. LIKE 'smith%' can.
Too many indexes. Every index is maintained on every write. A write-heavy table with a dozen indexes pays that tax on every insert. Index for the queries you run, then stop.

One more that bites people: in Postgres, foreign keys are not indexed automatically. If you join on a foreign key and never added an index for it, that join is scanning.

How I actually decide what to index

Look at the query's WHERE, JOIN, and ORDER BY clauses. Those columns are your candidates. Then weigh by frequency: a query that runs ten thousand times a minute earns an index far more than a report that runs once a month. Add the index, then run EXPLAIN ANALYZE before and after to confirm it actually got used. Never assume, measure.

The mistakes I see most often:

Adding an index and never checking EXPLAIN to confirm it is used.
Getting the column order wrong in a composite index.
Indexing every column out of fear.
Functions on indexed columns in the WHERE clause.
Forgetting that foreign keys are not auto-indexed.

The cheat sheet

Slow query? Run EXPLAIN ANALYZE first, every time.
A Seq Scan on a big table for a frequent query means add an index on the filtered columns.
Filtering on several columns means a composite index: equality columns first, then range and sort columns.
A read-heavy hot path can benefit from a covering index.
Keep your predicates sargable: no functions or leading wildcards on indexed columns.
Don't over-index. Every index taxes your writes.

Indexes aren't magic, and they aren't free. But most "the database is slow" problems are really "the database is scanning everything because nothing told it where to look." Learn to read the query plan, index the columns your real queries touch, and verify the plan actually changed. That one habit will fix more performance problems than any amount of caching bolted on top.

Cover photo by Jan Antonin Kolar on Unsplash