When the Postgres planner lies to you (and how to tell)
EXPLAIN ANALYZE is not a confession. It's a story the planner tells you about what it expected and then what it did. Both parts can mislead, in different ways.
The two numbers
Every node in the plan has two numbers I care about: the estimated rows and the actual rows. They look like this:
Index Scan using users_pkey on users
(cost=0.43..8.45 rows=1 width=320)
(actual time=0.012..0.013 rows=1 loops=1)
Estimated 1, actual 1. Healthy. The lies start when those numbers diverge by an order of magnitude or more — and especially when they diverge in a way the planner couldn't have known about.
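For contrast, a lying node looks something like this (numbers invented, shape real):
Seq Scan on events
(cost=0.00..1693.00 rows=120 width=64)
(actual time=0.031..412.605 rows=48211 loops=1)
The planner expected 120 rows and got 48k; every join strategy chosen above this node was chosen for the wrong world.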
Three kinds of lie
1. The stats are stale. ANALYZE hasn't run since you bulk-loaded that table. The planner thinks the table has 100k rows; it has 50M. Run ANALYZE; the lie evaporates. This is the easy case and accounts for maybe a third of "the planner is broken" complaints I see.
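A quick way to check whether you're in this case, using the built-in pg_stat_user_tables view (the table name is a placeholder):
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';
If both timestamps predate your bulk load, run ANALYZE and look again.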
2. The stats are fine, the correlation is wrong. The planner knows column A has selectivity 0.01 and column B has selectivity 0.1, so WHERE A AND B "should" return 0.001 of the table. But A and B are not independent. They might be perfectly correlated. The planner has no way to know unless you create extended statistics:
CREATE STATISTICS users_country_lang
(dependencies, ndistinct)
ON country, language FROM users;
ANALYZE users;
I forget about CREATE STATISTICS approximately once a quarter and rediscover it the hard way.
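To confirm the dependency actually got recorded after ANALYZE, the pg_stats_ext view (Postgres 12 and later) exposes it:
SELECT statistics_name, dependencies
FROM pg_stats_ext
WHERE tablename = 'users';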
3. The plan is right, the timing isn't. A plan can have correct row estimates and still be slow because of buffer cache behavior, lock contention, or the JIT taking 200ms to warm up on a short query. EXPLAIN ANALYZE shows wall-clock time including JIT-compile time. If the summary lines at the bottom show tens of milliseconds under "Planning Time" or "JIT" on a query whose execution itself is quick, the plan is fine; the overhead lives around it, not in it.
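The JIT cost appears as summary lines at the bottom of the output; the shape below is real, the timings are invented:
JIT:
Functions: 14
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.1 ms, Inlining 18.4 ms, Optimization 88.0 ms, Emission 104.2 ms, Total 212.7 ms
If that Total dwarfs the execution time, SET jit = off; for the session and rerun before blaming the plan.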
Reading the diff
My eye goes to one thing first: the loops column.
-> Index Scan using comments_post_id_idx
(actual time=0.014..0.018 rows=8 loops=1247)
That node ran 1247 times. The "rows" is per-loop average, not total. Total rows produced: 1247 × 8 ≈ 10k. Total time: 1247 × 0.018 ≈ 22ms. If you read the per-loop number and forget to multiply, you will be confused for an embarrassing amount of time, and I have been.
BUFFERS, always
Run EXPLAIN (ANALYZE, BUFFERS), every time. The buffer counts tell you what the timing won't:
- shared hit — pages already in cache. Free.
- shared read — pages from disk. Not free.
- shared dirtied — pages this query modified in cache, often just hint-bit updates that something will have to write back later. Surprising.
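All three show up as one line per node in the output (numbers invented):
Buffers: shared hit=5120 read=871 dirtied=12
Multiply read by the default 8kB block size and you know how much had to come from outside shared_buffers.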
A query that's "fast" because the data is hot in cache will be slow in production after a restart. BUFFERS shows you that. Timing doesn't.
The rest of my checklist
- Are estimates within 2× of actual? If not, ANALYZE. If still not, extended stats.
- Is the plan using the index you expected? If not, why not — selectivity, types, casts?
- Is there a Sort node that could be replaced by an index ordering? (See the sketches after this list.)
- Is there a Materialize or Hash node spilling to disk? (Look at Memory Usage; if it has "Disk: " in it, you've spilled.)
- Are there nested loops with high loops=? Often a missed hash join opportunity.
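Sketches for a few of these, with hypothetical table and column names throughout:
-- Index skipped because of an expression on the column: index the expression
CREATE INDEX users_email_lower_idx ON users (lower(email));
-- Sort node that mirrors an ORDER BY: give it an index to walk instead
CREATE INDEX posts_created_at_idx ON posts (created_at DESC);
-- A node reporting "Disk: ...": raise work_mem for the session and retest
SET work_mem = '64MB';
None of these are fixes to ship blind; they're experiments to confirm the diagnosis.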
The trick I keep relearning
When a query is mysteriously slow and the plan looks fine, run it twice. Then run EXPLAIN (ANALYZE, BUFFERS) on the second run. Most of the "lies" I chase turn out to be cache-warmth artifacts. The plan was telling the truth; I was reading it at the wrong moment.
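Concretely (table and predicate made up):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- same statement, run again immediately
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
If the first run is dominated by shared read and the second is almost all shared hit, you were measuring the cache, not the plan.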
Edit, March 2026: Postgres 18 changes some of this. Updating soon.