p99 is a liar, and you've been lied to

An older version of an argument I keep having. The 2026 update lives on the feed; I'm keeping this here for the people who linked it.

Note (2026): this is the original 2024 post. I rewrote the argument with better examples in "Tail latency for people who write CRUD apps". Read that one first if you have time. I'm leaving this here because a few people have linked it and I don't want to break the link.

The argument, briefly

p99 is the latency below which 99% of your requests complete. It's the most common "tail" metric we use, and it has the property of feeling like the truth.

But: p99 hides the worst 1%. If you serve a million requests an hour, that 1% is ten thousand requests. p99 says nothing about how bad those are. The p99 might be 200 ms; the worst-of-the-worst might be 30 seconds. Most dashboards I've ever seen plot only p99, so they would tell you everything is fine.
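To make that concrete, here is a small sketch with made-up numbers: two hypothetical services, scaled down to 10,000 requests each, that report the same p99 and have very different worst cases. The `percentile` helper uses the nearest-rank definition, which is an assumption on my part; your metrics system may interpolate or estimate from buckets and give slightly different values.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

# Made-up latencies in milliseconds, 10,000 requests per service.
# Both services have 98% of requests at 50 ms; only the slowest 1% differs.
healthy = [50] * 9_800 + [200] * 200
sick = [50] * 9_800 + [200] * 100 + [30_000] * 100

for name, latencies in (("healthy", healthy), ("sick", sick)):
    print(f"{name}: p99 = {percentile(latencies, 99)} ms, max = {max(latencies)} ms")

# healthy: p99 = 200 ms, max = 200 ms
# sick: p99 = 200 ms, max = 30000 ms
```

A dashboard that charts only p99 draws the same line for both services, even though one in a hundred requests to the second one takes 30 seconds.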

The fix

Look at p99.9. Look at max. Look at the histogram, occasionally. Don't let one number do all the talking.
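As a rough sketch of what "more than one number" could look like, the snippet below reports p99, p99.9, max, and a coarse histogram from a list of raw latencies. The names `tail_report` and `BUCKETS_MS`, the bucket bounds, and the sample data are all hypothetical, and it assumes you have the raw samples in memory, which a real metrics pipeline usually does not.

```python
import math
from bisect import bisect_left
from collections import Counter
from fractions import Fraction

def percentile(samples, p):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(samples)
    # Fraction(str(p)) keeps 99.9 exact, so float rounding cannot shift the rank by one.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical bucket upper bounds in ms; choose bounds that match your own targets.
BUCKETS_MS = [100, 250, 1_000, 5_000, 30_000]

def histogram(samples):
    """Count samples per bucket; anything above the last bound lands in '+Inf'."""
    labels = [f"<={bound}" for bound in BUCKETS_MS] + ["+Inf"]
    counts = Counter(labels[bisect_left(BUCKETS_MS, s)] for s in samples)
    return {label: counts[label] for label in labels}

def tail_report(samples):
    """Several views of the tail instead of a single number."""
    return {
        "p99": percentile(samples, 99),
        "p99.9": percentile(samples, 99.9),
        "max": max(samples),
        "histogram": histogram(samples),
    }

# Made-up latencies in milliseconds.
latencies = [50] * 9_800 + [200] * 150 + [4_000] * 45 + [30_000] * 5
report = tail_report(latencies)
print(report["p99"], report["p99.9"], report["max"])  # 200 4000 30000
print(report["histogram"])
# {'<=100': 9800, '<=250': 150, '<=1000': 0, '<=5000': 45, '<=30000': 5, '+Inf': 0}
```

Here p99 alone says 200 ms. The p99.9, the max, and the two sparsely populated upper buckets are what show the 4-second and 30-second requests.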

There's a whole longer thing about coordinated omission and fan-out math, but I'm saving that for the rewrite, "Tail latency for people who write CRUD apps".
