EEVDF: a first look at the new Linux scheduler

Linux 6.6 swapped CFS for EEVDF. Most workloads don't care. A few do, in interesting ways. This is the boring report from one of the workloads that do.

What changed, briefly

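CFS, the Completely Fair Scheduler, picked the next task by virtual runtime: whoever had run least ran next. EEVDF, Earliest Eligible Virtual Deadline First, adds two things: an eligibility test and a deadline. A task is eligible when it has received no more than its fair share of CPU so far. Each task also gets a virtual deadline derived from its weight and a "request size" (the length of the slice it asks for), and the scheduler picks the eligible task with the earliest deadline.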

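The user-visible consequence: latency-sensitive tasks can be tagged with a smaller request size and get scheduled more eagerly without starving anyone else, because a shorter request pulls the deadline earlier without changing the task's weight. This is what the latency_nice attribute exposes, set per task through sched_setattr() with SCHED_FLAG_LATENCY_NICE. It originates in the latency-nice patch series, so confirm that your 6.7+ kernel actually carries it before relying on it.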
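To make the pick rule concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not kernel code: the Task class, its fields, and pick_next are names invented for illustration, eligibility is approximated as "virtual runtime at or below the weighted average", and the deadline is recomputed from the current virtual runtime rather than fixed when the request starts.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    weight: float    # CPU share; a higher weight means a larger share
    vruntime: float  # virtual time this task has already received
    request: float   # requested slice length (the "request size")

    @property
    def deadline(self) -> float:
        # Toy virtual deadline: service received so far plus the
        # weight-scaled request. A smaller request gives an earlier deadline.
        return self.vruntime + self.request / self.weight


def pick_next(tasks):
    """Return the eligible task with the earliest virtual deadline."""
    total_weight = sum(t.weight for t in tasks)
    # Weighted average vruntime: the fair point every task is compared to.
    avg = sum(t.weight * t.vruntime for t in tasks) / total_weight
    # Eligible: the task has not run ahead of its fair share.
    # The small tolerance guards against floating-point rounding.
    eligible = [t for t in tasks if t.vruntime <= avg + 1e-9]
    return min(eligible, key=lambda t: t.deadline)


tasks = [
    Task("batch", weight=1.0, vruntime=10.0, request=3.0),
    Task("api", weight=1.0, vruntime=10.0, request=0.5),
    Task("greedy", weight=1.0, vruntime=10.3, request=0.1),
]

print(pick_next(tasks).name)  # api
```

Here greedy has the earliest deadline overall but has already run ahead of its share, so it is not eligible. Between the two eligible tasks, api beats batch purely because it asked for a smaller slice; its weight, and therefore its long-run share of the CPU, is the same.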

What I tested

A Go HTTP service doing 5k req/s of mixed work: small JSON responses, a couple of database calls, the occasional larger payload. The CPU sat at about 40% utilization, with two CPU-pinned stress-ng workers running in the background to create realistic noise.

Three kernels: 6.1 (CFS), 6.6 (EEVDF, default tuning), 6.8 (EEVDF with latency_nice=-10 on the service).

Numbers

kernel        p50      p99      p99.9    max
6.1 CFS       4.2 ms   23 ms    81 ms    340 ms
6.6 EEVDF     4.1 ms   22 ms    74 ms    290 ms
6.8 EEVDF*    4.0 ms   18 ms    52 ms    190 ms

* service tagged via sched_setattr() with SCHED_FLAG_LATENCY_NICE, latency_nice=-10

The default-tuning case is a wash. The latency-nice case is a real improvement on the tail: against CFS, p99 drops from 23 ms to 18 ms and p99.9 from 81 ms to 52 ms. p50 doesn't budge, which is what I expected; EEVDF isn't faster, it's fairer about who waits.
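For anyone who prefers relative changes to raw milliseconds, the arithmetic is easy to script. The values below are copied from the table; the results dict and the reduction helper are illustrative names, not part of any tool I used.

```python
# Latencies in milliseconds, copied from the table above.
results = {
    "6.1 CFS":    {"p50": 4.2, "p99": 23.0, "p99.9": 81.0, "max": 340.0},
    "6.6 EEVDF":  {"p50": 4.1, "p99": 22.0, "p99.9": 74.0, "max": 290.0},
    "6.8 EEVDF*": {"p50": 4.0, "p99": 18.0, "p99.9": 52.0, "max": 190.0},
}


def reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return 100.0 * (before - after) / before


baseline = results["6.1 CFS"]
for kernel in ("6.6 EEVDF", "6.8 EEVDF*"):
    # One line per kernel, each metric shown as a drop relative to CFS.
    cells = ", ".join(
        f"{metric} -{reduction(baseline[metric], value):.0f}%"
        for metric, value in results[kernel].items()
    )
    print(f"{kernel} vs 6.1 CFS: {cells}")
```

On these numbers the tagged 6.8 run cuts p99 by roughly 22%, p99.9 by roughly 36% and max by roughly 44% against CFS, while p50 moves by only 0.2 ms.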

What I did not see

I did not see the regressions some people on LWN warned about. My workload doesn't have many interactive desktop tasks competing with long batch jobs, which is where the most painful CFS-to-EEVDF surprises seem to live.

What I'd suggest

If you run latency-sensitive services on Linux, upgrade to 6.6+, measure your tails, then experiment with latency_nice on the hot path. Don't reach for SCHED_FIFO or SCHED_RR; they're a sledgehammer and EEVDF gives you a chisel.
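"Measure your tails" can be as simple as the sketch below, which reduces a list of per-request latencies to the same four columns as the table. It assumes the nearest-rank definition of a percentile; your load generator may interpolate differently, so compare like with like. The function names percentile and tail_report are my own, chosen for illustration.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float surprises such as
    # 99.9 / 100 * 1000 landing slightly above 999.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


def tail_report(latencies_ms):
    """Summarize latencies with the same columns as the table."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "p99.9": percentile(latencies_ms, 99.9),
        "max": max(latencies_ms),
    }


# Synthetic stand-in for real measurements: 1 ms, 2 ms, ..., 1000 ms.
print(tail_report(list(range(1, 1001))))
```

Keep the sample count well above 1,000 per run if you care about p99.9; with fewer samples that column is effectively just the max.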

I'll write a follow-up once I have numbers from a workload with more background batch interference. Probably this summer.
