eBPF ringbuf vs perfbuf, with numbers
Two ways to get data from kernel-side eBPF programs to userspace. The newer one is better. Here's how much.
The setup
A simple eBPF program attached to tracepoint:syscalls:sys_enter_openat,
pushing one event per call to userspace. A userspace reader
counting events. Workload: a Go program in a tight loop calling
open() on /dev/null, ten million times.
Two implementations: one using the older perf event array
(BPF_MAP_TYPE_PERF_EVENT_ARRAY), one using ringbuf
(BPF_MAP_TYPE_RINGBUF, since kernel 5.8).
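The ringbuf producer side looks roughly like the following sketch. The map size, the event struct, and the names (events, handle_openat) are illustrative, not the benchmark's actual source.

```c
// SPDX-License-Identifier: GPL-2.0
// Ringbuf producer sketch (names and sizes are illustrative).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
	__u32 pid;
	char comm[16];
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024); /* power-of-two multiple of page size */
} events SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_openat")
int handle_openat(void *ctx)
{
	struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
	if (!e)
		return 0; /* buffer full; see "The gotcha" */

	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_get_current_comm(&e->comm, sizeof(e->comm));
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The perfbuf version replaces reserve/submit with a single bpf_perf_event_output() call, which copies a finished event into a per-CPU buffer.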
Numbers
impl      events lost   wallclock   userspace CPU
perfbuf        23,142      4.81 s             71%
ringbuf             0      3.94 s             48%
Why ringbuf wins
Ringbuf is one buffer shared across all CPUs, not a buffer per CPU, so events arrive in order and memory isn't wasted on idle cores. It exposes a single epoll-able fd instead of one per CPU. And with reserve/submit, producers write events in place: there's no building the event on the BPF stack and then copying it into a kernel-managed buffer, which is what bpf_perf_event_output does. For most workloads where you're shipping events to userspace, ringbuf is the answer in 2025.
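On the userspace side, libbpf wraps that single fd in a ring_buffer handle. A minimal consumer sketch, assuming the BPF object has already been loaded and its ringbuf map pinned at a hypothetical path:

```c
// Userspace consumer sketch using libbpf (pin path and names are illustrative).
#include <bpf/libbpf.h>
#include <stdio.h>

static int handle_event(void *ctx, void *data, size_t len)
{
	long *count = ctx;
	(*count)++; /* the benchmark's reader just counts events */
	return 0;
}

int main(void)
{
	long count = 0;

	/* assumes the ringbuf map was pinned here by the loader */
	int map_fd = bpf_obj_get("/sys/fs/bpf/events");
	if (map_fd < 0)
		return 1;

	struct ring_buffer *rb = ring_buffer__new(map_fd, handle_event, &count, NULL);
	if (!rb)
		return 1;

	/* one epoll fd for all CPUs; no per-CPU reader threads needed */
	while (ring_buffer__poll(rb, 100 /* timeout, ms */) >= 0)
		;

	ring_buffer__free(rb);
	printf("saw %ld events\n", count);
	return 0;
}
```

Compare the perfbuf equivalent, which needs perf_buffer__new() plus per-CPU mmap'd pages and a lost-samples callback to even find out about drops.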
Where perfbuf still makes sense: if you genuinely need per-CPU isolation (you're sampling, not collecting every event), or if you're stuck on a kernel before 5.8.
The gotcha
Ringbuf's "no events lost" guarantee is structural: if there's no space, the reservation fails, your BPF program sees the failure, and it decides what to do. You have to handle that failure deliberately. Drop? Block? Sample? The decision is yours, not the kernel's. With perfbuf the kernel just drops on your behalf, and you might not notice.
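One common deliberate choice is "drop, but count the drops so userspace can notice". A sketch, assuming an event struct and a ringbuf map named events as in the setup (names illustrative):

```c
// Explicit drop handling on reservation failure (names are illustrative).
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
	__u32 pid;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} events SEC(".maps");

/* global counter, readable from userspace via the object's .bss map */
long dropped = 0;

SEC("tracepoint/syscalls/sys_enter_openat")
int handle_openat(void *ctx)
{
	struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
	if (!e) {
		/* our policy: drop the event, but make the drop visible */
		__sync_fetch_and_add(&dropped, 1);
		return 0;
	}
	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Sampling would be a different branch in the same spot (e.g. reserve only every Nth call); blocking isn't an option in BPF program context, so "apply backpressure" really means sizing the buffer and draining it fast enough.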