eBPF ringbuf vs perfbuf, with numbers

Two ways to get data from kernel-side eBPF programs to userspace. The newer one is better. Here's how much.

The setup

A simple eBPF program attached to tracepoint:syscalls:sys_enter_openat, pushing one event per call to userspace. A userspace reader counting events. Workload: a Go program in a tight loop calling open() on /dev/null, ten million times.

Two implementations: one using the older perf event array (BPF_MAP_TYPE_PERF_EVENT_ARRAY), one using ringbuf (BPF_MAP_TYPE_RINGBUF, since kernel 5.8).
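The two map declarations, sketched in libbpf-style BPF C (map names and the event struct are mine, not the article's source; this shows the shape, not the full programs):

```c
// Hypothetical event struct shared by both implementations.
struct event { __u32 pid; char comm[16]; };

// perfbuf: kernel allocates one buffer per CPU, keyed by CPU index.
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events_perf SEC(".maps");

// ringbuf: one buffer shared by all CPUs; max_entries is the buffer
// size in bytes and must be a power-of-two multiple of the page size.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events_rb SEC(".maps");

// Emitting, perfbuf style: always copies the event out of program memory.
//   bpf_perf_event_output(ctx, &events_perf, BPF_F_CURRENT_CPU, &e, sizeof(e));
// Emitting, ringbuf style: either the copying one-shot form,
//   bpf_ringbuf_output(&events_rb, &e, sizeof(e), 0);
// or the copy-free reserve/submit pair.
```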

Numbers

impl       events lost   wallclock   userspace cpu
perfbuf    23,142        4.81 s      71%
ringbuf    0             3.94 s      48%

Why ringbuf wins

Ringbuf is one shared buffer instead of one per CPU, so the consumer watches a single epoll-friendly fd rather than running a reader per core, and events come out in the order they were submitted across CPUs. Reserve/submit lets the program build the event in place inside the buffer; perfbuf's bpf_perf_event_output always copies the event out of program memory. For most workloads where you're shipping every event to userspace, ringbuf is the answer in 2025.
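On the userspace side, the single fd means the whole consumer collapses to one callback and one poll loop. A libbpf-flavored sketch (the handler and names are mine; it assumes you already have the fd of a loaded ringbuf map, which needs a live kernel and root, so this won't run standalone):

```c
#include <bpf/libbpf.h>

static long count;

// Called once per event pulled out of the shared ring, in order.
static int handle_event(void *ctx, void *data, size_t len)
{
    count++;
    return 0;
}

int consume(int ringbuf_map_fd)
{
    struct ring_buffer *rb =
        ring_buffer__new(ringbuf_map_fd, handle_event, NULL, NULL);
    if (!rb)
        return 1;
    // One fd, one loop -- no per-CPU readers to juggle as with perfbuf.
    while (ring_buffer__poll(rb, 100 /* timeout, ms */) >= 0)
        ;
    ring_buffer__free(rb);
    return 0;
}
```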

Where perfbuf still makes sense: if you genuinely need per-CPU isolation (you're sampling, not collecting every event), or if you're stuck on a kernel before 5.8.

The gotcha

Ringbuf's "no events lost" result above is structural, not magic: when there's no space, the reservation fails, your BPF program sees the failure, and it decides what happens next. You have to handle that deliberately. Drop? Sample? Count the misses? The decision is yours, not the kernel's. With perfbuf the kernel drops on your behalf, and you only find out if you wired up the lost-sample callback.
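Kernel-side, the deliberate-failure path looks like this (a sketch; the map, struct, and counter names are mine, and `dropped` is assumed to be a global counter userspace can read):

```c
SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(void *ctx)
{
    // Try to reserve space directly in the shared ring.
    struct event *e = bpf_ringbuf_reserve(&events_rb, sizeof(*e), 0);
    if (!e) {
        // No space: the reservation failed and it is our call what to
        // do. Here we drop, but keep an honest count of the misses.
        __sync_fetch_and_add(&dropped, 1);
        return 0;
    }
    // Build the event in place -- no extra copy on submit.
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_ringbuf_submit(e, 0);
    return 0;
}
```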
