Deep dive: eBPF for zero-overhead network flow monitoring
9 min read · pinned
Traditional network monitoring tools like tcpdump copy every packet to userspace, introducing measurable overhead at high packet rates. XDP (eXpress Data Path) programs run in the NIC driver before the kernel network stack ever touches the packet, enabling in-kernel per-flow statistics with sub-microsecond per-packet cost and no copies to userspace.
The trick is using a BPF per-CPU hash map keyed by the 5-tuple (src/dst IP, src/dst port, protocol). The XDP program increments counters in place — no memory allocation and no locks on the hot path, because each CPU writes only its own copy of the value.
TC (traffic control) hooks cover egress — XDP only sees ingress. Together they give full bidirectional visibility. A userspace daemon reads the BPF maps every second via bpf_map_lookup_elem and exports to Prometheus, keeping the kernel path completely allocation-free.
Profiling Go services in production with pprof and Flamegraph
6 min read
Go's runtime/pprof captures CPU, heap, goroutine, and block profiles. The easiest way to enable continuous profiling in a running service is a blank import of net/http/pprof: the import's side effect registers handlers under /debug/pprof/ on the default mux, with zero configuration.
To generate a flame graph: go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30" (quote the URL so the shell doesn't interpret ? or &). The built-in web UI renders flame graphs, call trees, and source annotations.
Common pitfall: the default heap profile shows live (in-use) objects as of the last completed garbage collection, not cumulative allocations (those are the alloc_space/alloc_objects sample types). Use /debug/pprof/heap?gc=1 to run a GC before capturing so the live snapshot is current.
Linux TCP tuning for high-throughput servers: BBR, buffer sizing, fast open
7 min read
CUBIC reacts to packet loss by cutting the congestion window (by roughly 30%, versus Reno's halving). BBR instead models the bottleneck directly: it tracks the maximum delivery rate and minimum RTT observed, keeping the pipe full without over-buffering.
BBR pairs best with the fq packet scheduler for pacing; since kernel 4.13 it can fall back to internal TCP-layer pacing, but without pacing BBR can burst and trigger loss. Note that mainline kernels still ship the original BBR (v1); BBRv3 is developed in Google's out-of-tree branch and is not an upstream module.
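Enabling BBR with fq is a two-line sysctl change. A sketch as a drop-in config fragment (the filename is an assumption; apply with `sysctl --system` and verify with `sysctl net.ipv4.tcp_congestion_control`):

```
# /etc/sysctl.d/90-bbr.conf (hypothetical filename)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
```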