~60x speed-up of Linux "perf"

I’ve recently been using cargo-flamegraph to profile syndicate-server.

The tool is great, but it uses perf to record and analyze profile data, and perf on Debian has a performance problem: when not linked against libbfd, it shells out to addr2line for every address it needs to look up. Thousands and thousands and thousands of incredibly short-lived processes.

Michał Sidor suggests building against libbfd, something that the Debian maintainers aren’t allowed to do [1]. I tried the other approach, suggested by Steinar H. Gunderson, of patching perf to use a long-running addr2line process instead, sending queries to it over a pipe.

This works very well. What used to take endless minutes now takes a few seconds. It makes working with cargo-flamegraph much more pleasant!
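To make the idea concrete, here is a minimal Python sketch of the long-running-process approach. It is not the patch itself, which changes perf's C source; the class name `Addr2LineServer` and its `command` parameter are invented for illustration. It assumes GNU addr2line, which reads addresses from standard input when none are given on the command line, and it leaves out `-f` and `-i` so that each query produces exactly one line of output.

```python
import subprocess


class Addr2LineServer:
    """Keep a single addr2line child alive and query it over pipes.

    Illustrative sketch only: the name and the `command` parameter are
    hypothetical and are not taken from the perf patch.
    """

    def __init__(self, binary, command=("addr2line",)):
        # Start the child once. With no addresses on the command line,
        # addr2line reads them from stdin, one per line.
        self.proc = subprocess.Popen(
            [*command, "-e", binary],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on our side
        )

    def lookup(self, address):
        """Send one address and return the single-line reply."""
        self.proc.stdin.write(f"{address:#x}\n")
        self.proc.stdin.flush()
        # Assumes one output line per query (no -f, no -i).
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin signals end-of-input, so the child exits.
        self.proc.stdin.close()
        self.proc.wait()
```

Each call to `lookup` costs one write and one read on an existing pipe, rather than a fork and exec of a fresh process. A real client would probably also want function names and inlined frames, in which case a reply can span several lines and the client needs some way to tell where each reply ends.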

With the unpatched, Debian-default perf_5.10, processing a profile of about 10 seconds of activity in syndicate-server:

$ time /usr/bin/perf_5.10 script -i perf.data >/dev/null

real    12m51.499s
user    11m57.455s
sys     0m53.821s

With my patch:

$ time perf script -i perf.data >/dev/null

real    0m11.335s
user    0m11.047s
sys     0m0.309s

That’s sixty-eight times faster.
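For anyone who wants to check the arithmetic, the figure comes from dividing the two wall-clock ("real") times above. A small sketch, where the helper name `seconds` is just for illustration:

```python
def seconds(minutes, secs):
    """Convert a `time`-style MmS.SSSs reading to seconds."""
    return minutes * 60 + secs


before = seconds(12, 51.499)  # unpatched perf_5.10, real time
after = seconds(0, 11.335)    # patched perf, real time

speedup = before / after
print(f"{speedup:.1f}x")
```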

You can download the patch here.

Links:

  1. The problem is an unfortunate incompatibility of licenses: perf is GPLv2, not GPLv2+, and libbfd is GPLv3+.