~60x speed-up of Linux "perf"
Thu 9 Sep 2021 13:31 CEST
I’ve recently been using cargo-flamegraph to profile syndicate-server. The tool is great, but it uses perf to record and analyze profile data, and perf on Debian has a performance problem: when not linked against libbfd, it shells out to addr2line for every address it needs to look up. Thousands and thousands and thousands of incredibly short-lived processes.
Michał Sidor suggests building against libbfd, something that the Debian maintainers aren’t allowed to do.[1]
I tried the other approach, suggested by Steinar H. Gunderson, of patching perf to use a long-running addr2line process instead, sending queries to it over a pipe.
This works very well. What used to take endless minutes now takes a few seconds. It makes working with cargo-flamegraph much more pleasant!
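To make the idea concrete, here is a minimal Python sketch of the "one long-running helper, queried over a pipe" technique. It is not the patch itself (which is C code inside perf); the class name LineResolver is hypothetical, and the sketch assumes a helper that behaves like GNU addr2line run with -f and -e: one address per input line, answered with exactly two output lines (function name, then file:line), flushed after every reply. Inlined frames are not handled.

```python
import subprocess


class LineResolver:
    """Keep one address-to-line helper alive and query it over pipes.

    Hypothetical illustration only; assumes the helper answers each
    input line with exactly two output lines and flushes after each reply.
    """

    def __init__(self, argv):
        # argv is the helper command line, for example
        # ["addr2line", "-f", "-e", "./my-binary"] (path is a placeholder).
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def lookup(self, address):
        # One request line in, two reply lines out: no fork/exec per address.
        self.proc.stdin.write(f"{address:#x}\n")
        self.proc.stdin.flush()
        function = self.proc.stdout.readline().rstrip("\n")
        location = self.proc.stdout.readline().rstrip("\n")
        return function, location

    def close(self):
        # Closing stdin gives the helper EOF, so it exits on its own.
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()
```

The point of the design is that process start-up is paid once per binary rather than once per address; each lookup afterwards costs only a write and two reads on an already-open pipe.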
With the unpatched, Debian-default perf_5.10, running on about 10 seconds of activity in syndicate-server:
$ time /usr/bin/perf_5.10 script -i perf.data >/dev/null
real 12m51.499s
user 11m57.455s
sys 0m53.821s
With my patch:
$ time perf script -i perf.data >/dev/null
real 0m11.335s
user 0m11.047s
sys 0m0.309s
That’s sixty-eight times faster.
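For anyone who wants to check that figure, the ratio comes from the two wall-clock ("real") times above. A small Python sketch of the arithmetic, where the helper name to_seconds is just for illustration:

```python
def to_seconds(stamp):
    """Convert a `time`-style duration such as '12m51.499s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


before = to_seconds("12m51.499s")  # unpatched perf_5.10, real time
after = to_seconds("0m11.335s")    # patched perf, real time
speedup = before / after

print(f"{before:.3f}s / {after:.3f}s = {speedup:.1f}x")
```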
You can download the patch here.
Links:
- Debian bug report against perf: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911815
- Michał Sidor’s very helpful post: https://michcioperz.com/post/slow-perf-script/
- My patch thread at the linux-perf-users mailing list: https://lore.kernel.org/linux-perf-users/20210909112202.1947499-1-tonyg@leastfixedpoint.com/
[1] The problem is an unfortunate incompatibility of licenses: perf is GPLv2, not GPLv2+, and libbfd is GPLv3+.