Re: [PATCH 5/5] selftests/bpf: a simple benchmark tool for /proc/<pid>/maps APIs

Ian Rogers <irogers@xxxxxxxxxx> · Mon, 6 May 2024 11:43:21 -0700



On Mon, May 6, 2024 at 11:32 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Sat, May 4, 2024 at 10:09 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Sat, May 4, 2024 at 2:57 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Sat, May 4, 2024 at 8:29 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > > > Implement a simple tool/benchmark for comparing address "resolution"
> > > > > logic based on textual /proc/<pid>/maps interface and new binary
> > > > > ioctl-based PROCFS_PROCMAP_QUERY command.
> > > >
> > > > Of course an artificial benchmark of "read a whole file" vs. "a tiny
> > > > ioctl" is going to be different, but step back and show how this is
> > > > going to be used in the real world overall.  Pounding on this file is
> > > > not a normal operation, right?
> > > >
> > >
> > > It's not artificial at all. It's *exactly* what, say, blazesym library
> > > is doing (see [0], it's Rust and part of the overall library API, I
> > > think C code in this patch is way easier to follow for someone not
> > > familiar with implementation of blazesym, but both implementations are
> > > doing exactly the same sequence of steps). You can do it even less
> > > efficiently by parsing the whole file, building an in-memory lookup
> > > table, then looking up addresses one by one. But that's even slower
> > > and more memory-hungry. So I didn't even bother implementing that, it
> > > would put /proc/<pid>/maps at even more disadvantage.
> > >
> > > Other applications that deal with stack traces (including perf) would
> > > be doing one of those two approaches, depending on circumstances and
> > > level of sophistication of code (and sensitivity to performance).
> >
> > The code in perf doing this is here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/synthetic-events.c#n440
> > The code is using the api/io.h code:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/api/io.h
> > Using perf to profile perf it was observed time was spent allocating
> > buffers and locale related activities when using stdio, so io is a
> > lighter weight alternative, albeit with more verbose code than fscanf.
> > You could add this as an alternate /proc/<pid>/maps reader, we have a
> > similar benchmark in `perf bench internals synthesize`.
> >
>
> If I add a new implementation using this ioctl() into
> perf_event__synthesize_mmap_events(), will it be tested from this
> `perf bench internals synthesize`? I'm not too familiar with perf code
> organization, sorry if it's a stupid question. If not, where exactly
> is the code that would be triggered from benchmark?

Yes, it would be triggered :-)

Thanks,
Ian

> > Thanks,
> > Ian
> >
> > >   [0] https://github.com/libbpf/blazesym/blob/ee9b48a80c0b4499118a1e8e5d901cddb2b33ab1/src/normalize/user.rs#L193
> > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
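
For readers who have not looked at the text-based approach discussed
above, here is a minimal sketch of the "scan /proc/<pid>/maps until the
containing VMA is found" sequence. It is not the code from the patch,
from blazesym, or from perf. The names struct map_entry,
parse_maps_line() and resolve_addr() are hypothetical, and it uses
stdio (fgets/sscanf) for brevity, which is exactly the heavier-weight
path that perf's api/io.h reader is meant to avoid.

```c
#include <stdio.h>

/* Hypothetical record for one /proc/<pid>/maps line (illustrative only). */
struct map_entry {
	unsigned long start;	/* VMA start address (inclusive) */
	unsigned long end;	/* VMA end address (exclusive) */
	unsigned long offset;	/* file offset of the mapping */
	char perms[5];		/* e.g. "r-xp" */
	char path[256];		/* backing file, empty for anonymous maps */
};

/*
 * Parse one line of the form:
 *   start-end perms offset major:minor inode [path]
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_maps_line(const char *line, struct map_entry *e)
{
	unsigned int dev_major, dev_minor;
	unsigned long inode;
	int n;

	e->path[0] = '\0';	/* the path field is optional */
	n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
		   &e->start, &e->end, e->perms, &e->offset,
		   &dev_major, &dev_minor, &inode, e->path);
	return n >= 7 ? 0 : -1;
}

/*
 * Linearly scan an already opened maps stream for the VMA containing
 * addr. Returns 0 and fills *e if found, -1 otherwise.
 */
int resolve_addr(FILE *f, unsigned long addr, struct map_entry *e)
{
	char line[512];

	while (fgets(line, sizeof(line), f)) {
		if (parse_maps_line(line, e))
			continue;	/* skip lines we cannot parse */
		if (addr >= e->start && addr < e->end)
			return 0;
	}
	return -1;
}
```

A caller would fopen("/proc/self/maps", "r") (or the path for another
pid), call resolve_addr() for an address, and then compute the file
offset as addr - e.start + e.offset. Each lookup re-reads and re-parses
text, and that per-lookup cost is what the benchmark in this patch
compares against the ioctl-based query.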