On Sat, May 4, 2024 at 10:09 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Sat, May 4, 2024 at 2:57 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Sat, May 4, 2024 at 8:29 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > > Implement a simple tool/benchmark for comparing address "resolution"
> > > > logic based on textual /proc/<pid>/maps interface and new binary
> > > > ioctl-based PROCFS_PROCMAP_QUERY command.
> > >
> > > Of course an artificial benchmark of "read a whole file" vs. "a tiny
> > > ioctl" is going to be different, but step back and show how this is
> > > going to be used in the real world overall. Pounding on this file is
> > > not a normal operation, right?
> > >
> >
> > It's not artificial at all. It's *exactly* what, say, blazesym library
> > is doing (see [0], it's Rust and part of the overall library API, I
> > think C code in this patch is way easier to follow for someone not
> > familiar with implementation of blazesym, but both implementations are
> > doing exactly the same sequence of steps). You can do it even less
> > efficiently by parsing the whole file, building an in-memory lookup
> > table, then looking up addresses one by one. But that's even slower
> > and more memory-hungry. So I didn't even bother implementing that, it
> > would put /proc/<pid>/maps at even more disadvantage.
> >
> > Other applications that deal with stack traces (including perf) would
> > be doing one of those two approaches, depending on circumstances and
> > level of sophistication of code (and sensitivity to performance).
>
> The code in perf doing this is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/synthetic-events.c#n440
> The code is using the api/io.h code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/api/io.h
> Using perf to profile perf it was observed time was spent allocating
> buffers and locale related activities when using stdio, so io is a
> lighter weight alternative, albeit with more verbose code than fscanf.
> You could add this as an alternate /proc/<pid>/maps reader, we have a
> similar benchmark in `perf bench internals synthesize`.
>

If I add a new implementation using this ioctl() into
perf_event__synthesize_mmap_events(), will it be tested from this
`perf bench internals synthesize`? I'm not too familiar with perf code
organization, sorry if it's a stupid question. If not, where exactly
is the code that would be triggered from benchmark?

> Thanks,
> Ian
>
> > [0] https://github.com/libbpf/blazesym/blob/ee9b48a80c0b4499118a1e8e5d901cddb2b33ab1/src/normalize/user.rs#L193
> >
> > > thanks,
> > >
> > > greg k-h
> >
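
For anyone following along who hasn't seen the text-based approach discussed
above, here is a rough sketch of resolving one address by scanning
/proc/<pid>/maps line by line with stdio. It is not the code from the patch,
from blazesym, or from perf; struct vma_info, parse_maps_line() and find_vma()
are made-up names for illustration, and it assumes the usual
"start-end perms offset dev inode [path]" line layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/<pid>/maps entry (illustrative only). */
struct vma_info {
	unsigned long start;
	unsigned long end;
	unsigned long offset;
	char perms[5];
	char path[256];
};

/*
 * Parse one line of the form:
 *   start-end perms offset dev inode [path]
 * Returns 0 on success, -1 if the line does not match.
 * The path is optional (anonymous mappings have none).
 */
int parse_maps_line(const char *line, struct vma_info *out)
{
	int n;

	out->path[0] = '\0';
	n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255[^\n]",
		   &out->start, &out->end, out->perms, &out->offset,
		   out->path);
	return n >= 4 ? 0 : -1;
}

/*
 * Scan an already opened maps file from its current position and stop at
 * the first mapping that contains addr. Returns 0 and fills *out on a hit,
 * -1 if no mapping covers addr. Lines longer than the buffer are not
 * handled specially in this sketch.
 */
int find_vma(FILE *f, unsigned long addr, struct vma_info *out)
{
	char line[512];

	while (fgets(line, sizeof(line), f)) {
		if (parse_maps_line(line, out) != 0)
			continue;
		if (addr >= out->start && addr < out->end)
			return 0;
	}
	return -1;
}
```

A caller would fopen("/proc/self/maps", "r") (or the maps file of another
pid), call find_vma() for the address of interest, and compute the file
offset as addr - start + offset. Every lookup done this way re-reads and
re-parses the text, which is the per-address cost the benchmark is about.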