On Thu, Apr 27, 2023 at 5:44 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Wed, Apr 26, 2023 at 12:09:59PM -0700, Andrii Nakryiko wrote:
> > On Mon, Apr 24, 2023 at 9:04 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > hi,
> > > this patchset is adding support for attaching multiple uprobes and
> > > usdt probes through the new uprobe_multi link.
> > >
> > > The current uprobe is attached through a perf event, and attaching
> > > many uprobes takes a lot of time because of that.
> > >
> > > The main reason is that we need to install a perf event for each
> > > probed function, and a profile shows perf event installation
> > > (perf_install_in_context) as the culprit.
> > >
> > > The new uprobe_multi link just creates raw uprobes and attaches the
> > > bpf program to them, without any perf event being involved.
> > >
> > > In addition to being faster we also save file descriptors. For the
> > > current uprobe attach we use an extra perf event fd for each probed
> > > function. The new link needs just one fd that covers all the
> > > functions we are attaching to.
> >
> > All of the above are good reasons and thanks for tackling multi-uprobe!
> >
> > >
> > > By dropping perf we lose the ability to attach a uprobe to a specific
> > > pid. We can work around that by having the pid check directly in the
> > > bpf program, but we might need to look for another solution if that
> > > turns out to be a problem.
> > >
> >
> > I think this is a big deal, because it makes multi-uprobe not a
> > drop-in replacement for normal uprobes even for typical scenarios. It
> > might be why you couldn't do transparent use of uprobe.multi in USDT?
>
> yes
>
> >
> > But I'm not sure why this is a problem? How does perf handle this?
> > Does it do runtime filtering, or something more efficient that
> > prevents the uprobe from being triggered for other PIDs in the first
> > place? If it's the former, then why can't we do the same simple check
> > ourselves if a pid filter is specified?
>
> so the standard uprobe is basically a perf event and as such it can be
> created with 'pid' as a target.. and such a perf event will get
> installed only when the process with that pid is scheduled in, and
> uninstalled when it's scheduled out
>
> >
> > I also see that uprobe_consumer has a filter callback, not sure if
> > it's a better solution just for pid filtering, but might be another
> > way to do this?
>
> yes, that's probably how we will have to do that, will check

The callback seems like overkill, as we'd be paying the indirect call
price. So a simple if statement in either uprobe_prog_run or in
uprobe_multi_link_ret_handler/uprobe_multi_link_handler seems like a
better solution, IMO.
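Something roughly like this (an untested sketch; bpf_uprobe,
bpf_uprobe_multi_link and the link->task field are my assumptions about
the RFC's data structures, not quoted code):

  static int uprobe_prog_run(struct bpf_uprobe *uprobe,
                             unsigned long entry_ip,
                             struct pt_regs *regs)
  {
          struct bpf_uprobe_multi_link *link = uprobe->link;

          /* hypothetical pid filter: if the link was created for a
           * specific task, bail out before running the bpf program
           * when some other process hits the probe
           */
          if (link->task && link->task != current->group_leader)
                  return 0;

          /* ... existing bpf program invocation ... */
          return 0;
  }

That way processes that don't match only pay a pointer compare per hit,
instead of an indirect call through uprobe_consumer->filter.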
> >
> > Another aspect I wanted to discuss (and I don't know the right answer)
> > was whether we need to support a separate binary path for each offset?
> > It would simplify (and trim down memory usage significantly) a bunch
> > of internals if we knew we are dealing with a single inode for each
> > multi-uprobe link. I'm trying to think if it would be limiting in
> > practice to have to create a link per binary, and so far it seems
> > like user-space code will usually do symbol resolution per ELF file
> > anyway, so it doesn't seem limiting to have a single path + multiple
> > offsets/cookies within that file. For the USDT use case even ref_ctr
> > is probably the same, but I'd keep it 1:1 with offset and cookie
> > anyway. For uniformity and generality.
> >
> > WDYT?
>
> right, it's a waste for a single binary, but I guess it's not a big
> waste, because with a single binary you just repeat the same pointer,
> not the path
>
> it's fast enough to be called multiple times for each binary you want
> to trace, but it'd also be nice to be able to attach all at once ;-)
>
> maybe we could have a bit in flags saying paths[0] is valid for all

No need for extra flags. I was just thinking about having a simpler
and more straightforward API, where you don't need to create another
array with tons of duplicated string pointers. No big deal, I'm fine
either way.

> > >
> > > Attaching current bpftrace to 1000 uprobes:
> > >
> > >   # BPFTRACE_MAX_PROBES=100000 perf stat -e cycles,instructions \
> > >       ./bpftrace -e 'uprobe:./uprobe_multi:uprobe_multi_func_* { }, i:ms:1 { exit(); }'
> > >   ...
> > >
> > >   126,666,390,509      cycles
> > >    29,973,207,307      instructions    #  0.24  insn per cycle
> > >
> > >      85.284833554 seconds time elapsed
> > >
> > >
> > > Same bpftrace setup with uprobe_multi support:
> > >
> > >   # perf stat -e cycles,instructions \
> > >       ./bpftrace -e 'uprobe:./uprobe_multi:uprobe_multi_func_* { }, i:ms:1 { exit(); }'
> > >   ...
> > >
> > >     6,818,470,649      cycles
> > >    13,275,510,122      instructions    #  1.95  insn per cycle
> > >
> > >       1.943269451 seconds time elapsed
> > >
> > >
> > > I'm sending this as RFC because of:
> > >   - I added/exported some new elf_* helper functions in libbpf,
> > >     and I'm not sure that's the best/right way of doing this
> >
> > didn't get to that yet, sounds suspicious :)
> >
> > >   - I'm not completely sure about the usdt integration in
> > >     bpf_program__attach_usdt; I was trying to detect uprobe_multi
> > >     kernel support first, but ended up with just a new field for
> > >     struct bpf_usdt_opts
> >
> > haven't gotten to this yet as well, but it has to be auto-detectable,
> > not an option (at least I don't see why it wouldn't be, but let me get
> > to the patch)
>
> thanks,
> jirka
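PS, re: the "pid check directly in the bpf program" workaround from the
cover letter, I assume it means something like this on the BPF side (a
minimal sketch; target_tgid is a hypothetical rodata variable the
loader fills in before load, and the section name is just illustrative):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* hypothetical: set by the loader (e.g. through the skeleton's
   * rodata) before the program is loaded
   */
  const volatile __u32 target_tgid;

  SEC("uprobe.multi")
  int handler(struct pt_regs *ctx)
  {
          /* upper 32 bits of bpf_get_current_pid_tgid() are the tgid,
           * i.e. the user-visible process pid
           */
          if ((__u32)(bpf_get_current_pid_tgid() >> 32) != target_tgid)
                  return 0;

          /* actual probe logic goes here */
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";

It works, but every process that hits the probe still pays the trap and
program-run cost, which is why filtering before the program runs would
be nicer.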