On Thu, Apr 27, 2023 at 5:44 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Wed, Apr 26, 2023 at 12:09:59PM -0700, Andrii Nakryiko wrote:
> > On Mon, Apr 24, 2023 at 9:04 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > hi,
> > > this patchset is adding support for attaching multiple uprobes and
> > > usdt probes through the new uprobe_multi link.
> > >
> > > The current uprobe is attached through a perf event, and attaching
> > > many uprobes takes a lot of time because of that.
> > >
> > > The main reason is that we need to install a perf event for each
> > > probed function, and a profile shows perf event installation
> > > (perf_install_in_context) as the culprit.
> > >
> > > The new uprobe_multi link just creates raw uprobes and attaches the
> > > bpf program to them, without any perf event being involved.
> > >
> > > In addition to being faster we also save file descriptors. For the
> > > current uprobe attach we use an extra perf event fd for each probed
> > > function. The new link needs just one fd that covers all the
> > > functions we are attaching to.
> >
> > All of the above are good reasons and thanks for tackling multi-uprobe!
> >
> > >
> > > By dropping perf we lose the ability to attach a uprobe to a specific
> > > pid. We can work around that by having the pid check directly in the
> > > bpf program, but we might need to look for another solution if that
> > > turns out to be a problem.
> > >
> >
> > I think this is a big deal, because it makes multi-uprobe not a
> > drop-in replacement for normal uprobes even for typical scenarios. It
> > might be why you couldn't do transparent use of uprobe.multi in USDT?
>
> yes
>
> >
> > But I'm not sure why this is a problem? How does perf handle this?
> > Does it do runtime filtering, or something more efficient that
> > prevents the uprobe from being triggered for other PIDs in the first
> > place? If it's the former, then why can't we do the same simple check
> > ourselves if a pid filter is specified?
>
> so the standard uprobe is basically a perf event and as such it can be
> created with 'pid' as a target.. and such a perf event will get
> installed only when the process with that pid is scheduled in, and
> uninstalled when it's scheduled out
>
> >
> > I also see that uprobe_consumer has a filter callback, not sure if
> > it's a better solution just for pid filtering, but might be another
> > way to do this?
>
> yes, that's probably how we will have to do that, will check

The callback seems like overkill, as we'd be paying the indirect call
price. So a simple if statement in either uprobe_prog_run or in
uprobe_multi_link_ret_handler/uprobe_multi_link_handler seems like a
better solution, IMO.
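Something roughly like this (an untested sketch; bpf_uprobe,
bpf_uprobe_multi_link and the link->task field are my assumptions about
the RFC's data structures, not quoted code):

  static int uprobe_prog_run(struct bpf_uprobe *uprobe,
                             unsigned long entry_ip,
                             struct pt_regs *regs)
  {
          struct bpf_uprobe_multi_link *link = uprobe->link;

          /* hypothetical pid filter: if the link was created for a
           * specific task, bail out before running the bpf program
           * when some other process hits the probe
           */
          if (link->task && link->task != current->group_leader)
                  return 0;

          /* ... existing bpf program invocation ... */
          return 0;
  }

That way processes that don't match only pay a pointer compare per hit,
instead of an indirect call through uprobe_consumer->filter.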
> >
> > Another aspect I wanted to discuss (and I don't know the right answer)
> > was whether we need to support a separate binary path for each offset?
> > It would simplify (and trim down memory usage significantly) a bunch
> > of internals if we knew we are dealing with a single inode for each
> > multi-uprobe link. I'm trying to think if it would be limiting in
> > practice to have to create a link per binary, and so far it seems
> > like user-space code will usually do symbol resolution per ELF file
> > anyway, so it doesn't seem limiting to have a single path + multiple
> > offsets/cookies within that file. For the USDT use case even ref_ctr
> > is probably the same, but I'd keep it 1:1 with offset and cookie
> > anyway. For uniformity and generality.
> >
> > WDYT?
>
> right, it's a waste for a single binary, but I guess it's not a big
> waste, because with a single binary you just repeat the same pointer,
> not the path
>
> it's fast enough to be called multiple times for each binary you want
> to trace, but it'd also be nice to be able to attach all at once ;-)
>
> maybe we could have a bit in flags saying paths[0] is valid for all

No need for extra flags. I was just thinking about having a simpler
and more straightforward API, where you don't need to create another
array with tons of duplicated string pointers. No big deal, I'm fine
either way.

> > >
> > > Attaching current bpftrace to 1000 uprobes:
> > >
> > >   # BPFTRACE_MAX_PROBES=100000 perf stat -e cycles,instructions \
> > >       ./bpftrace -e 'uprobe:./uprobe_multi:uprobe_multi_func_* { }, i:ms:1 { exit(); }'
> > >   ...
> > >
> > >   126,666,390,509      cycles
> > >    29,973,207,307      instructions    #  0.24  insn per cycle
> > >
> > >      85.284833554 seconds time elapsed
> > >
> > >
> > > Same bpftrace setup with uprobe_multi support:
> > >
> > >   # perf stat -e cycles,instructions \
> > >       ./bpftrace -e 'uprobe:./uprobe_multi:uprobe_multi_func_* { }, i:ms:1 { exit(); }'
> > >   ...
> > >
> > >     6,818,470,649      cycles
> > >    13,275,510,122      instructions    #  1.95  insn per cycle
> > >
> > >       1.943269451 seconds time elapsed
> > >
> > >
> > > I'm sending this as RFC because of:
> > >   - I added/exported some new elf_* helper functions in libbpf,
> > >     and I'm not sure that's the best/right way of doing this
> >
> > didn't get to that yet, sounds suspicious :)
> >
> > >   - I'm not completely sure about the usdt integration in
> > >     bpf_program__attach_usdt; I was trying to detect uprobe_multi
> > >     kernel support first, but ended up with just a new field for
> > >     struct bpf_usdt_opts
> >
> > haven't gotten to this yet as well, but it has to be auto-detectable,
> > not an option (at least I don't see why it wouldn't be, but let me get
> > to the patch)
>
> thanks,
> jirka
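PS, re: the "pid check directly in the bpf program" workaround from the
cover letter, I assume it means something like this on the BPF side (a
minimal sketch; target_tgid is a hypothetical rodata variable the
loader fills in before load, and the section name is just illustrative):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* hypothetical: set by the loader (e.g. through the skeleton's
   * rodata) before the program is loaded
   */
  const volatile __u32 target_tgid;

  SEC("uprobe.multi")
  int handler(struct pt_regs *ctx)
  {
          /* upper 32 bits of bpf_get_current_pid_tgid() are the tgid,
           * i.e. the user-visible process pid
           */
          if ((__u32)(bpf_get_current_pid_tgid() >> 32) != target_tgid)
                  return 0;

          /* actual probe logic goes here */
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";

It works, but every process that hits the probe still pays the trap and
program-run cost, which is why filtering before the program runs would
be nicer.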