Re: [RFC bpf-next 09/16] bpf: Add BPF_TRAMPOLINE_BATCH_ATTACH support

Jiri Olsa <jolsa@xxxxxxxxxx> · Sun, 25 Oct 2020 20:41:23 +0100

On Fri, Oct 23, 2020 at 03:23:10PM -0700, Andrii Nakryiko wrote:
> On Fri, Oct 23, 2020 at 1:31 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > On Fri, 23 Oct 2020 13:03:22 -0700
> > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > > Basically, maybe ftrace subsystem could provide a set of APIs to
> > > prepare a set of functions to attach to. Then BPF subsystem would just
> > > do what it does today, except instead of attaching to a specific
> > > kernel function, it would attach to ftrace's placeholder. I don't know
> > > anything about ftrace implementation, so this might be far off. But I
> > > thought that looking at this problem from a bit of a different angle
> > > would benefit the discussion. Thoughts?
> >
> > I probably understand bpf internals as much as you understand ftrace
> > internals ;-)
> >
> 
> Heh :) But while we are here, what do you think about this idea of
> preparing a no-op trampoline, that a bunch (thousands, potentially) of
> function entries will jump to. And once all that is ready and patched
> through kernel functions entry points, then allow to attach BPF
> program or ftrace callback (if I get the terminology right) in one
> fast and simple operation? For users that would mean that they will
> either get calls for all or none of attached kfuncs, with a simple and
> reliable semantics.

so the main pain point the batch interface addresses is that
every attach (BPF_RAW_TRACEPOINT_OPEN command) calls register_ftrace_direct,
and you'd need to do the same for the nop trampoline, no?

I wonder if we could create some 'transaction object' represented
by an fd and pass that fd in bpf_attr::raw_tracepoint

then attach (BPF_RAW_TRACEPOINT_OPEN command) would add the program to
this new 'transaction object' instead of updating ftrace directly

and when the collection is done (all BPF_RAW_TRACEPOINT_OPEN commands
are executed), we'd call a new bpf syscall command on that transaction
and it would call the ftrace interface

something like:

  bpf(TRANSACTION_NEW) = fd
  bpf(BPF_RAW_TRACEPOINT_OPEN) for prog_fd_1, fd
  bpf(BPF_RAW_TRACEPOINT_OPEN) for prog_fd_2, fd
  ...
  bpf(TRANSACTION_DONE) for fd
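
to make the intended semantics a bit more concrete, below is a rough
userspace mock of the flow above. It's only a sketch: struct txn,
txn_new/txn_add/txn_done and the ftrace_updates counter are all made-up
names for illustration, nothing like them exists in the kernel, and the
real ftrace update is replaced by a counter. The point it tries to show
is that adding programs only queues them, and the (expensive) ftrace
update happens once, when the transaction is finished.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock only: none of these names exist in the kernel. */
#define TXN_MAX_PROGS 16

struct txn {
	int prog_fds[TXN_MAX_PROGS];	/* programs queued for attach */
	int nr;				/* number of queued programs */
	int done;			/* set once the transaction is committed */
};

/* Stands in for the expensive ftrace update; we only count the calls. */
static int ftrace_updates;

/* Mock of bpf(TRANSACTION_NEW): start an empty transaction. */
static void txn_new(struct txn *t)
{
	assert(t != NULL);
	t->nr = 0;
	t->done = 0;
}

/*
 * Mock of bpf(BPF_RAW_TRACEPOINT_OPEN) with a transaction fd:
 * queue the program, do not touch ftrace yet.
 */
static int txn_add(struct txn *t, int prog_fd)
{
	if (t->done || t->nr >= TXN_MAX_PROGS)
		return -1;
	t->prog_fds[t->nr++] = prog_fd;
	return 0;
}

/*
 * Mock of bpf(TRANSACTION_DONE): a single ftrace update covers
 * everything queued so far. Returns the number of programs attached.
 */
static int txn_done(struct txn *t)
{
	if (t->done)
		return -1;
	ftrace_updates++;
	t->done = 1;
	return t->nr;
}
```

with this mock, calling txn_add() for two programs leaves ftrace_updates
untouched, and txn_done() bumps it exactly once regardless of how many
programs were queued. How a real implementation would handle errors in
the middle of the commit is left open here.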

jirka

> 
> Something like this, where bpf_prog attachment (which replaces nop)
> happens as step 2:
> 
> +------------+  +----------+  +----------+
> |  kfunc1    |  |  kfunc2  |  |  kfunc3  |
> +------+-----+  +----+-----+  +----+-----+
>        |             |             |
>        |             |             |
>        +---------------------------+
>                      |
>                      v
>                  +---+---+           +-----------+
>                  |  nop  +----------->  bpf_prog |
>                  +-------+           +-----------+
> 
> 
> > Anyway, what I'm currently working on, is a fast way to get to the
> > arguments of a function. For now, I'm just focused on x86_64, and only add
> > 6 arguments.
> >
> > The main issue that Alexei had with using the ftrace trampoline, was that
> > the only way to get to the arguments was to set the "REGS" flag, which
> > would give a regs parameter that contained a full pt_regs. The problem with
> > this approach is that it required saving *all* regs for every function
> > traced. Alexei felt that this was too much overhead.
> >
> > Looking at Jiri's patch, I took a look at the creation of the bpf
> > trampoline, and noticed that it's copying the regs on a stack (at least
> > what is used, which I think could be an issue).
> 
> Right. And BPF doesn't get access to the entire pt_regs struct, so it
> doesn't have to pay the price of saving it.
> 
> But just FYI. Alexei is out till next week, so don't expect him to
> reply in the next few days. But he's probably best to discuss these
> nitty-gritty details with :)
> 
> >
> > For tracing a function, one must store all argument registers used, and
> > restore them, as that's how they are passed from caller to callee. And
> > since they are stored anyway, I figure, that should also be sent to the
> > function callbacks, so that they have access to them too.
> >
> > I'm working on a set of patches to make this a reality.
> >
> > -- Steve
>