On Sun, Oct 25, 2020 at 12:41 PM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> On Fri, Oct 23, 2020 at 03:23:10PM -0700, Andrii Nakryiko wrote:
> > On Fri, Oct 23, 2020 at 1:31 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > On Fri, 23 Oct 2020 13:03:22 -0700
> > > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > > Basically, maybe ftrace subsystem could provide a set of APIs to
> > > > prepare a set of functions to attach to. Then BPF subsystem would just
> > > > do what it does today, except instead of attaching to a specific
> > > > kernel function, it would attach to ftrace's placeholder. I don't know
> > > > anything about ftrace implementation, so this might be far off. But I
> > > > thought that looking at this problem from a bit of a different angle
> > > > would benefit the discussion. Thoughts?
> > >
> > > I probably understand bpf internals as much as you understand ftrace
> > > internals ;-)
> > >
> >
> > Heh :) But while we are here, what do you think about this idea of
> > preparing a no-op trampoline, that a bunch (thousands, potentially) of
> > function entries will jump to. And once all that is ready and patched
> > through kernel functions entry points, then allow to attach BPF
> > program or ftrace callback (if I get the terminology right) in a one
> > fast and simple operation? For users that would mean that they will
> > either get calls for all or none of attached kfuncs, with a simple and
> > reliable semantics.
>
> so the main pain point the batch interface is addressing, is that
> every attach (BPF_RAW_TRACEPOINT_OPEN command) calls register_ftrace_direct,
> and you'll need to do the same for nop trampoline, no?

I guess I had a hope that if we know it's a nop that we are
installing, then we can do it without extra waiting, which should
speed it up quite a bit.

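A toy user-space model of the nop-placeholder idea, in case it helps the
discussion. Everything here (`shared_slot`, `kfunc_entry`, `tramp_attach`,
`tramp_detach`, `count_calls`) is a made-up name for illustration, not an
ftrace or BPF API, and real text patching, synchronization and concurrency
are ignored entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only: none of these names exist in the kernel. */
typedef void (*tramp_cb)(int func_id, void *ctx);

/* The placeholder that is installed first and does nothing. */
static void tramp_nop(int func_id, void *ctx)
{
	(void)func_id;
	(void)ctx;
}

/* One shared slot that every patched function entry goes through. */
static tramp_cb shared_slot = tramp_nop;

/*
 * Step 1 (slow, once per function): each function entry is patched to
 * reach the shared slot. Modelled here by simply calling through it.
 */
void kfunc_entry(int func_id, void *ctx)
{
	shared_slot(func_id, ctx);
}

/* Step 2 (fast, once): replace the nop with the real callback. */
void tramp_attach(tramp_cb cb)
{
	assert(cb != NULL);
	shared_slot = cb;
}

/* Detach is again a single store back to the nop. */
void tramp_detach(void)
{
	shared_slot = tramp_nop;
}

/* Example callback: counts calls per function id in an int array. */
void count_calls(int func_id, void *ctx)
{
	int *counts = ctx;

	counts[func_id]++;
}
```

In this model, attaching to all functions is a single pointer store, so a
caller sees callbacks for either all of the prepared functions or none of
them.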
>
> I wonder if we could create some 'transaction object' represented
> by fd and add it to bpf_attr::raw_tracepoint
>
> then attach (BPF_RAW_TRACEPOINT_OPEN command) would add program to this
> new 'transaction object' instead of updating ftrace directly
>
> and when the collection is done (all BPF_RAW_TRACEPOINT_OPEN command
> are executed), we'd call new bpf syscall command on that transaction
> and it would call ftrace interface
>

This is conceptually something like what I had in mind, but I had a
single BPF program attached to many kernel functions in mind.
Something that's impossible today, as you mentioned in another thread.

> something like:
>
>   bpf(TRANSACTION_NEW) = fd
>   bpf(BPF_RAW_TRACEPOINT_OPEN) for prog_fd_1, fd
>   bpf(BPF_RAW_TRACEPOINT_OPEN) for prog_fd_2, fd
>   ...
>   bpf(TRANSACTION_DONE) for fd
>
> jirka
>
> >
> > Something like this, where bpf_prog attachment (which replaces nop)
> > happens as step 2:
> >
> > +------------+  +----------+  +----------+
> > |  kfunc1    |  |  kfunc2  |  |  kfunc3  |
> > +------+-----+  +----+-----+  +----+-----+
> >        |             |             |
> >        |             |             |
> >        +---------------------------+
> >                      |
> >                      v
> >                  +---+---+           +-----------+
> >                  |  nop  +----------->  bpf_prog |
> >                  +-------+           +-----------+
> >
> >
> > > Anyway, what I'm currently working on, is a fast way to get to the
> > > arguments of a function. For now, I'm just focused on x86_64, and only add
> > > 6 arguments.
> > >
> > > The main issue that Alexei had with using the ftrace trampoline, was that
> > > the only way to get to the arguments was to set the "REGS" flag, which
> > > would give a regs parameter that contained a full pt_regs. The problem with
> > > this approach is that it required saving *all* regs for every function
> > > traced. Alexei felt that this was too much overhead.
> > >
> > > Looking at Jiri's patch, I took a look at the creation of the bpf
> > > trampoline, and noticed that it's copying the regs on a stack (at least
> > > what is used, which I think could be an issue).
> >
> > Right. And BPF doesn't get access to the entire pt_regs struct, so it
> > doesn't have to pay the price of saving it.
> >
> > But just FYI. Alexei is out till next week, so don't expect him to
> > reply in the next few days. But he's probably best to discuss these
> > nitty-gritty details with :)
> >
> > >
> > > For tracing a function, one must store all argument registers used, and
> > > restore them, as that's how they are passed from caller to callee. And
> > > since they are stored anyway, I figure, that should also be sent to the
> > > function callbacks, so that they have access to them too.
> > >
> > > I'm working on a set of patches to make this a reality.
> > >
> > > -- Steve
> > >
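
For reference, the collect-then-commit semantics of the 'transaction
object' idea quoted above can be modelled in plain user-space C. This is
only a sketch under loud assumptions: TRANSACTION_NEW and TRANSACTION_DONE
are names from the proposal, not existing bpf() commands, and `txn_new`,
`txn_add`, `txn_done` and `struct attach_table` are hypothetical stand-ins
that touch no kernel interface at all:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names exist in the kernel. */
#define TXN_MAX 8

struct txn_entry {
	int prog_fd;	/* program queued by a modelled BPF_RAW_TRACEPOINT_OPEN */
	int func_id;	/* stand-in for the target kernel function */
};

struct txn {
	struct txn_entry pending[TXN_MAX];
	size_t n_pending;
	bool done;
};

/* Stand-in for the live attachment state that ftrace would own. */
struct attach_table {
	int prog_for_func[TXN_MAX];	/* 0 means nothing attached */
	unsigned int updates;		/* number of batch updates applied */
};

/* Models bpf(TRANSACTION_NEW). */
void txn_new(struct txn *t)
{
	t->n_pending = 0;
	t->done = false;
}

/* Models an attach that is queued on the transaction, not applied. */
int txn_add(struct txn *t, int prog_fd, int func_id)
{
	size_t i;

	if (t->done || t->n_pending == TXN_MAX)
		return -1;
	if (prog_fd <= 0 || func_id < 0 || func_id >= TXN_MAX)
		return -1;
	for (i = 0; i < t->n_pending; i++)
		if (t->pending[i].func_id == func_id)
			return -1;	/* one program per function in this model */

	t->pending[t->n_pending].prog_fd = prog_fd;
	t->pending[t->n_pending].func_id = func_id;
	t->n_pending++;
	return 0;
}

/* Models bpf(TRANSACTION_DONE): validate everything, then apply once. */
int txn_done(struct txn *t, struct attach_table *tbl)
{
	size_t i;

	assert(t->n_pending <= TXN_MAX);
	if (t->done)
		return -1;
	for (i = 0; i < t->n_pending; i++)
		if (tbl->prog_for_func[t->pending[i].func_id] != 0)
			return -1;	/* conflict: apply nothing */

	for (i = 0; i < t->n_pending; i++)
		tbl->prog_for_func[t->pending[i].func_id] = t->pending[i].prog_fd;
	tbl->updates++;		/* the whole batch counts as one update */
	t->done = true;
	return 0;
}
```

The point of the model is that `txn_add` never touches the table, and
`txn_done` either applies every queued attachment in one update or leaves
the table unchanged.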