Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 19 Apr 2021 21:51:39 -0700

On Fri, Apr 16, 2021 at 8:03 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Thu, 15 Apr 2021 17:00:07 -0400
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> >
> > [
> >   Added Masami, as I didn't realize he wasn't on Cc. He's the maintainer of
> >   kretprobes.
> >
> >   Masami, you may want to use lore.kernel.org to read the history of this
> >   thread.
> > ]
> >
> > On Thu, 15 Apr 2021 13:45:06 -0700
> > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > > > I don't know how the BPF code does it, but if you are tracing the exit
> > > > of a function, I'm assuming that you hijack the return pointer and replace
> > > > it with a call to a trampoline that has access to the arguments. To do
> > >
> > > As Jiri replied, BPF trampoline doesn't do it the same way as
> > > kretprobe does it. Which gives the fexit BPF program another critical
> > > advantage over kretprobe -- we know traced function's entry IP in both
> > > entry and exit cases, which allows us to generically correlate them.
> > >
> > > I've tried to figure out how to get that entry IP from kretprobe and
> > > couldn't find any way. Do you know if it's possible at all or it's a
> > > fundamental limitation of the way kretprobe is implemented (through
> > > hijacking return address)?
>
> Inside the kretprobe handler, you can get the entry IP from kretprobe as below;
>
> static int my_kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
>         struct kretprobe *rp = get_kretprobe(ri);
>         unsigned long entry = (unsigned long)rp->kp.addr;
>         unsigned long retaddr = (unsigned long)ri->ret_addr;
>         ...
> }

Great. In kprobe_perf_func(), which seems to be the callback that
triggers kretprobe BPF programs, we can get that struct kretprobe
through tk->rp. So we'll just need to figure out how to pass that into
the BPF program in a sane way. Thanks!

>
> It is ensured that rp != NULL in the handler.
>
> >
> > The function graph tracer has the entry IP on exit, today. That's how we
> > can trace and show this:
> >
> >  # cd /sys/kernel/tracing
> >  # echo 1 > echo 1 > options/funcgraph-tail
> >  # echo function_graph > current_tracer
> >  # cat trace
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  7)   1.358 us    |  rcu_idle_exit();
> >  7)   0.169 us    |  sched_idle_set_state();
> >  7)               |  cpuidle_reflect() {
> >  7)               |    menu_reflect() {
> >  7)   0.170 us    |      tick_nohz_idle_got_tick();
> >  7)   0.585 us    |    } /* menu_reflect */
> >  7)   1.115 us    |  } /* cpuidle_reflect */
> >
> > That's how we can show the tail function that's called. I'm sure kreprobes
> > could do the same thing.
>
> Yes, I have to update the document how to do that (and maybe introduce 2 functions
> to wrap the entry/retaddr code)
>
> >
> > The patch series I shared with Jiri, was work to allow kretprobes to be
> > built on top of the function graph tracer.
> >
> > https://lore.kernel.org/lkml/20190525031633.811342628@xxxxxxxxxxx/
> >
> > The feature missing from that series, and why I didn't push it (as I had
> > ran out of time to work on it), was that kreprobes wants the full regs
> > stack as well. And since kretprobes was the main client of this work, that
> > I decided to work on this at another time. But like everything else, I got
> > distracted by other work, and didn't realize it has been almost 2 years
> > since looking at it :-p
> >
> > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > return (who cares about the registers on return, except for the return
> > value?)
>
> I think kretprobe and ftrace are for a bit different usage. kretprobe can be
> used for something like debugger. In that case, accessing full regs stack
> will be more preferrable. (BTW, what the not "full regs" means? Does that
> save partial registers?)
>
>
> Thank you,
>
> > But this code could easily save the parameters as well.
> >
> > >
> > > > this you need a shadow stack to save the real return as well as the
> > > > parameters of the function. This is something that I have patches that do
> > > > similar things with function graph.
> > > >
> > > > If you want this feature, lets work together and make this work for both
> > > > BPF and ftrace.
> > >
> > > Absolutely, ultimately for users it doesn't matter what specific
> > > mechanism is used under the cover. It just seemed like BPF trampoline
> > > has all the useful tracing features (entry IP and input arguments in
> > > fexit) already and is just mostly missing a quick batch attach API. If
> > > we can get the same from ftrace, all the better.
> >
> > Let me pull these patches out again, and see what we can do. Since then,
> > I've added the code that lets function tracer save parameters and the
> > stack, and function graph can use that as well.
> >
> >
> > -- Steve
>
>
> --
> Masami Hiramatsu <mhiramat@xxxxxxxxxx>