Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union

Florent Revest <revest@xxxxxxxxxxxx> · Thu, 3 Aug 2023 18:37:54 +0200

On Thu, Aug 3, 2023 at 5:42 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> On Wed, 2 Aug 2023 16:44:09 +0200
> Florent Revest <revest@xxxxxxxxxxxx> wrote:
>
> > On Wed, Aug 2, 2023 at 1:09 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > On Tue, 1 Aug 2023 15:18:56 -0700
> > > Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > > On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, 1 Aug 2023 11:20:36 -0400
> > > > > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > > > >
> > > > > > The solution was to come up with ftrace_regs, which just means it has all
> > > > > > the registers to extract the arguments of a function and nothing more. Most
> > > > >
> > > > > This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> > > > > the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> > > > > will do:
> > > > >
> > > > >         void callback(..., struct ftrace_regs *fregs) {
> > > > >                 struct pt_regs *regs = ftrace_get_regs(fregs);
> > > > >
> > > > >
> > > > > Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> > > > > If it is not, then it returns NULL. This was what the x86 maintainers
> > > > > agreed with.
> > > >
> > > > arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL
> > > >
> > > > Ouch. That's very bad.
> > > > We care a lot about bpf running well on arm64.
> > >
> > > [ Adding Mark and Florent ]
> >
> > Ah, thanks Steve! That's my favorite can of worms :) I actually
> > consider sending a talk proposal to the tracing MC at LPC "pt_regs -
> > the good the bad and the ugly" on this very topic because I care about
> > unblocking BPF "multi_kprobe" (which really is fprobe) on arm64, maybe
> > it would be interesting.
>
> Ah, it is almost same as my talk :)

Oh, I didn't know! I submitted a proposal today but if the talks have
a lot of overlap maybe it's best that only you give your talk, since
you're the actual maintainer :) or we could co-present if there's
something I could add but I think you have all the background anyway

> > I pointed this out in
> > https://lore.kernel.org/all/CABRcYm+esb8J2O1v6=C+h+HSa5NxraPUgo63w7-iZj0CXbpusg@xxxxxxxxxxxxxx/#t
> > when Masami proposed adding calls from fprobe to perf. If every
> > subsystem makes different assumptions about "how sparse" their pt_regs
> > is and they call into one another, this could lead to... interesting
> > bugs. (eg: currently, we don't populate a fake pstate in ftrace_regs.
> > so we'd need to fake it when creating a sparse pt_regs _for Perf_,
> > knowing that Perf specifically expects this reg to be set. this would
> > require a struct copy anyway and some knowledge about how the data
> > will be consumed, in an arch- and subsystem- specific way)
>
> yeah, sorry I missed that point. I should remove it until we can fix it.

Uh, I shouldn't have buried my important comments so far down the
email :/ I wasn't sure whether you had missed the paragraph.

> > On the other hand, untangling all code paths that come from
> > trampolines (with a light regs structure) from those that come from an
> > exception (with a pt_regs) could lead to a lot of duplicated code, and
> > converting between each subsystem's idea of a light regs structure
> > (what if perf introduces a perf_regs now ?) would be tedious and slow
> > (lots of copies ?).
>
> This is one discussion point I think. Actually, using pt_regs in kretprobe
> (and rethook) is histrical accident. Originally, it had put a kprobe on
> the function return trampoline to hook it. So keep the API compatiblity
> I made the hand assembled code to save the pt_regs on the stack.
>
> My another question is if we have the fprobe to trace (hook) the function
> return, why we still need the kretprobe itself. I think we can remove
> kretprobe and use fprobe exit handler, because "function" probing will
> be done by fprobe, not kprobe. And then, we can simplify the kprobe
> interface and clarify what it is -- "kprobe is a wrapper of software
> breakpoint". And we don't need to think about duplicated code anymore :)

That sounds reasonable to me

> As I said, I would like to phase out the kretprobe itself because it
> provides the same feature of fprobe, which is confusing. jprobe was
> removed a while ago, and now kretprobe is. But we can not phase out
> it at once. So I think we will keep current kretprobe trampoline on
> arm64 and just add new ftrace_regs based rethook. Then remove the
> API next release. (after all users including systemtap is moved)

Heads up to BPF folks though since they also have BPF "kretprobe"
program types which would break in a similar fashion as multi_kprobe
(even though BPF kretprobe programs have also been discouraged for a
while in favor of BPF fexit programs)

> > > The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> > > the first place, was because of the overhead you reported to me with
> > > ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> > > That's when I realized I could use a subset because those registers were
> > > already being saved. The only reason FTRACE_WITH_REGS was created was it
> > > had to supply full pt_regs (including flags) and emulate a breakpoint for
> > > the kprobes interface. But in reality, nothing really needs all that.
> > >
> > > > It's not about access to args.
> > > > pt_regs is passed from bpf prog further into all kinds of perf event
> > > > functions including stack walking.
> >
> > If all accesses are done in BPF bytecode, we could (theoretically)
> > have the verifier and JIT work together to deny accesses to
> > unpopulated fields, or relocate pt_regs accesses to ftrace_regs
> > accesses to keep backward compatibility with existing multi_kprobe BPF
> > programs.
>
> Yeah, that is what I would like to suggest, and what my patch does.
> (let me update rethook too, it'll be a bit tricky since I don't want
> break anything)

I agree with Alexei that this is an unnecessary amount of complexity
in the verifier just to avoid a struct copy though. It's good to know
that we _could_ do it if we really need to someday but then again, if
a user chooses an interface that gets a pt_regs they shouldn't expect
high performance. Therefore, I think it's ok for BPF multi_kprobe to
copy fields from a ftrace_regs to a pt_regs on stack, especially if it
avoids so much additional complexity in the verifier.