Re: [PATCH v3 bpf-next 7/9] libbpf: implement __arg_ctx fallback logic

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 9 Jan 2024 17:58:47 -0800

On Tue, Jan 9, 2024 at 9:17 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Jan 8, 2024 at 5:49 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Jan 8, 2024 at 3:45 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > >
> > > > Not artificially, but only when pt_regs in bpf prog doesn't match
> > > > what kernel is passing.
> > > > I think allowing only:
> > > >   handle_event_user(void *ctx __arg_ctx)
> > > > and prog will cast it to pt_regs immediately is less surprising
> > > > and proper C code,
> > > > but
> > > >   handle_event_user(struct pt_regs *ctx __arg_ctx)
> > > > is also ok when pt_regs is indeed what is being passed.
> > > > Which will be the case for x86.
> > > > And will be fine on arm64 too, because
> > > > arch/arm64/include/asm/ptrace.h
> > > > struct pt_regs {
> > > >         union {
> > > >                 struct user_pt_regs user_regs;
> > > >
> > > > but if arm64 ever changes that layout we should start failing to load.
> > >
> > > Ok, I'm glad you agreed to allow `struct pt_regs *`. I also will say
> > > that (as it stands right now) passing `struct pt_regs *` is valid on
> > > all architectures, because that's what kernel passes around internally
> > > as context for uprobe, kprobe, and kprobe-multi. See uprobe_prog_run,
> > > kprobe_multi_link_prog_run, and perf_trace_run_bpf_submit, we always
> > > pass real `struct pt_regs *`.
> >
> > Right, but for perf event progs it's actually bpf_user_pt_regs_t:
> > ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> > bpf_prog_run(prog, &ctx);
> > yet all such progs are written assuming struct pt_regs
> > which is not correct.
>
> Yes, SEC("perf_event") programs have very different context
> (bpf_perf_event_data, where *pointer to pt_regs* is the first field,
> so it's not compatible even memory layout-wise). So I'm not going to
> allow struct pt_regs there.

Not quite. I don't think we're on the same page.

Technically SEC("perf_event") bpf_prog should see a pointer to:
struct bpf_perf_event_data {
        bpf_user_pt_regs_t regs;
        __u64 sample_period;
        __u64 addr;
};

but a lot of them are written as:
SEC("perf_event")
int handle_pe(struct pt_regs *ctx)

and it's working, because of magic (or ugliness, it depends on pov) that
we do in pe_prog_convert_ctx_access() (that inserts extra load).

The part where I'm saying:
"written assuming struct pt_regs which is not correct".

The incorrect part is that the prog have access only to bpf_user_pt_regs_t
and the prog should be written as:
SEC("perf_event")
int pe_prog(bpf_user_pt_regs_t *ctx)
or as
SEC("perf_event")
int pe_prog(struct bpf_perf_event_data *ctx)

but in generic case not as:
SEC("perf_event")
int pe_prog(struct pt_regs *ctx)

because that is valid only on archs where bpf_user_pt_regs_t == pt_regs.

> > It's a bit of a mess while strict type checking should make it better.
> >
> > BPF is a strictly typed assembly language and the verifier
> > should not be violating its own promises of type checking when
> > it sees arg_ctx.
> >
> > The reason I was proposing to restrict both kernel and libbpf
> > to 'void *ctx __arg_ctx' is because it's trivial to implement
> > in both.
> > To allow 'struct correct_type *ctx __arg_ctx' generically is much more
> > work.
>
> Yes, it's definitely more complicated (but kernel has full BTF info,
> so maybe not too bad, I need to try). I'll give it a try, if it's too
> bad, we can discuss a fallback plan.

right. the kernel has btf and everything, but as seen in perf_event example
above it's not trivial to do the type match... even in the kernel.
Matching the accurate type in libbpf is imo too much complexity.

> But we should try at least,
> forcing users to do unnecessary void * casts to u64[] or tracepoint
> struct is suboptimal from usability POV.

That part of usability concerns I still don't understand.
Where do you see "suboptimal usability" in the code:

SEC("perf_event")
int pe_prog(void *ctx __arg_ctx)
{
  struct pt_regs *regs = ctx;

?
It's a pretty clear interface and the program author knows exactly
what it's doing. It's an explicit cast because the user wrote it.
Clear, unambiguous and no surprises.

Also we shouldn't forget interaction with CO-RE (preserve_access_index)
and upcoming preserve_context_offset.

When people do #include <vmlinux.h> they get unnecessary CO-RE-ed
'struct pt_regs *' and hidden type conversion by libbpf/kernel
is another level of debug complexity.

libbpf doesn't even print what it did in bpf_program_fixup_func_info()
no matter the verbosity flag.

> >
> > > So, I'll add kprobe/multi-kprobe special handling to allows `struct
> > > pt_regs *` then, ok?
> >
> > If you mean to allow 'void *ctx __arg_ctx' in kernel and libbpf and
> > in addition allow 'struct pt_reg *ctx __arg_ctx' for kprobe and other
> > prog types where that's what is being passed then yes.
> > Totally fine with me.
> > These two are easy to enforce in kernel and libbpf.
>
> Ok, great.
>
> >
> > > Yes, of course, sk_buff instead of pt_regs is definitely broken. But
> > > that will be detected even by the compiler.
> >
> > Right. C can do casts, but in bpf asm the verifier promises strict type
> > checking and it goes further and makes safety decisions based on types.
> >
>
> It feels like you are thinking about PTR_TO_BTF_ID only and
> extrapolating that behavior to everything else. You know that it's not
> like that in general.

imo trusted ptr_to_btf_id and ptr_to_ctx are equivalent.
They're fully trusted from the verifier pov and fully correct
from bpf prog pov. So neither should have hidden type casts.

Of course, I remember bpf_rdonly_cast. But this one is untrusted.
It's equivalent to C type cast.

> > > Anyways, I can add special casing for pt_regs and start enforcing
> > > types. A bit hesitant to do that on libbpf side, still, due to that
> > > eager global func behavior, which deviates from kernel, but if you
> > > insist I'll do it.
> >
> > I don't understand this part.
> > Both kernel and libbpf can check
> > if (btf_type_id(ctx) == 'struct pt_regs'
> >   && prog_type == kprobe) allow such __arg_ctx.
> >
>
> I was thinking about the case where we have __arg_ctx and type doesn't
> match expectation. Should libbpf error out? Or emit a warning and
> proceed without adjusting types? If we do the warning and let verifier
> reject invalid program, I think it will be better and then these
> concerns of mine about lazy vs eager global subprog verification
> behavior are irrelevant.

I think warn in libbpf is fine.
Old kernel will likely fail verification since types won't match
and libbpf warn will serve its purpose,
but for the new kernel both libbpf and kernel will print
two similar looking warns?

>
> > >
> > > (Eduard, I'll add feature detection for the need to rewrite BTF at the
> > > same time, just FYI)
> > >
> > > Keep in mind, though, for non-global subprogs kernel doesn't enforce
> > > any types, so you can really pass sk_buff into pt_regs argument, if
> > > you really want to, but kernel will happily still assume PTR_TO_CTX
> > > (and I'm sure you know this as well, so this is mostly for others and
> > > for completeness).
> >
> > static functions are very different. They're typeless and will stay
> > typeless for some time.
>
> Right. And they are different in how we verify and so on. But from the
> user's perspective they are still just functions and (in my mind at
> least), the less difference between static and global subprogs there
> are, the better.
>
> But anyways, I'll do what we agreed above. I'll proceed with libbpf
> emitting warning and not doing anything for __arg_ctx, unless you feel
> strongly libbpf should error out.

warn is fine. thanks.