Re: bpf helpers freeze. Was: [PATCH v2 bpf-next 0/6] Dynptr convenience helpers

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 9 Jan 2023 09:46:28 -0800

On Thu, Jan 5, 2023 at 6:54 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jan 05, 2023 at 01:01:56PM -0800, Andrii Nakryiko wrote:
> > Didn't find the best place to put this, so it will be here. I think it
> > would be beneficial to discuss BPF helpers freeze in BPF office hours.
> > So I took the liberty to put it up for next BPF office hours, 9am, Jan
> > 12th 2022. I hope that some more people that have exposure to
> > real-world BPF application and pains associated with all that could
> > join the discussion, but obviously anyone is welcome as well, no
> > matter which way they are leaning.
> >
> > Please consider joining, see details on Zoom meeting at [0]
> >
> > For the rest, please see below. I'll be out for a few days and won't
> > be able to reply, my apologies.
> >
> >   [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0
>
> Thanks for adding it to the agenda.
> Hopefully we'll be able to converge faster on a call.

Yep, hopefully. Looking forward to BPF office hours this week.

>
> There are several things to discuss:
> 1. whether or not to freeze helpers.
> 2. whether dynptr accessors should be helpers or kfuncs.
> 3. whether your future inline iterators should be helpers or kfuncs.
> 4. whether cilium's bpf_sock_destroy should be helper or kfunc.
>
> If we hard freeze helpers in 1 it automatically decides the fate for 2, 3, 4.
> We can soft freeze the helpers then 2,3,4 are up for discussion.
> Looks like the thread so far was primarily about 1.

The thread started as 2 and got expanded to 1, but I agree that 2, 3,
and 4 are all separate topics (just predicated on 1 being decided in
favor of not freezing helpers).

> 4 was touched separately. Daniel hasn't replied yet to my suggestion for it to be kfunc.
> You insist that 2 and 3 must be helpers.
> No one seen the patches for 3. I've seen you whiteboard them. It's impossible
> for others to participate without patches, so let's postpone that.

Sure, as I intended to do in [0], except if BPF helpers are
hard-frozen, there would be no discussion to have. But hopefully it's
clear that my example with iterators was about stability and
generality of certain concepts (looping) and how libbpf has stable API
expectations and responsibilities as well.

  [0] https://lore.kernel.org/bpf/CAEf4BzbVoiVSa1_49CMNu-q5NnOvmaaHsOWxed-nZo9rioooWg@xxxxxxxxxxxxxx/

>
> Let's try to focus this thread on 2 assuming both helpers and kfuncs
> are on the table for dynptrs...
>
> > conclusions. I think it's even possible to deprecate BPF helpers, if
> > we really want to. In the end, technically, the only UAPI part about
> > BPF helper is it's ID. That should stay fixed. We do change over time
> > which helpers are available in which program types. Yes, it would be
> > really bad to change helper signature and I'd be very much against
> > this, but from my perspective (and I'm sure others will disagree),
> > it's in the realm of possibility to do gradual deprecation of some
> > helpers. We'll leave BPF_FUNC_xxx enumerator intact, of course, but
> > add a simple wrapper that will just -ENOTSUP.
>
> Unfortunately you're completely wrong in the above paragraph.
> I suggest to read this Linus's rant first:
> https://lkml.org/lkml/2012/12/23/75
>
> Everything that user space sees we cannot change.
> We can try to, but it will be reverted if users complain.

I very well might be and it was my opinion (which I explicitly
acknowledged as certainly being controversial).

This is a completely separate discussion, but on one hand we say it's
fine to remove or change kfuncs, because kfuncs are only visible to
BPF programs, which are kernel-to-kernel programs and user-space rules
do not apply. On the other hand, BPF helpers are also only visible to
BPF programs, the only user-space visible part is enum name and ID.
Yet they are treated very differently.

It's fine, but to me it's more of an issue of a user contract, rather
than some technicality about being defined in some header. It feels
like we should be able to define a contract that some range of IDs
will be "unstable" in the sense that they might start eventually
returning -ENOTSUP if we have reasonable confidence they are not
useful anymore.

But it's just my opinion, and no amount of shouting at me will change that fact.

And as I said before, I don't think BPF helpers are a big maintenance
liability in the first place.

> That's why we never try unless there is a very strong reason like security issue.
>
> For example your last commit to uapi/bpf.h
> commit 8a76145a2ec2 ("bpf: explicitly define BPF_FUNC_xxx integer values")
> is a leap of faith.
> Though we tried to make it as transparent as possible and
> I googled BPF_FUNC_MAPPER before applying the patch to see in what weird ways
> people can use the macro, there is still a non zero chance that
> we would have to revert it if users complain loud enough.
>
> For example cilium has this bit of code:
> https://github.com/cilium/ebpf/blob/master/asm/func.go
> I suspect it's broken now, because you've changed 'FN' macro in that commit.
> Cilium folks are unlikely to complain and demand a revert, so we should be safe
> in this regard, but we cannot assume that for other users.

Sure, all above is true and we discussed all that when reviewing that
patch. And I liked that we could weigh pros and cons in that
particular case, and hopefully can keep doing that.

>
> It should be obvious that we cannot deprecate helpers with ENOTSUP
> or deprecate them in any other way.

I'm fine with that.

>
> > E.g., Linus requested bpf_probe_read() to not exist and not be used,
> > everyone agreed. Good opportunity?
>
> It's an exception that proves the rule.
> 1. it's a security issue that's why uapi breakage was on the table.
> 2. it wasn't completely removed. See:
>
> #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>         case BPF_FUNC_probe_read:
>                 return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ?
>                        NULL : &bpf_probe_read_compat_proto;

Sure, not disputing this. I do think that it's just another example
emphasizing that the world is not black and white and there *has* to
be nuance in every decision.

>
> > The point is that UAPI stability is not the end of the world and
> > paranoia is bad. We shouldn't get paralyzed because we add APIs. We do
> > that to libbpf and APIs will stay stable within entire 1.x version.
> > Yes, we don't have such a nice "luxury" with kernel, but see above.
>
> Exactly. See above. There is no way at all to deprecate helpers.

OK.

>
> > >
> > > - On the one hand you're arguing that in some cases, _no_ API
> > >   instability is acceptable. That in general, the main kernel <-> kernel
> > >   BPF program API boundary is equivalent to UAPI, and that it's _never_
> > >   acceptable for us to ever, _ever_ deprecate certain APIs because
> >
> > you are being hyperbolic and overdramatic again for no good reason,
> > "ever, _ever_" -- really? There is no such thing.
>
> Andrii, it's really _ever_. You need to internalize that first
> before we discuss this topic again during office hours.

I'll try to. My point (somewhat subtle, perhaps) was that humans are
very bad about planning 5-10-20-50 years ahead. So any "ever" is
overdramatized and hyperbolic. There might be no BPF, Linux, or
computers in current form in 50 years. I refuse to stress about not
being able to remove BPF helpers in 50 years, sorry.

>
> > I'd probably still go for it. But building some tool like perf or
> > retsnoop -- I'd think twice if I want to take dependency on BPF map
> > (or dynptr for that matter), if it potentially limits the
> > applicability of my application.
>
> A quote from retsnoop readme:
> "
> NOTE: Retsnoop relies on BPF CO-RE technology, so please make sure your Linux
> kernel is built with CONFIG_DEBUG_INFO_BTF=y kernel config. Without this
> retsnoop will refuse to start.
> "
> and in calib_feat.bpf.c
> /* Detect if bpf_get_func_ip() helper is supported by the kernel.
> /* Detect if fentry/fexit re-entry protection is implemented.
> /* Detect if fexit is safe to use for long-running and sleepable
> /* Detect if bpf_get_branch_snapshot() helper is supported.
> /* Detect if BPF_MAP_TYPE_RINGBUF map is supported.
> /* Detect if BPF cookie is supported for kprobes.
> /* Detect if multi-attach kprobes are supported.
>
> If the feature is useful you will use it. In retsnoop and everywhere else.
> Regardless whether it's arch dependent, kernel dependent or unstable.

But I'm just a hostage of these BPF quirks and I very much would like
not to be (or at the very least minimize them)! Do you think I'm happy
that retsnoop won't work on so many different kernel configs and
arches, even though retsnoop would be very useful there? I'm happy I
don't make money off of retsnoop, so I can afford to just say "sorry,
retsnoop won't work in your particular situation, too bad". But if I
had a company and some product that relied on BPF, any such hurdle
would be painful and result in extra support, maintenance, developer
work, lost opportunity, hurdles in adoption, just headaches.

"If the feature is useful you will use it" is missing the nuance
again. Almost every feature can be worked around. And if some feature
adds too many unnecessary complexities and/or dependencies, I might
choose to just work around it. Or use some older feature that's less
convenient, less performant, maybe more fragile, but works.

E.g., instead of using bpf_ringbuf_reserve_dynptr() to minimize amount
of data sent over ringbuf, I'll choose to do bigger fixed-sized chunk,
lose efficiency, but not reduce a variety of kernels and systems that
my app will work on. But in some other situation this extra efficiency
might be the difference between product viability and death, so yeah,
I'll take that hit and do the extra work.

But again, as a BPF user I will feel as a hostage, knowing that it
didn't *have* to be this way.

That's why I'm fighting so passionately *to not add unnecessary
dependencies and complications*.

>
> > I disagree about ripping the bandaid and precluding dynptr framework
> > to be whole before we solve various problems I pointed out in [1]
> > (which unfortunately was mostly ignored, it seems).
>
> Let's look at your
> https://lore.kernel.org/all/CAEf4BzZM0+j6DXMgu2o2UvjtzoOxcjsJtT8j-jqVZYvAqxc52g@xxxxxxxxxxxxxx/
> "
> 1. Generic accessors to check validity of *any* dynptr, and it's
> inherent properties like offset, available size, read-only property
> (just as useful somethings as bpf_ringbuf_query() is for ringbufs,
> both for debugging and for various heuristics in production).
>
> bpf_dynptr_is_null(struct bpf_dynptr *ptr)
> long bpf_dynptr_get_size(struct bpf_dynptr *ptr)
> long bpf_dynptr_get_offset(struct bpf_dynptr *ptr)
> bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr)
>
> There is nothing to add or remove here. No flags, no change in semantics.
> "
>
> You're arguing that it's obviously stable material.
> Like:
> +BPF_CALL_1(bpf_dynptr_get_offset, struct bpf_dynptr_kern *, ptr)
> +{
> +    if (!ptr->data)
> +         return -EINVAL;
> +
> +    return ptr->offset;
> +}
>
> but we can do it now in native bpf code:
>
> static inline int bpf_dynptr_get_offset(const struct bpf_dynptr *uptr)
> {
>      struct bpf_dynptr_kern *ptr = bpf_rdonly_cast(uptr, bpf_core_type_id_kernel(struct bpf_dynptr_kern));
>
>      if (!ptr->data)
>           return -EINVAL;
>
>      return ptr->offset;
> }
>
> No kernel changes necessary. No UAPI helpers. No kfuncs.
> CO-RE will take care of kernel version differences.
>
> Do you still insist that it should be a stable uapi helper ?

Yes!

bpf_rdonly_cast() is kfunc, with all the consequences. And we are not
just exposing internal implementation details of dynptr, we *expect*
users to know, care, and follow them. Neither is great.

These simple helpers I can implement with BPF_CORE_READ() even,
without kfunc dependency, as I already explained before. And it will
even work on kernels with no CO-RE support, thanks to BTFgen.

But I do not consider that a good approach and good API, sorry.
Certainly doesn't make me feel like dynptr is a core first-class
concept in BPF.

And I actually have no such solution for
bpf_dynptr_clone()/bpf_dynptr_advance()/bpf_dynptr_trim(), which is
absolutely critical to make dynptr a standard interface for passing
variable-sized chunks of memory to other helpers and kfuncs.

>
> > And for the "for loop iterator", I absolutely do not want to have a
> > useful generic abstraction for repeatable loop, that will have few
> > asterisks associated with them, dictating which arches and what kernel
> > config values (beyond basic BPF ones) should be ensured to make
> > iteration work. Kills any motivation to finish it.
>
> I'm really sad that you went down this ultimatum path.

This wasn't my intent and that's not what I'm doing here. I'm
explaining my motivation and how I feel about core concepts being part
of stable BPF API offerings. And how the inflexible BPF freeze
approach will hurt adoption. And yes, I'm afraid it might hurt even
the addition of new features if people feel that their work can't be
used universally because of arbitrary policies.

Human factor is real. Don't be sad, but try to see the argument behind
all the words and examples.

> Essentially you're saying: "loop iterator has to be stable helper or
> I quit working on it."
> Say we cave in and accepted your demand. Later you do another ultimatum

I hope you can "cave in" based on technical arguments and feedback
from users of BPF technology, which have to deal with real-world
aspects of all the BPF machinery. And then already have enough to care
about, no need to make their life harder.

I'm saying the loop iterator has to be a stable helper to be
universally used and universally recommended as *the solution for
doing repeatable work*. Without thinking about BTF, kfuncs,
arch-specific stuff. Because there is no reason why a loop iterator
would require any of that.

> and we cannot cave in for whatever reason. You stay true to your words
> and quit BPF development. Now we're stuck with your uapi that we cannot
> change, cannot improve, but still have to maintain it _forever_
> without you because you quit. That would suck.

This was always a risk for many years, that didn't stop BPF from
gaining lots of useful functionality, even if we'd retrospectively
would like to do some things differently.

> Let's get back to discussing technical merits without ultimatums. Ok?

That's what I've been (and still am) doing all this time.