Re: bpf helpers freeze. Was: [PATCH v2 bpf-next 0/6] Dynptr convenience helpers

Maxim Mikityanskiy <maxtram95@xxxxxxxxx> · Thu, 12 Jan 2023 00:56:57 +0200

On Tue, Jan 03, 2023 at 03:51:07PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote:
> > Discoverability plus being able to know semantics from a user PoV to figure out when
> > workarounds for older/newer kernels are required to be able to support both kernels.
> 
> Sounds like your concern is that there could be a kfunc that changed it semantics,
> but kept exact same name and arguments? Yeah. That would be bad, but we should prevent
> such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs.

I would advocate for adding versioning to BPF API (be it helpers or
"stable" kfuncs). Right now we have two extremes: helpers that can't be
changed/fixed/deprecated ever, and kfuncs that can be changed at any
time, so the end users can't be sure new kernel won't break their stuff.
Detecting and fixing the breakage can also be tricky: end users have to
write different probes on a case-by-case basis, and sometimes it's not
just a matter of checking the number of function parameters or presence
of some definition (such difficulties happen when backporting drivers to
older kernels, so I assume it may be an issue for BPF programs as well).

Let's say we add a version number to the kernel, and the BPF program
also has an API version number it's compiled for. Whenever something
changes in the stable API on the kernel side, the version number is
increased. At the same time, compatibility on the kernel side is
preserved for some reasonable period of time (2 years, 5 years,
whatever), which means that if the kernel loads a BPF program with an
older version number, and that version is within the supported period of
time, the kernel will behave in the old way, i.e. verify the old
signature of a function, preserve the old behavior, etc.

This approach has the following upsides:

1. End users can stop worrying that some function changes unexpectedly,
and they can have a smoother migration plan.

2. Clear deprecation schedule.

3. Easy way to probe for needed functionality, it's just a matter of
comparing numbers: the BPF program loader checks that the kernel is new
enough, and the kernel checks that the BPF program's API is not too old.

4. Kernel maintainers will have a deprecation strategy.

Cons:

1. Arguably a maintainance burden to preserve compatibility on the
kernel side, but I would say it's a balance between helpers (which are
maintainance burden forever) and kfuncs (which can be changed in every
kernel version without keeping any compatibility). "Kfunc that changed
its semantics is bad, we should prevent such patches" are just words,
but if the developer needs to keep both versions for a while, it will
serve as a calm-down mechanism to prevent changes that aren't really
necessary. At the same time, the dead code will stop accumulating,
because it can be removed according to the schedule.

2. Having a single version number complicates backporting features to
older kernels, it would require backporting all previous features
chronologically, even if there is no direct dependency. Having multiple
version numbers (per feature) is cumbersome for the BPF program to
declare. However, this issue is not new, it's already the case for BPF
helpers (you can't backport new helpers skipping some other, because the
numbers in the list must match).

The above description intentionally doesn't specify whether it should be
applied to helpers or kfuncs, because it's a universal concept, about
which I would like to hear opinions about versioning without bias to
helpers or kfuncs.

Regarding freezing helpers, I think there should be a solution for
deprecating obsolete stuff. There are historical examples of removing
things from UAPI: removing i386 support, ipchains, devfs, IrDA
subsystem, even a few architectures [1]. If we apply the versioning
approach to helpers, we can make long-waiting incompatible changes in
v1, keeping the current set of helpers as v0, used for programs that
don't declare a version. Eventually (in 5 years, in 10 years, whatever
sounds reasonable) we can drop v0 and remove the support for unversioned
BPF programs altogether, similar to how other big things were removed
from the kernel. Does it sound feasible?

[1]: https://lwn.net/Articles/748074/

> "Proper BPF helper" model is broken.
> static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;
> 
> is a hack that works only when compiler optimizes the code.

What if we replace codegen for helpers, so that it becomes something
like this?

static inline void *bpf_map_lookup_elem(void *map, const void *key)
{
	// pseudocode alert!
	asm("call 1" : : "r1"(map), "r2"(key));
}

I.e. can we just throw in some inline BPF assembly that prepares
registers and invokes a call instruction with the helper number? That
should be portable across clang and gcc, allowing to stop relying on
optimizations.

Any caveats?