Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> · Sun, 10 Dec 2023 16:03:05 +0900

On 2023/11/22 14:36, Akihiko Odaki wrote:
On 2023/11/22 14:25, Song Liu wrote:
On Mon, Nov 20, 2023 at 12:05 AM Akihiko Odaki 
<akihiko.odaki@xxxxxxxxxx> wrote:

On 2023/11/20 6:02, Song Liu wrote:
[...]
In contrast, our intended use case is more like a normal application.
So, for example, a user may download a container and run QEMU 
(including
the BPF program) installed in the container. As such, it is nice if 
the
ABI is stable across kernel releases, but it is not guaranteed for
kfuncs. Such a use case is already covered with the eBPF steering
program so I want to maintain it if possible.

TBH, I don't think stability should be a concern for kfuncs used by 
QEMU.
Many core BPF APIs are now implemented as kfuncs: bpf_dynptr_*,
bpf_rcu_*, etc. As long as there are valid use cases,these kfuncs will
be supported.

Documentation/bpf/kfuncs.rst still says:
  > kfuncs provide a kernel <-> kernel API, and thus are not bound by 
any
  > of the strict stability restrictions associated with kernel <-> user
  > UAPIs.

Is it possible to change the statement like as follows:
"Most kfuncs provide a kernel <-> kernel API, and thus are not bound by
any of the strict stability restrictions associated with kernel <-> user
UAPIs. kfuncs that have same stability restrictions associated with
UAPIs are exceptional, and must be carefully reviewed by subsystem (and
BPF?) maintainers as any other UAPIs are."

I am afraid this is against the intention to not guarantee UAPI-level 
stability
for kfuncs.

Is it possible to ensure that a QEMU binary with the eBPF program 
included works on different kernel versions without UAPI-level stability 
then? Otherwise, I think we need to think of the minimal UAPI addition 
that exposes the feature I propose, and the two options I presented 
first are the candidates of such: the stable BPF change or tuntap 
interface change.

Regards,
Akihiko Odaki

Now the discussion is stale again so let me summarize the discussion:

A tuntap device can have an eBPF steering program to let the userspace 
decide which tuntap queue should be used for each packet. QEMU uses this 
feature to implement the RSS algorithm for virtio-net emulation. Now, 
the virtio specification has a new feature to report hash values 
calculated with the RSS algorithm. The goal of this RFC is to report 
such hash values from the eBPF steering program to the userspace.

There are currently three ideas to implement the proposal:

1. Abandon eBPF steering program and implement RSS in the kernel.

It is possible to implement the RSS algorithm in the kernel as it's 
strictly defined in the specification. However, there are proposals for 
relevant virtio specification changes, and abandoning eBPF steering 
program will loose the ability to implement those changes in the 
userspace. There are concerns that this lead to more UAPI changes in the 
end.

2. Add BPF kfuncs.

Adding BPF kfuncs is *the* standard way to add BPF interfaces. hid-bpf 
is a good reference for this.

The problem with BPF kfuncs is that kfuncs are not considered as stable 
as UAPI. In my understanding, it is not problematic for things like 
hid-bpf because programs using those kfuncs affect the entire system 
state and expected to be centrally managed. Such BPF programs can be 
updated along with the kernel in a manner similar to kernel modules.

The use case of tuntap steering/hash reporting is somewhat different 
though; the eBPF program is more like a part of application (QEMU or 
potentially other VMM) and thus needs to be portable. For example, a 
user may expect a Debian container with QEMU installed to work on Fedora.

BPF kfuncs do still provide some level of stability, but there is no 
documentation that tell how stable they are. The worst case scenario I 
can imagine is that a future legitimate BPF change breaks QEMU, letting 
the "no regressions" rule force the change to be reverted. Some 
assurance that kind scenario will not happen is necessary in my opinion.

3. Add BPF program type derived from the conventional steering program type

In principle, it's just to add a feature to report four more bytes to 
the conventional steering program. However, BPF program types are frozen 
for feature additions and the proposed change will break the feature freeze.

So what's next? I'm inclined to option 3 due to its minimal ABI/API 
change, but I'm also fine with option 2 if it is possible to guarantee 
the ABI/API stability necessary to run pre-built QEMUs on future kernel 
versions by e.g., explicitly stating the stability of kfuncs. If no 
objection arises, I'll resend this series with the RFC prefix dropped 
for upstream inclusion. If it's decided to go for option 1 or 2, I'll 
post a new version of the series implementing the idea.

Regards,
Akihiko Odaki