Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> · Mon, 11 Dec 2023 14:04:27 +0900

On 2023/12/11 10:40, Song Liu wrote:
On Sat, Dec 9, 2023 at 11:03 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:

On 2023/11/22 14:36, Akihiko Odaki wrote:
On 2023/11/22 14:25, Song Liu wrote:
[...]

The discussion has gone stale again, so let me summarize it:

A tuntap device can have an eBPF steering program to let the userspace
decide which tuntap queue should be used for each packet. QEMU uses this
feature to implement the RSS algorithm for virtio-net emulation. Now,
the virtio specification has a new feature to report hash values
calculated with the RSS algorithm. The goal of this RFC is to report
such hash values from the eBPF steering program to the userspace.
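
For readers unfamiliar with it, the RSS algorithm in the virtio
specification is built on the Toeplitz hash. The following is only a
minimal userspace sketch of that hash and of picking a queue from an
indirection table. It is not the code from this series or from QEMU;
the names toeplitz_hash() and select_queue() are hypothetical, and the
modulo-based table lookup is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key that starts at the same bit offset.
 * Key bits past the end of the key are treated as zero.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t window, hash = 0;
    size_t next_key_byte = 4;

    if (key_len < 4)
        return 0; /* sketch only: a real key is much longer */

    /* Key bits 0..31, aligned with the first input bit. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < data_len; i++) {
        uint8_t next = next_key_byte < key_len ? key[next_key_byte] : 0;

        next_key_byte++;
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the key window left by one bit. */
            window = (window << 1) | ((uint32_t)(next >> bit) & 1u);
        }
    }
    return hash;
}

/* Hypothetical helper: map a hash to a queue via an indirection table. */
static uint16_t select_queue(uint32_t hash, const uint16_t *table,
                             size_t table_len)
{
    return table[hash % table_len];
}
```

The steering program returns only the result of the second step today;
the 32-bit value from the first step is what this RFC wants to report.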

There are currently three ideas to implement the proposal:

1. Abandon eBPF steering program and implement RSS in the kernel.

It is possible to implement the RSS algorithm in the kernel as it's
strictly defined in the specification. However, there are proposals for
relevant virtio specification changes, and abandoning eBPF steering
program will lose the ability to implement those changes in
userspace. There are concerns that this will lead to more UAPI changes
in the end.

2. Add BPF kfuncs.

Adding BPF kfuncs is *the* standard way to add BPF interfaces. hid-bpf
is a good reference for this.

The problem with BPF kfuncs is that kfuncs are not considered as stable
as UAPI. In my understanding, it is not problematic for things like
hid-bpf because programs using those kfuncs affect the entire system
state and are expected to be centrally managed. Such BPF programs can be
updated along with the kernel in a manner similar to kernel modules.

The use case of tuntap steering/hash reporting is somewhat different
though; the eBPF program is more like a part of an application (QEMU or
potentially another VMM) and thus needs to be portable. For example, a
user may expect a Debian container with QEMU installed to work on Fedora.

BPF kfuncs do still provide some level of stability, but there is no
documentation that tells how stable they are. The worst-case scenario I
can imagine is that a future legitimate BPF change breaks QEMU, letting
the "no regressions" rule force the change to be reverted. Some
assurance that that kind of scenario will not happen is necessary in my
opinion.

I don't think we can provide stability guarantees before seeing something
being used in the field. How do we know it will be useful forever? If a
couple years later, there is only one person using it somewhere in the
world, why should we keep supporting it? If there are millions of virtual
machines using it, why would you worry about it being removed?

I have a different opinion about providing stability guarantees; I
believe it is safe to provide such a guarantee without actual use in
the field. We develop features expecting that there are real uses, and
if it turns out otherwise, we can break the stated guarantee since
there are no real use cases anyway. Even breaking UAPIs is fine in such
a case, as stated in Documentation/admin-guide/reporting-regressions.rst.

So I am rather comfortable with guaranteeing UAPI stability; we can
just guarantee UAPI-level stability for a particular kfunc and use it
from QEMU expecting that stability. If the feature turns out not to be
useful, QEMU and the kernel can just remove it.

I'm more concerned about the other case, where this feature is widely
used. A kernel developer may assume the stability of the interface is
like that of kernel-internal APIs
(Documentation/bpf/kfuncs.rst says kfuncs are like EXPORT_SYMBOL_GPL)
and decide to change it, breaking old QEMU binaries, and that's
something I would like to avoid.

Regarding the breakage scenario, I think we can avoid removal of the
kfuncs just by saying "we won't remove them". I'm more worried about
the case where a change in the BPF kfunc infrastructure requires
recompiling the binary.

So, in short, I don't think we can say "kfuncs are like 
EXPORT_SYMBOL_GPL" and "you can freely use kfuncs in a normal userspace 
application like QEMU" at the same time.

3. Add a BPF program type derived from the conventional steering program type.

In principle, this just adds a feature to the conventional steering
program to report four more bytes. However, BPF program types are frozen
for feature additions and the proposed change will break the feature freeze.

So what's next? I'm inclined toward option 3 due to its minimal ABI/API
change, but I'm also fine with option 2 if it is possible to guarantee
the ABI/API stability necessary to run pre-built QEMUs on future kernel
versions, e.g., by explicitly stating the stability of kfuncs. If no
objection arises, I'll resend this series with the RFC prefix dropped
for upstream inclusion. If it's decided to go for option 1 or 2, I'll
post a new version of the series implementing the idea.

Probably a dumb question, but does this RFC fall into option 3? If
that's the case, I seriously don't think it's gonna happen.

Yes, it's option 3.

I would recommend you give option 2 a try and share the code. This is
probably the best way to move the discussion forward.

In that case, I'd like to add a documentation change saying that the
added kfuncs are exceptional cases that are not like EXPORT_SYMBOL_GPL.
Would that work?

Regards,
Akihiko Odaki