Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> · Mon, 20 Nov 2023 17:05:40 +0900

On 2023/11/20 6:02, Song Liu wrote:
On Sun, Nov 19, 2023 at 12:03 AM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:

[...]

Unfortunately no. The communication with the userspace can be done with
two different means:
- usual socket read/write
- vhost for direct interaction with a KVM guest

The BPF map may be a valid option for socket read/write, but it is not
for vhost. In-kernel vhost could fetch the hash from the BPF map, but I
guess that is not a standard way for kernel code to interact with a BPF
program.

I am very new to areas like vhost and KVM. So I don't really follow.
Does this mean we have the guest kernel reading data from host eBPF
programs (loaded by Qemu)?

Yes, the guest will read hashes calculated by the host, and the
interface is strictly defined by the virtio-net specification.
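
To make the interface concrete, here is a rough sketch of the per-packet
header that carries the hash to the guest. It mirrors the layout of
struct virtio_net_hdr_v1_hash in <linux/virtio_net.h> as I understand it;
the sketch_* names and the sketch_store_hash() helper are made up for
illustration, and real code should use the kernel header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the virtio-net header; all multi-byte
 * fields are little-endian on the wire. */
struct sketch_vnet_hdr_v1 {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;
};

/* Header variant used when hash reporting is negotiated. */
struct sketch_vnet_hdr_v1_hash {
	struct sketch_vnet_hdr_v1 hdr;
	uint32_t hash_value;   /* hash calculated on the host side */
	uint16_t hash_report;  /* which kind of hash was calculated */
	uint16_t padding;
};

/* The guest depends on this exact layout. */
static_assert(sizeof(struct sketch_vnet_hdr_v1_hash) == 20,
	      "unexpected header size");

/* Hypothetical helper: store the hash fields into a raw header
 * buffer byte by byte, so the result is little-endian regardless
 * of host endianness. */
static void sketch_store_hash(uint8_t *buf, uint32_t hash, uint16_t report)
{
	size_t off = offsetof(struct sketch_vnet_hdr_v1_hash, hash_value);

	buf[off + 0] = hash & 0xff;
	buf[off + 1] = (hash >> 8) & 0xff;
	buf[off + 2] = (hash >> 16) & 0xff;
	buf[off + 3] = (hash >> 24) & 0xff;

	off = offsetof(struct sketch_vnet_hdr_v1_hash, hash_report);
	buf[off + 0] = report & 0xff;
	buf[off + 1] = (report >> 8) & 0xff;
}
```

Whatever computes the hash (a BPF program or hardcoded kernel code), the
value has to end up in these fields, for both the socket and the vhost
path.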

Unfortunately, however, it is not acceptable for the BPF subsystem
because the "stable" BPF is completely fixed these days. The
"unstable/kfunc" BPF is an alternative, but the eBPF program will be
shipped with a portable userspace program (QEMU)[1] so the lack of
interface stability is not tolerable.

bpf kfuncs are as stable as exported symbols. Is exported-symbol-like
stability enough for the use case? (I would assume yes.)

Another option is to hardcode the algorithm that was conventionally
implemented with eBPF steering program in the kernel[2]. It is possible
because the algorithm strictly follows the virtio-net specification[3].
However, there are proposals to add different algorithms to the
specification[4], and hardcoding the algorithm in the kernel will
require adding more UAPIs and code each time such a specification change
happens, which is not good for tuntap.

The requirement looks similar to hid-bpf. Could you explain why that
model is not enough? HID also requires some stability AFAICT.

I have little knowledge of hid-bpf, but I assume it is more like a
"safe" kernel module; in my understanding, it affects the system state
and is intended to be loaded by some kind of system daemon. It is
fine for such a BPF program to have the same lifecycle as the kernel;
whenever the kernel is updated, the distributor can recompile the BPF
program with the new kernel headers and ship it along with the kernel,
just like a kernel module.

In contrast, our intended use case is more like a normal application.
So, for example, a user may download a container and run QEMU (including
the BPF program) installed in the container. As such, it is nice if the
ABI is stable across kernel releases, but that is not guaranteed for
kfuncs. Such a use case is already covered by the eBPF steering
program so I want to maintain it if possible.

TBH, I don't think stability should be a concern for kfuncs used by QEMU.
Many core BPF APIs are now implemented as kfuncs: bpf_dynptr_*,
bpf_rcu_*, etc. As long as there are valid use cases, these kfuncs will
be supported.

Documentation/bpf/kfuncs.rst still says:
> kfuncs provide a kernel <-> kernel API, and thus are not bound by any
> of the strict stability restrictions associated with kernel <-> user
> UAPIs.

Is it possible to change the statement as follows:
"Most kfuncs provide a kernel <-> kernel API, and thus are not bound by
any of the strict stability restrictions associated with kernel <-> user
UAPIs. kfuncs that have the same stability restrictions as UAPIs are
exceptional, and must be reviewed by subsystem (and BPF?) maintainers as
carefully as any other UAPI."

Regards,
Akihiko Odaki