On Sun, 29 Sep 2024 16:10:47 +0900
Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:

> On 2024/09/29 11:07, Jason Wang wrote:
> > On Fri, Sep 27, 2024 at 3:51 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>
> >> On 2024/09/27 13:31, Jason Wang wrote:
> >>> On Fri, Sep 27, 2024 at 10:11 AM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 2024/09/25 12:30, Jason Wang wrote:
> >>>>> On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> virtio-net has two uses of hashes: one is RSS and the other is hash
> >>>>>> reporting. Conventionally the hash calculation was done by the VMM.
> >>>>>> However, computing the hash after the queue was chosen defeats the
> >>>>>> purpose of RSS.
> >>>>>>
> >>>>>> Another approach is to use an eBPF steering program. This approach has
> >>>>>> another downside: it cannot report the calculated hash due to the
> >>>>>> restrictive nature of eBPF.
> >>>>>>
> >>>>>> Introduce code to compute hashes in the kernel in order to overcome
> >>>>>> these challenges.
> >>>>>>
> >>>>>> An alternative solution is to extend the eBPF steering program so that it
> >>>>>> will be able to report to the userspace, but it is based on context
> >>>>>> rewrites, which are in feature freeze. We can adopt kfuncs, but they will
> >>>>>> not be UAPIs. We opt for an ioctl to align with other relevant UAPIs (KVM
> >>>>>> and vhost_net).
> >>>>>>
> >>>>>
> >>>>> I wonder if we could clone the skb and reuse some fields to store the hash,
> >>>>> then the steering eBPF program can access these fields without
> >>>>> introducing full RSS in the kernel?
> >>>>
> >>>> I don't get how cloning the skb can solve the issue.
> >>>>
> >>>> We can certainly implement the Toeplitz function in the kernel or even with
> >>>> tc-bpf to store a hash value that can be used for the eBPF steering program
> >>>> and virtio hash reporting.
> >>>> However, we don't have a means of storing a
> >>>> hash type, which is specific to virtio hash reporting and lacks a
> >>>> corresponding skb field.
> >>>
> >>> I may be missing something, but looking at sk_filter_is_valid_access(), it
> >>> looks to me like we can make use of skb->cb[0..4]?
> >>
> >> I didn't opt to use cb. Below is the rationale:
> >>
> >> cb is for tail calls, so it means we reuse the field for a different
> >> purpose. The context rewrite allows adding a field without increasing
> >> the size of the underlying storage (the real sk_buff), so we should add a
> >> new field instead of reusing an existing field to avoid confusion.
> >>
> >> We are, however, no longer allowed to add a new field. In my
> >> understanding, this is because it is a UAPI, and eBPF maintainers found
> >> it difficult to maintain its stability.
> >>
> >> Reusing cb for hash reporting is a workaround to avoid having a new
> >> field, but it does not solve the underlying problem (i.e., keeping eBPF
> >> as stable as UAPI is unreasonably hard). In my opinion, adding an ioctl
> >> is a reasonable option to keep the API as stable as other virtualization
> >> UAPIs while respecting the underlying intention of the context rewrite
> >> feature freeze.
> >
> > Fair enough.
> >
> > Btw, I remember DPDK implements tuntap RSS via eBPF as well (probably
> > via cls or other). It might be worth seeing if we missed anything here.
>
> Thanks for the information. I wonder why they used cls instead of a
> steering program. Perhaps it is due to compatibility with macvtap
> and ipvtap, which don't support steering programs.
>
> Their RSS implementation looks cleaner, so I will improve my RSS
> implementation accordingly.
>

DPDK needs to support flow rules. The specific case is where packets are
classified by a flow, then RSS is done across a subset of the queues.
Flow support in the TUN driver is more academic than useful; I fixed it
for current BPF, but I doubt anyone is really using it.
A full steering program would be good, but it would require much more
complexity to take a general set of flow rules and then communicate them
to the steering program.
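
To make the "classify by flow, then RSS across a subset of the queues"
case concrete, here is a minimal userspace sketch in C. It is not the DPDK
tap PMD or the TUN driver code. struct flow_rule, classify(), steer() and
the match on destination port only are hypothetical simplifications made up
for illustration, and rss_key is just an example 40-byte key. Only the
Toeplitz computation itself follows the standard definition (XOR together
the 32-bit key windows selected by the set bits of the input).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example 40-byte RSS key (assumption: any key of len + 4 bytes works). */
static const uint8_t rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Hypothetical flow rule: match on destination port, spread over a subset. */
struct flow_rule {
	uint16_t dst_port;      /* match key, host byte order */
	const uint16_t *queues; /* queues this flow is allowed to use */
	size_t n_queues;
};

/* Toeplitz hash of len bytes; key must hold at least len + 4 bytes. */
static uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data,
			      size_t len)
{
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window by one bit, pulling in the next key bit. */
			window = (window << 1) | ((key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}

/* Return the first rule whose destination port matches, or NULL. */
static const struct flow_rule *classify(const struct flow_rule *rules,
					size_t n_rules, uint16_t dst_port)
{
	for (size_t i = 0; i < n_rules; i++)
		if (rules[i].dst_port == dst_port)
			return &rules[i];
	return NULL;
}

/*
 * tuple is src IP, dst IP, src port, dst port in network byte order.
 * Returns the chosen queue, or -1 when no flow rule matches.
 */
static int steer(const struct flow_rule *rules, size_t n_rules,
		 const uint8_t *key, const uint8_t tuple[12])
{
	uint16_t dst_port = (uint16_t)((tuple[10] << 8) | tuple[11]);
	const struct flow_rule *rule = classify(rules, n_rules, dst_port);

	if (!rule)
		return -1;
	assert(rule->n_queues > 0);
	/* RSS only within the subset of queues attached to the matched rule. */
	return rule->queues[toeplitz_hash(key, tuple, 12) % rule->n_queues];
}
```

With a single rule { 80, {4, 5, 6}, 3 }, a packet to port 80 always lands on
one of queues 4 to 6, and the same 4-tuple always picks the same queue. The
hard part mentioned above is not this lookup. It is translating a general
set of flow rules into tables like these and getting them to the steering
program.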