Re: [xdp-cloud] Re: Questions about Offloads and XDP-Hints regarding a Cloud-Provider Use-Case

On 28.09.22 20:07, Jesper Dangaard Brouer wrote:

> On 28/09/2022 15.54, Marcus Wichelmann wrote:
>
>> I'm working for a cloud hosting provider and we're working on a new XDP-based networking stack for our VM hosts that uses XDP to accelerate the connectivity of our qemu/KVM VMs to the outside.
>
> Welcome to the community!

Thank you!

> Sounds like an excellent use-case and opportunity for speeding up the RX packets from the physical NIC into the VM. Good to hear someone (again) having this use-case. I've personally not been focused on this use-case lately, mostly because the community members I was interacting with changed jobs, away from cloud hosting companies. Good to have a user back in this area!

Good to hear! Also, we'll probably not be the last ones coming up with this use-case. ;)


>> For this, we use XDP_REDIRECT to forward packets between the physical host NIC and the VM tap devices. The main issue we have now is that our VM guests have some virtio NIC offloads enabled: rx/tx checksumming, TSO/GSO, GRO and Scatter-Gather.

> Supporting RX-checksumming is part of the plans for XDP-hints, although virtio_net is not part of my initial patchset.

Great!

> XDP-redirect with GRO and Scatter-Gather frames is part of the multi-buff effort (Cc Lorenzo), but currently XDP_REDIRECT with multi-buff is disabled (except for cpumap), because of the lack of XDP-feature bits, meaning we cannot determine (in kernel) whether the receiving net_device supports multi-buff (Cc Kumar).

Can this also be solved with XDP-Hints or is this an unrelated issue?

>> The XDP multi-buffer support needed for TSO/GSO seems to be mostly there

> A subtle detail is that both XDP-hints and XDP multi-buff are needed to get the GRO/GSO kernel infra working. For the kernel to construct GRO-SKB based packets from XDP-redirected incoming xdp_frames, the kernel code requires both RX-csum and RX-hash before coalescing GRO frames.

>> already, but, to our understanding, the last missing part for full TSO/GSO support is a way to tell the physical NIC to perform the TSO/GSO offload.


> The TSO/GSO side is usually the TX side. The VM should be able to send out normal TSO/GSO (multi-buffer) packets.

Currently the VM sends out multi-buffer packets, but after redirecting them, they are probably not getting segmented on the way out of the physical NIC. Or, as you wrote earlier, the XDP multi-buffer support isn't even used there and the packet just gets truncated on the way into XDP. I haven't traced that down exactly yet, but you probably know better than me what's happening there.

Because of that, the TX-side offloads are more critical to us: we cannot easily disable them in the VMs. The RX side is less of an issue, because we control the physical NIC configuration and could temporarily disable all offloads there until XDP supports them (which would of course be better). So RX offloads are very nice to have, but missing TX offloads are a show-stopper for this use-case, unless we find a way to disable the offloads on all customer VMs.
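On the multi-buffer point above: as far as I understand, the XDP program itself also has to opt in to multi-buffer frames before large TSO/GSO frames can reach it intact, which might be related to the truncation I'm seeing (not verified yet). A minimal sketch of what I mean, assuming kernel >= 5.18 and a libbpf recent enough to know the "xdp.frags" section name (the program name is just an example):

/* Opt an XDP program in to multi-buffer (fragmented) frames. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Loading from an "xdp.frags" section makes libbpf set
 * BPF_F_XDP_HAS_FRAGS, declaring that the program can handle
 * non-linear frames. */
SEC("xdp.frags")
int xdp_pass_mb(struct xdp_md *ctx)
{
        /* ctx->data..ctx->data_end only covers the first fragment;
         * the full frame length comes from bpf_xdp_get_buff_len(). */
        __u64 len = bpf_xdp_get_buff_len(ctx);

        return len > 0 ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";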

> Or are you saying this also gets disabled when enabling XDP on the virtio_net RX side?

I'm not sure what you mean by that. What gets disabled?

>> I've seen the latest LPC 2022 talk from Jesper Dangaard Brouer regarding the planned XDP-Hints feature. But this was mainly about Checksum and VLAN offloads. Is supporting TSO/GSO also one of the goals you have in mind with these XDP-Hints proposals?


> As mentioned, TSO/GSO is the TX side. We (Cc Magnus) also want to extend XDP-hints to the TX side, to allow asking the HW to perform different offloads. Let's land the RX side first.

Makes sense, thanks for clarifying your roadmap!
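To check that my mental model of the RX side matches: the driver would place a BTF-described hints struct in the metadata area in front of the packet, and the XDP program reads it via data_meta, roughly like below. The struct layout here is made up purely for illustration; as I understand it, the real layout is per-driver and described via BTF.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical hints layout, for illustration only. */
struct xdp_hints_example {
        __u32 rx_hash;
        __u16 csum_status;
        __u16 vlan_tci;
};

SEC("xdp")
int xdp_read_hints(struct xdp_md *ctx)
{
        void *data      = (void *)(long)ctx->data;
        void *data_meta = (void *)(long)ctx->data_meta;
        struct xdp_hints_example *hints = data_meta;

        /* Verifier-mandated bounds check; the metadata area lives in
         * [data_meta, data) directly in front of the payload. */
        if ((void *)(hints + 1) > data)
                return XDP_PASS;   /* no (or too small) metadata */

        /* e.g. use hints->rx_hash / hints->csum_status here, or keep
         * the metadata in place for the receiver after a redirect */
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";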

The "XDP Cloud-Provider" project page describes a very similar use-case to what we plan to do. What's the goal of this project?


> Yes, this sounds VERY similar to your use-case.
>
> I think you are referring to this:
>   [1] https://xdp-project.net/areas/xdp-cloud-provider.html
>   [2] https://github.com/xdp-project/xdp-cloud

The GitHub link [2] is a 404. Maybe this repository is private?

> We had two cloud hosting companies interested in this use-case and started a "sub" xdp-project, with the intent of working together on code[2] that implements concrete BPF tools: building blocks that the individual companies can integrate into their systems, keeping customer provisioning separate per company. (P.S. this approach has worked well for the xdp-cpumap-tc[3] scaling tool.)

I wonder what these common building blocks could be. I think this would mostly be a program that calls XDP_REDIRECT, plus some XDP-Hints handling in the future; that could also be demonstrated as an example program. Looking at our current XDP stack design draft, I think everything beyond that is highly specific to how the network infrastructure of the cloud hosting environment is designed and will probably be hard to apply to the requirements of other providers.

But of course, having a simple reference implementation of an XDP datapath that demonstrates how XDP can be used to connect VMs to the outside would still be very useful, more for documentation purposes than as a framework.
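To make the "simple reference implementation" idea concrete, the common core would be little more than a devmap plus one redirect call, as in this sketch (map size, key derivation and all names are placeholders that each provider would replace with their own lookup logic):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Maps a lookup key to the ifindex of the target tap device;
 * userspace fills this during VM provisioning. A real datapath
 * would key on e.g. the destination MAC or a VM identifier. */
struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __uint(max_entries, 256);
        __type(key, __u32);
        __type(value, __u32);
} tx_ports SEC(".maps");

SEC("xdp")
int xdp_vm_redirect(struct xdp_md *ctx)
{
        __u32 key = 0;   /* placeholder: derive from packet headers */

        /* Redirect to the device registered under 'key'; fall back
         * to the normal stack (XDP_PASS) if there is no entry. */
        return bpf_redirect_map(&tx_ports, key, XDP_PASS);
}

char _license[] SEC("license") = "GPL";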

>> We're very interested in the work on XDP-Hints and in the performance benefits that the offloading support could bring to XDP, and I would be thankful if you could help us with some of our questions.

> Thanks for showing interest in XDP-hints; this is very motivating for me personally to continue this work upstream.

Even though I'm not familiar enough with the kernel networking code to comment on the related proposals, please let me know when there is something else we can do to help get things upstreamed.

Marcus


