Re: [xdp-cloud] Re: Questions about Offloads and XDP-Hints regarding a Cloud-Provider Use-Case

(Answered inline, below)

On 29/09/2022 15.16, Marcus Wichelmann wrote:
Am 28.09.22 um 20:07 schrieb Jesper Dangaard Brouer:

On 28/09/2022 15.54, Marcus Wichelmann wrote:

I'm working for a cloud hosting provider, and we're building a new XDP-based networking stack for our VM hosts that uses XDP to accelerate the connectivity of our qemu/KVM VMs to the outside.


Welcome to the community!

Thank you!

Sounds like an excellent use-case and an opportunity for speeding up
RX packets from the physical NIC into the VM.  Good to hear someone
(again) having this use-case. I've personally not been focused on this
use-case lately, mostly because the community members I was interacting
with changed jobs, away from cloud hosting companies. Good to have a
user back in this area!

Good to hear! Also, we'll probably not be the last ones coming up with this use-case. ;)


Yes, and remember to look at the work done by people before you...

I urge you to read David Ahern's slides:

https://legacy.netdevconf.info/0x14/pub/slides/24/netdev-0x14-XDP-and-the-cloud.pdf

It is a detailed step-by-step explanation of your use-case, along with
the pitfalls and gotchas.  If you hit an issue, do remember to bring it
to the attention of the community (e.g. xdp-newbies); lurking
kernel engineers will then likely get motivated to fix these issues upstream.
(As the slides explain, redirect improvements landed in kernels v5.4 + v5.6 + v5.8)

For this, we use XDP_REDIRECT to forward packets between the physical host NIC and the VM tap devices. The main issue we have now is that our VM guests have some virtio NIC offloads enabled: rx/tx checksumming, TSO/GSO, GRO and Scatter-Gather.

Supporting RX-checksumming is part of the plans for XDP-hints, although
virtio_net is not part of my initial patchset.

Great!

It should be trivial to add to virtio_net.
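
To give a feel for how an XDP program would consume such an RX-csum
hint, here is a minimal sketch. The struct name and fields
(xdp_hints_rx_csum, csum_status) are made up for illustration; the real
layout will be BTF-described and is still under discussion upstream.

/* Hypothetical sketch: assumes the driver placed a hints struct in the
 * XDP metadata area (between data_meta and data). */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct xdp_hints_rx_csum {      /* assumed layout, not a kernel UAPI struct */
	__u32 btf_id;           /* identifies the hints layout via BTF      */
	__u32 csum_status;      /* non-zero: checksum already verified by HW */
};

SEC("xdp")
int read_rx_csum_hint(struct xdp_md *ctx)
{
	void *data      = (void *)(long)ctx->data;
	void *data_meta = (void *)(long)ctx->data_meta;
	struct xdp_hints_rx_csum *hints = data_meta;

	/* Standard bounds check: the metadata area must hold the struct */
	if ((void *)(hints + 1) > data)
		return XDP_PASS;        /* no hints available, fall back */

	if (hints->csum_status) {
		/* HW already validated the checksum; no SW re-check needed */
	}

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";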

XDP-redirect with GRO and Scatter-Gather frames is part of the
multi-buff effort (Cc Lorenzo), but currently XDP_REDIRECT with
multi-buff is disabled (except for cpumap), because of the lack of
XDP-feature bits, meaning we cannot determine (in kernel) whether the
receiving net_device supports multi-buff (Cc Kumar).

Can this also be solved with XDP-Hints or is this an unrelated issue?


This is unrelated to XDP-hints.

The XDP multi-buffer support needed for TSO/GSO seems to be mostly there

A subtle detail is that both XDP-hints and XDP multi-buff are needed to
get the GRO/GSO kernel infra working.  For the kernel to construct GRO-SKB
based packets from XDP-redirected incoming xdp_frames, the kernel code
requires both RX-csum and RX-hash before coalescing GRO frames.

already, but, to our understanding, the last missing part for full TSO/GSO support is a way to tell the physical NIC to perform the TSO/GSO offload.


The TSO/GSO side is usually the TX side.  The VM should be able to send
out normal TSO/GSO (multi-buffer) packets.

Currently the VM sends out multi-buffer packets, but after redirecting them, they are probably not getting segmented on the way out of the physical NIC. Or, as you wrote earlier, the XDP multi-buffer support isn't even used there and the packet just gets truncated on the way into XDP. I've not exactly traced that down yet, but you probably know better than me what's happening there.

An XDP program on the tap-device will likely cause drops of multi-buffer packets (sent out by the VM).

(1) First of all, this XDP-tap program needs to use the newer XDP program sub-type that knows about multi-buffer packets (see the sketch below).

(2) I'm not sure XDP-tap (virtio_net) has multi-buffer support.
 Lorenzo or Jason, do you know?
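
For (1), a minimal sketch of a multi-buffer aware ("frags") XDP program.
With a recent libbpf the "xdp.frags" section name makes the program load
with BPF_F_XDP_HAS_FRAGS, telling the kernel it can handle non-linear
frames; this needs a recent kernel (>= v5.18) and libbpf, and the program
name is just a placeholder.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int xdp_tap_mb(struct xdp_md *ctx)
{
	/* For multi-buffer frames ctx->data_end only covers the first
	 * fragment; bpf_xdp_get_buff_len() returns the full frame length. */
	__u64 full_len = bpf_xdp_get_buff_len(ctx);

	/* A real program would parse headers and decide whether to
	 * XDP_REDIRECT here; this sketch only shows the frags plumbing. */
	bpf_printk("multi-buff capable prog saw frame of %llu bytes", full_len);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";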

The TX side offloads are therefore more critical to us, because we cannot easily disable them in the VMs. The RX side is less of an issue, because we have control over the physical NIC configuration and could temporarily disable all offloads there until XDP supports them (which would of course be better). So RX offloads are very nice to have, but missing TX offloads are a show-stopper for this use-case if we don't find a way to disable the offloads on all customer VMs.

 > Or are you saying this also gets disabled when enabling XDP on the
 > virtio_net RX side?

I'm not sure what you mean with that. What gets disabled?


See Ahern's slide "Redirecting VM Egress Traffic".

The libvirt config (or Qemu/KVM params) currently needs to disable many
of the offloads for XDP-on-tap to work.

IMHO this is something we kernel developers need to fix/improve.
(Cc Jason + Lorenzo)

I've seen the latest LPC 2022 talk by Jesper Dangaard Brouer regarding the planned XDP-Hints feature. But this was mainly about checksum and VLAN offloads. Is supporting TSO/GSO also one of the goals you have in mind with these XDP-Hints proposals?


As mentioned, TSO/GSO is the TX side. We (Cc Magnus) also want to extend
XDP-hints to the TX-side, to allow asking the HW to perform different
offloads. Let's land the RX-side first.

Makes sense, thanks for clarifying your roadmap!


For your own roadmap, waiting for "TX-XDP-hints" is likely problematic.

Thus, I would likely recommend NOT XDP-redirecting (TCP) traffic coming
from the VMs, which will hit the XDP-tap BPF program.  The XDP-tap
program could selectively XDP-redirect the UDP packets (if your
measurements show it to be faster).
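
A rough sketch of that "only redirect UDP" idea for the XDP-tap program.
The devmap name (tx_port) and the redirect target are assumptions; TCP
and anything unparsable fall back to the normal kernel path.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} tx_port SEC(".maps");

SEC("xdp") /* would need "xdp.frags" once multi-buffer frames are in play */
int xdp_tap_udp_only(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;              /* IPv6 etc.: normal stack */

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;

	if (iph->protocol == IPPROTO_UDP)
		return bpf_redirect_map(&tx_port, 0, 0);

	return XDP_PASS;                      /* TCP stays on kernel path */
}

char _license[] SEC("license") = "GPL";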

Start with XDP redirecting from the physical NIC device into the VMs.
The XDP-hints coming from the physical NIC device should be trivial to
convert into the format KVM needs.
Looking at the kernel code, we need to populate struct virtio_net_hdr (which
is inside struct tun_xdp_hdr).
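
A rough sketch (not real kernel code) of the conversion I have in mind:
map an RX "checksum already verified" hint from the physical NIC into the
virtio_net_hdr carried inside tun_xdp_hdr, so the guest's virtio-net can
skip re-verifying the checksum. The helper function is made up for
illustration.

#include <linux/virtio_net.h>   /* struct virtio_net_hdr, VIRTIO_NET_HDR_* */
#include <stdbool.h>
#include <string.h>

/* Mirrors the in-kernel struct tun_xdp_hdr (include/linux/if_tun.h) */
struct tun_xdp_hdr {
	int buflen;
	struct virtio_net_hdr gso;
};

static void fill_tun_xdp_hdr(struct tun_xdp_hdr *hdr, int buflen,
			     bool rx_csum_verified)
{
	memset(hdr, 0, sizeof(*hdr));
	hdr->buflen = buflen;

	hdr->gso.gso_type = VIRTIO_NET_HDR_GSO_NONE;   /* no GSO on this path */
	if (rx_csum_verified)
		/* Tell the guest the checksum was already validated in HW */
		hdr->gso.flags = VIRTIO_NET_HDR_F_DATA_VALID;
}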


The "XDP Cloud-Provider" project page describes a very similar use-case to what we plan to do. What's the goal of this project?


Yes, this sounds VERY similar to your use-case.

I think you are referring to this:
  [1] https://xdp-project.net/areas/xdp-cloud-provider.html
  [2] https://github.com/xdp-project/xdp-cloud

The GitHub Link is a 404. Maybe this repository is private-only?

Yes, sorry about that; the git repo is marked private because the project didn't take off.


We had two cloud hosting companies interested in this use-case and
started a "sub" xdp-project, with the intent of working together on
code[2] that implements concrete BPF tools that function as building
blocks the individual companies can integrate into their systems,
leaving customer provisioning to the companies.
(P.S. this approach has worked well for the xdp-cpumap-tc[3] scaling tool)

I wonder what these common building blocks could be. I think this would be mostly just a program that calls XDP-Redirect and also some XDP-Hints handling in the future. This could also be demonstrated as an example program.

Sure.
I recommend you start with coding an eBPF example program, and if you
want my help please base it on https://github.com/xdp-project/bpf-examples
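
To make it concrete, the userspace side of such an example could be as
small as the sketch below: load the XDP object, put the redirect target's
ifindex into the devmap, and attach the program. The object/program/map
names (xdp_cloud.bpf.o, xdp_tap_udp_only, tx_port) are placeholders, and
it assumes a libbpf >= 0.8.

#include <stdio.h>
#include <net/if.h>
#include <linux/if_link.h>      /* XDP_FLAGS_DRV_MODE */
#include <bpf/libbpf.h>
#include <bpf/bpf.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <attach-iface> <redirect-iface>\n",
			argv[0]);
		return 1;
	}
	int attach_ifindex = if_nametoindex(argv[1]);
	__u32 key = 0, redirect_ifindex = if_nametoindex(argv[2]);

	struct bpf_object *obj = bpf_object__open_file("xdp_cloud.bpf.o", NULL);
	if (!obj || bpf_object__load(obj))
		return 1;

	struct bpf_program *prog =
		bpf_object__find_program_by_name(obj, "xdp_tap_udp_only");
	struct bpf_map *map = bpf_object__find_map_by_name(obj, "tx_port");
	if (!prog || !map)
		return 1;

	/* devmap slot 0 -> the interface we redirect into */
	if (bpf_map__update_elem(map, &key, sizeof(key),
				 &redirect_ifindex, sizeof(redirect_ifindex), 0))
		return 1;

	/* Attach in driver (native) mode if the NIC/driver supports it */
	if (bpf_xdp_attach(attach_ifindex, bpf_program__fd(prog),
			   XDP_FLAGS_DRV_MODE, NULL))
		return 1;

	printf("XDP program attached to %s, redirecting to %s\n",
	       argv[1], argv[2]);
	return 0;
}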

While looking at our current XDP-Stack design draft, I think everything beyond that is highly specific to how the network infrastructure of the cloud hosting environment is designed and will probably be hard to apply to the requirements of other providers.


Hmm... I kind of disagree, but that should not stop you.
I still encourage you to decouple customer/VM provisioning in your design.

But of course, having a simple reference implementation of an XDP datapath that demonstrates how XDP can be used to connect VMs to the outside would still be very useful. For documentation purposes, maybe not so much as a framework.

Great, let's start with a PoC/MVP as a sub-dir under:
 https://github.com/xdp-project/bpf-examples

If we can iterate on a public 'xdp-cloud' bpf-example, then the
community can more easily reproduce the issues that the devel process brings up.

--Jesper



