Re: [PATCH v2 4/4] virtio-net: Add support for USO features

Jason Wang <jasowang@xxxxxxxxxx> · Tue, 30 Jul 2024 10:04:38 +0800

On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
>
> On 2024/07/29 23:29, Peter Xu wrote:
> > On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
> >> On 2024/07/29 12:50, Jason Wang wrote:
> >>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 2024/07/27 5:47, Peter Xu wrote:
> >>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
> >>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
> >>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
> >>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
> >>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
> >>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
> >>>>>>>>>>>> USO features of virtio-net device depend on kernel ability
> >>>>>>>>>>>> to support them, for backward compatibility by default the
> >>>>>>>>>>>> features are disabled on 8.0 and earlier.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich@xxxxxxxxxx>
> >>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew@xxxxxxxxxx>
> >>>>>>>>>>>
> >>>>>>>>>>> Looks like this patch broke migration when the VM starts on a host that has
> >>>>>>>>>>> USO supported, to another host that doesn't..
> >>>>>>>>>>
> >>>>>>>>>> This was always the case with all offloads. The answer at the moment is,
> >>>>>>>>>> don't do this.
> >>>>>>>>>
> >>>>>>>>> May I ask for my understanding:
> >>>>>>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
> >>>>>>>>> depending on host kernel features, or "don't do this" = don't try to migrate
> >>>>>>>>> between machines that have different host kernel features?
> >>>>>>>>>
> >>>>>>>>>> Long term, we need to start exposing management APIs
> >>>>>>>>>> to discover this, and management has to disable unsupported features.
> >>>>>>>>>
> >>>>>>>>> Ack, this likely needs some treatments from the libvirt side, too.
> >>>>>>>>
> >>>>>>>> When QEMU automatically toggles machine type features based on host
> >>>>>>>> kernel, relying on libvirt to then disable them again is impractical,
> >>>>>>>> as we cannot assume that the libvirt people are using knows about
> >>>>>>>> newly introduced features. Even if libvirt is updated to know about
> >>>>>>>> it, people can easily be using a previous libvirt release.
> >>>>>>>>
> >>>>>>>> QEMU itself needs to make the machine types do what they are there
> >>>>>>>> to do, which is to define a stable machine ABI.
> >>>>>>>>
> >>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
> >>>>>>>> sets of features which are tied to specific platform generations.
> >>>>>>>> As long as we don't have that we'll keep having these broken
> >>>>>>>> migration problems from machine types dynamically changing instead
> >>>>>>>> of providing a stable guest ABI.
> >>>>>>>
> >>>>>>> Any more elaboration on this idea?  Would it be easily feasible in
> >>>>>>> implementation?
> >>>>>>
> >>>>>> In terms of launching QEMU I'd imagine:
> >>>>>>
> >>>>>>      $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
> >>>>>>
> >>>>>> Any virtual machine HW features which are tied to host kernel features
> >>>>>> would have their defaults set based on the requested -platform. The
> >>>>>> -machine will be fully invariant wrt the host kernel.
> >>>>>>
> >>>>>> You would have -platform help to list available platforms, and a
> >>>>>> corresponding QMP "query-platforms" command to list what platforms
> >>>>>> are supported on a given host OS.
> >>>>>>
> >>>>>> Downstream distros can provide their own platforms definitions
> >>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
> >>>>>> diverges from upstream due to backports.
> >>>>>>
> >>>>>> Mgmt apps won't need to be taught about every single little QEMU
> >>>>>> setting whose default is derived from the kernel. Individual
> >>>>>> defaults are opaque and controlled by the requested platform.
> >>>>>>
> >>>>>> Live migration has clearly defined semantics, and mgmt app can
> >>>>>> use query-platforms to validate two hosts are compatible.
> >>>>>>
> >>>>>> Omitting -platform should pick the very latest platform that is
> >>>>>> compatible with the current host (not necessarily the latest
> >>>>>> platform built-in to QEMU).
> >>>>>
> >>>>> This seems to add one more layer to maintain, and so far I don't know
> >>>>> whether it's a must.
> >>>>>
> >>>>> To put it simply, can we simply rely on qemu cmdline as "the guest ABI"?  I
> >>>>> thought it was mostly the case already, except some extremely rare
> >>>>> outliers.
> >>>>>
> >>>>> When we have one host that boots up a VM using:
> >>>>>
> >>>>>      $QEMU1 $cmdline
> >>>>>
> >>>>> Then another host boots up:
> >>>>>
> >>>>>      $QEMU2 $cmdline -incoming XXX
> >>>>>
> >>>>> Then migration should succeed if $cmdline is exactly the same, and the VM
> >>>>> can boot up all fine without errors on both sides.
> >>>>>
> >>>>> AFAICT this has nothing to do with what kernel is underneath, even not
> >>>>> Linux?  I think either QEMU1 / QEMU2 has the option to fail.  But if it
> >>>>> didn't, I thought the ABI should be guaranteed.
> >>>>>
> >>>>> That's why I think this is a migration violation, as 99.99% of other device
> >>>>> properties should be following this rule.  The issue here is, we have the
> >>>>> same virtio-net-pci cmdline on both sides in this case, but the ABI got
> >>>>> broken.
> >>>>>
> >>>>> That's also why I was suggesting if the property contributes to the guest
> >>>>> ABI, then AFAIU QEMU needs to:
> >>>>>
> >>>>>      - Firstly, never quietly flip any bit that affects the ABI...
> >>>>>
> >>>>>      - Have a default value of off, then QEMU will always allow the VM to boot
> >>>>>        by default, while advanced users can opt-in on new features.  We can't
> >>>>>        make this ON by default otherwise some VMs can already fail to boot,
> >>>>
> >>>> It is not necessarily the case that old features are supported by
> >>>> every system. In an extreme case, a user may migrate a VM from Linux to
> >>>> Windows, which probably doesn't support any offloading at all. A more
> >>>> convincing scenario is RSS offloading with eBPF; using eBPF requires a
> >>>> privilege so we cannot assume it is always available even on the latest
> >>>> version of Linux.
> >>>
> >>> I don't get why eBPF matters here. It is something that is not noticed
> >>> by the guest and we have a fallback anyhow.
>
> It is noticeable for the guest, and the fallback is not effective with
> vhost.

It's a bug then. QEMU can fall back to tuntap if it sees issues in vhost.

Thanks
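
For illustration only, the compatibility check discussed in the quoted
text (management verifying that the destination host can provide every
guest-visible offload before migrating) could be sketched roughly as
below. This is not QEMU or libvirt code. The function names are
hypothetical, and the feature strings are just labels chosen for the
example; how each host's supported set would actually be discovered is
left open, as it is in the thread.

```python
# Hypothetical sketch: a migration is only safe if every guest-visible
# feature negotiated on the source is also available on the destination.

def missing_on_destination(guest_features, dest_supported):
    """Return the guest-visible features the destination cannot provide."""
    return sorted(set(guest_features) - set(dest_supported))


def can_migrate(guest_features, dest_supported):
    """True when the destination supports all guest-visible features."""
    return not missing_on_destination(guest_features, dest_supported)


# Illustrative labels only: the source host enabled USO, the destination
# kernel does not support it.
src_guest = {"guest_tso4", "guest_uso4", "guest_uso6"}
dst_host = {"guest_tso4"}

if not can_migrate(src_guest, dst_host):
    print("refuse migration, missing:",
          ", ".join(missing_on_destination(src_guest, dst_host)))
```

With the sets above the check refuses the migration, which is the case
reported at the top of this thread: a VM started on a host with USO
cannot be moved to a host without it unless the feature was disabled
up front.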