Re: [PATCH v2 4/4] virtio-net: Add support for USO features

Jason Wang <jasowang@xxxxxxxxxx> · Mon, 29 Jul 2024 11:50:25 +0800

On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
>
> On 2024/07/27 5:47, Peter Xu wrote:
> > On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
> >> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
> >>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
> >>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
> >>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
> >>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
> >>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
> >>>>>>>> The USO features of the virtio-net device depend on the kernel's
> >>>>>>>> ability to support them; for backward compatibility, the features
> >>>>>>>> are disabled by default on machine types 8.0 and earlier.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich@xxxxxxxxxx>
> >>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew@xxxxxxxxxx>
> >>>>>>>
> >>>>>>> It looks like this patch broke migration of a VM started on a host that
> >>>>>>> supports USO to another host that doesn't..
> >>>>>>
> >>>>>> This was always the case with all offloads. The answer at the moment is,
> >>>>>> don't do this.
> >>>>>
> >>>>> May I ask for my understanding:
> >>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
> >>>>> depending on host kernel features, or "don't do this" = don't try to migrate
> >>>>> between machines that have different host kernel features?
> >>>>>
> >>>>>> Long term, we need to start exposing management APIs
> >>>>>> to discover this, and management has to disable unsupported features.
> >>>>>
> >>>>> Ack, this likely needs some treatment on the libvirt side, too.
> >>>>
> >>>> When QEMU automatically toggles machine type features based on the host
> >>>> kernel, relying on libvirt to then disable them again is impractical,
> >>>> as we cannot assume that the libvirt version people are using knows about
> >>>> newly introduced features. Even if libvirt is updated to know about
> >>>> them, people can easily be using a previous libvirt release.
> >>>>
> >>>> QEMU itself needs to make the machine types do what they are there
> >>>> to do, which is to define a stable machine ABI.
> >>>>
> >>>> What QEMU is missing here is a "platform ABI" concept, to encode
> >>>> sets of features which are tied to specific platform generations.
> >>>> As long as we don't have that we'll keep having these broken
> >>>> migration problems from machine types dynamically changing instead
> >>>> of providing a stable guest ABI.
> >>>
> >>> Could you elaborate on this idea?  Would it be easy to
> >>> implement?
> >>
> >> In terms of launching QEMU I'd imagine:
> >>
> >>    $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
> >>
> >> Any virtual machine HW features which are tied to host kernel features
> >> would have their defaults set based on the requested -platform. The
> >> -machine will be fully invariant wrt the host kernel.
> >>
> >> You would have "-platform help" to list available platforms, and a
> >> corresponding QMP "query-platforms" command to list what platforms
> >> are supported on a given host OS.
> >>
> >> Downstream distros can provide their own platforms definitions
> >> (eg "linux-rhel-9.5") if they have kernels whose feature set
> >> diverges from upstream due to backports.
> >>
> >> Mgmt apps won't need to be taught about every single little QEMU
> >> setting whose default is derived from the kernel. Individual
> >> defaults are opaque and controlled by the requested platform.
> >>
> >> Live migration has clearly defined semantics, and mgmt app can
> >> use query-platforms to validate two hosts are compatible.
> >>
> >> Omitting -platform should pick the very latest platform that is
> >> compatible with the current host (not necessarily the latest
> >> platform built into QEMU).
> >
> > This seems to add one more layer to maintain, and so far I don't know
> > whether it's a must.
> >
> > To put it simply, can we just rely on the QEMU cmdline as "the guest ABI"?  I
> > thought that was mostly the case already, except for some extremely rare
> > outliers.
> >
> > When we have one host that boots up a VM using:
> >
> >    $QEMU1 $cmdline
> >
> > Then another host boots up:
> >
> >    $QEMU2 $cmdline -incoming XXX
> >
> > Then migration should succeed if $cmdline is exactly the same, and the VM
> > can boot up all fine without errors on both sides.
> >
> > AFAICT this has nothing to do with what kernel is underneath, or even
> > whether it is Linux at all?  I think either QEMU1 or QEMU2 has the option
> > to fail.  But if neither does, I thought the ABI should be guaranteed.
> >
> > That's why I think this is a migration violation, as 99.99% of other device
> > properties should be following this rule.  The issue here is that we have
> > the same virtio-net-pci cmdline on both sides in this case, but the ABI got
> > broken.
> >
> > That's also why I was suggesting that if the property contributes to the
> > guest ABI, then AFAIU QEMU needs to:
> >
> >    - Firstly, never quietly flip any bit that affects the ABI...
> >
> >    - Have a default value of off, so that QEMU will always allow the VM to
> >      boot by default, while advanced users can opt in to new features.  We
> >      can't make this ON by default, otherwise some VMs could fail to boot,
>
> It may not necessarily be the case that old features are supported by
> every system. In an extreme case, a user may migrate a VM from Linux to
> Windows, which probably doesn't support any offloading at all. A more
> convincing scenario is RSS offloading with eBPF; using eBPF requires
> privileges, so we cannot assume it is always available even on the latest
> version of Linux.

I don't get why eBPF matters here. It is not visible to the guest, and we
have a fallback anyhow.

>
> >
> >    - If the host doesn't support the feature while the cmdline enabled it,
> >      it needs to fail QEMU boot rather than flipping, so that it says "hey,
> >      this host does not support running such VM specified, due to XXX
> >      feature missing".
>
> This is handled in:
>
> "virtio-net: Convert feature properties to OnOffAuto"
> https://patchew.org/QEMU/20240714-auto-v3-0-e27401aabab3@xxxxxxxxxx/

I may be missing something, but I think "Auto" doesn't make sense to libvirt.

>
> >
> > That's the only way a user could understand what happened, and IMHO that's
> > a clean way for the QEMU cmdline to define the guest ABI, with the machine
> > type as the foundation of that definition, since the machine type decides
> > many of the remaining compat properties.  And that's the whole point of
> > the compat properties too (to make sure the guest ABI is stable).
> >
> > If the kernel can break it easily, all the compat properties we maintain
> > stop making sense in general, because they don't define the whole guest
> > ABI..
> >
> > So AFAIU that's really what we have used for years; I hope I didn't
> > overlook something.  And maybe we don't need the "-platform" layer yet if
> > we can keep up with this rule?
>
> I think a device which cannot conform to that rule should be
> non-migratable. For example, virtio-gpu-gl does not conform to it, and
> does not support migration either.
>
> Regards,
> Akihiko Odaki
>

Thanks