Re: [PATCH v2 4/4] virtio-net: Add support for USO features

Jason Wang <jasowang@xxxxxxxxxx> · Thu, 1 Aug 2024 10:28:25 +0800

On Wed, Jul 31, 2024 at 8:58 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, Jul 31, 2024 at 03:41:00AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 31, 2024 at 08:04:24AM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Jul 30, 2024 at 05:32:48PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 30, 2024 at 04:03:53PM -0400, Peter Xu wrote:
> > > > > On Tue, Jul 30, 2024 at 03:22:50PM -0400, Michael S. Tsirkin wrote:
> > > > > > This is not what we did historically. Why should we start now?
> > > > >
> > > > > It's a matter of whether we still want migration to randomly fail, like
> > > > > what this patch does.
> > > > >
> > > > > Or any better suggestions?  I'm definitely open to that.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > --
> > > > > Peter Xu
> > > >
> > > > Randomly is an overstatement. You need to switch between kernels
> > > > where this feature differs. We did it with a ton of features
> > > > in the past, dunno why we single out USO now.
> > >
> > > This has been a problem with a ton of features in the past. We've
> > > ignored the problem, but that doesn't make it the right solution.
> > >
> > > With regards,
> > > Daniel
> >
> > Pushing it to domain xml does not really help,
> > migration will still fail unexpectedly (after wasting
> > a ton of resources copying memory, and getting
> > a downtime bump, I might add).
>
> Could you elaborate on why it would fail with what I proposed?
>
> Note that if this is a generic comment about "any migration can fail if we
> found a device mismatch", we have a plan to fix that to some degree. It's
> just that we don't have enough people working on these topics yet. See:
>
> https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
>
> It includes:
>
>  "Check device tree on both sides, etc., to make sure the migration is
>   applicable. E.g., we should fail early and clearly on any device
>   mismatch."
>
> However I don't think it'll cover all checks, e.g. I _think_ even if we
> verify VMSDs then post_load() hooks can still fail, and there can be some
> corner cases to think about.  And of course, this may not even apply to virtio
> since virtio manages migration itself, without providing a top-level vmsd.
>
> >
> > The right solution is to have a tool that can query
> > backends, and that given the results from all of the cluster,
> > generate a set of parameters that will ensure migration works.

This seems to be very hard for vhost-users.

> > Kind of like qemu-img, but for migration.
>
> This is adding extra work, IMHO.
>
> If we stick with "qemu cmdline as guest ABI" concept, I think we're all
> fine, as that work is done by QEMU booting up first on both sides,
> including dest.

Probably, letting QEMU probe is much easier than rewriting the
probe in the upper layer.

>  Basically, libvirt already plays the role of the new tool
> without any new code being added at all: what is captured in the boot failure
> log would be the output of that tool if we wrote it.
>
> Thanks,

Thanks

>
> --
> Peter Xu
>