On Tue, Jun 18, 2024 at 03:26:03PM +0000, Jon Kohler wrote: > > > > On Jun 18, 2024, at 11:14 AM, Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > > > !-------------------------------------------------------------------| > > CAUTION: External Email > > > > |-------------------------------------------------------------------! > > > > On Tue, Jun 18, 2024 at 08:06:06AM -0700, Jon Kohler wrote: > >> Add plumbing for QEMU's switchover-ack migration capability, which > >> helps lower the downtime during VFIO migrations. This capability is > >> enabled by default as long as both the source and destination support > >> it. > >> > >> Note: switchover-ack depends on the return path capability, so this may > >> not be used when VIR_MIGRATE_TUNNELLED flag is set. > >> > >> Extensive details about the qemu switchover-ack implementation are > >> available in the qemu series v6 cover letter [1] where the highlight is > >> the extreme reduction in guest visible downtime. In addition to the > >> original test results below, I saw a roughly ~20% reduction in downtime > >> for VFIO VGPU devices at minimum. > >> > >> === Test results === > >> > >> The below table shows the downtime of two identical migrations. In the > >> first migration swithcover ack is disabled and in the second it is > >> enabled. The migrated VM is assigned with a mlx5 VFIO device which has > >> 300MB of device data to be migrated. > >> > >> +----------------------+-----------------------+----------+ > >> | Switchover ack | VFIO device data size | Downtime | > >> +----------------------+-----------------------+----------+ > >> | Disabled | 300MB | 1900ms | > >> | Enabled | 300MB | 420ms | > >> +----------------------+-----------------------+----------+ > >> > >> Switchover ack gives a roughly 4.5 times improvement in downtime. > >> The 1480ms difference is time that is used for resource allocation for > >> the VFIO device in the destination. Without switchover ack, this time is > >> spent when the source VM is stopped and thus the downtime is much > >> higher. With switchover ack, the time is spent when the source VM is > >> still running. > >> > >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_project_qemu-2Ddevel_cover_20230621111201.29729-2D1-2Davihaih-40nvidia.com_&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=LXbohsflI2O3b_WWKMx82xYBwAgUvO4yct1RrzuTOSs&e= > >> > >> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx> > >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > >> Cc: Avihai Horon <avihaih@xxxxxxxxxx> > >> Cc: Markus Armbruster <armbru@xxxxxxxxxx> > >> Cc: Peter Xu <peterx@xxxxxxxxxx> > >> Cc: YangHang Liu <yanghliu@xxxxxxxxxx> > >> --- > >> include/libvirt/libvirt-domain.h | 11 +++++++++++ > >> src/libvirt-domain.c | 20 ++++++++++++++++++++ > >> src/qemu/qemu_migration.h | 1 + > >> src/qemu/qemu_migration_params.c | 8 +++++++- > >> src/qemu/qemu_migration_params.h | 1 + > >> 5 files changed, 40 insertions(+), 1 deletion(-) > >> > >> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h > >> index 2f5b01bbfe..9543629f30 100644 > >> --- a/include/libvirt/libvirt-domain.h > >> +++ b/include/libvirt/libvirt-domain.h > >> @@ -1100,6 +1100,17 @@ typedef enum { > >> * Since: 8.5.0 > >> */ > >> VIR_MIGRATE_ZEROCOPY = (1 << 20), > >> + > >> + /* Use switchover ack migration capability to reduce downtime on VFIO > >> + * device migration. This prevents the source from stopping the VM and > >> + * completing the migration until an ACK is received from the destination > >> + * that it's OK to do so. Thus, a VFIO device can make sure that its > >> + * initial bytes were sent and loaded in the destination before the > >> + * source VM is stopped. > >> + * > >> + * Since: 10.5.0 > >> + */ > >> + VIR_MIGRATE_SWITCHOVER_ACK = (1 << 21), > >> } virDomainMigrateFlags; > > > > Do we really need a flag for this ? Is there a credible scenario > > in which this flag works, and yet shouldn't be used by libvirt ? > > > > IOW, can we just "do the right thing" and always enable this, > > except for TUNNELLED mode. > > The flag makes the error handling in src/libvirt-domain.c a bit > smoother, but if we dropped that, we could simply depend on > the “party” logic in qemuMigrationParamsFlagMap to do the > right thing, which as far as I can tell does exactly what you’re > talking about. > > In short, if we are OK with dropping the error handling given > by VIR_EXCLUSIVE_FLAGS_GOTO(), then we could > simplify this. If we don't add a VIR_MIGRATE_SWITCHOVER_ACK flag at all, then there are no errors that need to be checked using VIR_EXCLUSIVE_FLAGS_GOTO in the first place. Libvirt should automatically use switchover-ack, if-and-only-if, both src+dst QEMU support switchover-ack AND TUNNELLED flag is NOT requested. IOW, just make it "do the right thing" With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|