On Mon, Jun 24, 2024 at 10:38:51 -0700, Jon Kohler wrote:
> Add plumbing for QEMU's switchover-ack migration capability, which
> helps lower the downtime during VFIO migrations. This capability is
> enabled by default as long as both the source and destination support
> it.
>
> Note: switchover-ack depends on the return path capability, so this may
> not be used when VIR_MIGRATE_TUNNELLED flag is set.
>
> Extensive details about the qemu switchover-ack implementation are
> available in the qemu series v6 cover letter [1] where the highlight is
> the extreme reduction in guest visible downtime. In addition to the
> original test results below, I saw a roughly 20% reduction in downtime
> for VFIO VGPU devices at minimum.
>
> === Test results ===
>
> The below table shows the downtime of two identical migrations. In the
> first migration switchover ack is disabled and in the second it is
> enabled. The migrated VM is assigned with a mlx5 VFIO device which has
> 300MB of device data to be migrated.
>
> +----------------------+-----------------------+----------+
> | Switchover ack       | VFIO device data size | Downtime |
> +----------------------+-----------------------+----------+
> | Disabled             | 300MB                 | 1900ms   |
> | Enabled              | 300MB                 | 420ms    |
> +----------------------+-----------------------+----------+
>
> Switchover ack gives a roughly 4.5 times improvement in downtime.
> The 1480ms difference is time that is used for resource allocation for
> the VFIO device in the destination. Without switchover ack, this time is
> spent when the source VM is stopped and thus the downtime is much
> higher. With switchover ack, the time is spent when the source VM is
> still running.
>
> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20230621111201.29729-1-avihaih@xxxxxxxxxx/
>
> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Avihai Horon <avihaih@xxxxxxxxxx>
> Cc: Markus Armbruster <armbru@xxxxxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
> Cc: YangHang Liu <yanghliu@xxxxxxxxxx>
> ---
> v1
>  - https://lists.libvirt.org/archives/list/devel@xxxxxxxxxxxxxxxxx/thread/2XCWPYAUE7HUIMSAYYWAUUYGGZ6WYR53/
> v1 -> v2:
>  - Addressed comments to simplify approach (Daniel, Jiri)
> ---
>  src/qemu/qemu_migration.h        | 1 +
>  src/qemu/qemu_migration_params.c | 8 +++++++-
>  src/qemu/qemu_migration_params.h | 1 +
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
> index ed62fd4a91..cd89e100e1 100644
> --- a/src/qemu/qemu_migration.h
> +++ b/src/qemu/qemu_migration.h
> @@ -62,6 +62,7 @@
>       VIR_MIGRATE_NON_SHARED_SYNCHRONOUS_WRITES | \
>       VIR_MIGRATE_POSTCOPY_RESUME | \
>       VIR_MIGRATE_ZEROCOPY | \
> +     VIR_MIGRATE_SWITCHOVER_ACK | \
>       0)
>
>  /* All supported migration parameters and their types. */

This is a leftover from v1, it's no longer needed. I removed this hunk
and pushed the patch. Thanks.

Reviewed-by: Jiri Denemark <jdenemar@xxxxxxxxxx>