> On Jun 18, 2024, at 11:14 AM, Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On Tue, Jun 18, 2024 at 08:06:06AM -0700, Jon Kohler wrote: >> Add plumbing for QEMU's switchover-ack migration capability, which >> helps lower the downtime during VFIO migrations. This capability is >> enabled by default as long as both the source and destination support >> it. >> >> Note: switchover-ack depends on the return path capability, so this may >> not be used when VIR_MIGRATE_TUNNELLED flag is set. >> >> Extensive details about the qemu switchover-ack implementation are >> available in the qemu series v6 cover letter [1] where the highlight is >> the extreme reduction in guest visible downtime. In addition to the >> original test results below, I saw a roughly ~20% reduction in downtime >> for VFIO VGPU devices at minimum. >> >> === Test results === >> >> The below table shows the downtime of two identical migrations. In the >> first migration swithcover ack is disabled and in the second it is >> enabled. The migrated VM is assigned with a mlx5 VFIO device which has >> 300MB of device data to be migrated. >> >> +----------------------+-----------------------+----------+ >> | Switchover ack | VFIO device data size | Downtime | >> +----------------------+-----------------------+----------+ >> | Disabled | 300MB | 1900ms | >> | Enabled | 300MB | 420ms | >> +----------------------+-----------------------+----------+ >> >> Switchover ack gives a roughly 4.5 times improvement in downtime. >> The 1480ms difference is time that is used for resource allocation for >> the VFIO device in the destination. Without switchover ack, this time is >> spent when the source VM is stopped and thus the downtime is much >> higher. With switchover ack, the time is spent when the source VM is >> still running. >> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_project_qemu-2Ddevel_cover_20230621111201.29729-2D1-2Davihaih-40nvidia.com_&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=LXbohsflI2O3b_WWKMx82xYBwAgUvO4yct1RrzuTOSs&e= >> >> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx> >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> >> Cc: Avihai Horon <avihaih@xxxxxxxxxx> >> Cc: Markus Armbruster <armbru@xxxxxxxxxx> >> Cc: Peter Xu <peterx@xxxxxxxxxx> >> Cc: YangHang Liu <yanghliu@xxxxxxxxxx> >> --- >> include/libvirt/libvirt-domain.h | 11 +++++++++++ >> src/libvirt-domain.c | 20 ++++++++++++++++++++ >> src/qemu/qemu_migration.h | 1 + >> src/qemu/qemu_migration_params.c | 8 +++++++- >> src/qemu/qemu_migration_params.h | 1 + >> 5 files changed, 40 insertions(+), 1 deletion(-) >> >> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h >> index 2f5b01bbfe..9543629f30 100644 >> --- a/include/libvirt/libvirt-domain.h >> +++ b/include/libvirt/libvirt-domain.h >> @@ -1100,6 +1100,17 @@ typedef enum { >> * Since: 8.5.0 >> */ >> VIR_MIGRATE_ZEROCOPY = (1 << 20), >> + >> + /* Use switchover ack migration capability to reduce downtime on VFIO >> + * device migration. This prevents the source from stopping the VM and >> + * completing the migration until an ACK is received from the destination >> + * that it's OK to do so. Thus, a VFIO device can make sure that its >> + * initial bytes were sent and loaded in the destination before the >> + * source VM is stopped. >> + * >> + * Since: 10.5.0 >> + */ >> + VIR_MIGRATE_SWITCHOVER_ACK = (1 << 21), >> } virDomainMigrateFlags; > > Do we really need a flag for this ? Is there a credible scenario > in which this flag works, and yet shouldn't be used by libvirt ? > > IOW, can we just "do the right thing" and always enable this, > except for TUNNELLED mode. The flag makes the error handling in src/libvirt-domain.c a bit smoother, but if we dropped that, we could simply depend on the “party” logic in qemuMigrationParamsFlagMap to do the right thing, which as far as I can tell does exactly what you’re talking about. In short, if we are OK with dropping the error handling given by VIR_EXCLUSIVE_FLAGS_GOTO(), then we could simplify this. Happy to go either way with this, let me know what you think about the error handling comments above. Thanks for the quick review - Jon > > > With regards, > Daniel > -- > |: https://urldefense.proofpoint.com/v2/url?u=https-3A__berrange.com&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=dejjCPlhp3d29jOJm8xIAoRxolMcjVexK07G3qdQt5A&e= -o- https://urldefense.proofpoint.com/v2/url?u=https-3A__www.flickr.com_photos_dberrange&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=y0K6Ram4tL7Uu74R-yqo-gibJM9XxrdVy5eB8D8DGCQ&e= :| > |: https://urldefense.proofpoint.com/v2/url?u=https-3A__libvirt.org&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=o2SVP1vo3ALW1s_3cY_vd2DKCHcbK-zMR9Mpz6X5NzY&e= -o- https://urldefense.proofpoint.com/v2/url?u=https-3A__fstop138.berrange.com&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=Xn9lrZVPIT-j2dxLgZ44mqA2eOqDAZ5tIInN9zNMNT4&e= :| > |: https://urldefense.proofpoint.com/v2/url?u=https-3A__entangle-2Dphoto.org&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=gR-qzVxTnd6m6kXnUhxi0xNjkEphbpwc6SdNMmhBb_s&e= -o- https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instagram.com_dberrange&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=SbmO2KPGog2k1fYbN30msRFiwCUndhS_KyizZSGB4mI&e= :|