Re: [PATCH] qemu: add support for qemu switchover-ack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Jun 18, 2024, at 11:14 AM, Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:
> 
> !-------------------------------------------------------------------|
>  CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On Tue, Jun 18, 2024 at 08:06:06AM -0700, Jon Kohler wrote:
>> Add plumbing for QEMU's switchover-ack migration capability, which
>> helps lower the downtime during VFIO migrations. This capability is
>> enabled by default as long as both the source and destination support
>> it.
>> 
>> Note: switchover-ack depends on the return path capability, so this may
>> not be used when VIR_MIGRATE_TUNNELLED flag is set.
>> 
>> Extensive details about the qemu switchover-ack implementation are
>> available in the qemu series v6 cover letter [1] where the highlight is
>> the extreme reduction in guest visible downtime. In addition to the
>> original test results below, I saw a roughly ~20% reduction in downtime
>> for VFIO VGPU devices at minimum.
>> 
>>  === Test results ===
>> 
>>  The below table shows the downtime of two identical migrations. In the
>>  first migration swithcover ack is disabled and in the second it is
>>  enabled. The migrated VM is assigned with a mlx5 VFIO device which has
>>  300MB of device data to be migrated.
>> 
>>  +----------------------+-----------------------+----------+
>>  |    Switchover ack    | VFIO device data size | Downtime |
>>  +----------------------+-----------------------+----------+
>>  |       Disabled       |         300MB         |  1900ms  |
>>  |       Enabled        |         300MB         |  420ms   |
>>  +----------------------+-----------------------+----------+
>> 
>>  Switchover ack gives a roughly 4.5 times improvement in downtime.
>>  The 1480ms difference is time that is used for resource allocation for
>>  the VFIO device in the destination. Without switchover ack, this time is
>>  spent when the source VM is stopped and thus the downtime is much
>>  higher. With switchover ack, the time is spent when the source VM is
>>  still running.
>> 
>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_project_qemu-2Ddevel_cover_20230621111201.29729-2D1-2Davihaih-40nvidia.com_&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=LXbohsflI2O3b_WWKMx82xYBwAgUvO4yct1RrzuTOSs&e=
>> 
>> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> Cc: Avihai Horon <avihaih@xxxxxxxxxx>
>> Cc: Markus Armbruster <armbru@xxxxxxxxxx>
>> Cc: Peter Xu <peterx@xxxxxxxxxx>
>> Cc: YangHang Liu <yanghliu@xxxxxxxxxx>
>> ---
>> include/libvirt/libvirt-domain.h | 11 +++++++++++
>> src/libvirt-domain.c             | 20 ++++++++++++++++++++
>> src/qemu/qemu_migration.h        |  1 +
>> src/qemu/qemu_migration_params.c |  8 +++++++-
>> src/qemu/qemu_migration_params.h |  1 +
>> 5 files changed, 40 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
>> index 2f5b01bbfe..9543629f30 100644
>> --- a/include/libvirt/libvirt-domain.h
>> +++ b/include/libvirt/libvirt-domain.h
>> @@ -1100,6 +1100,17 @@ typedef enum {
>>      * Since: 8.5.0
>>      */
>>     VIR_MIGRATE_ZEROCOPY = (1 << 20),
>> +
>> +    /* Use switchover ack migration capability to reduce downtime on VFIO
>> +     * device migration. This prevents the source from stopping the VM and
>> +     * completing the migration until an ACK is received from the destination
>> +     * that it's OK to do so. Thus, a VFIO device can make sure that its
>> +     * initial bytes were sent and loaded in the destination before the
>> +     * source VM is stopped.
>> +     *
>> +     * Since: 10.5.0
>> +     */
>> +    VIR_MIGRATE_SWITCHOVER_ACK = (1 << 21),
>> } virDomainMigrateFlags;
> 
> Do we really need a flag for this ?  Is there a credible scenario
> in which this flag works, and yet shouldn't be used by libvirt ?
> 
> IOW, can we just "do the right thing" and always enable this,
> except for TUNNELLED mode.

The flag makes the error handling in src/libvirt-domain.c a bit
smoother, but if we dropped that, we could simply depend on
the “party” logic in qemuMigrationParamsFlagMap to do the
right thing, which as far as I can tell does exactly what you’re
talking about.

In short, if we are OK with dropping the error handling given
by VIR_EXCLUSIVE_FLAGS_GOTO(), then we could
simplify this.

Happy to go either way with this, let me know what you think
about the error handling comments above.

Thanks for the quick review - 
Jon

> 
> 
> With regards,
> Daniel
> -- 
> |: https://urldefense.proofpoint.com/v2/url?u=https-3A__berrange.com&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=dejjCPlhp3d29jOJm8xIAoRxolMcjVexK07G3qdQt5A&e=       -o-    https://urldefense.proofpoint.com/v2/url?u=https-3A__www.flickr.com_photos_dberrange&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=y0K6Ram4tL7Uu74R-yqo-gibJM9XxrdVy5eB8D8DGCQ&e=  :|
> |: https://urldefense.proofpoint.com/v2/url?u=https-3A__libvirt.org&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=o2SVP1vo3ALW1s_3cY_vd2DKCHcbK-zMR9Mpz6X5NzY&e=          -o-            https://urldefense.proofpoint.com/v2/url?u=https-3A__fstop138.berrange.com&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=Xn9lrZVPIT-j2dxLgZ44mqA2eOqDAZ5tIInN9zNMNT4&e=  :|
> |: https://urldefense.proofpoint.com/v2/url?u=https-3A__entangle-2Dphoto.org&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=gR-qzVxTnd6m6kXnUhxi0xNjkEphbpwc6SdNMmhBb_s&e=     -o-    https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instagram.com_dberrange&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=SbmO2KPGog2k1fYbN30msRFiwCUndhS_KyizZSGB4mI&e=  :|






[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux