Qemu currently implements pre-copy live migration. VM memory pages are
first copied from the source hypervisor to the destination, potentially
multiple times as pages get dirtied during transfer, and then the VCPU
state is migrated. Unfortunately, if the VM dirties memory faster than
the network can transfer it, pre-copy cannot finish. `virsh` currently
includes an option to suspend a VM after a timeout so that migration can
finish, but at the expense of downtime.

A future version of qemu will implement post-copy live migration. The
VCPU state is first migrated to the destination hypervisor, then memory
pages are pulled from the source hypervisor. Post-copy has the potential
to migrate with zero downtime and minimal performance impact, even when
the VM dirties pages quickly. On the other hand, while post-copy is in
progress, any network failure would render the VM unusable, because its
memory is split between the source and destination hypervisors.
Therefore, post-copy should only be used when necessary.

Post-copy migration in qemu will work as follows:

(1) The `x-postcopy-ram` migration capability needs to be set.
(2) Migration is started.
(3) When the user decides to, post-copy migration is activated by
    sending the `migrate-start-postcopy` command.
(4) Qemu acknowledges by setting the migration status to
    `postcopy-active`.

This patch series implements two ways to access post-copy functionality:
low-level and high-level.

The low-level API is a mechanism that requires the libvirt user to go
through the above steps manually: start migration with
`VIR_MIGRATE_ENABLE_POSTCOPY`, then, while migration is in progress,
call `virDomainMigrateStartPostCopy` from a separate thread. The choice
of when migration should switch from pre-copy to post-copy is left
entirely to the user.

The high-level API implements a policy that automatically triggers
post-copy after one pass of pre-copy, which experiments have shown to
minimize downtime.

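
The difference between pure pre-copy and the "post-copy after one pass"
policy can be illustrated with a toy model. The sketch below is not
qemu's algorithm: it assumes a constant dirty rate and constant
bandwidth, and the function names (`simulate_precopy`,
`postcopy_after_first_pass`) and parameters are hypothetical, chosen
only for illustration.

```python
def simulate_precopy(mem_mb, bandwidth, dirty_rate,
                     max_passes=30, threshold_mb=1.0):
    """Toy model of iterative pre-copy (assumed constant rates, in MB/s).

    Returns (sent, converged): the MB sent in each pass, and whether the
    dirty remainder ever dropped to threshold_mb or below.
    """
    sent = []
    remaining = float(mem_mb)  # the first pass copies all of guest RAM
    for _ in range(max_passes):
        sent.append(remaining)
        # A pass takes remaining / bandwidth seconds, during which the
        # guest dirties remaining * dirty_rate / bandwidth MB. Those
        # pages must be resent; the amount is capped at total RAM.
        remaining = min(float(mem_mb), remaining * dirty_rate / bandwidth)
        if remaining <= threshold_mb:
            return sent, True
    return sent, False


def postcopy_after_first_pass(mem_mb, bandwidth, dirty_rate):
    """Total MB transferred when switching to post-copy after one pass."""
    first_pass = float(mem_mb)
    # After the switch the guest runs on the destination, so each page
    # dirtied during the first pass is transferred at most once more.
    dirtied = min(first_pass, first_pass * dirty_rate / bandwidth)
    return first_pass + dirtied


if __name__ == "__main__":
    # Hypothetical 4 GiB guest on a 1024 MB/s link, with a dirty rate
    # below and then above the link bandwidth.
    for dirty_rate in (512, 2048):
        sent, converged = simulate_precopy(4096, 1024, dirty_rate)
        total = postcopy_after_first_pass(4096, 1024, dirty_rate)
        print(f"dirty={dirty_rate} MB/s: pre-copy passes={len(sent)}, "
              f"converged={converged}, post-copy total={total:.0f} MB")
```

In this model, pre-copy converges only while the dirty rate stays below
the bandwidth, whereas the amount transferred after switching to
post-copy is bounded by twice the guest RAM regardless of the dirty
rate.
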
Using the high-level API is also simpler: the user only has to call
migration with `VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY`.

TODO:
- Wait for qemu API to become stable, i.e., drop `x-`
- Wait for qemu to offer notification for migration state change

v4:
- Rename low-level API flag to `VIR_MIGRATE_ENABLE_POSTCOPY`
- Added high-level API flag `VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY`
- Do not introduce a new job type and use migration status instead
- Added both low- and high-level interface to virsh
- Tested with OpenStack Icehouse

Cristian Klein (11):
  Added public API for post-copy migration
  qemu: added low-level post-copy migration functions
  qemu: implemented VIR_MIGRATE_ENABLE_POSTCOPY
  qemu: implemented post-copy migration logic
  qemu: implement virDomainMigrateStartPostCopy
  virsh: added --enable-postcopy and migrate-start-postcopy
  virsh: added --postcopy-after to migrate command
  qemu: retrieve dirty sync count
  qemu: implemented VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY
  virsh: added --postcopy-after-precopy to migrate
  Revert "Do not allow changing the UUID of a nwfilter"

 include/libvirt/libvirt-domain.h |   5 ++
 src/conf/nwfilter_conf.c         |  11 ---
 src/driver-hypervisor.h          |   5 ++
 src/libvirt-domain.c             |  92 +++++++++++++++++++++
 src/libvirt_public.syms          |   1 +
 src/qemu/qemu_driver.c           |  60 ++++++++++++++
 src/qemu/qemu_migration.c        | 169 +++++++++++++++++++++++++++++++++++++--
 src/qemu/qemu_migration.h        |   4 +-
 src/qemu/qemu_monitor.c          |  24 +++++-
 src/qemu/qemu_monitor.h          |   5 ++
 src/qemu/qemu_monitor_json.c     |  27 ++++++-
 src/qemu/qemu_monitor_json.h     |   1 +
 src/qemu/qemu_monitor_text.c     |   1 +
 src/remote/remote_driver.c       |   1 +
 src/remote/remote_protocol.x     |  12 ++-
 src/remote_protocol-structs      |   5 ++
 tests/qemumonitorjsontest.c      |   1 +
 tools/virsh-domain.c             | 116 ++++++++++++++++++++++++++-
 tools/virsh.pod                  |  21 +++++
 19 files changed, 536 insertions(+), 25 deletions(-)

--
1.9.1