This series implements a new VIR_MIGRATE_POSTCOPY_RESUME flag (virsh
migrate --postcopy-resume) for recovering from a failed post-copy
migration.

You can also fetch the series from my gitlab fork (the last RFC patch is
missing there):

    git fetch https://gitlab.com/jirkade/libvirt.git post-copy-recovery

Version 2:
- rebased and changed Since tags to 8.5.0
- even patches marked as "no change" may differ slightly because of
  rebasing onto the current master or changes in other patches
- replaced a few patches with the "qemu: Drop QEMU_CAPS_MIGRATION_EVENT"
  series:
    - [03/80] qemu: Return state from qemuMonitorGetMigrationCapabilities
    - [04/80] qemu: Enable migration events only when disabled
    - [20/80] qemu: Use switch in qemuDomainGetJobInfoMigrationStats
- see individual patches for additional details
- most of the patches were acked in v1; the following patches did not
  earn a Reviewed-by tag, were changed and lost the tag, or were added
  since the previous version of this series:
    - [03] Introduce VIR_DOMAIN_RUNNING_POSTCOPY_FAILED
    - [04] qemu: Keep domain running on dst on failed post-copy migration
    - [15] qemu: Restore async job start timestamp on reconnect
    - [19] qemu: Use switch in qemuProcessHandleMigrationStatus
    - [20] qemu: Handle 'postcopy-paused' migration state
    - [21] qemu: Add support for postcopy-recover QEMU migration state
    - [33] qemu: Introduce qemuMigrationDstFinishActive
    - [34] qemu: Handle migration job in qemuMigrationDstFinish
    - [45] qemu: Make qemuMigrationCheckPhase failure fatal
    - [48] qemu: Use QEMU_MIGRATION_PHASE_POSTCOPY_FAILED
    - [52] qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Begin phase
    - [60] qemu: Use autoptr for mig in qemuMigrationDstPrepareFresh
    - [76] qemu: Implement VIR_DOMAIN_ABORT_JOB_POSTCOPY flag
    - [79] Introduce VIR_JOB_MIGRATION_SAFE job type
    - [80] qemu: Fix VSERPORT_CHANGE event in post-copy migration
    - [81] RFC: qemu: Keep vCPUs paused while migration is in postcopy-paused

Jiri Denemark (81):
  qemu: Add debug messages to job recovery code
  qemumonitorjsontest: Test more migration capabilities
  Introduce VIR_DOMAIN_RUNNING_POSTCOPY_FAILED
  qemu: Keep domain running on dst on failed post-copy migration
  qemu: Explicitly emit events on post-copy failure
  qemu: Make qemuDomainCleanupAdd return void
  conf: Introduce virDomainObjIsFailedPostcopy helper
  conf: Introduce virDomainObjIsPostcopy helper
  qemu: Introduce qemuProcessCleanupMigrationJob
  qemu: Rename qemuDomainObjRestoreJob as qemuDomainObjPreserveJob
  qemu: Add qemuDomainObjRestoreAsyncJob
  qemu: Keep migration job active after failed post-copy
  qemu: Abort failed post-copy when we haven't called Finish yet
  qemu: Restore failed migration job on reconnect
  qemu: Restore async job start timestamp on reconnect
  qemu: Drop forward declarations in migration code
  qemu: Don't wait for migration job when migration is running
  qemu: Fetch paused migration stats
  qemu: Use switch in qemuProcessHandleMigrationStatus
  qemu: Handle 'postcopy-paused' migration state
  qemu: Add support for postcopy-recover QEMU migration state
  qemu: Create domain object at the end of qemuMigrationDstFinish
  qemu: Move success-only code out of endjob in qemuMigrationDstFinish
  qemu: Separate success and failure path in qemuMigrationDstFinish
  qemu: Rename "endjob" label in qemuMigrationDstFinish
  qemu: Generate migration cookie in Finish phase earlier
  qemu: Make final part of migration Finish phase reusable
  qemu: Drop obsolete comment in qemuMigrationDstFinish
  qemu: Preserve error in qemuMigrationDstFinish
  qemu: Introduce qemuMigrationDstFinishFresh
  qemu: Introduce qemuMigrationDstFinishOffline
  qemu: Separate cookie parsing for qemuMigrationDstFinishOffline
  qemu: Introduce qemuMigrationDstFinishActive
  qemu: Handle migration job in qemuMigrationDstFinish
  qemu: Make final part of migration Confirm phase reusable
  qemu: Make sure migrationPort is released even in callbacks
  qemu: Pass qemuDomainJobObj to qemuMigrationDstComplete
  qemu: Finish completed unattended migration
  qemu: Ignore missing memory statistics in query-migrate
  qemu: Improve post-copy migration handling on reconnect
  qemu: Check flags incompatible with offline migration earlier
  qemu: Introduce qemuMigrationSrcBeginXML helper
  qemu: Add new migration phases for post-copy recovery
  qemu: Separate protocol checks from qemuMigrationJobSetPhase
  qemu: Make qemuMigrationCheckPhase failure fatal
  qemu: Refactor qemuDomainObjSetJobPhase
  qemu: Do not set job owner in qemuMigrationJobSetPhase
  qemu: Use QEMU_MIGRATION_PHASE_POSTCOPY_FAILED
  Introduce VIR_MIGRATE_POSTCOPY_RESUME flag
  virsh: Add --postcopy-resume option for migrate command
  qemu: Don't set VIR_MIGRATE_PAUSED for post-copy resume
  qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Begin phase
  qemu: Refactor qemuMigrationSrcPerformPhase
  qemu: Separate starting migration from qemuMigrationSrcRun
  qemu: Add support for 'resume' parameter of migrate QMP command
  qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Perform phase
  qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Confirm phase
  qemu: Introduce qemuMigrationDstPrepareFresh
  qemu: Refactor qemuMigrationDstPrepareFresh
  qemu: Use autoptr for mig in qemuMigrationDstPrepareFresh
  qemu: Add support for migrate-recover QMP command
  qemu: Rename qemuMigrationSrcCleanup
  qemu: Refactor qemuMigrationAnyConnectionClosed
  qemu: Handle incoming migration in qemuMigrationAnyConnectionClosed
  qemu: Start a migration phase in qemuMigrationAnyConnectionClosed
  qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Prepare phase
  qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Finish phase
  qemu: Create completed jobData in qemuMigrationSrcComplete
  qemu: Register qemuProcessCleanupMigrationJob after Begin phase
  qemu: Call qemuDomainCleanupAdd from qemuMigrationJobContinue
  qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for peer-to-peer migration
  qemu: Enable support for VIR_MIGRATE_POSTCOPY_RESUME
  Add virDomainAbortJobFlags public API
  qemu: Implement virDomainAbortJobFlags
  Add VIR_DOMAIN_ABORT_JOB_POSTCOPY flag for virDomainAbortJobFlags
  qemu: Implement VIR_DOMAIN_ABORT_JOB_POSTCOPY flag
  virsh: Add --postcopy option for domjobabort command
  NEWS: Add support for post-copy recovery
  Introduce VIR_JOB_MIGRATION_SAFE job type
  qemu: Fix VSERPORT_CHANGE event in post-copy migration
  RFC: qemu: Keep vCPUs paused while migration is in postcopy-paused

 NEWS.rst                              |    5 +
 docs/manpages/virsh.rst               |   17 +-
 examples/c/misc/event-test.c          |    3 +
 include/libvirt/libvirt-domain.h      |   26 +
 src/conf/domain_conf.c                |   33 +
 src/conf/domain_conf.h                |    8 +
 src/driver-hypervisor.h               |    5 +
 src/hypervisor/domain_job.c           |    2 +
 src/hypervisor/domain_job.h           |    5 +
 src/libvirt-domain.c                  |   83 +-
 src/libvirt_private.syms              |    2 +
 src/libvirt_public.syms               |    5 +
 src/qemu/qemu_domain.c                |   10 +-
 src/qemu/qemu_domain.h                |    6 +-
 src/qemu/qemu_domainjob.c             |  106 +-
 src/qemu/qemu_domainjob.h             |   16 +-
 src/qemu/qemu_driver.c                |  104 +-
 src/qemu/qemu_migration.c             | 2420 ++++++++++++-----
 src/qemu/qemu_migration.h             |   43 +-
 src/qemu/qemu_monitor.c               |   22 +
 src/qemu/qemu_monitor.h               |   10 +
 src/qemu/qemu_monitor_json.c          |  127 +-
 src/qemu/qemu_monitor_json.h          |    7 +
 src/qemu/qemu_process.c               |  405 ++-
 src/qemu/qemu_process.h               |    3 +
 src/remote/remote_driver.c            |    1 +
 src/remote/remote_protocol.x          |   14 +-
 src/remote_protocol-structs           |    5 +
 tests/qemumonitorjsontest.c           |   32 +-
 .../migration-in-params-in.xml        |    2 +-
 .../migration-out-nbd-bitmaps-in.xml  |    2 +-
 .../migration-out-nbd-out.xml         |    2 +-
 .../migration-out-nbd-tls-out.xml     |    2 +-
 .../migration-out-params-in.xml       |    2 +-
 tools/virsh-domain-event.c            |    3 +-
 tools/virsh-domain-monitor.c          |    1 +
 tools/virsh-domain.c                  |   24 +-
 37 files changed, 2628 insertions(+), 935 deletions(-)

-- 
2.35.1