On Tue, May 10, 2022 at 17:20:35 +0200, Jiri Denemark wrote:
> When post-copy migration fails, we can't just abort the migration and
> resume the domain on the source host as it is already running on the
> destination host and no host has a complete state of the domain memory.
> Instead of the current approach of just marking the domain on both ends
> as paused/running with a post-copy failed sub state, we will keep the
> migration job active (even though the migration API will return failure)
> so that the state is more visible and we can better control what APIs
> can be called on the domains and even allow for resuming the migration.
>
> Signed-off-by: Jiri Denemark <jdenemar@xxxxxxxxxx>
> ---
>  src/qemu/qemu_migration.c | 94 ++++++++++++++++++++++++++++-----------
>  1 file changed, 68 insertions(+), 26 deletions(-)

> @@ -5445,11 +5479,12 @@ qemuMigrationSrcPerformPhase(virQEMUDriver *driver,
>          goto endjob;
>
>   endjob:
> -    if (ret < 0) {
> +    if (ret < 0 && !virDomainObjIsFailedPostcopy(vm)) {
>          qemuMigrationParamsReset(driver, vm, VIR_ASYNC_JOB_MIGRATION_OUT,
>                                   jobPriv->migParams, priv->job.apiFlags);
>          qemuMigrationJobFinish(vm);
>      } else {
> +        qemuDomainCleanupAdd(vm, qemuProcessCleanupMigrationJob);
>          qemuMigrationJobContinue(vm);
>      }

This logic change is a bit obscure and IMO would benefit from a comment
stating that we want to continue all post-copy migration jobs and all
other successful migrations.

Reviewed-by: Peter Krempa <pkrempa@xxxxxxxxxx>
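A minimal sketch of the kind of comment suggested above, with the endjob condition modelled as a standalone C function so the four cases can be checked. The enum, endjobAction() and checkEndjobTruthTable() are hypothetical names used only for illustration and are not libvirt code; the failedPostcopy parameter stands in for virDomainObjIsFailedPostcopy(vm).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the endjob decision in
 * qemuMigrationSrcPerformPhase(); these names are not libvirt API. */
typedef enum {
    ENDJOB_FINISH,   /* reset migration params and finish the job */
    ENDJOB_CONTINUE, /* keep the async migration job active */
} endjob_action;

/* ret: result of the perform phase (< 0 on failure).
 * failedPostcopy: stands in for virDomainObjIsFailedPostcopy(vm). */
static endjob_action
endjobAction(int ret, bool failedPostcopy)
{
    /* Keep the job active for every successful migration and for a
     * failed post-copy migration: in the latter case the domain already
     * runs on the destination and neither host has its complete memory,
     * so the job cannot simply be aborted. Finish the job only when the
     * migration failed without leaving the domain in a failed post-copy
     * state. */
    if (ret < 0 && !failedPostcopy)
        return ENDJOB_FINISH;
    return ENDJOB_CONTINUE;
}

/* Spells out the four combinations the condition covers. */
static void
checkEndjobTruthTable(void)
{
    assert(endjobAction(-1, false) == ENDJOB_FINISH);
    assert(endjobAction(-1, true) == ENDJOB_CONTINUE);
    assert(endjobAction(0, false) == ENDJOB_CONTINUE);
    assert(endjobAction(0, true) == ENDJOB_CONTINUE);
}
```

Only one of the four combinations (failure outside post-copy) takes the finish path; that asymmetry is what a comment next to the real condition would need to explain.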