On Wed, May 18, 2022 at 15:40:18 +0200, Jiri Denemark wrote:
> On Thu, May 12, 2022 at 14:14:15 +0200, Peter Krempa wrote:
> > On Tue, May 10, 2022 at 17:21:15 +0200, Jiri Denemark wrote:
> > > Mostly we just need to check whether the domain is in a failed post-copy
> > > migration that can be resumed.
> > >
> > > Signed-off-by: Jiri Denemark <jdenemar@xxxxxxxxxx>
> > > ---
> > >  src/qemu/qemu_migration.c | 143 ++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 143 insertions(+)
> > >
> > > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > > index c111dd8686..99b1d4b88b 100644
> > > --- a/src/qemu/qemu_migration.c
> > > +++ b/src/qemu/qemu_migration.c
> > > @@ -2717,6 +2717,143 @@ qemuMigrationSrcBeginPhase(virQEMUDriver *driver,
> > >                                    flags);
> > >  }
> > >
> > >
> > > +static bool
> > > +qemuMigrationAnyCanResume(virDomainObj *vm,
> > > +                          virDomainAsyncJob job,
> > > +                          unsigned long flags,
> > > +                          qemuMigrationJobPhase expectedPhase)
> > > +{
> > > +    qemuDomainObjPrivate *priv = vm->privateData;
> > > +
> > > +    VIR_DEBUG("vm=%p, job=%s, flags=0x%lx, expectedPhase=%s",
> > > +              vm, virDomainAsyncJobTypeToString(job), flags,
> > > +              qemuDomainAsyncJobPhaseToString(VIR_ASYNC_JOB_MIGRATION_OUT,
> > > +                                              expectedPhase));
> > > +
> > > +    if (!(flags & VIR_MIGRATE_POSTCOPY)) {
> > > +        virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s",
> > > +                       _("resuming failed post-copy migration requires "
> > > +                         "post-copy to be enabled"));
> >
> > Error messages shall not have line breaks.
> >
> > Also ... requiring this flag seems to be a bit pointless and ...
> > cosmetic at best. Since the previous migration would need to be started
> > with the post-copy flag already which is checked below.
>
> Right, the previous migration must have been started as post-copy, but
> resuming it is post-copy too.
> We could have implied the flag based on
> POSTCOPY_RESUME (which is ugly) or all relevant places checking POSTCOPY
> would need to check POSTCOPY_RESUME too (which is ugly as well). And
> with the expected use case (repeating the failed migration command with
> all the flags and arguments and adding an extra POSTCOPY_RESUME flag) in
> mind, requiring POSTCOPY to be used with POSTCOPY_RESUME looks like the
> best option to me.

I was kind of referring to the fact that it's already implied that
post-copy is being resumed by the POSTCOPY_RESUME flag, and you need to
check anyway that the migration is indeed post-copy, so this feels
redundant, but I think it's okay.

> >
> > > +        return false;
> > > +    }
> > > +
> > > +    /* This should never happen since POSTCOPY_RESUME is newer than
> > > +     * CHANGE_PROTECTION, but let's check it anyway in case we're talking to
> > > +     * a weird client.
> > > +     */
> > > +    if (job == VIR_ASYNC_JOB_MIGRATION_OUT &&
> > > +        expectedPhase < QEMU_MIGRATION_PHASE_PERFORM_RESUME &&
> > > +        !(flags & VIR_MIGRATE_CHANGE_PROTECTION)) {
> > > +        virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s",
> > > +                       _("resuming failed post-copy migration requires "
> > > +                         "change protection"));
> >
> > Same here.
> >
> > > +        return NULL;
> > > +    }
> > >
> > > [...]
> > > +static char *
> > > +qemuMigrationSrcBeginResume(virQEMUDriver *driver,
> > > +                            virDomainObj *vm,
> > > +                            const char *xmlin,
> > > +                            char **cookieout,
> > > +                            int *cookieoutlen,
> > > +                            unsigned long flags)
> > > +{
> > > +    virDomainJobStatus status;
> > > +
> > > +    if (qemuMigrationAnyRefreshStatus(driver, vm, VIR_ASYNC_JOB_MIGRATION_OUT,
> > > +                                      &status) < 0)
> > > +        return NULL;
> > > +
> > > +    if (status != VIR_DOMAIN_JOB_STATUS_POSTCOPY_PAUSED) {
> > > +        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> > > +                       _("QEMU reports migration is still running"));
> > > +        return NULL;
> > > +    }
> > > +
> > > +    return qemuMigrationSrcBeginXML(driver, vm, xmlin,
> > > +                                    cookieout, cookieoutlen, 0, NULL, 0, flags);
> >
> > Certain steps here around the non-shared-storage migration are redundant
> > at this point and IMO should be skipped.
>
> Which is what happens in qemuMigrationSrcBegin:
>
>     if (flags & VIR_MIGRATE_POSTCOPY_RESUME) {
>         xml = qemuMigrationSrcBeginResumePhase(...);
>         goto cleanup;
>     }
>
>     ...
>
> Or did I miss anything?

In 'qemuMigrationSrcBeginXML', qemuMigrationSrcBeginPhaseBlockDirtyBitmaps
is called, which does a bunch of setup for bitmap migration. That setup is
pointless when recovering, because the storage was already copied by the
time the CPUs started running on the destination.