On Thu, May 12, 2022 at 13:50:08 +0200, Peter Krempa wrote: > On Tue, May 10, 2022 at 17:21:11 +0200, Jiri Denemark wrote: > > This phase marks a migration protocol as broken in a post-copy phase. > > Libvirt is no longer actively watching the migration in this phase as > > the migration API that started the migration failed. > > > > This may either happen when post-copy migration really fails (QEMU > > enters postcopy-paused migration state) or when the migration still > > progresses between both QEMU processes, but libvirt lost control of it > > because the connection between libvirt daemons (in p2p migration) or a > > daemon and client (non-p2p migration) was closed. For example, when one > > of the daemons was restarted. > > > > Signed-off-by: Jiri Denemark <jdenemar@xxxxxxxxxx> > > --- > > src/qemu/qemu_migration.c | 15 +++++++++++---- > > src/qemu/qemu_process.c | 16 +++++++++++++--- > > 2 files changed, 24 insertions(+), 7 deletions(-) > > > > [...] > > > > @@ -6327,9 +6334,9 @@ qemuMigrationProcessUnattended(virQEMUDriver *driver, > > vm->def->name); > > > > if (job == VIR_ASYNC_JOB_MIGRATION_IN) > > - phase = QEMU_MIGRATION_PHASE_FINISH3; > > + phase = QEMU_MIGRATION_PHASE_FINISH_RESUME; > > else > > - phase = QEMU_MIGRATION_PHASE_CONFIRM3; > > + phase = QEMU_MIGRATION_PHASE_CONFIRM_RESUME; > > > > if (qemuMigrationJobStartPhase(vm, phase) < 0) > > return; > > This hunk seems to be misplaced or at least doesn't really seem to be > related to anything this patch is claiming to do. Thanks to changes in this patch migration is in QEMU_MIGRATION_PHASE_POSTCOPY_FAILED phase when we get here so to make it all work, we need to use RESUME phases > QEMU_MIGRATION_PHASE_POSTCOPY_FAILED. Otherwise qemuMigrationCheckPhase would complain. Perhaps this could be moved to "Add new migration phases for post-copy recovery", but I haven't checked for sure. > > [..] > > > @@ -3751,9 +3752,18 @@ qemuProcessRecoverMigration(virQEMUDriver *driver, > > return -1; > > > > if (rc > 0) { > > + job->phase = QEMU_MIGRATION_PHASE_POSTCOPY_FAILED; > > + > > if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) { > > VIR_DEBUG("Post-copy migration of domain %s still running, it " > > "will be handled as unattended", vm->def->name); > > + > > + if (state == VIR_DOMAIN_RUNNING) > > + reason = VIR_DOMAIN_RUNNING_POSTCOPY; > > + else > > + reason = VIR_DOMAIN_PAUSED_POSTCOPY; > > This bit also doesn't seem to be justified by what this patch is > supposed to do. Until now broken migration protocol in post-copy phase has been indicated with VIR_DOMAIN_*_POSTCOPY_FAILED states, which changed in this patch and we can use the right state here depending on the QEMU state. That said, it could be moved into a separate patch. Jirka