On 09/12/2018 08:55 AM, Jiri Denemark wrote:
> The only place where VIR_DOMAIN_EVENT_RESUMED is generated is the RESUME

I assume you meant "should be generated"

> event handler to make sure we don't generate duplicate events or state
> changes. In the worse case the duplicity can revert or cover changes
> done by other event handlers.
>
> For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events
> we could happily mark the domain as running and report
> VIR_DOMAIN_EVENT_RESUMED to registered clients.
>

This patch removes the non-QEMU-event resume processing, such as:

 1. User-initiated domain resume
 2. Resume after revert from snapshot
 3. Resume on the migration source after a failure is detected
 4. Resume on the migration destination once finished
 5. Resume after Shutdown/Reboot or Panic when the guest is configured
    to process a fake reboot

deferring completely to the RESUME event QEMU will send as a result of
the CPUs being successfully started.

...

My caveats:

 1. I think that list is correct - I may have mis-worded #5; I forget
    exactly how that code works until I'm debugging it ;-)
 2. By "the CPUs being successfully started" I assume we really mean
    the "cont" command succeeding.

> https://bugzilla.redhat.com/show_bug.cgi?id=1612943
>
> Signed-off-by: Jiri Denemark <jdenemar@xxxxxxxxxx>
> ---
>  src/qemu/qemu_driver.c    | 13 -------------
>  src/qemu/qemu_migration.c | 36 ++++++------------------------------
>  src/qemu/qemu_process.c   | 10 ++++------
>  3 files changed, 10 insertions(+), 49 deletions(-)
>

What I typed above could be "massaged" into the commit message - so it's
"easier" at a glance to understand what's going on. It's of course my
wording and my way of making sure I've properly understood what's
happening without being able to stop by your cube and ask directly ;-).

[...]

> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index 825a9d399b..4771f26938 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c

[...]

>
> @@ -5074,13 +5060,8 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver,
>              goto endjob;
>          }
>
> -        if (inPostCopy) {
> +        if (inPostCopy)
>              doKill = false;
> -            event = virDomainEventLifecycleNewFromObj(vm,
> -                                                      VIR_DOMAIN_EVENT_RESUMED,
> -                                                      VIR_DOMAIN_EVENT_RESUMED_POSTCOPY);
> -            virObjectEventStateQueue(driver->domainEventState, event);
> -        }
>      }
>
>      if (mig->jobInfo) {
> @@ -5111,11 +5092,6 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver,
>
>      dom = virGetDomain(dconn, vm->def->name, vm->def->uuid, vm->def->id);
>
> -    event = virDomainEventLifecycleNewFromObj(vm,
> -                                              VIR_DOMAIN_EVENT_RESUMED,
> -                                              VIR_DOMAIN_EVENT_RESUMED_MIGRATED);
> -    virObjectEventStateQueue(driver->domainEventState, event);
> -
>      if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) {
>          virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER);
>          event = virDomainEventLifecycleNewFromObj(vm,

For this path, 2 events were generated if @inPostCopy == true - one
right after the qemuProcessStartCPUs for RESUMED_POSTCOPY and then one
for SUSPENDED_PAUSED once qemuMigrationDstWaitForCompletion occurred.

IIUC, the changes here when @inPostCopy == true would only generate the
POSTCOPY event, but not the MIGRATED event. Or is there something
subtle I'm missing?
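To make sure we're picturing the same model, here's roughly how I
imagine the detail gets chosen now. This is entirely my own sketch -
the function name and the exact mapping are my guesses, not code
lifted from this series:

    /* Illustrative only, not from the patch: one QEMU RESUME event
     * yields exactly one RESUMED lifecycle event, whose detail is
     * derived from the reason the CPUs were started with. */
    static int
    resumedDetailForReason(virDomainRunningReason reason)
    {
        switch (reason) {
        case VIR_DOMAIN_RUNNING_MIGRATED:
            return VIR_DOMAIN_EVENT_RESUMED_MIGRATED;
        case VIR_DOMAIN_RUNNING_POSTCOPY:
            return VIR_DOMAIN_EVENT_RESUMED_POSTCOPY;
        default:
            return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
        }
    }

If that's the shape of it, the RESUME handler would queue a single
virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED,
resumedDetailForReason(reason)), so one "cont" on the destination can
only ever carry one detail.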
Will the post-copy processing generate two resume events? If so, we
perhaps should document that.

If not, then perhaps this second event that's being deleted needs to be
moved inside that inPostCopy condition just above, with a note that
this one case is "abnormal" (or you could just do the *StartCPUs again
with the RESUMED_MIGRATED and close your eyes ;-)).

Everything else is fine - it's just this one case.

John

[...]

Curious, for the future: would it be useful to change the FakeReboot
path to generate a different detail reason? It's hard to distinguish
someone using virsh resume (or recovery from a Save, Dump, or Snapshot
failure) from that fake reboot path - at least from the event alone. I
guess VIR_DOMAIN_EVENT_RESUMED_UNPAUSED just seems too generic now that
it covers more than a "normal resume due to an admin unpause", but I
would think that's extra work given the "scope" of this particular
problem.

Second curiosity: would it be useful/easy to generate an event on the
failure scenario for StartCPUs? Knowing from whence we were called, we
should be able to generate a suspended event there rather than having
the callers decide - something along the lines of the sketch below.
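Purely hypothetical sketch of what I mean - the helper name, where it
would live, and the detail value are all mine, not anything in the
tree today:

    /* Hypothetical: if the CPU-start failure path queued the event
     * itself, individual callers would not each have to remember to.
     * The SUSPENDED detail used here is only an example. */
    static void
    exampleQueueSuspendedOnStartFailure(virQEMUDriverPtr driver,
                                        virDomainObjPtr vm)
    {
        virObjectEventPtr event;

        event = virDomainEventLifecycleNewFromObj(vm,
                                                  VIR_DOMAIN_EVENT_SUSPENDED,
                                                  VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR);
        virObjectEventStateQueue(driver->domainEventState, event);
    }

That is, whatever wraps "cont" could call something like that on its
own error path instead of leaving it to each caller.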