Although qemuDomainRevertToSnapshot() correctly begins a job at the start, the job is implicitly ended if qemuProcessStop() is called because it lives in the QEMU driver's private data that is purged during qemuProcessStop(). This commit restores the job by calling qemuProcessBeginJob() after qemuProcessStop() invocations. The first chunk is meant to fix a reported bug (see below). The bug's reproducibility on my machine was initially way lower than the reported 50% (more like 5%) but could be boosted to pretty much 100% by having virt-manager connected, displaying the testing domain in viewer. With the fix, the bug cannot be reproduced on my machine under any scenario I could think of. The actual crash was due to the thread performing revert which, not acquiring a job properly, was able to change the QEMU monitor structure under the thread performing snapshot delete while it was waiting on a condition variable. An interesting question is whether this commit takes the right approach to the fix. In particular, qemuProcessStop() arguably should not clear jobs, in which case the right thing to do would be to fix qemuProcessStop(). However, qemuProcessStop() has about twenty callers, and while none of them seems vulnerable to the kind of problems caused by clearing jobs (unlike qemuDomainRevertToSnapshot() who starts QEMU again right after stopping it), some of them might rely on jobs being cleared (this is not always easy to check conclusively). All in all, this commit's solution seems to be the least bad fix which doesn't require a potentially risky refactor. https://bugzilla.redhat.com/show_bug.cgi?id=1610207 Signed-off-by: Pavel Mores <pmores@xxxxxxxxxx> --- src/qemu/qemu_driver.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 18bd0101e7..b3769cc479 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16637,9 +16637,15 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, VIR_DOMAIN_EVENT_STOPPED, detail); virObjectEventStateQueue(driver->domainEventState, event); - /* Start after stop won't be an async start job, so - * reset to none */ - jobType = QEMU_ASYNC_JOB_NONE; + /* We have to begin a new job as the original one (begun + * near the top of this function) was lost during the purge + * of private data in qemuProcessStop() above. + */ + if (qemuProcessBeginJob(driver, vm, + VIR_DOMAIN_JOB_OPERATION_SNAPSHOT_REVERT, + flags) < 0) { + goto cleanup; + } goto load; } } @@ -16774,6 +16780,11 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED, detail); + if (qemuProcessBeginJob(driver, vm, + VIR_DOMAIN_JOB_OPERATION_SNAPSHOT_REVERT, + flags) < 0) { + goto cleanup; + } } if (qemuDomainSnapshotRevertInactive(driver, vm, snap) < 0) { -- 2.21.0 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list