Some management applications still poll for block job completion: after the job
completes, success or failure is inferred by inspecting the domain XML. With
legacy block job processing this does not always work, due to the way libvirt
processes events. If no other thread is waiting for the block job event, event
processing is offloaded to a worker thread. If virDomainGetBlockJobInfo is
called at this point, the API returns 0 because the block job has already been
dismissed in the legacy scheme, but the backing chain is not yet updated, as
the event is still pending in the worker thread. The management application
then checks the backing chain right after the API call returns and detects an
error. This happens quite often under load, presumably because there is only
one worker thread for all the domains.

Event delivery in qemu is synchronous, and the block job completed event is
sent in the job's finalize step, so if the block job is absent the event has
already been delivered and we just need to process it.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx>
---
 src/qemu/qemu_driver.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 05917eb..25f66df 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -14740,8 +14740,15 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
     ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name, &rawInfo);
     if (qemuDomainObjExitMonitor(driver, vm) < 0)
         ret = -1;
-    if (ret <= 0)
+    if (ret < 0)
+        goto endjob;
+    if (ret == 0) {
+        qemuDomainObjPrivatePtr priv = vm->privateData;
+
+        if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV))
+            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
         goto endjob;
+    }

     if (qemuBlockJobInfoTranslate(&rawInfo, info, disk,
                                   flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) {
-- 
1.8.3.1