Background
----------

For QEMU block device jobs, the "ready" boolean field (part of QMP
`query-block-jobs`) was introduced in commit ef6dbf1 (available in QEMU
v2.2.0 or above):

    http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ef6dbf1e4
    -- blockjob: Add "ready" field

    "When a block job signals readiness, this is currently reported
    only through QMP. If qemu wants to use block jobs for internal
    tasks, there needs to be another way to correctly detect when a
    block job may be completed.

    For this reason, introduce a bool "ready" which is set when the
    block job may be completed."

And libvirt was fixed to use the above field in this commit (available
in libvirt v1.2.18 or above):

    http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=eae5924
    -- qemu: Update state of block job to READY only if it actually is
       ready

RFC
---

Currently, libvirt block APIs (and consequently higher-level
applications like Nova which use these APIs) rely on polling for job
completion via virDomainGetBlockJobInfo(), which uses QMP
`query-block-jobs` and waits for QEMU to report "offset" == "len",
which translates to libvirt "cur" == "end". Based on this, libvirt can
take an action (whether to gracefully abort, or pivot to the copy in
case of a COPY job).

Since QEMU reports the "ready": true field (followed by a
BLOCK_JOB_READY QMP event), it would be helpful if libvirt exposed this
via an API, so that upper layers could use it instead of polling.

Problem scenario
----------------

When virDomainBlockRebase() is invoked to start a copy job, ending that
copy operation with virDomainBlockJobAbort() and the flag
VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT can hit a race condition (due to the
way virDomainGetBlockJobInfo() reports the job status) in which the
pivot operation fails.
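
To make the difference between the two checks concrete, here is a
minimal sketch in Python. It is not libvirt or QEMU code: `can_pivot`
is a hypothetical helper name, and the sample job entries are made-up
values. Only the field names "offset", "len" and "ready" come from the
`query-block-jobs` response discussed above. The sketch assumes that a
QEMU too old to report "ready" can still be judged by "offset" == "len".

```python
def can_pivot(job):
    """Decide whether one `query-block-jobs` entry looks safe to pivot.

    `job` is a dict decoded from the QMP JSON response (hypothetical
    helper; not part of libvirt or QEMU).
    """
    if "ready" in job:
        # QEMU v2.2.0 or above reports readiness explicitly, so prefer
        # it over the progress counters.
        return bool(job["ready"])
    # Assumption: without a "ready" field, the counters are all we have.
    return job["offset"] == job["len"]


# Made-up sample entries, for illustration only.
racy_job = {"type": "mirror", "offset": 1024, "len": 1024, "ready": False}
ready_job = {"type": "mirror", "offset": 1024, "len": 1024, "ready": True}
old_qemu_job = {"type": "mirror", "offset": 1024, "len": 1024}

print(can_pivot(racy_job))      # counters match, but not ready yet
print(can_pivot(ready_job))
print(can_pivot(old_qemu_job))
```

A caller that looks only at "offset" == "len" would treat the first
entry as done, which is the case where a pivot request can fail.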
Race condition window
~~~~~~~~~~~~~~~~~~~~~

libvirt finds cur==end AND sends a pivot request, all in the window
before QEMU would have sent the "ready": true field [emitted as part of
the QMP `query-block-jobs` command's response, indicating that the job
can actually be completed]. The pivot request then fails, because it
requires "ready": true.

So Eric Blake suggests:

    QEMU 2.0 or 1.x probably had a synchronous setup where you could
    never observe cur==end on a non-ready job. But I don't remember
    enough history to point to when QEMU switched jobs to be a bit
    more asynchronous.

    Maybe there was no qemu regression - maybe it was BECAUSE of other
    block-job additions in 2.2 that offset==len was no longer
    reliable. I don't know that for sure. But what it DOES sound like
    is that IF qemu reports "ready": false, offset==len is not
    reliable, and libvirt should be taught to fudge that. And
    hopefully, QEMU too old to report "ready:" at all is reliable with
    regards to offset==len, because that's all we have to go by.

For now, I filed this upstream libvirt bug:

    https://bugzilla.redhat.com/show_bug.cgi?id=1382165
    -- virDomainGetBlockJobInfo: Adjust job reporting based on QEMU
       stats & the "ready" field of `query-block-jobs`

However, exposing the "ready" boolean from QMP `query-block-jobs` via a
libvirt API might still be worth considering.

-- 
/kashyap