On Thu, Oct 06, 2016 at 01:34:59PM +0200, Peter Krempa wrote: > On Thu, Oct 06, 2016 at 12:26:50 +0200, Kashyap Chamarthy wrote: [...] > > And, libvirt was fixed to use the above field in this commit (available > > in libvirt v1.2.18 or above): > > > > http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=eae5924 -- qemu: > > Update state of block job to READY only if it actually is ready [1] > > > > RFC > > --- > > > > Currently libvirt block APIs (& consequently higher-level applications > > like Nova which use these APIs) rely on polling for job completion via > > libvirt is _not_ polling the data. Libvirt relies on the events from > qemu which are also exposed as libvirt events. Err, you're right. I know you mean the below enum: enum virConnectDomainEventBlockJobStatus { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0 VIR_DOMAIN_BLOCK_JOB_FAILED = 1 VIR_DOMAIN_BLOCK_JOB_CANCELED = 2 VIR_DOMAIN_BLOCK_JOB_READY = 3 VIR_DOMAIN_BLOCK_JOB_LAST = 4 } Which are used internally in the event processing code in qemuBlockJobEventProcess() [libvirt/src/qemu/qemu_blockjob.c] that is used to update the disk mirroring state based on events from QEMU. > > virDomainGetBlockJobInfo(), which uses QMP `query-block-jobs`, and > > waits for QEMU to report "offset" == "len", which translates to > > libvirt "cur" == "end". Based on this, libvirt can take an action > > (whether to gracefully abort, or pivot to the copy in case of a COPY > > job). > > Again false. Internally we honor the ready state Oops, I do realize that libvirt honors seen the 'ready' boolean -- I even mentioned the libvirt commit message (eae5924) from you in the beginning, which handles it. > and applications like > virsh have code in place that waits for the async events and act after > receiving them. Yep, I remember Eric mentioning the other day that `virsh` is setup to capture libvirt events. So I briefly went through the virshBlockJobWait() in tools/virsh-domain.c where you see that: [...] /* Fallback behaviour is only needed if one or both callbacks could not * be registered */ if (data->cb_id < 0 || data->cb_id2 < 0) { /* If the block job vanishes, synthesize a COMPLETED event */ if (result == 0) { ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED; break; } /* If the block job hits 100%, wait a little while for a possible * event from libvirt unless both callbacks could not be registered * in order to synthesize our own READY event */ if (info.end == info.cur && ((data->cb_id < 0 && data->cb_id2 < 0) || --retries == 0)) { ret = VIR_DOMAIN_BLOCK_JOB_READY; break; } } [...] > > Since QEMU reports the "ready": true field (followed by a > > BLOCK_JOB_READY QMP event). It would be helpful if libvirt expose this > > via an API, so upper layers could instead use that, rather than polling. > > We expose the state of the copy job in the XML and forward the READY > event from qemu to the users. Yep, I see that in the QMP traffic: ----- 2016-10-04 15:54:52.333+0000: 30550: info : qemuMonitorIOProcess:423 : QEMU_MONITOR_IO_PROCESS: mon=0x7fa43000ee60 buf={"timestamp": {"seconds": 1475596492, "microseconds": 333414}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk1", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}} ----- > A running copy job exposes itself in the xml as: > > <disk type='file' device='cdrom'> > <driver name='qemu' type='raw'/> > <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/> > <backingStore/> > <mirror type='file' file='/tmp/ble.img' format='raw' job='copy'> > <format type='raw'/> > <source file='/tmp/ble.img'/> > </mirror> > [...] > </disk> > > While the ready copy job is exposed as: > > <disk type='file' device='cdrom'> > <driver name='qemu' type='raw'/> > <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/> > <backingStore/> > <mirror type='file' file='/tmp/ble.img' format='raw' job='copy' ready='yes'> > <format type='raw'/> > <source file='/tmp/ble.img'/> > </mirror> > [...] > </disk> Thanks for the XML snippets. > Additionally we have anyncrhronous events that are emitted once qemu > notifies us that the block job has reached sync state or finished. > Libvirt uses the event to switch to the ready state. > > The documentation suggests that block jobs should listen to the events > and act accordingly only after receiving the event. > > > > Problem scenario > > ---------------- > > > > When virDomainBlockRebase() is invoked to start a copy job, then > > aborting the said copy operation with virDomainBlockJobAbort() + flag > > VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT can result in a potential race > > condition (due to the way the virDomainGetBlockJobInfo() reports the job > > status) where the pivot operation fails. > > Again, this should trigger the block job event and that should be used > as the marker to perform the pivot operation. > > $ virsh event --all --loop > event 'block-job-2' for domain hp: Block Copy for hda ready <- after the sync phase is reached > event 'block-job-2' for domain hp: Block Copy for hda completed <- after successfully pivoting > > Applications should act only after receiving such event. Okay, Eric also has suggested > > > Race condition window > > ~~~~~~~~~~~~~~~~~~~~~ > > > > libvirt finds cur==end AND sends a pivot request, all in the window > > before QEMU would have sent "ready": true field [emitted as part of the > > This is not true. Libvirt checks that the mirror is actually ready. It's > done by the commit you've mentioned above. Right, because I've independently tested this with libvirt and have seen the correct behavior. I meant to write: s/libvirt finds/Nova libvirt driver finds/. > > QMP `query-block-jobs` command's response, indicating that the job has > > actually completed], however the pivot request fails because it requires > > "ready": true. > > The problem is that you are polling the block job info which correctly > reports that all data was properly copied and you are inferring the > block job state from that data. Indeed. We should listen to block job events. > I'm against deliberately reporting false data in the block info > structure. Yes, you're not alone in that, Nova developer (Matt Booth, CCed) also expressed a similar concern, albeit not as strongly as you. > The application should register handlers for the block job events and > act only if it receives such event. Additionally you may want to check > that the state is correct in the XML. The current block job information > structure can't be extended unfortunately. Okay, we sort of came to the similar conclusion (in a discussion on the upstream Nova IRC channel) to rely on events, rather than polling for cur/end values. > Please look at the virsh handling of the block jobs for inspiration > since we have example code doing the right thing here. Yes, looking already at them. Thanks! -- /kashyap -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list