On 03/13/2015 10:25 AM, Peter Krempa wrote: > Surprisingly we did not grab a VM job when a block job finished and we'd > happily rewrite the backing chain data. This made it possible to crash > libvirt when queueing two backing chains tightly and other badness. My fault for violating the rule of 'no VM modifications without a job'. Thanks for finding and fixing this. > > To fix it, add yet another handler to the helper thread that handles > monitor events that require a job. > --- > src/qemu/qemu_domain.h | 2 + > src/qemu/qemu_driver.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++ > src/qemu/qemu_process.c | 129 ++++++++----------------------------------- > 3 files changed, 168 insertions(+), 105 deletions(-) > > +processBlockJobEvent(virQEMUDriverPtr driver, > + virDomainObjPtr vm, > + char *diskAlias, > + int type, > + int status) > +{ > + virObjectEventPtr event = NULL; > + virObjectEventPtr event2 = NULL; > + const char *path; > + virDomainDiskDefPtr disk; > + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); > + virDomainDiskDefPtr persistDisk = NULL; > + bool save = false; > + > + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) > + goto cleanup; > + > + if (!virDomainObjIsActive(vm)) { > + VIR_DEBUG("Domain is not running"); > + goto endjob; > + } Hmm. I suspect we could hit the situation where the pivot after active block commit or block copy happens just before the guest quits, and thus where we fail to update the XML to record that the pivot happened and instead leave the XML in its old state. Probably not the end of the world (at that point, the amount of work done by the guest after pivot is not much, so restarting the guest from the unpivoted disk is hopefully not inconsistent); except that a guest shutting down might make the small modification of marking a file system clean that turns out to have a large impact on how the guest next starts if it starts from the wrong disk. I don't know if we can do any better, and at a minimum this is a strict improvement over what we had before, so I'm not going to reject the patch because of it, but it is food for thought. [Actually, we probably need the notion of persistent bitmaps, which looks like it might hit qemu 2.4, before we can guarantee that we KNOW when a pivot job completed or failed, and thus always update the XML correctly, based on the state of the persistent bitmap file] > > virObjectLock(vm); > - disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias); > > - if (disk) { The old code was locking the VM, but modifying without a job. > + processEvent->eventType = QEMU_PROCESS_EVENT_BLOCK_JOB; > + if (VIR_STRDUP(data, diskAlias) < 0) > + goto error; > + processEvent->data = data; > + processEvent->vm = vm; > + processEvent->action = type; > + processEvent->status = status; > > - case VIR_DOMAIN_BLOCK_JOB_LAST: > - break; > - } > + virObjectRef(vm); > + if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) { > + ignore_value(virObjectUnref(vm)); > + goto error; The new code is now throwing things over the fence to a helper thread, so that blocking while waiting for the job is no longer holding up monitor interactions. Looks correct to me. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list