Re: [PATCH 2/3] qemu: event: Don't fiddle with disk backing trees without a job

Eric Blake <eblake@xxxxxxxxxx> · Fri, 13 Mar 2015 13:31:58 -0600

On 03/13/2015 10:25 AM, Peter Krempa wrote:
> Surprisingly we did not grab a VM job when a block job finished and we'd
> happily rewrite the backing chain data. This made it possible to crash
> libvirt when queueing two backing chains tightly and other badness.

My fault for violating the rule of 'no VM modifications without a job'.
 Thanks for finding and fixing this.

> 
> To fix it, add yet another handler to the helper thread that handles
> monitor events that require a job.
> ---
>  src/qemu/qemu_domain.h  |   2 +
>  src/qemu/qemu_driver.c  | 142 ++++++++++++++++++++++++++++++++++++++++++++++++
>  src/qemu/qemu_process.c | 129 ++++++++-----------------------------------
>  3 files changed, 168 insertions(+), 105 deletions(-)
> 

> +processBlockJobEvent(virQEMUDriverPtr driver,
> +                     virDomainObjPtr vm,
> +                     char *diskAlias,
> +                     int type,
> +                     int status)
> +{
> +    virObjectEventPtr event = NULL;
> +    virObjectEventPtr event2 = NULL;
> +    const char *path;
> +    virDomainDiskDefPtr disk;
> +    virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
> +    virDomainDiskDefPtr persistDisk = NULL;
> +    bool save = false;
> +
> +    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
> +        goto cleanup;
> +
> +    if (!virDomainObjIsActive(vm)) {
> +        VIR_DEBUG("Domain is not running");
> +        goto endjob;
> +    }

Hmm. I suspect we could hit the situation where the pivot after active
block commit or block copy happens just before the guest quits, and thus
where we fail to update the XML to record that the pivot happened and
instead leave the XML in its old state.  Probably not the end of the
world (at that point, the amount of work done by the guest after pivot
is not much, so restarting the guest from the unpivoted disk is
hopefully not inconsistent); except that a guest shutting down might
make the small modification of marking a file system clean that turns
out to have a large impact on how the guest next starts if it starts
from the wrong disk.  I don't know if we can do any better, and at a
minimum this is a strict improvement over what we had before, so I'm not
going to reject the patch because of it, but it is food for thought.

[Actually, we probably need the notion of persistent bitmaps, which
looks like it might hit qemu 2.4, before we can guarantee that we KNOW
when a pivot job completed or failed, and thus always update the XML
correctly, based on the state of the persistent bitmap file]

> 
>      virObjectLock(vm);
> -    disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias);
> 
> -    if (disk) {

The old code was locking the VM, but modifying without a job.

> +    processEvent->eventType = QEMU_PROCESS_EVENT_BLOCK_JOB;
> +    if (VIR_STRDUP(data, diskAlias) < 0)
> +        goto error;
> +    processEvent->data = data;
> +    processEvent->vm = vm;
> +    processEvent->action = type;
> +    processEvent->status = status;
> 
> -        case VIR_DOMAIN_BLOCK_JOB_LAST:
> -            break;
> -        }
> +    virObjectRef(vm);
> +    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> +        ignore_value(virObjectUnref(vm));
> +        goto error;

The new code is now throwing things over the fence to a helper thread,
so that blocking while waiting for the job is no longer holding up
monitor interactions.  Looks correct to me.

ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment:
signature.asc

Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list