On Thu, Sep 01, 2016 at 14:46:01 +0800, Xiubo Li wrote:
> When doing the snapshot using the script below:
> =========================================
> #!/bin/bash
> virsh blockjob $instance_name vdX --abort
> virsh undefine $instance_name
> qemu-img create -f qcow2 -o backing_file=$diskfile,size=$size $path/$uuid.dlta
> virsh blockcopy --domain $instance_name vdX $path/$uuid.dlta --wait --reuse-external --verbose
> virsh blockjob $instance_name vdX --abort
> qemu-img convert -f qcow2 -O qcow2 $path/$uuid.dlta $path/$uuid.qcow2
> echo "start uploading at $(date)"
> ......
> =========================================
>
> Sometimes you can encounter the following warning and error logs:
> +++++++++++++++++++++++++++++++++++++++++
> 2016-08-26 07:51:05.685+0000: 8916: warning : qemuDomainObjBeginJobInternal:1571 :
> Cannot start job (query, none) for domain instance-00001b3a; current job is (modify,
> none) owned by (8914 remoteDispatchDomainBlockJobAbort, 0 <null>) for (30s, 0s)
>
> 2016-08-26 07:51:05.685+0000: 8916: error : qemuDomainObjBeginJobInternal:1583 :
> Timed out during operation: cannot acquire state change lock (held by
> remoteDispatchDomainBlockJobAbort)
> -----------------------------------------
>
> Mostly it will be okay later. But sometimes the state change lock may be held by
> remoteDispatchDomainBlockJobAbort forever, and then the instance cannot be reached
> through VNC or the network (ping, ssh, ...) except by rebooting the instance.
>
> From the code, we find that after sending the --abort command, there are two
> steps, each condition-waiting for one of two reply signals:
> +++++++++++++++++++++++++++++++++++++++++
> The first condition wait is:
> In qemuMonitorBlockJobCancel() --> qemuMonitorJSONBlockJobCancel() -->
> qemuMonitorJSONCommand() --> qemuMonitorJSONCommandWithFd() -->
> qemuMonitorSend() --> virCondWait(&mon->notify, &mon->parent.lock).
>
> The second condition wait is:
> In virDomainObjWait(vm) --> virCondWait(&vm->cond, &vm->parent.lock).
> -----------------------------------------
>
> The two corresponding replies are:
> +++++++++++++++++++++++++++++++++++++++++
> The "return" reply is:
> QEMU_MONITOR_RECV_REPLY: mon=0x7ff3fc001020 reply={"return": [], "id": "libvirt-338"}
> With the condition wakeup signal:
> virCondBroadcast(&mon->notify)
>
> The "event" reply is:
> QEMU_MONITOR_RECV_EVENT: mon=0x7fe08401a3f0 event={"timestamp": {"seconds": 1472524518,
> "microseconds": 360135}, "event": "BLOCK_JOB_CANCELLED", "data": {"device":
> "drive-virtio-disk0", "len": 42949672960, "offset": 10485760, "speed": 0, "type": "mirror"}}
> With the condition wakeup signal:
> virDomainObjBroadcast(vm) --> virCondBroadcast(&vm->cond)
> -----------------------------------------
>
> But sometimes the qemu daemon will reply in the opposite order:
> +++++++++++++++++++++++++++++++++++++++++
> The "event" reply is:
> QEMU_MONITOR_RECV_EVENT: mon=0x7fe08401a3f0 event={"timestamp": {"seconds": 1472524518,
> "microseconds": 360135}, "event": "BLOCK_JOB_CANCELLED", "data": {"device":
> "drive-virtio-disk0", "len": 42949672960, "offset": 10485760, "speed": 0, "type": "mirror"}}
> With the condition wakeup signal:
> virDomainObjBroadcast(vm) --> virCondBroadcast(&vm->cond)
>
> The "return" reply is:
> QEMU_MONITOR_RECV_REPLY: mon=0x7ff3fc001020 reply={"return": [], "id": "libvirt-338"}
> With the condition wakeup signal:
> virCondBroadcast(&mon->notify)
> -----------------------------------------
>
> In this case, while we are blocked in the first condition wait, the received "event"
> reply and its wakeup signal are lost, since we are still waiting for the "return"
> reply and signal. So when we reach the second condition wait, it waits forever while
> holding the lock.

Nice analysis.

...
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index f9a3b15..6ebaf6b 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -15929,6 +15929,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
>      bool pivot = !!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT);
>      bool async = !!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC);
>      virDomainObjPtr vm;
> +    unsigned long long now;
>      int ret = -1;
>
>      virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC |
> @@ -16018,7 +16019,12 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
>          qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>          qemuBlockJobUpdate(driver, vm, disk);
>          while (diskPriv->blockjob) {
> -            if (virDomainObjWait(vm) < 0) {
> +            if (virTimeMillisNow(&now) < 0) {
> +                ret = -1;
> +                goto endjob;
> +            }
> +
> +            if (virDomainObjWaitUntil(vm, now + BLOCK_JOB_ABORT_TIMEOUT) < 0) {
>                  ret = -1;
>                  goto endjob;
>              }

However, the patch is wrong. We should simply never attempt to wait for the
event if we already got it. In other words, diskPriv->blockjob should already
be false at this point.

Jirka

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list