Re: [PATCH 1/1] qemu: sync blockjob finishing in qemuDomainGetBlockJobInfo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 27, 2019 at 17:19:18 +0300, Nikolay Shirokovskiy wrote:
> Due to race qemuDomainGetBlockJobInfo can return there is
> no block job for disk but later call to spawn new blockjob
> can fail because libvirt internally still not process blockjob
> finishing. Thus let's wait for blockjob finishing if we
> report there is no more blockjob.

Could you please elaborate how this happened? e.g. provide some logs?

Polling with qemuDomainGetBlockJobInfo will always be racy and if the
problem is that diskPriv->blockjob is allocated but qemu didn't report
anything I suspect the problem is somewhere else.


> 
> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx>
> ---
>  src/qemu/qemu_driver.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 669c12d6ca..b148df3a57 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -17785,12 +17785,26 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
>          goto endjob;
>      }
>  
> +    qemuBlockJobSyncBegin(job);
> +
>      qemuDomainObjEnterMonitor(driver, vm);
>      ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name, &rawInfo);
>      if (qemuDomainObjExitMonitor(driver, vm) < 0)
>          ret = -1;
> -    if (ret <= 0)
> +    if (ret < 0)
> +        goto endjob;
> +
> +    if (ret == 0) {

So if qemu returns that there is no blockjob ...

> +        qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
> +        while (qemuBlockJobIsRunning(job)) {

but we think it's still running it's either that the event arrived but
wasn't processed yet. In that case this would help, but the outcome
would not differ much from the scenario if the code isn't here as the
event would be processed later.


> +            if (virDomainObjWait(vm) < 0) {
> +                ret = -1;
> +                goto endjob;
> +            }
> +            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);

If there's a bug and qemu doesn't know that there's a job but libvirt
thinks there's one but qemu is right, this will lock up here as the
event will never arrive.

> +        }
>          goto endjob;

My suggested solution would be to fake the job from the data in 'job'
rather than attempt to wait in this case e.g.

if (ret == 0) {
    rawStats.type = job->type;
    rawStats.ready = 0;
}

and then make it to qemuBlockJobInfoTranslate which sets some fake
progress data so that the job looks active.

That way you get a response that there's a job without the potential to
deadlock.

Attachment: signature.asc
Description: PGP signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux