Re: [PATCHv4] qemu: allow blkstat/blkinfo calls during migration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/20/2011 03:11 PM, Paolo Bonzini wrote:
> On 05/16/2011 07:30 PM, Eric Blake wrote:
>> +        while (priv->jobSignals&  QEMU_JOB_SIGNAL_BLKSTAT)
>> +            ignore_value(virCondWait(&priv->signalCond,&vm->lock));
>> +
>> +        priv->jobSignalsData.statDevName = disk->info.alias;
>> +        priv->jobSignalsData.blockStat = stats;
>> +        priv->jobSignalsData.statRetCode = -1;
>> +        priv->jobSignals |= QEMU_JOB_SIGNAL_BLKSTAT;
>> +
>> +        while (priv->jobSignals&  QEMU_JOB_SIGNAL_BLKSTAT)
>> +            ignore_value(virCondWait(&priv->signalCond,&vm->lock));
> 
> I'm not sure that the original problem with jobSignals is fixed (the 
> one that you solved by splitting the retcode field).  A similar race 
> can happen with two threads, both requesting blkstat.
> 
> Thread 1 starts a migration.
> Thread 2 queries BlockStats, waits for the condition variable and
>     priv->jobSignals to be 0.
> Thread 2 sets priv->jobSignals to non-zero, then waits for condition
>     variable and priv->jobSignals to clear BLKSTATS.
> Thread 1 processes thread 2's query, and clears priv->jobSignals
> Thread 3 sees priv->jobSignals == 0 and queries BlockStats.
> Thread 1 processes thread 3's query, and clears priv->jobSignals
> Thread 2 gets condition lock, but sees the answer intended for
> thread 3.

I think we're safe.  It shouldn't matter if thread 2 reads thread 3's
answer (because they are both read-only queries, and should be getting
the same answer; or even if the answers differ, reading the newer answer
is not horrible because it is still an accurate answer).  The harder
problem is if thread 3 starts overwriting the result area to NULL before
setting the jobSignals request, and before thread 2 can read the result
area; but we aren't doing a memset (that is, the query code is only ever
reading the result data, and it is only ever being modified while
jobSignals is set).

However,

> 
> I think the code should use two condition variables.  signalCond is 
> broadcast when you _can start_ a job, resultCond is broadcast when a 
> job has finished.

you probably have a point here.  Using two variables will make it much
easier to prove correctness, especially if we add other interfaces.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]