Re: [PATCH 2/2] qemu: completely rework reference counting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/03/14 06:49, Martin Kletzander wrote:
> There is one problem that causes various errors in the daemon.  When 
> domain is waiting for a job, it is unlocked while waiting on the 
> condition.  However, if that domain is for example transient and 
> being removed in another API (e.g. cancelling incoming migration), it
> get's unref'd.  If the first call, that was waiting, fails to get the
> job, it unref's the domain object, and because it was the last 
> reference, it causes clearing of the whole domain object.  However, 
> when finishing the call, the domain must be unlocked, but there is no
> way for the API to know whether it was cleaned or not (unless there
> is some ugly temporary variable, but let's scratch that).
> 
> The root cause is that our APIs don't ref the objects they are using
>  and all use the implicit reference that the object has when it is in
>  the domain list.  That reference can be removed when the API is 
> waiting for a job.  And because each domain doesn't do its ref'ing, 
> it results in the ugly checking of the return value of 
> virObjectUnref() that we have everywhere.
> 
> This patch changes qemuDomObjFromDomain() to ref the domain (using 
> virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which 
> should be the only function in which the return value of 
> virObjectUnref() is checked.  This makes all reference counting 
> deterministic and makes the code a bit clearer.
> 
> Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> --- 
> src/qemu/THREADS.txt      |  40 ++- src/qemu/qemu_domain.c    |  28 
> +- src/qemu/qemu_domain.h    |  12 +- src/qemu/qemu_driver.c    | 708
> ++++++++++++++++------------------------------ 
> src/qemu/qemu_migration.c | 108 +++---- src/qemu/qemu_migration.h | 
> 10 +- src/qemu/qemu_process.c   | 126 ++++----- 7 files changed, 379
>  insertions(+), 653 deletions(-)
> 

...

> index 08d6b7c..a0582d3 100644 --- a/src/qemu/qemu_process.c +++ 
> b/src/qemu/qemu_process.c

...

> @@ -3756,6 +3736,9 @@ qemuProcessReconnectHelper(virDomainObjPtr
> obj, struct qemuProcessReconnectData *src = opaque; struct 
> qemuProcessReconnectData *data;
> 
> +    virObjectLock(obj); +    virObjectRef(obj); + if (!obj->pid) 
> return 0;
> 

This hunk causes deadlock of the daemon when starting if you have at
least one inactive VM. You lock the object, find out that the pid 
is 0 and return without unlocking.

The daemon then locks up when trying to reload snapshot list for 
the domain as the domain is still left locked:

Thread 2 (Thread 0x7fd6165f1700 (LWP 847791)):
#0  0x00007fd62c16faf4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fd62c16b459 in _L_lock_535 () from /lib64/libpthread.so.0
#2  0x00007fd62c16b280 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fd62f99b453 in virMutexLock (m=0x7fd61c0e4810) at util/virthread.c:88
#4  0x00007fd62f97cdc6 in virObjectLock (anyobj=0x7fd61c0e4800) at util/virobject.c:323
#5  0x00007fd617672af5 in qemuDomainSnapshotLoad (vm=0x7fd61c0e4800, data=0x7fd61c0023f0) at qemu/qemu_driver.c:474
#6  0x00007fd62f9f20b8 in virDomainObjListHelper (payload=0x7fd61c0e4800, name=0x7fd61c0f0de0, opaque=0x7fd6165f0810) at conf/domain_conf.c:20454
#7  0x00007fd62f9559be in virHashForEach (table=0x7fd61c008050, iter=0x7fd62f9f2072 <virDomainObjListHelper>, data=0x7fd6165f0810) at util/virhash.c:521
#8  0x00007fd62f9f213e in virDomainObjListForEach (doms=0x7fd61c03bdd0, callback=0x7fd617672a55 <qemuDomainSnapshotLoad>, opaque=0x7fd61c0023f0) at conf/domain_conf.c:20467
#9  0x00007fd61767445c in qemuStateInitialize (privileged=true, callback=0x7fd630584ab2 <daemonInhibitCallback>, opaque=0x7fd63258dba0) at qemu/qemu_driver.c:877
#10 0x00007fd62fa5f05a in virStateInitialize (privileged=true, callback=0x7fd630584ab2 <daemonInhibitCallback>, opaque=0x7fd63258dba0) at libvirt.c:742
#11 0x00007fd630584b7a in daemonRunStateInit (opaque=0x7fd63258dba0) at libvirtd.c:918
#12 0x00007fd62f99b8b4 in virThreadHelper (data=0x7fd6325b4480) at util/virthread.c:197
#13 0x00007fd62c169023 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fd62bea470d in clone () from /lib64/libc.so.6




> @@ -3780,8 +3763,6 @@ qemuProcessReconnectHelper(virDomainObjPtr
> obj, * this early phase. */
> 
> -    virObjectLock(obj); - qemuDomainObjRestoreJob(obj, 
> &data->oldjob);
> 
> if (qemuDomainObjBeginJob(src->driver, obj, QEMU_JOB_MODIFY) < 0)

Rest of the review will follow once I find a way to solve that issue.

Peter


Attachment: signature.asc
Description: OpenPGP digital signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]