A pretty nasty deadlock occurs while trying to rename a VM in parallel with virDomainObjListNumOfDomains. The short description of the problem is as follows: Thread #1: qemuDomainRename: ------> aquires domain lock by qemuDomObjFromDomain ---------> waits for domain list lock in any of the listed functions: - virDomainObjListFindByName - virDomainObjListRenameAddNew - virDomainObjListRenameRemove Thread #2: virDomainObjListNumOfDomains: ------> aquires domain list lock ---------> waits for domain lock in virDomainObjListCount Consider another solution for the deadlock besides those proposed by Maxim and Michael. Simply unlock before calling any function that internally grabs domains list lock. First it looks correct. We use qemuDomainObjBeginJob/qemuDomainObjEndJob to wrap the operation so no one will run another operation on that domain in parallel. In THREADS.txt this unlocking/locking is described explicitly for the cases when we need to wait/sleep. Here we use same means just to different purpose. Also called functions do not use domain internals in any way. Second I find approach proposed by Michael difficult to implement correctly. Imagine we write such a generic rename function with callbacks. Then it would look like this: lock domains list ... add new name to list unlock domain list call callback to change driver state lock domain list remove old or new name (depends on callback result) unlock domain list We need to unlock domain list or we block operations on all domains while we wait for job conditions in the callback. But now we are in potential trouble as we leaving list in an inconsistent state. We can't move remove operation before calling callback or we will be in trouble on callback failures - we would need to add the old name back and this is not always possible. Current rename implementation leaves config with new name on disk in worst case on rollback but does not break the list. --- src/qemu/qemu_driver.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index abcdbe6..26a8fb0 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19987,6 +19987,7 @@ static int qemuDomainRename(virDomainPtr dom, virObjectEventPtr event_new = NULL; virObjectEventPtr event_old = NULL; int ret = -1; + int rc; char *new_dom_name = NULL; char *old_dom_name = NULL; char *old_dom_cfg_file = NULL; @@ -20036,10 +20037,11 @@ static int qemuDomainRename(virDomainPtr dom, /* * This is a rather racy check, but still better than reporting - * internal error. And since new_name != name here, there's no - * deadlock imminent. + * internal error. */ + virObjectUnlock(vm); tmp_dom = virDomainObjListFindByName(driver->domains, new_name); + virObjectLock(vm); if (tmp_dom) { virObjectUnlock(tmp_dom); virObjectUnref(tmp_dom); @@ -20057,7 +20059,10 @@ static int qemuDomainRename(virDomainPtr dom, goto endjob; } - if (virDomainObjListRenameAddNew(driver->domains, vm, new_name) < 0) + virObjectUnlock(vm); + rc = virDomainObjListRenameAddNew(driver->domains, vm, new_name); + virObjectLock(vm); + if (rc < 0) goto endjob; event_old = virDomainEventLifecycleNewFromObj(vm, @@ -20081,7 +20086,9 @@ static int qemuDomainRename(virDomainPtr dom, } /* Remove old domain name from table. */ + virObjectUnlock(vm); virDomainObjListRenameRemove(driver->domains, old_dom_name); + virObjectLock(vm); event_new = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_DEFINED, @@ -20110,7 +20117,9 @@ static int qemuDomainRename(virDomainPtr dom, old_dom_name = NULL; } + virObjectUnlock(vm); virDomainObjListRenameRemove(driver->domains, new_name); + virObjectLock(vm); goto endjob; } -- 1.8.3.1 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list