On Thu, Apr 30, 2015 at 02:38:21PM +0200, Michal Privoznik wrote:
> On 30.04.2015 14:04, Erik Skultety wrote:
> > The lock is always dropped at the end of an API, but when attaching
> > devices, for example, there is no point in holding the NW filter lock
> > if the device being attached isn't a network interface. It is always
> > good practice to drop unnecessary locks as soon as possible.
> > ---
> >  src/qemu/qemu_driver.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > BZ 1181074 reported that the daemon hangs (not completely true, btw) after
> > trying to attach /dev/ttyX to a running domain. When going through the code
> > I realized we acquire a read lock on the NW filter updates even though the
> > device itself is not a network interface, and only release it at the end.
> > The first clue was that this lock, which won't be unlocked because the
> > thread is blocked reading the header of /dev/ttyX, causes the freeze. As it
> > turned out, the problem actually resides in calling virsh list afterwards
> > (Peter already has some patches prepared), but I think the NW filter lock
> > might be released earlier in this case with no harm done.
> >
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 74dcb0a..04887e0 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -8764,6 +8764,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
> >      virDomainDeviceDefPtr dev = NULL, dev_copy = NULL;
> >      int ret = -1;
> >      unsigned int affect, parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE;
> > +    bool nwfilter_locked = false;
> >      virQEMUCapsPtr qemuCaps = NULL;
> >      qemuDomainObjPrivatePtr priv;
> >      virQEMUDriverConfigPtr cfg = NULL;
> > @@ -8773,6 +8774,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
> >                        VIR_DOMAIN_AFFECT_CONFIG, -1);
> >
> >      virNWFilterReadLockFilterUpdates();
> > +    nwfilter_locked = true;
>
> Wait a moment. Whenever a programmer needs to know whether a lock is locked,
> something is wrong with the design. That's why POSIX doesn't define such a
> function. Also, this is a read lock, so it won't hold back other readers,
> only writers. I don't think there's a real bottleneck unless you construct a
> test case that attaches thousands of devices while defining another thousand
> NWFilters.
>
> What I'm more curious about is - why the heck does this lock need to be taken
> at this level? It's very high level. We take the lock even before we know
> whether the XML is valid and what device type we will be working on. Can't
> the lock be moved to lower layers?
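The hunks above only show the lock being taken and tracked; the early-release
half of the idea would presumably sit further down in
qemuDomainAttachDeviceFlags, once the device XML has been parsed. A rough
sketch only (the exact placement and check are assumptions, not the actual
hunk from the series):

    /* Sketch: after the incoming XML has been parsed into 'dev', drop the
     * nwfilter read lock as soon as we know the device is not a network
     * interface, so the rest of the attach path doesn't hold it. */
    if (nwfilter_locked && dev->type != VIR_DOMAIN_DEVICE_NET) {
        virNWFilterUnlockFilterUpdates();
        nwfilter_locked = false;
    }

    ...

 cleanup:
    /* In the cleanup path, only unlock if the lock is still held. */
    if (nwfilter_locked)
        virNWFilterUnlockFilterUpdates();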
We have to be careful of the lock ordering rules - see this note from the
commit where I refactored the nwfilter locking:

commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb
Author: Daniel P. Berrange <berrange@xxxxxxxxxx>
Date:   Wed Jan 22 17:28:29 2014 +0000

    Push nwfilter update locking up to top level

    The NWFilter code has a deadlock race condition between the
    virNWFilter{Define,Undefine} APIs and the starting of guest VMs,
    due to mismatched lock ordering.

    In the virNWFilter{Define,Undefine} codepaths the lock ordering is

      1. nwfilter driver lock
      2. virt driver lock
      3. nwfilter update lock
      4. domain object lock

    In the VM guest startup paths the lock ordering is

      1. virt driver lock
      2. domain object lock
      3. nwfilter update lock

    As can be seen, the domain object and nwfilter update locks are not
    acquired in a consistent order.

    The fix used is to push the nwfilter update lock up to the top
    level, resulting in a lock ordering for virNWFilter{Define,Undefine}
    of

      1. nwfilter driver lock
      2. nwfilter update lock
      3. virt driver lock
      4. domain object lock

    and VM start using

      1. nwfilter update lock
      2. virt driver lock
      3. domain object lock

    This has the effect of serializing VM startup once again, even if no
    nwfilters are applied to the guest. There is also the possibility of
    deadlock due to a call graph loop via virNWFilterInstantiate and
    virNWFilterInstantiateFilterLate.

    These two problems mean the lock must be turned into a read/write
    lock instead of a plain mutex at the same time. The lock is used to
    serialize changes to the "driver->nwfilters" hash, so the write lock
    only needs to be held by the define/undefine methods. All other
    methods can rely on a read lock, which allows good concurrency.

Though some things have changed since then due to the removal of the virt
driver lock in QEMU, we still have to respect the ordering of the nwfilter
update lock vs the domain object lock.

Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
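To make the ordering rule from the quoted commit message concrete, here is a
minimal standalone sketch (plain pthreads with invented names, not libvirt
code) of the convention it describes: the update lock is a read/write lock
that every path takes before the per-domain lock, with the write side reserved
for define/undefine.

#include <pthread.h>
#include <stdio.h>

/* Stand-ins for the real locks (names invented for this sketch):
 *   updates_lock ~ the nwfilter update lock (a read/write lock)
 *   dom_lock     ~ a domain object lock
 * The rule: every code path takes updates_lock BEFORE dom_lock, so the
 * ordering is consistent and the ABBA deadlock cannot occur. */
static pthread_rwlock_t updates_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t dom_lock = PTHREAD_MUTEX_INITIALIZER;

/* VM start / device attach style path: a read lock is enough, so such
 * paths only serialize against define/undefine, not against each other. */
static void *vm_start(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&updates_lock);   /* 1. nwfilter update lock (read) */
    pthread_mutex_lock(&dom_lock);          /* 2. domain object lock */
    puts("vm start: filters applied against a stable filter set");
    pthread_mutex_unlock(&dom_lock);
    pthread_rwlock_unlock(&updates_lock);
    return NULL;
}

/* Define/undefine style path: the only writer of the filter set, so it
 * takes the write side, but still before any domain lock. */
static void *filter_define(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&updates_lock);   /* 1. nwfilter update lock (write) */
    pthread_mutex_lock(&dom_lock);          /* 2. domain object lock */
    puts("define: filter set changed, running domains updated");
    pthread_mutex_unlock(&dom_lock);
    pthread_rwlock_unlock(&updates_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, vm_start, NULL);
    pthread_create(&b, NULL, filter_define, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

Compile with -pthread. If either thread instead took dom_lock first, the two
threads could each end up holding one lock while waiting forever for the
other - exactly the mismatch the commit message describes.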