Historically the nwfilter driver didn't keep track of any instances of filters. When it needed to rebuild filters it called back to the virt drivers to ask them to re-create the filters for their VMs. This lead to some complex locking requirements where the virt driver needed to acquire locks on the nwfilter driver before creating/deleting filter instances. Somewhat recently we introduced the virNWFilterBinding concept which allowed the nwfilter driver to keep track of all filter instances aka bindings. Thus it could rebuild them without talking to the virt drivers. We never updated the locking model when doing this though, so we were still reliant on the virt drivers acquiring locks before creating or deleting virNWFilterBinding objects. When we started using the modular daemons though, the locks acquired by the virt drivers no longer protected the nwfilter driver as they were separate processes. Thus the race condition fixed back in commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb Author: Daniel P. Berrangé <berrange@xxxxxxxxxx> Date: Wed Jan 22 17:28:29 2014 +0000 Push nwfilter update locking up to top level has now re-appeared when running modular daemons. In fact it is possible to trigger it even in libvirtd if you ignore the virt drivers and directly call the virNWFilterBinding APIs, though that won't hit users in practice. The solution is surprisingly simple. Since the nwfilter driver keeps track of virNWFilterBinding objects, we can simply put locking calls around the Create/Delete methods for those objects, and remove the locking from the virt drivers. The only slight difference is that previous the locking serialized the entire VM startup/shutdown sequence, while the new locking only serializes individual NICs. This should not matter in practice, since there's no shared state between filters for NICs on the same VM. This is a minimal fix suitable for the freeze. There is scope for some significant improvements in locking which I'm still working on. In particular we currently have a terrible writer starvation problem with the RWLock that protects the filter create/delete/redefine operations that has existed forever. Repeatedly creating/deleting virNWFilterBinding objects can prevent a virNWFilterDefineXML call from ever running. Daniel P. Berrangé (5): nwfilter: stop using recursive mutex for IP learning nwfilter: hold filter update lock when creating/deleting bindings qemu,lxc: remove use to nwfilter update lock nwfilter: remove decl of virNWFilterCreateVarHashmap nwfilter: make some gentech driver methods static src/lxc/lxc_driver.c | 6 ------ src/nwfilter/nwfilter_driver.c | 5 +++++ src/nwfilter/nwfilter_gentech_driver.c | 4 ++-- src/nwfilter/nwfilter_gentech_driver.h | 8 -------- src/nwfilter/nwfilter_learnipaddr.c | 2 +- src/qemu/qemu_driver.c | 18 ------------------ src/qemu/qemu_migration.c | 3 --- src/qemu/qemu_process.c | 4 ---- 8 files changed, 8 insertions(+), 42 deletions(-) -- 2.35.1