On Sat, Dec 09, 2017 at 12:29:14PM -0500, John Ferlan wrote:
> If the timing is "just right", there is a possibility that the
> udev nodeStateInitialize conflicts with another systemd thread
> running an lspci command leaving both waiting for "something",
> but resulting in a hung libvirtd (and hung lspci thread) from
> which the only recovery is a reboot because killing either thread
> is impossible and results in a defunct libvirtd process if a
> SIGKILL is performed.
>
> In order to avoid this let's move where the PCI initialization
> is done to be where it's actually needed. Ensure we only perform
> the initialization once via a driver bool. Likewise, during
> cleanup ensure we only call udevPCITranslateDeinit once the
> initialization is successful.
>
> At least a failure for this driver won't hang out the rest of
> the libvirt event loop. May not make certain things usable though.
> Still a libvirtd restart is far easier than a host reboot.

Is there a BZ for this, or can you at least share what steps are
necessary to have a chance of hitting this issue? I'm asking because it
sounds like we should file a BZ against udev as well (possibly the
kernel), and a thorough investigation of where the deadlock happens is
necessary, because I don't see any guarantee that simply moving the
logic (and adding a trigger condition) will make a race outside of our
scope disappear for good.

On the other hand, having to choose between a hung process requiring a
host restart and a hung worker thread requiring a service restart, I'd
obviously opt for the latter. So I'd say the next steps depend on how
frequently and under what circumstances (specific host devices, kernel
version, etc.) this happens, because to me it sounds odd that systemd
and libpciaccess would clash here.

Erik

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list