We got a new BZ filed about a libvirtd crash in shutdown https://bugzilla.redhat.com/show_bug.cgi?id=1828207 We can see from the stack trace that the "interface" driver is in the middle of servicing an RPC call for virConnectListAllInterfaces() Meanwhile the libvirtd daemon is doing virObjectUnref(dmn) on the virNetDaemonPtr object. The fact that it is doing this unref, means that it must have already call virStateCleanup(), given the code sequence: /* Run event loop. */ virNetDaemonRun(dmn); ret = 0; virHookCall(VIR_HOOK_DRIVER_DAEMON, "-", VIR_HOOK_DAEMON_OP_SHUTDOWN, 0, "shutdown", NULL, NULL); cleanup: /* Keep cleanup order in inverse order of startup */ virNetDaemonClose(dmn); virNetlinkEventServiceStopAll(); if (driversInitialized) { /* NB: Possible issue with timing window between driversInitialized * setting if virNetlinkEventServerStart fails */ driversInitialized = false; virStateCleanup(); } virObjectUnref(adminProgram); virObjectUnref(srvAdm); virObjectUnref(qemuProgram); virObjectUnref(lxcProgram); virObjectUnref(remoteProgram); virObjectUnref(srv); virObjectUnref(dmn); Unless I'm missing something non-obvious, this cleanup code path is inherantly broken & racy. When virNetDaemonRun() returns the RPC worker threads are all still active. They are all liable to still be executing RPC calls, which means any of the drivers may be in use. So calling virStateCleanup() is an inherantly dangerous thing to do. There is the further complication that once we have exitted the main loop we may prevent the RPC calls from ever completing, as they may be waiting on an event to be dispatched. I know we're had various patch proposals in the past to improve the robustness of shutdown cleanup but I can't remember the outcome of the reviews. Hopefully people involved in those threads can jump in here... IMHO the key problem here is the virNetDeamonRun() method which just looks at the "quit" flag and immediately returns if it is set. This needs to be changed so that when it sees quit == true, it takes the following actions 1. Call virNetDaemonClose() to drop all RPC clients and thus prevent new RPC calls arriving 2. Flush any RPC calls which are queued but not yet assigned to a worker thread 3. Tell worker threads to exit after finishing their current job 4. Wait for all worker threads to exit 5. Now virNetDaemonRun may return At this point we can call virStateCleanup and the various other things, as we know no drivers are still active in RPC calls. Having said that, there could be background threads in the the drivers which are doing work that uses the event loop thread. So we probably need a virStateClose() method that we call from virNetDaemonRun, *after* all worker threads are gone, which would cleanup any background threads while the event loop is still running. The issue is that step 4 above ("Wait for all worker threads to exit") may take too long, or indeed never complete. To deal with this, it will need a timeout. In the remote_daemon.c cleanup code path, if there are still worker threads present, then we need to skip all cleanup and simply call _exit(0) to terminate the process with no attempt at cleanup, since it would be unsafe to try anything else. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|