As far as code goes: Reviewed-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx> About the design I have a question about the timeout. Patch 5/10 is setting a 15 second timeout. How did you reach this value? Reading the bug, specially this comment from Daniel: https://bugzilla.redhat.com/show_bug.cgi?id=1828207#c6 He mentions "give it 5 seconds of running before shutting it down". 5 seconds before shutdown is something that most users can be slightly annoyed but in the end don't mind that much, but 15 seconds is something that will cause bugs to be opened because "Libvirt is taking too long to shutdown". Besides, it's a fair assumption that a transaction that takes more than 5 or so seconds to finish is already compromised* - might as well shutdown the daemon and deal with the errors. * assuming user discretion to avoid shutting down the daemon in the middle of a long duration transaction, of course. Thanks, DHB On 7/14/20 6:32 AM, Nikolay Shirokovskiy wrote:
This series follows [1] but address the issue slightly differently. Instead of polling for RPC thread pool termination it waits for thread pool drain in distinct thread and then signal the main loop to exit. The series introduces new driver's methods stateShutdown/stateShutdownWait to finish all driver's background threads. The methods however are only implemented for qemu driver and only partially. There are other drivers that have background threads and I don't check every one of them in terms of how they manage their background threads. For example node driver creates 2 threads. One of them is supposed to live a for a short amount of time and thus not tracked. This thread can cause issues on shutdown. The second thread is tracked and finished synchronously on driver cleanup. So this thread can not cause crashes but can cause hangs theoretically speaking so we may want to move the thread finishing to stateShutdownWait method so that possible hang will be handled by shutdown timeout. The qemu driver also has untracked threads and they can cause crashes on shutdown. For example reconnect threads or reboot thread. These need to be tracked. I'm going to address these issues in qemu and other drivers once the overall approach will be approved. I added 2 new driver's methods so that thread finishing will be done in parallel. If we have only one method then the shutdown is done one by one effectively. I've added clean shutdown timeout in event loop as suggested by Daniel in [2]. Now I think why we can't just go with systemd unit management? Systemd will eventually send SIGKILL and we can tune the timeout using TimeoutStopUSec parameter. This way we even don't need to introduce new driver's methods. Driver's background threads can be finished in stateCleanup method. AFAIU as drivers are cleaned up in reverse order it is safe in the sense that already cleaned up driver can not be used by background threads of not yet cleaned up driver. Of course this way the cleanup is not done in parallel. Well to turn it into parallel we can introduce just stateShutdown which we don't need to call in netdaemon code and thus don't introduce undesired dependency of netdaemon on drivers concept. [1] Resolve libvirtd hang on termination with connected long running client https://www.redhat.com/archives/libvir-list/2018-July/msg00382.html [2] Races / crashes in shutdown of libvirtd daemon https://www.redhat.com/archives/libvir-list/2020-April/msg01328.html Nikolay Shirokovskiy (10): libvirt: add stateShutdown/stateShutdownWait to drivers util: always initialize priority condition util: add stop/drain functions to thread pool rpc: don't unref service ref on socket behalf twice rpc: finish all threads before exiting main loop vireventthread: add virEventThreadClose qemu: exit thread synchronously in qemuDomainObjStopWorker qemu: implement driver's shutdown/shutdown wait methods rpc: cleanup virNetDaemonClose method util: remove unused virThreadPoolNew macro scripts/check-drivername.py | 2 + src/driver-state.h | 8 ++++ src/libvirt.c | 42 +++++++++++++++++++ src/libvirt_internal.h | 2 + src/libvirt_private.syms | 3 ++ src/libvirt_remote.syms | 1 - src/qemu/qemu_domain.c | 1 + src/qemu/qemu_driver.c | 32 +++++++++++++++ src/remote/remote_daemon.c | 3 -- src/rpc/virnetdaemon.c | 95 ++++++++++++++++++++++++++++++++++++------- src/rpc/virnetdaemon.h | 2 - src/rpc/virnetserver.c | 8 ++++ src/rpc/virnetserver.h | 1 + src/rpc/virnetserverservice.c | 1 - src/util/vireventthread.c | 9 ++++ src/util/vireventthread.h | 1 + src/util/virthreadpool.c | 65 ++++++++++++++++++++--------- src/util/virthreadpool.h | 6 +-- 18 files changed, 238 insertions(+), 44 deletions(-)