On 14.07.2020 17:53, Daniel Henrique Barboza wrote: > As far as code goes: > > > Reviewed-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx> > > > About the design I have a question about the timeout. Patch 5/10 is setting a > 15 second timeout. How did you reach this value? Reading the bug, specially > this comment from Daniel: > > https://bugzilla.redhat.com/show_bug.cgi?id=1828207#c6 > > He mentions "give it 5 seconds of running before shutting it down". I guess 5 seconds is time for libvirtd to finish startup. This time has different nature than time for libvirtd to finish it's work on shutdown so it can be different. > > 5 seconds before shutdown is something that most users can be slightly annoyed > but in the end don't mind that much, but 15 seconds is something that will > cause bugs to be opened because "Libvirt is taking too long to shutdown". > Besides, it's a fair assumption that a transaction that takes more than > 5 or so seconds to finish is already compromised* - might as well shutdown > the daemon and deal with the errors. 15 seconds was mentioned by Daniel in [1] when he first proposed the approach so I used this value without any extra thought. However I missed that in the last John's series [2] the default for waiting time is 0s. May be this is the current decision on waiting time. Let's wait for others to join the review. [1] https://www.redhat.com/archives/libvir-list/2018-January/msg00942.html [2] https://www.redhat.com/archives/libvir-list/2018-July/msg00382.html Nikolay > > > > * assuming user discretion to avoid shutting down the daemon in the middle > of a long duration transaction, of course. > > > Thanks, > > > DHB > > > On 7/14/20 6:32 AM, Nikolay Shirokovskiy wrote: >> This series follows [1] but address the issue slightly differently. >> Instead of polling for RPC thread pool termination it waits for >> thread pool drain in distinct thread and then signal the main loop >> to exit. >> >> The series introduces new driver's methods stateShutdown/stateShutdownWait >> to finish all driver's background threads. The methods however are only >> implemented for qemu driver and only partially. There are other drivers >> that have background threads and I don't check every one of them in >> terms of how they manage their background threads. >> >> For example node driver creates 2 threads. One of them is supposed to live >> a for a short amount of time and thus not tracked. This thread can cause issues >> on shutdown. The second thread is tracked and finished synchronously on driver >> cleanup. So this thread can not cause crashes but can cause hangs theoretically >> speaking so we may want to move the thread finishing to stateShutdownWait >> method so that possible hang will be handled by shutdown timeout. >> >> The qemu driver also has untracked threads and they can cause crashes on >> shutdown. For example reconnect threads or reboot thread. These need to be >> tracked. >> >> I'm going to address these issues in qemu and other drivers once the overall >> approach will be approved. >> >> I added 2 new driver's methods so that thread finishing will be done in >> parallel. If we have only one method then the shutdown is done one by one >> effectively. >> >> I've added clean shutdown timeout in event loop as suggested by Daniel in [2]. >> Now I think why we can't just go with systemd unit management? Systemd will >> eventually send SIGKILL and we can tune the timeout using TimeoutStopUSec >> parameter. This way we even don't need to introduce new driver's methods. >> Driver's background threads can be finished in stateCleanup method. AFAIU as >> drivers are cleaned up in reverse order it is safe in the sense that already >> cleaned up driver can not be used by background threads of not yet cleaned up >> driver. Of course this way the cleanup is not done in parallel. Well to >> turn it into parallel we can introduce just stateShutdown which we don't need >> to call in netdaemon code and thus don't introduce undesired dependency of >> netdaemon on drivers concept. >> >> [1] Resolve libvirtd hang on termination with connected long running client >> https://www.redhat.com/archives/libvir-list/2018-July/msg00382.html >> [2] Races / crashes in shutdown of libvirtd daemon >> https://www.redhat.com/archives/libvir-list/2020-April/msg01328.html >> >> Nikolay Shirokovskiy (10): >> libvirt: add stateShutdown/stateShutdownWait to drivers >> util: always initialize priority condition >> util: add stop/drain functions to thread pool >> rpc: don't unref service ref on socket behalf twice >> rpc: finish all threads before exiting main loop >> vireventthread: add virEventThreadClose >> qemu: exit thread synchronously in qemuDomainObjStopWorker >> qemu: implement driver's shutdown/shutdown wait methods >> rpc: cleanup virNetDaemonClose method >> util: remove unused virThreadPoolNew macro >> >> scripts/check-drivername.py | 2 + >> src/driver-state.h | 8 ++++ >> src/libvirt.c | 42 +++++++++++++++++++ >> src/libvirt_internal.h | 2 + >> src/libvirt_private.syms | 3 ++ >> src/libvirt_remote.syms | 1 - >> src/qemu/qemu_domain.c | 1 + >> src/qemu/qemu_driver.c | 32 +++++++++++++++ >> src/remote/remote_daemon.c | 3 -- >> src/rpc/virnetdaemon.c | 95 ++++++++++++++++++++++++++++++++++++------- >> src/rpc/virnetdaemon.h | 2 - >> src/rpc/virnetserver.c | 8 ++++ >> src/rpc/virnetserver.h | 1 + >> src/rpc/virnetserverservice.c | 1 - >> src/util/vireventthread.c | 9 ++++ >> src/util/vireventthread.h | 1 + >> src/util/virthreadpool.c | 65 ++++++++++++++++++++--------- >> src/util/virthreadpool.h | 6 +-- >> 18 files changed, 238 insertions(+), 44 deletions(-) >>