On Thu, Jul 12, 2018 at 03:34:00PM -0400, John Ferlan wrote: > > > On 07/09/2018 03:11 AM, Peter Krempa wrote: > > On Sat, Jul 07, 2018 at 08:11:05 -0400, John Ferlan wrote: > >> When virNetDaemonQuit is called from libvirtd's shutdown > >> handler (daemonShutdownHandler) we need to perform the quit > >> in multiple steps. The first part is to "request" the quit > >> and notify the NetServer's of the impending quit which causes > >> the NetServers to inform their workers that a quit was requested. > >> > >> Still because we cannot guarantee a quit will happen or it's > >> possible there's no workers pending, use a virNetDaemonQuitTimer > >> to not only break the event loop but keep track of how long we're > >> waiting and we've waited too long, force an ungraceful exit so > >> that we don't hang waiting forever or cause some sort of SEGV > >> because something is still pending and we Unref things. > >> > >> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx> > >> --- > >> src/libvirt_remote.syms | 1 + > >> src/remote/remote_daemon.c | 1 + > >> src/rpc/virnetdaemon.c | 68 +++++++++++++++++++++++++++++++++++++- > >> src/rpc/virnetdaemon.h | 4 +++ > >> 4 files changed, 73 insertions(+), 1 deletion(-) > > > > [...] > > > >> @@ -855,10 +904,27 @@ virNetDaemonRun(virNetDaemonPtr dmn) > >> virObjectLock(dmn); > >> > >> virHashForEach(dmn->servers, daemonServerProcessClients, NULL); > >> + > >> + /* HACK: Add a dummy timeout to break event loop */ > >> + if (dmn->quitRequested && quitTimer == -1) > >> + quitTimer = virEventAddTimeout(500, virNetDaemonQuitTimer, > >> + &quitCount, NULL); > >> + > >> + if (dmn->quitRequested && daemonServerWorkersDone(dmn)) { > >> + dmn->quit = true; > >> + } else { > >> + /* Firing every 1/2 second and quitTimeout in seconds, force > >> + * an exit when there are still worker threads running and we > >> + * have waited long enough */ > >> + if (quitCount > dmn->quitTimeout * 2) > >> + _exit(EXIT_FAILURE); > > > > If you have a legitimate long-running job which would finish eventually > > and e.g. write an updated status XML this will break things. I'm not > > persuaded that this is a systematic solution to some API getting stuck. > > > > The commit message also does not help persuading me that this is a good > > idea. > > > > NACK > > > > I understand the sentiment and this hasn't been a "top of the list" item > for me to think about, but to a degree the purpose of re-posting the > series was to try to come to a consensus over some solution. A naked > NACK almost makes it appear as if you would prefer to not do anything in > this situation, letting nature takes it's course (so to speak). > > Do you have any thoughts or suggestions related to what processing you > believe is appropriate if someone issues a SIG{INT|QUIT|TERM} to a > daemon (libvirtd|lockd|logd}? That is what to do if we reach the > cleanup: label in main() of src/remote/remote_daemon.c and > virNetDaemonClose() gets run? Right now IIRC it's either going to be a > SEGV or possible long wait for some worker thread to complete. The > latter is the don't apply this example to cause a wait in block stats > fetch. Naturally the long term wait only matters up through the point > while someone is patient before issuing a SIGKILL. > > Do you favor waiting forever for some worker thread to finish? To some > degree if some signal is sent to libvirtd, then I would think the > expectation is to not wait for some indeterminate time. And most > definitely trying to avoid some sort of defunct state resolvable by a > host reboot. Knowing exactly what a worker thread is doing may help, but > that'd take a bit (;-)) of code to trace the stack. Perhaps providing a > list of client/workers still active or even some indication that we're > waiting on some worker rather than no feedback? How much is "too much"? > > Another option that was proposed, but didn't really gain any support was > introduction of a stateShutdown in virStateDriver that would be run > during libvirtd's cleanup: processing prior to virNetDaemonClose. That > would be before virStateCleanup. For qemu, the priv->mon and priv->agent > would be closed. That would seemingly generate an error and cause the > worker to exit thus theoretically not needing any forced quit of the > thread "if" the reason was some sort of hang as a result of > monitor/agent processing. Doesn't mean there wouldn't be some other > issue causing a hang. Refreshing my memory on this - there was some > discussion or investigation related to using virStateStop, but I since > that's gated by "WITH_DBUS" being defined, it wasn't clear if/how it > could be used (daemonStop is only called/setup if daemonRunStateInit has > set it up). I think the key thing to bear in mind is what is the benefit of what we are trying todo. ie why do we need todo a clean shutdown instead of just immediately exiting. The kernel will clean up all resources associated with the process when it exits. So we only need care about a clean shutdown if there is something needing cleanup that the kernel won't handle, or if we are trying to debug with valgrind & similar tools. I tend to think we put way too much time into worrying about getting perfect clean shutdown, for little real world benefit. IMHO, we should make a moderate effort to shutdown cleanly, but if there's something stuck after 15 seconds, we should just uncoditionally call exit(). Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list