On 07/09/2018 03:11 AM, Peter Krempa wrote: > On Sat, Jul 07, 2018 at 08:11:05 -0400, John Ferlan wrote: >> When virNetDaemonQuit is called from libvirtd's shutdown >> handler (daemonShutdownHandler) we need to perform the quit >> in multiple steps. The first part is to "request" the quit >> and notify the NetServer's of the impending quit which causes >> the NetServers to inform their workers that a quit was requested. >> >> Still because we cannot guarantee a quit will happen or it's >> possible there's no workers pending, use a virNetDaemonQuitTimer >> to not only break the event loop but keep track of how long we're >> waiting and we've waited too long, force an ungraceful exit so >> that we don't hang waiting forever or cause some sort of SEGV >> because something is still pending and we Unref things. >> >> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx> >> --- >> src/libvirt_remote.syms | 1 + >> src/remote/remote_daemon.c | 1 + >> src/rpc/virnetdaemon.c | 68 +++++++++++++++++++++++++++++++++++++- >> src/rpc/virnetdaemon.h | 4 +++ >> 4 files changed, 73 insertions(+), 1 deletion(-) > > [...] > >> @@ -855,10 +904,27 @@ virNetDaemonRun(virNetDaemonPtr dmn) >> virObjectLock(dmn); >> >> virHashForEach(dmn->servers, daemonServerProcessClients, NULL); >> + >> + /* HACK: Add a dummy timeout to break event loop */ >> + if (dmn->quitRequested && quitTimer == -1) >> + quitTimer = virEventAddTimeout(500, virNetDaemonQuitTimer, >> + &quitCount, NULL); >> + >> + if (dmn->quitRequested && daemonServerWorkersDone(dmn)) { >> + dmn->quit = true; >> + } else { >> + /* Firing every 1/2 second and quitTimeout in seconds, force >> + * an exit when there are still worker threads running and we >> + * have waited long enough */ >> + if (quitCount > dmn->quitTimeout * 2) >> + _exit(EXIT_FAILURE); > > If you have a legitimate long-running job which would finish eventually > and e.g. write an updated status XML this will break things. I'm not > persuaded that this is a systematic solution to some API getting stuck. > > The commit message also does not help persuading me that this is a good > idea. > > NACK > I understand the sentiment and this hasn't been a "top of the list" item for me to think about, but to a degree the purpose of re-posting the series was to try to come to a consensus over some solution. A naked NACK almost makes it appear as if you would prefer to not do anything in this situation, letting nature takes it's course (so to speak). Do you have any thoughts or suggestions related to what processing you believe is appropriate if someone issues a SIG{INT|QUIT|TERM} to a daemon (libvirtd|lockd|logd}? That is what to do if we reach the cleanup: label in main() of src/remote/remote_daemon.c and virNetDaemonClose() gets run? Right now IIRC it's either going to be a SEGV or possible long wait for some worker thread to complete. The latter is the don't apply this example to cause a wait in block stats fetch. Naturally the long term wait only matters up through the point while someone is patient before issuing a SIGKILL. Do you favor waiting forever for some worker thread to finish? To some degree if some signal is sent to libvirtd, then I would think the expectation is to not wait for some indeterminate time. And most definitely trying to avoid some sort of defunct state resolvable by a host reboot. Knowing exactly what a worker thread is doing may help, but that'd take a bit (;-)) of code to trace the stack. Perhaps providing a list of client/workers still active or even some indication that we're waiting on some worker rather than no feedback? How much is "too much"? Another option that was proposed, but didn't really gain any support was introduction of a stateShutdown in virStateDriver that would be run during libvirtd's cleanup: processing prior to virNetDaemonClose. That would be before virStateCleanup. For qemu, the priv->mon and priv->agent would be closed. That would seemingly generate an error and cause the worker to exit thus theoretically not needing any forced quit of the thread "if" the reason was some sort of hang as a result of monitor/agent processing. Doesn't mean there wouldn't be some other issue causing a hang. Refreshing my memory on this - there was some discussion or investigation related to using virStateStop, but I since that's gated by "WITH_DBUS" being defined, it wasn't clear if/how it could be used (daemonStop is only called/setup if daemonRunStateInit has set it up). John -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list