On Fri, Oct 08, 2021 at 01:12:49PM +0200, Michal Privoznik wrote: > When a server decides to close a client, the > virNetServerClientCloseLocked() is called. In here various > cleanup steps are taken, but the most important part (from this > commit's POV at least) is the way that the socket is closed. > Firstly, removal of the socket associated with the client from > the event loop is signalized and then the socket is unrefed. The > socket is not closed just yet though, because the event loop > holds a reference to it. This reference will be freed as soon as > the event loop wakes up and starts issuing callbacks (in this > case virNetSocketEventFree()). > > So far, this is how things usually work. But if the daemon > reaches the number of opened files limit, things start to work > differently. > > If the RLIMIT_NOFILE limit is reached and there's a client that > wants to connect then the event loop wakes up, sees POLLIN on the > socket and calls virNetServerServiceAccept() which in turn calls > virNetSocketAccept(). But because of the limit, accept() fails > with EMFILE leaving the POLLIN event unhandled. The dispatch then > continues to next FDs with events on them. BUT, it will NOT call > the socket removal callback (virNetSocketEventFree()) because it > has low priority (G_PRIORITY_DEFAULT_IDLE). Per glib's > documentation: if virNetSocketAccept fails, there's no client socket to remove/close. AFAIK, we are not removing the server socket either. So I'm not following which socket is supposedly being removed in this scenario ? > * Each event source is assigned a priority. The default priority, > * %G_PRIORITY_DEFAULT, is 0. Values less than 0 denote higher priorities. > * Values greater than 0 denote lower priorities. Events from high priority > * sources are always processed before events from lower priority sources. > > and per g_idle_add() documentation: > > * Adds a function to be called whenever there are no higher priority > * events pending to the default main loop. The function is given the > * default idle priority, %G_PRIORITY_DEFAULT_IDLE. > > Now, because we did not accept() the client we are constantly > seeing POLLIN on the main socket and thus the removal of the > client socket won't ever happen. > > The fix is to set at least the same priority as other sources, > but since we want to just close an FD, let's give it the highest > priority and call it before handling other events. > > This issue can be easily reproduced, for instance: > > # ulimit -S -n 40 (tweak this number if needed) > # ./src/libvirtd > > from another terminal: > > # for ((i=0; i<100; i++)); do virsh list & done; virsh list > > The last `virsh list` must not get stuck. The bug below describes libvirt core dumps rather than hangs. Do you know why it dumped ? > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2007168 > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > --- > src/util/vireventglib.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/util/vireventglib.c b/src/util/vireventglib.c > index f3e5a344b0..983787932f 100644 > --- a/src/util/vireventglib.c > +++ b/src/util/vireventglib.c > @@ -287,7 +287,7 @@ virEventGLibHandleRemove(int watch) > * 'removed' to prevent reuse > */ > data->removed = TRUE; > - g_idle_add(virEventGLibHandleRemoveIdle, data); > + g_idle_add_full(G_PRIORITY_HIGH, virEventGLibHandleRemoveIdle, data, NULL); > > ret = 0; > > -- > 2.32.0 > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|