On 9/14/24 3:29 AM, Salvatore Bonaccorso wrote:
Hi all,
On Wed, Jun 05, 2024 at 05:19:27PM +0200, Sergio Gelato wrote:
Observed on Debian 12 (nfs-utils 2.6.2):
May 28 09:40:25 HOSTNAME rpc.idmapd[3602614]: dirscancb: scandir(/run/rpc_pipefs/nfs): Too many open files
[repeated multiple times]
Investigation with lsof on one of the affected systems shows that file desciptors are not being closed:
[...]
rpc.idmap 675 root 126r DIR 0,40 0 10813 /run/rpc_pipefs/nfs/clnt11e6 (deleted)
rpc.idmap 675 root 127u FIFO 0,40 0t0 10817 /run/rpc_pipefs/nfs/clnt11e6/idmap (deleted)
rpc.idmap 675 root 128r DIR 0,40 0 10834 /run/rpc_pipefs/nfs/clnt11ef (deleted)
rpc.idmap 675 root 129u FIFO 0,40 0t0 10838 /run/rpc_pipefs/nfs/clnt11ef/idmap (deleted)
rpc.idmap 675 root 130r DIR 0,40 0 10855 /run/rpc_pipefs/nfs/clnt11f8 (deleted)
rpc.idmap 675 root 131u FIFO 0,40 0t0 10859 /run/rpc_pipefs/nfs/clnt11f8/idmap (deleted)
Raising the verbosity level to 3 results in no "Stale client:" lines.
strace shows no close() calls other than for the /run/rpc_pipefs/nfs directory.
I believe this is because in dirscancb() the loop is exited prematurely
the first time nfsopen() returns -1, preventing later entries in the queue
from being reaped. I've tested the patch below, which seems indeed to cure
the problem. The bug appears to be still unfixed in the current master branch.
From: Sergio Gelato <Sergio.Gelato@xxxxxxxxxxx>
Date: Tue, 4 Jun 2024 16:02:59 +0200
Subject: rpc.idmapd: nfsopen() failures should not be fatal
dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/.
Some of these directories contain /idmap files, others don't. nfsopen()
returns -1 for the latter; we then want to skip the directory, not abort
the entire scan.
---
utils/idmapd/idmapd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c
index e79c124..f3c540d 100644
--- a/utils/idmapd/idmapd.c
+++ b/utils/idmapd/idmapd.c
@@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data)
if (nfsopen(ic) == -1) {
close(ic->ic_dirfd);
free(ic);
- goto out;
+ continue;
}
if (verbose > 2)
Did this felt trough the cracks? Does the patch from Sergio looks good
to you?
It did because it was not in the appropriate format... The patch
was an attachment, not in-line, no Signed-off-by: line and
the patch was not create by git format-patch command (which
adds PATCH in the subject line).
I have filters that look for things like that and I just
didn't see it...
steved.