Re: NFSD threads hang when destroying a session or client ID

Chuck Lever via Bugspray Bot <bugbot@xxxxxxxxxx> · Tue, 21 Jan 2025 16:35:13 +0000

Chuck Lever writes via Kernel.org Bugzilla:

(In reply to Baptiste PELLEGRIN from comment #8)
> I always see one or two "unrecognized reply" messages around 120 seconds
> before the hang message.
> 
> So could it be something that happens in weekly jobs on the client or server?
> Or maybe some memory leak or cache corruption?
> Or something related to an expired Kerberos cache file?
> Or an expired NFS session?
> ...
> 
> It also seems that the number of nfsd_cb_recall_any callback messages
> increases with the server uptime. This seems to favor the memory leak
> hypothesis.

The server generates a CB_RECALL_ANY message for each active client. If the number of active clients increases from zero at server boot time to a few dozen, that would also explain why you see more of these over time.

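If your NFS server does not also have NFS mount points of its own, a few RPC client-side trace points can be enabled on it to capture more details about NFSv4 callback activity. Without local mounts, those trace points should fire only for the callbacks the server sends.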

For example, "-e sunrpc:xprt_reserve" would help us match the XIDs in the callback operations to the messages you see in the server's system journal.
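
Once both the trace and the journal are captured, the XID matching could be scripted along these lines. This is only a rough sketch: the two regular expressions assume that the xprt_reserve trace line ends in "xid=0x" plus eight hex digits and that the "unrecognized reply" journal line ends in "xid " plus eight hex digits. Those layouts, the function names, and the sample lines in the usage note are assumptions for illustration, so check them against your actual output before relying on the result.

```python
import re

# Assumed line layouts (verify against real output before use):
#   trace:   "... xprt_reserve: task:<id>@<id> xid=0x1a2b3c4d"
#   journal: "... Got unrecognized reply: ... xid 1a2b3c4d"
TRACE_XID = re.compile(r"xprt_reserve:.*\bxid=0x([0-9a-fA-F]{8})\b")
JOURNAL_XID = re.compile(r"unrecognized reply.*\bxid ([0-9a-fA-F]{8})\b")


def extract_xids(lines, pattern):
    """Return the set of XIDs (as integers) that pattern finds in lines."""
    xids = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            xids.add(int(match.group(1), 16))
    return xids


def match_xids(trace_lines, journal_lines):
    """Return the XIDs present in both the trace and the journal."""
    traced = extract_xids(trace_lines, TRACE_XID)
    logged = extract_xids(journal_lines, JOURNAL_XID)
    return traced & logged
```

Feeding it the lines of the trace report and of the journal excerpt (for instance via open(...).readlines(), with file names of your choosing) returns the set of XIDs that show up in both, which are the callbacks worth a closer look.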

View: https://bugzilla.kernel.org/show_bug.cgi?id=219710#c9