Re: NFSD threads hang when destroying a session or client ID

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 22 Jan 2025 09:19:11 -0500

On 1/22/25 6:40 AM, Baptiste PELLEGRIN via Bugspray Bot wrote:
Baptiste PELLEGRIN writes via Kernel.org Bugzilla:

(In reply to Chuck Lever from comment #9)

The server generates a CB_RECALL_ANY message for each active client. If the
number of active clients increases from zero at server boot time to a few
dozen, that would also explain why you see more of these over time.

Does this mean that something can stay active overnight? In my case, no clients are running outside opening hours. They are all suspended or powered off.

"-e sunrpc:xprt_reserve" for example would help us match the XIDs in the
callback operations to the messages you see in the server's system journal.

I will try to help you as much as I can. It is not really a problem for me to run trace-cmd on all clients, as they are all managed with Puppet. I will try as soon as possible.

Can you confirm that I need to run "trace-cmd record -e sunrpc:xprt_reserve" on all my clients? No other flags?

Don't change the clients. The additional command line option above goes
on the existing server trace-cmd, but only if the NFS server system does
not have any NFS mounts of its own. I'm trying to keep the amount of
generated trace data to a minimum.
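
As a rough sketch of the precondition above: the snippet below checks whether the server has NFS client mounts before adding the extra event. It assumes the mount table is readable at /proc/mounts and that client mounts show up there with type "nfs" or "nfs4". The helper name has_nfs_mounts is hypothetical, and the options of the existing server trace-cmd command are not reproduced here.

```shell
# Hypothetical helper: succeeds if the given mount table lists any
# NFS client mounts (type "nfs" or "nfs4"; the "nfsd" type does not count).
has_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { found = 1 } END { exit !found }' \
        "${1:-/proc/mounts}"
}

# Only add the extra event when the server has no NFS mounts of its own.
extra_opts=""
if ! has_nfs_mounts; then
    extra_opts="-e sunrpc:xprt_reserve"
fi
echo "append to the existing server trace-cmd record command: ${extra_opts:-(nothing)}"
```

Skipping the event when the server is also an NFS client keeps its own outgoing RPC traffic out of the trace.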

The point here is to record the XIDs of each of the backchannel
operations. These XIDs will show up in the "unrecognized XID" messages
on your server.
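
The matching step could be sketched as follows, assuming the trace has been dumped to a text file in which each xprt_reserve event carries a field of the form "xid=0x0123abcd", and that the journal messages print the XID as "xid 0123abcd". Both formats are assumptions that may differ between kernel versions, and the function names are hypothetical.

```shell
# Assumed trace text form: "... xprt_reserve: ... xid=0x0123abcd"
trace_xids() {
    grep -oiE 'xid=0x[0-9a-f]{8}' "$1" | sed 's/^.*0[xX]//' | tr 'A-F' 'a-f' | sort -u
}

# Assumed journal text form: "... xid 0123abcd"
journal_xids() {
    grep -oiE 'xid [0-9a-f]{8}' "$1" | awk '{ print tolower($2) }' | sort -u
}

# Print each XID that appears in both files, one per line.
matching_xids() {
    list=$(mktemp) || return 1
    trace_xids "$1" > "$list"
    journal_xids "$2" | grep -Fx -f "$list"
    rm -f "$list"
}
```

For example, after saving the output of "trace-cmd report" to trace.txt and the kernel messages from "journalctl -k" to journal.txt, running "matching_xids trace.txt journal.txt" would list the XIDs common to both.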

Did you see that I added the "-e sunrpc:svc_xprt_detach -p function -l nfsd4_destroy_session" flags in my last recorded trace? Do I need to add new ones for the next crash?

I will also send you a dump of the current kernel tasks next time.

--
Chuck Lever