Re: NFSD threads hang when destroying a session or client ID

Jeff Layton via Bugspray Bot <bugbot@xxxxxxxxxx> · Tue, 21 Jan 2025 14:40:10 +0000

Jeff Layton writes via Kernel.org Bugzilla:

I looked at this with Chuck the other day. As for wait_var_event() getting stuck, I think all it would take is for nfsd4_cb_sequence_done() to set cb_need_restart on every call. The callback would then never be destroyed, so nfsd41_cb_inflight_end() would never be called.
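A minimal single-threaded userspace model of that accounting follows. It assumes (our reading of the upstream code, worth checking against the tree) that each queued callback holds one reference on a per-client inflight counter, that a callback with cb_need_restart set is requeued instead of destroyed, and that the destroy path waits for the counter to reach zero. Every model_* name is hypothetical; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the client; cb_inflight plays the role of
 * the per-client inflight counter that wait_var_event() watches. */
struct model_client {
	int cb_inflight;
};

/* Hypothetical stand-in for a callback; need_restart plays the role of
 * cb->cb_need_restart. */
struct model_cb {
	struct model_client *clp;
	bool need_restart;
};

static void model_inflight_begin(struct model_client *clp)
{
	clp->cb_inflight++;
}

static void model_inflight_end(struct model_client *clp)
{
	assert(clp->cb_inflight > 0); /* guard against underflow in the model */
	clp->cb_inflight--;           /* real code would wake waiters at zero */
}

/* One completion of the callback. Returns true if it was released. */
static bool model_cb_release(struct model_cb *cb)
{
	if (cb->need_restart)
		return false;         /* requeued: the reference is kept */
	model_inflight_end(cb->clp);
	return true;
}

/* Bounded stand-in for waiting until the inflight count is zero: each
 * round re-runs the pending callback once instead of sleeping. */
static bool model_wait_for_idle(struct model_client *clp, struct model_cb *cb,
				int max_rounds)
{
	for (int i = 0; i < max_rounds; i++) {
		if (clp->cb_inflight == 0)
			return true;
		model_cb_release(cb);
	}
	return clp->cb_inflight == 0;
}
```

With need_restart stuck at true, model_wait_for_idle() gives up after max_rounds with the count still at 1; the real wait has no bound, which would match the reported hang.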

That happens at the need_restart: label in nfsd4_cb_sequence_done(). Two cases jump there:

        if (!clp->cl_minorversion) {
                /*
                 * If the backchannel connection was shut down while this
                 * task was queued, we need to resubmit it after setting up
                 * a new backchannel connection.
                 *
                 * Note that if we lost our callback connection permanently
                 * the submission code will error out, so we don't need to
                 * handle that case here.
                 */
                if (RPC_SIGNALLED(task))
                        goto need_restart;

                return true;
        }

        if (cb->cb_held_slot < 0)
                goto need_restart;

It doesn't seem likely that it somehow lost the slot, so my guess is that the RPC task is continually returning with RPC_SIGNALLED() set.
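The two quoted branches can be modelled as a pure function to show why the client's minor version matters: for a v4.0 client, a task that keeps coming back with RPC_SIGNALLED() set restarts on every attempt. This is a hypothetical sketch; the model_* names are ours, and it deliberately ignores the rest of nfsd4_cb_sequence_done() (the SEQUENCE reply handling for v4.1+).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the two quoted branches; not kernel code. */
enum model_seq_result {
	MODEL_DONE,         /* "return true": callback completes */
	MODEL_NEED_RESTART, /* "goto need_restart": callback is requeued */
	MODEL_CONTINUE,     /* v4.1+ with a slot: later checks, not modelled */
};

struct model_seq_input {
	unsigned int minorversion; /* clp->cl_minorversion */
	bool rpc_signalled;        /* RPC_SIGNALLED(task) */
	int held_slot;             /* cb->cb_held_slot, negative if none */
};

static enum model_seq_result model_sequence_done(const struct model_seq_input *in)
{
	if (in->minorversion == 0) {
		/* v4.0: a signalled task is resubmitted */
		if (in->rpc_signalled)
			return MODEL_NEED_RESTART;
		return MODEL_DONE;
	}
	/* v4.1+: restart if no session slot is held */
	if (in->held_slot < 0)
		return MODEL_NEED_RESTART;
	return MODEL_CONTINUE;
}

/* Count consecutive restarts when the inputs never change between attempts. */
static int model_count_restarts(const struct model_seq_input *in, int max_attempts)
{
	int restarts = 0;

	assert(max_attempts >= 0);
	for (int i = 0; i < max_attempts; i++) {
		if (model_sequence_done(in) != MODEL_NEED_RESTART)
			break;
		restarts++;
	}
	return restarts;
}
```

A persistently signalled v4.0 task hits the restart path on every one of max_attempts, while a v4.1+ callback only loops that way if it keeps losing its slot.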

Question for Baptiste -- what NFS versions are your clients using?

View: https://bugzilla.kernel.org/show_bug.cgi?id=219710#c6