Re: NFSD threads hang when destroying a session or client ID

Jeff Layton via Bugspray Bot <bugbot@xxxxxxxxxx> · Thu, 23 Jan 2025 13:50:22 +0000

Jeff Layton writes via Kernel.org Bugzilla:

There is another scenario that could explain a hang here. From nfsd4_cb_sequence_done():

------------------8<---------------------
        case -NFS4ERR_BADSLOT:
                goto retry_nowait;
        case -NFS4ERR_SEQ_MISORDERED:
                if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
                        session->se_cb_seq_nr[cb->cb_held_slot] = 1;
                        goto retry_nowait;
                }
                break;
        default:
                nfsd4_mark_cb_fault(cb->cb_clp);
        }
        trace_nfsd_cb_free_slot(task, cb);
        nfsd41_cb_release_slot(cb);

        if (RPC_SIGNALLED(task))
                goto need_restart;
out:
        return ret;
retry_nowait:
        if (rpc_restart_call_prepare(task))
                ret = false;
        goto out;
------------------8<---------------------

Because the v4.1+ path doesn't check RPC_SIGNALLED until very late in the function, a BADSLOT or SEQ_MISORDERED error can jump to retry_nowait before that check is ever reached. In that case the callback client immediately resubmits the rpc_task to the RPC engine without requeueing the callback on the callback workqueue, even though the task has been signalled.

I think we should assume that when RPC_SIGNALLED returns true the result is suspect, and that we should stop processing the CB_SEQUENCE response at that point and restart the callback.
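
To make the ordering problem concrete, here is a minimal userspace sketch of the two orderings. It is not kernel code and not a proposed patch: struct cb_model, the cb_status and cb_action enums, and both functions are hypothetical stand-ins for the rpc_task, the CB_SEQUENCE status, and the goto targets in the excerpt above. Slot release and tracing are left out, and the success path (which the excerpt does not show) is modelled as a no-op.

```c
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum cb_status { CB_OK, CB_BADSLOT, CB_SEQ_MISORDERED, CB_OTHER_ERROR };
enum cb_action { CB_DONE, CB_RETRY_NOWAIT, CB_NEED_RESTART };

struct cb_model {
	bool signalled;        /* stands in for RPC_SIGNALLED(task) */
	enum cb_status status; /* stands in for the CB_SEQUENCE result */
	unsigned int seq_nr;   /* stands in for se_cb_seq_nr[cb_held_slot] */
	bool fault_marked;     /* stands in for nfsd4_mark_cb_fault() */
};

/* Ordering as in the excerpt: the signalled check comes last. */
static enum cb_action cb_done_late_check(struct cb_model *cb)
{
	switch (cb->status) {
	case CB_BADSLOT:
		return CB_RETRY_NOWAIT;         /* never reaches the check */
	case CB_SEQ_MISORDERED:
		if (cb->seq_nr != 1) {
			cb->seq_nr = 1;
			return CB_RETRY_NOWAIT; /* never reaches the check */
		}
		break;
	case CB_OK:
		break;                          /* success path not modelled */
	default:
		cb->fault_marked = true;
	}
	if (cb->signalled)
		return CB_NEED_RESTART;
	return CB_DONE;
}

/* Suggested ordering: treat a signalled task's result as suspect
 * and restart before looking at the status at all. */
static enum cb_action cb_done_early_check(struct cb_model *cb)
{
	if (cb->signalled)
		return CB_NEED_RESTART;
	return cb_done_late_check(cb);
}
```

In this model, a signalled task that carries a BADSLOT status is retried by cb_done_late_check() but restarted by cb_done_early_check(), and the early check also leaves seq_nr untouched for a signalled SEQ_MISORDERED reply. In the real function the restart path would still need to release the held slot, which this sketch does not attempt to show.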

View: https://bugzilla.kernel.org/show_bug.cgi?id=219710#c18
You can reply to this message to join the discussion.
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)