On Fri, 2013-11-15 at 16:36 -0500, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
>
> When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
> draining is off, but DELEGRETURN can still get a session error.
> The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
> the DELEGRETURN done then restarts the RPC task in the prepare state.
> With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
> session draining off, these DELEGRETURNs will cycle with errors filling up the
> session slots.
>
> This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
> NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
> state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
> as seen in this kernel thread dump:
>

Hi Andy,

There is a second patch that goes with this problem. Please see the
following attachment.

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

From 20a4067243f81c1417bf62ecea7697b79901926f Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Tue, 19 Nov 2013 16:34:14 -0500
Subject: [PATCH] NFSv4: Update list of irrecoverable errors on DELEGRETURN

If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
then there is no recovery possible. Also, the client must not assume
that the NFSv4 lease has been renewed when it sees an error on
DELEGRETURN.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
 fs/nfs/nfs4proc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1f4edfbb4a70..aa16a22ad349 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4988,10 +4988,14 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 
 	trace_nfs4_delegreturn_exit(&data->args, &data->res, task->tk_status);
 	switch (task->tk_status) {
-	case -NFS4ERR_STALE_STATEID:
-	case -NFS4ERR_EXPIRED:
 	case 0:
 		renew_lease(data->res.server, data->timestamp);
+	case -NFS4ERR_ADMIN_REVOKED:
+	case -NFS4ERR_DELEG_REVOKED:
+	case -NFS4ERR_BAD_STATEID:
+	case -NFS4ERR_OLD_STATEID:
+	case -NFS4ERR_STALE_STATEID:
+	case -NFS4ERR_EXPIRED:
 		break;
 	default:
 		if (nfs4_async_handle_error(task, data->res.server, NULL) ==
-- 
1.8.3.1
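
For readers who want to see the effect of the reordered case labels in
isolation, here is a rough user-space sketch of the status classification
after the patch. It is not kernel code: classify_delegreturn_status(),
struct delegreturn_outcome and example_checks() are hypothetical names made
up for illustration, and the error numbers are written out by hand from the
NFSv4 protocol definitions rather than taken from the kernel headers. The
"deferred_to_handler" flag only stands in for the default branch that hands
the status to nfs4_async_handle_error(); it does not model what that handler
does.

```c
#include <assert.h>
#include <stdbool.h>

/* NFSv4 status codes, copied by hand for this sketch (assumption:
 * only the fact that they are distinct matters here). */
enum {
	NFS4ERR_DELAY         = 10008,
	NFS4ERR_EXPIRED       = 10011,
	NFS4ERR_STALE_STATEID = 10023,
	NFS4ERR_OLD_STATEID   = 10024,
	NFS4ERR_BAD_STATEID   = 10025,
	NFS4ERR_ADMIN_REVOKED = 10047,
	NFS4ERR_DELEG_REVOKED = 10087,
};

/* Hypothetical result type: what the done callback would do. */
struct delegreturn_outcome {
	bool renew_lease;         /* lease treated as renewed */
	bool deferred_to_handler; /* status passed on to the async error handler */
};

/* Mirrors the shape of the patched switch: only success renews the
 * lease; the listed stateid errors are final and fall out quietly. */
struct delegreturn_outcome classify_delegreturn_status(int tk_status)
{
	struct delegreturn_outcome out = { false, false };

	switch (tk_status) {
	case 0:
		out.renew_lease = true;
		/* fall through: success is also a terminal status */
	case -NFS4ERR_ADMIN_REVOKED:
	case -NFS4ERR_DELEG_REVOKED:
	case -NFS4ERR_BAD_STATEID:
	case -NFS4ERR_OLD_STATEID:
	case -NFS4ERR_STALE_STATEID:
	case -NFS4ERR_EXPIRED:
		break;
	default:
		out.deferred_to_handler = true;
		break;
	}
	return out;
}

/* Small usage example: the two behaviours the commit message calls out. */
void example_checks(void)
{
	/* An error on DELEGRETURN must not count as a lease renewal. */
	assert(!classify_delegreturn_status(-NFS4ERR_EXPIRED).renew_lease);
	/* NFS4ERR_BAD_STATEID is irrecoverable, so it is not handed on. */
	assert(!classify_delegreturn_status(-NFS4ERR_BAD_STATEID).deferred_to_handler);
}
```

Before the patch, -NFS4ERR_STALE_STATEID and -NFS4ERR_EXPIRED sat above
"case 0" and so fell through into renew_lease(); in the sketch that would
correspond to renew_lease being true for those two errors.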