Re: [PATCH RFC] sunrpc: Ensure signalled RPC tasks exit

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 1 Apr 2020 21:55:26 +0000

On Wed, 2020-04-01 at 15:37 -0400, Chuck Lever wrote:
> If an RPC task is signalled while it is running and the transport is
> not connected, it will never sleep and never be terminated. This can
> happen when an RPC transport is shut down: the remaining tasks are
> signalled, but the transport is disconnected.
> 
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  net/sunrpc/sched.c |   14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> Interested in comments and suggestions.
> 
> Nearly every time my NFS/RDMA client unmounts when using krb5, the
> umount hangs (killably). I tracked it down to an NFSv3 NULL request
> that is signalled but loops and does not exit.
> 
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 55e900255b0c..905c31f75593 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
>  		trace_rpc_task_run_action(task, do_action);
>  		do_action(task);
>  
> +		if (RPC_SIGNALLED(task)) {
> +			task->tk_rpc_status = -ERESTARTSYS;
> +			rpc_exit(task, -ERESTARTSYS);
> +			break;
> +		}
> +

Hmm... I'd really prefer to avoid this kind of check in the tight loop.
Why is this NULL request looping?
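
For reference, the ordering problem described in the patch can be
sketched as a userspace toy. This is not the kernel code: every toy_*
name, the TOY_ERESTARTSYS constant, and the iteration cap are
assumptions made up for illustration. It only models the claim in the
patch description that a task which never sleeps never reaches a
signal check placed after the "is it queued?" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno, redefined here only so the toy builds in userspace. */
#define TOY_ERESTARTSYS 512

struct toy_task {
	bool signalled;				/* stands in for RPC_SIGNALLED(task) */
	bool queued;				/* stands in for RPC_IS_QUEUED(task) */
	int status;				/* stands in for tk_rpc_status */
	void (*action)(struct toy_task *);	/* stands in for tk_action */
};

/* Hypothetical action: transport is not connected, so retry without sleeping. */
static void toy_retry_connect(struct toy_task *task)
{
	task->queued = false;			/* never placed on a wait queue */
	task->action = toy_retry_connect;	/* ask to be run again */
}

/*
 * Toy version of the execute loop.
 * Returns the number of actions run before the task stopped, or -1 if
 * the task was still spinning when max_iter was reached.
 */
static int toy_execute(struct toy_task *task, bool check_before_queued,
		       int max_iter)
{
	int n = 0;

	while (n < max_iter) {
		void (*do_action)(struct toy_task *) = task->action;

		if (!do_action)
			return n;
		do_action(task);
		n++;

		/* Ordering proposed by the patch: test the signal first. */
		if (check_before_queued && task->signalled) {
			task->status = -TOY_ERESTARTSYS;
			task->action = NULL;
			return n;
		}

		/* Not sleeping: go straight to the next action. */
		if (!task->queued)
			continue;

		/* Existing ordering: only tasks about to sleep are checked. */
		if (task->signalled) {
			task->status = -TOY_ERESTARTSYS;
			task->action = NULL;
			task->queued = false;
			continue;
		}
		return n;	/* the real loop would sleep here */
	}
	return -1;
}

/* Small self-check showing both orderings on a signalled, never-sleeping task. */
static void toy_selfcheck(void)
{
	struct toy_task old_order = { .signalled = true, .action = toy_retry_connect };
	struct toy_task new_order = { .signalled = true, .action = toy_retry_connect };

	assert(toy_execute(&old_order, false, 1000) == -1);	/* still spinning */
	assert(toy_execute(&new_order, true, 1000) == 1);	/* exits at once */
	assert(new_order.status == -TOY_ERESTARTSYS);
}
```

The toy says nothing about why the real NULL request keeps rescheduling
itself; it only shows that, if it does, where the check sits decides
whether the loop can terminate.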

>  		/*
>  		 * Lockless check for whether task is sleeping or not.
>  		 */
> @@ -912,14 +918,6 @@ static void __rpc_execute(struct rpc_task *task)
>  			continue;
>  
>  		/*
> -		 * Signalled tasks should exit rather than sleep.
> -		 */
> -		if (RPC_SIGNALLED(task)) {
> -			task->tk_rpc_status = -ERESTARTSYS;
> -			rpc_exit(task, -ERESTARTSYS);
> -		}
> -
> -		/*
>  		 * The queue->lock protects against races with
>  		 * rpc_make_runnable().
>  		 *
> 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx