Re: [PATCH 3/3] NFSD: Fix server reboot hang problem when callback workqueue is stuck

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 15, 2023 at 11:15:03AM -0800, Dai Ngo wrote:
> If the callback workqueue is stuck, nfsd4_deleg_getattr_conflict will
> also stuck waiting for the callback request to be executed. This causes
> the client to hang waiting for the reply of the GETATTR and also causes
> the reboot of the NFS server to hang due to the pending NFS request.
> 
> Fix by replacing wait_on_bit with wait_on_bit_timeout with 20 seconds
> time out.
> 
> Reported-by: Wolfgang Walter <linux-nfs@xxxxxxx>
> Fixes: 6c41d9a9bd02 ("NFSD: handle GETATTR conflict with write delegation")
> Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> ---
>  fs/nfsd/nfs4state.c | 6 +++++-
>  fs/nfsd/state.h     | 2 ++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 175f3e9f5822..0cc7d4953807 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2948,6 +2948,9 @@ void nfs4_cb_getattr(struct nfs4_cb_fattr *ncf)
>  	if (test_and_set_bit(CB_GETATTR_BUSY, &ncf->ncf_cb_flags))
>  		return;
>  
> +	/* set to proper status when nfsd4_cb_getattr_done runs */
> +	ncf->ncf_cb_status = NFS4ERR_IO;
> +
>  	refcount_inc(&dp->dl_stid.sc_count);
>  	if (!nfsd4_run_cb(&ncf->ncf_getattr)) {
>  		refcount_dec(&dp->dl_stid.sc_count);
> @@ -8558,7 +8561,8 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode,
>  			nfs4_cb_getattr(&dp->dl_cb_fattr);
>  			spin_unlock(&ctx->flc_lock);
>  
> -			wait_on_bit(&ncf->ncf_cb_flags, CB_GETATTR_BUSY, TASK_INTERRUPTIBLE);
> +			wait_on_bit_timeout(&ncf->ncf_cb_flags, CB_GETATTR_BUSY,
> +				TASK_INTERRUPTIBLE, NFSD_CB_GETATTR_TIMEOUT);

I'm still thinking the timeout here should be the same (or slightly
longer than) the RPC retransmit timeout, rather than adding a new
NFSD_CB_GETATTR_TIMEOUT macro.


>  			if (ncf->ncf_cb_status) {
>  				status = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
>  				if (status != nfserr_jukebox ||
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index f96eaa8e9413..94563a6813a6 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -135,6 +135,8 @@ struct nfs4_cb_fattr {
>  /* bits for ncf_cb_flags */
>  #define	CB_GETATTR_BUSY		0
>  
> +#define	NFSD_CB_GETATTR_TIMEOUT	msecs_to_jiffies(20000) /* 20 secs */
> +
>  /*
>   * Represents a delegation stateid. The nfs4_client holds references to these
>   * and they are put when it is being destroyed or when the delegation is
> -- 
> 2.39.3
> 

-- 
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux