Re: [RFC PATCH 3/9] NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD

"NeilBrown" <neilb@xxxxxxx> · Wed, 09 Oct 2024 08:54:27 +1100

On Wed, 09 Oct 2024, cel@xxxxxxxxxx wrote:
> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> 
> RFC 7862 permits callback services to respond to CB_OFFLOAD with
> NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case.
> 
> To improve the reliability of COPY offload, NFSD should rather send
> another CB_OFFLOAD completion notification.
> 
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  fs/nfsd/nfs4proc.c | 8 ++++++++
>  fs/nfsd/xdr4.h     | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index a3c564a9596c..02e73ebbfe5c 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1613,6 +1613,13 @@ static int nfsd4_cb_offload_done(struct nfsd4_callback *cb,
>  		container_of(cb, struct nfsd4_cb_offload, co_cb);
>  
>  	trace_nfsd_cb_offload_done(&cbo->co_res.cb_stateid, task);
> +	switch (task->tk_status) {
> +	case -NFS4ERR_DELAY:
> +		if (cbo->co_retries--) {
> +			rpc_delay(task, 1 * HZ);
> +			return 0;

Is 5 tries at 1 second interval really sufficient?
It is common to double the delay on each retry failure, so delays of 
1,2,4,8,16 would give at total of 30 seconds for the client to get over
whatever congestion is affecting it.  That seems safer.

NeilBrown

> +		}
> +	}
>  	return 1;
>  }
>  
> @@ -1742,6 +1749,7 @@ static void nfsd4_send_cb_offload(struct nfsd4_copy *copy)
>  	memcpy(&cbo->co_res, &copy->cp_res, sizeof(copy->cp_res));
>  	memcpy(&cbo->co_fh, &copy->fh, sizeof(copy->fh));
>  	cbo->co_nfserr = copy->nfserr;
> +	cbo->co_retries = 5;
>  
>  	nfsd4_init_cb(&cbo->co_cb, copy->cp_clp, &nfsd4_cb_offload_ops,
>  		      NFSPROC4_CLNT_CB_OFFLOAD);
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index dec29afa43f3..cd2bf63651e3 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -675,6 +675,7 @@ struct nfsd4_cb_offload {
>  	struct nfsd4_callback	co_cb;
>  	struct nfsd42_write_res	co_res;
>  	__be32			co_nfserr;
> +	unsigned int		co_retries;
>  	struct knfsd_fh		co_fh;
>  };
>  
> -- 
> 2.46.2
> 
> 
>