On Tue, Oct 08, 2024 at 05:54:27PM -0400, NeilBrown wrote: > On Wed, 09 Oct 2024, cel@xxxxxxxxxx wrote: > > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > RFC 7862 permits callback services to respond to CB_OFFLOAD with > > NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case. > > > > To improve the reliability of COPY offload, NFSD should rather send > > another CB_OFFLOAD completion notification. > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > --- > > fs/nfsd/nfs4proc.c | 8 ++++++++ > > fs/nfsd/xdr4.h | 1 + > > 2 files changed, 9 insertions(+) > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > index a3c564a9596c..02e73ebbfe5c 100644 > > --- a/fs/nfsd/nfs4proc.c > > +++ b/fs/nfsd/nfs4proc.c > > @@ -1613,6 +1613,13 @@ static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, > > container_of(cb, struct nfsd4_cb_offload, co_cb); > > > > trace_nfsd_cb_offload_done(&cbo->co_res.cb_stateid, task); > > + switch (task->tk_status) { > > + case -NFS4ERR_DELAY: > > + if (cbo->co_retries--) { > > + rpc_delay(task, 1 * HZ); > > + return 0; > > Is 5 tries at 1 second interval really sufficient? It doesn't matter, as long as the client can send an OFFLOAD_STATUS when it hasn't seen the expected CB_OFFLOAD. In fact IMO an even shorter delay would be better. This is not a situation where the service endpoint is waiting for a slow I/O device. The important part of this logic is the retry, not the delay. > It is common to double the delay on each retry failure, so delays of > 1,2,4,8,16 would give at total of 30 seconds for the client to get over > whatever congestion is affecting it. That seems safer. I didn't find other callback operations in NFSD that implemented exponential backoff. I could compromise and do .1 sec, .2 sec, .4 sec, .8 sec, 1.6 sec. > NeilBrown > > > + } > > + } > > return 1; > > } > > > > @@ -1742,6 +1749,7 @@ static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) > > memcpy(&cbo->co_res, ©->cp_res, sizeof(copy->cp_res)); > > memcpy(&cbo->co_fh, ©->fh, sizeof(copy->fh)); > > cbo->co_nfserr = copy->nfserr; > > + cbo->co_retries = 5; > > > > nfsd4_init_cb(&cbo->co_cb, copy->cp_clp, &nfsd4_cb_offload_ops, > > NFSPROC4_CLNT_CB_OFFLOAD); > > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h > > index dec29afa43f3..cd2bf63651e3 100644 > > --- a/fs/nfsd/xdr4.h > > +++ b/fs/nfsd/xdr4.h > > @@ -675,6 +675,7 @@ struct nfsd4_cb_offload { > > struct nfsd4_callback co_cb; > > struct nfsd42_write_res co_res; > > __be32 co_nfserr; > > + unsigned int co_retries; > > struct knfsd_fh co_fh; > > }; > > > > -- > > 2.46.2 > > > > > > > -- Chuck Lever