On Wed, 09 Oct 2024, cel@xxxxxxxxxx wrote: > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > RFC 7862 permits callback services to respond to CB_OFFLOAD with > NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case. > > To improve the reliability of COPY offload, NFSD should rather send > another CB_OFFLOAD completion notification. > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > --- > fs/nfsd/nfs4proc.c | 8 ++++++++ > fs/nfsd/xdr4.h | 1 + > 2 files changed, 9 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index a3c564a9596c..02e73ebbfe5c 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -1613,6 +1613,13 @@ static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, > container_of(cb, struct nfsd4_cb_offload, co_cb); > > trace_nfsd_cb_offload_done(&cbo->co_res.cb_stateid, task); > + switch (task->tk_status) { > + case -NFS4ERR_DELAY: > + if (cbo->co_retries--) { > + rpc_delay(task, 1 * HZ); > + return 0; Is 5 tries at 1 second interval really sufficient? It is common to double the delay on each retry failure, so delays of 1,2,4,8,16 would give at total of 30 seconds for the client to get over whatever congestion is affecting it. That seems safer. NeilBrown > + } > + } > return 1; > } > > @@ -1742,6 +1749,7 @@ static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) > memcpy(&cbo->co_res, ©->cp_res, sizeof(copy->cp_res)); > memcpy(&cbo->co_fh, ©->fh, sizeof(copy->fh)); > cbo->co_nfserr = copy->nfserr; > + cbo->co_retries = 5; > > nfsd4_init_cb(&cbo->co_cb, copy->cp_clp, &nfsd4_cb_offload_ops, > NFSPROC4_CLNT_CB_OFFLOAD); > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h > index dec29afa43f3..cd2bf63651e3 100644 > --- a/fs/nfsd/xdr4.h > +++ b/fs/nfsd/xdr4.h > @@ -675,6 +675,7 @@ struct nfsd4_cb_offload { > struct nfsd4_callback co_cb; > struct nfsd42_write_res co_res; > __be32 co_nfserr; > + unsigned int co_retries; > struct knfsd_fh co_fh; > }; > > -- > 2.46.2 > > >