Re: [RFC PATCH] NFSD: Force all NFSv4.2 COPY requests to be synchronous

On Mon, May 06, 2024 at 04:37:15PM -0700, Dai Ngo wrote:
> 
> On 5/6/24 2:04 PM, cel@xxxxxxxxxx wrote:
> > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > 
> > We've discovered that delivering a CB_OFFLOAD operation can be
> > unreliable in some pretty unremarkable situations,
> 
> Since the fore and back channels use the same connection, I assume
> this is not a connection-related problem.

This is totally a connection-related problem. The underlying issue
is that NFSD does not retransmit backchannel requests when the
connection is lost, and the Linux NFS client does not implement
OFFLOAD_STATUS. Right now, neither side recovers from connection
loss while a COPY operation is pending.
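
To make the failure mode concrete, here is a minimal userspace sketch
of the decision described above. It is a toy model, not kernel code:
every name in it (copy_model, copy_model_run, COPY_DONE, COPY_HUNG) is
hypothetical, and it assumes that a synchronous COPY reply can always
be recovered by ordinary forechannel retransmission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names exist in NFSD or the client. */
enum copy_outcome { COPY_DONE, COPY_HUNG };

struct copy_model {
	bool server_retransmits_cb; /* server resends CB_OFFLOAD after reconnect */
	bool client_polls_status;   /* client sends OFFLOAD_STATUS */
	bool synchronous;           /* result is carried in the COPY reply itself */
};

/*
 * Report whether the client ever learns that the copy finished, given
 * whether the connection was lost while the COPY was pending.
 */
static enum copy_outcome
copy_model_run(const struct copy_model *m, bool conn_lost)
{
	assert(m != NULL);

	/* Assumption: a forechannel reply is recovered by client retransmit. */
	if (m->synchronous)
		return COPY_DONE;

	/* Connection stayed up, so the CB_OFFLOAD callback is delivered. */
	if (!conn_lost)
		return COPY_DONE;

	/* Callback was lost; either recovery path is enough on its own. */
	if (m->server_retransmits_cb || m->client_polls_status)
		return COPY_DONE;

	return COPY_HUNG;
}
```

With all three fields false (the situation described above), a lost
connection yields COPY_HUNG. Setting any one of the three fields to
true yields COPY_DONE, which is why forcing synchronous COPY is a
workable stopgap in this model.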


> Sounds like this is a bug that we should find and fix if possible
> instead of working around it.

I've been looking for a fix for the past several months. The last
fix I put in, you asked me to revert. So, I would prefer to fix
the root cause of this issue, but right now the best we can do is
create a surgical patch that can be backported to LTS kernels, and
keep working on a longer term fix.

The choice is either to temporarily force all COPY operations to be
synchronous, or to temporarily drop support for COPY in NFSD. Actually,
the latter sounds safer.


> Do you know any scenarios where the CB_OFFLOAD operation is
> unreliable?

Any scenario where the connection is dropped (say, because the
server wants the client to retransmit forechannel requests, or
because of a GSS sequence number window under-run, or because of a
network partition, etc.) can potentially result in the loss of a
backchannel operation.

I can reproduce this issue 100% of the time with an NFSv4.2 mount
from a 6.8.7-200.fc39.x86_64 NFS client, using the git regression
suite.


> > and the Linux
> > NFS client does not yet support sending an OFFLOAD_STATUS
> > operation to probe whether an asynchronous COPY operation has
> > finished. On Linux NFS clients, COPY can hang until manually
> > interrupted.
> > 
> > I've tried a couple of remedies, but so far the side-effects are
> > worse than the disease. For now, force COPY operations to be
> > synchronous so that the use of CB_OFFLOAD is avoided entirely.
> > 
> > I have some patches that add an OFFLOAD_STATUS implementation to the
> > Linux NFS client, but that is not likely to fix older clients.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > ---
> >   fs/nfsd/nfs4proc.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index ea3cc3e870a7..12722c709cc6 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1807,6 +1807,13 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >   	__be32 status;
> >   	struct nfsd4_copy *async_copy = NULL;
> > +	/*
> > +	 * Currently, async COPY is not reliable. Force all COPY
> > +	 * requests to be synchronous to avoid client application
> > +	 * hangs waiting for completion.
> > +	 */
> > +	nfsd4_copy_set_sync(copy, true);
> > +
> >   	copy->cp_clp = cstate->clp;
> >   	if (nfsd4_ssc_is_inter(copy)) {
> >   		trace_nfsd_copy_inter(copy);
> > 
> > base-commit: 939cb14d51a150e3c12ef7a8ce0ba04ce6131bd2
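
As a rough illustration of what the hunk above does, the following
userspace sketch models a per-request "synchronous" flag that the
server overrides before deciding how to run the copy. The names
(model_copy, MODEL_COPY_F_SYNCHRONOUS, model_copy_set_sync, and so on)
are hypothetical stand-ins and are not the kernel's definitions; the
real helper may be implemented differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_COPY_F_SYNCHRONOUS (1UL << 0)

struct model_copy {
	unsigned long flags; /* bit 0 set: run the copy synchronously */
};

/* Set or clear the synchronous flag on a copy request. */
static void model_copy_set_sync(struct model_copy *copy, bool sync)
{
	assert(copy != NULL);
	if (sync)
		copy->flags |= MODEL_COPY_F_SYNCHRONOUS;
	else
		copy->flags &= ~MODEL_COPY_F_SYNCHRONOUS;
}

/* Report whether the copy request is marked synchronous. */
static bool model_copy_is_sync(const struct model_copy *copy)
{
	assert(copy != NULL);
	return (copy->flags & MODEL_COPY_F_SYNCHRONOUS) != 0;
}

/*
 * Model of the dispatch decision: when force_sync is true, the
 * client's requested mode is overridden, so the asynchronous path
 * (and therefore the completion callback) is never taken.
 */
static bool model_copy_dispatch_is_async(struct model_copy *copy,
					 bool force_sync)
{
	if (force_sync)
		model_copy_set_sync(copy, true);
	return !model_copy_is_sync(copy);
}
```

In this model, a request that arrives with the flag clear is
dispatched asynchronously when force_sync is false and synchronously
when it is true, which mirrors the intent of the added
nfsd4_copy_set_sync(copy, true) call.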

-- 
Chuck Lever



