Re: [PATCH] nfsd: fallback to sync COPY if async not possible

"NeilBrown" <neilb@xxxxxxx> · Tue, 05 Nov 2024 07:18:03 +1100

On Tue, 05 Nov 2024, Jeff Layton wrote:
> On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote:
> > An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
> > that is not requested then the server is free to perform the copy
> > synchronously or asynchronously.
> > 
> > In the Linux implementation an async copy requires more resources than a
> > sync copy.  If nfsd cannot allocate these resources, the best response
> > is to simply perform the copy (or the first 4MB of it) synchronously.
> > 
> 
> Where does the copy get clamped at 4MB?
> 
> > This choice may be debatable if the unavailable resource was due to
> > memory allocation failure - when memalloc fails it might be best to
> > simply give up as the server is clearly under load.  However in the case
> > that policy prevents another kthread being created there is no benefit
> > and much cost is failing with NFS4ERR_DELAY.  In that case it seems
> > reasonable to avoid that error in all circumstances.
> > 
> > So change the out_err case to retry as a sync copy.
> > 
> > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  fs/nfsd/nfs4proc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index fea171ffed62..06e0d9153ca9 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  		wake_up_process(async_copy->copy_task);
> >  		status = nfs_ok;
> >  	} else {
> > +	retry_sync:
> >  		status = nfsd4_do_copy(copy, copy->nf_src->nf_file,
> >  				       copy->nf_dst->nf_file, true);
> >  	}
> > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	}
> >  	if (async_copy)
> >  		cleanup_async_copy(async_copy);
> > -	status = nfserr_jukebox;
> > -	goto out;
> > +	goto retry_sync;
> >  }
> >  
> >  static struct nfsd4_copy *
> > 
> > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44
> 
> 
> You're probably right that attempting to do this synchronously is
> probably best. That should avoid another RPC, though at the cost of
> having to shovel the data into the pagecache.

COPY always shovels data into the pagecache.  The difference between
sync and async is:
 - sync runs in nfsd thread, async run in a workqueue
 - sync stops after 4MB, async loops around after each 4MB until it
 finishes
(plus the obvious protocol differences).

> 
> If memory is very tight, I suppose the synchronous copy might also fail
> with NFS4ERR_DELAY, in which case we're no worse off.

Memory being tight isn't the interesting case.  More COPYs than nfsd
threads is the interesting case.

> 
> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>

Thanks,
NeilBrown

>