On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote: > On Tue, 05 Nov 2024, Chuck Lever wrote: > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > > that is not requested then the server is free to perform the copy > > > synchronously or asynchronously. > > > > > > In the Linux implementation an async copy requires more resources than a > > > sync copy. If nfsd cannot allocate these resources, the best response > > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > > > This choice may be debatable if the unavailable resource was due to > > > memory allocation failure - when memalloc fails it might be best to > > > simply give up as the server is clearly under load. However in the case > > > that policy prevents another kthread being created there is no benefit > > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > > reasonable to avoid that error in all circumstances. > > > > > > So change the out_err case to retry as a sync copy. > > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > > > Hi Neil, > > > > Why is a Fixes: tag necessary? > > > > And why that commit? async copies can fail due to lack of resources > > on kernels that don't have aadc3bbea163, AFAICT. > > I had hoped my commit message would have explained that, though I accept > it was not as explicit as it could be. The problem might be that you and I have different understandings of what exactly aadc3bbea163 does. > kmalloc(GFP_KERNEL) allocation failures aren't interesting. They never > happen for smallish sizes, and if they do then the server is so borked > that it hardly matter what we do. > > The fixed commit introduces a new failure mode that COULD easily be hit > in practice. It causes the N+1st COPY to wait indefinitely until at > least one other copy completes which, as you observed in that commit, > could "run for a long time". I don't think that behaviour is necessary > or appropriate. The waiting happens on the client. An async COPY operation always completes quickly on the server, in this case with NFS4ERR_DELAY. It does not tie up an nfsd thread. By the way, there are two fixes in this area that now appear in v6.12-rc6 that you should check out. > Changing the handling for kmalloc failure was just an irrelevant > side-effect for changing the behaviour when then number of COPY requests > exceeded the number of configured threads. aadc3bbea163 checks the number of concurrent /async/ COPY requests, which do not tie up nfsd threads, and thus are not limited by the svc_thread count, as synchronous COPY operations are by definition. I'm still thinking about the ramifications of converting an async COPY to a sync COPY in this case. We want to reduce the server workload in this case, rather than accommodate an aggressive client. > This came up because CVE-2024-49974 was created so I had to do something > about the theoretical DoS vector in SLE kernels. I didn't like the > patch so I backported > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") > > instead (and wondered why it hadn't gone to stable). I was conservative about requesting a backport here. However, if a CVE has been filed, and if there is no automation behind that process, you can explicitly request aadc3bbea163 be backported. The problem, to me, was less about server resource depletion and more about client hangs. > Thanks, > NeilBrown > > > > > > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > > --- > > > fs/nfsd/nfs4proc.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > > index fea171ffed62..06e0d9153ca9 100644 > > > --- a/fs/nfsd/nfs4proc.c > > > +++ b/fs/nfsd/nfs4proc.c > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > wake_up_process(async_copy->copy_task); > > > status = nfs_ok; > > > } else { > > > + retry_sync: > > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > > copy->nf_dst->nf_file, true); > > > } > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > } > > > if (async_copy) > > > cleanup_async_copy(async_copy); > > > - status = nfserr_jukebox; > > > - goto out; > > > + goto retry_sync; > > > } > > > > > > static struct nfsd4_copy * > > > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > > -- > > > 2.47.0 > > > > > > > -- > > Chuck Lever > > > -- Chuck Lever