On Mon, 22 Apr 2024, Chuck Lever wrote: > On Mon, Apr 22, 2024 at 12:09:19PM +1000, NeilBrown wrote: > > The calculation of how many clients the nfs server can manage is only an > > heuristic. Triggering the laundromat to clean up old clients when we > > have more than the heuristic limit is valid, but refusing to create new > > clients is not. Client creation should only fail if there really isn't > > enough memory available. > > > > This is not known to have caused a problem is production use, but > > testing of lots of clients reports an error and it is not clear that > > this error is justified. > > It is justified, see 4271c2c08875 ("NFSD: limit the number of v4 > clients to 1024 per 1GB of system memory"). In cases like these, > the recourse is to add more memory to the test system. Does each client really need 1MB? Obviously we don't want all memory to be used by client state.... > > However, that commit claims that the client is told to retry; I > don't expect client creation to fail outright. Can you describe the > failure mode you see? The failure mode is repeated client retries that never succeed. I think an outright failure would be preferable - it would be more clear than memory must be added. The server has N active clients and M courtesy clients. Triggering reclaim when N+M exceeds a limit and M>0 makes sense. A hard failure (NFS4ERR_RESOURCE) when N exceeds a limit makes sense. A soft failure (NFS4ERR_DELAY) while reclaim is running makes sense. I don't think a retry while N exceeds the limit makes sense. Do we have a count of active vs courtesy clients? > > Meanwhile, we need to have broader and more regular testing of NFSD > on memory-starved systems. That's a long-term project. :-) Thanks, NeilBrown > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > > fs/nfsd/nfs4state.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index daf83823ba48..8a40bb6a4a67 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -2223,10 +2223,9 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, > > struct nfs4_client *clp; > > int i; > > > > - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { > > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) > > mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); > > - return NULL; > > - } > > + > > clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); > > if (clp == NULL) > > return NULL; > > -- > > 2.44.0 > > > > > > -- > Chuck Lever >