On Wed, Oct 23, 2024 at 01:37:02PM +1100, NeilBrown wrote: > If there are more non-courteous clients than the calculated limit, we > should fail the request rather than report a soft failure and > encouraging the client to retry indefinitely. Discussion: This change has the potential to cause behavior regressions. I'm not sure how clients will behave (eg, what error is reported to administrators) if EXCH_ID / SETCLIENTID returns SERVERFAULT. I can't find a more suitable status code than SERVERFAULT, however. There is also the question of whether CREATE_SESSION, which also might fail when server resources are over-extended, could return a similar hard failure. (CREATE_SESSION has other spec-mandated restrictions around using NFS4ERR_DELAY, however). > The only hard failure allowed for EXCHANGE_ID that doesn't clearly have > some other meaning is NFS4ERR_SERVERFAULT. So use that, but explain why > in a comment at each place that it is returned. > > If there are courteous clients which push us over the limit, then expedite > their removal. > > This is not known to have caused a problem is production use, but The current DELAY behavior is known to trigger an (interruptible) infinite loop when a small-memory server can't create a new session. Thus I believe the infinite loop behavior is a real issue that has been observed and reported. > testing of lots of clients reports repeated NFS4ERR_DELAY responses > which doesn't seem helpful. No argument from me. NFSD takes the current approach for exactly the reason you mention above: there isn't a good choice of status code to return in this case. Nit: the description might better explain how this change is related to or required by on-demand thread allocation. It seems a little orthogonal to me right now. NBD. > Also remove an outdated comment - we do use a slab cache. > > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > Signed-off-by: NeilBrown <neilb@xxxxxxx> > --- > fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++---------- > 1 file changed, 20 insertions(+), 10 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index 56b261608af4..ca6b5b52f77d 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) > return 1; > } > > -/* > - * XXX Should we use a slab cache ? > - * This type of memory management is somewhat inefficient, but we use it > - * anyway since SETCLIENTID is not a common operation. > - */ > static struct nfs4_client *alloc_client(struct xdr_netobj name, > struct nfsd_net *nn) > { > struct nfs4_client *clp; > int i; > > - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { > + if (atomic_read(&nn->nfs4_client_count) - > + atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients) > + return ERR_PTR(-EUSERS); > + > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients && > + atomic_read(&nn->nfsd_courtesy_clients) > 0) > mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); > - return NULL; > - } > + > clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); > if (clp == NULL) > return NULL; > @@ -3121,8 +3120,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name, > struct dentry *dentries[ARRAY_SIZE(client_files)]; > > clp = alloc_client(name, nn); > - if (clp == NULL) > - return NULL; > + if (IS_ERR_OR_NULL(clp)) > + return clp; > > ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred); > if (ret) { > @@ -3504,6 +3503,11 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > new = create_client(exid->clname, rqstp, &verf); > if (new == NULL) > return nfserr_jukebox; > + if (IS_ERR(new)) > + /* Protocol has no specific error for "client limit reached". > + * NFS4ERR_RESOURCE is not permitted for EXCHANGE_ID > + */ > + return nfserr_serverfault; > status = copy_impl_id(new, exid); > if (status) > goto out_nolock; > @@ -4422,6 +4426,12 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > new = create_client(clname, rqstp, &clverifier); > if (new == NULL) > return nfserr_jukebox; > + if (IS_ERR(new)) > + /* Protocol has no specific error for "client limit reached". > + * NFS4ERR_RESOURCE, while allowed for SETCLIENTID, implies > + * that a smaller COMPOUND might be successful. > + */ > + return nfserr_serverfault; > spin_lock(&nn->client_lock); > conf = find_confirmed_client_by_name(&clname, nn); > if (conf && client_has_state(conf)) { > -- > 2.46.0 > -- Chuck Lever