Re: [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 24 Oct 2024, Chuck Lever wrote:
> On Wed, Oct 23, 2024 at 01:37:02PM +1100, NeilBrown wrote:
> > If there are more non-courteous clients than the calculated limit, we
> > should fail the request rather than report a soft failure and
> > encouraging the client to retry indefinitely.
> 
> Discussion:
> 
> This change has the potential to cause behavior regressions. I'm not
> sure how clients will behave (eg, what error is reported to
> administrators) if EXCH_ID / SETCLIENTID returns SERVERFAULT.

Linux client will issues a pr_warn() and return EIO for mount.
If lease recovery is needed (e.g.  server boot) then I think the client
will retry indefinitely.  There isn't much else it can do.

> 
> I can't find a more suitable status code than SERVERFAULT, however.

Maybe we should never fail.  Always allow at least 1 slot ??

> 
> There is also the question of whether CREATE_SESSION, which also
> might fail when server resources are over-extended, could return a
> similar hard failure. (CREATE_SESSION has other spec-mandated
> restrictions around using NFS4ERR_DELAY, however).

Would that only be in the case of kmalloc() failure?  I think that is a
significantly different circumstance.  kmalloc() for small values never
fails in practice (citation needed).  Caches get cleans and pages are
written to swap and big processes are killed until something can be
freed.

This contrasts with that code in exchange_id which tries to guess an
amount of memory that shouldn't put too much burden on the server and so
should always be safe to allocate without risking OOM killing.

> 
> 
> > The only hard failure allowed for EXCHANGE_ID that doesn't clearly have
> > some other meaning is NFS4ERR_SERVERFAULT.  So use that, but explain why
> > in a comment at each place that it is returned.
> > 
> > If there are courteous clients which push us over the limit, then expedite
> > their removal.
> > 
> > This is not known to have caused a problem is production use, but
> 
> The current DELAY behavior is known to trigger an (interruptible)
> infinite loop when a small-memory server can't create a new session.
> Thus I believe the infinite loop behavior is a real issue that has
> been observed and reported.

I knew it had been observed with test code.  I didn't know it had be
reported for a genuine use-case.

> 
> 
> > testing of lots of clients reports repeated NFS4ERR_DELAY responses
> > which doesn't seem helpful.
> 
> No argument from me. NFSD takes the current approach for exactly the
> reason you mention above: there isn't a good choice of status code
> to return in this case.
> 
> Nit: the description might better explain how this change is related
> to or required by on-demand thread allocation. It seems a little
> orthogonal to me right now. NBD.

Yes, it is a bit of a tangent.  It might be seen as prep for the next
patch, but maybe it is completely independent..

Thanks,
NeilBrown


> 
> 
> > Also remove an outdated comment - we do use a slab cache.
> > 
> > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++----------
> >  1 file changed, 20 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 56b261608af4..ca6b5b52f77d 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
> >  	return 1;
> >  }
> >  
> > -/* 
> > - * XXX Should we use a slab cache ?
> > - * This type of memory management is somewhat inefficient, but we use it
> > - * anyway since SETCLIENTID is not a common operation.
> > - */
> >  static struct nfs4_client *alloc_client(struct xdr_netobj name,
> >  				struct nfsd_net *nn)
> >  {
> >  	struct nfs4_client *clp;
> >  	int i;
> >  
> > -	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
> > +	if (atomic_read(&nn->nfs4_client_count) -
> > +	    atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients)
> > +		return ERR_PTR(-EUSERS);
> > +
> > +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
> > +	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
> >  		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> > -		return NULL;
> > -	}
> > +
> >  	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
> >  	if (clp == NULL)
> >  		return NULL;
> > @@ -3121,8 +3120,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
> >  	struct dentry *dentries[ARRAY_SIZE(client_files)];
> >  
> >  	clp = alloc_client(name, nn);
> > -	if (clp == NULL)
> > -		return NULL;
> > +	if (IS_ERR_OR_NULL(clp))
> > +		return clp;
> >  
> >  	ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
> >  	if (ret) {
> > @@ -3504,6 +3503,11 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	new = create_client(exid->clname, rqstp, &verf);
> >  	if (new == NULL)
> >  		return nfserr_jukebox;
> > +	if (IS_ERR(new))
> > +		/* Protocol has no specific error for "client limit reached".
> > +		 * NFS4ERR_RESOURCE is not permitted for EXCHANGE_ID
> > +		 */
> > +		return nfserr_serverfault;
> >  	status = copy_impl_id(new, exid);
> >  	if (status)
> >  		goto out_nolock;
> > @@ -4422,6 +4426,12 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	new = create_client(clname, rqstp, &clverifier);
> >  	if (new == NULL)
> >  		return nfserr_jukebox;
> > +	if (IS_ERR(new))
> > +		/* Protocol has no specific error for "client limit reached".
> > +		 * NFS4ERR_RESOURCE, while allowed for SETCLIENTID, implies
> > +		 * that a smaller COMPOUND might be successful.
> > +		 */
> > +		return nfserr_serverfault;
> >  	spin_lock(&nn->client_lock);
> >  	conf = find_confirmed_client_by_name(&clname, nn);
> >  	if (conf && client_has_state(conf)) {
> > -- 
> > 2.46.0
> > 
> 
> -- 
> Chuck Lever
> 






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux