On Tue, 11 Feb 2025, Dave Chinner wrote:
> On Fri, Feb 07, 2025 at 04:15:11PM +1100, NeilBrown wrote:
> > The race that this code tries to protect against is not interesting.
> > The code is problematic as we access the "nf" after we have given our
> > reference to the lru system.  While that may take 2+ seconds to free
> > things, it is still poor form.
> >
> > The only interesting race I can find would be with
> > nfsd_file_close_inode_sync(); this is the only place that really
> > doesn't want the file to stay on the LRU when unhashed (which is the
> > direct consequence of the race).
> >
> > However, for the race to happen, some other thread must own a
> > reference to a file and be putting it while nfsd_file_close_inode_sync()
> > is trying to close all files for an inode.  If this is possible, that
> > other thread could simply call nfsd_file_put() a little bit later and
> > the result would be the same: not all files are closed when
> > nfsd_file_close_inode_sync() completes.
> >
> > If this were really a problem, we would need to wait in
> > close_inode_sync for the other references to be dropped.  We probably
> > don't want to do that.
> >
> > So it is best to simply remove this code.
> >
> > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  fs/nfsd/filecache.c | 16 +++-------------
> >  1 file changed, 3 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index a1cdba42c4fa..b13255bcbb96 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -371,20 +371,10 @@ nfsd_file_put(struct nfsd_file *nf)
> >  
> >  	/* Try to add it to the LRU. If that fails, decrement. */
> >  	if (nfsd_file_lru_add(nf)) {
> > -		/* If it's still hashed, we're done */
> > -		if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > -			nfsd_file_schedule_laundrette();
> > -			return;
> > -		}
> > -
> > -		/*
> > -		 * We're racing with unhashing, so try to remove it from
> > -		 * the LRU.  If removal fails, then someone else already
> > -		 * has our reference.
> > -		 */
> > -		if (!nfsd_file_lru_remove(nf))
> > -			return;
> > +		nfsd_file_schedule_laundrette();
> > +		return;
>
> Why do we need to start the background gc on demand?  When the nfsd
> subsystem is under load it is going to be calling
> nfsd_file_schedule_laundrette() all the time.  However, gc is almost
> always going to be queued/running already.
>
> i.e. the above code results in adding the overhead of an atomic
> NFSD_FILE_CACHE_UP bit check and then a call to queue_delayed_work()
> on an already queued item.  This will disable interrupts, make an
> atomic bit check that sees the work is queued, re-enable interrupts
> and return.

I don't think we really need the NFSD_FILE_CACHE_UP test - if we have a
file to put, then the cache must be up.  And we could use
delayed_work_pending() to avoid the irq disable.  That is still one
atomic bit test though.

> IMO, there is no need to do this unnecessary work on every object
> that is added to the LRU.  Changing the gc worker to always run
> every 2s and check if it has work to do like so:
>
> static void
> nfsd_file_gc_worker(struct work_struct *work)
> {
> -	nfsd_file_gc();
> -	if (list_lru_count(&nfsd_file_lru))
> -		nfsd_file_schedule_laundrette();
> +	if (list_lru_count(&nfsd_file_lru))
> +		nfsd_file_gc();
> +	nfsd_file_schedule_laundrette();
> }
>
> means that nfsd_file_gc() will be run the same way and have the same
> behaviour as the current code.  When the system is idle, it does a
> list_lru_count() check every 2 seconds and goes back to sleep.
> That's going to be pretty much unnoticeable on most machines that run
> NFS servers.

When serving a v4-only load we don't need the timer at all... Maybe
that doesn't matter.

I would rather use a timer than a delayed work, as the task doesn't
require sleeping, and we have a pool of nfsd threads to do any work
that might be needed.
So a timer that checks if work is needed, then sets a flag and wakes an
nfsd thread, would be even lower impact than a delayed work, which needs
to wake a wq thread every time even if there is nothing to do.

Thanks,
NeilBrown