On Wed, 19 Feb 2025, Dave Chinner wrote: > On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@xxxxxxxxxx wrote: > > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > Dave opines: > > > > IMO, there is no need to do this unnecessary work on every object > > that is added to the LRU. Changing the gc worker to always run > > every 2s and check if it has work to do like so: > > > > static void > > nfsd_file_gc_worker(struct work_struct *work) > > { > > - nfsd_file_gc(); > > - if (list_lru_count(&nfsd_file_lru)) > > - nfsd_file_schedule_laundrette(); > > + if (list_lru_count(&nfsd_file_lru)) > > + nfsd_file_gc(); > > + nfsd_file_schedule_laundrette(); > > } > > > > means that nfsd_file_gc() will be run the same way and have the same > > behaviour as the current code. When the system it idle, it does a > > list_lru_count() check every 2 seconds and goes back to sleep. > > That's going to be pretty much unnoticable on most machines that run > > NFS servers. > > > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > --- > > fs/nfsd/filecache.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > > index 909b5bc72bd3..2933cba1e5f4 100644 > > --- a/fs/nfsd/filecache.c > > +++ b/fs/nfsd/filecache.c > > @@ -549,9 +549,9 @@ nfsd_file_gc(void) > > static void > > nfsd_file_gc_worker(struct work_struct *work) > > { > > - nfsd_file_gc(); > > + nfsd_file_schedule_laundrette(); > > if (list_lru_count(&nfsd_file_lru)) > > - nfsd_file_schedule_laundrette(); > > + nfsd_file_gc(); > > } > > IMO, the scheduling of new work is the wrong way around. It should > be done on completion of gc work, not before gc work is started. > > i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt, > etc), then a new gc worker will be started in 2s regardless of > whether the currently running gc worker has completed or not. > > Worse case, there's a spinlock hang bug in nfsd_file_gc(). This code > will end up with N worker threads all spinning up in nfsd_file_gc() > chewing up all the CPU in the system, not making any progress.... > If we schedule new work after completion of this work, then gc might > hang but it won't slowly drag the entire system down with it. While I agree that the enqueue is best done later rather than earlier, I think your worst-case is over-stated. queue_delayed_work() is a no-op if WORK_STRUCT_PENDING_BIT is still set. A given work_struct can only be running once. If the timer fires while nfsd_file_gc() is still running, nfsd_filecache_laundrette will be queued to start immediately that the currently running instance completes. So the worst cases is that there will always be one instance running. Thanks, NeilBrown