On Mon, 06 Jan 2025, Chuck Lever wrote:
> On 1/5/25 5:28 PM, NeilBrown wrote:
> > On Sun, 05 Jan 2025, Chuck Lever wrote:
> >> On 1/3/25 5:53 PM, Tejun Heo wrote:
> >>> On Sat, Jan 04, 2025 at 09:50:32AM +1100, NeilBrown wrote:
> >>>> On Fri, 03 Jan 2025, cel@xxxxxxxxxx wrote:
> >>> ...
> >>>> I think that instead of passing "list_lru_count()" we should pass some
> >>>> constant like 1024.
> >>>>
> >>>>    cnt = list_lru_count()
> >>>>    while (cnt > 0) {
> >>>>        num = min(cnt, 1024);
> >>>>        list_lru_walk(...., num);
> >>>>        cond_resched();
> >>>>        cnt -= num;
> >>>>    }
> >>>>
> >>>> Then run it from system_wq.
> >>>>
> >>>> list_lru_walk() is most often called as list_lru_shrink_walk() from a
> >>>> shrinker, and the pattern there is essentially that above.  A count is
> >>>> taken, possibly scaled down, then the shrinker is called in batches.
> >>>
> >>> BTW, there's nothing wrong with taking some msecs or even tens of msecs
> >>> running on system_unbound_wq, so the current state may be fine too.
> >>
> >> My thinking was that this work is low priority, so there should be
> >> plenty of opportunity to set it aside for a few moments and handle
> >> higher priority work.  Maybe not worrisome on systems with a high core
> >> count, but on small SMP (eg VM guests), I've found that tasks like this
> >> can be rude neighbors.
> >
> > None of the different work queues are expected to "set aside" the work
> > for more than a normal scheduling time slice.
> > system_long_wq was created "so that flush_scheduled_work() doesn't take
> > so long" but flush_scheduled_work() is never called now (and would
> > generate a warning if it was) so system_long_wq should really be
> > removed.
> >
> > system_unbound_wq uses threads that are not bound to a CPU so the
> > scheduler can move them around.  That is most suitable for something
> > that might run for a long time.
>
> That's not my understanding: unbound selects a CPU to run the work
> item on, and it can be (and usually is) not the same CPU as the one
> that invoked queue_work().  Then it isn't rescheduled.  The work items
> are expected to complete quickly; work item termination is the typical
> reschedule point.  My understanding, as always, could be stale.

If the work item will never schedule, then there isn't much point in
using a work queue.  The code can be run inline, or with a timer.

> But that's neither here nor there: I've dropped the patch to replace
> system_unbound_wq with system_long_wq.
>
> > system_wq is bound to the CPU that schedules the work, but it can run
> > multiple work items - potentially long running ones - without trouble by
> > helping the scheduler share the CPU among them.  This requires that they
> > can sleep of course.
> >
> > I haven't seen the actual symptoms that resulted in the various
> > changes to this code, but that last point is I think the key one.  The
> > work item, like all kernel code, needs to have scheduling points if it
> > might run for a longish time.  list_lru_walk() doesn't contain any
> > scheduling points so giving a large "nr_to_walk" is always a bad idea.
>
> That might be overstated... I don't see other consumers that are so
> concerned about rescheduling.  But then it's not clear if they are
> dealing with lengthy LRU lists.

There are very few callers of list_lru_walk().  shrink_dcache_sb() is
the closest analogue and it uses batches of 1024.  xfs_buftarg_drain()
uses batches of LONG_MAX !!!  Mostly the lru is walked in a shrinker
callback, and the shrinker manages the batch size.
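To be concrete about the shape I have in mind, a batched walk could
look something like the sketch below.  This is untested, and the names
(nfsd_file_gc_worker, nfsd_file_lru, nfsd_file_gc_cb,
nfsd_file_dispose_list) are only stand-ins for whatever the filecache
code actually uses; the point is just the count / clamp / walk /
cond_resched() loop:

static void nfsd_file_gc_worker(struct work_struct *work)
{
        LIST_HEAD(dispose);
        unsigned long cnt = list_lru_count(&nfsd_file_lru);

        while (cnt > 0) {
                /* never walk more than one batch before rescheduling */
                unsigned long num = min(cnt, 1024UL);

                /* isolate up to @num entries onto @dispose */
                list_lru_walk(&nfsd_file_lru, nfsd_file_gc_cb,
                              &dispose, num);

                /* free this batch, then give the scheduler a chance */
                nfsd_file_dispose_list(&dispose);
                cond_resched();

                cnt -= num;
        }
}

Whether the batch is 1024 or something smaller doesn't matter much;
what matters is that cond_resched() runs between batches.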
Thanks,
NeilBrown


> >> We could do this by adding a cond_resched() call site in the loop,
> >> or take Neil's suggestion of breaking up the free list across multiple
> >> work items that handle one or just a few file releases each.
> >
> > I guess I should send a proper patch.
>
> More comments in reply to that patch.  Generally speaking, I'm
> comfortable chopping up the work as you propose, we just have to
> address some details.
>
>
> --
> Chuck Lever
>
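And in case it helps with the chopping up: the "one or just a few file
releases each" variant could be a self-requeueing work item, roughly
like this (again untested, and the names are only placeholders):

static void nfsd_file_gc_one_batch(struct work_struct *work)
{
        LIST_HEAD(dispose);

        /* release at most a handful of files per work item */
        list_lru_walk(&nfsd_file_lru, nfsd_file_gc_cb, &dispose, 16);
        nfsd_file_dispose_list(&dispose);

        /*
         * Anything left?  Queue another small item rather than looping,
         * so other work on the same workqueue gets a turn in between.
         */
        if (list_lru_count(&nfsd_file_lru))
                queue_work(system_wq, work);
}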