On Sat, Jun 18, 2022 at 07:52:45AM +1000, Dave Chinner wrote:
> On Fri, Jun 17, 2022 at 12:34:38PM -0400, Brian Foster wrote:
> > On Thu, Jun 16, 2022 at 08:04:15AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Currently inodegc work can sit queued on the per-cpu queue until
> > > the workqueue is either flushed or the queue reaches a depth that
> > > triggers work queuing (and later throttling). This means that we
> > > could queue work that waits for a long time for some other event
> > > to trigger flushing.
> > > 
> > > Hence instead of just queueing work at a specific depth, use a
> > > delayed work that queues the work at a bound time. We can still
> > > schedule the work immediately at a given depth, but we no longer
> > > need to worry about leaving a number of items on the list that
> > > won't get processed until external events prevail.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_icache.c | 36 ++++++++++++++++++++++--------------
> > >  fs/xfs/xfs_mount.h  |  2 +-
> > >  fs/xfs/xfs_super.c  |  2 +-
> > >  3 files changed, 24 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index 374b3bafaeb0..46b30ecf498c 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > ...
> > > @@ -2176,7 +2184,7 @@ xfs_inodegc_shrinker_scan(
> > >  			unsigned int	h = READ_ONCE(gc->shrinker_hits);
> > > 
> > >  			WRITE_ONCE(gc->shrinker_hits, h + 1);
> > > -			queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
> > > +			mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
> > >  			no_items = false;
> > >  		}
> > 
> > This all seems reasonable to me, but is there much practical benefit
> > to shrinker infra/feedback just to expedite a delayed work item by
> > one jiffy? Maybe there's a use case to continue to trigger
> > throttling..?
> 
> I haven't really considered doing anything other than fixing the
> reported bug. That just requires an API conversion for the existing
> "queue immediately" semantics and is the safest minimum change to fix
> the issue at hand.
> 
> So, yes, the shrinker code may (or may not) be superfluous now, but I
> haven't looked at it and done analysis of the behaviour without the
> shrinkers enabled. I'll do that in a completely separate patchset if
> it turns out that it is not needed now.

I think the shrinker part is still necessary -- bulkstat and xfs_scrub
on a very low memory machine (~560M RAM) opening and closing tens of
millions of files can still OOM the machine if one doesn't have a means
to slow down ->destroy_inode (and hence the next open()) when reclaim
really starts to dig in.

Without the shrinker bits, it's even easier to trigger OOM storms when
xfs has timer-delayed inactivation... which is something that Brian
pointed out a year ago when we were reviewing the initial inodegc
patchset.

> > If so, it looks like there's enough overhead in cycling through
> > every cpu in both callbacks that it might be worth spelling it out
> > more clearly in the top-level comment.
> 
> I'm not sure what you are asking here - mod_delayed_work_on() has
> pretty much the same overhead and behaviour as queue_work() in this
> case, so... ?

<shrug> Looks ok to me, since djwong-dev has had some variant of timer
delayed inactivation in it longer than it hasn't:

Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
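
P.S. For anyone following along who isn't familiar with the workqueue
conversion being discussed, below is a minimal sketch of the pattern the
commit message describes: per-cpu work is normally queued with a time
bound, but is expedited to run immediately once the queue gets deep
enough. This is not the actual XFS code -- the struct, function and
constant names (inodegc_sketch, inodegc_sketch_queue, SKETCH_*) are made
up for illustration; only queue_delayed_work_on(), mod_delayed_work_on()
and msecs_to_jiffies() are the real kernel API.

#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define SKETCH_DELAY_MS		100	/* assumed upper bound before the work runs */
#define SKETCH_DEPTH_TRIGGER	32	/* assumed depth that forces an immediate run */

struct inodegc_sketch {
	struct delayed_work	work;	/* was a plain work_struct before the conversion */
	unsigned int		items;	/* inodes queued on this CPU */
};

static void
inodegc_sketch_queue(
	int			cpu,
	struct workqueue_struct	*wq,
	struct inodegc_sketch	*gc)
{
	if (gc->items >= SKETCH_DEPTH_TRIGGER) {
		/*
		 * Deep queue: run as soon as possible. mod_delayed_work_on()
		 * with a zero delay also pulls forward any timer that is
		 * already pending, giving the old queue_work_on() "run now"
		 * semantics.
		 */
		mod_delayed_work_on(cpu, wq, &gc->work, 0);
	} else {
		/*
		 * Shallow queue: guarantee the work runs within the time
		 * bound even if nothing else ever flushes the queue. If the
		 * work is already pending this is a no-op, so the existing
		 * timer keeps ticking.
		 */
		queue_delayed_work_on(cpu, wq, &gc->work,
				msecs_to_jiffies(SKETCH_DELAY_MS));
	}
}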