On Sat, Jun 18, 2022 at 07:52:45AM +1000, Dave Chinner wrote:
> On Fri, Jun 17, 2022 at 12:34:38PM -0400, Brian Foster wrote:
> > On Thu, Jun 16, 2022 at 08:04:15AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Currently inodegc work can sit queued on the per-cpu queue until
> > > the workqueue is either flushed or the queue reaches a depth that
> > > triggers work queuing (and later throttling). This means that we
> > > could queue work that waits for a long time for some other event
> > > to trigger flushing.
> > > 
> > > Hence instead of just queueing work at a specific depth, use a
> > > delayed work that queues the work at a bound time. We can still
> > > schedule the work immediately at a given depth, but we no longer
> > > need to worry about leaving a number of items on the list that
> > > won't get processed until external events prevail.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_icache.c | 36 ++++++++++++++++++++++--------------
> > >  fs/xfs/xfs_mount.h  |  2 +-
> > >  fs/xfs/xfs_super.c  |  2 +-
> > >  3 files changed, 24 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index 374b3bafaeb0..46b30ecf498c 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > ...
> > > @@ -2176,7 +2184,7 @@ xfs_inodegc_shrinker_scan(
> > >  			unsigned int	h = READ_ONCE(gc->shrinker_hits);
> > > 
> > >  			WRITE_ONCE(gc->shrinker_hits, h + 1);
> > > -			queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
> > > +			mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
> > >  			no_items = false;
> > >  		}
> > 
> > This all seems reasonable to me, but is there much practical benefit
> > to shrinker infra/feedback just to expedite a delayed work item by
> > one jiffy? Maybe there's a use case to continue to trigger
> > throttling..?
> 
> I haven't really considered doing anything other than fixing the
> reported bug. That just requires an API conversion for the existing
> "queue immediately" semantics and is the safest minimum change to fix
> the issue at hand.
> 
> So, yes, the shrinker code may (or may not) be superfluous now, but I
> haven't looked at it and done analysis of the behaviour without the
> shrinkers enabled. I'll do that in a completely separate patchset if
> it turns out that it is not needed now.

I think the shrinker part is still necessary -- bulkstat and xfs_scrub
on a very low memory machine (~560M RAM) opening and closing tens of
millions of files can still OOM the machine if one doesn't have a means
to slow down ->destroy_inode (and hence the next open()) when reclaim
really starts to dig in.

Without the shrinker bits, it's even easier to trigger OOM storms when
xfs has timer-delayed inactivation... which is something that Brian
pointed out a year ago when we were reviewing the initial inodegc
patchset.

> > If so, it looks like there's enough overhead in cycling through
> > every cpu in both callbacks that it might be worth spelling it out
> > more clearly in the top-level comment.
> 
> I'm not sure what you are asking here - mod_delayed_work_on() has
> pretty much the same overhead and behaviour as queue_work() in this
> case, so... ?

<shrug> Looks ok to me, since djwong-dev has had some variant of timer
delayed inactivation in it longer than it hasn't:

Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
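
P.S. For anyone following along who isn't familiar with the workqueue
conversion being discussed, below is a minimal sketch of the pattern the
commit message describes: per-cpu work is normally queued with a time
bound, but is expedited to run immediately once the queue gets deep
enough. This is not the actual XFS code -- the struct, function and
constant names (inodegc_sketch, inodegc_sketch_queue, SKETCH_*) are made
up for illustration; only queue_delayed_work_on(), mod_delayed_work_on()
and msecs_to_jiffies() are the real kernel API.

#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define SKETCH_DELAY_MS		100	/* assumed upper bound before the work runs */
#define SKETCH_DEPTH_TRIGGER	32	/* assumed depth that forces an immediate run */

struct inodegc_sketch {
	struct delayed_work	work;	/* was a plain work_struct before the conversion */
	unsigned int		items;	/* inodes queued on this CPU */
};

static void
inodegc_sketch_queue(
	int			cpu,
	struct workqueue_struct	*wq,
	struct inodegc_sketch	*gc)
{
	if (gc->items >= SKETCH_DEPTH_TRIGGER) {
		/*
		 * Deep queue: run as soon as possible. mod_delayed_work_on()
		 * with a zero delay also pulls forward any timer that is
		 * already pending, giving the old queue_work_on() "run now"
		 * semantics.
		 */
		mod_delayed_work_on(cpu, wq, &gc->work, 0);
	} else {
		/*
		 * Shallow queue: guarantee the work runs within the time
		 * bound even if nothing else ever flushes the queue. If the
		 * work is already pending this is a no-op, so the existing
		 * timer keeps ticking.
		 */
		queue_delayed_work_on(cpu, wq, &gc->work,
				msecs_to_jiffies(SKETCH_DELAY_MS));
	}
}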