Re: [PATCH 2/2] xfs: run blockgc on freeze to avoid iget stalls after reclaim

On Mon, Jan 24, 2022 at 11:57:12AM -0500, Brian Foster wrote:
> On Wed, Jan 19, 2022 at 04:36:36PM -0800, Darrick J. Wong wrote:
> > > Of course if you wanted to recycle inactive inodes or do something else
> > > entirely, then it's probably not worth going down this path..
> > 
> > I'm a bit partial to /trying/ to recycle inactive inodes because (a)
> > it's less tangling with -fsdevel for you and (b) inode scans in the
> > online repair patchset got a little weird once xfs_iget lost the ability
> > to recycle _NEEDS_INACTIVE inodes...
> > 
> > OTOH I tried to figure out how to deal with the lockless list that those
> > inodes are put on, and I couldn't figure out how to get them off the
> > list safely, so that might be a dead end.  If you have any ideas I'm all
> > ears. :)
> > 
> 
> So one of the things I've been kind of unclear about with the current
> deferred inactivation implementation is the primary goal of the percpu
> optimization. I can obviously see the value of the percpu list in
> general, but how much processing needs to occur in percpu context to
> achieve the primary goal?
> 
> For example, I can see how a single-threaded or small multi-threaded
> sustained file removal workload might be batched efficiently, but what
> happens if said task bounces around many CPUs?

In that case we have a scheduler problem, not a per-cpu
infrastructure issue.

> What if a system has hundreds of
> cpus and enough removal tasks to populate most or all of the
> queues?

It behaves identically to before the per-cpu inactivation queues
were added. Essentially, everything serialises and burns CPU
spinning on the CIL push lock regardless of where the work is coming
from. The per-cpu queues do not impact this behaviour at all, nor do
they change the distribution of the work that needs to be done.

Even Darrick's original proposal had this same behaviour:

https://lore.kernel.org/linux-xfs/20210801234709.GD2757197@xxxxxxxxxxxxxxxxxxx/
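
For reference, the queueing side of the mechanism is roughly this
pattern (a minimal sketch, not the actual fs/xfs/xfs_icache.c code;
all of the structure and function names below are made up for
illustration):

#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

/*
 * Greatly simplified sketch of per-cpu deferred inactivation
 * queueing - not the real XFS code.
 */
struct my_inode {
	struct llist_node	i_gclist;	/* deferred queue linkage */
	/* ... */
};

struct inodegc_queue {
	struct llist_head	list;		/* lockless per-cpu batch */
	struct work_struct	work;		/* processes the batch */
};

static DEFINE_PER_CPU(struct inodegc_queue, inodegc_queues);

/* Per-cpu setup (init_llist_head(), INIT_WORK()) omitted for brevity. */

static void inodegc_worker(struct work_struct *work)
{
	struct inodegc_queue *gc = container_of(work,
					struct inodegc_queue, work);
	struct llist_node *batch = llist_del_all(&gc->list);
	struct my_inode *ip, *next;

	llist_for_each_entry_safe(ip, next, batch, i_gclist) {
		/*
		 * Inactivate the inode: truncate remaining blocks,
		 * remove it from the AGI unlinked list, free it.
		 * All of that still serialises on the same shared
		 * resources (AGI buffers, CIL push lock), so the
		 * per-cpu queue changes where the work is queued,
		 * not how much contention the work itself hits.
		 */
	}
}

/* Called when the last reference to an unlinked inode is dropped. */
static void inodegc_defer(struct my_inode *ip)
{
	struct inodegc_queue *gc = get_cpu_ptr(&inodegc_queues);

	llist_add(&ip->i_gclist, &gc->list);
	queue_work(system_unbound_wq, &gc->work); /* no-op if pending */
	put_cpu_ptr(&inodegc_queues);
}

The batching falls out of llist_del_all() taking whatever has
accumulated on that CPU's list in one go; the inactivation work
itself, and its contention profile, is unchanged.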

> Is
> it worth having 200 percpu workqueue tasks doing block truncations and
> inode frees to a filesystem that might have something like 16-32 AGs?

If you have a workload with 200-way concurrency that hits a
filesystem path limited to 16-32 way concurrency by per-AG
locking (e.g. unlink needs to lock the AGI twice: once to put the
inode on the unlinked list, then again to remove it from that list
and free it), then you're only going to get 16-32 way concurrency
from that workload regardless of whether you have per-cpu algorithms
for other parts of the workload.
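
To put it another way, the choke point looks roughly like this (a toy
illustration of the per-AG cap, not the real unlink/inactivation code;
the names are made up and a mutex stands in for the locked AGI buffer):

#include <linux/mutex.h>

struct toy_ag {
	struct mutex	agi_lock;	/* stands in for the AGI buffer lock */
};

/* Phase 1: unlink puts the inode on the AGI unlinked list. */
static void toy_unlink(struct toy_ag *pag)
{
	mutex_lock(&pag->agi_lock);
	/* ... add the inode to an AGI unlinked list bucket ... */
	mutex_unlock(&pag->agi_lock);
}

/* Phase 2: inactivation removes it from the list and frees it. */
static void toy_inactivate(struct toy_ag *pag)
{
	mutex_lock(&pag->agi_lock);
	/* ... remove from the unlinked list, free the inode ... */
	mutex_unlock(&pag->agi_lock);
}

Both phases of every removal in an AG funnel through that one per-AG
lock, so no matter how many CPUs feed work in, at most one removal per
AG makes progress through either phase at a time.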

