On Thu, Oct 31, 2019 at 02:05:51PM -0700, Darrick J. Wong wrote:
> On Fri, Nov 01, 2019 at 07:50:49AM +1100, Dave Chinner wrote:
> > On Wed, Oct 30, 2019 at 08:06:58PM -0700, Darrick J. Wong wrote:
> > > > In the case of the xfs_bufs, I've been running workloads recently
> > > > that cache several million xfs_bufs and only a handful of inodes
> > > > rather than the other way around. If we spread inodes because
> > > > caching millions on a single node can cause problems on large NUMA
> > > > machines, then we also need to spread xfs_bufs...
> > >
> > > Hmm, could we capture this as a comment somewhere?
> >
> > Sure, but where? We're planning on getting rid of the KM_ZONE flags
> > in the near future, and most of this is specific to the impacts on
> > XFS. I could put it in xfs_super.c above where we initialise all the
> > slabs, I guess. Probably a separate patch, though....
>
> Sounds like a reasonable place (to me) to record the fact that we want
> inodes and metadata buffers not to end up concentrating on a single node.

Ok. I'll add yet another patch to the preliminary part of the series.
Any plans to take any of these first few patches in this cycle?

> At least until we start having NUMA systems with a separate "IO node" in
> which to confine all the IO threads and whatnot <shudder>. :P

Been there, done that, got the t-shirt and wore it out years ago.

IO-only nodes (either via software configuration, or real
cpu/memory-less IO nodes) are one of the reasons we don't want
node-local allocation behaviour for large NUMA configs...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx