[LSF/MM/BPF TOPIC] shrinker changes for non-blocking inode reclaim

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 30 Jan 2020 16:24:32 -0800

Hi all,

Dave Chinner sent a big long patchset[1] that changes the behavior of
the mm shrinkers in ways that will make it easier for us to make reclaim
of XFS inodes not block on IO.  I don't want to lead this discussion
myself, so I nominate Dave to do so; if he cannot make it to LSF, then I
will take over.

---

In a nutshell, we actually can make XFS inode reclaim mostly nonblocking
right now by shifting responsibility for doing one last flush of the
ondisk metadata to the XFS log.  After that, memory reclaim "merely" has
to poke the log ... but doing this naïvely causes IO storms from log
pokes started during direct reclaim.  Shifting that work to kswapd
instead results in unnecessary OOMs in direct reclaim, because direct
reclaim fails to free enough resources itself even though we're already
on our way to being able to free them.

What we need are some fairly minor changes to how the shrinkers work.
First, XFS needs to be able to tell a caller of its shrinker that we
freed X items, but that another Y items can be freed from a different
context (e.g. kswapd).  Second, we need a way to actually do that work
from a less-restrictive context (kswapd), and to have direct reclaim
throttle itself while kswapd is busy doing the work that direct reclaim
can't do.  Third, we need to teach kswapd how to discover that we're
running IO as fast as we can and that it needs to wait a little while
to let us catch up.
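To make the three changes above a little more concrete, here is a toy
model in Python.  It is not kernel code: ScanResult, ToyInodeCache,
direct_reclaim and kswapd_pass are hypothetical names that do not
correspond to the real struct shrinker API.  It assumes clean inodes can
be freed from any context while dirty ones need IO first, and that
kswapd works through deferred items against a fixed per-pass IO budget.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    freed: int     # "X": items freed in the calling context
    deferred: int  # "Y": items a less-restricted context (e.g. kswapd) could free


@dataclass
class ToyInodeCache:
    clean: int              # hypothetical: freeable immediately, no IO needed
    dirty: int              # hypothetical: need a log flush / IO before freeing
    deferred_work: int = 0  # items handed off from direct reclaim to "kswapd"

    def scan(self, nr: int, can_block: bool) -> ScanResult:
        """Free clean items first; free dirty items only if we may block."""
        n = min(nr, self.clean)
        self.clean -= n
        freed, nr = n, nr - n
        n = min(nr, self.dirty)
        if can_block:
            self.dirty -= n  # pretend the IO completed and the items were freed
            return ScanResult(freed + n, 0)
        return ScanResult(freed, n)  # report as deferrable instead of blocking


def direct_reclaim(cache: ToyInodeCache, want: int) -> tuple[int, bool]:
    """Change 1: non-blocking scan that reports X freed and Y deferred.

    Returns (freed, should_throttle)."""
    r = cache.scan(want, can_block=False)
    cache.deferred_work += r.deferred
    # Change 2: throttle (wait for kswapd) rather than declare OOM while
    # deferred work is still pending.
    throttle = r.freed < want and cache.deferred_work > 0
    return r.freed, throttle


def kswapd_pass(cache: ToyInodeCache, io_budget: int) -> tuple[int, bool]:
    """Change 2: do the deferred work from a context that may block.

    Returns (freed, should_back_off)."""
    todo = min(cache.deferred_work, io_budget)
    r = cache.scan(todo, can_block=True)
    cache.deferred_work -= todo
    # Change 3: leftover work means IO is saturated; wait before retrying.
    return r.freed, cache.deferred_work > 0


cache = ToyInodeCache(clean=10, dirty=50)
print(direct_reclaim(cache, 32))  # (10, True): 22 items deferred to kswapd
print(kswapd_pass(cache, 16))     # (16, True): budget used up, back off
print(kswapd_pass(cache, 16))     # (6, False): deferred work drained
```

The point of the model is that direct reclaim never blocks on IO, yet it
does not report failure either: it reports how much work is pending
elsewhere and waits, while kswapd paces itself against the IO it can
actually sustain.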

So the question(s) are: What do people think of these changes to
shrinker behavior?  Are they acceptable to the mm and fs communities?
If so, how do we stage these changes in tandem with the XFS changes so
that we can commit these new features and a user of them in the same
kernel cycle?

--D

[1] https://lore.kernel.org/linux-xfs/20191031234618.15403-1-david@xxxxxxxxxxxxx/