Hi all,

Dave Chinner sent a large patchset[1] that changes the behavior of the
mm shrinkers, which will make it easier for us to make reclaim of XFS
inodes not block on IO. I would rather not lead this discussion myself,
so I nominate Dave to do so; if he cannot make it to LSF, I will take
over.

---

In a nutshell, we can make XFS inode reclaim mostly nonblocking right
now by shifting responsibility for doing one last flush of the ondisk
metadata to the XFS log. After that, memory reclaim "merely" has to
poke the log ... but doing this naïvely causes IO storms, because the
log pokes are issued from direct reclaim. Shifting the log pokes to
kswapd instead results in unnecessary OOMs in direct reclaim: it fails
to free enough resources and gives up, even though we are already on
our way to being able to free them.

What we need are some fairly minor changes to how the shrinkers work.

First, XFS needs to be able to communicate to a caller of its shrinker
that it freed X items, and that another Y items can be freed from
another context (e.g. kswapd).

Second, we need a way to actually do that work from a less-restrictive
context (kswapd), and to have direct reclaim throttle itself while
kswapd is busy doing the work that direct reclaim cannot do.

Third, we need to teach kswapd how to discover that we're running IO as
fast as we can, and that it needs to wait a little while to let us
catch up.

So the questions are:

What do people think of these changes to shrinker behavior? Are they
acceptable to the mm and fs communities?

If so, how do we stage these changes in tandem with the XFS changes so
that we can commit these new features and a user of them in the same
kernel cycle?

--D

[1] https://lore.kernel.org/linux-xfs/20191031234618.15403-1-david@xxxxxxxxxxxxx/
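
To make the first two shrinker changes above a little more concrete,
here is a rough userspace sketch. It is only a toy model and not code
from the patchset: every name in it (struct shrink_result, toy_cache,
toy_shrink, direct_reclaim_should_throttle) is hypothetical, and
"kswapd frees the dirty items" stands in for the real work of poking
the log and waiting for IO. The third change (kswapd noticing that IO
is already running at full speed) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of one shrinker call (names are illustrative only). */
struct shrink_result {
	unsigned long freed;    /* X: items freed in the calling context */
	unsigned long deferred; /* Y: items only a less-restricted context can free */
};

/* Which reclaim context is calling the shrinker. */
enum reclaim_ctx { CTX_DIRECT, CTX_KSWAPD };

/* Toy cache: clean items can be freed at once, dirty ones need IO first. */
struct toy_cache {
	unsigned long clean;
	unsigned long dirty;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Scan up to nr_to_scan items. Clean items are freed in any context.
 * Dirty items are freed only from kswapd (assumption: kswapd is allowed
 * to do the blocking work); direct reclaim reports them as deferred.
 */
struct shrink_result toy_shrink(struct toy_cache *c, unsigned long nr_to_scan,
				enum reclaim_ctx ctx)
{
	struct shrink_result r = { 0, 0 };
	unsigned long n;

	assert(c != NULL);

	n = min_ul(nr_to_scan, c->clean);
	c->clean -= n;
	r.freed = n;
	nr_to_scan -= n;

	n = min_ul(nr_to_scan, c->dirty);
	if (ctx == CTX_KSWAPD) {
		c->dirty -= n;
		r.freed += n;
	} else {
		r.deferred = n; /* left in the cache for kswapd to handle */
	}
	return r;
}

/*
 * Direct reclaim missed its target, but deferred work exists: back off
 * and let kswapd make progress rather than treating this as an OOM.
 */
bool direct_reclaim_should_throttle(struct shrink_result r, unsigned long target)
{
	return r.freed < target && r.deferred > 0;
}
```

In this model, a direct reclaim call that asks for 6 items from a cache
holding 3 clean and 5 dirty items frees 3 and defers 3, so it throttles
instead of giving up; a later call from the kswapd context can then
free the dirty items.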