On Fri, Oct 26, 2018 at 10:57:35AM +0200, Michal Hocko wrote:
> Spock doesn't seem to be cced here - fixed now
>
> On Tue 23-10-18 16:43:29, Roman Gushchin wrote:
> > Spock reported that the commit 172b06c32b94 ("mm: slowly shrink slabs
> > with a relatively small number of objects") leads to a regression on
> > his setup: periodically the majority of the pagecache is evicted
> > without an obvious reason, while before the change the amount of free
> > memory was balancing around the watermark.
> >
> > The reason behind this is that the change mentioned above created some
> > minimal background pressure on the inode cache. The problem is that
> > if an inode is considered for reclaim, all of its attached pagecache
> > pages are stripped, no matter how many there are. So, if a huge
> > multi-gigabyte file is cached in memory, and the goal is to
> > reclaim only a few slab objects (unused inodes), we can still end up
> > evicting all gigabytes of the pagecache at once.
> >
> > The workload described by Spock has a few large non-mapped files in
> > the pagecache, so it's especially noticeable.
> >
> > To solve the problem, let's postpone the reclaim of inodes that have
> > more than 1 attached page. Let's wait until the pagecache pages have
> > been evicted naturally by scanning the corresponding LRU lists, and
> > only then reclaim the inode structure.
>
> Has this actually fixed/worked around the issue?

Spock wrote this earlier to me directly. I believe I can quote it here:

"Patch applied, looks good so far. System behaves like it did with
pre-4.18.15 kernels.

Also tried to add some user-level tests to the generic background
activity, like
- stat'ing a bunch of files
- streamed reads of several large files at once on ext4 and XFS
- random reads on the whole collection with a read size of 16K

I will be monitoring while fragmentation stacks up and report back if
something bad happens."

Spock, please let me know if you have any new results.

Thanks!