+ mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Wed, 24 Oct 2018 15:20:34 -0700

The patch titled
     Subject: mm: don't reclaim inodes with many attached pages
has been added to the -mm tree.  Its filename is
     mm-dont-reclaim-inodes-with-many-attached-pages.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-dont-reclaim-inodes-with-many-attached-pages.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-dont-reclaim-inodes-with-many-attached-pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Roman Gushchin <guro@xxxxxx>
Subject: mm: don't reclaim inodes with many attached pages

Spock reported that commit 172b06c32b94 ("mm: slowly shrink slabs with
a relatively small number of objects") causes a regression on his setup:
periodically, most of the pagecache is evicted for no obvious reason,
whereas before the change the amount of free memory hovered around the
watermark.

The reason is that the change mentioned above created some minimal
background pressure on the inode cache.  The problem is that when an inode
is selected for reclaim, all of its pagecache pages are stripped, no
matter how many there are.  So if a huge multi-gigabyte file is cached in
memory and the goal is to reclaim only a few slab objects (unused inodes),
we can still end up evicting gigabytes of pagecache at once.

The workload described by Spock has a few large non-mapped files in the
pagecache, so the effect is especially noticeable.

To solve the problem, let's postpone the reclaim of inodes that have more
than one attached page.  Let's wait until the pagecache pages are evicted
naturally by scanning the corresponding LRU lists, and only then reclaim
the inode structure.
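
The difference in behaviour can be illustrated with a minimal userspace
sketch.  This is not the kernel code: struct model_inode,
model_isolate_old() and model_isolate_new() are hypothetical names, the
MODEL_I_REFERENCED bit value is an illustrative assumption, and only the
two outcomes affected by this change (rotate or remove) are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these are not the kernel's types or values. */
enum model_lru_status { MODEL_LRU_REMOVED, MODEL_LRU_ROTATE };

#define MODEL_I_REFERENCED (1u << 0)	/* illustrative bit, an assumption */

struct model_inode {
	unsigned int i_state;
	unsigned long nrpages;	/* stands in for inode->i_data.nrpages */
};

/* Before the change: only recently referenced inodes get one more pass. */
static enum model_lru_status model_isolate_old(struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->i_state & MODEL_I_REFERENCED) {
		inode->i_state &= ~MODEL_I_REFERENCED;
		return MODEL_LRU_ROTATE;
	}
	/* In this model, removal drops the inode together with its pages. */
	return MODEL_LRU_REMOVED;
}

/* After the change: inodes with more than one attached page also rotate. */
static enum model_lru_status model_isolate_new(struct model_inode *inode)
{
	assert(inode != NULL);
	if ((inode->i_state & MODEL_I_REFERENCED) || inode->nrpages > 1) {
		inode->i_state &= ~MODEL_I_REFERENCED;
		return MODEL_LRU_ROTATE;
	}
	return MODEL_LRU_REMOVED;
}
```

For an unreferenced inode with 1048576 attached pages (4 GiB, assuming
4 KiB pages), model_isolate_old() returns MODEL_LRU_REMOVED, while
model_isolate_new() keeps returning MODEL_LRU_ROTATE until the page count
has dropped to one or zero.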

Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@xxxxxx
Signed-off-by: Roman Gushchin <guro@xxxxxx>
Reported-by: Spock <dairinin@xxxxxxxxx>
Reviewed-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>
Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/inode.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/fs/inode.c~mm-dont-reclaim-inodes-with-many-attached-pages
+++ a/fs/inode.c
@@ -730,8 +730,11 @@ static enum lru_status inode_lru_isolate
 		return LRU_REMOVED;
 	}
 
-	/* recently referenced inodes get one more pass */
-	if (inode->i_state & I_REFERENCED) {
+	/*
+	 * Recently referenced inodes and inodes with many attached pages
+	 * get one more pass.
+	 */
+	if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
 		inode->i_state &= ~I_REFERENCED;
 		spin_unlock(&inode->i_lock);
 		return LRU_ROTATE;
_

Patches currently in -mm which might be from guro@xxxxxx are

mm-dont-reclaim-inodes-with-many-attached-pages.patch
mm-rework-memcg-kernel-stack-accounting.patch
mm-drain-memcg-stocks-on-css-offlining.patch
mm-dont-miss-the-last-page-because-of-round-off-error.patch
mm-dont-miss-the-last-page-because-of-round-off-error-fix.patch
mm-dont-raise-memcg_oom-event-due-to-failed-high-order-allocation.patch