+ list_lru-combine-code-under-the-same-define.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Tue, 03 Jul 2018 13:59:18 -0700

The patch titled
     Subject: mm/list_lru.c: combine code under the same define
has been added to the -mm tree.  Its filename is
     list_lru-combine-code-under-the-same-define.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/list_lru-combine-code-under-the-same-define.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/list_lru-combine-code-under-the-same-define.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
Subject: mm/list_lru.c: combine code under the same define

Patch series "Improve shrink_slab() scalability (old complexity was O(n^2), new is O(n))", v8.

This patcheset solves the problem with slow shrink_slab() occuring on the
machines having many shrinkers and memory cgroups (i.e., with many
containers).  The problem is complexity of shrink_slab() is O(n^2) and it
grows too fast with the growth of containers numbers.

Let us have 200 containers, and every container has 10 mounts and 10
cgroups.  All container tasks are isolated, and they don't touch foreign
containers mounts.

In case of global reclaim, a task has to iterate all over the memcgs and
to call all the memcg-aware shrinkers for all of them.  This means, the
task has to visit 200 * 10 = 2000 shrinkers for every memcg, and since
there are 2000 memcgs, the total calls of do_shrink_slab() are 2000 * 2000
= 4000000.

4 million calls are not a number operations, which can takes 1 cpu cycle. 
E.g., super_cache_count() accesses at least two lists, and makes
arifmetical calculations.  Even, if there are no charged objects, we do
these calculations, and replaces cpu caches by read memory.  I observed
nodes spending almost 100% time in kernel, in case of intensive writing
and global reclaim.  The writer consumes pages fast, but it's need to
shrink_slab() before the reclaimer reached shrink pages function (and
frees SWAP_CLUSTER_MAX pages).  Even if there is no writing, the
iterations just waste the time, and slows reclaim down.

Let's see the small test below:

$echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
$mkdir /sys/fs/cgroup/memory/ct
$echo 4000M > /sys/fs/cgroup/memory/ct/memory.kmem.limit_in_bytes
$for i in `seq 0 4000`;
        do mkdir /sys/fs/cgroup/memory/ct/$i;
        echo $$ > /sys/fs/cgroup/memory/ct/$i/cgroup.procs;
        mkdir -p s/$i; mount -t tmpfs $i s/$i; touch s/$i/file;
done

Then, let's see drop caches time (5 sequential calls):
$time echo 3 > /proc/sys/vm/drop_caches

0.00user 13.78system 0:13.78elapsed 99%CPU
0.00user 5.59system 0:05.60elapsed 99%CPU
0.00user 5.48system 0:05.48elapsed 99%CPU
0.00user 8.35system 0:08.35elapsed 99%CPU
0.00user 8.34system 0:08.35elapsed 99%CPU

Last four calls don't actually shrink something.  So, the iterations over
slab shrinkers take 5.48 seconds.  Not so good for scalability.

The patchset solves the problem by making shrink_slab() of O(n)
complexity.  There are following functional actions:

1) Assign id to every registered memcg-aware shrinker.

2) Maintain per-memcgroup bitmap of memcg-aware shrinkers, and set a
   shrinker-related bit after the first element is added to lru list
   (also, when removed child memcg elements are reparanted).

3) Split memcg-aware shrinkers and !memcg-aware shrinkers, and call a
   shrinker if its bit is set in memcg's shrinker bitmap.  (Also, there is
   a functionality to clear the bit, after last element is shrinked).

This gives significant performance increase.  The result after patchset is
applied:

$time echo 3 > /proc/sys/vm/drop_caches

0.00user 1.10system 0:01.10elapsed 99%CPU
0.00user 0.00system 0:00.01elapsed 64%CPU
0.00user 0.01system 0:00.01elapsed 82%CPU
0.00user 0.00system 0:00.01elapsed 64%CPU
0.00user 0.01system 0:00.01elapsed 82%CPU

The results show the performance increases at least in 548 times.

So, the patchset makes shrink_slab() of less complexity and improves the
performance in such types of load I pointed.  This will give a profit in
case of !global reclaim case, since there also will be less
do_shrink_slab() calls.


This patch (of 17):

These two pairs of blocks of code are under the same #ifdef #else #endif.

Link: http://lkml.kernel.org/r/153063052519.1818.9393587113056959488.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
Acked-by: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Tested-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Philippe Ombredanne <pombredanne@xxxxxxxx>
Cc: Sahitya Tummala <stummala@xxxxxxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Cc: Matthias Kaehlcke <mka@xxxxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Josef Bacik <jbacik@xxxxxx>
Cc: Guenter Roeck <linux@xxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Li RongQing <lirongqing@xxxxxxxxx>
Cc: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/list_lru.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff -puN mm/list_lru.c~list_lru-combine-code-under-the-same-define mm/list_lru.c

--- a/mm/list_lru.c~list_lru-combine-code-under-the-same-define
+++ a/mm/list_lru.c
@@ -29,17 +29,7 @@ static void list_lru_unregister(struct l
 	list_del(&lru->list);
 	mutex_unlock(&list_lrus_mutex);
 }
-#else
-static void list_lru_register(struct list_lru *lru)
-{
-}
-
-static void list_lru_unregister(struct list_lru *lru)
-{
-}
-#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
 
-#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
 static inline bool list_lru_memcg_aware(struct list_lru *lru)
 {
 	/*
@@ -89,6 +79,14 @@ list_lru_from_kmem(struct list_lru_node
 	return list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
 }
 #else
+static void list_lru_register(struct list_lru *lru)
+{
+}
+
+static void list_lru_unregister(struct list_lru *lru)
+{
+}
+
 static inline bool list_lru_memcg_aware(struct list_lru *lru)
 {
 	return false;
_

Patches currently in -mm which might be from ktkhai@xxxxxxxxxxxxx are

memcg-remove-memcg_cgroup-id-from-idr-on-mem_cgroup_css_alloc-failure.patch
list_lru-combine-code-under-the-same-define.patch
mm-introduce-config_memcg_kmem-as-combination-of-config_memcg-config_slob.patch
mm-assign-id-to-every-memcg-aware-shrinker.patch
memcg-move-up-for_each_mem_cgroup-_tree-defines.patch
mm-assign-memcg-aware-shrinkers-bitmap-to-memcg.patch
mm-refactoring-in-workingset_init.patch
fs-refactoring-in-alloc_super.patch
fs-propagate-shrinker-id-to-list_lru.patch
list_lru-add-memcg-argument-to-list_lru_from_kmem.patch
list_lru-pass-dst_memcg-argument-to-memcg_drain_list_lru_node.patch
list_lru-pass-lru-argument-to-memcg_drain_list_lru_node.patch
mm-export-mem_cgroup_is_root.patch
mm-set-bit-in-memcg-shrinker-bitmap-on-first-list_lru-item-apearance.patch
mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab.patch
mm-add-shrink_empty-shrinker-methods-return-value.patch
mm-clear-shrinker-bit-if-there-are-no-objects-related-to-memcg.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html