During slab reclaim for a memcg, shrink_slab() iterates over all
registered shrinkers in the system and tries to count and consume the
objects related to the cgroup. Under memory pressure this behaves
badly: I observe high system time, with much of it spent in
list_lru_count_one(), for many processes on a RHEL7 kernel (collected
via $perf record --call-graph fp -j k -a):

  0,50%  nixstatsagent  [kernel.vmlinux]  [k] _raw_spin_lock               [k] _raw_spin_lock
  0,26%  nixstatsagent  [kernel.vmlinux]  [k] shrink_slab                  [k] shrink_slab
  0,23%  nixstatsagent  [kernel.vmlinux]  [k] super_cache_count            [k] super_cache_count
  0,15%  nixstatsagent  [kernel.vmlinux]  [k] __list_lru_count_one.isra.2  [k] _raw_spin_lock
  0,15%  nixstatsagent  [kernel.vmlinux]  [k] list_lru_count_one           [k] __list_lru_count_one.isra.2

  0,94%  mysqld         [kernel.vmlinux]  [k] _raw_spin_lock               [k] _raw_spin_lock
  0,57%  mysqld         [kernel.vmlinux]  [k] shrink_slab                  [k] shrink_slab
  0,51%  mysqld         [kernel.vmlinux]  [k] super_cache_count            [k] super_cache_count
  0,32%  mysqld         [kernel.vmlinux]  [k] __list_lru_count_one.isra.2  [k] _raw_spin_lock
  0,32%  mysqld         [kernel.vmlinux]  [k] list_lru_count_one           [k] __list_lru_count_one.isra.2

  0,73%  sshd           [kernel.vmlinux]  [k] _raw_spin_lock               [k] _raw_spin_lock
  0,35%  sshd           [kernel.vmlinux]  [k] shrink_slab                  [k] shrink_slab
  0,32%  sshd           [kernel.vmlinux]  [k] super_cache_count            [k] super_cache_count
  0,21%  sshd           [kernel.vmlinux]  [k] __list_lru_count_one.isra.2  [k] _raw_spin_lock
  0,21%  sshd           [kernel.vmlinux]  [k] list_lru_count_one           [k] __list_lru_count_one.isra.2

This patch aims to make super_cache_count() more efficient. It makes
__list_lru_count_one() read nr_items locklessly, to minimize the
overhead introduced by the locking operations and to make parallel
reclaim more scalable. The lock is no longer taken in
shrinker::count_objects(); it is taken only for the actual shrink, by
the thread that performs it.
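
For illustration only, here is a user-space sketch of the idea: the
count path reads the item counter without taking the lock, while the
add and shrink paths still serialize on it. All names (demo_lru,
demo_lru_count() and so on) are hypothetical and are not kernel API.
C11 atomics and a pthread mutex stand in for the kernel primitives, and
the sketch does not model the rcu_read_lock()/rcu_read_unlock() pair
that the patch places around the read.

```c
/* User-space sketch only; every name here is hypothetical. */
#include <pthread.h>
#include <stdatomic.h>

struct demo_lru {
	pthread_mutex_t lock;	/* serializes modifications (stand-in for nlru->lock) */
	atomic_long nr_items;	/* may be read without holding the lock */
};

static void demo_lru_init(struct demo_lru *lru)
{
	pthread_mutex_init(&lru->lock, NULL);
	atomic_init(&lru->nr_items, 0);
}

/* Add path: takes the lock, as a real list insertion would. */
static void demo_lru_add(struct demo_lru *lru)
{
	pthread_mutex_lock(&lru->lock);
	/* ... a real list insertion would happen here ... */
	atomic_fetch_add_explicit(&lru->nr_items, 1, memory_order_relaxed);
	pthread_mutex_unlock(&lru->lock);
}

/* Shrink path: takes the lock; returns how many items were removed. */
static long demo_lru_shrink(struct demo_lru *lru, long nr_to_scan)
{
	long n;

	pthread_mutex_lock(&lru->lock);
	n = atomic_load_explicit(&lru->nr_items, memory_order_relaxed);
	if (n > nr_to_scan)
		n = nr_to_scan;
	/* ... a real list removal would happen here ... */
	atomic_fetch_sub_explicit(&lru->nr_items, n, memory_order_relaxed);
	pthread_mutex_unlock(&lru->lock);

	return n;
}

/* Count path: no lock taken; the value may be slightly stale. */
static long demo_lru_count(struct demo_lru *lru)
{
	return atomic_load_explicit(&lru->nr_items, memory_order_relaxed);
}
```

A stale count is tolerable in this sketch because the value only
serves as a hint for how much to scan; demo_lru_shrink() re-reads the
counter under the lock before removing anything.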

https://jira.sw.ru/browse/PSBM-69296

Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
---
 mm/list_lru.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 2db3cdadb577..8d1d2db5f4fb 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -177,10 +177,10 @@ static unsigned long __list_lru_count_one(struct list_lru *lru,
 	struct list_lru_one *l;
 	unsigned long count;
 
-	spin_lock(&nlru->lock);
+	rcu_read_lock();
 	l = list_lru_from_memcg_idx(nlru, memcg_idx);
 	count = l->nr_items;
-	spin_unlock(&nlru->lock);
+	rcu_read_unlock();
 
 	return count;
 }