+ mm-reparent-memcg-kmem_caches-on-cgroup-removal-fix.patch added to -mm tree

The patch titled
     Subject: mm: memcg/slab: properly handle kmem_caches reparented to root_mem_cgroup
has been added to the -mm tree.  Its filename is
     mm-reparent-memcg-kmem_caches-on-cgroup-removal-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-reparent-memcg-kmem_caches-on-cgroup-removal-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-reparent-memcg-kmem_caches-on-cgroup-removal-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Roman Gushchin <guro@xxxxxx>
Subject: mm: memcg/slab: properly handle kmem_caches reparented to root_mem_cgroup

As a result of reparenting, a kmem_cache might belong to the root memory
cgroup.  This happens when a top-level memory cgroup is removed and all
of its associated kmem_caches are reparented to the root memory cgroup.

The root memory cgroup is special and requires special handling.  Let's
make sure that we never try to charge or uncharge it, and that we handle
system-wide vmstats exactly as we do for root kmem_caches.

Note that we still need to adjust the kmem_cache reference counter, so
that the kmem_cache can be released properly.
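
For readability, here is a condensed sketch of the charge-path logic
this patch introduces.  It mirrors the mm/slab.h diff below, but the
helper name memcg_charge_slab_sketch() is hypothetical, and the
rcu-protected walk to a live parent memcg and the post-charge lruvec
accounting of the real memcg_charge_slab() are elided:

	/* Sketch only -- see the diff below for the real change. */
	static inline int memcg_charge_slab_sketch(struct page *page,
						   gfp_t gfp, int order,
						   struct kmem_cache *s)
	{
		struct mem_cgroup *memcg = READ_ONCE(s->memcg_params.memcg);

		/*
		 * After reparenting, memcg may be the root memory cgroup.
		 * The root cgroup is never charged or uncharged: only the
		 * system-wide per-node vmstat counters are updated, exactly
		 * as for root kmem_caches, but the per-page references on
		 * the cache's refcnt are still taken so that the kmem_cache
		 * can be released properly.
		 */
		if (unlikely(!memcg || mem_cgroup_is_root(memcg))) {
			mod_node_page_state(page_pgdat(page),
					    cache_vmstat_idx(s),
					    (1 << order));
			percpu_ref_get_many(&s->memcg_params.refcnt,
					    1 << order);
			return 0;
		}

		/* Non-root memcg: charge it as before. */
		return memcg_kmem_charge_memcg(page, gfp, order, memcg);
	}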

The issue was discovered by running CRIU tests; the following warning
appeared:

[  381.345960] WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62 page_counter_cancel+0x26/0x30
[  381.345992] Modules linked in:
[  381.345998] CPU: 0 PID: 11655 Comm: kworker/0:8 Not tainted 5.2.0-rc5-next-20190618+ #1
[  381.346001] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  381.346010] Workqueue: memcg_kmem_cache kmemcg_workfn
[  381.346013] RIP: 0010:page_counter_cancel+0x26/0x30
[  381.346017] Code: 1f 44 00 00 0f 1f 44 00 00 48 89 f0 53 48 f7 d8 f0 48 0f c1 07 48 29 f0 48 89 c3 48 89 c6 e8 61 ff ff ff 48 85 db 78 02 5b c3 <0f> 0b 5b c3 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 ff 74 41 41 55
[  381.346019] RSP: 0018:ffffb3b34319f990 EFLAGS: 00010086
[  381.346022] RAX: fffffffffffffffc RBX: fffffffffffffffc RCX: 0000000000000004
[  381.346024] RDX: 0000000000000000 RSI: fffffffffffffffc RDI: ffff9c2cd7165270
[  381.346026] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000001
[  381.346028] R10: 00000000000000c8 R11: ffff9c2cd684e660 R12: 00000000fffffffc
[  381.346030] R13: 0000000000000002 R14: 0000000000000006 R15: ffff9c2c8ce1f200
[  381.346033] FS:  0000000000000000(0000) GS:ffff9c2cd8200000(0000) knlGS:0000000000000000
[  381.346039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  381.346041] CR2: 00000000007be000 CR3: 00000001cdbfc005 CR4: 00000000001606f0
[  381.346043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  381.346045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  381.346047] Call Trace:
[  381.346054]  page_counter_uncharge+0x1d/0x30
[  381.346065]  __memcg_kmem_uncharge_memcg+0x39/0x60
[  381.346071]  __free_slab+0x34c/0x460
[  381.346079]  deactivate_slab.isra.80+0x57d/0x6d0
[  381.346088]  ? add_lock_to_list.isra.36+0x9c/0xf0
[  381.346095]  ? __lock_acquire+0x252/0x1410
[  381.346106]  ? cpumask_next_and+0x19/0x20
[  381.346110]  ? slub_cpu_dead+0xd0/0xd0
[  381.346113]  flush_cpu_slab+0x36/0x50
[  381.346117]  ? slub_cpu_dead+0xd0/0xd0
[  381.346125]  on_each_cpu_mask+0x51/0x70
[  381.346131]  ? ksm_migrate_page+0x60/0x60
[  381.346134]  on_each_cpu_cond_mask+0xab/0x100
[  381.346143]  __kmem_cache_shrink+0x56/0x320
[  381.346150]  ? ret_from_fork+0x3a/0x50
[  381.346157]  ? unwind_next_frame+0x73/0x480
[  381.346176]  ? __lock_acquire+0x252/0x1410
[  381.346188]  ? kmemcg_workfn+0x21/0x50
[  381.346196]  ? __mutex_lock+0x99/0x920
[  381.346199]  ? kmemcg_workfn+0x21/0x50
[  381.346205]  ? kmemcg_workfn+0x21/0x50
[  381.346216]  __kmemcg_cache_deactivate_after_rcu+0xe/0x40
[  381.346220]  kmemcg_cache_deactivate_after_rcu+0xe/0x20
[  381.346223]  kmemcg_workfn+0x31/0x50
[  381.346230]  process_one_work+0x23c/0x5e0
[  381.346241]  worker_thread+0x3c/0x390
[  381.346248]  ? process_one_work+0x5e0/0x5e0
[  381.346252]  kthread+0x11d/0x140
[  381.346255]  ? kthread_create_on_node+0x60/0x60
[  381.346261]  ret_from_fork+0x3a/0x50
[  381.346275] irq event stamp: 10302
[  381.346278] hardirqs last  enabled at (10301): [<ffffffffb2c1a0b9>] _raw_spin_unlock_irq+0x29/0x40
[  381.346282] hardirqs last disabled at (10302): [<ffffffffb2182289>] on_each_cpu_mask+0x49/0x70
[  381.346287] softirqs last  enabled at (10262): [<ffffffffb2191f4a>] cgroup_idr_replace+0x3a/0x50
[  381.346290] softirqs last disabled at (10260): [<ffffffffb2191f2d>] cgroup_idr_replace+0x1d/0x50
[  381.346293] ---[ end trace b324ba73eb3659f0 ]---

v2: fixed return value from memcg_charge_slab(), spotted by Shakeel

Link: http://lkml.kernel.org/r/20190620213427.1691847-1-guro@xxxxxx
Signed-off-by: Roman Gushchin <guro@xxxxxx>
Reported-by: Andrei Vagin <avagin@xxxxxxxxx>
Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Pekka Enberg <penberg@xxxxxxxxxx>
Cc: Qian Cai <cai@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/slab.h |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/mm/slab.h~mm-reparent-memcg-kmem_caches-on-cgroup-removal-fix
+++ a/mm/slab.h
@@ -294,8 +294,12 @@ static __always_inline int memcg_charge_
 		memcg = parent_mem_cgroup(memcg);
 	rcu_read_unlock();
 
-	if (unlikely(!memcg))
-		return true;
+	if (unlikely(!memcg || mem_cgroup_is_root(memcg))) {
+		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
+				    (1 << order));
+		percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
+		return 0;
+	}
 
 	ret = memcg_kmem_charge_memcg(page, gfp, order, memcg);
 	if (ret)
@@ -324,9 +328,14 @@ static __always_inline void memcg_unchar
 
 	rcu_read_lock();
 	memcg = READ_ONCE(s->memcg_params.memcg);
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-	mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
-	memcg_kmem_uncharge_memcg(page, order, memcg);
+	if (likely(!mem_cgroup_is_root(memcg))) {
+		lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
+		mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
+		memcg_kmem_uncharge_memcg(page, order, memcg);
+	} else {
+		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
+				    -(1 << order));
+	}
 	rcu_read_unlock();
 
 	percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
_
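
A note on the reference counting pairing, as it reads from the hunks
above: the uncharge path drops (1 << order) references unconditionally
at its end, so the charge path must take the same (1 << order)
references in every branch, including the root-cgroup one (the non-root
branch presumably takes them after a successful charge, outside the
context shown).  Otherwise the cache's percpu refcnt would underflow and
a reparented kmem_cache could be freed while slab pages still belong to
it:

	/* charge path: both branches must end up holding the refs */
	percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);

	/* uncharge path: dropped unconditionally, root or not */
	percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);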

Patches currently in -mm which might be from guro@xxxxxx are

mm-postpone-kmem_cache-memcg-pointer-initialization-to-memcg_link_cache.patch
mm-rename-slab-delayed-deactivation-functions-and-fields.patch
mm-generalize-postponed-non-root-kmem_cache-deactivation.patch
mm-introduce-__memcg_kmem_uncharge_memcg.patch
mm-unify-slab-and-slub-page-accounting.patch
mm-dont-check-the-dying-flag-on-kmem_cache-creation.patch
mm-synchronize-access-to-kmem_cache-dying-flag-using-a-spinlock.patch
mm-rework-non-root-kmem_cache-lifecycle-management.patch
mm-stop-setting-page-mem_cgroup-pointer-for-slab-pages.patch
mm-reparent-memcg-kmem_caches-on-cgroup-removal.patch
mm-reparent-memcg-kmem_caches-on-cgroup-removal-fix.patch
