[RFC] memcg: fix race between css_offline and async charge (was: Re: Possible regression with cgroups in 3.11)

Johannes, what do you think about something like this?
I have just compile tested it.
--- 
From 73042adc905847bfe401ae12073d1c479db8fdab Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxx>
Date: Wed, 13 Nov 2013 13:53:54 +0100
Subject: [PATCH] memcg: fix race between css_offline and async charge

As Johannes correctly pointed out, charges done on behalf of a group
that the current task is not a member of (currently the swap accounting
path) are racy wrt. memcg offlining (mem_cgroup_css_offline).

This might lead to a charge leak as follows:

                           rcu_read_lock()
                           css_tryget()
                           rcu_read_unlock()
disable tryget()
call_rcu()
  offline_css()
    reparent_charges()
                           res_counter_charge()
                           css_put()
                             css_free()
                           pc->mem_cgroup = deadcg
                           add page to lru

If the group has a parent, the parent's res_counter ends up with a
charge that has no corresponding page on any LRU reachable under its
hierarchy, so the parent won't be able to free/reparent its own charges
when going away and will loop in mem_cgroup_reparent_charges forever.

This patch fixes the issue by introducing a memcg->offline flag which is
set when the memcg is offlined (and is no longer reachable).

The only async charger we currently have (the swapin accounting path)
checks the offline status after a successful charge; if the group has
gone offline in the meantime, it uncharges and falls back to charging
the current task's group.

Spotted-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
 mm/memcontrol.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d1fded477ef6..c75c7244d96d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -312,6 +312,7 @@ struct mem_cgroup {
 	spinlock_t pcp_counter_lock;
 
 	atomic_t	dead_count;
+	bool		offline;
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET)
 	struct tcp_memcontrol tcp_mem;
 #endif
@@ -3794,6 +3795,24 @@ void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
 	preempt_enable();
 }
 
 +/*
 + * A memcg is marked offline before it reparents its charges,
 + * and any async charger has to recheck mem_cgroup_is_offline()
 + * after a successful charge to make sure that the memcg hasn't
 + * gone offline in the meantime.
 + */
+static void mem_cgroup_mark_offline(struct mem_cgroup *memcg)
+{
+	memcg->offline = true;
+	smp_wmb();
+}
+
+static bool mem_cgroup_is_offline(struct mem_cgroup *memcg)
+{
+	smp_rmb();
+	return memcg->offline;
+}
+
 /**
  * mem_cgroup_move_account - move account of the page
  * @page: the page
@@ -4006,6 +4025,24 @@ static int __mem_cgroup_try_charge_swapin(struct mm_struct *mm,
 		goto charge_cur_mm;
 	*memcgp = memcg;
 	ret = __mem_cgroup_try_charge(NULL, mask, 1, memcgp, true);
+
 +	/*
 +	 * try_get_mem_cgroup_from_page and the actual charge are not
 +	 * done in the same RCU read section, which means that the memcg
 +	 * might get offlined before the res counter is charged, so we
 +	 * have to recheck the memcg status here and revert the charge,
 +	 * as we cannot be sure it was accounted properly.
 +	 */
+	if (!ret) {
+		if (mem_cgroup_is_offline(memcg)) {
+			__mem_cgroup_cancel_charge(memcg, 1);
+			/* from try_get_mem_cgroup_from_page */
+			css_put(&memcg->css);
+			*memcgp = NULL;
+			goto charge_cur_mm;
+		}
+	}
+
 	css_put(&memcg->css);
 	if (ret == -EINTR)
 		ret = 0;
@@ -6342,6 +6379,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 
 	kmem_cgroup_css_offline(memcg);
 
+	mem_cgroup_mark_offline(memcg);
 	mem_cgroup_invalidate_reclaim_iterators(memcg);
 	mem_cgroup_reparent_charges(memcg);
 	mem_cgroup_destroy_all_caches(memcg);
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs