On Thu, 20 Mar 2014, Greg KH wrote:
> On Wed, Mar 19, 2014 at 10:59:27PM -0700, Hugh Dickins wrote:
> > On Wed, 19 Mar 2014, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> > >
> > > The patch below does not apply to the 3.13-stable tree.
> > > If someone wants it applied there, or to any other stable or longterm
> > > tree, then please email the backport, including the original git commit
> > > id to <stable@xxxxxxxxxxxxxxx>.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > ------------------ version for 3.13.7 ------------------
>
> Thanks, now applied.

Thanks for including this in the 3.13.7-stable review series, Greg.
But I notice there wasn't one in the 3.10.34-stable review series:
I now think that I should have interpreted your FAILED-on-3.13 mail
as a prompt to send equivalents for the earlier releases too, sorry.

The version for Jiri's 3.12.15 would be the same as that for 3.13.7.
But the version for 3.10.34 (or perhaps now 3.10.35) is below. Yes,
there are more differences, and the old mem_cgroup_reparent_charges()
line is intentionally left in for 3.10, whereas it was removed for
3.12+: that's because the css/cgroup iterator changed in between; it
used not to supply the root of the subtree, but nowadays it does.
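For comparison, the 3.12+/3.13 shape of the same loop looks roughly
like this (a sketch only, recalled from the mainline commit; details
such as the kmem handling around it are omitted):

	css_for_each_descendant_post(iter, css)
		mem_cgroup_reparent_charges(mem_cgroup_from_css(iter));

Since css_for_each_descendant_post() visits the root css as well, the
3.12+ version needs no separate mem_cgroup_reparent_charges(memcg)
call, whereas the 3.10 cgroup_for_each_descendant_post() in the patch
below skips the subtree root, hence the extra line kept here.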
Thanks,
Hugh

------------------ version for 3.10.34 ------------------

From 4fb1a86fb5e4209a7d4426d4e586c58e9edc74ac Mon Sep 17 00:00:00 2001
From: Filipe Brandenburger <filbranden@xxxxxxxxxx>
Date: Mon, 3 Mar 2014 15:38:25 -0800
Subject: [PATCH] memcg: reparent charges of children before processing parent

Sometimes the cleanup after memcg hierarchy testing gets stuck in
mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.

There may turn out to be several causes, but a major cause is this: the
workitem to offline parent can get run before workitem to offline child;
parent's mem_cgroup_reparent_charges() circles around waiting for the
child's pages to be reparented to its lrus, but it's holding cgroup_mutex
which prevents the child from reaching its mem_cgroup_reparent_charges().

Further testing showed that an ordered workqueue for cgroup_destroy_wq
is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
stage on the way can mess up the order before reaching the workqueue.

Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
all its children (and grandchildren, in the correct order) to have their
charges reparented first.

Fixes: e5fca243abae ("cgroup: use a dedicated workqueue for cgroup destruction")
Signed-off-by: Filipe Brandenburger <filbranden@xxxxxxxxxx>
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
Reviewed-by: Tejun Heo <tj@xxxxxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> [v3.10+]
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6326,9 +6326,23 @@ static void mem_cgroup_invalidate_reclai
 static void mem_cgroup_css_offline(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	struct cgroup *iter;
 
 	mem_cgroup_invalidate_reclaim_iterators(memcg);
+
+	/*
+	 * This requires that offlining is serialized.  Right now that is
+	 * guaranteed because css_killed_work_fn() holds the cgroup_mutex.
+	 */
+	rcu_read_lock();
+	cgroup_for_each_descendant_post(iter, cont) {
+		rcu_read_unlock();
+		mem_cgroup_reparent_charges(mem_cgroup_from_cont(iter));
+		rcu_read_lock();
+	}
+	rcu_read_unlock();
 	mem_cgroup_reparent_charges(memcg);
+
 	mem_cgroup_destroy_all_caches(memcg);
 }
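To make the ordering argument concrete, here is a toy userspace model
(illustration only; these structs and helpers are invented for the
example, not kernel API) of draining charges in post-order, so that no
parent is ever left waiting on an unprocessed child:

/*
 * Toy model of post-order charge reparenting: fold each group's usage
 * into its parent before the parent itself is offlined -- the ordering
 * that cgroup_for_each_descendant_post() guarantees in the patch above,
 * independent of workqueue scheduling.
 */
#include <stdio.h>

struct group {
	const char *name;
	long usage;		/* pages charged directly to this group */
	struct group *parent;
	struct group *child;	/* first child */
	struct group *sibling;	/* next sibling */
};

/* Move all of g's charges up to its parent (the root keeps its own). */
static void reparent_charges(struct group *g)
{
	if (g->parent) {
		g->parent->usage += g->usage;
		g->usage = 0;
	}
}

/* Children first, subtree root last: no parent waits on a child. */
static void offline_subtree(struct group *g)
{
	struct group *c;

	for (c = g->child; c; c = c->sibling)
		offline_subtree(c);
	reparent_charges(g);
	printf("offlined %-4s usage now %ld\n", g->name, g->usage);
}

int main(void)
{
	struct group root = { .name = "root", .usage = 1 };
	struct group a    = { .name = "A",    .usage = 10, .parent = &root };
	struct group a1   = { .name = "A/1",  .usage = 5,  .parent = &a };
	struct group a2   = { .name = "A/2",  .usage = 7,  .parent = &a };

	root.child = &a;
	a.child = &a1;
	a1.sibling = &a2;

	offline_subtree(&a);	/* A/1 and A/2 drain into A, then A into root */
	printf("root usage: %ld\n", root.usage);	/* 1+10+5+7 = 23 */
	return 0;
}

Because every child is drained before its own parent is processed, the
parent's pass never has to spin waiting for a child that cannot run --
which is exactly the deadlock described in the commit message.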