This is a note to let you know that I've just added the patch titled memcg: fix css reference leak and endless loop in mem_cgroup_iter to the 3.13-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: memcg-fix-css-reference-leak-and-endless-loop-in-mem_cgroup_iter.patch and it can be found in the queue-3.13 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 0eef615665ede1e0d603ea9ecca88c1da6f02234 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@xxxxxxx> Date: Thu, 23 Jan 2014 15:53:37 -0800 Subject: memcg: fix css reference leak and endless loop in mem_cgroup_iter From: Michal Hocko <mhocko@xxxxxxx> commit 0eef615665ede1e0d603ea9ecca88c1da6f02234 upstream. Commit 19f39402864e ("memcg: simplify mem_cgroup_iter") has reorganized mem_cgroup_iter code in order to simplify it. A part of that change was dropping an optimization which didn't call css_tryget on the root of the walked tree. The patch however didn't change the css_put part in mem_cgroup_iter which excludes root. This wasn't an issue at the time because __mem_cgroup_iter_next bailed out for root early without taking a reference as cgroup iterators (css_next_descendant_pre) didn't visit root themselves. Nevertheless cgroup iterators have been reworked to visit root by commit bd8815a6d802 ("cgroup: make css_for_each_descendant() and friends include the origin css in the iteration") when the root bypass have been dropped in __mem_cgroup_iter_next. This means that css_put is not called for root and so css along with mem_cgroup and other cgroup internal object tied by css lifetime are never freed. Fix the issue by reintroducing root check in __mem_cgroup_iter_next and do not take css reference for it. This reference counting magic protects us also from another issue, an endless loop reported by Hugh Dickins when reclaim races with root removal and css_tryget called by iterator internally would fail. There would be no other nodes to visit so __mem_cgroup_iter_next would return NULL and mem_cgroup_iter would interpret it as "start looping from root again" and so mem_cgroup_iter would loop forever internally. Signed-off-by: Michal Hocko <mhocko@xxxxxxx> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx> Tested-by: Hugh Dickins <hughd@xxxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Greg Thelen <gthelen@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- mm/memcontrol.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1098,14 +1098,22 @@ skip_node: * skipped and we should continue the tree walk. * last_visited css is safe to use because it is * protected by css_get and the tree walk is rcu safe. + * + * We do not take a reference on the root of the tree walk + * because we might race with the root removal when it would + * be the only node in the iterated hierarchy and mem_cgroup_iter + * would end up in an endless loop because it expects that at + * least one valid node will be returned. Root cannot disappear + * because caller of the iterator should hold it already so + * skipping css reference should be safe. */ if (next_css) { - if ((next_css->flags & CSS_ONLINE) && css_tryget(next_css)) + if ((next_css->flags & CSS_ONLINE) && + (next_css == &root->css || css_tryget(next_css))) return mem_cgroup_from_css(next_css); - else { - prev_css = next_css; - goto skip_node; - } + + prev_css = next_css; + goto skip_node; } return NULL; Patches currently in stable-queue which might be from mhocko@xxxxxxx are queue-3.13/mm-page-writeback.c-do-not-count-anon-pages-as-dirtyable-memory.patch queue-3.13/memcg-fix-css-reference-leak-and-endless-loop-in-mem_cgroup_iter.patch queue-3.13/memcg-fix-endless-loop-caused-by-mem_cgroup_iter.patch queue-3.13/mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch queue-3.13/mm-page-writeback.c-fix-dirty_balance_reserve-subtraction-from-dirtyable-memory.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html