Hi all,

this is the third version of the patchset previously posted here:
https://lkml.org/lkml/2012/11/26/616

The patch set tries to make mem_cgroup_iter saner in the way it walks
hierarchies. The css->id based traversal is far from ideal: it is not
deterministic because it depends on the creation ordering. The diffstat
no longer looks as promising as in the previous versions, but I think
the resulting outcome (and the sanity ;)) is worth it.

The first patch fixes a potential misbehavior which I haven't actually
seen, but the fix is needed for the later patches anyway. It could be
taken on its own as well, but I do not have any bug report to base the
fix on.

The second patch is also preparatory and is new to the series.

The third patch is the core of the patchset. It replaces the css_id
based css_get_next with the generic cgroup pre-order iterator, which
means that css_id is no longer used by memcg. This brings some
challenges for caching the last visited group during reclaim
(mem_cgroup_per_zone::reclaim_iter). We now have to use memcg pointers
directly, which means we have to hold a reference on those groups' css
to keep them alive.

The next patch fixes up an unbounded cgroup removal holdoff caused by
the elevated css refcount and does the clean up on group removal.
Thanks to Ying who spotted this during testing of the previous version
of the patchset. I could have folded it into the previous patch, but I
felt the result would be too big to review; if people feel it would be
better that way, I have no problem squashing them together.

The fifth and sixth patches are an attempt to simplify mem_cgroup_iter.
The css juggling is removed and the iteration logic is moved to a
helper so that reference counting and iteration are separated.

The last patch just removes css_get_next as there is no user for it any
longer.
I am also thinking that leaf-to-root iteration makes more sense, but
that patch is not included in the series yet because I have to think
some more about the justification.

As with the previous version, I have tested with a quite simple
hierarchy:

        A (limit = 280M, use_hierarchy=true)
      / | \
     B  C  D (all have a 100M limit)

and a separate kernel build in each leaf group. This triggers both
children-only and hierarchical reclaim in parallel, so the reclaim_iter
caching is active a lot. I will hammer it some more, but the series
should already be in quite a good shape.

Michal Hocko (7):
      memcg: synchronize per-zone iterator access by a spinlock
      memcg: keep prev's css alive for the whole mem_cgroup_iter
      memcg: rework mem_cgroup_iter to use cgroup iterators
      memcg: remove memcg from the reclaim iterators
      memcg: simplify mem_cgroup_iter
      memcg: further simplify mem_cgroup_iter
      cgroup: remove css_get_next

And the diffstat says:

 include/linux/cgroup.h |    7 --
 kernel/cgroup.c        |   49 ------------
 mm/memcontrol.c        |  199 ++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 175 insertions(+), 80 deletions(-)