Hi all,

this is the third version of the patchset previously posted here:
https://lkml.org/lkml/2012/11/26/616

The patch set tries to make mem_cgroup_iter saner in the way it walks
hierarchies. The css->id based traversal is far from ideal: it is not
deterministic because it depends on the creation ordering. The diffstat
no longer looks as promising as in the previous versions, but I think
the resulting outcome (and the sanity ;)) is worth it.

The first patch fixes a potential misbehavior which I haven't actually
seen, but the fix is needed for the later patches anyway. It could be
taken on its own as well, but I do not have any bug report to base the
fix on.

The second patch is also preparatory and is new to the series.

The third patch is the core of the patchset. It replaces the css_id
based css_get_next with the generic cgroup pre-order iterator, which
means that css_id is no longer used by memcg. This brings some
challenges for caching the last visited group during reclaim
(mem_cgroup_per_zone::reclaim_iter). We now have to use memcg pointers
directly, which means we have to hold a reference on those groups' css
to keep them alive.

The next patch fixes up an unbounded cgroup removal holdoff caused by
the elevated css refcount and does the clean up on group removal.
Thanks to Ying who spotted this during testing of the previous version
of the patchset. I could have folded it into the previous patch, but I
felt the result would be too big to review; if people feel it would be
better that way, I have no problem squashing them together.

The fifth and sixth patches are an attempt to simplify mem_cgroup_iter.
The css juggling is removed and the iteration logic is moved to a
helper so that reference counting and iteration are separated.

The last patch just removes css_get_next as there is no user for it any
longer.
I am also thinking that leaf-to-root iteration makes more sense, but
that patch is not included in the series yet because I have to think
some more about the justification.

As with the previous version, I have tested with a quite simple
hierarchy:

        A (limit = 280M, use_hierarchy=true)
      / | \
     B  C  D (all have a 100M limit)

and a separate kernel build in each leaf group. This triggers both
children-only and hierarchical reclaim in parallel, so the reclaim_iter
caching is active a lot. I will hammer it some more, but the series
should already be in quite a good shape.

Michal Hocko (7):
      memcg: synchronize per-zone iterator access by a spinlock
      memcg: keep prev's css alive for the whole mem_cgroup_iter
      memcg: rework mem_cgroup_iter to use cgroup iterators
      memcg: remove memcg from the reclaim iterators
      memcg: simplify mem_cgroup_iter
      memcg: further simplify mem_cgroup_iter
      cgroup: remove css_get_next

And the diffstat says:

 include/linux/cgroup.h |    7 --
 kernel/cgroup.c        |   49 ------------
 mm/memcontrol.c        |  199 ++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 175 insertions(+), 80 deletions(-)