mem_cgroup_iter() wraps around cgroup_next_descendant_pre() to provide pre-order walk of memcg hierarchy. In addition to normal walk, it also implements shared iteration keyed by zone, node and priority so that multiple reclaimers don't end up hitting on the same nodes. Reclaimers working on the same zone, node and priority will push the same iterator forward. Unfortunately, the way this is implemented is disturbingly complicated. It ends up implementing pretty unique synchronization construct inside memcg which is never a good sign for any subsystem. While the implemented sychronization is overly elaborate and fragile, the intention behind it is understandable as previously cgroup iterator required each iteration to be contained inside a single RCU read critical section disallowing implementation of saner mechanism. To work around the limitation, memcg developed this Rube Goldberg machine to detect whether the cached last cgroup is still alive, which of course was ever so subtly broken. Now that cgroup iterations can survive being dropped out of RCU read critical section, this can be made a lot simpler. This patchset contains the following three patches. 0001-memcg-fix-subtle-memory-barrier-bug-in-mem_cgroup_it.patch 0002-memcg-restructure-mem_cgroup_iter.patch 0003-memcg-simplify-mem_cgroup_reclaim_iter.patch 0001 is fix for a subtle memory barrier bug in the current implementation. Should be applied to for-3.10-fixes and backported through -stable. In general, memory barriers are bad ideas. Please don't do it unless utterly necessary, and, if you're doing it, please add ample documentation explaining how they're paired and what they're achieving. Documenting is often extremely helpful for the implementor oneself too because one ends up looking at and thinking about things a lot more carefully. 0002 restructure mem_cgroup_iter() so that it's easier to follow and change. 0003 reimplements the iterator sharing so that it's simpler and more conventional. It depends on the new cgroup iterator updates. This patchset is on top of cgroup/for-3.11[1] which contains the iterator updates this patchset depends upon and available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-memcg-simpler-iter Lightly tested. Proceed with caution. mm/memcontrol.c | 134 ++++++++++++++++++++++---------------------------------- 1 file changed, 54 insertions(+), 80 deletions(-) Thanks. -- tejun [1] git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.11 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>