Hi Tejun,

On Tue, May 21, 2013 at 10:50:20AM +0900, Tejun Heo wrote:
> Currently all cgroup iterators require the whole traversal to be
> contained in a single RCU read critical section, which can be too
> restrictive as there are times when blocking operations are necessary
> during traversal.  This forces controllers to implement specific
> workarounds in those cases - building separate iteration lists, punting
> actual operations to work items and so on.
>
> This patchset updates cgroup iterators so that they allow dropping RCU
> read lock while iteration is in progress so that controllers which
> require sleeping during iteration don't need to implement their own
> mechanisms.
>
> Dropping RCU read lock during iteration is unsafe because
> cgroup->sibling.next can't be trusted once RCU read lock is dropped.
> The sibling list is an RCU list and when a cgroup is removed the next
> pointer is retained to keep RCU traversal working.  If the next
> sibling is removed while RCU read lock is dropped, the removed current
> cgroup's next won't be updated and the next sibling may complete its
> grace period and get freed leaving the next pointer dangling.
>
> Working around the problem is relatively simple.  Whether
> ->sibling.next can be trusted can be decided by looking
> at CGRP_REMOVED - as cgroup removals are fully serialized, the flag is
> guaranteed to be visible before the next sibling finishes its grace
> period.  For the cases where it can't be trusted, each cgroup is
> assigned a monotonically increasing serial number.  Because new
> cgroups are always appended to the children list, it's guaranteed
> that every children list is sorted in the ascending order of the
> serial numbers.  When the next pointer can't be trusted, the next
> sibling can be located by walking the parent's children list from the
> beginning looking for the first cgroup with a higher serial number.
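
For illustration, here is a minimal userspace sketch of the fallback
described in the paragraph above.  It is not the code from patch 0003:
struct node, next_sibling() and unlink_node() are hypothetical stand-ins
for struct cgroup, cgroup_next_sibling() and RCU list removal, and no
real RCU, grace period or locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct cgroup; not the kernel type. */
struct node {
	uint64_t serial_nr;		/* assigned in increasing order at creation */
	bool removed;			/* models CGRP_REMOVED */
	struct node *next;		/* models ->sibling.next; stale after removal */
	struct node *parent;
	struct node *first_child;	/* children list, sorted by serial_nr */
};

/* Find the sibling following @pos even if @pos has already been removed. */
static struct node *next_sibling(struct node *pos)
{
	struct node *next;

	/* Fast path: @pos is still linked, so its next pointer is current. */
	if (!pos->removed)
		return pos->next;

	if (!pos->parent)
		return NULL;

	/*
	 * Slow path: @pos->next may be stale.  Walk the parent's children
	 * from the head and return the first one created after @pos.
	 */
	for (next = pos->parent->first_child; next; next = next->next)
		if (next->serial_nr > pos->serial_nr)
			return next;
	return NULL;
}

/* Unlink @pos from its parent but keep @pos->next, as RCU list removal does. */
static void unlink_node(struct node *pos)
{
	struct node **pp;

	assert(pos->parent);
	pp = &pos->parent->first_child;
	while (*pp != pos)
		pp = &(*pp)->next;
	*pp = pos->next;
	pos->removed = true;
}
```

With children a, b and c (serial numbers 1, 2 and 3), unlinking a and
then b leaves a.next still pointing at b, yet next_sibling(&a) walks the
parent's list and returns c.
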
>
> The above is implemented in cgroup_next_sibling() and all iterators
> are updated to use it to find out the next sibling thus allowing
> dropping RCU read lock while iteration is in progress.  This patchset
> replaces the separate iteration list in device_cgroup with direct
> descendant walk and there will be further patches making use of this
> update.
>
> This patchset contains the following five patches.
>
>  0001-cgroup-fix-a-subtle-bug-in-descendant-pre-order-walk.patch
>  0002-cgroup-make-cgroup_is_removed-static.patch
>  0003-cgroup-add-cgroup-serial_nr-and-implement-cgroup_nex.patch
>  0004-cgroup-update-iterators-to-use-cgroup_next_sibling.patch
>  0005-device_cgroup-simplify-cgroup-tree-walk-in-propagate.patch
>
> 0001 fixes a subtle iteration bug.  Will be applied to for-3.10-fixes.
>
> 0002 is a trivial prep patch.
>
> 0003 implements cgroup_next_sibling() which can find out the next
> sibling regardless of the state of the current cgroup.
>
> 0004 updates all iterators to use cgroup_next_sibling().
>
> 0005 replaces the iteration list workaround in device_cgroup with
> direct iteration.
>
> This patchset is on top of cgroup/for-3.11 23958e729e ("cgroup.h:
> remove some functions that are now gone") and available in the
> following git branch.
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-interruptible-iter

The patchset looks good to me.  I ran some tests on a kernel with it
applied and saw no problems.

Acked-by: Aristeu Rozanski <aris@xxxxxxxxxx>

--
Aristeu