Re: [PATCHSET] cgroup: allow dropping RCU read lock while iterating

Hi Tejun,
On Tue, May 21, 2013 at 10:50:20AM +0900, Tejun Heo wrote:
> Currently all cgroup iterators require the whole traversal to be
> contained in a single RCU read critical section, which can be too
> restrictive as there are times when blocking operations are necessary
> during traversal.  This forces controllers to implement specific
> workarounds in those cases - building separate iteration list, punting
> actual operations to work items and so on.
> 
> This patchset updates cgroup iterators so that they allow dropping RCU
> read lock while iteration is in progress so that controllers which
> require sleeping during iteration don't need to implement their own
> mechanisms.
> 
> Dropping RCU read lock during iteration is unsafe because
> cgroup->sibling.next can't be trusted once RCU read lock is dropped.
> The sibling list is an RCU list and when a cgroup is removed the next
> pointer is retained to keep RCU traversal working.  If the next
> sibling is removed while RCU read lock is dropped, the removed current
> cgroup's next won't be updated and the next sibling may complete its
> grace period and get freed leaving the next pointer dangling.
> 
> Working around the problem is relatively simple.  Whether
> ->sibling.next can be trusted can be decided by looking
> at CGRP_REMOVED - as cgroup removals are fully serialized, the flag is
> guaranteed to be visible before the next sibling finishes its grace
> period.  For those cases, each cgroup is assigned a monotonically
> increasing serial number.  Because new cgroups are always appended to
> the children list, it's guaranteed that every children list is sorted
> in ascending order of the serial numbers.  When the next pointer
> can't be trusted, the next sibling can be located by walking the
> parent's children list from the beginning looking for the first cgroup
> with higher serial number.
> 
> The above is implemented in cgroup_next_sibling() and all iterators
> are updated to use it to find out the next sibling thus allowing
> dropping RCU read lock while iteration is in progress.  This patchset
> replaces separate iteration list in device_cgroup with direct
> descendant walk and there will be further patches making use of this
> update.
> 
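
For readers following the description above, here is a minimal userspace
sketch of the serial-number fallback. It is not the code from the patch:
struct node, struct parent and next_sibling() are hypothetical names, a
plain singly linked list stands in for the RCU sibling list, and the
"removed" field stands in for CGRP_REMOVED. It only models the lookup
logic, not RCU or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct cgroup. */
struct node {
	uint64_t serial_nr;	/* monotonically increasing, set at creation */
	bool removed;		/* stands in for CGRP_REMOVED */
	struct node *next;	/* stands in for ->sibling.next; NULL ends list */
};

struct parent {
	struct node *children;	/* live children, ascending serial_nr */
};

/*
 * Return the sibling that follows @pos, or NULL if there is none.
 * If @pos is still live, its next pointer is trusted.  If @pos has been
 * removed, its next pointer may be stale, so rescan the parent's list
 * for the first child with a higher serial number.
 */
static struct node *next_sibling(const struct parent *p,
				 const struct node *pos)
{
	struct node *n;

	assert(p && pos);

	if (!pos->removed)
		return pos->next;

	for (n = p->children; n; n = n->next)
		if (n->serial_nr > pos->serial_nr)
			return n;

	return NULL;
}
```

The rescan is linear in the number of siblings, but in this model it only
runs when the current position has already been removed.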
> This patchset contains the following five patches.
> 
>  0001-cgroup-fix-a-subtle-bug-in-descendant-pre-order-walk.patch
>  0002-cgroup-make-cgroup_is_removed-static.patch
>  0003-cgroup-add-cgroup-serial_nr-and-implement-cgroup_nex.patch
>  0004-cgroup-update-iterators-to-use-cgroup_next_sibling.patch
>  0005-device_cgroup-simplify-cgroup-tree-walk-in-propagate.patch
> 
> 0001 fixes a subtle iteration bug.  Will be applied to for-3.10-fixes.
> 
> 0002 is a trivial prep patch.
> 
> 0003 implements cgroup_next_sibling() which can find out the next
> sibling regardless of the state of the current cgroup.
> 
> 0004 updates all iterators to use cgroup_next_sibling().
> 
> 0005 replaces iteration list workaround in device_cgroup with direct
> iteration.
> 
> This patchset is on top of cgroup/for-3.11 23958e729e ("cgroup.h:
> remove some functions that are now gone") and available in the
> following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-interruptible-iter

The patchset looks good to me. I ran some tests on a kernel with it applied and saw no problems.

Acked-by: Aristeu Rozanski <aris@xxxxxxxxxx>

-- 
Aristeu

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers



