Re: [PATCH cgroup 1/2] cgroup: move module ref handling into rebind_subsystems()

Li Zefan <lizefan@xxxxxxxxxx> · Fri, 12 Jul 2013 17:03:12 +0800

On 2013/6/29 12:12, Tejun Heo wrote:
> Hello,
> 
> These two patches are on top of
> 
>   cgroup/for-3.11 0ce6cba35 ("cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options")
> + "cgroup: fix and clean up cgroup file creations and removals" patchset
>    http://thread.gmane.org/gmane.linux.kernel.cgroups/8245
> 
> and available in the following git branch
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-module-ref
> 
> Both patchsets I'm posting today are too late for for-3.11 unless
> there's gonna be -rc8.  I'll collect the reviews and apply these after
> -rc1 drops.
> 
> Thanks.
> 
> ---------------------------------- 8< ----------------------------------
> From e8645bf68bcada3e07538fd042e9f0ce8a1e72cd Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@xxxxxxxxxx>
> Date: Fri, 28 Jun 2013 21:08:27 -0700
> Subject: [PATCH 1/2] cgroup: move module ref handling into rebind_subsystems()
> 
> Module ref handling in cgroup is rather weird.
> parse_cgroupfs_options() grabs all the modules for the specified
> subsystems.  A module ref is kept if the specified subsystem is newly
> bound to the hierarchy.  If not, or the operation fails, the refs are
> dropped.  This scatters module ref handling across multiple functions
> making it difficult to track.  It also makes the function nasty to use
> for dynamic subsystem binding which is necessary for the planned
> unified hierarchy.
> 
> There's nothing which requires the subsystem modules to be pinned
> between parse_cgroupfs_options() and rebind_subsystems() in both mount
> and remount paths.  parse_cgroupfs_options() can just parse and
> rebind_subsystems() can handle pinning the subsystems that it wants to
> bind, which is a natural part of its task - binding - anyway.
> 
> Move module ref handling into rebind_subsystems() which makes the code
> a lot simpler - modules are gotten iff it's gonna be bound and put iff
> unbound or binding fails.
> 
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
>  kernel/cgroup.c | 87 +++++++++++++++------------------------------------------
>  1 file changed, 22 insertions(+), 65 deletions(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 3bc7a1a..a65aff1 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1003,6 +1003,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
>  {
>  	struct cgroup *cgrp = &root->top_cgroup;
>  	struct cgroup_subsys *ss;
> +	unsigned long pinned = 0;
>  	int i, ret;
>  
>  	BUG_ON(!mutex_is_locked(&cgroup_mutex));
> @@ -1010,20 +1011,26 @@ static int rebind_subsystems(struct cgroupfs_root *root,
>  
>  	/* Check that any added subsystems are currently free */
>  	for_each_subsys(ss, i) {
> -		unsigned long bit = 1UL << i;
> -
> -		if (!(bit & added_mask))
> +		if (!(added_mask & (1 << i)))
>  			continue;
>  
> +		/* is the subsystem mounted elsewhere? */
>  		if (ss->root != &cgroup_dummy_root) {
> -			/* Subsystem isn't free */
> -			return -EBUSY;
> +			ret = -EBUSY;
> +			goto out_put;
>  		}
> +
> +		/* pin the module */
> +		if (!try_module_get(ss->module)) {
> +			ret = -ENOENT;
> +			goto out_put;
> +		}
> +		pinned |= 1 << i;
>  	}

This looks wrong to me.

cgroup_mount()
{
	mutex_lock(&cgroup_mutex);
	parse_cgroupfs_options();
	mutex_unlock(&cgroup_mutex);
	...	/* cgroup_mutex is not held here */

	mutex_lock(&cgroup_mutex);
	...
	rebind_subsystems();
	...
	mutex_unlock(&cgroup_mutex);
}

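So with this patch nothing pins the modules between the two locked sections,
and a modular cgroup subsystem can be unloaded in between. Say it's net_cls;
then it's possible that:

	# mount -t cgroup -o net_cls xxx /cgroup

succeeds even though net_cls is not bound to the cgroupfs hierarchy, because
the module got unloaded just before rebind_subsystems() ran.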
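
To make the window concrete, here is a rough userspace model of that sequence. It is only a sketch: every model_* name is made up for illustration and is not kernel API, and it assumes that the bind step walks only the subsystems that are still registered, so a requested bit whose module is gone gets skipped instead of reported.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; all names here are hypothetical, not kernel API. */
#define MODEL_MAX_SUBSYS 4

struct model_subsys {
	const char *name;
	int bound;	/* 1 once attached to the hierarchy */
};

/* Registered subsystems; a NULL slot models an unloaded module. */
static struct model_subsys *model_subsys[MODEL_MAX_SUBSYS];

static void model_load(int id, struct model_subsys *ss)
{
	assert(id >= 0 && id < MODEL_MAX_SUBSYS);
	ss->bound = 0;
	model_subsys[id] = ss;
}

static void model_unload(int id)
{
	assert(id >= 0 && id < MODEL_MAX_SUBSYS);
	model_subsys[id] = NULL;
}

/* Step 1: parse only. Returns the requested mask and pins nothing. */
static unsigned long model_parse(int id)
{
	assert(id >= 0 && id < MODEL_MAX_SUBSYS);
	return model_subsys[id] ? 1UL << id : 0;
}

/*
 * Step 2: bind. Assumption: the walk visits only slots that are still
 * registered, so a requested bit whose module went away is skipped
 * and no error is returned for it.
 */
static int model_rebind(unsigned long added_mask, unsigned long *bound_mask)
{
	*bound_mask = 0;
	for (int i = 0; i < MODEL_MAX_SUBSYS; i++) {
		struct model_subsys *ss = model_subsys[i];

		if (!ss || !(added_mask & (1UL << i)))
			continue;
		ss->bound = 1;
		*bound_mask |= 1UL << i;
	}
	return 0;
}

/* Replays the sequence; returns 1 if the "mount" succeeds with nothing bound. */
static int model_race(void)
{
	struct model_subsys net_cls = { "net_cls", 0 };
	unsigned long requested, bound;
	int ret;

	model_load(1, &net_cls);
	requested = model_parse(1);		/* first locked section */
	model_unload(1);			/* rmmod in the unlocked window */
	ret = model_rebind(requested, &bound);	/* second locked section */

	return ret == 0 && requested != 0 && bound == 0;
}
```

In this model model_race() reports the bad outcome: the request named net_cls, the bind step returned 0, and nothing ended up bound. Holding a reference from the parse step until the bind step, or having the bind step fail when a requested bit has no registered subsystem, would close the window in the model.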
