This is a note to let you know that I've just added the patch titled cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() to the 3.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: cgroup-fix-a-race-between-cgroup_mount-and-cgroup_kill_sb.patch and it can be found in the queue-3.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 3a32bd72d77058d768dbb38183ad517f720dd1bc Mon Sep 17 00:00:00 2001 From: Li Zefan <lizefan@xxxxxxxxxx> Date: Mon, 30 Jun 2014 11:50:59 +0800 Subject: cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() From: Li Zefan <lizefan@xxxxxxxxxx> commit 3a32bd72d77058d768dbb38183ad517f720dd1bc upstream. We've converted cgroup to kernfs so cgroup won't be intertwined with vfs objects and locking, but there are dark areas. Run two instances of this script concurrently: for ((; ;)) { mount -t cgroup -o cpuacct xxx /cgroup umount /cgroup } After a while, I saw two mount processes were stuck at retrying, because they were waiting for a subsystem to become free, but the root associated with this subsystem never got freed. This can happen, if thread A is in the process of killing superblock but hasn't called percpu_ref_kill(), and at this time thread B is mounting the same cgroup root and finds the root in the root list and performs percpu_ref_try_get(). To fix this, we try to increase both the refcnt of the superblock and the percpu refcnt of cgroup root. v2: - we should try to get both the superblock refcnt and cgroup_root refcnt, because cgroup_root may have no superblock assosiated with it. - adjust/add comments. tj: Updated comments. Renamed @sb to @pinned_sb. Signed-off-by: Li Zefan <lizefan@xxxxxxxxxx> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx> [lizf: Backported to 3.15: - Adjust context - s/percpu_tryget_live/atomic_inc_not_zero/] Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- kernel/cgroup.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1484,6 +1484,7 @@ static struct dentry *cgroup_mount(struc int flags, const char *unused_dev_name, void *data) { + struct super_block *pinned_sb = NULL; struct cgroup_subsys *ss; struct cgroup_root *root; struct cgroup_sb_opts opts; @@ -1584,10 +1585,25 @@ retry: * destruction to complete so that the subsystems are free. * We can use wait_queue for the wait but this path is * super cold. Let's just sleep for a bit and retry. + + * We want to reuse @root whose lifetime is governed by its + * ->cgrp. Let's check whether @root is alive and keep it + * that way. As cgroup_kill_sb() can happen anytime, we + * want to block it by pinning the sb so that @root doesn't + * get killed before mount is complete. + * + * With the sb pinned, inc_not_zero can reliably indicate + * whether @root can be reused. If it's being killed, + * drain it. We can use wait_queue for the wait but this + * path is super cold. Let's just sleep a bit and retry. */ - if (!atomic_inc_not_zero(&root->cgrp.refcnt)) { + pinned_sb = kernfs_pin_sb(root->kf_root, NULL); + if (IS_ERR(pinned_sb) || + !atomic_inc_not_zero(&root->cgrp.refcnt)) { mutex_unlock(&cgroup_mutex); mutex_unlock(&cgroup_tree_mutex); + if (!IS_ERR_OR_NULL(pinned_sb)) + deactivate_super(pinned_sb); msleep(10); mutex_lock(&cgroup_tree_mutex); mutex_lock(&cgroup_mutex); @@ -1634,6 +1650,16 @@ out_unlock: CGROUP_SUPER_MAGIC, &new_sb); if (IS_ERR(dentry) || !new_sb) cgroup_put(&root->cgrp); + + /* + * If @pinned_sb, we're reusing an existing root and holding an + * extra ref on its sb. Mount is complete. Put the extra ref. + */ + if (pinned_sb) { + WARN_ON(new_sb); + deactivate_super(pinned_sb); + } + return dentry; } Patches currently in stable-queue which might be from lizefan@xxxxxxxxxx are queue-3.15/kernfs-introduce-kernfs_pin_sb.patch queue-3.15/cgroup-fix-mount-failure-in-a-corner-case.patch queue-3.15/cpuset-mempolicy-fix-sleeping-function-called-from-invalid-context.patch queue-3.15/kernfs-implement-kernfs_root-supers-list.patch queue-3.15/cgroup-fix-a-race-between-cgroup_mount-and-cgroup_kill_sb.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html