On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote:
> Hello,
>
> One of our CRIU tests hangs with this patch.
>
> Steps to reproduce:
> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
> gcc cgroupns.c -o cgroupns
> ./cgroupns
> ./cgroupns

I've found a trivial reproducer:

mkdir /tmp/xxx
mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
mkdir /tmp/xxx/xxx
umount /tmp/xxx
mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx

>
> [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = 0
> unshare(CLONE_NEWCGROUP) = 0
> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe5da0b89d0) = 529
> strace: Process 529 attached
> [pid 529] setns(3, CLONE_NEWCGROUP) = 0
> [pid 529] +++ exited with 0 +++
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=529, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> +++ exited with 0 +++
> [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> ....
>
> Thanks,
> Andrei
>
> On Fri, Apr 07, 2017 at 04:51:55PM +0800, Li Zefan wrote:
> > Run this:
> >
> >     touch file0
> >     for ((; ;))
> >     {
> >         mount -t cpuset xxx file0
> >     }
> >
> > And this concurrently:
> >
> >     touch file1
> >     for ((; ;))
> >     {
> >         mount -t cpuset xxx file1
> >     }
> >
> > We'll trigger a warning like this:
> >
> >  ------------[ cut here ]------------
> >  WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
> >  percpu_ref_kill_and_confirm called more than once on css_release!
> >  CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
> >  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> >  Call Trace:
> >   dump_stack+0x63/0x84
> >   __warn+0xd1/0xf0
> >   warn_slowpath_fmt+0x5f/0x80
> >   percpu_ref_kill_and_confirm+0x92/0xb0
> >   cgroup_kill_sb+0x95/0xb0
> >   deactivate_locked_super+0x43/0x70
> >   deactivate_super+0x46/0x60
> >   ...
> >  ---[ end trace a79f61c2a2633700 ]---
> >
> > Here's a race:
> >
> >   Thread A                          Thread B
> >
> >   cgroup1_mount()
> >     # alloc a new cgroup root
> >     cgroup_setup_root()
> >                                     cgroup1_mount()
> >                                       # no sb yet, returns NULL
> >                                       kernfs_pin_sb()
> >
> >                                       # but succeeds in getting the refcnt,
> >                                       # so re-use cgroup root
> >                                       percpu_ref_tryget_live()
> >     # alloc sb with cgroup root
> >     cgroup_do_mount()
> >
> >   cgroup_kill_sb()
> >                                       # alloc another sb with same root
> >                                       cgroup_do_mount()
> >
> >                                     cgroup_kill_sb()
> >
> > We end up using the same cgroup root for two different superblocks,
> > so percpu_ref_kill() will be called twice on the same root when the
> > two superblocks are destroyed.
> >
> > We should fix to make sure the superblock pinning is really successful.
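
To make the interleaving in the quoted race diagram easier to follow, here is a small userspace replay of it. This is only a toy model, not kernel code: struct toy_root and every toy_* function are hypothetical names invented for illustration, and the two "threads" are replayed sequentially in the order the diagram gives. The flag treat_null_as_failure stands in for whether a NULL result from kernfs_pin_sb() makes thread B back off.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cgroup root (not the kernel's struct). */
struct toy_root {
	bool ref_killed;	/* models the root's refcount having been killed */
	int nr_sb;		/* superblocks ever created on top of this root */
	int extra_kills;	/* kills issued after the ref was already dead */
};

/* Models percpu_ref_kill(): a repeated call is what the WARNING reports. */
static void toy_ref_kill(struct toy_root *root)
{
	if (root->ref_killed)
		root->extra_kills++;
	root->ref_killed = true;
}

/* Models cgroup_do_mount(): one more superblock for the same root. */
static void toy_do_mount(struct toy_root *root)
{
	root->nr_sb++;
}

/* Models cgroup_kill_sb(): tearing down a superblock kills the root's ref. */
static void toy_kill_sb(struct toy_root *root)
{
	toy_ref_kill(root);
}

/*
 * Replays the diagram's ordering and returns how many times the ref was
 * killed while already dead. Thread B always sees "no sb yet" (NULL) and
 * is assumed to have won its tryget before thread A's kill.
 */
static int toy_replay(bool treat_null_as_failure)
{
	struct toy_root root = { 0 };
	bool pinned_sb_is_null = true;
	bool b_reuses_root = !(treat_null_as_failure && pinned_sb_is_null);

	toy_do_mount(&root);		/* thread A: alloc sb with cgroup root */
	toy_kill_sb(&root);		/* thread A: cgroup_kill_sb() */

	if (b_reuses_root) {
		toy_do_mount(&root);	/* thread B: another sb, same root */
		toy_kill_sb(&root);	/* thread B: second kill on same root */
	}
	return root.extra_kills;
}
```

Calling toy_replay(false) models a check that lets NULL through and reports one extra kill; toy_replay(true) models a check that rejects NULL, so thread B never reuses the root and no extra kill is counted. The model says nothing about what thread B does after backing off.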
> >
> > Cc: stable@xxxxxxxxxxxxxxx # 3.16+
> > Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > Signed-off-by: Zefan Li <lizefan@xxxxxxxxxx>
> > ---
> >  kernel/cgroup/cgroup-v1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index 1dc22f6..12e19f0 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
> >  		 * path is super cold. Let's just sleep a bit and retry.
> >  		 */
> >  		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
> > -		if (IS_ERR(pinned_sb) ||
> > +		if (IS_ERR_OR_NULL(pinned_sb) ||
> >  		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
> >  			mutex_unlock(&cgroup_mutex);
> >  			if (!IS_ERR_OR_NULL(pinned_sb))
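
For readers less familiar with the error-pointer helpers, the one-line change in the quoted hunk comes down to how a NULL pinned_sb is classified. Below is a userspace sketch of that difference. The toy_* helpers are hypothetical re-implementations that mimic the kernel's ERR_PTR()/IS_ERR()/IS_ERR_OR_NULL() convention (the top 4095 address values encode errno codes); they are not the kernel headers themselves, and tryget_ok is an assumed stand-in for the result of percpu_ref_tryget_live().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Mimics ERR_PTR(): encode a negative errno in a pointer value. */
static inline void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Mimics IS_ERR(): true only for the top TOY_MAX_ERRNO address values. */
static inline bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Mimics IS_ERR_OR_NULL(): additionally treats NULL as a failure. */
static inline bool toy_is_err_or_null(const void *p)
{
	return !p || toy_is_err(p);
}

/* Condition before the patch: true means "unlock, back off and retry". */
static bool old_check_retries(const void *pinned_sb, bool tryget_ok)
{
	return toy_is_err(pinned_sb) || !tryget_ok;
}

/* Condition after the patch: NULL now also takes the retry branch. */
static bool new_check_retries(const void *pinned_sb, bool tryget_ok)
{
	return toy_is_err_or_null(pinned_sb) || !tryget_ok;
}
```

With pinned_sb == NULL and a successful tryget, old_check_retries() returns false, so the caller would go on to reuse the root, while new_check_retries() returns true and takes the retry branch. An encoded error such as toy_err_ptr(-12) takes the retry branch under both conditions.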