Re: cgroup: avoid attaching a cgroup root to two different superblocks

On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote:
> Hello,
> 
> One of our CRIU tests hangs with this patch.
> 
> Steps to reproduce:
> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
> gcc cgroupns.c -o cgroupns
> ./cgroupns
> ./cgroupns

I've found a trivial reproducer:

    mkdir /tmp/xxx
    mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
    mkdir /tmp/xxx/xxx
    umount /tmp/xxx
    mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
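
A possible reading of the hang, given the retry hunk quoted further down:
after the umount there is no superblock left, but the root presumably stays
alive because /tmp/xxx/xxx still exists, so kernfs_pin_sb() would keep
returning NULL and every retry would take the same branch. The sketch below
is a userspace model of that decision only, not kernel code. The names
fake_root, pin_result, retry_before_patch(), retry_after_patch() and
count_retries() are made up for illustration, and the retry cap exists only
so the model terminates.

```c
#include <stdbool.h>

/* Hypothetical model of what kernfs_pin_sb() can hand back. */
enum pin_result {
	PIN_OK,		/* a live superblock was pinned */
	PIN_NULL,	/* no superblock exists for this root */
	PIN_ERR		/* a superblock exists but is going away */
};

/* Hypothetical snapshot of the state the mount path inspects. */
struct fake_root {
	enum pin_result pin;
	bool ref_live;	/* would percpu_ref_tryget_live() succeed? */
};

/* Condition before the patch: IS_ERR(pinned_sb) || !tryget_live */
bool retry_before_patch(const struct fake_root *r)
{
	return r->pin == PIN_ERR || !r->ref_live;
}

/* Condition after the patch: IS_ERR_OR_NULL(pinned_sb) || !tryget_live */
bool retry_after_patch(const struct fake_root *r)
{
	return r->pin != PIN_OK || !r->ref_live;
}

/*
 * Count how often the mount would be restarted, up to cap.
 * Nothing changes the state between attempts, which is the
 * assumption that makes the real loop spin.
 */
int count_retries(bool (*retry)(const struct fake_root *),
		  const struct fake_root *r, int cap)
{
	int n = 0;

	while (n < cap && retry(r))
		n++;
	return n;
}
```

With a root in the state { PIN_NULL, true }, count_retries() returns 0 for
the old condition and runs into the cap for the new one, which would match
the endless ERESTARTNOINTR in the strace output.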

> 
> [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns 
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = 0
> unshare(CLONE_NEWCGROUP)                = 0
> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe5da0b89d0) = 529
> strace: Process 529 attached
> [pid   529] setns(3, CLONE_NEWCGROUP)   = 0
> [pid   529] +++ exited with 0 +++
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=529, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> +++ exited with 0 +++
> [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns 
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> ....
> 
> Thanks,
> Andrei
> 
> On Fri, Apr 07, 2017 at 04:51:55PM +0800, Li Zefan wrote:
> > Run this:
> > 
> >     touch file0
> >     for ((; ;))
> >     {
> >         mount -t cpuset xxx file0
> >     }
> > 
> > And this concurrently:
> > 
> >     touch file1
> >     for ((; ;))
> >     {
> >         mount -t cpuset xxx file1
> >     }
> > 
> > We'll trigger a warning like this:
> > 
> >  ------------[ cut here ]------------
> >  WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
> >  percpu_ref_kill_and_confirm called more than once on css_release!
> >  CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
> >  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> >  Call Trace:
> >   dump_stack+0x63/0x84
> >   __warn+0xd1/0xf0
> >   warn_slowpath_fmt+0x5f/0x80
> >   percpu_ref_kill_and_confirm+0x92/0xb0
> >   cgroup_kill_sb+0x95/0xb0
> >   deactivate_locked_super+0x43/0x70
> >   deactivate_super+0x46/0x60
> >  ...
> >  ---[ end trace a79f61c2a2633700 ]---
> > 
> > Here's a race:
> > 
> >   Thread A				Thread B
> > 
> >   cgroup1_mount()
> >     # alloc a new cgroup root
> >     cgroup_setup_root()
> > 					cgroup1_mount()
> > 					  # no sb yet, returns NULL
> > 					  kernfs_pin_sb()
> > 
> > 					  # but succeeds in getting the refcnt,
> > 					  # so re-use cgroup root
> > 					  percpu_ref_tryget_live()
> >     # alloc sb with cgroup root
> >     cgroup_do_mount()
> > 
> >   cgroup_kill_sb()
> > 					  # alloc another sb with same root
> > 					  cgroup_do_mount()
> > 
> > 					cgroup_kill_sb()
> > 
> > We end up using the same cgroup root for two different superblocks,
> > so percpu_ref_kill() will be called twice on the same root when the
> > two superblocks are destroyed.
> > 
> > Fix this by making sure the superblock pinning has really succeeded.
> > 
> > Cc: stable@xxxxxxxxxxxxxxx # 3.16+
> > Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > Signed-off-by: Zefan Li <lizefan@xxxxxxxxxx>
> > ---
> >  kernel/cgroup/cgroup-v1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index 1dc22f6..12e19f0 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
> >  		 * path is super cold.  Let's just sleep a bit and retry.
> >  		 */
> >  		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
> > -		if (IS_ERR(pinned_sb) ||
> > +		if (IS_ERR_OR_NULL(pinned_sb) ||
> >  		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
> >  			mutex_unlock(&cgroup_mutex);
> >  			if (!IS_ERR_OR_NULL(pinned_sb))
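
For reference, the one-line change hinges on how the two helpers treat a
NULL pointer. The following is a simplified userspace rendition of the
kernel's error-pointer helpers, written from memory of include/linux/err.h
and meant as a sketch only; the real macros carry extra annotations, and
MAX_ERRNO = 4095 is assumed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the last MAX_ERRNO addresses; NULL is not one. */
bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for NULL as well as for encoded errors. */
bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}
```

So a NULL return from kernfs_pin_sb() passes the old IS_ERR() test
unnoticed and is caught by IS_ERR_OR_NULL(), while an ERR_PTR() value is
caught by both.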