Re: [PATCH vfs/for-next v2] cgroup: fix top cgroup refcnt leak

Andrei Vagin <avagin@xxxxxxxxx> · Wed, 2 Jan 2019 11:37:38 -0800

On Wed, Jan 02, 2019 at 02:28:04AM +0000, Al Viro wrote:
> On Fri, Dec 28, 2018 at 04:04:00PM -0800, Andrei Vagin wrote:
> > It looks like the c6b3d5bcd67c ("cgroup: fix top cgroup refcnt leak")
> > commit was reverted by mistake.
> > 
> > $ mkdir /tmp/cgroup
> > $ mkdir /tmp/cgroup2
> > $ mount -t cgroup -o none,name=test test /tmp/cgroup
> > $ mount -t cgroup -o none,name=test test /tmp/cgroup2
> > $ umount /tmp/cgroup
> > $ umount /tmp/cgroup2
> > $ cat /proc/self/cgroup | grep test
> > 12:name=test:/
> > 
> > You can see the test cgroup was not freed.
> > 
> > Cc: Li Zefan <lizefan@xxxxxxxxxx>
> > Fixes: aea3f2676c83 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
> > Signed-off-by: Andrei Vagin <avagin@xxxxxxxxx>
> > ---
> > 
> > v2: clean up code and add the vfs/for-next tag
> > 
> >  kernel/cgroup/cgroup.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index fb0717696895..f63974a3725f 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -2047,6 +2047,9 @@ int cgroup_do_get_tree(struct fs_context *fc)
> >  	ret = 0;
> >  	if (ctx->kfc.new_sb_created)
> >  		goto out_cgrp;
> > +	else
> > +		cgroup_put(&ctx->root->cgrp);
> > +
> >  	apply_cgroup_root_flags(ctx->flags);
> >  	return 0;
> 
> That looks horrible, especially since out_cgrp is return ret;
> If anything, it should be
> 	if (!ctx->kfc.new_sb_created) {
> 		cgroup_put(&ctx->root->cgrp);
> 		apply_cgroup_root_flags(ctx->flags);
> 	}
> 	return 0;
> 
> What I don't understand is why apply_cgroup_root_flags() is not
> called in "new superblock" case here.  It used to, prior to that
> conversion...

It is a good question, and I don't have an answer to it. I think David
can tell us more about this.

> 
> Another fishy place I see there is
>                 nsdentry = kernfs_node_dentry(cgrp->kn, fc->root->d_sb);
>                 if (IS_ERR(nsdentry))
>                         return PTR_ERR(nsdentry);
>                 dput(fc->root);
>                 fc->root = nsdentry;
> What happens if we get here with non-NULL fc->root (and we'd better,
> after a successful return from kernfs_get_tree() a bit earlier) and hit that
> failure exit?  A leak?

Yes, there is a leak here too. I fixed it and sent v3, but then I decided that
it would be good to test this error path, and I found one more problem:

[   22.669696] ================================================
[   22.670468] WARNING: lock held when returning to user space!
[   22.671225] 4.20.0-rc1-00081-g01a72fa4bd0e-dirty #12 Not tainted
[   22.672018] ------------------------------------------------
[   22.672817] mount/1148 is leaving the kernel with locks still held!
[   22.673660] 1 lock held by mount/1148:
[   22.674165]  #0: 00000000c07be72c (&fc->fs_type->s_umount_key#41){+.+.}, at: grab_super+0x29/0x90

deactivate_locked_super() has to be called on the error path.  I sent v4 with
this fix. I'm sorry for the noise on the mailing list; I should have tested
this code before sending v3.

> With apologies for being MIA for a week - it had been insane here...

Good to see you back in stride.