Re: [PATCH vfs/for-next v6] cgroup: fix top cgroup refcnt leak

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrei,

It turns out that the cgroup-v1 refcounting is slightly broken upstream.  If
kernfs_get_inode() fails in kernfs_fill_super(), then there's refcount
breakage in the error handling.

This can be provoked by making the following change:

--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -240,6 +240,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	sb->s_shrink.seeks = 0;
 
 	/* get root inode, initialize and unlock it */
+	if (strcmp(current->comm, "foobar") == 0)
+		return -ENOANO;
 	mutex_lock(&kernfs_mutex);
 	inode = kernfs_get_inode(sb, info->root->kn);
 	mutex_unlock(&kernfs_mutex);

and then copying /bin/mount to /tmp/foobar and doing:

[root@andromeda ~]# /tmp/foobar -t cgroup -o none,name=xxxxy xxx /tmp/x/a1
foobar: /tmp/x/a1: mount(2) system call failed: No anode.

In dmesg I see the attached traces (see below).

The problem appears to be because cgroup_do_mount() calls kernfs_mount(), but
the refcount on the new root cgroup object hasn't been properly initialised
yet.  However, because we make it past sget(), the superblock thinks it has
taken the caller's ref - and this gets eaten by cgroup_kill_sb().

Further, another ref appears to be released by cgroup_do_mount() in the event
that kernfs_mount() fails.

David

------------[ cut here ]------------
percpu_ref_kill_and_confirm called more than once on css_release!
WARNING: CPU: 3 PID: 3218 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x4b/0x14c
Modules linked in:
CPU: 3 PID: 3218 Comm: bugger Not tainted 4.20.0-fscache+ #1287
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:percpu_ref_kill_and_confirm+0x4b/0x14c
Code: c6 74 29 80 3d 42 0a 00 01 00 75 20 48 8b 53 10 48 c7 c6 f0 cd e9 81 48 c7 c7 82 99 16 82 c6 05 27 0a 00 01 01 e8 04 54 ac ff <0f> 0b 48 83 4b 08 02 4c 89 e6 48 89 df e8 fb fc ff ff 65 ff 05 54
RSP: 0018:ffff8880c5103c60 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff8880d35e8020 RCX: ffff8880c5103b4c
RDX: 0000000000000046 RSI: ffffffff8245fef8 RDI: ffffffff810ac1ec
RBP: ffff8880c5103c78 R08: 0000000000000041 R09: 0000000000021900
R10: ffff8880c5103900 R11: 0000006423114dc0 R12: 0000000000000000
R13: 000000000027e0eb R14: 0000000000000246 R15: 0000000000000000
FS:  00007f8ec2dbb080(0000) GS:ffff8880c6d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005586974b0198 CR3: 00000000c4bfe002 CR4: 00000000001606e0
Call Trace:
 cgroup_kill_sb+0x131/0x141
 deactivate_locked_super+0x29/0x5b
 kernfs_mount_ns+0x1fa/0x223
 cgroup_do_mount+0x36/0x1c8
 cgroup1_mount+0x5b2/0x610
 cgroup_mount+0x33b/0x37f
 mount_fs+0x6a/0x10b
 vfs_kern_mount+0x67/0x13c
 do_mount+0x90e/0xb7e
 ? kmem_cache_alloc_trace+0x241/0x27d
 ksys_mount+0x72/0x97
 __x64_sys_mount+0x21/0x24
 do_syscall_64+0x7d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ec1e14ada
Code: 48 8b 0d c9 a3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 a3 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2fa32b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00005586974ae2a0 RCX: 00007f8ec1e14ada
RDX: 00005586974ae480 RSI: 00005586974ae500 RDI: 00005586974ae4e0
RBP: 0000000000000000 R08: 00005586974ae4a0 R09: 00005586974ae480
R10: 00000000c0ed0000 R11: 0000000000000246 R12: 00005586974ae4e0
R13: 00005586974ae480 R14: 0000000000000000 R15: 00007f8ec2ba8184
irq event stamp: 3532
hardirqs last  enabled at (3531): [<ffffffff811c0584>] kfree+0x152/0x159
hardirqs last disabled at (3532): [<ffffffff819e27e8>] _raw_spin_lock_irqsave+0x12/0x44
softirqs last  enabled at (3326): [<ffffffff81c00353>] __do_softirq+0x353/0x38f
softirqs last disabled at (3317): [<ffffffff81058c16>] irq_exit+0x63/0xd1
---[ end trace 9bca09dc135d9213 ]---
WARNING: CPU: 3 PID: 3218 at lib/percpu-refcount.c:359 percpu_ref_reinit+0x10/0x17
Modules linked in:
CPU: 3 PID: 3218 Comm: bugger Tainted: G        W         4.20.0-fscache+ #1287
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:percpu_ref_reinit+0x10/0x17
Code: fa ff ff 48 8d 65 f0 4c 89 f6 48 c7 c7 e0 8d 51 82 5b 41 5e 5d e9 3c 50 45 00 48 8b 47 08 a8 03 74 08 48 8b 07 48 85 c0 74 02 <0f> 0b e9 c4 fe ff ff 55 31 f6 53 48 89 fb 48 83 ec 30 65 48 8b 04
RSP: 0018:ffff8880c5103d38 EFLAGS: 00010282
RAX: fffffffffffffffe RBX: ffff8880d35e8000 RCX: ffffffff810f1a8f
RDX: 00000000ffffbf7e RSI: 00000000425b018d RDI: ffff8880d35e8020
RBP: ffff8880c5103da8 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8880c5103d40 R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000000 R14: ffffffffffffffc9 R15: ffff88803f63f001
FS:  00007f8ec2dbb080(0000) GS:ffff8880c6d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005586974b0198 CR3: 00000000c4bfe002 CR4: 00000000001606e0
Call Trace:
 cgroup1_mount+0x5d1/0x610
 cgroup_mount+0x33b/0x37f
 mount_fs+0x6a/0x10b
 vfs_kern_mount+0x67/0x13c
 do_mount+0x90e/0xb7e
 ? kmem_cache_alloc_trace+0x241/0x27d
 ksys_mount+0x72/0x97
 __x64_sys_mount+0x21/0x24
 do_syscall_64+0x7d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ec1e14ada
Code: 48 8b 0d c9 a3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 a3 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2fa32b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00005586974ae2a0 RCX: 00007f8ec1e14ada
RDX: 00005586974ae480 RSI: 00005586974ae500 RDI: 00005586974ae4e0
RBP: 0000000000000000 R08: 00005586974ae4a0 R09: 00005586974ae480
R10: 00000000c0ed0000 R11: 0000000000000246 R12: 00005586974ae4e0
R13: 00005586974ae480 R14: 0000000000000000 R15: 00007f8ec2ba8184
irq event stamp: 3666
hardirqs last  enabled at (3665): [<ffffffff810bb0b1>] __call_rcu+0x1dc/0x1fa
hardirqs last disabled at (3666): [<ffffffff81001639>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last  enabled at (3608): [<ffffffff81c00353>] __do_softirq+0x353/0x38f
softirqs last disabled at (3535): [<ffffffff81058c16>] irq_exit+0x63/0xd1
---[ end trace 9bca09dc135d9214 ]---



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux