On Mon, Jan 06, 2025 at 08:19:04AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@xxxxxxxxxx>
>
> A warning was found:
>
> WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
> CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
> RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
> RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
> RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
> R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
> R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
> FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  kernfs_drain+0x15e/0x2f0
>  __kernfs_remove+0x165/0x300
>  kernfs_remove_by_name_ns+0x7b/0xc0
>  cgroup_rm_file+0x154/0x1c0
>  cgroup_addrm_files+0x1c2/0x1f0
>  css_clear_dir+0x77/0x110
>  kill_css+0x4c/0x1b0
>  cgroup_destroy_locked+0x194/0x380
>  cgroup_rmdir+0x2a/0x140
>
> It can be explained by:
> rmdir                             echo 1 > cpuset.cpus
>                                   kernfs_fop_write_iter // active=0
> cgroup_rm_file
> kernfs_remove_by_name_ns          kernfs_get_active // active=1
> __kernfs_remove                   // active=0x80000002
> kernfs_drain                      cpuset_write_resmask
> wait_event
> //waiting (active == 0x80000001)
>                                   kernfs_break_active_protection
>                                   // active = 0x80000001
> // continue
>                                   kernfs_unbreak_active_protection
>                                   // active = 0x80000002
> ...
> kernfs_should_drain_open_files
> // warning occurs
>                                   kernfs_put_active
>
> This warning is caused by 'kernfs_break_active_protection' when it is
> writing to cpuset.cpus, and the cgroup is removed concurrently.
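
For reference, the counter arithmetic in the interleaving above can be
replayed in user space. The following is only a rough sketch, not kernel
code: the model_* helpers, struct model_node and replay_race() are
hypothetical names invented for illustration, the two tasks are replayed
sequentially rather than concurrently, and the bias value is taken from
the 0x80000001 shown in the trace (assuming a 32-bit int).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bias added to the counter on deactivation; chosen to match the
 * 0x80000001 value in the trace (assumes 32-bit int). */
#define KN_DEACTIVATED_BIAS (INT_MIN + 1)

/* Hypothetical stand-in for a kernfs node; only the counter is modeled. */
struct model_node {
    int active;
};

/* Writer side: take an active reference unless the node is deactivated. */
bool model_get_active(struct model_node *kn)
{
    if (kn->active < 0)
        return false;
    kn->active++;
    return true;
}

/* Drop an active reference. */
void model_put_active(struct model_node *kn)
{
    kn->active--;
}

/* Remover side: mark the node deactivated by adding the bias. */
void model_deactivate(struct model_node *kn)
{
    kn->active += KN_DEACTIVATED_BIAS;
}

/* The drain wait ends only when no active references remain. */
bool model_drain_may_proceed(const struct model_node *kn)
{
    return kn->active == KN_DEACTIVATED_BIAS;
}

/*
 * Replay the interleaving from the commit message sequentially.
 * Returns true if the "active != bias" check made after the drain
 * would fire, i.e. the condition behind the reported warning.
 */
bool replay_race(bool writer_breaks_protection)
{
    struct model_node kn = { .active = 0 };
    bool got = model_get_active(&kn);         /* writer: active = 1 */

    assert(got);
    model_deactivate(&kn);                    /* rmdir: active = 0x80000002 */
    assert(!model_drain_may_proceed(&kn));    /* rmdir waits in drain */

    if (writer_breaks_protection) {
        model_put_active(&kn);                /* break: active = 0x80000001 */
        assert(model_drain_may_proceed(&kn)); /* rmdir continues */
        kn.active++;                          /* unbreak: active = 0x80000002 */
    } else {
        model_put_active(&kn);                /* write finishes, ref dropped */
        assert(model_drain_may_proceed(&kn)); /* rmdir continues */
    }

    /* Check made on the rmdir side once the drain wait is over. */
    return kn.active != KN_DEACTIVATED_BIAS;
}
```

In this model, replay_race(true) corresponds to a write path that still
breaks active protection and reports the warning condition, while
replay_race(false) corresponds to a write path that holds its active
reference until the write completes and does not.
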
>
> Commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
> get_online_cpus()") made cpuset_hotplug_workfn asynchronous. This change
> involves calling flush_work(), which can create a multi-process
> circular locking dependency that involves cgroup_mutex, potentially
> leading to a deadlock. To avoid the deadlock, commit 76bb5ab8f6e3
> ("cpuset: break kernfs active protection in cpuset_write_resmask()")
> added 'kernfs_break_active_protection' to cpuset_write_resmask. This
> could lead to this warning.
>
> After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
> processing synchronous"), cpuset_write_resmask no longer needs to
> wait for the hotplug to finish, which means that concurrent hotplug and
> cpuset operations are no longer possible. Therefore, the deadlock no
> longer exists and there is no need to break active protection. To fix
> this warning, just remove the kernfs_break_active_protection operation
> from 'cpuset_write_resmask'.
>
> Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()")
> Reported-by: Ji Fa <jifa@xxxxxxxxxx>
> Signed-off-by: Chen Ridong <chenridong@xxxxxxxxxx>

Applied to cgroup/for-6.13-fixes.

Thanks.

-- 
tejun