On 2025/1/3 0:02, Michal Koutný wrote: > On Fri, Dec 20, 2024 at 01:31:06AM +0000, Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote: >> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0 > > I assume it's this > WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS); > Right. >> It can be explained by: >> rmdir echo 1 > cpuset.cpus >> kernfs_fop_write_iter // active=0 >> cgroup_rm_file >> kernfs_remove_by_name_ns kernfs_get_active // active=1 >> __kernfs_remove // active=0x80000002 >> kernfs_drain cpuset_write_resmask >> wait_event >> //waiting (active == 0x80000001) >> kernfs_break_active_protection >> // active = 0x80000001 >> // continue >> kernfs_unbreak_active_protection >> // active = 0x80000002 >> ... >> kernfs_should_drain_open_files >> // warning occurs >> kernfs_put_active > > Thanks for this breakdown. > >> To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset: >> break kernfs active protection in cpuset_write_resmask()") added >> 'kernfs_break_active_protection' in the cpuset_write_resmask. This could >> lead to this warning. > > The deadlock cycle included cpuset_hotplug_work and since that was > removed in the said commit, there shouldn't be same deadlock possible. > > Ridong, have you run your patch with CONFIG_LOCKDEP to check that > eventuality? > Yes, I tested. >> After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug >> processing synchronous"), the cpuset_write_resmask no longer needs to >> wait the hotplug to finish, which means that cpuset_write_resmask won't >> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore, >> remove kernfs_break_active_protection operation in the >> 'cpuset_write_resmask' >> >> Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()") > > This commit alone isn't sufficient to cause the warning you observed, > right? I think the commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()") is causing the warning I observed. This warning was observed when removing a cpuset cgroup and writing to cpuset.cpus concurrently. Unlike the cgroup_kn_lock_live functions, which break active protection and grab the cgroup_mutex immediately to avoid concurrent removal, writing to 'cpuset_write_resmask' cannot avoid concurrent removal of the cgroup directory. Therefore, this could cause the warning. > As I read kernfs_break_active_protection() comment, I don't see cpuset > code violating its conditions: > a) it's broken/unbroken from withing a kernfs file operation handler, > b) it pins the needed struct cpuset independently of kernfs_node (it's > ok to be removed) > I am not sure if it is safe to call kernfs_unbreak_active_protection(atomic_inc(&kn->active)); after the 'kn' has been removed. I don't know much about this. However, I have not seen any Use-After-Free (UAF) issues so far. I would be grateful if you could provide more information. Best regards Ridong > All in all -- I think the particular break/unbreak pair is unncecessary > nowadays and the warning implemented with hiding/showing kernfs files > didn't take temporary breakage into account (only based on quick > searching and vague memories). > > Thanks, > Michal