On Wed, Oct 25, 2023 at 11:29:56AM -0700, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> The cgroup_is_populated check in cgroup_destroy_locked is what's
> currently blocking the removal, and in the case where
> nr_populated_csets is not 0 I think we'd need to iterate through all
> csets and ensure that each task has been signaled for a SIGKILL.
> Or just ensure there are only dying tasks and the thread group leader
> has 0 for task->signal->live since that's when cgroup.procs stops
> showing the process?

Yeah, both of these checks seem too complex when most of the time the
"stale" nr_populated_csets is sufficient (and notifications still work).
(Tracking nr_leaders would be broken in the case that I called "group
leader separation".)

> Yes, I just tried this out and if we check both cgroup.procs and
> cgroup.threads then we wait long enough to be sure that we can rmdir
> successfully.

Thanks for checking.

> Interesting case, and in the same part of the code. If one of the exit
> functions takes a long time in the leader I could see how this might
> happen, but I think a lot of those (mm for example) should be shared
> among the group members so not sure exactly what would be the cause.

I overlooked it at first (I focused on exit_mm() only), but the
reproducer is explicit about kernel preemption being enabled, which
extends the gap quite a bit.

(Fun fact: I tried moving cgroup_exit() next to the task->signal->live
decrement in do_exit() and the test_core cgroup selftest still passes.
(Not only) the preemption takes the fun out of it, though.)

Michal
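
For illustration only, the userspace side of the check discussed above (wait until both cgroup.procs and cgroup.threads are empty, then rmdir) could be sketched roughly like this. It is not code from this thread: the helper names, the timeout and the polling interval are assumptions, and polling is used purely for brevity.

```python
import os
import time


def cgroup_is_empty(cgroup_dir):
    """Return True if both cgroup.procs and cgroup.threads list no IDs."""
    for name in ("cgroup.procs", "cgroup.threads"):
        with open(os.path.join(cgroup_dir, name)) as f:
            if f.read().strip():
                return False
    return True


def rmdir_when_empty(cgroup_dir, timeout=5.0, interval=0.01):
    """Poll until the cgroup looks empty, then try to rmdir it.

    Returns True once rmdir succeeds, False if the (hypothetical)
    timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cgroup_is_empty(cgroup_dir):
            try:
                os.rmdir(cgroup_dir)
                return True
            except OSError:
                # rmdir can still fail (e.g. EBUSY); retry until the deadline.
                pass
        time.sleep(interval)
    return False
```

A caller would use something like rmdir_when_empty("/sys/fs/cgroup/test") on a hypothetical cgroup path after killing its tasks. Watching cgroup.events for the populated flag would avoid the polling, at the cost of a longer example.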