On Thu, Jan 30, 2025 at 04:05:42PM -0800, Shakeel Butt wrote: > Tejun reported the following race between fork() and cgroup.kill at [1]. > > Tejun: > I was looking at cgroup.kill implementation and wondering whether there > could be a race window. So, __cgroup_kill() does the following: > > k1. Set CGRP_KILL. > k2. Iterate tasks and deliver SIGKILL. > k3. Clear CGRP_KILL. > > The copy_process() does the following: > > c1. Copy a bunch of stuff. > c2. Grab siglock. > c3. Check fatal_signal_pending(). > c4. Commit to forking. > c5. Release siglock. > c6. Call cgroup_post_fork() which puts the task on the css_set and tests > CGRP_KILL. > > The intention seems to be that either a forking task gets SIGKILL and > terminates on c3 or it sees CGRP_KILL on c6 and kills the child. However, I > don't see what guarantees that k3 can't happen before c6. ie. After a > forking task passes c5, k2 can take place and then before the forking task > reaches c6, k3 can happen. Then, nobody would send SIGKILL to the child. > What am I missing? > > This is indeed a race. One way to fix this race is by taking > cgroup_threadgroup_rwsem in write mode in __cgroup_kill() as the fork() > side takes cgroup_threadgroup_rwsem in read mode from cgroup_can_fork() > to cgroup_post_fork(). However that would be heavy handed as this adds > one more potential stall scenario for cgroup.kill which is usually > called under extreme situation like memory pressure. > > To fix this race, let's maintain a sequence number per cgroup which gets > incremented on __cgroup_kill() call. On the fork() side, the > cgroup_can_fork() will cache the sequence number locally and recheck it > against the cgroup's sequence number at cgroup_post_fork() site. If the > sequence numbers mismatch, it means __cgroup_kill() can been called and > we should send SIGKILL to the newly created task. > > Reported-by: Tejun Heo <tj@xxxxxxxxxx> > Closes: https://lore.kernel.org/all/Z5QHE2Qn-QZ6M-KW@xxxxxxxxxxxxxxx/ [1] > Fixes: 661ee6280931 ("cgroup: introduce cgroup.kill") > Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx> > --- Acked-by: Christian Brauner <brauner@xxxxxxxxxx>