On Mon, Jan 28, 2019 at 05:00:13PM +0100, Oleg Nesterov wrote:
> The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which
> needs pids_free() to uncharge the pid.
>
> However, ->free() is called from __put_task_struct()->cgroup_free() and this
> is too late. Even the trivial program which does
>
>     for (;;) {
>         int pid = fork();
>         assert(pid >= 0);
>         if (pid)
>             wait(NULL);
>         else
>             exit(0);
>     }
>
> can run out of limits because release_task()->call_rcu(delayed_put_task_struct)
> implies an RCU gp after the task/pid goes away and before the final put().
>
> Test-case:
>
>     mkdir -p /tmp/CG
>     mount -t cgroup2 none /tmp/CG
>     echo '+pids' > /tmp/CG/cgroup.subtree_control
>
>     mkdir /tmp/CG/PID
>     echo 2 > /tmp/CG/PID/pids.max
>
>     perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
>     echo $! > /tmp/CG/PID/cgroup.procs
>
> Without this patch the forking process fails soon after migration.
>
> Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
> into the new helper, cgroup_release(), called by release_task() which actually
> frees the pid(s).
>
> Reported-by: Herton R. Krzesinski <hkrzesin@xxxxxxxxxx>
> Reported-by: Jan Stancek <jstancek@xxxxxxxxxx>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>

Applied to cgroup/for-5.0.

Thanks, Oleg.

-- 
tejun
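
For anyone who wants to reproduce the quoted test case without perl, here is a
minimal C sketch of the same fork/wait loop. It is not part of the patch: the
helper name fork_wait_loop() and the bounded iteration count are assumptions
made here so that the loop terminates and reports how far it got before
fork() first failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): fork a child that exits
 * immediately, reap it, and repeat up to `iterations` times.
 * Returns the number of fork/wait rounds that completed; stops at the
 * first fork() or waitpid() failure.
 */
int fork_wait_loop(int iterations)
{
    int done = 0;

    assert(iterations >= 0);

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            /* Report the round at which the limit (or another error) hit. */
            fprintf(stderr, "fork failed after %d rounds: %s\n",
                    done, strerror(errno));
            break;
        }
        if (pid == 0)
            _exit(0);   /* child: exit at once, as in the quoted loop */

        /* parent: reap the child before forking again */
        if (waitpid(pid, NULL, 0) < 0) {
            perror("waitpid");
            break;
        }
        done++;
    }
    return done;
}
```

To use it, call fork_wait_loop(1000) from main() and run the resulting binary
from a process that has been moved into /tmp/CG/PID with pids.max set to 2, as
in the quoted test case. Going by the patch description, the returned count
would be expected to fall short of the requested rounds on a kernel without
the fix, and to reach it once the pid is uncharged from release_task().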