On Fri, Apr 07, 2023 at 11:30:14AM -0400, Waiman Long wrote:
> On 4/6/23 05:44, Christian Brauner wrote:
> > On Fri, Mar 31, 2023 at 10:50:45AM -0400, Waiman Long wrote:
> > > The current cpuset code uses the global cpuset_attach_old_cs variable
> > > to store the old cpuset value between consecutive cpuset_can_attach()
> > > and cpuset_attach() calls. Since a caller of cpuset_can_attach() may
> > > not need to hold the global cgroup_threadgroup_rwsem, parallel cpuset
> > > attach operations are possible.
> > >
> > > When there are concurrent cpuset attach operations in progress,
> > > cpuset_attach() may fetch the wrong value from cpuset_attach_old_cs,
> > > causing an incorrect result. To avoid this problem while still
> > > allowing a certain level of parallelism, drop cpuset_attach_old_cs
> > > and use a per-cpuset attach_old_cs value. Also restrict each cpuset
> > > to at most one active attach operation to avoid corrupting the
> > > per-cpuset attach_old_cs value.
> > >
> > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> > > ---
> > >  kernel/cgroup/cpuset.c | 19 ++++++++++++++-----
> > >  1 file changed, 14 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > > index 2367de611c42..3f925c261513 100644
> > > --- a/kernel/cgroup/cpuset.c
> > > +++ b/kernel/cgroup/cpuset.c
> > > @@ -198,6 +198,8 @@ struct cpuset {
> > >  	/* Handle for cpuset.cpus.partition */
> > >  	struct cgroup_file partition_file;
> > > +
> > > +	struct cpuset *attach_old_cs;
> > >  };
> > >
> > >  /*
> > > @@ -2456,22 +2458,27 @@ static int fmeter_getrate(struct fmeter *fmp)
> > >  	return val;
> > >  }
> > >
> > > -static struct cpuset *cpuset_attach_old_cs;
> > > -
> > >  /* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
> > >  static int cpuset_can_attach(struct cgroup_taskset *tset)
> > >  {
> > >  	struct cgroup_subsys_state *css;
> > > -	struct cpuset *cs;
> > > +	struct cpuset *cs, *oldcs;
> > >  	struct task_struct *task;
> > >  	int ret;
> > >
> > >  	/* used later by cpuset_attach() */
> > > -	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
> > > +	oldcs = task_cs(cgroup_taskset_first(tset, &css));
> > >  	cs = css_cs(css);
> > >
> > >  	percpu_down_write(&cpuset_rwsem);
> > >
> > > +	/*
> > > +	 * Only one cpuset attach operation is allowed for each cpuset.
> > > +	 */
> > > +	ret = -EBUSY;
> > > +	if (cs->attach_in_progress)
> > > +		goto out_unlock;
> >
> > That'll mean CLONE_INTO_CGROUP becomes even more interesting because
> > it isn't subject to this restriction in contrast to fork()+migrate,
> > right?
>
> This patch is not related to CLONE_INTO_CGROUP. In fact, I have dropped

I know that. My point had been that this patch introduced a new
restriction affecting parallelism in the migration path, while the
CLONE_INTO_CGROUP path remained unaffected.
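
To make the race concrete, here is a rough userspace-style sketch of
the handoff the patch changes. It is a simplified model, not the
kernel code: locking is elided, attach_in_progress is treated as a
plain flag even though it is a counter in cpuset.c, and can_attach()
/ attach() stand in for cpuset_can_attach() / cpuset_attach():

	#include <errno.h>

	struct cpuset {
		int attach_in_progress;
		struct cpuset *attach_old_cs;	/* new per-cpuset field */
	};

	/*
	 * Old scheme: one global shared by every attach. With two
	 * attaches A and B not serialized against each other:
	 *
	 *	A: can_attach()	cpuset_attach_old_cs = old_A
	 *	B: can_attach()	cpuset_attach_old_cs = old_B (clobbers old_A)
	 *	A: attach()	reads cpuset_attach_old_cs -> old_B, wrong
	 */
	static struct cpuset *cpuset_attach_old_cs;

	/* New scheme: store per destination cpuset, one attach at a time. */
	static int can_attach(struct cpuset *cs, struct cpuset *oldcs)
	{
		if (cs->attach_in_progress)
			return -EBUSY;
		cs->attach_in_progress = 1;
		cs->attach_old_cs = oldcs;	/* private to this attach */
		return 0;
	}

	static void attach(struct cpuset *cs)
	{
		struct cpuset *oldcs = cs->attach_old_cs;

		/* ... migrate tasks from oldcs to cs ... */
		(void)oldcs;
		cs->attach_in_progress = 0;
	}

The -EBUSY check serializes attaches per destination cpuset, but it
only exists in the can_attach/attach pairing, i.e. the migration
path, which is why the new restriction does not reach the
CLONE_INTO_CGROUP path.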