When change the 'cpuset.mems' under some cgroup, system will hung for a long time. From the dmesg, many processes or theads are stuck in fork/exit. The reason is show as follows. thread A: cpuset_write_resmask /* takes cpuset_rwsem */ ... update_tasks_nodemask mpol_rebind_mm /* waits mmap_lock */ thread B: worker_thread ... cpuset_migrate_mm_workfn do_migrate_pages /* takes mmap_lock */ thread C: cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */ ... cpuset_can_attach percpu_down_write /* waits cpuset_rwsem */ Once update the nodemasks of cpuset, thread A wakes up thread B to migrate mm. But when thread A iterates through all tasks, including child threads and group leader, it has to wait the mmap_lock which has been take by thread B. Unfortunately, thread C wants to migrate tasks into cgroup at this moment, it must wait thread A to release cpuset_rwsem. If thread B spends much time to migrate mm, the fork/exit which acquire cgroup_threadgroup_rwsem also need to wait for a long time. There is no need to migrate the mm of child threads which is shared with group leader. Just iterate through the group leader only. Signed-off-by: haifeng.xu <haifeng.xu@xxxxxxxxxx> --- kernel/cgroup/cpuset.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 589827ccda8b..43cbd09546d0 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1968,6 +1968,9 @@ static void update_tasks_nodemask(struct cpuset *cs) cpuset_change_task_nodemask(task, &newmems); + if (!thread_group_leader(task)) + continue; + mm = get_task_mm(task); if (!mm) continue; -- 2.25.1