On Wed, Nov 23, 2022 at 08:21:57AM +0000, haifeng.xu wrote:
> When changing 'cpuset.mems' under some cgroup, the system will hang
> for a long time. From the dmesg, many processes or threads are
> stuck in fork/exit. The reason is shown as follows.
>
> thread A:
>   cpuset_write_resmask /* takes cpuset_rwsem */
>   ...
>   update_tasks_nodemask
>   mpol_rebind_mm /* waits mmap_lock */
>
> thread B:
>   worker_thread
>   ...
>   cpuset_migrate_mm_workfn
>   do_migrate_pages /* takes mmap_lock */
>
> thread C:
>   cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */
>   ...
>   cpuset_can_attach
>   percpu_down_write /* waits cpuset_rwsem */
>
> Once the nodemasks of the cpuset are updated, thread A wakes up thread B
> to migrate the mm. But when thread A iterates through all tasks, including
> child threads and the group leader, it has to wait for the mmap_lock, which
> has been taken by thread B. Unfortunately, thread C wants to migrate
> tasks into the cgroup at this moment, so it must wait for thread A to
> release cpuset_rwsem. If thread B spends a long time migrating the mm,
> the fork/exit paths, which acquire cgroup_threadgroup_rwsem, also need to
> wait for a long time.
>
> There is no need to migrate the mm of child threads, which is
> shared with the group leader.

This is only a problem in cgroup1 and cgroup1 doesn't require the threads of
a given task to be in the same cgroup. I don't think you can optimize it
this way.

Thanks.

--
tejun