On Thu, Aug 11, 2022 at 04:11:21PM +0200, Michal Hocko wrote: >fix the lkml address (fat fingers, sorry) > >On Thu 11-08-22 16:06:37, Michal Hocko wrote: >> [Cc Wei Yang who is author of 78b132e9bae9] >> >> On Thu 11-08-22 20:41:57, Abel Wu wrote: >> > The mems_allowed field can be modified by other tasks, so it isn't >> > safe to access it with alloc_lock unlocked even in the current >> > process context. >> > >> > Say there are two tasks: A from cpusetA is performing set_mempolicy(2), >> > and B is changing cpusetA's cpuset.mems: >> > >> > A (set_mempolicy) B (echo xx > cpuset.mems) >> > ------------------------------------------------------- >> > pol = mpol_new(); >> > update_tasks_nodemask(cpusetA) { >> > foreach t in cpusetA { >> > cpuset_change_task_nodemask(t) { >> > mpol_set_nodemask(pol) { >> > task_lock(t); // t could be A >> > new = f(A->mems_allowed); >> > update t->mems_allowed; >> > pol.create(pol, new); >> > task_unlock(t); >> > } >> > } >> > } >> > } >> > task_lock(A); >> > A->mempolicy = pol; >> > task_unlock(A); >> > >> > In this case A's pol->nodes is computed by old mems_allowed, and could >> > be inconsistent with A's new mems_allowed. >> >> Just to clarify. With an unfortunate timing and those two nodemasks >> overlap the end user effect could be a premature OOM because some nodes >> wouldn't be considered, right? >> >> > While it is different when replacing vmas' policy: the pol->nodes is >> > gone wild only when current_cpuset_is_being_rebound(): >> > >> > A (mbind) B (echo xx > cpuset.mems) >> > ------------------------------------------------------- >> > pol = mpol_new(); >> > mmap_write_lock(A->mm); >> > cpuset_being_rebound = cpusetA; >> > update_tasks_nodemask(cpusetA) { >> > foreach t in cpusetA { >> > cpuset_change_task_nodemask(t) { >> > mpol_set_nodemask(pol) { >> > task_lock(t); // t could be A >> > mask = f(A->mems_allowed); >> > update t->mems_allowed; >> > pol.create(pol, mask); >> > task_unlock(t); >> > } >> > } >> > foreach v in A->mm { >> > if (cpuset_being_rebound == cpusetA) >> > pol.rebind(pol, cpuset.mems); >> > v->vma_policy = pol; >> > } >> > mmap_write_unlock(A->mm); >> > mmap_write_lock(t->mm); >> > mpol_rebind_mm(t->mm); >> > mmap_write_unlock(t->mm); >> > } >> > } >> > cpuset_being_rebound = NULL; >> > >> > In this case, the cpuset.mems, which has already done updating, is >> > finally used for calculating pol->nodes, rather than A->mems_allowed. >> > So it is OK to call mpol_set_nodemask() with alloc_lock unlocked when >> > doing mbind(2). >> > >> > Fixes: 78b132e9bae9 ("mm/mempolicy: remove or narrow the lock on current") >> > Signed-off-by: Abel Wu <wuyun.abel@xxxxxxxxxxxxx> >> Thanks for pointing out. This looks correct. Reviewed-by: Wei Yang <richard.weiyang@xxxxxxxxx>