On Fri, Aug 05, 2022 at 08:59:03AM +0800, Feng Tang wrote:
> Muchun Song found that after the MPOL_PREFERRED_MANY policy was introduced
> in commit b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes"),
> the semantics of policy_nodemask_current() changed for this new policy:
> it now returns 'preferred' nodes instead of 'allowed' nodes.
>
> With the changed semantics of policy_nodemask_current(), a task with
> MPOL_PREFERRED_MANY policy could fail to get its reservation even though
> it can fall back to other nodes (either defined by cpusets or all online
> nodes) for that reservation, failing mmap calls unnecessarily early.
>
> The fix is to not consider MPOL_PREFERRED_MANY for reservations at all
> because it, unlike MPOL_BIND, does not pose any actual hard constraint.
>
> Michal suggested that policy_nodemask_current(), which is only used by
> hugetlb, be moved to hugetlb code with a more explicit name to enforce
> the 'allowed' semantics, for which only the MPOL_BIND policy matters.
>
> apply_policy_zone() is made extern so it can be called from hugetlb code,
> and its return value is changed to bool.
>
> [1]. https://lore.kernel.org/lkml/20220801084207.39086-1-songmuchun@xxxxxxxxxxxxx/t/
>
> Fixes: b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes")
> Reported-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
> Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>

Thanks for fixing this.

Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
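
For anyone following along, the 'allowed' semantics described in the quoted
commit message can be modelled in userspace C roughly as below. This is only
a sketch under simplifying assumptions, not the kernel code: struct policy,
allowed_nodes() and reservation_fits() are hypothetical names, node masks are
plain 64-bit bitmaps rather than nodemask_t, and the zone check done by
apply_policy_zone() is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's policy modes; the values are illustrative. */
enum mpol_mode { MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, MPOL_PREFERRED_MANY };

/* Hypothetical simplified policy: one bit per NUMA node. */
struct policy {
    enum mpol_mode mode;
    uint64_t nodes;
};

/*
 * Nodes a reservation may be satisfied from. Only MPOL_BIND is treated as
 * a hard constraint; every other mode (including MPOL_PREFERRED_MANY) can
 * fall back, so all online nodes remain allowed.
 */
uint64_t allowed_nodes(const struct policy *pol, uint64_t online)
{
    if (pol && pol->mode == MPOL_BIND)
        return pol->nodes & online;
    return online;
}

/*
 * Does a reservation of 'needed' huge pages fit, counting free pages only
 * on the allowed nodes? free_pages[i] is the free count on node i.
 */
bool reservation_fits(const struct policy *pol, uint64_t online,
                      const unsigned long *free_pages, int nr_nodes,
                      unsigned long needed)
{
    uint64_t allowed = allowed_nodes(pol, online);
    unsigned long total = 0;

    for (int i = 0; i < nr_nodes && i < 64; i++)
        if (allowed & (UINT64_C(1) << i))
            total += free_pages[i];

    return total >= needed;
}
```

With free pages only on node 1, a task that prefers node 0 via
MPOL_PREFERRED_MANY still gets its reservation in this model, while the same
node mask under MPOL_BIND does not.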