On 10/13/2016 03:37 PM, Michal Hocko wrote:
> On Thu 13-10-16 15:24:54, Anshuman Khandual wrote:
> [...]
>> Which makes the function look like this. Even with these changes, MPOL_BIND is
>> still going to pick up the local node's zonelist instead of the first node in
>> policy->v.nodes nodemask. It completely ignores policy->v.nodes which it should
>> not.
>
> Not really. I have tried to explain earlier. We do not ignore policy
> nodemask. This one comes from policy_nodemask. We start with the local
> node but fallback to some of the nodes from the nodemask defined by the
> policy.
>

Yeah, I saw your response but did not fully follow it. We don't ignore
the policy nodemask during memory allocation, correct. But my point was
that we are ignoring the policy nodemask while selecting the zonelist
which will be used during page allocation. Though the zone contents of
both zonelists are likely to be the same, wouldn't it be better to get
the zonelist from the nodemask as well? Or am I still missing something
here?

The following change is what I am trying to propose.

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ad1c96a..f60ab80 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1685,14 +1685,7 @@ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy,
 			nd = policy->v.preferred_node;
 		break;
 	case MPOL_BIND:
-		/*
-		 * Normally, MPOL_BIND allocations are node-local within the
-		 * allowed nodemask. However, if __GFP_THISNODE is set and the
-		 * current node isn't part of the mask, we use the zonelist for
-		 * the first node in the mask instead.
-		 */
-		if (unlikely(gfp & __GFP_THISNODE) &&
-				unlikely(!node_isset(nd, policy->v.nodes)))
+		if (unlikely(!node_isset(nd, policy->v.nodes)))
 			nd = first_node(policy->v.nodes);
 		break;
 	default:
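
To make the difference concrete, here is a minimal user-space sketch of
the node selection before and after the proposed change. It is not
kernel code: nodemask_sketch_t, node_isset_sketch(), first_node_sketch(),
bind_node_current() and bind_node_proposed() are hypothetical stand-ins
that model the nodemask as a plain bitmask and reduce the gfp flags to a
single boolean for __GFP_THISNODE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef unsigned long nodemask_sketch_t;

/* Simplified model of node_isset(): is 'node' part of 'mask'? */
static bool node_isset_sketch(int node, nodemask_sketch_t mask)
{
	return (mask >> node) & 1UL;
}

/* Simplified model of first_node(): lowest-numbered node in a non-empty mask. */
static int first_node_sketch(nodemask_sketch_t mask)
{
	int node = 0;

	assert(mask != 0);
	while (!((mask >> node) & 1UL))
		node++;
	return node;
}

/*
 * Current MPOL_BIND behaviour as described in the removed comment: the
 * local node's zonelist is used unless __GFP_THISNODE is set and the
 * local node is outside the policy mask.
 */
static int bind_node_current(int local, nodemask_sketch_t mask, bool gfp_thisnode)
{
	if (gfp_thisnode && !node_isset_sketch(local, mask))
		return first_node_sketch(mask);
	return local;
}

/*
 * Proposed behaviour: use the first node of the policy mask whenever the
 * local node is outside the mask, regardless of __GFP_THISNODE.
 */
static int bind_node_proposed(int local, nodemask_sketch_t mask)
{
	if (!node_isset_sketch(local, mask))
		return first_node_sketch(mask);
	return local;
}
```

With a policy mask covering nodes 1 and 2 (0x6) and the task running on
node 0, bind_node_current() without __GFP_THISNODE returns 0, so the
local node's zonelist is used, while bind_node_proposed() returns 1, so
the zonelist comes from a node inside the mask. In both cases the actual
allocation is still filtered by the policy nodemask; only the starting
zonelist differs.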