On Wed 12-10-16 11:43:37, Michal Hocko wrote: > On Wed 12-10-16 14:55:24, Anshuman Khandual wrote: [...] > > Why we insist on __GFP_THISNODE ? > > AFAIU __GFP_THISNODE just overrides the given node to the policy > nodemask in case the current node is not part of that node mask. In > other words we are ignoring the given node and use what the policy says. > I can see how this can be confusing especially when confronting the > documentation: > > * __GFP_THISNODE forces the allocation to be satisified from the requested > * node with no fallbacks or placement policy enforcements. You made me think and look into this deeper. I came to the conclusion that this is actually a relict from the past. policy_zonelist is called only from 3 places: - huge_zonelist - never should do __GFP_THISNODE when going this path - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either - alloc_pages_current - which uses default_policy id __GFP_THISNODE is used So AFAICS this is essentially a dead code or I am missing something. Mel do you remember why we needed it in the past? I would be really tempted to just get rid of this confusing code and this instead: --- diff --git a/mm/mempolicy.c b/mm/mempolicy.c index ad1c96ac313c..98beec47bba9 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1679,25 +1679,17 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy, int nd) { - switch (policy->mode) { - case MPOL_PREFERRED: - if (!(policy->flags & MPOL_F_LOCAL)) - nd = policy->v.preferred_node; - break; - case MPOL_BIND: + if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) + nd = policy->v.preferred_node; + else { /* - * Normally, MPOL_BIND allocations are node-local within the - * allowed nodemask. However, if __GFP_THISNODE is set and the - * current node isn't part of the mask, we use the zonelist for - * the first node in the mask instead. + * __GFP_THISNODE shouldn't even be used with the bind policy because + * we might easily break the expectation to stay on the requested node + * and not break the policy. */ - if (unlikely(gfp & __GFP_THISNODE) && - unlikely(!node_isset(nd, policy->v.nodes))) - nd = first_node(policy->v.nodes); - break; - default: - BUG(); + WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE)); } + return node_zonelist(nd, gfp); } -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>