On Wed 12-10-16 16:08:48, Anshuman Khandual wrote:
> On 10/12/2016 03:13 PM, Michal Hocko wrote:
> > On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
> >> Hi,
> >>
> >> We have the following function policy_zonelist() which selects a zonelist
> >> during various allocation paths. With this, general user space allocations
> >> (IIUC might not have __GFP_THISNODE) fails while trying to get memory from
> >> a memory only node without CPUs as the application runs some where else
> >> and that node is not part of the nodemask.
>
> My bad. Was playing with some changes to the zonelists rebuild after
> a memory node hotplug and the order of various zones in them.
>
> >
> > I am not sure I understand. So you have a task with MPOL_BIND without a
> > cpu less node in the mask and you are wondering why the memory is not
> > allocated from that node?
>
> In my experiment, there is a MPOL_BIND call with a CPU less node in
> the node mask and the memory is not allocated from that CPU less node.
> That's because the zone of the CPU less node was absent from the
> FALLBACK zonelist of the local node.

So do I understand correctly that the issue was caused by non-upstream
changes?

> >> Why we insist on __GFP_THISNODE ?
> >
> > AFAIU __GFP_THISNODE just overrides the given node to the policy
> > nodemask in case the current node is not part of that node mask. In
> > other words we are ignoring the given node and use what the policy says.
>
> Right but provided the gfp flag has __GFP_THISNODE in it. In absence
> of __GFP_THISNODE, the node from the nodemask will not be selected.

In the absence of __GFP_THISNODE we will use the zonelist for the given
node, and that should contain even memoryless nodes for the fallback. The
nodemask from policy_nodemask() will then make sure that only nodes
relevant to the policy in use are used.

> I still wonder why ? Can we always go to the first node in the
> nodemask for MPOL_BIND interface calls ? Just curious to know why
> preference is given to the local node and it's FALLBACK zonelist.

It is not always the local node. Look at how
do_huge_pmd_wp_page_fallback() tries to put all the pages on the same
node. Also we have alloc_pages_current(), which tries to allocate from
the local node and should not fall back to the first node in the policy
nodemask.
-- 
Michal Hocko
SUSE Labs
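
For reference, the node selection discussed in this thread can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel source: every model_* name, MODEL_GFP_THISNODE, and the bitmask and array representations are hypothetical stand-ins for the nodemask, __GFP_THISNODE, and the per-node fallback zonelist. It follows the behaviour described above: the given node is overridden only when the THISNODE flag is set and the node is outside the policy mask; otherwise the given node's fallback list is walked and filtered by the mask.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES    4
#define MODEL_GFP_THISNODE 0x1u  /* hypothetical stand-in for __GFP_THISNODE */
#define MODEL_NO_NODE      (-1)

/* One bit per node; hypothetical stand-in for the policy nodemask. */
typedef unsigned int model_nodemask_t;

static bool model_node_isset(int node, model_nodemask_t mask)
{
    return (mask >> node) & 1u;
}

/* Lowest-numbered node set in the mask, or MODEL_NO_NODE if it is empty. */
static int model_first_node(model_nodemask_t mask)
{
    for (int n = 0; n < MODEL_MAX_NODES; n++)
        if (model_node_isset(n, mask))
            return n;
    return MODEL_NO_NODE;
}

/*
 * Pick the node whose fallback list is walked for a bind-style policy.
 * The given node nd is replaced by the first node of the mask only when
 * the THISNODE flag is set and nd is not part of the mask.
 */
static int model_bind_start_node(unsigned int gfp, int nd,
                                 model_nodemask_t mask)
{
    assert(mask != 0);
    if ((gfp & MODEL_GFP_THISNODE) && !model_node_isset(nd, mask))
        return model_first_node(mask);
    return nd;
}

/*
 * Walk the start node's fallback order (len entries) and return the
 * first node permitted by the mask, or MODEL_NO_NODE if none is listed.
 */
static int model_pick_node(const int *fallback, int len,
                           model_nodemask_t mask)
{
    for (int i = 0; i < len; i++)
        if (model_node_isset(fallback[i], mask))
            return fallback[i];
    return MODEL_NO_NODE;
}
```

In this model, with a policy mask containing only node 1 and a task running on node 0, a fallback list of {0, 1} yields node 1, while a fallback list of {0} yields MODEL_NO_NODE. The second case corresponds to the experiment above, where the CPU-less node was missing from the local node's FALLBACK zonelist.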