> On Sep 21, 2016, at 05:53, David Rientjes <rientjes@xxxxxxxxxx> wrote: > > On Tue, 20 Sep 2016, Vlastimil Babka wrote: > >> On 09/12/2016 11:18 AM, Michal Hocko wrote: >>> On Mon 05-09-16 16:18:29, Vlastimil Babka wrote: >>> >>>> Also OOM is skipped for __GFP_THISNODE >>>> allocations, so we might also consider the same for nodemask-constrained >>>> allocations? >>>> >>>>> The patch checks whether it is the last node on the system, and if it >>>> is, then >>>>> don't clear the nid in the nodemask. >>>> >>>> I'd rather see the allocation not OOM, and rely on the fallback in >>>> new_node_page() that doesn't have nodemask. But I suspect it might also >>>> make >>>> sense to treat empty nodemask as something unexpected and put some WARN_ON >>>> (instead of OOM) in the allocator. >>> >>> To be honest I am really not all that happy about 394e31d2ceb4 >>> ("mem-hotplug: alloc new page from a nearest neighbor node when >>> mem-offline") and find it a bit fishy. I would rather re-iterate that >>> patch rather than build new hacks on top. >> >> OK, IIRC I suggested the main idea of clearing the current node from nodemask >> and relying on nodelist to get us the other nodes sorted by their distance. >> Which I thought was an easy way to get to the theoretically optimal result. >> How would you rewrite it then? (but note that the fix is already mainline). >> > > This is a mess. Commit 9bb627be47a5 ("mem-hotplug: don't clear the only > node in new_node_page()") is wrong because it's clearing nid when the next > node in node_online_map doesn't match. node_online_map is wrong because > it includes memoryless nodes. (Nodes with closest NUMA distance also do > not need to have adjacent node ids.) Thanks for pointing out that, so it is still possible that we are allocating from one or more memoryless nodes, which is the same as from an empty mask. I will try to fix it as you suggested below, test and send it soon. > > This is all protected by mem_hotplug_begin() and the zonelists will be > stable. The solution is to rewrite new_node_page() to work correctly. > Use node_states[N_MEMORY] as mask, clear page_to_nid(page). If mask is > not empty, do > > __alloc_pages_nodemask(gfp_mask, 0, > node_zonelist(page_to_nid(page), gfp_mask), &mask) > > and fallback to alloc_page(gfp_mask), which should also be used if the > mask is empty -- do not try to allocate memory from the empty set of > nodes. > > mm-page_alloc-warn-about-empty-nodemask.patch is a rather ridiculous > warning to need. The largest user of a page allocator nodemask is > mempolicies which makes sure it doesn't pass an empty set. If it's really > required, it should at least be unlikely() since the vast majority of > callers will have ac->nodemask == NULL. > OK, I’ll send a new version adding unlikely(). Thanks, Zhong -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href