On Wed 26-02-20 22:45:52, Vlastimil Babka wrote: > On 2/26/20 7:41 PM, Michal Hocko wrote: > > On Wed 26-02-20 18:25:28, Cristopher Lameter wrote: > >> On Mon, 24 Feb 2020, Michal Hocko wrote: > >> > >>> Hmm, nasty. Is there any reason why kmalloc_node behaves differently > >>> from the page allocator? > >> > >> The page allocator will do the same thing if you pass GFP_THISNODE and > >> insist on allocating memory from a node that does not exist. > > > > I do not think that the page allocator would blow up even with > > GFP_THISNODE. The allocation would just fail on memory less node. > > > > Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE > > semantic right? At least I do not see anything like that documented > > anywhere. > > Seems like SLAB at least behaves like the page allocator. See > ____cache_alloc_node() where it basically does: > > page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid); > ... > if (!page) > fallback_alloc(cachep, flags) > > gfp_exact_node() adds __GFP_THISNODE among other things, so the initial > attempt does try to stick only to the given node. But fallback_alloc() > doesn't. In fact, even if kmalloc_node() was called with __GFP_THISNODE > then it wouldn't work as intended, as fallback_alloc() doesn't get the > nodeid, but instead will use numa_mem_id(). That part could probably be > improved. > > SLUB's ___slab_alloc() has for example this: > if (node != NUMA_NO_NODE && !node_present_pages(node)) Hmm, just a quick note. Shouldn't this be node_managed_pages? In most cases the difference is negligible but I can imagine crazy setups where all present pages are simply consumed. > searchnode = node_to_mem_node(node); > > That's from Joonsoo's 2014 commit a561ce00b09e ("slub: fall back to > node_to_mem_node() node if allocating on memoryless node"), suggesting > that the scenario in this bug report should work. Perhaps it just got > broken unintentionally later. A very good reference. Thanks! > And AFAICS the whole path leading to alloc_slab_page() also doesn't add > __GFP_THISNODE, but will keep it if caller passed it, and ultimately it > does: > > > if (node == NUMA_NO_NODE) > page = alloc_pages(flags, order); > else > page = __alloc_pages_node(node, flags, order); > > So yeah looks like SLUB's kmalloc_node() is supposed to behave like the > page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not > enforce it by itself. There's probably just some missing data structure > initialization somewhere right now for memoryless nodes. Thanks for the confirmation! -- Michal Hocko SUSE Labs