On 4/5/24 6:50 PM, Christoph Lameter (Ampere) wrote:
> On Sat, 30 Mar 2024, Chen Jun wrote:
>
>> When kmalloc_node() is called without __GFP_THISNODE and the target node
>> lacks sufficient memory, SLUB allocates a folio from a different node
>> other than the requested node, instead of taking a partial slab from it.
>
> Hmmm... This would mean that we do not consult the partial lists of the
> other nodes. That is something to be fixed in the allocator.

Which allocator? If you mean SLUB, this patch fixes it. If you mean the
page allocator, I don't see how.

>> However, since the allocated folio does not belong to the requested
>> node, it is deactivated and added to the partial slab list of the node
>> it belongs to.
>
> That should only occur if a request for an object for node X follows a
> request for an object from node Y.

Are you sure? I think it's a stream of requests for node X happening on a
cpu of node Y. AFAICS the first attempt will allocate the slab page from a
node different than X (possibly node Y, because it's local and has pages
available, unlike node X which is full). It does get installed as the cpu
slab, but then the next request is also for node X, so the node matching
checks cause the slab to be deactivated and a new one to be allocated.

>> This behavior can result in excessive memory usage when the requested
>> node has insufficient memory, as SLUB will repeatedly allocate folios
>> from other nodes without reusing the previously allocated ones.
>
> That is bad. Can we avoid that by verifying proper allocator behavior
> during deactivation and ensuring that it searches remote partial objects
> first before doing something drastic as going to the page allocator?
>
>> To prevent memory wastage,
>> when (node != NUMA_NO_NODE) && !(gfpflags & __GFP_THISNODE) is,
>> 1) try to get a partial slab from target node with GFP_NOWAIT |
>>    __GFP_THISNODE opportunistically.
>
> Did we check the partial lists of that node first for available
> objects before going to the page allocator?
>
> get_any_partial() should do that. Maybe it is not called in the
> kmalloc_node case.

Yes, get_any_partial() is currently skipped for requests with a NUMA node
other than NUMA_NO_NODE. I think it's a useful tradeoff to first try to
satisfy the node preference with a GFP_NOWAIT allocation. If it succeeds,
the target node is not overloaded, we get the page from the desired node,
and further allocations for the same node will not deactivate it. If it
doesn't succeed, then we indeed fall back to slabs on the partial lists of
other nodes before wastefully allocating new pages from the other nodes,
which addresses the scenario that motivated this patch.
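
As a rough illustration of the ordering discussed above, here is a
userspace toy model. It is not kernel code: every name in it (toy_node,
toy_alloc_on_node, toy_run, TOY_OLD, TOY_NEW, OBJS_PER_SLAB) is made up
for this sketch, and the assumption that a new slab hands out one object
and leaves the rest on its own node's partial list is a simplification of
the deactivation described earlier. Step 2 stands in both for the
opportunistic GFP_NOWAIT | __GFP_THISNODE attempt and for the page
allocator preferring the requested node.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES      2
#define OBJS_PER_SLAB 4   /* assumed objects per slab page in this toy */

enum toy_policy { TOY_OLD, TOY_NEW };

struct toy_node {
	int free_pages;    /* pages the toy "page allocator" can still give */
	int partial_objs;  /* free objects on this node's partial list */
};

/* Take one object from a node's partial list, if there is one. */
static bool toy_get_partial(struct toy_node *n)
{
	if (n->partial_objs == 0)
		return false;
	n->partial_objs--;
	return true;
}

/*
 * Allocate a new slab page on node n. One object is handed to the caller,
 * the remaining ones end up on n's partial list (modelling deactivation).
 */
static bool toy_new_slab(struct toy_node *n, int *pages_allocated)
{
	if (n->free_pages == 0)
		return false;
	n->free_pages--;
	(*pages_allocated)++;
	n->partial_objs += OBJS_PER_SLAB - 1;
	return true;
}

/* Serve one request that prefers node 'target'; returns the serving node. */
static int toy_alloc_on_node(struct toy_node nodes[], int target,
			     enum toy_policy policy, int *pages_allocated)
{
	int i;

	assert(target >= 0 && target < NR_NODES);

	/* 1) partial list of the requested node */
	if (toy_get_partial(&nodes[target]))
		return target;
	/* 2) new slab on the requested node only */
	if (toy_new_slab(&nodes[target], pages_allocated))
		return target;
	/* 3) new policy only: reuse partial slabs of other nodes */
	if (policy == TOY_NEW)
		for (i = 0; i < NR_NODES; i++)
			if (i != target && toy_get_partial(&nodes[i]))
				return i;
	/* 4) last resort: new slab from some other node */
	for (i = 0; i < NR_NODES; i++)
		if (i != target && toy_new_slab(&nodes[i], pages_allocated))
			return i;
	return -1;
}

/* Node 0 is full, node 1 has pages; issue nreq requests for node 0. */
static int toy_run(enum toy_policy policy, int nreq)
{
	struct toy_node nodes[NR_NODES] = { { 0, 0 }, { 100, 0 } };
	int pages_allocated = 0;

	while (nreq-- > 0)
		toy_alloc_on_node(nodes, 0, policy, &pages_allocated);
	return pages_allocated;
}
```

In this model, toy_run(TOY_OLD, 8) consumes 8 pages from node 1 (one per
request), while toy_run(TOY_NEW, 8) consumes only 2, because the objects
left on node 1's partial list get reused before a new page is allocated.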