On Thu, 11 Jul 2024, Vlastimil Babka wrote:

>> There are also the cpuset/cgroup restrictions via the zonelists that are
>> bypassed by removing alloc_pages()
>
> AFAICS cpusets are handled on a level that's reached by both paths, i.e.
> prepare_alloc_pages(), and I see nothing that would make switching to
> alloc_pages_node() bypass it. Am I missing something?
You are correct. cpuset/cgroup restrictions also apply to
alloc_pages_node().
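
For reference, a simplified sketch of why both entry points honor cpusets
(abridged from the mm/mempolicy.c and mm/page_alloc.c paths of that era,
not verbatim source; nid/nodemask derivation elided): alloc_pages() and
alloc_pages_node() converge on the common __alloc_pages() entry point, and
prepare_alloc_pages() applies the cpuset-restricted nodemask there.

/* Abridged sketch; illustrative, not verbatim kernel source. */

/* alloc_pages() consults the task mempolicy first ... */
struct page *alloc_pages(gfp_t gfp, unsigned int order)
{
	struct mempolicy *pol = get_task_policy(current);

	/* ... derive nid/nodemask from pol ... */
	return __alloc_pages(gfp, order, nid, nodemask);
}

/* ... while alloc_pages_node() skips the mempolicy lookup but still
 * funnels into the same common entry point: */
static inline struct page *alloc_pages_node(int nid, gfp_t gfp,
					    unsigned int order)
{
	return __alloc_pages(gfp, order, nid, NULL);
}

/* Both reach prepare_alloc_pages(), which restricts the allocation
 * context to the cpuset's allowed nodes: */
static bool prepare_alloc_pages(gfp_t gfp, unsigned int order,
				int preferred_nid, nodemask_t *nodemask,
				struct alloc_context *ac, ...)
{
	if (cpusets_enabled())
		/* ... limit ac->nodemask to the cpuset's mems_allowed ... */;
	/* ... */
}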
>> We have some internal patches now that implement memory policies on a
>> per-object basis for SLUB here.
>>
>> This is a 10-15% regression on various benchmarks when objects like the
>> scheduler statistics structures are misplaced.
>
> I believe it would be best if you submitted a patch with all that
> reasoning. Thanks!
Turns out those performance issues stem from NUMA locality only being
considered at the folio level for slab allocation. Individual object
allocations are not subject to it.
The performance issue comes about in the following way:
Two kernel threads run on the same cpu using the same slab cache. One of
them keeps allocating from a different node via kmalloc_node() while the
other uses kmalloc(). The kmalloc_node() thread will then always ensure
that the per cpu slab is from the other node.

The other thread uses kmalloc(), which does not check which node the per
cpu slab is from. Therefore the kmalloc() thread can continually be served
objects that are not local. That is not good and causes misplacement of
objects.
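
The mechanism is visible in the SLUB fast path. A simplified excerpt
(illustrative, not verbatim mm/slub.c; locking and cmpxchg details
elided):

/* Simplified from mm/slub.c; not verbatim. */

static inline int node_match(struct slab *slab, int node)
{
#ifdef CONFIG_NUMA
	/* kmalloc() passes NUMA_NO_NODE, so this check never fails for
	 * it; only kmalloc_node() callers can see a mismatch. */
	if (node != NUMA_NO_NODE && slab_nid(slab) != node)
		return 0;
#endif
	return 1;
}

static void *slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
	struct kmem_cache_cpu *c = raw_cpu_ptr(s->cpu_slab);
	void *object = c->freelist;
	struct slab *slab = c->slab;

	if (unlikely(!object || !slab || !node_match(slab, node))) {
		/* Slow path: deactivate the cpu slab and refill from the
		 * requested node. The kmalloc_node() thread keeps
		 * re-pinning the per cpu slab to its node this way. */
		object = __slab_alloc(s, gfpflags, node, c);
	} else {
		/* Fast path: pop the object off the per cpu freelist,
		 * wherever that slab happens to reside. The kmalloc()
		 * thread therefore keeps consuming remote objects. */
	}
	return object;
}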
But that issue is separate from the commit here; we see the same
regression before this commit.
This patch still needs to be reverted, since the rationale for it is not
right and it disables memory policy support. That results in the strange
situation that memory policies are still honored in get_any_partial() in
slub but no longer during allocation.
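
For reference, get_any_partial() still walks a zonelist derived from the
current task's memory policy. An abridged sketch (from memory, not
verbatim mm/slub.c):

/* Abridged sketch of get_any_partial() in mm/slub.c. */
static void *get_any_partial(struct kmem_cache *s, gfp_t flags, ...)
{
#ifdef CONFIG_NUMA
	struct zonelist *zonelist;
	struct zoneref *z;
	struct zone *zone;
	enum zone_type highest_zoneidx = gfp_zone(flags);
	unsigned int cpuset_mems_cookie;

	do {
		cpuset_mems_cookie = read_mems_allowed_begin();
		/* The zonelist is chosen according to the current task's
		 * memory policy ... */
		zonelist = node_zonelist(mempolicy_slab_node(), flags);
		for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
			/* ... try to grab a partial slab from this node,
			 * subject to cpuset_zone_allowed() ... */
		}
	} while (read_mems_allowed_retry(cpuset_mems_cookie));
#endif
	return NULL;
}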