On 7/14/21 5:15 PM, Andrew Morton wrote:
> On Mon, 12 Jul 2021 16:09:28 +0800 Feng Tang <feng.tang@xxxxxxxxx> wrote:
>> This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
>> This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
>> interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
>> preference for nodes which will fulfil memory allocation requests. Unlike the
>> MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
>> works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
>> invoke the OOM killer if those preferred nodes are not available.
>
> Do we have any real-world testing which demonstrates the benefits of
> all of this?

Yes, it's actually been quite useful in practice already.

If we take persistent memory media (PMEM) and hot-add/online it with the
DAX kmem driver, we get NUMA nodes with lots of capacity (~6TB is
typical) but weird performance; PMEM has good read speed, but low write
speed.  That low write speed is *so* low that it dominates the
performance more than the distance from the CPUs.  Folks who want PMEM
really don't care about locality.

The discussions with the testers usually go something like this:

Tester: How do I make my test use PMEM on nodes 2 and 3?
Kernel Guys: Use 'numactl --membind=2-3'.
Tester: I tried that, but I'm getting allocation failures once I fill up
        PMEM.  Shouldn't it fall back to DRAM?
Kernel Guys: Fine, use 'numactl --preferred=2-3'.
Tester: That worked, but it started using DRAM after it exhausted node 2.
Kernel Guys: Dang it.  I forgot --preferred ignores everything after the
        first node.  Fine, we'll patch the kernel.

This has happened more than once.  End users want to be able to specify
a specific physical medium, but don't want to have to deal with the
sharp edges of strict binding.  This has happened both with slow media
like PMEM and "faster" media like High-Bandwidth Memory.
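
For concreteness, here is a minimal user-space sketch of what those
testers end up wanting: prefer the PMEM nodes (2-3 in the dialog above)
and quietly fall back to DRAM once they fill up.  It is illustrative
only; the MPOL_PREFERRED_MANY define below is an assumption in case the
installed uapi/libc headers don't carry it yet, and the value has to
match the kernel's include/uapi/linux/mempolicy.h.

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#ifndef MPOL_PREFERRED_MANY
	#define MPOL_PREFERRED_MANY 5	/* assumed value; verify against the uapi header */
	#endif

	int main(void)
	{
		unsigned long nodemask = (1UL << 2) | (1UL << 3);	/* PMEM nodes 2 and 3 */
		unsigned long maxnode  = 8 * sizeof(nodemask);

		/* Raw syscall; libnuma's set_mempolicy() wrapper works the same way. */
		if (syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY,
			    &nodemask, maxnode) < 0) {
			perror("set_mempolicy(MPOL_PREFERRED_MANY)");
			return 1;
		}

		/* Allocations from here on prefer nodes 2-3 but may fall back to DRAM. */
		void *buf = malloc(1UL << 30);	/* 1 GiB */
		if (buf)
			memset(buf, 0, 1UL << 30);
		free(buf);
		return 0;
	}

With --membind semantics this would hit allocation failures (or the OOM
killer) once nodes 2-3 are full; with the old single-node --preferred
only node 2 would be honored.  MPOL_PREFERRED_MANY is the behavior the
testers keep asking for.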