On Fri, Mar 22, 2019 at 9:45 PM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>
> When running applications on a machine with NVDIMM as a NUMA node,
> memory allocation may end up on the NVDIMM node. This may result in
> silent performance degradation and regression due to the difference
> in hardware properties.
>
> DRAM-first should be obeyed to prevent surprising regressions. Any
> non-DRAM node should be excluded from default allocation. Use a
> nodemask to control the memory placement. Introduce def_alloc_nodemask,
> which has only DRAM nodes set. Any non-DRAM allocation must be
> specified by NUMA policy explicitly.
>
> In the future we may be able to extract the memory characteristics
> from HMAT or another source to build up the default allocation
> nodemask. For the time being, just distinguish DRAM and PMEM
> (non-DRAM) nodes by the SRAT flag.
>
> Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> ---
>  arch/x86/mm/numa.c     |  1 +
>  drivers/acpi/numa.c    |  8 ++++++++
>  include/linux/mmzone.h |  3 +++
>  mm/page_alloc.c        | 18 ++++++++++++++++--
>  4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index dfb6c4d..d9e0ca4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -626,6 +626,7 @@ static int __init numa_init(int (*init_func)(void))
>  	nodes_clear(numa_nodes_parsed);
>  	nodes_clear(node_possible_map);
>  	nodes_clear(node_online_map);
> +	nodes_clear(def_alloc_nodemask);
>  	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
>  	WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory,
>  				  MAX_NUMNODES));
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 867f6e3..79dfedf 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -296,6 +296,14 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  		goto out_err_bad_srat;
>  	}
>
> +	/*
> +	 * Non-volatile memory is excluded from the zonelist by default.
> +	 * Only regular DRAM nodes are set in the default allocation node
> +	 * mask.
> +	 */
> +	if (!(ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
> +		node_set(node, def_alloc_nodemask);

Hmm, no, I don't think we should do this. Especially considering that
current-generation NVDIMMs are energy-backed DRAM, there is no
performance difference that should be assumed from the non-volatile
flag alone.

Why isn't the default SLIT distance sufficient for ensuring a
DRAM-first default policy?
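
To make the userspace consequence concrete: under this scheme, an
application that actually wants NVDIMM-backed memory has to opt in
with an explicit policy for every allocation. A minimal, untested
sketch of what that opt-in looks like with mbind(2) follows; the
choice of node 1 as the PMEM node is an assumption for illustration
(build with -lnuma):

/* Untested illustration, not part of the patch. Assumes node 1 is PMEM. */
#include <numaif.h>		/* mbind(), MPOL_BIND; link with -lnuma */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;			/* 2MB anonymous region */
	unsigned long mask = 1UL << 1;		/* nodemask with only node 1 set */
	void *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Explicitly bind the range to the (assumed) PMEM node. */
	if (mbind(p, len, MPOL_BIND, &mask, sizeof(mask) * 8, 0)) {
		perror("mbind");
		return 1;
	}

	memset(p, 0, len);	/* first touch allocates on the bound node */
	return 0;
}

That per-application burden is part of why I'm asking whether the
existing SLIT-distance-based zonelist ordering already gives us
DRAM-first by default.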