On Mon, Mar 25, 2019 at 4:36 PM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
[..]
> >>> Hmm, no, I don't think we should do this. Especially considering
> >>> current generation NVDIMMs are energy backed DRAM there is no
> >>> performance difference that should be assumed by the non-volatile
> >>> flag.
> >> Actually, here I would like to initialize a node mask for default
> >> allocation. Memory allocation should not end up on any nodes excluded by
> >> this node mask unless they are specified by mempolicy.
> >>
> >> We may have a few different ways or criteria to initialize the node
> >> mask, for example, we can read from HMAT (when HMAT is ready in the
> >> future), and we definitely could have non-DRAM nodes set if they have no
> >> performance difference (I'm supposed you mean NVDIMM-F or HBM).
> >>
> >> As long as there are different tiers, distinguished by performance, for
> >> main memory, IMHO, there should be a defined default allocation node
> >> mask to control the memory placement no matter where we get the information.
> > I understand the intent, but I don't think the kernel should have such
> > a hardline policy by default. However, it would be worthwhile
> > mechanism and policy to consider for the dax-hotplug userspace
> > tooling. I.e. arrange for a given device-dax instance to be onlined,
> > but set the policy to require explicit opt-in by numa binding for it
> > to be an allocation / migration option.
> >
> > I added Vishal to the cc who is looking into such policy tooling.
>
> We may assume the nodes returned by cpu_to_node() would be treated as
> the default allocation nodes from the kernel point of view.
>
> So, the below code may do the job:
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index d9e0ca4..a3e07da 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -764,6 +764,8 @@ void __init init_cpu_to_node(void)
>  			init_memory_less_node(node);
>
>  		numa_set_node(cpu, node);
> +
> +		node_set(node, def_alloc_nodemask);
>  	}
>  }
>
> Actually, the kernel should not care too much what kind of memory is
> used, any node could be used for memory allocation. But it may be better
> to restrict to some default nodes due to the performance disparity, for
> example, default to regular DRAM only. Here kernel assumes the nodes
> associated with CPUs would be DRAM nodes.
>
> The node mask could be exported to user space to be override by
> userspace tool or sysfs or kernel commandline.

Yes, sounds good.

> But I still think kernel does need a default node mask.

Yes, it just depends on what is less surprising for userspace to
contend with by default. I would expect an unaware userspace to be
confused by a system that has free memory that it cannot use. So,
usable by default sounds like the safer option, with forbidding
default usage of given nodes left as an administrator / application
opt-in mechanism.
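
To make the two defaults concrete, here is a rough userspace toy model
of the trade-off, offered as a sketch and not as kernel code. Every
name in it (struct toy_node, mask_from_cpus(), mask_usable_by_default(),
pick_node()) is hypothetical and exists only for this illustration. It
assumes a plain bitmask stands in for a nodemask_t and that a non-zero
bind mask stands in for an explicit mempolicy binding.

```c
/*
 * Userspace toy model only. None of these names exist in the kernel;
 * they are assumptions made for illustration.
 */
#include <stdbool.h>

struct toy_node {
	bool has_cpus;     /* some CPU maps to this node, as with cpu_to_node() */
	bool opt_in_only;  /* administrator asked for explicit binding only */
	long free_pages;   /* free memory left on the node */
};

/* Option A: the default mask holds only nodes that have CPUs. */
unsigned int mask_from_cpus(const struct toy_node *n, int nr)
{
	unsigned int mask = 0;

	for (int i = 0; i < nr; i++)
		if (n[i].has_cpus)
			mask |= 1u << i;
	return mask;
}

/* Option B: every node is usable unless it was opted out. */
unsigned int mask_usable_by_default(const struct toy_node *n, int nr)
{
	unsigned int mask = 0;

	for (int i = 0; i < nr; i++)
		if (!n[i].opt_in_only)
			mask |= 1u << i;
	return mask;
}

/*
 * Return the first allowed node that still has free pages, or -1.
 * A non-zero bind_mask overrides the default mask, loosely mimicking
 * an explicit mempolicy binding.
 */
int pick_node(const struct toy_node *n, int nr,
	      unsigned int def_mask, unsigned int bind_mask)
{
	unsigned int allowed = bind_mask ? bind_mask : def_mask;

	for (int i = 0; i < nr; i++)
		if ((allowed & (1u << i)) && n[i].free_pages > 0)
			return i;
	return -1;
}
```

With node 0 as exhausted DRAM with CPUs and node 1 as a CPU-less node
with free memory, option A fails an unbound allocation even though
memory is free, while option B falls through to node 1. That is the
"free but unusable" surprise described above. Under either option an
explicit binding to node 1 succeeds.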