Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes:

> On 7/15/22 1:23 PM, Huang, Ying wrote:

[snip]

>>
>> You dropped the original sysfs interface patches from the series, but
>> the kernel internal implementation is still for the original sysfs
>> interface.  For example, memory tier ID is for the original sysfs
>> interface, not for the new proposed sysfs interface.  So I suggest you
>> implement with the new interface in mind.  What do you think about
>> the following design?
>>
>
> Sorry I am not able to follow you here. This patchset completely drops
> exposing memory tiers to userspace via sysfs. Instead it allows
> creation of memory tiers with specific tierID from within the kernel/device driver.
> Default tierID is 200 and dax kmem creates memory tier with tierID 100.
>
>
>> - Each NUMA node belongs to a memory type, and each memory type
>>   corresponds to an "abstract distance", so each NUMA node corresponds to
>>   a "distance".  For simplicity, we can start with static distances, for
>>   example, DRAM (default): 150, PMEM: 250.  The distance of each NUMA
>>   node can be recorded in a global array,
>>
>>     int node_distances[MAX_NUMNODES];
>>
>>   or, just
>>
>>     pgdat->distance
>>
>
> I don't follow this. I guess you are trying to have a different design.
> Would it be much easier if you can write this in the form of a patch?

I have written some pseudocode, as follows, to show my basic idea.

#define MEMORY_TIER_ADISTANCE_DRAM	150
#define MEMORY_TIER_ADISTANCE_PMEM	250

struct memory_tier {
	/* abstract distance range covered by the memory tier */
	int adistance_start;
	int adistance_len;
	struct list_head list;
	nodemask_t nodemask;
};

/* RCU list of memory tiers */
static LIST_HEAD(memory_tiers);

/* abstract distance of each NUMA node */
int node_adistances[MAX_NUMNODES];

struct memory_tier *find_create_memory_tier(int adistance)
{
	struct memory_tier *tier;

	/* reuse an existing tier whose range covers this distance */
	list_for_each_entry(tier, &memory_tiers, list) {
		if (adistance >= tier->adistance_start &&
		    adistance < tier->adistance_start + tier->adistance_len)
			return tier;
	}
	/* allocate a new memory tier and return */
}

void memory_tier_add_node(int nid)
{
	int adistance;
	struct memory_tier *tier;

	/*
	 * Fall back to the DRAM distance when none was recorded.
	 * "a ?: b" is used because "a || b" evaluates to 0 or 1 in C.
	 */
	adistance = node_adistances[nid] ?: MEMORY_TIER_ADISTANCE_DRAM;
	tier = find_create_memory_tier(adistance);
	/* node_set() takes the nodemask itself, not a pointer to it */
	node_set(nid, tier->nodemask);
	/* setup demotion data structure, etc */
}

static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
						 unsigned long action, void *_arg)
{
	struct memory_notify *arg = _arg;
	int nid;

	nid = arg->status_change_nid;
	if (nid < 0)
		return notifier_from_errno(0);

	switch (action) {
	case MEM_ONLINE:
		memory_tier_add_node(nid);
		break;
	}

	return notifier_from_errno(0);
}

/* kmem.c */
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
	/* record the PMEM distance before the node's memory is onlined */
	node_adistances[dev_dax->target_node] = MEMORY_TIER_ADISTANCE_PMEM;
	/* add_memory_driver_managed() */
}

[snip]

Best Regards,
Huang, Ying
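
For anyone who wants to experiment with the lookup outside the kernel, here is a rough user-space model of find_create_memory_tier() and memory_tier_add_node().  It is only a sketch under assumptions that are not in the pseudocode above: TIER_CHUNK (the width of each tier's distance range) and the round-down placement of a new tier are hypothetical choices, a plain singly linked list stands in for the RCU list, and an unsigned long bitmask stands in for nodemask_t.

```c
/* User-space model of the abstract-distance lookup; not kernel code. */
#include <assert.h>
#include <stdlib.h>

#define MAX_NUMNODES			8	/* small value for the model */
#define MEMORY_TIER_ADISTANCE_DRAM	150
#define MEMORY_TIER_ADISTANCE_PMEM	250
#define TIER_CHUNK			100	/* assumption: width of one tier */

struct memory_tier {
	int adistance_start;
	int adistance_len;
	unsigned long nodemask;		/* one bit per node, models nodemask_t */
	struct memory_tier *next;	/* models the kernel list_head */
};

struct memory_tier *memory_tiers;
int node_adistances[MAX_NUMNODES];

struct memory_tier *find_create_memory_tier(int adistance)
{
	struct memory_tier *tier;

	/* Reuse a tier whose [start, start + len) range covers adistance. */
	for (tier = memory_tiers; tier; tier = tier->next) {
		if (adistance >= tier->adistance_start &&
		    adistance < tier->adistance_start + tier->adistance_len)
			return tier;
	}

	tier = calloc(1, sizeof(*tier));
	if (!tier)
		return NULL;
	/* Assumed policy: a new tier starts at a TIER_CHUNK boundary. */
	tier->adistance_start = adistance / TIER_CHUNK * TIER_CHUNK;
	tier->adistance_len = TIER_CHUNK;
	tier->next = memory_tiers;
	memory_tiers = tier;
	return tier;
}

struct memory_tier *memory_tier_add_node(int nid)
{
	int adistance;
	struct memory_tier *tier;

	assert(nid >= 0 && nid < MAX_NUMNODES);
	/* An unset (zero) distance falls back to the DRAM default. */
	adistance = node_adistances[nid] ? node_adistances[nid]
					 : MEMORY_TIER_ADISTANCE_DRAM;
	tier = find_create_memory_tier(adistance);
	if (tier)
		tier->nodemask |= 1UL << nid;
	return tier;
}
```

With these assumed values, two nodes that never had a distance recorded would share one tier covering [100, 200), while a node whose distance was set to MEMORY_TIER_ADISTANCE_PMEM beforehand would get its own tier covering [200, 300).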