Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: > On 8/1/22 7:36 AM, Huang, Ying wrote: >> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: >> >>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes: >>> >>>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: >>>> >>>>> By default, all nodes are assigned to the default memory tier which >>>>> is the memory tier designated for nodes with DRAM >>>>> >>>>> Set dax kmem device node's tier to slower memory tier by assigning >>>>> abstract distance to MEMTIER_ADISTANCE_PMEM. PMEM tier >>>>> appears below the default memory tier in demotion order. >>>>> >>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> >>>>> --- >>>>> drivers/dax/kmem.c | 9 +++++++++ >>>>> include/linux/memory-tiers.h | 19 ++++++++++++++++++- >>>>> mm/memory-tiers.c | 28 ++++++++++++++++------------ >>>>> 3 files changed, 43 insertions(+), 13 deletions(-) >>>>> >>>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c >>>>> index a37622060fff..6b0d5de9a3e9 100644 >>>>> --- a/drivers/dax/kmem.c >>>>> +++ b/drivers/dax/kmem.c >>>>> @@ -11,6 +11,7 @@ >>>>> #include <linux/fs.h> >>>>> #include <linux/mm.h> >>>>> #include <linux/mman.h> >>>>> +#include <linux/memory-tiers.h> >>>>> #include "dax-private.h" >>>>> #include "bus.h" >>>>> >>>>> @@ -41,6 +42,12 @@ struct dax_kmem_data { >>>>> struct resource *res[]; >>>>> }; >>>>> >>>>> +static struct memory_dev_type default_pmem_type = { >>>> >>>> Why is this named as default_pmem_type? We will not change the memory >>>> type of a node usually. >>>> >>> >>> Any other suggestion? pmem_dev_type? >> >> Or dax_pmem_type? >> >> DAX is used to enumerate the memory device. >> >>> >>>>> + .adistance = MEMTIER_ADISTANCE_PMEM, >>>>> + .tier_sibiling = LIST_HEAD_INIT(default_pmem_type.tier_sibiling), >>>>> + .nodes = NODE_MASK_NONE, >>>>> +}; >>>>> + >>>>> static int dev_dax_kmem_probe(struct dev_dax *dev_dax) >>>>> { >>>>> struct device *dev = &dev_dax->dev; >>>>> @@ -62,6 +69,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) >>>>> return -EINVAL; >>>>> } >>>>> >>>>> + init_node_memory_type(numa_node, &default_pmem_type); >>>>> + >>>> >>>> The memory hot-add below may fail. So the error handling needs to be >>>> added. >>>> >>>> And, it appears that the memory type and memory tier of a node may be >>>> fully initialized here before NUMA hot-adding started. So I suggest to >>>> set node_memory_types[] here only. And set memory_dev_type->nodes in >>>> node hot-add callback. I think there is the proper place to complete >>>> the initialization. >>>> >>>> And, in theory dax/kmem.c can be unloaded. So we need to clear >>>> node_memory_types[] for nodes somewhere. >>>> >>> >>> I guess by module exit we can be sure that all the memory managed >>> by dax/kmem is hotplugged out. How about something like below? >> >> Because we set node_memorty_types[] in dev_dax_kmem_probe(), it's >> natural to clear it in dev_dax_kmem_remove(). >> > > Most of required reset/clear is done as part of memory hotunplug. So > if we did manage to successfully unplug the memory, everything except > node_memory_types[node] should be reset. That makes the clear_node_memory_type > the below. > > void clear_node_memory_type(int node, struct memory_dev_type *memtype) > { > > mutex_lock(&memory_tier_lock); > /* > * memory unplug did clear the node from the memtype and > * dax/kem did initialize this node's memory type. > */ > if (!node_isset(node, memtype->nodes) && node_memory_types[node] == memtype){ > node_memory_types[node] = NULL; > } > mutex_unlock(&memory_tier_lock); > } > > With the module unload, it is kind of force removing the usage of the specific memtype. > Considering module unload will remove the usage of specific memtype from other parts > of the kernel and we already do all the required reset in memory hot unplug, do we > need to do the clear_node_memory_type above? Per my understanding, we need to call clear_node_memory_type() in dev_dax_kmem_remove(). After that, we have nothing to do in dax_kmem_exit(). Best Regards, Huang, Ying