On 8/1/22 12:43 PM, Huang, Ying wrote: > Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: > >> On 8/1/22 12:07 PM, Huang, Ying wrote: >>> Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: >>> >>>> On 8/1/22 10:40 AM, Huang, Ying wrote: >>>>> Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: >>>>> >>>>>> On 8/1/22 7:36 AM, Huang, Ying wrote: >>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: >>>>>>> >>>>>>>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes: >>>>>>>> >>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: >> >> .... >> >>>>>> >>>>>> With the module unload, it is kind of force removing the usage of the specific memtype. >>>>>> Considering module unload will remove the usage of specific memtype from other parts >>>>>> of the kernel and we already do all the required reset in memory hot unplug, do we >>>>>> need to do the clear_node_memory_type above? >>>>> >>>>> Per my understanding, we need to call clear_node_memory_type() in >>>>> dev_dax_kmem_remove(). After that, we have nothing to do in >>>>> dax_kmem_exit(). >>>>> >>>> >>>> Ok, I guess you are suggesting to do the clear_node_memory_type even if we fail the memory remove. >>> >>> Can we use node_memory_types[] to indicate whether a node is managed by >>> a driver? >>> >>> Regardless being succeeded or failed, dev_dax_kmem_remove() will set >>> node_memory_types[] = NULL. But until node is offlined, we will still >>> keep the node in the memory_dev_type (dax_pmem_type). >>> >>> And we will prevent dax/kmem from unloading via try_module_get() and add >>> "struct module *" to struct memory_dev_type. >>> >> >> Current dax/kmem driver is not holding any module reference and allows the module to be unloaded >> anytime. Even if the memory onlined by the driver fails to be unplugged. Addition of memory_dev_type >> as suggested by you will be different than that. Page demotion can continue to work without the >> support of dax_pmem_type as long as we keep the older demotion order. Any new demotion order >> rebuild will remove the the memory node which was not hotunplugged from the demotion order. Isn't that >> a much simpler implementation? > > Per my understanding, unbinding/binding the dax/kmem driver means > changing the memory type of a memory device. For example, unbinding > dax/kmem driver may mean changing the memory type from dax_pmem_type to > default_memory_type (or default_dram_type). That appears strange. But > if we force the NUMA node to be offlined for unbinding, we can avoid to > change the memory type to default_memory_type. > If we are able to unplug all the memory, we do remove the node from N_MEMORY. If we fail to unplug the memory, we have two options. 1) Keep the same demotion order 2) Rebuild the demotion order which results in memory NUMA node not participating in demotion. I agree with you that we should not switch to default memory type. The below code demonstrate how it can be done. If we want to keep the same demotion order, we can remove establish_demotion_target() from the below code. void clear_node_memory_type(int node, struct memory_dev_type *memtype) { struct memory_tier *memtier; pg_data_t *pgdat = NODE_DATA(node); mutex_lock(&memory_tier_lock); /* * Even if we fail to unplug memory, clear the association of * this node to this specific memory type. */ if (node_isset(node, memtype->nodes) && node_memory_types[node] == memtype) { memtier = __node_get_memory_tier(node); if (memtier) { rcu_assign_pointer(pgdat->memtier, NULL); synchronize_rcu(); } node_clear(node, memtype->nodes); if (nodes_empty(memtype->nodes)) { list_del(&memtype->tier_sibiling); memtype->memtier = NULL; if (memtier && list_empty(&memtier->memory_types)) destroy_memory_tier(memtier); } establish_demotion_targets(); } node_memory_types[node] = NULL; mutex_unlock(&memory_tier_lock); } If we agree that we want to keep the current behavior (that is to allow kmem driver unload even on memory unplug failure) we can go with the above change. If we are suggesting we should prevent a driver unload, then IMHO it should be independent of memory_dev_type (or this patch series). We should make sure we take a module reference on successful memory online and drop it only on successful offline. -aneesh