Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: > On 8/1/22 12:43 PM, Huang, Ying wrote: >> Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: >> >>> On 8/1/22 12:07 PM, Huang, Ying wrote: >>>> Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: >>>> >>>>> On 8/1/22 10:40 AM, Huang, Ying wrote: >>>>>> Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes: >>>>>> >>>>>>> On 8/1/22 7:36 AM, Huang, Ying wrote: >>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: >>>>>>>> >>>>>>>>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes: >>>>>>>>> >>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: >>> >>> .... >>> >>>>>>> >>>>>>> With the module unload, it is kind of force removing the usage of the specific memtype. >>>>>>> Considering module unload will remove the usage of specific memtype from other parts >>>>>>> of the kernel and we already do all the required reset in memory hot unplug, do we >>>>>>> need to do the clear_node_memory_type above? >>>>>> >>>>>> Per my understanding, we need to call clear_node_memory_type() in >>>>>> dev_dax_kmem_remove(). After that, we have nothing to do in >>>>>> dax_kmem_exit(). >>>>>> >>>>> >>>>> Ok, I guess you are suggesting to do the clear_node_memory_type even if we fail the memory remove. >>>> >>>> Can we use node_memory_types[] to indicate whether a node is managed by >>>> a driver? >>>> >>>> Regardless being succeeded or failed, dev_dax_kmem_remove() will set >>>> node_memory_types[] = NULL. But until node is offlined, we will still >>>> keep the node in the memory_dev_type (dax_pmem_type). >>>> >>>> And we will prevent dax/kmem from unloading via try_module_get() and add >>>> "struct module *" to struct memory_dev_type. >>>> >>> >>> Current dax/kmem driver is not holding any module reference and allows the module to be unloaded >>> anytime. Even if the memory onlined by the driver fails to be unplugged. Addition of memory_dev_type >>> as suggested by you will be different than that. Page demotion can continue to work without the >>> support of dax_pmem_type as long as we keep the older demotion order. Any new demotion order >>> rebuild will remove the the memory node which was not hotunplugged from the demotion order. Isn't that >>> a much simpler implementation? >> >> Per my understanding, unbinding/binding the dax/kmem driver means >> changing the memory type of a memory device. For example, unbinding >> dax/kmem driver may mean changing the memory type from dax_pmem_type to >> default_memory_type (or default_dram_type). That appears strange. But >> if we force the NUMA node to be offlined for unbinding, we can avoid to >> change the memory type to default_memory_type. >> > > If we are able to unplug all the memory, we do remove the node from N_MEMORY. > If we fail to unplug the memory, we have two options. > > 1) Keep the same demotion order > 2) Rebuild the demotion order which results in memory NUMA node not participating > in demotion. > > I agree with you that we should not switch to default memory type. > > The below code demonstrate how it can be done. If we want to keep > the same demotion order, we can remove establish_demotion_target() from > the below code. > > void clear_node_memory_type(int node, struct memory_dev_type *memtype) > { > struct memory_tier *memtier; > pg_data_t *pgdat = NODE_DATA(node); > > mutex_lock(&memory_tier_lock); > /* > * Even if we fail to unplug memory, clear the association of > * this node to this specific memory type. > */ > if (node_isset(node, memtype->nodes) && node_memory_types[node] == memtype) { > > memtier = __node_get_memory_tier(node); > if (memtier) { > rcu_assign_pointer(pgdat->memtier, NULL); > synchronize_rcu(); > } > node_clear(node, memtype->nodes); > if (nodes_empty(memtype->nodes)) { > list_del(&memtype->tier_sibiling); > memtype->memtier = NULL; > if (memtier && list_empty(&memtier->memory_types)) > destroy_memory_tier(memtier); > > } > establish_demotion_targets(); > } > node_memory_types[node] = NULL; > mutex_unlock(&memory_tier_lock); > } > > > If we agree that we want to keep the current behavior (that is to allow kmem driver unload > even on memory unplug failure) we can go with the above change. If we are suggesting we > should prevent a driver unload, then IMHO it should be independent of memory_dev_type > (or this patch series). We should make sure we take a module reference on successful > memory online and drop it only on successful offline. I suggest to keep a NUMA node in the memory_dev_type (dax_pmem_type) until the node is offlined. Yes. The dax/kmem driver may be unbound to the dax device even if memory offlining fails. But we can still find someway to keep the memory_dev_type of the NUMA node unchanged. Solution 1 is to prevent dax/kmem driver from unloading via module reference. I think we do that in this series. Solution 2 is to allocate dax_pmem_type dynamically, and manage it like "kmem_name". We may need some kind of reference counting to manage it. Best Regards, Huang, Ying