On Mon, Mar 25, 2024 at 8:08 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote: > > "Ho-Ren (Jack) Chuang" <horenchuang@xxxxxxxxxxxxx> writes: > > > On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote: > >> > >> "Ho-Ren (Jack) Chuang" <horenchuang@xxxxxxxxxxxxx> writes: > >> > >> > The current implementation treats emulated memory devices, such as > >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory > >> > (E820_TYPE_RAM). However, these emulated devices have different > >> > characteristics than traditional DRAM, making it important to > >> > distinguish them. Thus, we modify the tiered memory initialization process > >> > to introduce a delay specifically for CPUless NUMA nodes. This delay > >> > ensures that the memory tier initialization for these nodes is deferred > >> > until HMAT information is obtained during the boot process. Finally, > >> > demotion tables are recalculated at the end. > >> > > >> > * late_initcall(memory_tier_late_init); > >> > Some device drivers may have initialized memory tiers between > >> > `memory_tier_init()` and `memory_tier_late_init()`, potentially bringing > >> > online memory nodes and configuring memory tiers. They should be excluded > >> > in the late init. > >> > > >> > * Handle cases where there is no HMAT when creating memory tiers > >> > There is a scenario where a CPUless node does not provide HMAT information. > >> > If no HMAT is specified, it falls back to using the default DRAM tier. > >> > > >> > * Introduce another new lock `default_dram_perf_lock` for adist calculation > >> > In the current implementation, iterating through CPUlist nodes requires > >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up > >> > trying to acquire the same lock, leading to a potential deadlock. > >> > Therefore, we propose introducing a standalone `default_dram_perf_lock` to > >> > protect `default_dram_perf_*`. This approach not only avoids deadlock > >> > but also prevents holding a large lock simultaneously. > >> > > >> > * Upgrade `set_node_memory_tier` to support additional cases, including > >> > default DRAM, late CPUless, and hot-plugged initializations. > >> > To cover hot-plugged memory nodes, `mt_calc_adistance()` and > >> > `mt_find_alloc_memory_type()` are moved into `set_node_memory_tier()` to > >> > handle cases where memtype is not initialized and where HMAT information is > >> > available. > >> > > >> > * Introduce `default_memory_types` for those memory types that are not > >> > initialized by device drivers. > >> > Because late initialized memory and default DRAM memory need to be managed, > >> > a default memory type is created for storing all memory types that are > >> > not initialized by device drivers and as a fallback. > >> > > >> > Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@xxxxxxxxxxxxx> > >> > Signed-off-by: Hao Xiang <hao.xiang@xxxxxxxxxxxxx> > >> > --- > >> > mm/memory-tiers.c | 73 ++++++++++++++++++++++++++++++++++++++++------- > >> > 1 file changed, 63 insertions(+), 10 deletions(-) > >> > > >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > >> > index 974af10cfdd8..9396330fa162 100644 > >> > --- a/mm/memory-tiers.c > >> > +++ b/mm/memory-tiers.c > >> > @@ -36,6 +36,11 @@ struct node_memory_type_map { > >> > > >> > static DEFINE_MUTEX(memory_tier_lock); > >> > static LIST_HEAD(memory_tiers); > >> > +/* > >> > + * The list is used to store all memory types that are not created > >> > + * by a device driver. > >> > + */ > >> > +static LIST_HEAD(default_memory_types); > >> > static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; > >> > struct memory_dev_type *default_dram_type; > >> > > >> > @@ -108,6 +113,7 @@ static struct demotion_nodes *node_demotion __read_mostly; > >> > > >> > static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms); > >> > > >> > +static DEFINE_MUTEX(default_dram_perf_lock); > >> > >> Better to add comments about what is protected by this lock. > >> > > > > Thank you. I will add a comment like this: > > + /* The lock is used to protect `default_dram_perf*` info and nid. */ > > +static DEFINE_MUTEX(default_dram_perf_lock); > > > > I also found an error path was not handled and > > found the lock could be put closer to what it protects. > > I will have them fixed in V5. > > > >> > static bool default_dram_perf_error; > >> > static struct access_coordinate default_dram_perf; > >> > static int default_dram_perf_ref_nid = NUMA_NO_NODE; > >> > @@ -505,7 +511,8 @@ static inline void __init_node_memory_type(int node, struct memory_dev_type *mem > >> > static struct memory_tier *set_node_memory_tier(int node) > >> > { > >> > struct memory_tier *memtier; > >> > - struct memory_dev_type *memtype; > >> > + struct memory_dev_type *mtype; > >> > >> mtype may be referenced without initialization now below. > >> > > > > Good catch! Thank you. > > > > Please check below. > > I may found a potential NULL pointer dereference. > > > >> > + int adist = MEMTIER_ADISTANCE_DRAM; > >> > pg_data_t *pgdat = NODE_DATA(node); > >> > > >> > > >> > @@ -514,11 +521,20 @@ static struct memory_tier *set_node_memory_tier(int node) > >> > if (!node_state(node, N_MEMORY)) > >> > return ERR_PTR(-EINVAL); > >> > > >> > - __init_node_memory_type(node, default_dram_type); > >> > + mt_calc_adistance(node, &adist); > >> > + if (node_memory_types[node].memtype == NULL) { > >> > + mtype = mt_find_alloc_memory_type(adist, &default_memory_types); > >> > + if (IS_ERR(mtype)) { > >> > + mtype = default_dram_type; > >> > + pr_info("Failed to allocate a memory type. Fall back.\n"); > >> > + } > >> > + } > >> > > >> > - memtype = node_memory_types[node].memtype; > >> > - node_set(node, memtype->nodes); > >> > - memtier = find_create_memory_tier(memtype); > >> > + __init_node_memory_type(node, mtype); > >> > + > >> > + mtype = node_memory_types[node].memtype; > >> > + node_set(node, mtype->nodes); > > > > If the `mtype` could be NULL, would there be a potential > > NULL pointer dereference. Do you have a preferred idea > > to fix this if needed? > > Initialize mtype with default_dram_type? > Sounds like a plan. Thank you. I'm working on V5. > > >> > + memtier = find_create_memory_tier(mtype); > >> > if (!IS_ERR(memtier)) > >> > rcu_assign_pointer(pgdat->memtier, memtier); > >> > return memtier; > >> > @@ -655,6 +671,34 @@ void mt_put_memory_types(struct list_head *memory_types) > >> > } > >> > EXPORT_SYMBOL_GPL(mt_put_memory_types); > >> > > > [snip] > > -- > Best Regards, > Huang, Ying -- Best regards, Ho-Ren (Jack) Chuang 莊賀任