Hi Tony, On 11/30/2023 12:57 PM, Tony Luck wrote: > On Thu, Nov 30, 2023 at 06:02:42PM +0000, Fam Zheng wrote: >>> +static __init int snc_get_config(void) >>> +{ >>> + unsigned long *node_caches; >>> + int mem_only_nodes = 0; >>> + int cpu, node, ret; >>> + int num_l3_caches; >>> + >>> + if (!x86_match_cpu(snc_cpu_ids)) >>> + return 1; >>> + >>> + node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL); >>> + if (!node_caches) >>> + return 1; >>> + >>> + cpus_read_lock(); >>> + >>> + if (num_online_cpus() != num_present_cpus()) >>> + pr_warn("Some CPUs offline, SNC detection may be incorrect\n"); >>> + >>> + for_each_node(node) { >>> + cpu = cpumask_first(cpumask_of_node(node)); >>> + if (cpu < nr_cpu_ids) >>> + set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches); >> >> Are we sure get_cpu_cacheinfo_id() is an valid index here? Looking at >> the function it could be -1 or larger than nr_node_ids. > > Fam, > > Return -1 is possible (in the case where first CPU on a node doesn't > have an L3 cache). Larger than nr_node_ids seems a bit more speculative. > It would mean a system with multiple L3 cache instances per node. I > suppose that's theoretically possible. In the limit case every CPU may > have its own personal L3 cache, but still have multiple CPUs grouped > together on a node. > > Patch below (to be folded into part7 of next version). Increases the > size of the bitmap. Checks for get_cpu_cacheinfo_id() returning -1. > Patch just ignores the node in this case. I'm never quite sure how much > code to add for "Can't happen" scenarios. > Thank you. > -Tony > > > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c > index 3293ab4c58b0..85f8a1b3feaf 100644 > --- a/arch/x86/kernel/cpu/resctrl/core.c > +++ b/arch/x86/kernel/cpu/resctrl/core.c > @@ -1056,12 +1056,13 @@ static __init int snc_get_config(void) > unsigned long *node_caches; > int mem_only_nodes = 0; > int cpu, node, ret; > + int cache_id; > int num_l3_caches; Please do maintain reverse fir order. > > if (!x86_match_cpu(snc_cpu_ids)) > return 1; I understand and welcome this change as motivated by robustness. Apart from that, with this being a model specific feature for this particular group of systems, it it not clear to me in which scenarios this could run on a system where a present CPU does not have access to L3 cache. > > - node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL); > + node_caches = bitmap_zalloc(num_online_cpus(), GFP_KERNEL); Please do take care to take new bitmap size into account in all places. From what I can tell there is a later bitmap_weight() call that still uses nr_node_ids as size. > if (!node_caches) > return 1; > > @@ -1072,10 +1073,13 @@ static __init int snc_get_config(void) > > for_each_node(node) { > cpu = cpumask_first(cpumask_of_node(node)); > - if (cpu < nr_cpu_ids) > - set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches); > - else > + if (cpu < nr_cpu_ids) { > + cache_id = get_cpu_cacheinfo_id(cpu, 3); > + if (cache_id != -1) > + set_bit(cache_id, node_caches); > + } else { > mem_only_nodes++; > + } > } > cpus_read_unlock(); > Could this code be made even more robust by checking the computed snc_nodes_per_l3_cache against the limited actually possible values? Forcing it to 1 if something went wrong? Reinette