From: Michael Kelley <mhklinux@xxxxxxxxxxx> Sent: Wednesday, November 15, 2023 9:26 PM
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 6367de0c2c2e..839be819d46e 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -1243,13 +1243,115 @@ void mana_gd_free_res_map(struct gdma_resource *r)
> >  	r->size = 0;
> >  }
> >
> > +static void cpu_mask_set(cpumask_var_t *filter_mask, cpumask_var_t **filter_mask_list)
> > +{
> > +	unsigned int core_count = 0, cpu;
> > +	cpumask_var_t *filter_mask_list_tmp;
> > +
> > +	BUG_ON(!filter_mask || !filter_mask_list);
>
> This check seems superfluous. The function is invoked from only
> one call site in the code below, and that call site can easily
> ensure that it doesn't pass a NULL value for either parameter.
>
> > +	filter_mask_list_tmp = *filter_mask_list;
> > +	cpumask_copy(*filter_mask, cpu_online_mask);
> > +	/* for each core create a cpumask lookup table,
> > +	 * which stores all the corresponding siblings
> > +	 */
> > +	for_each_cpu(cpu, *filter_mask) {
> > +		BUG_ON(!alloc_cpumask_var(&filter_mask_list_tmp[core_count], GFP_KERNEL));
>
> Don't panic if a memory allocation fails. Must back out, clean up,
> and return an error.
>
> > +		cpumask_or(filter_mask_list_tmp[core_count], filter_mask_list_tmp[core_count],
> > +			   topology_sibling_cpumask(cpu));
>
> alloc_cpumask_var() does not zero out the returned cpumask. So the
> cpumask_or() is operating on uninitialized garbage. Use
> zalloc_cpumask_var() to get a cpumask initialized to zero.
>
> But why is the cpumask_or() being done at all? Couldn't this
> just be a cpumask_copy()? In that case, alloc_cpumask_var() is OK.
>
> > +		cpumask_andnot(*filter_mask, *filter_mask, topology_sibling_cpumask(cpu));
> > +		core_count++;
> > +	}
> > +}
> > +
> > +static int irq_setup(int *irqs, int nvec)
> > +{
> > +	cpumask_var_t filter_mask;
> > +	cpumask_var_t *filter_mask_list;
> > +	unsigned int cpu_first, cpu, irq_start, cores = 0;
> > +	int i, core_count = 0, numa_node, cpu_count = 0, err = 0;
> > +
> > +	BUG_ON(!alloc_cpumask_var(&filter_mask, GFP_KERNEL));
>
> Again, don't panic if a memory allocation fails. Must back out and
> return an error.
>
> > +	cpus_read_lock();
> > +	cpumask_copy(filter_mask, cpu_online_mask);
>
> For readability, I'd suggest adding a blank line before any
> of the multi-line comments below.
>
> > +	/* count the number of cores
> > +	 */
> > +	for_each_cpu(cpu, filter_mask) {
> > +		cpumask_andnot(filter_mask, filter_mask, topology_sibling_cpumask(cpu));
> > +		cores++;
> > +	}
> > +	filter_mask_list = kcalloc(cores, sizeof(cpumask_var_t), GFP_KERNEL);
> > +	if (!filter_mask_list) {
> > +		err = -ENOMEM;
> > +		goto free_irq;

One additional macro-level comment. Creating and freeing the
filter_mask_list is a real pain. But it is needed to identify which
CPUs in a core are available to have an IRQ assigned to them. So let
me suggest a modified approach to meeting that need.

1) Count the number of cores just as before.

2) Then instead of allocating filter_mask_list, allocate an array of
integers, with one array entry per core. Call the array core_id_list.
Somewhat like the code in cpu_mask_set(), populate the array with the
CPU number of the first CPU in each core. The populating only needs
to be done once, so the code can be inline in irq_setup(). It doesn't
need to be in a helper function like cpu_mask_set().

3) Allocate a single cpumask_var_t local variable that is initialized
to a copy of cpu_online_mask. Call it avail_cpus. This local variable
indicates which CPUs (across all cores) are available to have an IRQ
assigned.

4) At the beginning of the "for" loop over nvec, current code has:

	cpu_first = cpumask_first(filter_mask_list[core_count]);

Instead, do:

	cpu_first = cpumask_first_and(avail_cpus,
			topology_sibling_cpumask(core_id_list[core_count]));

The above code effectively creates on-the-fly the cpumask previously
stored in filter_mask_list, and finds the first CPU as before.

5) When a CPU is assigned an IRQ, clear that CPU in avail_cpus, not
in the filter_mask_list entry.

6) When the NUMA node wraps around and current code calls
cpu_mask_set(), instead reinitialize avail_cpus to cpu_online_mask
like in my #3 above.
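To make the shape of this concrete, here is an uncompiled sketch of
steps 2 through 5 (core_id_list and avail_cpus are the names proposed
above; it reuses irq_setup()'s existing locals, and the error
unwinding and the surrounding "for" loop are elided):

	/* Uncompiled sketch only -- not the final code */
	int *core_id_list;
	cpumask_var_t avail_cpus;

	core_id_list = kcalloc(cores, sizeof(int), GFP_KERNEL);
	if (!core_id_list || !alloc_cpumask_var(&avail_cpus, GFP_KERNEL))
		goto cleanup;	/* free whatever was allocated; err = -ENOMEM */

	/* Step 2: record the first CPU of each core, inline in irq_setup() */
	cpumask_copy(avail_cpus, cpu_online_mask);
	for_each_cpu(cpu, avail_cpus) {
		core_id_list[core_count++] = cpu;
		cpumask_andnot(avail_cpus, avail_cpus,
			       topology_sibling_cpumask(cpu));
	}
	core_count = 0;

	/* Step 3: avail_cpus now tracks CPUs still free to take an IRQ */
	cpumask_copy(avail_cpus, cpu_online_mask);

	/* Steps 4 and 5: inside the "for" loop over nvec */
	cpu_first = cpumask_first_and(avail_cpus,
			topology_sibling_cpumask(core_id_list[core_count]));
	if (cpu_first < nr_cpu_ids && cpu_to_node(cpu_first) == numa_node) {
		irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu_first));
		cpumask_clear_cpu(cpu_first, avail_cpus);
		/* ... rest of the loop body as in the patch ... */
	}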
In summary, with this approach filter_mask_list isn't needed at all.
The messiness of allocating and freeing an array of cpumasks goes
away. I haven't coded it, but I think the result will be simpler and
less error prone.

Michael

> > +	}
> > +	/* if number of cpus are equal to max_queues per port, then
> > +	 * one extra interrupt for the hardware channel communication.
> > +	 */
>
> Note that higher level code may set nvec based on the # of
> online CPUs, but until the cpus_read_lock is held, the # of online
> CPUs can change. So nvec might have been determined when the
> # of CPUs was 8, but 2 CPUs could go offline before the cpus_read_lock
> is obtained. So nvec could be bigger than just 1 more than
> num_online_cpus(). I'm not sure how to handle the check below to
> special case the hardware communication channel. But realize
> that the main "for" loop below could end up assigning multiple
> IRQs to the same CPU if nvec > num_online_cpus().
>
> > +	if (nvec - 1 == num_online_cpus()) {
> > +		irq_start = 1;
> > +		cpu_first = cpumask_first(cpu_online_mask);
> > +		irq_set_affinity_and_hint(irqs[0], cpumask_of(cpu_first));
> > +	} else {
> > +		irq_start = 0;
> > +	}
> > +	/* reset the core_count and num_node to 0.
> > +	 */
> > +	core_count = 0;
> > +	numa_node = 0;
> > +	cpu_mask_set(&filter_mask, &filter_mask_list);
>
> FWIW, passing filter_mask as the first argument above was
> confusing to me until I realized that the value of filter_mask
> doesn't matter -- it's just to use the memory allocated for
> filter_mask. Maybe a comment would help. And ditto for
> the invocation of cpu_mask_set() further down.
>
> > +	/* for each interrupt find the cpu of a particular
> > +	 * sibling set and if it belongs to the specific numa
> > +	 * then assign irq to it and clear the cpu bit from
> > +	 * the corresponding sibling list from filter_mask_list.
> > +	 * Increase the cpu_count for that node.
> > +	 * Once all cpus for a numa node is assigned, then
> > +	 * move to different numa node and continue the same.
> > +	 */
> > +	for (i = irq_start; i < nvec; ) {
> > +		cpu_first = cpumask_first(filter_mask_list[core_count]);
> > +		if (cpu_first < nr_cpu_ids && cpu_to_node(cpu_first) == numa_node) {
>
> Note that it's possible to have a NUMA node with zero online CPUs.
> It could be a NUMA node that was originally a memory-only NUMA
> node and never had any CPUs, or the NUMA node could have had
> CPUs, but they were all later taken offline. Such a NUMA node is
> still considered to be online because it has memory, but it has
> no online CPUs.
>
> If this code ever sets "numa_node" to such a NUMA node,
> the "for" loop will become an infinite loop because the "if"
> statement above will never find a CPU that is assigned to
> "numa_node".
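One way to close that hole -- illustrative only, and keeping the
patch's existing assumption that node IDs are dense in the range
0 .. num_online_nodes()-1 -- is to skip nodes with no online CPUs
when advancing numa_node:

	/* Skip NUMA nodes that currently have no online CPUs so the
	 * "for" loop can't spin forever looking for a matching CPU.
	 */
	do {
		numa_node++;
		if (numa_node == num_online_nodes())
			numa_node = 0;
	} while (!cpumask_intersects(cpumask_of_node(numa_node),
				     cpu_online_mask));

The reinitialization of the per-core masks (or of avail_cpus, with
the approach suggested earlier) would still happen at the
wrap-around point.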
> > +			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu_first));
> > +			cpumask_clear_cpu(cpu_first, filter_mask_list[core_count]);
> > +			cpu_count = cpu_count + 1;
> > +			i = i + 1;
> > +			/* checking if all the cpus are used from the
> > +			 * particular node.
> > +			 */
> > +			if (cpu_count == nr_cpus_node(numa_node)) {
> > +				numa_node = numa_node + 1;
> > +				if (numa_node == num_online_nodes()) {
> > +					cpu_mask_set(&filter_mask, &filter_mask_list);
>
> The second and subsequent calls to cpu_mask_set() will
> leak any memory previously allocated by alloc_cpumask_var()
> in cpu_mask_set().
>
> I agree with Haiyang's comment about starting with a NUMA
> node other than NUMA node zero. But if you do so, note that
> testing for wrap-around at num_online_nodes() won't be
> equivalent to testing for getting back to the starting NUMA node.
> You really want to run cpu_mask_set() again only when you get
> back to the starting NUMA node.
>
> > +					numa_node = 0;
> > +				}
> > +				cpu_count = 0;
> > +				core_count = 0;
> > +				continue;
> > +			}
> > +		}
> > +		if ((core_count + 1) % cores == 0)
> > +			core_count = 0;
> > +		else
> > +			core_count++;
>
> The if/else could be more compactly written as:
>
>	if (++core_count == cores)
>		core_count = 0;
>
> > +	}
> > +free_irq:
> > +	cpus_read_unlock();
> > +	if (filter_mask)
>
> This check for non-NULL filter_mask is incorrect and might
> not even compile if CONFIG_CPUMASK_OFFSTACK=n. For testing
> purposes, you should make sure to build and run with each
> option: with CONFIG_CPUMASK_OFFSTACK=y and
> also CONFIG_CPUMASK_OFFSTACK=n.
>
> It's safe to just call free_cpumask_var() without the check.
>
> > +		free_cpumask_var(filter_mask);
> > +	if (filter_mask_list) {
> > +		for (core_count = 0; core_count < cores; core_count++)
> > +			free_cpumask_var(filter_mask_list[core_count]);
> > +		kfree(filter_mask_list);
> > +	}
> > +	return err;
> > +}
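As background on the CONFIG_CPUMASK_OFFSTACK point, the two
definitions of cpumask_var_t (paraphrased from
include/linux/cpumask.h) show why testing filter_mask as a pointer
only makes sense in one of the two configurations:

	/* Paraphrased from include/linux/cpumask.h */
	#ifdef CONFIG_CPUMASK_OFFSTACK
	typedef struct cpumask *cpumask_var_t;    /* heap pointer; may be NULL */
	#else
	typedef struct cpumask cpumask_var_t[1];  /* array; "if (mask)" is always true */
	#endif

With CONFIG_CPUMASK_OFFSTACK=y, free_cpumask_var() is a kfree(), and
kfree(NULL) is a no-op; with CONFIG_CPUMASK_OFFSTACK=n it does
nothing at all, so calling it unconditionally is safe either way.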