> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 15, 2023 8:49 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; sharmaajay@xxxxxxxxxxxxx; leon@xxxxxxxxxx; cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> Cc: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>; Paul Rosswurm <paulros@xxxxxxxxxxxxx>; Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> Subject: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores
>
> The existing MANA design assigns IRQ affinity to every sibling CPU, which causes
> IRQ coalescing and may reduce the network performance with RSS.
>
> Improve the performance by adhering to the configuration for RSS, which
> prioritises IRQ affinity on HT cores.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
> ---
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 126 ++++++++++++++++--
>  1 file changed, 117 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 6367de0c2c2e..839be819d46e 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -1243,13 +1243,115 @@ void mana_gd_free_res_map(struct gdma_resource *r)
>  	r->size = 0;
>  }
>
> +static void cpu_mask_set(cpumask_var_t *filter_mask, cpumask_var_t **filter_mask_list)
> +{
> +	unsigned int core_count = 0, cpu;
> +	cpumask_var_t *filter_mask_list_tmp;
> +
> +	BUG_ON(!filter_mask || !filter_mask_list);
> +	filter_mask_list_tmp = *filter_mask_list;
> +	cpumask_copy(*filter_mask, cpu_online_mask);
> +	/* for each core create a cpumask lookup table,
> +	 * which stores all the corresponding siblings
> +	 */
> +	for_each_cpu(cpu, *filter_mask) {
> +		BUG_ON(!alloc_cpumask_var(&filter_mask_list_tmp[core_count], GFP_KERNEL));
> +		cpumask_or(filter_mask_list_tmp[core_count], filter_mask_list_tmp[core_count],
> +			   topology_sibling_cpumask(cpu));
> +		cpumask_andnot(*filter_mask, *filter_mask, topology_sibling_cpumask(cpu));
> +		core_count++;
> +	}
> +}
> +
> +static int irq_setup(int *irqs, int nvec)
> +{
> +	cpumask_var_t filter_mask;
> +	cpumask_var_t *filter_mask_list;
> +	unsigned int cpu_first, cpu, irq_start, cores = 0;
> +	int i, core_count = 0, numa_node, cpu_count = 0, err = 0;
> +
> +	BUG_ON(!alloc_cpumask_var(&filter_mask, GFP_KERNEL));
> +	cpus_read_lock();
> +	cpumask_copy(filter_mask, cpu_online_mask);
> +	/* count the number of cores
> +	 */
> +	for_each_cpu(cpu, filter_mask) {
> +		cpumask_andnot(filter_mask, filter_mask, topology_sibling_cpumask(cpu));
> +		cores++;
> +	}
> +	filter_mask_list = kcalloc(cores, sizeof(cpumask_var_t), GFP_KERNEL);
> +	if (!filter_mask_list) {
> +		err = -ENOMEM;
> +		goto free_irq;
> +	}
> +	/* if the number of cpus is equal to max_queues per port, then
> +	 * we need one extra interrupt for the hardware channel communication.
> +	 */
> +	if (nvec - 1 == num_online_cpus()) {
> +		irq_start = 1;
> +		cpu_first = cpumask_first(cpu_online_mask);
> +		irq_set_affinity_and_hint(irqs[0], cpumask_of(cpu_first));
> +	} else {
> +		irq_start = 0;
> +	}
> +	/* reset the core_count and numa_node to 0.
> +	 */
> +	core_count = 0;
> +	numa_node = 0;

Please start with gc->numa_node here. I know it's 0 for now. But the host
will provide the real NUMA node # close to the device in the future.

Also, as we discussed, consider using the NUMA distance to select the
next NUMA node (in a separate patch).

> +	cpu_mask_set(&filter_mask, &filter_mask_list);
> +	/* for each interrupt find the cpu of a particular
> +	 * sibling set and if it belongs to the specific numa
> +	 * then assign irq to it and clear the cpu bit from
> +	 * the corresponding sibling list from filter_mask_list.
> +	 * Increase the cpu_count for that node.
> +	 * Once all cpus for a numa node are assigned, then
> +	 * move to a different numa node and continue the same.
> +	 */
> +	for (i = irq_start; i < nvec; ) {
> +		cpu_first = cpumask_first(filter_mask_list[core_count]);
> +		if (cpu_first < nr_cpu_ids && cpu_to_node(cpu_first) == numa_node) {
> +			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu_first));
> +			cpumask_clear_cpu(cpu_first, filter_mask_list[core_count]);
> +			cpu_count = cpu_count + 1;
> +			i = i + 1;
> +			/* checking if all the cpus are used from the
> +			 * particular node.
> +			 */
> +			if (cpu_count == nr_cpus_node(numa_node)) {
> +				numa_node = numa_node + 1;
> +				if (numa_node == num_online_nodes()) {
> +					cpu_mask_set(&filter_mask, &filter_mask_list);
> +					numa_node = 0;

Ditto.

Other things look good to me.

Thanks,
- Haiyang