>-----Original Message-----
>From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
>Sent: Thursday, November 16, 2023 4:25 AM
>To: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>; KY Srinivasan <kys@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; sharmaajay@xxxxxxxxxxxxx; leon@xxxxxxxxxx; cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
>Cc: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>; Paul Rosswurm <paulros@xxxxxxxxxxxxx>
>Subject: RE: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores
>
>
>
>> -----Original Message-----
>> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>> Sent: Wednesday, November 15, 2023 8:49 AM
>> To: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; sharmaajay@xxxxxxxxxxxxx; leon@xxxxxxxxxx; cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
>> Cc: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>; Paul Rosswurm <paulros@xxxxxxxxxxxxx>; Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>> Subject: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores
>>
>> The existing MANA design assigns IRQ affinity to every sibling CPU, which
>> causes IRQ coalescing and may reduce network performance with RSS.
>>
>> Improve performance by adhering to the RSS configuration, which
>> prioritises IRQ affinity on HT cores.
>>
>> Signed-off-by: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>> ---
>>  .../net/ethernet/microsoft/mana/gdma_main.c | 126 ++++++++++++++++--
>>  1 file changed, 117 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
>> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>> index 6367de0c2c2e..839be819d46e 100644
>> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
>> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>> @@ -1243,13 +1243,115 @@ void mana_gd_free_res_map(struct gdma_resource *r)
>>  	r->size = 0;
>>  }
>>
>> +static void cpu_mask_set(cpumask_var_t *filter_mask, cpumask_var_t **filter_mask_list)
>> +{
>> +	unsigned int core_count = 0, cpu;
>> +	cpumask_var_t *filter_mask_list_tmp;
>> +
>> +	BUG_ON(!filter_mask || !filter_mask_list);
>> +	filter_mask_list_tmp = *filter_mask_list;
>> +	cpumask_copy(*filter_mask, cpu_online_mask);
>> +	/* for each core create a cpumask lookup table,
>> +	 * which stores all the corresponding siblings
>> +	 */
>> +	for_each_cpu(cpu, *filter_mask) {
>> +		BUG_ON(!alloc_cpumask_var(&filter_mask_list_tmp[core_count], GFP_KERNEL));
>> +		cpumask_or(filter_mask_list_tmp[core_count], filter_mask_list_tmp[core_count],
>> +			   topology_sibling_cpumask(cpu));
>> +		cpumask_andnot(*filter_mask, *filter_mask, topology_sibling_cpumask(cpu));
>> +		core_count++;
>> +	}
>> +}
>> +
>> +static int irq_setup(int *irqs, int nvec)
>> +{
>> +	cpumask_var_t filter_mask;
>> +	cpumask_var_t *filter_mask_list;
>> +	unsigned int cpu_first, cpu, irq_start, cores = 0;
>> +	int i, core_count = 0, numa_node, cpu_count = 0, err = 0;
>> +
>> +	BUG_ON(!alloc_cpumask_var(&filter_mask, GFP_KERNEL));
>> +	cpus_read_lock();
>> +	cpumask_copy(filter_mask, cpu_online_mask);
>> +	/* count the number of cores
>> +	 */
>> +	for_each_cpu(cpu, filter_mask) {
>> +		cpumask_andnot(filter_mask, filter_mask, topology_sibling_cpumask(cpu));
>> +		cores++;
>> +	}
>> +	filter_mask_list = kcalloc(cores, sizeof(cpumask_var_t), GFP_KERNEL);
>> +	if (!filter_mask_list) {
>> +		err = -ENOMEM;
>> +		goto free_irq;
>> +	}
>> +	/* if number of cpus are equal to max_queues per port, then
>> +	 * one extra interrupt for the hardware channel communication.
>> +	 */
>> +	if (nvec - 1 == num_online_cpus()) {
>> +		irq_start = 1;
>> +		cpu_first = cpumask_first(cpu_online_mask);
>> +		irq_set_affinity_and_hint(irqs[0], cpumask_of(cpu_first));
>> +	} else {
>> +		irq_start = 0;
>> +	}
>> +	/* reset the core_count and numa_node to 0.
>> +	 */
>> +	core_count = 0;
>> +	numa_node = 0;
>
>Please start with gc->numa_node here. I know it's 0 for now. But the host will
>provide the real NUMA node # close to the device in the future.

Thank you. Will take care of it in the next version.

>
>Also, as we discussed, consider using the NUMA distance to select the next NUMA
>node (in a separate patch).
>
>> +	cpu_mask_set(&filter_mask, &filter_mask_list);
>> +	/* for each interrupt find the cpu of a particular
>> +	 * sibling set and if it belongs to the specific numa
>> +	 * then assign irq to it and clear the cpu bit from
>> +	 * the corresponding sibling list from filter_mask_list.
>> +	 * Increase the cpu_count for that node.
>> +	 * Once all cpus for a numa node is assigned, then
>> +	 * move to different numa node and continue the same.
>> +	 */
>> +	for (i = irq_start; i < nvec; ) {
>> +		cpu_first = cpumask_first(filter_mask_list[core_count]);
>> +		if (cpu_first < nr_cpu_ids && cpu_to_node(cpu_first) == numa_node) {
>> +			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu_first));
>> +			cpumask_clear_cpu(cpu_first, filter_mask_list[core_count]);
>> +			cpu_count = cpu_count + 1;
>> +			i = i + 1;
>> +			/* checking if all the cpus are used from the
>> +			 * particular node.
>> +			 */
>> +			if (cpu_count == nr_cpus_node(numa_node)) {
>> +				numa_node = numa_node + 1;
>> +				if (numa_node == num_online_nodes()) {
>> +					cpu_mask_set(&filter_mask, &filter_mask_list);
>> +					numa_node = 0;
>
>Ditto.
>
>Other things look good to me.
>
>Thanks,
>- Haiyang
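
For the NUMA distance suggestion, I am thinking of something along these
lines for the separate follow-up patch. This is a rough, untested sketch;
next_closest_node() is a placeholder name, not an existing helper:

/* Rough sketch: pick the online node closest to @node that has not been
 * visited yet, based on the NUMA distance table (node_distance()).
 */
static int next_closest_node(int node, nodemask_t *visited)
{
	int best_node = NUMA_NO_NODE;
	int best_dist = INT_MAX;
	int n, dist;

	for_each_online_node(n) {
		if (node_isset(n, *visited))
			continue;
		dist = node_distance(node, n);
		if (dist < best_dist) {
			best_dist = dist;
			best_node = n;
		}
	}

	if (best_node != NUMA_NO_NODE)
		node_set(best_node, *visited);
	return best_node;
}

The idea would be to start from gc->numa_node and, once all CPUs of the
current node are assigned, call next_closest_node() instead of simply
incrementing numa_node, so the remaining IRQs spill over to the nearest
node first.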