Re: [PATCH v1 3/6] irqchip/gic-v3: support SGI broadcast

On Mon, 21 Oct 2024 05:22:15 +0100,
Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> 
> GIC v3 and later support SGI broadcast, i.e., the mode that routes
> interrupts to all PEs in the system excluding the local CPU.
> 
> Supporting this mode can avoid looping through all the remote CPUs
> when broadcasting SGIs, especially for systems with 200+ CPUs. The
> performance improvement can be measured with the rest of this series
> booted with "hugetlb_free_vmemmap=on irqchip.gicv3_pseudo_nmi=1":
> 
>   cd /sys/kernel/mm/hugepages/
>   echo 600 >hugepages-1048576kB/nr_hugepages
>   echo 2048kB >hugepages-1048576kB/demote_size
>   perf record -g -- bash -c "echo 600 >hugepages-1048576kB/demote"
> 
>          gic_ipi_send_mask()  bash sys time
> Before:  38.14%               0m10.513s
> After:    0.20%               0m5.132s
> 
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> ---
>  drivers/irqchip/irq-gic-v3.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index ce87205e3e82..42c39385e1b9 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1394,9 +1394,20 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
>  	gic_write_sgi1r(val);
>  }
>  
> +static void gic_broadcast_sgi(unsigned int irq)
> +{
> +	u64 val;
> +
> +	val = BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT) | (irq << ICC_SGI1R_SGI_ID_SHIFT);

As picked up by the test bot, please fix the 32bit build.
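
Something like this (untested) should keep the constant 64-bit on
32-bit builds:

	/* BIT() is unsigned long, i.e. 32 bits here; bit 40 needs BIT_ULL() */
	val = BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT) |
	      ((u64)irq << ICC_SGI1R_SGI_ID_SHIFT);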

> +
> +	pr_devel("CPU %d: broadcasting SGI %u\n", smp_processor_id(), irq);
> +	gic_write_sgi1r(val);
> +}
> +
>  static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>  {
>  	int cpu;
> +	cpumask_t broadcast;
>  
>  	if (WARN_ON(d->hwirq >= 16))
>  		return;
> @@ -1407,6 +1418,13 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>  	 */
>  	dsb(ishst);
>  
> +	cpumask_copy(&broadcast, cpu_present_mask);

Why cpu_present_mask? I'd expect cpu_online_mask to be the correct
mask here -- we don't IPI offline CPUs, in general.

> +	cpumask_clear_cpu(smp_processor_id(), &broadcast);
> +	if (cpumask_equal(&broadcast, mask)) {
> +		gic_broadcast_sgi(d->hwirq);
> +		goto done;
> +	}

So the (valid) case where you would IPI *everyone* is not handled as a
fast path? That seems like a missed opportunity.

This also seems like an expensive way to do it. How about something
like:

	int mcnt = cpumask_weight(mask);
	int ocnt = cpumask_weight(cpu_online_mask);
	if (mcnt == ocnt)  {
		/* Broadcast to all CPUs including self */
	} else if (mcnt == (ocnt - 1) &&
		   !cpumask_test_cpu(smp_processor_id(), mask)) {
		/* Broadcast to all but self */
	}

which avoids the copy + update + full compare.
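
Wired into gic_ipi_send_mask(), that could look roughly like the below
(completely untested) -- the "everyone" case broadcasts and then lets
the existing per-CPU loop deal with just the local CPU, since IRM=1
excludes self:

	int this_cpu = smp_processor_id();
	unsigned int mcnt = cpumask_weight(mask);
	unsigned int ocnt = cpumask_weight(cpu_online_mask);

	if (mcnt == ocnt) {
		/* Everyone: broadcast to all but self... */
		gic_broadcast_sgi(d->hwirq);
		/* ...and let the loop below handle only the local CPU */
		mask = cpumask_of(this_cpu);
	} else if (mcnt == ocnt - 1 && !cpumask_test_cpu(this_cpu, mask)) {
		/* Everyone but self: a single broadcast is enough */
		gic_broadcast_sgi(d->hwirq);
		goto done;
	}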

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



