Hi Marek,

On Thu, 14 Apr 2022 10:09:31 +0100,
Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On 13.04.2022 19:26, Marc Zyngier wrote:
> > Hi Marek,
> >
> > On Wed, 13 Apr 2022 15:59:21 +0100,
> > Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
> >> Hi Marc,
> >>
> >> On 05.04.2022 20:50, Marc Zyngier wrote:
> >>> When booting with maxcpus=<small number> (or even loading a driver
> >>> while most CPUs are offline), it is pretty easy to observe managed
> >>> affinities containing a mix of online and offline CPUs being passed
> >>> to the irqchip driver.
> >>>
> >>> This means that the irqchip cannot trust the affinity passed down
> >>> from the core code, which is a bit annoying and requires (at least
> >>> in theory) all drivers to implement some sort of affinity narrowing.
> >>>
> >>> In order to address this, always limit the cpumask to the set of
> >>> online CPUs.
> >>>
> >>> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> >> This patch landed in linux next-20220413 as commit 33de0aa4bae9
> >> ("genirq: Always limit the affinity to online CPUs"). Unfortunately it
> >> breaks booting of most ARM 32bit Samsung Exynos based boards.
> >>
> >> I don't see anything specific in the log, though. Booting just hangs at
> >> some point. The only Samsung Exynos boards that boot properly are those
> >> Exynos4412 based.
> >>
> >> I assume that this is related to the Multi Core Timer IRQ configuration
> >> specific for that SoCs. Exynos4412 uses PPI interrupts, while all other
> >> Exynos SoCs have separate IRQ lines for each CPU.
> >>
> >> Let me know how I can help debugging this issue.
> > Thanks for the heads up. Can you pick the last working kernel, enable
> > CONFIG_GENERIC_IRQ_DEBUGFS, and dump the /sys/kernel/debug/irq/irqs/
> > entries for the timer IRQs?
>
> Exynos4210, Trats board, next-20220411:

Thanks for all of the debug, super helpful.

The issue is that we don't handle the 'force' case, which a handful of
drivers are using when bringing up CPUs (and doing so before the CPUs
are marked online).

Can you please give the below hack a go?

Thanks,

	M.

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index f71ecc100545..f1d5a94c6c9f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -266,10 +266,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		prog_mask = mask;
 	}
 
-	/* Make sure we only provide online CPUs to the irqchip */
+	/*
+	 * Make sure we only provide online CPUs to the irqchip,
+	 * unless we are being asked to force the affinity (in which
+	 * case we do as we are told).
+	 */
 	cpumask_and(&tmp_mask, prog_mask, cpu_online_mask);
-	if (!cpumask_empty(&tmp_mask))
+	if (!force && !cpumask_empty(&tmp_mask))
 		ret = chip->irq_set_affinity(data, &tmp_mask, force);
+	else if (force)
+		ret = chip->irq_set_affinity(data, mask, force);
 	else
 		ret = -EINVAL;
 
-- 
Without deviation from the norm, progress is not possible.
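
For anyone who wants to reason about the force/non-force decision
outside the kernel, here is a rough user-space sketch of the mask
selection the hack performs. It is only a model under stated
assumptions: cpu_bits and model_pick_mask() are hypothetical names, a
plain bitmask stands in for struct cpumask, and the managed/housekeeping
path that can make prog_mask differ from mask is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef unsigned int cpu_bits;

/*
 * Model of the mask selection in the patched irq_do_set_affinity().
 * On success, stores the mask that would be handed to the irqchip in
 * *out and returns 0. Returns -EINVAL when no usable CPU remains.
 */
static int model_pick_mask(cpu_bits requested, cpu_bits online,
			   bool force, cpu_bits *out)
{
	cpu_bits narrowed = requested & online;

	assert(out != NULL);

	if (force) {
		/* Forced: pass the requested mask down unmodified. */
		*out = requested;
		return 0;
	}
	if (narrowed != 0) {
		/* Normal case: only online CPUs reach the irqchip. */
		*out = narrowed;
		return 0;
	}
	/* Not forced, and no online CPU is left in the request. */
	return -EINVAL;
}
```

With only CPU0 online (online = 0x1), a forced request for CPU1
(requested = 0x2) is passed through as 0x2, while the same request
without force returns -EINVAL. That matches the bring-up case above,
where a driver targets a CPU that is not marked online yet.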