Hi Marc, On 13.04.2022 19:26, Marc Zyngier wrote: > Hi Marek, > > On Wed, 13 Apr 2022 15:59:21 +0100, > Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote: >> Hi Marc, >> >> On 05.04.2022 20:50, Marc Zyngier wrote: >>> When booting with maxcpus=<small number> (or even loading a driver >>> while most CPUs are offline), it is pretty easy to observe managed >>> affinities containing a mix of online and offline CPUs being passed >>> to the irqchip driver. >>> >>> This means that the irqchip cannot trust the affinity passed down >>> from the core code, which is a bit annoying and requires (at least >>> in theory) all drivers to implement some sort of affinity narrowing. >>> >>> In order to address this, always limit the cpumask to the set of >>> online CPUs. >>> >>> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> >> This patch landed in linux next-20220413 as commit 33de0aa4bae9 >> ("genirq: Always limit the affinity to online CPUs"). Unfortunately it >> breaks booting of most ARM 32bit Samsung Exynos based boards. >> >> I don't see anything specific in the log, though. Booting just hangs at >> some point. The only Samsung Exynos boards that boot properly are those >> Exynos4412 based. >> >> I assume that this is related to the Multi Core Timer IRQ configuration >> specific for that SoCs. Exynos4412 uses PPI interrupts, while all other >> Exynos SoCs have separate IRQ lines for each CPU. >> >> Let me know how I can help debugging this issue. > Thanks for the heads up. Can you pick the last working kernel, enable > CONFIG_GENERIC_IRQ_DEBUGFS, and dump the /sys/kernel/debug/irq/irqs/ > entries for the timer IRQs? Exynos4210, Trats board, next-20220411: root@target:~# cat /proc/interrupts | grep mct 40: 0 0 GIC-0 89 Level mct_comp_irq 41: 4337 0 GIC-0 74 Level mct_tick0 42: 0 11061 GIC-0 80 Level mct_tick1 root@target:~# cat /sys/kernel/debug/irq/irqs/41 handler: handle_fasteoi_irq device: (null) status: 0x00003504 _IRQ_NOPROBE _IRQ_NOAUTOEN istate: 0x00000000 ddepth: 0 wdepth: 0 dstate: 0x13403604 IRQ_TYPE_LEVEL_HIGH IRQD_LEVEL IRQD_ACTIVATED IRQD_IRQ_STARTED IRQD_NO_BALANCING IRQD_SINGLE_TARGET IRQD_AFFINITY_SET IRQD_DEFAULT_TRIGGER_SET IRQD_HANDLE_ENFORCE_IRQCTX node: 0 affinity: 0 effectiv: 0 domain: :soc:interrupt-controller@10490000 hwirq: 0x4a chip: GIC-0 flags: 0x15 IRQCHIP_SET_TYPE_MASKED IRQCHIP_MASK_ON_SUSPEND IRQCHIP_SKIP_SET_WAKE root@target:~# cat /sys/kernel/debug/irq/irqs/42 handler: handle_fasteoi_irq device: (null) status: 0x00003504 _IRQ_NOPROBE _IRQ_NOAUTOEN istate: 0x00000000 ddepth: 0 wdepth: 0 dstate: 0x13403604 IRQ_TYPE_LEVEL_HIGH IRQD_LEVEL IRQD_ACTIVATED IRQD_IRQ_STARTED IRQD_NO_BALANCING IRQD_SINGLE_TARGET IRQD_AFFINITY_SET IRQD_DEFAULT_TRIGGER_SET IRQD_HANDLE_ENFORCE_IRQCTX node: 0 affinity: 1 effectiv: 1 domain: :soc:interrupt-controller@10490000 hwirq: 0x50 chip: GIC-0 flags: 0x15 IRQCHIP_SET_TYPE_MASKED IRQCHIP_MASK_ON_SUSPEND IRQCHIP_SKIP_SET_WAKE root@target:~# > Also, see below. > >>> --- >>> kernel/irq/manage.c | 25 +++++++++++++++++-------- >>> 1 file changed, 17 insertions(+), 8 deletions(-) >>> >>> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c >>> index c03f71d5ec10..f71ecc100545 100644 >>> --- a/kernel/irq/manage.c >>> +++ b/kernel/irq/manage.c >>> @@ -222,11 +222,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, >>> { >>> struct irq_desc *desc = irq_data_to_desc(data); >>> struct irq_chip *chip = irq_data_get_irq_chip(data); >>> + const struct cpumask *prog_mask; >>> int ret; >>> >>> + static DEFINE_RAW_SPINLOCK(tmp_mask_lock); >>> + static struct cpumask tmp_mask; >>> + >>> if (!chip || !chip->irq_set_affinity) >>> return -EINVAL; >>> >>> + raw_spin_lock(&tmp_mask_lock); >>> /* >>> * If this is a managed interrupt and housekeeping is enabled on >>> * it check whether the requested affinity mask intersects with >>> @@ -248,24 +253,28 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, >>> */ >>> if (irqd_affinity_is_managed(data) && >>> housekeeping_enabled(HK_TYPE_MANAGED_IRQ)) { >>> - const struct cpumask *hk_mask, *prog_mask; >>> - >>> - static DEFINE_RAW_SPINLOCK(tmp_mask_lock); >>> - static struct cpumask tmp_mask; >>> + const struct cpumask *hk_mask; >>> >>> hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ); >>> >>> - raw_spin_lock(&tmp_mask_lock); >>> cpumask_and(&tmp_mask, mask, hk_mask); >>> if (!cpumask_intersects(&tmp_mask, cpu_online_mask)) >>> prog_mask = mask; >>> else >>> prog_mask = &tmp_mask; >>> - ret = chip->irq_set_affinity(data, prog_mask, force); >>> - raw_spin_unlock(&tmp_mask_lock); >>> } else { >>> - ret = chip->irq_set_affinity(data, mask, force); >>> + prog_mask = mask; >>> } >>> + >>> + /* Make sure we only provide online CPUs to the irqchip */ >>> + cpumask_and(&tmp_mask, prog_mask, cpu_online_mask); >>> + if (!cpumask_empty(&tmp_mask)) >>> + ret = chip->irq_set_affinity(data, &tmp_mask, force); >>> + else >>> + ret = -EINVAL; > Can you also check that with the patch applied, it is this path that > is taken and that it is the timer interrupts that get rejected? If > that's the case, can you put a dump_stack() here and give me that > stack trace? The use of irq_force_affinity() in the driver looks > suspicious... [ 0.158241] smp: Bringing up secondary CPUs ... [ 0.166118] irq_do_set_affinity irq 42 [ 0.166160] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.18.0-rc1-00002-g33de0aa4bae9-dirty #11708 [ 0.166176] Hardware name: Samsung Exynos (Flattened Device Tree) [ 0.166188] unwind_backtrace from show_stack+0x10/0x14 [ 0.166220] show_stack from dump_stack_lvl+0x58/0x70 [ 0.166239] dump_stack_lvl from irq_do_set_affinity+0x188/0x1c8 [ 0.166258] irq_do_set_affinity from irq_set_affinity_locked+0xf8/0x17c [ 0.166274] irq_set_affinity_locked from irq_force_affinity+0x34/0x54 [ 0.166290] irq_force_affinity from exynos4_mct_starting_cpu+0xdc/0x11c [ 0.166308] exynos4_mct_starting_cpu from cpuhp_invoke_callback+0x190/0x38c [ 0.166328] cpuhp_invoke_callback from cpuhp_invoke_callback_range+0x98/0xb4 [ 0.166345] cpuhp_invoke_callback_range from notify_cpu_starting+0x54/0x94 [ 0.166364] notify_cpu_starting from secondary_start_kernel+0x160/0x26c [ 0.166383] secondary_start_kernel from 0x40101a00 [ 0.166498] CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 [ 0.166515] CPU1: Spectre v2: using BPIALL workaround [ 0.265631] smp: Brought up 1 node, 2 CPUs [ 0.268660] SMP: Total of 2 processors activated (96.00 BogoMIPS). [ 0.274583] CPU: All CPU(s) started in SVC mode. > Finally, is there a QEMU emulation of one of these failing boards? yes, smdkc210, I've reproduced it with the following command: qemu-system-arm -serial null -serial stdio -kernel /tftpboot/qemu/zImage -dtb /tftpboot/qemu/exynos4210-smdkv310.dtb -append "console=ttySAC1,115200n8 earlycon root=/dev/mmcblk0p2 rootwait init=/bin/bash ip=::::target::off cpuidle.off=1" -M smdkc210 -drive file=qemu-smdkv310-mmcblk0.raw,if=sd,bus=0,index=2,format=raw Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland