Hi Finn,
Am 08.03.2024 um 22:33 schrieb Finn Thain:
On Fri, 8 Mar 2024, I wrote:
On Fri, 8 Mar 2024, Michael Schmitz wrote:
IMHO, Gueunter would do better to instrument the spinlock tests on the
assumption that the locking in arch/m68k really is correct.
I've come to agree - maybe logging any run queue locks taken by the init
task with IRQs enabled might help?
If I understood how the spinlock test algorithm worked, I'd be adding
sanity checks to it to test the assumptions underlying that algorithm.
I would not be surprised if it uses logic that sees irqs_disabled() and
then assumes that all hard irqs are disabled, or something similar.
That is quite likely, given our past experience with core code
assumptions about interrupts.
I don't think I'd be adding printk() because the BUG is intermittent and
printk() messes with timing.
Another way to tackle this is first find a reliable reproducer, then raise
the IPL before before dispatching hard interrupts.
The patch below was a forgotten experiment that I tried 3 years ago to
deal with a similar failure. At the time I was using stress-ng to
investigate a different issue, but maybe a workload like that could be a
reproducer.
---
After enabling CONFIG_DEBUG_SPINLOCK, I see the following failure on m68k.
[ 4089.070000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/1341
[ 4089.070000] lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/1341, .owner_cpu: 0
[ 4089.070000] CPU: 0 PID: 1341 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
[ 4089.070000] Stack from 00a7dcc8:
[ 4089.070000] 00a7dcc8 004da40f 004da40f 00000000 00a7dce4 0043e88e 004da40f 00a7dd10
[ 4089.070000] 0006316c 004d240e 0053d4a0 dead4ead 00956bba 0000053d 00000000 0053d4a0
[ 4089.070000] 00a7ddd8 0055e6ac 00a7dd28 000632b2 0053d4a0 004d2488 0005c81b 0053d458
[ 4089.070000] 00a7dd34 0044780e 0053d4a0 00a7dd58 002b8694 0053d4a0 00000001 0000007f
[ 4089.070000] 00860000 00000011 00528138 00528198 00a7dd7c 00069f6a 0000000d 00014200
[ 4089.070000] 00528138 00a7dd78 00a7def0 00528138 00014200 00a7dd98 00069ff0 00528138
[ 4089.070000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
[ 4089.070000] [<0006316c>] spin_dump+0x6c/0xc0
[ 4089.070000] [<000632b2>] do_raw_spin_trylock+0x32/0x80
[ 4089.070000] [<0005c81b>] init_dl_bandwidth+0x1b/0x80
[ 4089.070000] [<0044780e>] _raw_spin_trylock+0xe/0x40
[ 4089.070000] [<002b8694>] add_interrupt_randomness+0x154/0x1c0
[ 4089.070000] [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[ 4089.070000] [<00014200>] INV_L2+0x8/0x10
[ 4089.070000] [<00014200>] INV_L2+0x8/0x10
[ 4089.070000] [<00069ff0>] handle_irq_event+0x30/0x80
[ 4089.070000] [<0006d9e4>] handle_simple_irq+0x64/0x80
[ 4089.070000] [<000695fe>] generic_handle_irq+0x3e/0x80
[ 4089.070000] [<00002ea4>] do_IRQ+0x24/0x40
[ 4089.070000] [<00002da8>] user_irqvec_fixup+0xc/0x14
[ 4089.070000] [<0005c81a>] init_dl_bandwidth+0x1a/0x80
[ 4089.070000] [<000d51c0>] find_vma+0x0/0x80
[ 4089.070000] [<00444500>] down_read+0x0/0x200
[ 4089.070000] [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
[ 4089.070000] [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[ 4089.070000] [<00069ff0>] handle_irq_event+0x30/0x80
[ 4089.070000] [<0006d9e4>] handle_simple_irq+0x64/0x80
[ 4089.070000] [<000695fe>] generic_handle_irq+0x3e/0x80
[ 4089.070000] [<00002ea4>] do_IRQ+0x24/0x40
[ 4089.070000] [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[ 4089.070000] [<0000653c>] buserr_c+0x17c/0x600
[ 4089.070000] [<00002ba8>] buserr+0x20/0x28
[ 4089.070000] [<0008800d>] get_next_modinfo+0xcd/0x100
The problem seems to be that add_interrupt_randomness() was re-entered,
leading to a contended input_pool.lock and "spinlock trylock failure".
We already knew that add_interrupt_randomness() can be reentered, though
there are now checks in place that avoid taking input_pool.lock in
hardirq context. But these checks have been added less than three years
ago.
diff --git a/arch/m68k/kernel/irq.c b/arch/m68k/kernel/irq.c
index 9ab4f550342e..b46d8a57f4da 100644
--- a/arch/m68k/kernel/irq.c
+++ b/arch/m68k/kernel/irq.c
@@ -20,10 +20,13 @@
asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
{
struct pt_regs *oldregs = set_irq_regs(regs);
+ unsigned long flags;
+ local_irq_save(flags);
irq_enter();
generic_handle_irq(irq);
irq_exit();
+ local_irq_restore(flags);
set_irq_regs(oldregs);
}
I notice legacy_timer_tick() states it must be called with interrupts
disabled. We do not explicitly do that in via_timer_handler(), but rely
on the IPL remaining high. Maybe adding local_irq_save(flags) /
local_irq_restore(flags) around the legacy_timer_tick() call would help
debug this?
Cheers,
Michael