On Mon, 30 Mar 2020, Sebastian Andrzej Siewior wrote: > I'm pleased to announce the v5.4.28-rt19 patch set. First, I don't believe this is necessarily a regression coming with this particular version, but this is the first kernel where I tried this and it crashed. When running this kernel in KVM guest, and flooding its 8250-based ttyS0 serial console quickly leads to [ 52.674358] 000: irq 4: nobody cared (try booting with the "irqpoll" option) [ 52.674381] 000: CPU: 0 PID: 155 Comm: irq/4-ttyS0 Not tainted 5.4.28-rt19+ #7 [ 52.674389] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014 [ 52.674398] 000: Call Trace: [ 52.674417] 000: <IRQ> [ 52.674426] 000: dump_stack+0x50/0x6b [ 52.674446] 000: __report_bad_irq+0x2b/0xb0 [ 52.674454] 000: note_interrupt+0x22e/0x280 [ 52.674457] 000: ? io_serial_out+0x11/0x20 [ 52.674463] 000: handle_irq_event_percpu+0x6e/0x90 [ 52.674471] 000: handle_irq_event+0x48/0x90 [ 52.674473] 000: handle_edge_irq+0x95/0x1f0 [ 52.674476] 000: do_IRQ+0x6c/0x120 [ 52.674487] 000: common_interrupt+0xf/0xf [ 52.674496] 000: </IRQ> [ 52.674497] 000: RIP: 0010:io_serial_out+0x11/0x20 [ 52.674503] 000: Code: 8b 57 30 d3 e6 01 f2 ec 0f b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f b6 8f e1 00 00 00 89 d0 8b 57 30 d3 e6 01 f2 ee <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f b6 87 e2 00 00 00 [ 52.674507] 000: RSP: 0018:ffffa860c03a7de8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdb [ 52.674508] 000: RAX: 0000000000000034 RBX: ffffffffb05a3bc0 RCX: 0000000000000000 [ 52.674510] 000: RDX: 00000000000003f8 RSI: 0000000000000000 RDI: ffffffffb05a3bc0 [ 52.674514] 000: RBP: ffff98c1b9718000 R08: ffffffffafa0b8e0 R09: ffffa860c03a7d80 [ 52.674515] 000: R10: ffffa860c03a7c40 R11: 0000000000000000 R12: 0000000000000009 [ 52.674516] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffff98c13714a8b8 [ 52.674524] 000: ? io_serial_in+0x20/0x20 [ 52.674529] 000: serial8250_tx_chars+0x121/0x310 [ 52.674535] 000: ? migrate_disable+0x33/0x90 1339913399133991[ 52.674540] 000: serial8250_handle_irq.part.27+0xc2/0xe0 [ 52.674543] 000: serial8250_default_handle_irq+0x45/0x50 [ 52.674545] 000: serial8250_interrupt+0x51/0xa0 [ 52.674548] 000: ? irq_finalize_oneshot.part.46+0xd0/0xd0 [ 52.674550] 000: irq_forced_thread_fn+0x2b/0x70 [ 52.674552] 000: irq_thread+0xd8/0x170 [ 52.674554] 000: ? wake_threads_waitq+0x30/0x30 [ 52.674556] 000: ? irq_thread_dtor+0x90/0x90 [ 52.674558] 000: kthread+0x10e/0x130 [ 52.674563] 000: ? kthread_park+0x80/0x80 [ 52.674565] 000: ret_from_fork+0x35/0x40 [ 52.674571] 000: handlers: [ 52.674579] 000: [<00000000d13c0583>] irq_default_primary_handler threaded [<00000000f172db97>] serial8250_interrupt [ 52.674580] 000: Disabling IRQ #4 (and things go downhill afterwards with RCU stall on the affected CPU, etc). Reproducing it is as easy as firing up qemu with '-serial stdio', booting with console set to ttyS0, and then doing e.g. i=0; while test $i -lt 100000; do /usr/bin/echo $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i; i=$((i+1)); done in the serial console terminal. It explodes within a few tens of seconds. Now, whenever this happens, we always seem to be interrupted in 0010:io_serial_out+0x11/0x20, which is this in Code ... d3 e6 01 f2 ee <c3> 66 66 2e 0f ... where '0xee' is the actual outb. So it looks like outb would take so long that 99k IRQ4s trigger in the meantime, which would seem to point to KVM / qemu issue. *However* after either - turning CONFIG_PREEMPT_RT off or - starting qemu-kvm without any '-smp' args (i.e. starting the guest as UP) this issue never reproduces any more; the reason for it almost certainly never triggering with !CONFIG_PREEMPT_RT is that the above pattern obviously can't happen, because we never interrupt that io_serial_out, because the spin_lock_irqsave() in serial8250_handle_irq() disables IRQ in this region. Why it doesn't reproduce on UP I don't understand. I haven't done much debugging on this yet, but reporting this right away in case anyone has any idea what's happening here. Thanks, -- Jiri Kosina SUSE Labs