Re: Linux 4.9.6 ( Restore IO-APIC irq_chip retrigger callback , breaks my box )

On Sun, Feb 12, 2017 at 9:48 PM, Gabriel C <nix.or.die@xxxxxxxxx> wrote:
>
> So far .. the kernel with the reverted revert + Mike's patch ( but even
> without Mike's patch ) will trigger an bug in mce code ,
> with mce=off it will trigger an bug in microcode code , with dis_ucode_ldr
> we trigger an bug(s) in smp_call_*() and hrtimer_*() functions..
>
> Also this 2 liner patch seems to break hell .. at least on my box.

I suspect you might be seeing symptoms that are "downstream" from the
real issue. So depending on what you enable/disable you get problems
in different parts.

That said, Jens did send a patch for the block lockdep splat. But it
looks very unlikely to be the root cause.

The fact that the two-liner patch matters so much and so consistently
(despite timing differences elsewhere) makes me think it really is
specifically about irq retriggering, not the random symptoms we see.

Can you try this patch on top of a working setup that doesn't have a
lot of other noise in it (ie presumably just plain rc8, for example)?
All it does is printk and WARN_ON_ONCE() when the irq_retrigger() code is
actually triggered. It shouldn't be so common as to spam all your
logs, but I might be wrong, so I put a stub in where you could replace
the

    if (1)

with something like

    if (printk_ratelimit())

which should limit it to something like a max of 10 warnings every five seconds.
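
To make the "10 warnings every five seconds" behaviour concrete, here is a rough user-space sketch of that kind of limiter. It is not the kernel's implementation, just a fixed-window counter using the numbers above. struct ratelimit, ratelimit_ok() and count_allowed() are made-up names for illustration, and time is passed in as a plain integer number of seconds rather than read from a clock.

```c
/* Hypothetical stand-in for a printk_ratelimit()-style limiter. */
struct ratelimit {
	long interval;	/* window length, in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int printed;	/* messages let through in this window */
	int started;	/* non-zero once the first window has begun */
};

/* Return 1 if a message may be emitted at time 'now', 0 if suppressed. */
static int ratelimit_ok(struct ratelimit *rl, long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rl->started || now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->started = 1;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	return 0;
}

/* Feed 'events' events, all at time 'now'; return how many got through. */
static int count_allowed(struct ratelimit *rl, long now, int events)
{
	int i, allowed = 0;

	for (i = 0; i < events; i++)
		allowed += ratelimit_ok(rl, now);
	return allowed;
}
```

With a limiter initialised as { 5, 10, 0, 0, 0 }, feeding 100 events at t=0 lets 10 through, another 100 at t=3 lets none through, and a fresh window at t=5 lets 10 through again.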

I'm wondering if we somehow end up trying to retrigger an NMI or other
internal APIC thing (maybe that's where the machine check comes in?)
that definitely shouldn't be retriggered.

                  Linus

 arch/x86/kernel/irq.c | 4 ++++
 kernel/irq/resend.c   | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 7c6e9ffe4424..d115f0da2d25 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -532,6 +532,10 @@ void fixup_irqs(void)
 		if (irr  & (1 << (vector % 32))) {
 			desc = __this_cpu_read(vector_irq[vector]);
 
+			if (1) { // rate limit?
+				printk("IRQ retrigger for %s\n", desc->name);
+				WARN_ON_ONCE(1);
+			}
 			raw_spin_lock(&desc->lock);
 			data = irq_desc_get_irq_data(desc);
 			chip = irq_data_get_irq_chip(data);
diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
index b86886beee4f..4f2df79ac887 100644
--- a/kernel/irq/resend.c
+++ b/kernel/irq/resend.c
@@ -71,6 +71,11 @@ void check_irq_resend(struct irq_desc *desc)
 		desc->istate &= ~IRQS_PENDING;
 		desc->istate |= IRQS_REPLAY;
 
+		if (1) { // rate limit?
+			printk("IRQ retrigger for %s\n", desc->name);
+			WARN_ON_ONCE(1);
+		}
+
 		if (!desc->irq_data.chip->irq_retrigger ||
 		    !desc->irq_data.chip->irq_retrigger(&desc->irq_data)) {
 #ifdef CONFIG_HARDIRQS_SW_RESEND
