> Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix? I checked Intel develper's manual and there is no restriction about the order of enabling IO_APIC/local_APIC. So, it may work. But, I don't understand why we have to change the stable boot-up code. If kdump disables both local_apic and IO_APIC in proper way in 1st kernel, 2nd kernel works without any change. I think busting spinlocks ,like io_apic_lock, in 1st kernel is reasonable. Seiji > -----Original Message----- > From: Don Zickus [mailto:dzickus at redhat.com] > Sent: Monday, April 30, 2012 4:54 PM > To: Seiji Aguchi; ebiederm at xmission.com > Cc: x86 at kernel.org; LKML; kexec-list; Vivek Goyal > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path > > On Thu, Mar 15, 2012 at 05:16:50PM -0400, Seiji Aguchi wrote: > > Don, > > > > What do you think about following scenario? > > Disabling I/O APIC seems to be needed before booting kdump kernel. > > For some reason I actually believed this was cleared before interrupts were enabled on bootup. Apparently not. On a virt guest I can > easily create a scenario in which scp'ing a file then kdumping, leaves the ethernet interrupt in a triggered state. > > Before this patch, it would be masked by disable_IO_APIC. With my patch the irq nevers gets masked and during setup_local_APIC > the kernel falls over once the local APIC is enabled (as setup_IO_APIC is called later). > Perhaps calling setup_IO_APIC before setup_local_APIC would be a better fix? > > Just like NMIs prohibit the abilty to remove the disable local apic code, an actively triggered interrupt seems to prevent us from > removing the disable io apic. > > This leaves me with my original problem of deadlocking in the disable_IO_APIC path. > > Thoughts? > > Cheers, > Don > > > > > Seiji > > > > > > commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df > > Author: Suresh Siddha <suresh.b.siddha at intel.com> > > Date: Thu Aug 25 12:01:11 2011 -0700 > > > > x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC > > > > In the kdump scenario mentioned below, we can have a case where > > the device using level triggered interrupt will not generate any > > interrupts in the kdump kernel. > > > > 1. IO-APIC sends a level triggered interrupt to the CPU's local APIC. > > > > 2. Kernel crashed before the CPU services this interrupt, leaving > > the remote-IRR in the IO-APIC set. > > > > 3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC > > initialization. But this fails to reset remote-IRR bit of the > > IO-APIC RTE as the remote-IRR bit is read-only. > > > > 4. Device using that level triggered entry can't generate any > > more interrupts because of the remote-IRR bit. > > > > In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if > > so do an explicit attempt to clear it (by doing EOI write on > > modern io-apic's and changing trigger mode to edge/level on > > older io-apic's). Also before doing the explicit EOI to the > > io-apic, ensure that the trigger mode is indeed set to level. > > This will enable the explicit EOI to the io-apic to reset the > > remote-IRR bit. > > > > Tested-by: Leonardo Chiquitto <lchiquitto at novell.com> > > Signed-off-by: Suresh Siddha <suresh.b.siddha at intel.com> > > Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686 > > Cc: Rafael Wysocki <rjw at novell.com> > > Cc: Maciej W. Rozycki <macro at linux-mips.org> > > Cc: Thomas Renninger <trenn at suse.de> > > Cc: jbeulich at novell.com > > Cc: yinghai at kernel.org > > Link: http://lkml.kernel.org/r/20110825190657.157502602 at sbsiddha-desk.sc.intel.com > > Signed-off-by: Ingo Molnar <mingo at elte.hu> > > > > > -----Original Message----- > > > From: linux-kernel-owner at vger.kernel.org > > > [mailto:linux-kernel-owner at vger.kernel.org] On Behalf Of Don Zickus > > > Sent: Thursday, March 15, 2012 4:27 PM > > > To: x86 at kernel.org > > > Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal > > > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash > > > path > > > > > > On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote: > > > > A customer of ours noticed when their machine crashed, kdump did > > > > not work but hung instead. Using their firmware dumping solution > > > > they grabbed a vmcore and decoded the stacks on the cpus. What > > > > they noticed seemed to be a rare deadlock with the ioapic_lock. > > > > > > While we are discussing the NMI stuff in another thread, does anyone > > > have any objection to committing this patch. It fixes a real problem today. > > > > > > Cheers, > > > Don > > > > > > > > > > > CPU4: > > > > machine_crash_shutdown > > > > -> machine_ops.crash_shutdown > > > > -> native_machine_crash_shutdown > > > > -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs > > > > -> disable_IO_APIC > > > > -> clear_IO_APIC > > > > -> clear_IO_APIC_pin > > > > -> ioapic_read_entry > > > > -> spin_lock_irqsave(&ioapic_lock, flags) > > > > ---Infinite loop here--- > > > > > > > > CPU0: > > > > do_IRQ > > > > -> handle_irq > > > > -> handle_edge_irq > > > > -> ack_apic_edge > > > > -> move_native_irq > > > > -> mask_IO_APIC_irq > > > > -> mask_IO_APIC_irq_desc > > > > -> spin_lock_irqsave(&ioapic_lock, flags) > > > > ---Receive NMI here after getting spinlock--- > > > > -> nmi > > > > -> do_nmi > > > > -> crash_nmi_callback > > > > ---Infinite loop here--- > > > > > > > > The problem is that although kdump tries to shutdown minimal > > > > hardware, it still needs to disable the IO APIC. This requires > > > > spinlocks which may be held by another cpu. This other cpu is > > > > being held infinitely in an NMI context by kdump in order to serialize the crashing path. > > > > Instant deadlock. > > > > > > > > Eric, brought up a point that because the boot code was > > > > restructured we may not need to disable the io apic any more in the crash path. > > > > The original concern that led to the development of > > > > disable_IO_APIC, was that the jiffies calibration on boot up > > > > relied on the PIT timer for reference. Access to the PIT required > > > > 8259 interrupts to be working. This wouldn't work if the ioapic needed to be configured. > > > > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough. > > > > > > > > Those concerns don't hold true now, thanks to the jiffies > > > > calibration code not needing the PIT. As a result, we can remove > > > > this call and simplify the locking needed in the panic path. > > > > > > > > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old > > > > athlon that did not have an ioapic. All three were successful. > > > > > > > > I also tested using lkdtm that would use jprobes to panic the > > > > system when entering do_IRQ. The idea was to see how the system > > > > reacted with an interrupt pending in the second kernel. My core2 > > > > quad successfully kdump'd > > > > 3 times in a row with no issues. > > > > > > > > v2: removed the disable lapic code too > > > > v3: re-add disabling of lapic code > > > > > > > > Cc: Eric W. Biederman <ebiederm at xmission.com> > > > > Cc: Vivek Goyal <vgoyal at redhat.com> > > > > Signed-off-by: Don Zickus <dzickus at redhat.com> > > > > --- > > > > > > > > There are really two problems here. One is the deadlock of the > > > > ioapic_lock that I describe above. Removing the code to disable > > > > the ioapic seems to resolve that. > > > > > > > > The second issue is handling non-IRQ exceptions like NMIs. Eric > > > > asked me to include removing the disable lapic code too. However, > > > > because the nmi watchdog is stil active and kexec zeros out the > > > > idt before it jumps to purgatory, an NMI that comes in during the > > > > transition between the first kernel and second kernel will see an empty idt and reset the cpu. > > > > > > > > Leaving the code to disable the lapic in, turns off perf and > > > > blocks those NMIs from happening (though an external NMI would > > > > still be an issue but that is no different than right now). > > > > > > > > I tried playing with a stub idt and leaving it in place through > > > > the transition to the second kernel, but I can't quite get it to > > > > work correctly. Spinning in the first kernel before the purgatory > > > > jump catches the idt properly. Spinning in purgatory before the > > > > second kernel jump doesn't. I even disabled the zero'ing out of the idt in the purgatory code. > > > > > > > > I would like to get resolution on the ioapic deadlock to fix a > > > > customer issue while working the idt and NMI thing on the side, > > > > hence the split of this patchset. > > > > > > > > Hopefully, people recognize there are two issues here and that > > > > this patch resolves the first one and the second one needs more debugging and time. > > > > --- > > > > arch/x86/kernel/crash.c | 3 --- > > > > 1 files changed, 0 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > > > > index > > > > 13ad899..b053cf9 100644 > > > > --- a/arch/x86/kernel/crash.c > > > > +++ b/arch/x86/kernel/crash.c > > > > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > > > > cpu_emergency_svm_disable(); > > > > > > > > lapic_shutdown(); > > > > -#if defined(CONFIG_X86_IO_APIC) > > > > - disable_IO_APIC(); > > > > -#endif > > > > #ifdef CONFIG_HPET_TIMER > > > > hpet_disable(); > > > > #endif > > > > -- > > > > 1.7.7.6 > > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe > > > linux-kernel" in the body of a message to majordomo at vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Please read the FAQ at http://www.tux.org/lkml/