+ kdump-fix-apic-shutdown-sequence.patch added to -mm tree

The patch titled
     kdump: fix APIC shutdown sequence
has been added to the -mm tree.  Its filename is
     kdump-fix-apic-shutdown-sequence.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: kdump: fix APIC shutdown sequence
From: Martin Wilck <martin.wilck@xxxxxxxxxxxxxxxxxxx>

This patch fixes a problem that we have encountered with kdump under high
I/O load on some machines.  The affected machines have an Intel ICH7
chipset with a 6702PXH PCI Express-to-PCI bridge (8086:032c) containing an
IO-APIC.

The symptom is that certain controllers connected to the 6702PXH bridge do
not receive any IRQs in the kdump kernel.  In the error case (about 20% of
all cases) the IRR bit of the IO-APIC pin for that controller is always set
after the kdump kernel has started, indicating an IRQ in progress.  We
haven't found a way to recover from this situation once it has occurred,
except for a system reset.

The error is caused by IRQs arriving while the APIC subsystem is
deactivated in machine_crash_shutdown().

Apparently, the IO-APIC gets stuck if it sends an IRQ message to a local
APIC and never receives an EOI for that message.  This can happen for
several reasons:

1. If, under SMP, the IO-APIC logical destination field is set by the
   IRQ balancing code to one of the "other" CPUs (i.e.  not the
   crashing_cpu), and an IRQ arrives on the respective pin after that CPU
   has shut down its local APIC (but before the IO-APIC pin is masked) the
   IRQ message can't be delivered.

2. The crashing CPU itself disables its local APIC before the IO-APIC,
   leaving a short time window in which the IO-APIC can receive IRQs but
   not deliver them.

3. An IRQ is received and delivered to a local APIC, but no CPU ever
   executes the IRQ handler and therefore no EOI is sent.
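
The third case can be pictured with a small user-space toy model.  This is
only a sketch: the names toy_pin, toy_pin_raise() and toy_pin_eoi() are
hypothetical, and the model assumes the usual level-triggered rule that a
pin with its remote IRR bit set sends no further messages until an EOI
arrives.  It does not claim to describe what the 6702PXH does internally.

```c
#include <stdbool.h>

/* Hypothetical toy model of one level-triggered IO-APIC pin. */
struct toy_pin {
	bool masked;		/* mask bit of the redirection entry */
	bool remote_irr;	/* set when a message is sent, cleared by EOI */
	int delivered;		/* number of IRQ messages sent so far */
};

/*
 * The device asserts the line.  Returns true if an IRQ message was sent.
 * Assumption: nothing is sent while the pin is masked or while an earlier
 * message is still waiting for its EOI.
 */
static bool toy_pin_raise(struct toy_pin *p)
{
	if (p->masked || p->remote_irr)
		return false;
	p->remote_irr = true;
	p->delivered++;
	return true;
}

/* A CPU ran the handler and acknowledged the interrupt. */
static void toy_pin_eoi(struct toy_pin *p)
{
	p->remote_irr = false;
}
```

In this model, a pin that is raised twice with no toy_pin_eoi() in between
sends only the first message.  That is the stuck state described above: if
no CPU is left to send the EOI, the pin stays silent.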

After a lot of failed attempts, I have come up with the following patch,
which fixes the problem.

The patch first masks all IO-APIC pins to avoid a situation where the
IO-APIC can receive, but not deliver, the IRQs.  Moreover, it enables
interrupts for a short period before eventually starting the kdump kernel,
so that EOIs can be sent to the APICs as necessary.
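
As a rough illustration of the masking step, the sketch below works on a
plain 64-bit value laid out like an IO-APIC redirection table entry (vector
in bits 0-7, delivery mode in bits 8-10, mask in bit 16, destination in
bits 56-63).  The RTE_* macros and rte_crash_mask() are hypothetical names
for this example only.  Unlike the patch, which rewrites the whole upper
register, the sketch touches only the destination byte.

```c
#include <stdint.h>

/* Assumed bit layout of a 64-bit IO-APIC redirection table entry. */
#define RTE_DELIVERY_SHIFT	8
#define RTE_DELIVERY_MASK	(0x7ULL << RTE_DELIVERY_SHIFT)
#define RTE_DELIVERY_SMI	0x2ULL
#define RTE_MASKED		(1ULL << 16)
#define RTE_DEST_SHIFT		56
#define RTE_DEST_MASK		(0xffULL << RTE_DEST_SHIFT)

/*
 * Hypothetical helper: mask one entry and point it at destination 'dest'.
 * SMI pins and pins that are already masked are returned unchanged.
 * The vector and trigger bits are kept, so the entry still describes
 * which interrupt it carried.
 */
static uint64_t rte_crash_mask(uint64_t rte, uint8_t dest)
{
	uint64_t mode = (rte & RTE_DELIVERY_MASK) >> RTE_DELIVERY_SHIFT;

	if (mode == RTE_DELIVERY_SMI || (rte & RTE_MASKED))
		return rte;

	rte |= RTE_MASKED;
	rte = (rte & ~RTE_DEST_MASK) | ((uint64_t)dest << RTE_DEST_SHIFT);
	return rte;
}
```

Keeping the vector while setting the mask bit is the point of the sketch:
only delivery is switched off, and the routing information stays in place.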

Notes:

a) Simply calling disable_IO_APIC() early doesn't work, probably because
   that also clears the IRQ vector information, so that arriving EOI
   messages can't be associated with pins by the IO-APIC.

b) We have tried patches that avoid re-enabling interrupts, but so far
   without success.  Re-enabling IRQs is of course dangerous while dumping,
   and I'd rather find a way to avoid it.

c) There are indications that, besides the EOI, the PCI IRQ pin must also
   be deasserted at least for a short time.  That usually requires that
   the driver's IRQ handler is called and tells the FW that the IRQ was
   received.  Whether or not this is a requirement has not been
   conclusively established yet.

d) The problem is only seen with the IO-APIC in the 6702PXH PCI bridge,
   which is the system's secondary IO-APIC.  On the system's main IO-APIC,
   we see other IRQs (timer etc.) arrive and never get an EOI, but we see
   no errors.

Signed-off-by: Martin Wilck <Martin.Wilck@xxxxxxxxxxxxxxxxxxx>
Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
Cc: Haren Myneni <hbabu@xxxxxxxxxx>
Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 arch/x86_64/kernel/crash.c   |   69 ++++++++++++++++++++++++++++-----
 arch/x86_64/kernel/io_apic.c |   44 +++++++++++++++++++++
 arch/x86_64/kernel/smpboot.c |    2 
 3 files changed, 104 insertions(+), 11 deletions(-)

diff -puN arch/x86_64/kernel/crash.c~kdump-fix-apic-shutdown-sequence arch/x86_64/kernel/crash.c
--- a/arch/x86_64/kernel/crash.c~kdump-fix-apic-shutdown-sequence
+++ a/arch/x86_64/kernel/crash.c
@@ -18,6 +18,7 @@
 #include <linux/elf.h>
 #include <linux/elfcore.h>
 #include <linux/kdebug.h>
+#include <linux/interrupt.h>
 
 #include <asm/processor.h>
 #include <asm/hardirq.h>
@@ -30,6 +31,11 @@ static int crashing_cpu;
 
 #ifdef CONFIG_SMP
 static atomic_t waiting_for_crash_ipi;
+static atomic_t crash_ipi_stage2 = ATOMIC_INIT(0);
+
+extern void remove_siblinginfo(int);
+extern void remove_cpu_from_maps(void);
+extern void crash_mask_IO_APIC(int);
 
 static int crash_nmi_callback(struct notifier_block *self,
 				unsigned long val, void *data)
@@ -53,13 +59,30 @@ static int crash_nmi_callback(struct not
 	local_irq_disable();
 
 	crash_save_cpu(regs, cpu);
-	disable_local_APIC();
-	atomic_dec(&waiting_for_crash_ipi);
+	disable_APIC_timer();
+        atomic_dec(&waiting_for_crash_ipi);
+
+	while(atomic_read(&crash_ipi_stage2) == 0)
+		cpu_relax();
+
+ 	/* Send EOI for pending IRQs */
+ 	local_irq_enable();
+ 	udelay(10000);
+ 	local_irq_disable();
+
+ 	remove_siblinginfo(cpu);
+ 	cpu_clear(cpu, cpu_online_map);
+ 	remove_cpu_from_maps();
+
+ 	disable_local_APIC();
+ 	atomic_dec(&crash_ipi_stage2);
+
+
 	/* Assume hlt works */
 	for(;;)
 		halt();
 
-	return 1;
+	return NOTIFY_OK;
 }
 
 static void smp_send_nmi_allbutself(void)
@@ -79,7 +102,7 @@ static struct notifier_block crash_nmi_n
 
 static void nmi_shootdown_cpus(void)
 {
-	unsigned long msecs;
+	unsigned long usecs;
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
 	if (register_die_notifier(&crash_nmi_nb))
@@ -93,19 +116,33 @@ static void nmi_shootdown_cpus(void)
 
 	smp_send_nmi_allbutself();
 
+	usecs = 1000000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && usecs) {
+		udelay(1);
+		usecs--;
+	}
+}
+
+static void nmi_shootdown_cpus_stage2(void)
+{
+
+	unsigned long msecs;
+	atomic_set(&crash_ipi_stage2, num_online_cpus());
+
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+	while ((atomic_read(&crash_ipi_stage2) > 1) && msecs) {
 		mdelay(1);
 		msecs--;
 	}
-	/* Leave the nmi callback set */
-	disable_local_APIC();
 }
 #else
 static void nmi_shootdown_cpus(void)
 {
 	/* There are no cpus to shootdown */
 }
+static void nmi_shootdown_cpus_stage2(void)
+{
+}
 #endif
 
 void machine_crash_shutdown(struct pt_regs *regs)
@@ -121,15 +158,27 @@ void machine_crash_shutdown(struct pt_re
 	 */
 	/* The kernel is broken so disable interrupts */
 	local_irq_disable();
-
 	/* Make a note of crashing cpu. Will be used in NMI callback.*/
 	crashing_cpu = smp_processor_id();
+	crash_save_cpu(regs, crashing_cpu);
+
+ 	/* disable timer interrupts */
+ 	disable_irq_nosync(0);
+ 	disable_APIC_timer();
+
 	nmi_shootdown_cpus();
 
+ 	/* Mask all IRQs, and make sure they are delivered to this CPU */
+ 	crash_mask_IO_APIC(crashing_cpu);
+ 	nmi_shootdown_cpus_stage2();
+
 	if(cpu_has_apic)
 		 disable_local_APIC();
 
-	disable_IO_APIC();
+ 	/* Send EOI for pending IRQs */
+ 	local_irq_enable();
+ 	udelay(10000);
+ 	local_irq_disable();
 
-	crash_save_cpu(regs, smp_processor_id());
+	disable_IO_APIC();
 }
diff -puN arch/x86_64/kernel/io_apic.c~kdump-fix-apic-shutdown-sequence arch/x86_64/kernel/io_apic.c
--- a/arch/x86_64/kernel/io_apic.c~kdump-fix-apic-shutdown-sequence
+++ a/arch/x86_64/kernel/io_apic.c
@@ -371,6 +371,50 @@ static void unmask_IO_APIC_irq (unsigned
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+static void crash_mask_IO_APIC_pin(unsigned int apic, unsigned int pin, int reg1)
+{
+	struct IO_APIC_route_entry entry;
+	unsigned long flags;
+
+	/* Check delivery_mode to be sure we're not clearing an SMI pin
+	 * Don't bother disabling masked pins
+	 */
+	spin_lock_irqsave(&ioapic_lock, flags);
+	*(((int*)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+	if (entry.delivery_mode == dest_SMI || entry.mask)
+		return;
+
+	entry.mask = 1;
+	*(((int*)&entry) + 1) = reg1;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry) + 0));
+	io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry) + 1));
+	io_apic_sync(apic);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+/*
+ * This function is used for kdump to mask all IO-APIC IRQs, and reset
+ * the destination apic ID.
+ */
+void crash_mask_IO_APIC(int cpu)
+{
+	int apic, pin;
+	int reg1;
+	cpumask_t mask;
+
+	cpus_clear(mask);
+	cpu_set(cpu, mask);
+	reg1 = SET_APIC_LOGICAL_ID(cpu_mask_to_apicid(mask));
+
+	for (apic = 0; apic < nr_ioapics; apic++) {
+		for (pin = 0; pin < nr_ioapic_registers[apic]; pin++)
+			crash_mask_IO_APIC_pin(apic, pin, reg1);
+	}
+}
+
 static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 {
 	struct IO_APIC_route_entry entry;
diff -puN arch/x86_64/kernel/smpboot.c~kdump-fix-apic-shutdown-sequence arch/x86_64/kernel/smpboot.c
--- a/arch/x86_64/kernel/smpboot.c~kdump-fix-apic-shutdown-sequence
+++ a/arch/x86_64/kernel/smpboot.c
@@ -973,7 +973,7 @@ void __init smp_cpus_done(unsigned int m
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-static void remove_siblinginfo(int cpu)
+void remove_siblinginfo(int cpu)
 {
 	int sibling;
 	struct cpuinfo_x86 *c = cpu_data;
_

Patches currently in -mm which might be from martin.wilck@xxxxxxxxxxxxxxxxxxx are

kdump-fix-apic-shutdown-sequence.patch

-
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
