[PATCH] [48/66] x86_64: timer interrupt lockup due to pending interrupt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Vivek Goyal <vgoyal@xxxxxxxxxx>

o check_timer() routine fails while second kernel is booting after a crash
  on an opetron box. Problem happens because timer vector (0x31) seems to be
  locked.

o After a system crash, it is not safe to service interrupts any more, hence
  interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
  sends these interrupts to the CPU during early boot of second kernel. Other
  pending interrupts are discarded saying unexpected trap but timer interrupt
  is serviced and CPU does not issue an LAPIC EOI because it think this
  interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
  locking as LAPIC does not clear respective ISR and keeps on waiting for
  EOI.

o This patch issues extra EOI for the pending interrupts who have ISR set.

o Though today only timer seems to be the special case because in early
  boot it thinks interrupts are coming from i8259 and uses
  mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But
  probably doing it in generic manner for all vectors makes sense.

Signed-off-by: Vivek Goyal <vgoyal@xxxxxxxxxx>
Signed-off-by: Andi Kleen <ak@xxxxxxx>

---


---
 arch/x86_64/kernel/apic.c    |   20 ++++++++++++++++++++
 include/asm-x86_64/apicdef.h |    1 +
 2 files changed, 21 insertions(+)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -342,6 +342,7 @@ void __init init_bsp_APIC(void)
 void __cpuinit setup_local_APIC (void)
 {
 	unsigned int value, maxlvt;
+	int i, j;
 
 	value = apic_read(APIC_LVR);
 
@@ -371,6 +372,25 @@ void __cpuinit setup_local_APIC (void)
 	apic_write(APIC_TASKPRI, value);
 
 	/*
+	 * After a crash, we no longer service the interrupts and a pending
+	 * interrupt from previous kernel might still have ISR bit set.
+	 *
+	 * Most probably by now CPU has serviced that pending interrupt and
+	 * it might not have done the ack_APIC_irq() because it thought,
+	 * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
+	 * does not clear the ISR bit and cpu thinks it has already serivced
+	 * the interrupt. Hence a vector might get locked. It was noticed
+	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
+	 */
+	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+		value = apic_read(APIC_ISR + i*0x10);
+		for (j = 31; j >= 0; j--) {
+			if (value & (1<<j))
+				ack_APIC_irq();
+		}
+	}
+
+	/*
 	 * Now that we are all set up, enable the APIC
 	 */
 	value = apic_read(APIC_SPIV);
Index: linux/include/asm-x86_64/apicdef.h
===================================================================
--- linux.orig/include/asm-x86_64/apicdef.h
+++ linux/include/asm-x86_64/apicdef.h
@@ -39,6 +39,7 @@
 #define			APIC_SPIV_FOCUS_DISABLED	(1<<9)
 #define			APIC_SPIV_APIC_ENABLED		(1<<8)
 #define		APIC_ISR	0x100
+#define		APIC_ISR_NR	0x8	/* Number of 32 bit ISR registers. */
 #define		APIC_TMR	0x180
 #define 	APIC_IRR	0x200
 #define 	APIC_ESR	0x280
-
: send the line "unsubscribe linux-x86_64" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux ia64]     [Linux Kernel]     [DCCP]     [Linux ARM]     [Yosemite News]     [Linux SCSI]     [Linux Hams]
  Powered by Linux