Re: [PATCH v1 1/1] x86/apic/vector: Fix NULL pointer exception in irq_complete_move()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx writes:
> If an IRQ is scheduled using generic_handle_irq() function in a non IRQ
> path, the irq_regs per CPU variable will not be set. Hence calling
> irq_complete_move() function in this scenario leads to NULL pointer
> de-reference exception. One example for this issue is, triggering fake
> AER errors using PCIe aer_inject framework. So add addition check for

What?

This is completely broken to begin with. You are fixing the wrong
end. The broken commit is:

390e2db82480 ("PCI/AER: Abstract AER interrupt handling")

I have to admit that it was already broken before that commit because
calling just the interrupt handler w/o serialization is as wrong as it
gets, but then calling a random function just because it's accessible
and does not explode in the face is not much better.

> [   58.368269]  handle_edge_irq+0x7d/0x1e0
> [   58.368272]  generic_handle_irq+0x27/0x30
> [   58.368278]  aer_inject_write+0x53a/0x720
> [   58.368283]  __vfs_write+0x36/0x1b0
> [   58.368289]  ? common_file_perm+0x47/0x130
> [   58.368293]  ? security_file_permission+0x2e/0xf0
> [   58.368295]  vfs_write+0xa5/0x180
> [   58.368296]  ksys_write+0x52/0xc0
> [   58.368300]  do_syscall_64+0x48/0x120
> [   58.368307]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Calling generic_handle_irq() through a sysfs write is in the worst case
going to corrupt state and that NULL pointer dereference is just one
particular effect which made this bogosity visible.

Even if you "fixed" this particular case, invoking this when an
interrupt affinity change is scheduled will also wreckage state. In the
best case it will only trigger the already existing WARN_ON() in the MSI
code when the interrupt in question is MSI and the invocation happens on
the wrong CPU. But there are worse things which can happen.

We are neither going to paper over it by just silently preventing this
particular NULL pointer dereference nor are we going to sprinkle more
checks all over the place just to deal with this. The interrupt delivery
hardware trainwreck of x86 CPUs is fragile as hell and we have enough
horrible code already to deal with that. No need for self inflicted
horrors.

The proper fix for this is below as it prevents the abuse of this
interface.

This will not break the AER error injection as it has been broken
forever. It just makes sure that the brokeness is not propagating
through the core code.

The right thing to make AER injection work is to inject the interrupt
via the retrigger mechanism, which will send an IPI. There is no core
interface for this, but that's a solvable problem.

Thanks,

        tglx

8<-----------------
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 2c5676b0a6e7..d7c4a3b815a6 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -556,6 +556,7 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
 		irqd->chip_data = apicd;
 		irqd->hwirq = virq + i;
 		irqd_set_single_target(irqd);
+		irqd_set_handle_enforce_irqctx(irqd);
 		/*
 		 * Legacy vectors are already assigned when the IOAPIC
 		 * takes them over. They stay on the same vector. This is
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 3ed5a055b5f4..9315fbb87db3 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -211,6 +211,8 @@ struct irq_data {
  * IRQD_CAN_RESERVE		- Can use reservation mode
  * IRQD_MSI_NOMASK_QUIRK	- Non-maskable MSI quirk for affinity change
  *				  required
+ * IRQD_HANDLE_ENFORCE_IRQCTX	- Enforce that handle_irq_*() is only invoked
+ *				  from actual interrupt context.
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -234,6 +236,7 @@ enum {
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
 	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
+	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -303,6 +306,16 @@ static inline bool irqd_is_single_target(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_SINGLE_TARGET;
 }
 
+static inline void irqd_set_handle_enforce_irqctx(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_HANDLE_ENFORCE_IRQCTX;
+}
+
+static inline bool irqd_is_handle_enforce_irqctx(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_HANDLE_ENFORCE_IRQCTX;
+}
+
 static inline bool irqd_is_wakeup_set(struct irq_data *d)
 {
 	return __irqd_to_state(d) & IRQD_WAKEUP_STATE;
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 3924fbe829d4..4561f971bc74 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -427,6 +427,10 @@ static inline struct cpumask *irq_desc_get_pending_mask(struct irq_desc *desc)
 {
 	return desc->pending_mask;
 }
+static inline bool handle_enforce_irqctx(struct irq_data *data)
+{
+	return irqd_is_handle_enforce_irqctx(data);
+}
 bool irq_fixup_move_pending(struct irq_desc *desc, bool force_clear);
 #else /* CONFIG_GENERIC_PENDING_IRQ */
 static inline bool irq_can_move_pcntxt(struct irq_data *data)
@@ -453,6 +457,10 @@ static inline bool irq_fixup_move_pending(struct irq_desc *desc, bool fclear)
 {
 	return false;
 }
+static inline bool handle_enforce_irqctx(struct irq_data *data)
+{
+	return false;
+}
 #endif /* !CONFIG_GENERIC_PENDING_IRQ */
 
 #if !defined(CONFIG_IRQ_DOMAIN) || !defined(CONFIG_IRQ_DOMAIN_HIERARCHY)
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 98a5f10d1900..b3e9a66dd079 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -638,9 +638,15 @@ void irq_init_desc(unsigned int irq)
 int generic_handle_irq(unsigned int irq)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_data *data;
 
 	if (!desc)
 		return -EINVAL;
+
+	data = irq_desc_get_irq_data(desc);
+	if (WARN_ON_ONCE(!in_irq() && handle_enforce_irqctx(data)))
+		return -EPERM;
+
 	generic_handle_irq_desc(desc);
 	return 0;
 }





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux