On Thursday, January 28, 2016 02:57:36 PM Bjorn Helgaas wrote:
> Hi Chen,
>
> Thanks a lot for persevering and working this all out!
>
> On Thu, Jan 28, 2016 at 09:35:46AM +0800, Chen Fan wrote:
> > In our X86 environment, when enable Secure boot, we found an abnormal
> > phenomenon as following call trace shows. after investigation, we
> > found the firmware assigned an irq number 255 which means unknown
> > or no connection in PCI local spec for i801_smbus, meanwhile the
> > ACPI didn't configure the pci irq routing. and the 255 irq number
> > was assigned for megasa msix without IRQF_SHARED. then in this case
> > during i801_smbus probe, the i801_smbus driver would request irq with
> > bad irq number 255. but the 255 irq number was assigned for memgasa
> > with MSIX enable. which will cause request_irq fails and result in
> > the call trace below, here we introduce an IRQ_NOTCONNECTED to identify
> > the device interrupt is not connected.
> >
> >   i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
> >   i801_smbus 0000:00:1f.3: can't derive routing for PCI INT C
> >   i801_smbus 0000:00:1f.3: PCI INT C: no GSI
> >   genirq: Flags mismatch irq 255. 00000080 (i801_smbus) vs. 00000000 (megasa)
> >   CPU: 0 PID: 2487 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.x86_64 #1
> >   Hardware name: FUJITSU PRIMEQUEST 2800E2/D3736, BIOS PRIMEQUEST 2000 Serie5
> >
> >   Call Trace:
> >     dump_stack+0x19/0x1b
> >     __setup_irq+0x54a/0x570
> >     request_threaded_irq+0xcc/0x170
> >     i801_probe+0x32f/0x508 [i2c_i801]
> >     local_pci_probe+0x45/0xa0
> >   i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16
> >   i801_smbus: probe of 0000:00:1f.3 failed with error -16
> >
> > Signed-off-by: Chen Fan <chen.fan.fnst@xxxxxxxxxxxxxx>
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> > Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>
> Rafael, I assume you'll take this if you think it's ready.

I can do that.

> This is a subtle problem and, if I understand correctly, can manifest
> intermittently depending on the machine configuration. For example,
> if you got rid of the "megasa" driver, I suspect i801_smbus would not
> complain, but it wouldn't work.
>
> I think we might want to consider doing something for non-x86 arches
> as well, but we can do that later. I propose a changelog like the
> following. Please correct anything I got wrong. I suspect we will be
> revisiting this issue eventually, so I'd like to have a good
> description.
>
>
> x86/PCI: Recognize that Interrupt Line 255 means "not connected"
>
> Per the x86-specific footnote to PCI spec r3.0, sec 6.2.4, the value 255 in
> the Interrupt Line register means "unknown" or "no connection."
> Previously, when we couldn't derive an IRQ from the _PRT, we fell back to
> using the value from Interrupt Line as an IRQ. It's questionable whether
> we should do that at all, but the spec clearly suggests we shouldn't do it
> for the value 255 on x86.
>
> Calling request_irq() with IRQ 255 may succeed, but the driver won't
> receive any interrupts. Or, if IRQ 255 is shared with another device, it
> may succeed, and the driver's ISR will be called at random times when the
> *other* device interrupts. Or it may fail if another device is using IRQ
> 255 with incompatible flags. What we *want* is for request_irq() to fail
> predictably so the driver can fall back to polling.
>
> On x86, assume 255 in the Interrupt Line means the INTx line is not
> connected. In that case, set dev->irq to IRQ_NOTCONNECTED so request_irq()
> will fail gracefully with -ENOTCONN.
>
> We found this problem on a system where Secure Boot firmware assigned
> Interrupt Line 255 to an i801_smbus device and another device was already
> using MSI-X IRQ 255.
> This was in v3.10, where i801_probe() fails if
> request_irq() fails:
>
>   i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
>   i801_smbus 0000:00:1f.3: can't derive routing for PCI INT C
>   i801_smbus 0000:00:1f.3: PCI INT C: no GSI
>   genirq: Flags mismatch irq 255. 00000080 (i801_smbus) vs. 00000000 (megasa)
>   CPU: 0 PID: 2487 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.x86_64 #1
>   Hardware name: FUJITSU PRIMEQUEST 2800E2/D3736, BIOS PRIMEQUEST 2000 Serie5
>   Call Trace:
>     dump_stack+0x19/0x1b
>     __setup_irq+0x54a/0x570
>     request_threaded_irq+0xcc/0x170
>     i801_probe+0x32f/0x508 [i2c_i801]
>     local_pci_probe+0x45/0xa0
>   i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16
>   i801_smbus: probe of 0000:00:1f.3 failed with error -16
>
> After aeb8a3d16ae0 ("i2c: i801: Check if interrupts are disabled"),
> i801_probe() will fall back to polling if request_irq() fails. But we
> still need this patch because request_irq() may succeed or fail depending
> on other devices in the system. If request_irq() fails, i801_smbus will
> work by falling back to polling, but if it succeeds, i801_smbus won't work
> because it expects interrupts that it may not receive.

I like this. :-)

Chen, can you please add the changelog as suggested by Bjorn to the patch and
resend?

Thanks,
Rafael
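
For illustration only, the behavior the proposed changelog describes could be
modeled in user-space C roughly as below. This is a simplified sketch, not the
patch itself: every model_* function and MODEL_* macro is a hypothetical name
invented for the example, the sentinel value is an assumption, and the
request_irq() stand-in ignores sharing and flag conflicts entirely.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: sentinel value picked for this sketch. Only the name
 * IRQ_NOTCONNECTED comes from the thread; the value here is illustrative. */
#define MODEL_IRQ_NOTCONNECTED (1U << 31)

/* Interrupt Line value that the changelog says means "unknown" or
 * "no connection" on x86. */
#define MODEL_INTERRUPT_LINE_UNKNOWN 255

/* Hypothetical helper: the fallback taken when no IRQ can be derived from
 * the _PRT. Instead of using 255 as a real IRQ, return the sentinel. */
static unsigned int model_irq_from_interrupt_line(unsigned char line)
{
    if (line == MODEL_INTERRUPT_LINE_UNKNOWN)
        return MODEL_IRQ_NOTCONNECTED;
    return line;
}

/* Hypothetical stand-in for request_irq(): fails predictably with
 * -ENOTCONN for the sentinel, and otherwise always succeeds (the model
 * does not track other users of the IRQ). */
static int model_request_irq(unsigned int irq)
{
    if (irq == MODEL_IRQ_NOTCONNECTED)
        return -ENOTCONN;
    return 0;
}

/* Hypothetical driver probe: returns 1 if the driver would fall back to
 * polling, 0 if it would rely on interrupts. */
static int model_probe_uses_polling(unsigned char line)
{
    int err = model_request_irq(model_irq_from_interrupt_line(line));

    /* In this model only two outcomes are possible. */
    assert(err == 0 || err == -ENOTCONN);
    return err != 0;
}
```

Under these assumptions, model_probe_uses_polling(255) reports a fallback to
polling regardless of what else is in the system, while any other Interrupt
Line value is passed through and used as the IRQ number.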