On Sat, Sep 30, 2017 at 10:12:06AM +0800, Qiang Zheng wrote: > PCIe PME and hot plug share same interrupt number. In some special case, > Link down event cause hot plug interrupt, devices is not disconnected, > But read config will return 0xff. > > In that case, PME work function will run and not return Because > Root Status PME bit always 1 and can not be cleared. > > This patch add Root Status check in PME interrupt handler, > Just do same as pciehp isr Slot status check. > > Signed-off-by: Qiang Zheng <zhengqiang10@xxxxxxxxxx> > --- > drivers/pci/pcie/pme.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c > index fafdb16..2ff2e57 100644 > --- a/drivers/pci/pcie/pme.c > +++ b/drivers/pci/pcie/pme.c > @@ -273,7 +273,7 @@ static irqreturn_t pcie_pme_irq(int irq, void *context) > spin_lock_irqsave(&data->lock, flags); > pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta); > > - if (!(rtsta & PCI_EXP_RTSTA_PME)) { > + if (rtsta == U32_MAX || !(rtsta & PCI_EXP_RTSTA_PME)) { > spin_unlock_irqrestore(&data->lock, flags); > return IRQ_NONE; > } > I applied the patch below to pci/misc for v4.15. I think we need a similar test in pcie_pme_work_fn() itself. Without it, I think there's a race: if we get a legitimate PME, enter pcie_pme_work_fn(), and the link goes down, we could still get 0xffffffff the next time we read PCI_EXP_RTSTA. I also used "(u32) ~0" instead of U32_MAX because that's the style used elsewhere. It might be clunkier than necessary, but U32_MAX suggests a limit, and that's not really the concept here. I didn't add your reviewed-by, Rafael, since I made non-trivial changes to the patch. Bjorn commit 303029d6d55ba10a37c82c31a89eb5550cde77ec Author: Qiang <zhengqiang10@xxxxxxxxxx> Date: Thu Sep 28 11:54:34 2017 +0800 PCI/PME: Handle invalid data when reading Root Status PCIe PME and native hotplug share the same interrupt number, so hotplug interrupts are also processed by PME. In some cases, e.g., a Link Down interrupt, a device may be present but unreachable, so when we try to read its Root Status register, the read fails and we get all ones data (0xffffffff). Previously, we interpreted that data as PCI_EXP_RTSTA_PME being set, i.e., "some device has asserted PME," so we scheduled pcie_pme_work_fn(). This caused an infinite loop because pcie_pme_work_fn() tried to handle PME requests until PCI_EXP_RTSTA_PME is cleared, but with the link down, PCI_EXP_RTSTA_PME can't be cleared. Check for the invalid 0xffffffff data everywhere we read the Root Status register. 1469d17dd341 ("PCI: pciehp: Handle invalid data when reading from non-existent devices") added similar checks in the hotplug driver. Signed-off-by: Qiang Zheng <zhengqiang10@xxxxxxxxxx> [bhelgaas: changelog, also check in pcie_pme_work_fn(), use "~0" to follow other similar checks] Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index fafdb165dd2e..df290aa58dce 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -226,6 +226,9 @@ static void pcie_pme_work_fn(struct work_struct *work) break; pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta); + if (rtsta == (u32) ~0) + break; + if (rtsta & PCI_EXP_RTSTA_PME) { /* * Clear PME status of the port. If there are other @@ -273,7 +276,7 @@ static irqreturn_t pcie_pme_irq(int irq, void *context) spin_lock_irqsave(&data->lock, flags); pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta); - if (!(rtsta & PCI_EXP_RTSTA_PME)) { + if (rtsta == (u32) ~0 || !(rtsta & PCI_EXP_RTSTA_PME)) { spin_unlock_irqrestore(&data->lock, flags); return IRQ_NONE; }