On Tue, Nov 02, 2021 at 03:45:00AM +0000, Bao, Joseph wrote: > Recently we encounter system hang (dead spinlock) when move to kernel > linux-5.4.y. > > Finally, we use bisect to locate the suspicious commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=4667358dab9cc07da044d5bc087065545b1000df. > > Our system has some HW defect, which will wrongly set PCI_EXP_SLTSTA_PFD > high, and this commit will lead to infinite loop jumping to read_status > (no chance to clear status PCI_EXP_SLTSTA_PFD bit since ctrl is not > updated), I know this is our HW defect, but this commit makes kernel > trapped in this isr function and leads to kernel hang (then the user > could not get useful information to show what's wrong), which I think > is not expected behavior, so I would like to report to you for discussion. Thanks a lot for the report and apologies for the breakage and the delay. Below please find a tentative fix. Could you test whether it fixes the issue? I don't think this is a hardware defect. If I'm reading the spec right (PCIe r5.0, sec. 6.7.1.8), the PFD bit is meant to remain set and cannot be cleared until the kernel disables slot power. When a power fault happens, we currently only change the LEDs (Power Indicator Off, Attention Indicator On) and emit a log message. We otherwise leave the slot as is, even though I'd assume that the PCI device in the slot is no longer accessible. I'm wondering whether we should interpret a power fault as surprise removal. Alternatively, we could attempt recovery, i.e. turn slot power off and back on. Similar to what we're doing when an Uncorrectable Error occurs. Do you have an opinion on that? What would be the desired behavior for your users? Thanks, Lukas -- >8 -- Subject: [PATCH] PCI: pciehp: Fix infinite loop in IRQ handler upon power fault The Power Fault Detected bit in the Slot Status register differs from all other hotplug events in that it is sticky: It can only be cleared after turning off slot power. Per PCIe r5.0, sec. 6.7.1.8: If a power controller detects a main power fault on the hot-plug slot, it must automatically set its internal main power fault latch [...]. The main power fault latch is cleared when software turns off power to the hot-plug slot. The stickiness used to cause interrupt storms and infinite loops which were fixed in 2009 by commits 5651c48cfafe ("PCI pciehp: fix power fault interrupt storm problem") and 99f0169c17f3 ("PCI: pciehp: enable software notification on empty slots"). Unfortunately in 2020 the infinite loop issue was inadvertently reintroduced by commit 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt race"): The hardirq handler pciehp_isr() clears the PFD bit until pciehp's power_fault_detected flag is set. That happens in the IRQ thread pciehp_ist(), which never learns of the event because the hardirq handler is stuck in an infinite loop. Fix by setting the power_fault_detected flag already in the hardirq handler. Fixes: 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt race") Link: https://bugzilla.kernel.org/show_bug.cgi?id=214989 Link: https://lore.kernel.org/linux-pci/DM8PR11MB5702255A6A92F735D90A4446868B9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Reported-by: Joseph Bao <joseph.bao@xxxxxxxxx> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> Cc: stable@xxxxxxxxxxxxxxx # v4.19+ Cc: Stuart Hayes <stuart.w.hayes@xxxxxxxxx> --- drivers/pci/hotplug/pciehp_hpc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 6ac5ea5..fac6b8e 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -640,6 +640,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) */ if (ctrl->power_fault_detected) status &= ~PCI_EXP_SLTSTA_PFD; + else if (status & PCI_EXP_SLTSTA_PFD) + ctrl->power_fault_detected = true; events |= status; if (!events) { @@ -649,7 +651,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) } if (status) { - pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events); + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status); /* * In MSI mode, all event bits must be zero before the port @@ -723,8 +725,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) } /* Check Power Fault Detected */ - if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) { - ctrl->power_fault_detected = 1; + if (events & PCI_EXP_SLTSTA_PFD) { ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(ctrl)); pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF, PCI_EXP_SLTCTL_ATTN_IND_ON); -- 2.33.0