Hi Bjorn, the below patch is marked "Changes Requested" in patchwork: https://patchwork.kernel.org/project/linux-pci/patch/20211111054258.7309-1-zhangliguang@xxxxxxxxxxxxxxxxx/ I think that might be erroneous because the patch is correct, I've provided a Reviewed-by and no change requests are recorded in patchwork or the mailing list archive. If you've got a few minutes to spare, could you double-check the state in patchwork and provide Liguang Zhang with the changes you'd want (if any)? Thanks! Lukas On Fri, Nov 26, 2021 at 06:33:09PM +0100, Lukas Wunner wrote: > On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote: > > This patch fixes this problem that on driver probe from system startup, > > pciehp checks the Presence Detect State bit in the Slot Status register > > to bring up an occupied slot or bring down an unoccupied slot. If empty > > slot's power status is on, turn power off. The Hot-Plug interrupt isn't > > requested yet, so avoid triggering a notification by calling > > pcie_disable_notification(). > > > > Both the CCIE and HPIE bits are masked in pcie_disable_notification(), > > when we issue a hotplug command, pcie_wait_cmd() will polling the > > Command Completed bit instead of waiting for an interrupt. But cmd_busy > > bit was not cleared when Command Completed which results in timeouts > > like this in pciehp_power_off_slot() and pcie_init_notification(): > > > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 > > (issued 2264 msec ago) > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 > > (issued 2288 msec ago) > > > > Signed-off-by: Liguang Zhang <zhangliguang@xxxxxxxxxxxxxxxxx> > > Fixes: a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143 > Reviewed-by: Lukas Wunner <lukas@xxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx # v4.2+ > > Thanks a lot, that's a really good catch. > > It's a somewhat intricate bug, so I'll try to explain in my own words: > > If notification is disabled (HPIE or CCIE not set in the Slot Status > register), we rely on pcie_poll_cmd() to poll for Command Completed. > But once it's signaled, we neglect to clear ctrl->cmd_busy. > (Normally it is cleared by the hardirq handler pciehp_isr() if > notification is enabled.) > > The result is that starting with the second Slot Control write, > pciehp will gratuitously wait for a command to finish which has > already finished and it will incorrectly report a timeout. > > The bug was originally introduced in 2015 by commit a5dd4b4b0570 > ("PCI: pciehp: Wait for hotplug command completion where necessary"), > but didn't manifest itself because the first Slot Control Write already > enabled notification and from that point on the hardirq handler would > clear ctrl->cmd_busy. However I think the bug may have manifested > itself with pciehp_poll_mode=1. > > It wasn't until commit 4e6a13356f1c ("PCI: pciehp: Deduplicate presence > check on probe & resume") that multiple consecutive Slot Control writes > were performed on ->probe with notification disabled, so that's the > commit which first exposed the bug with pciehp_poll_mode=0. > > Thanks, > > Lukas > > > --- > > drivers/pci/hotplug/pciehp_hpc.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > > index 83a0fa119cae..8698aefc6041 100644 > > --- a/drivers/pci/hotplug/pciehp_hpc.c > > +++ b/drivers/pci/hotplug/pciehp_hpc.c > > @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) > > if (slot_status & PCI_EXP_SLTSTA_CC) { > > pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, > > PCI_EXP_SLTSTA_CC); > > + ctrl->cmd_busy = 0; > > + smp_mb(); > > return 1; > > } > > msleep(10); > > -- > > 2.19.1.6.gb485710b