On Mon, 24 Feb 2025, Feng Tang wrote: > There was problem reported by firmware developers that they received > two PCIe hotplug commands in very short intervals on an ARM server, > which doesn't comply with PCIe spec, and broke their state machine and > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs > to wait at least 1 second for the command-complete event, before > resending the command or sending a new command. > > In the failure case, the first PCIe hotplug command firmware received > is from get_port_device_capability(), which sends command to disable > PCIe hotplug interrupts without waiting for its completion, and the > second command comes from pcie_enable_notification() of pciehp driver, > which enables hotplug interrupts again. > > Fix it by adding the necessary wait to comply with PCIe spec. > > Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization") > Originally-by: Liguang Zhang <zhangliguang@xxxxxxxxxxxxxxxxx> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx> > --- > drivers/pci/pci.h | 2 ++ > drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++-- > 2 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 4c94a589de4a..a1138ebc2689 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { } > void pcie_reset_lbms_count(struct pci_dev *port); > int pcie_lbms_count(struct pci_dev *port, unsigned long *val); > int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms); > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev); > #else > static inline void pcie_reset_lbms_count(struct pci_dev *port) {} > static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val) > @@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms) > { > return 0; > } > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {} > #endif > > struct pci_dev_reset_methods { > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index bb00ba45ee51..ca4f21dff486 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms) > return ret; > } > > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev) > +{ > + u16 slot_ctrl = 0; Unnecessary initialization > + > + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl); > + /* Bail out early if it is already disabled */ > + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE))) > + return; > + > + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL, > + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE); Align to (. You might need to put the bits to own lines. > + > + if (pcie_poll_sltctl_cmd(dev, 1000)) > + pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n"); > +} > + > /** > * get_port_device_capability - discover capabilities of a PCI Express port > * @dev: PCI Express port to examine > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev) > * Disable hot-plug interrupts in case they have been enabled > * by the BIOS and the hot-plug service driver is not loaded. > */ > - pcie_capability_clear_word(dev, PCI_EXP_SLTCTL, > - PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE); > + pcie_disable_hp_interrupts_early(dev); Doesn't calling this here delay setup for all portdrv services, not just hotplug? And the delay can be relatively long. > } > > #ifdef CONFIG_PCIEAER > -- i.