On Tue, Feb 18, 2025 at 11:48:58AM +0800, Feng Tang wrote: > There was problem reported by firmware developers that they received > 2 pcie link control commands in very short intervals on an ARM server, > which doesn't comply with pcie spec, and broke their state machine and > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs > to wait at least 1 second for the command-complete event, before > resending the cmd or sending a new cmd. s/link control/hotplug/ (also below) s/2/two/ s/pcie/PCIe/ (also below) > And the first link control command firmware received is from > get_port_device_capability(), which sends cmd to disable pcie hotplug > interrupts without waiting for its completion. > > Fix it by adding the necessary wait to comply with PCIe spec, referring > pcie_poll_cmd(). > > Also make the interrupt disabling not dependent on whether pciehp > service driver will be loaded as suggested by Lukas. This sounds like maybe it should be two separate patches. > Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization") > Originally-by: Liguang Zhang <zhangliguang@xxxxxxxxxxxxxxxxx> > Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx> > --- > Changlog: > > since v1: > * Add the Originally-by for Liguang. The issue was found on a 5.10 kernel, > then 6.6. I was initially given a 5.10 kernel tar bar without git info to > debug the issue, and made the patch. Thanks to Guanghui who recently pointed > me to tree https://gitee.com/anolis/cloud-kernel which show the wait logic > in 5.10 was originally from Liguang, and never hit mainline. > * Make the irq disabling not dependent on wthether pciehp service driver > will be loaded (Lukas Wunner) > * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan > Kuppuswamy) > * Add logic to skip irq disabling if it is already disabled. > > drivers/pci/pci.h | 2 ++ > drivers/pci/pcie/portdrv.c | 44 +++++++++++++++++++++++++++++++++----- > 2 files changed, 41 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 01e51db8d285..c1e234d1b81d 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { } > #ifdef CONFIG_PCIEPORTBUS > void pcie_reset_lbms_count(struct pci_dev *port); > int pcie_lbms_count(struct pci_dev *port, unsigned long *val); > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev); > #else > static inline void pcie_reset_lbms_count(struct pci_dev *port) {} > static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val) > { > return -EOPNOTSUPP; > } > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {} > #endif > > struct pci_dev_reset_methods { > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 02e73099bad0..2470333bba2f 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -18,6 +18,7 @@ > #include <linux/string.h> > #include <linux/slab.h> > #include <linux/aer.h> > +#include <linux/iopoll.h> > > #include "../pci.h" > #include "portdrv.h" > @@ -205,6 +206,40 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask) > return 0; > } > > +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev) > +{ > + u16 slot_status = 0; > + int ret, ret1, timeout_us; > + > + /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */ > + timeout_us = 1000000; > + ret = read_poll_timeout(pcie_capability_read_word, ret1, > + (slot_status & PCI_EXP_SLTSTA_CC), 10000, > + timeout_us, true, dev, PCI_EXP_SLTSTA, > + &slot_status); > + if (!ret) > + pcie_capability_write_word(dev, PCI_EXP_SLTSTA, > + PCI_EXP_SLTSTA_CC); > + > + return ret; Ugh. I really don't like the way this basically duplicates pcie_poll_cmd(). I don't have a great suggestion to fix it; maybe we need a way to build part of pciehp unconditionally. At the very least we need a comment here pointing to pcie_poll_cmd(). And IIUC this will add a one second delay for ports that don't need command completed events. I don't think that's fair to those ports. > +} > + > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev) > +{ > + u16 slot_ctrl = 0; > + > + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl); > + /* Bail out early if it is already disabled */ > + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE))) > + return; > + > + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL, > + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE); > + > + if (pcie_wait_sltctl_cmd_raw(dev)) > + pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n"); s/PCIE/PCIe/ > +} > + > /** > * get_port_device_capability - discover capabilities of a PCI Express port > * @dev: PCI Express port to examine > @@ -222,16 +257,15 @@ static int get_port_device_capability(struct pci_dev *dev) > > if (dev->is_hotplug_bridge && > (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || > - pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) && > - (pcie_ports_native || host->native_pcie_hotplug)) { > - services |= PCIE_PORT_SERVICE_HP; > + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) { > + if (pcie_ports_native || host->native_pcie_hotplug) > + services |= PCIE_PORT_SERVICE_HP; > > /* > * Disable hot-plug interrupts in case they have been enabled > * by the BIOS and the hot-plug service driver is not loaded. > */ > - pcie_capability_clear_word(dev, PCI_EXP_SLTCTL, > - PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE); > + pcie_disable_hp_interrupts_early(dev); > } > > #ifdef CONFIG_PCIEAER > -- > 2.43.5 >