On Mon, Mar 03, 2025 at 10:36:30AM +0800, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> two PCIe hotplug commands in very short intervals on an ARM server,
> which doesn't comply with PCIe spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the command or sending a new command.
>
> In the failure case, the first PCIe hotplug command firmware received
> is from get_port_device_capability(), which sends command to disable
> PCIe hotplug interrupts without waiting for its completion, and the
> second command comes from pcie_enable_notification() of pciehp driver,
> which enables hotplug interrupts again.
>
> One solution is to add the necessary delay after the first command [1],
> while Lukas proposed an optimization that if the pciehp driver will be
> loaded soon and handle the interrupts, then the hotplug and the wait
> are not needed and can be saved, for every root port.
>
> So fix it by only disabling the hotplug interrupts when pciehp driver
> is not enabled.
>
> [1]. https://lore.kernel.org/lkml/20250224034500.23024-1-feng.tang@xxxxxxxxxxxxxxxxx/t/#u
>
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx>
> Signed-off-by: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx>

Reviewed-by: Lukas Wunner <lukas@xxxxxxxxx>
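
For readers following the thread, the idea in the commit message can be
modeled in user space roughly as below. This is only a sketch under
stated assumptions, not the patch itself: struct fake_port,
fake_sltctl_clear() and fake_port_init() are hypothetical names, and the
pciehp_enabled flag stands in for however the kernel decides that the
pciehp driver is enabled. The two masks are the Slot Control bit values
from the kernel's pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */

/* Hypothetical stand-in for one hotplug-capable port */
struct fake_port {
	uint16_t sltctl;        /* modeled Slot Control register */
	unsigned int cmds_sent; /* Slot Control writes the firmware sees */
};

/* Each write to Slot Control counts as one hotplug command */
static void fake_sltctl_clear(struct fake_port *p, uint16_t bits)
{
	p->sltctl &= (uint16_t)~bits;
	p->cmds_sent++;
}

/*
 * Port initialization, modeled: only disable the hotplug interrupts
 * when no pciehp driver will take over the slot. If pciehp is enabled,
 * skip the write so that the driver's own command is the first one the
 * firmware receives, and no 1 second wait is needed here.
 */
static void fake_port_init(struct fake_port *p, bool pciehp_enabled)
{
	if (!pciehp_enabled)
		fake_sltctl_clear(p, PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
}
```

With pciehp_enabled set, fake_port_init() leaves the register untouched
and issues no command; with it cleared, exactly one command is issued
and both interrupt enable bits end up cleared.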