On Tue, Feb 04, 2025 at 10:14:10AM +0100, Lukas Wunner wrote:
> On Tue, Feb 04, 2025 at 01:37:58PM +0800, Feng Tang wrote:
> > There was an irq storm bug when testing "pci=nomsi" case, and the root
> > cause is: 'nomsi' will disable MSI and let devices and root ports use
> > legacy INTX interrupt, and likely make several devices/ports share one
> > interrupt. In the failure case, BIOS doesn't disable the PCIE hotplug
> > interrupts, and actually asserts the command-complete interrupt.
> > As MSI is disabled, ACPI initialization code will not enumerate root
> > port's PCIE hotplug capability, and pciehp service driver won't be
> > enabled for the root port to handle that interrupt, later on when it is
> > shared and enabled by other device driver like NVME or NIC, the "nobody
> > care irq storm" happens.
> >
> > So disable the pcie hotplug CCIE/HPIE interrupt in early boot phase when
> > MSI is not enabled.
>
> So I think this issue should go away if disabling the interrupt
> by portdrv is no longer conditional on
>
>   (pcie_ports_native || host->native_pcie_hotplug)
>
> like I've just proposed here:
>
> https://lore.kernel.org/r/Z6HYuBDP6uvE1Sf4@xxxxxxxxx/
>
> ... in which case this patch won't be necessary. Can you confirm that?

Thanks for the suggestion! I will try to get access to the platform for
testing, and report back.

As for the change,

+	if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
+		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
+			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);

CONFIG_HOTPLUG_PCI_PCIE is always enabled on our platform and in many
distros, so I guess the check needs to be removed. But then we would see
the 1 second wait again, and would need the waiting logic from patch 1/2?

Thanks,
Feng

>
> You can split the change I've proposed into two patches if you like.
>
> Thanks,
>
> Lukas
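
For reference, a minimal userspace sketch of what the quoted
pcie_capability_clear_word() call does to the Slot Control value. This is
only a model on a plain 16-bit number, not kernel code: the helper name
sltctl_disable_hotplug_irqs() is made up for illustration, while the two
bit values are the ones in include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */

/*
 * Hypothetical helper (not a kernel API): models the read-modify-write
 * done by pcie_capability_clear_word() on an in-memory copy of the
 * Slot Control register. Only CCIE and HPIE are cleared; every other
 * bit is left as it was.
 */
static uint16_t sltctl_disable_hotplug_irqs(uint16_t sltctl)
{
	return sltctl & (uint16_t)~(PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
}
```

With both enables set by BIOS (0x0030 in these two bits), the result has
them cleared, so the port should stop asserting the shared INTx line for
hotplug events, while the remaining Slot Control bits are left alone.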