On Thu, Feb 06, 2025 at 07:21:47AM +0100, Lukas Wunner wrote:
> [to += Rafael, start of thread is here:
> https://lore.kernel.org/all/Z6HcoUB3i51bzQDs@xxxxxxxxx/
> ]
>
> Hi Rafael,
>
> On Wed, Feb 05, 2025 at 11:58:04AM +0800, Feng Tang wrote:
> > On Tue, Feb 04, 2025 at 10:23:45AM +0100, Lukas Wunner wrote:
> > > On Tue, Feb 04, 2025 at 01:37:58PM +0800, Feng Tang wrote:
> > > > There was an IRQ storm bug when testing the "pci=nomsi" case, and the
> > > > root cause is: 'nomsi' disables MSI and makes devices and root ports
> > > > use legacy INTx interrupts, which likely makes several devices/ports
> > > > share one interrupt. In the failure case, the BIOS doesn't disable the
> > > > PCIe hotplug interrupts, and actually asserts the command-complete
> > > > interrupt. As MSI is disabled, the ACPI initialization code will not
> > > > enumerate the root port's PCIe hotplug capability, and the pciehp
> > > > service driver won't be enabled for the root port to handle that
> > > > interrupt. Later on, when the interrupt is shared and enabled by
> > > > another device driver such as NVMe or a NIC driver, the "nobody
> > > > cared" IRQ storm happens.
> > >
> > > Is there a section in the PCI Firmware Spec which says ACPI doesn't
> > > enumerate the hotplug capability if MSI is disabled?
> >
> > No, I didn't get it from the spec; I found the logic by reading the
> > code while debugging the IRQ storm issue. The related code is:
> >
> > #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
> >                                | OSC_PCI_ASPM_SUPPORT \
> >                                | OSC_PCI_CLOCK_PM_SUPPORT \
> >                                | OSC_PCI_MSI_SUPPORT)
>
> Commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each root
> bridge (v3)") contains a change which doesn't seem to be explained in
> the commit message:
>
> If the user passes "pci=nomsi" on the command line, Linux doesn't
> request hotplug control (or any other control) from the platform.
> So ACPI always remains responsible for hotplug in the "pci=nomsi"
> case.
>
> The commit sought to fix a cpu hog issue:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=29722
>
> It's unclear to me if that bug was fixed by requesting _OSC only once,
> as the commit message suggests, or if the addition of OSC_MSI_SUPPORT
> to ACPI_PCIE_REQ_SUPPORT fixed the issue.
>
> Since the latter is not mentioned in the commit message,
> it seems plausible to assume that the OSC_MSI_SUPPORT change
> was unintentional.
>
> In any case it doesn't seem to make sense to not request any
> control in the "pci=nomsi" case.
>
> It's also worth noting that the behavior is different on
> Apple machines as they use a fixed _OSC set even for "pci=nomsi".
>
> I'm wondering if OSC_PCI_MSI_SUPPORT should simply be removed
> from ACPI_PCIE_REQ_SUPPORT, but I'm worried that it may cause
> reappearance of the cpu hog issue.

Hi Lukas,

I tried removing OSC_PCI_MSI_SUPPORT from ACPI_PCIE_REQ_SUPPORT, but
after negotiate_os_control(), the 'PCIeHotplug' control is still
disabled in the control capability returned by the ACPI query_osc and
run_osc routines (I haven't figured out why yet), so the pciehp service
driver can't be loaded.

Thanks,
Feng

> Thoughts?
>
> Thanks,
>
> Lukas
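For illustration only, a rough user-space model of the gating discussed in this thread. It is not the kernel code: the model_* helpers are hypothetical, the bit values are my assumption of the _OSC Support-field layout in the PCI Firmware Spec, and the model assumes ASPM and clock PM support are always advertised. It shows that with "pci=nomsi" the MSI bit is missing from the support mask, so not every bit of ACPI_PCIE_REQ_SUPPORT is present and no control is requested at all. Dropping OSC_PCI_MSI_SUPPORT from the required mask only gets past this first check; it says nothing about which controls the firmware then grants, which matches PCIeHotplug still being denied above.

```c
#include <assert.h>
#include <stdbool.h>

/* _OSC Support-field bits. Values are an assumption based on the
 * PCI Firmware Spec layout, not copied from kernel headers. */
#define OSC_PCI_EXT_CONFIG_SUPPORT      0x00000001u
#define OSC_PCI_ASPM_SUPPORT            0x00000002u
#define OSC_PCI_CLOCK_PM_SUPPORT        0x00000004u
#define OSC_PCI_SEGMENT_GROUPS_SUPPORT  0x00000008u
#define OSC_PCI_MSI_SUPPORT             0x00000010u

/* Same shape as the definition quoted in this thread. */
#define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
                               | OSC_PCI_ASPM_SUPPORT \
                               | OSC_PCI_CLOCK_PM_SUPPORT \
                               | OSC_PCI_MSI_SUPPORT)

/* The MSI bit is part of the required set, so "pci=nomsi" matters. */
static_assert((ACPI_PCIE_REQ_SUPPORT & OSC_PCI_MSI_SUPPORT) != 0,
              "MSI support is required in this model");

/* Hypothetical: build the support mask the OS would advertise.
 * Assumes ASPM and clock PM support are always advertised. */
unsigned int model_osc_support(bool msi_enabled)
{
    unsigned int support = OSC_PCI_EXT_CONFIG_SUPPORT |
                           OSC_PCI_ASPM_SUPPORT |
                           OSC_PCI_CLOCK_PM_SUPPORT |
                           OSC_PCI_SEGMENT_GROUPS_SUPPORT;

    if (msi_enabled)            /* false when booted with "pci=nomsi" */
        support |= OSC_PCI_MSI_SUPPORT;
    return support;
}

/* Hypothetical: true if every required bit is advertised, i.e. the OS
 * would go on to request control (hotplug etc.) from the firmware.
 * It does not model which control bits the firmware then grants. */
bool model_requests_control(unsigned int support, unsigned int required)
{
    return (support & required) == required;
}
```

For example, model_requests_control(model_osc_support(false), ACPI_PCIE_REQ_SUPPORT) is false, while passing ACPI_PCIE_REQ_SUPPORT & ~OSC_PCI_MSI_SUPPORT as the required mask makes it true.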