On Fri, Aug 11, 2023 at 4:00 PM Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 10, 2023 at 6:51 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > On Thu, Aug 10, 2023 at 04:17:21PM +0800, Kai-Heng Feng wrote:
> > > On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote:
> > > > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> > > > > > > PCIe services that share an IRQ with PME, such as AER or DPC,
> > > > > > > may cause a spurious wakeup on system suspend. To prevent this,
> > > > > > > disable the AER interrupt notification during the system suspend
> > > > > > > process.
> > > > > >
> > > > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share
> > > > > > the same IRQ, but I don't think this is true in general.
> > > > > >
> > > > > > Root Ports usually use MSI or MSI-X. PME and hotplug events use the
> > > > > > Interrupt Message Number in the PCIe Capability, but AER uses the one
> > > > > > in the AER Root Error Status register, and DPC uses the one in the DPC
> > > > > > Capability register. Those potentially correspond to three distinct
> > > > > > MSI/MSI-X vectors.
> > > > > >
> > > > > > I think this probably has nothing to do with the IRQ being *shared*,
> > > > > > but just that putting the downstream component into D3cold, where the
> > > > > > link state is L3, may cause the upstream component to log and signal a
> > > > > > link-related error as the link goes completely down.
> > > > >
> > > > > That's quite likely a better explanation than my wording.
> > > > > Assuming AER IRQ and PME IRQ are not shared, does system get woken up
> > > > > by AER IRQ?
> > > >
> > > > Rafael could answer this better than I can, but
> > > > Documentation/power/suspend-and-interrupts.rst says device interrupts
> > > > are generally disabled during suspend after the "late" phase of
> > > > suspending devices, i.e.,
> > > >
> > > >   dpm_suspend_noirq
> > > >     suspend_device_irqs          <-- disable non-wakeup IRQs
> > > >     dpm_noirq_suspend_devices
> > > >       ...
> > > >         pci_pm_suspend_noirq     # (I assume)
> > > >           pci_prepare_to_sleep
> > > >
> > > > I think the downstream component would be put in D3cold by
> > > > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by
> > > > then.
> > > >
> > > > I assume PME would generally *not* be disabled since it's needed for
> > > > wakeup, so I think any interrupt that shares the PME IRQ and occurs
> > > > during suspend may cause a spurious wakeup.
> > >
> > > Yes, that's the case here.
> > >
> > > > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing
> > > > the PME IRQ may cause spurious wakeups, and we would have to disable
> > > > those other interrupts at the source, e.g., by clearing
> > > > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does).
> > >
> > > So is the series good to be merged now?
> >
> > If we merge as-is, won't we disable AER & DPC interrupts unnecessarily
> > in the case where the link goes to D3hot? In that case, there's no
> > reason to expect interrupts related to the link going down, but things
> > like PTM messages still work, and they may cause errors that we should
> > know about.
>
> Because the issue can be observed on D3hot as well [0].
> The root port device [0] is power managed by ACPI, so I wonder if it's
> reasonable to disable AER & DPC for devices that power managed by
> firmware?

OK, I think the D3hot case is different from this one, so I'll work on
the next revision, which only disables AER/DPC when power is really off.
In addition to disabling the interrupts, is it reasonable to disable the
AER and DPC services completely, so that unwanted electrical noise won't
trigger a DPC reset?

Kai-Heng

> [0] https://bugzilla.kernel.org/show_bug.cgi?id=216295#c3
>
> Kai-Heng
>
> >
> > > > > > I don't think D0-D3hot should be relevant here because in all those
> > > > > > states, the link should be active because the downstream config space
> > > > > > remains accessible. So I'm not sure if it's possible, but I wonder if
> > > > > > there's a more targeted place we could do this, e.g., in the path that
> > > > > > puts downstream devices in D3cold.
> > > > >
> > > > > Let me try to work on this.
> > > > >
> > > > > Kai-Heng
> > > > >
> > > > > >
> > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> > > > > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> > > > > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> > > > > > > notification during suspend and re-enabling them during the resume process
> > > > > > > should not affect the basic functionality.
> > > > > > >
> > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> > > > > > > Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> > > > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> > > > > > > ---
> > > > > > > v6:
> > > > > > > v5:
> > > > > > >  - Wording.
> > > > > > >
> > > > > > > v4:
> > > > > > > v3:
> > > > > > >  - No change.
> > > > > > >
> > > > > > > v2:
> > > > > > >  - Only disable AER IRQ.
> > > > > > >  - No more check on PME IRQ#.
> > > > > > >  - Use helper.
> > > > > > >
> > > > > > >  drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++
> > > > > > >  1 file changed, 22 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > > > > > index 1420e1f27105..9c07fdbeb52d 100644
> > > > > > > --- a/drivers/pci/pcie/aer.c
> > > > > > > +++ b/drivers/pci/pcie/aer.c
> > > > > > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
> > > > > > >  	return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +static int aer_suspend(struct pcie_device *dev)
> > > > > > > +{
> > > > > > > +	struct aer_rpc *rpc = get_service_data(dev);
> > > > > > > +	struct pci_dev *pdev = rpc->rpd;
> > > > > > > +
> > > > > > > +	aer_disable_irq(pdev);
> > > > > > > +
> > > > > > > +	return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int aer_resume(struct pcie_device *dev)
> > > > > > > +{
> > > > > > > +	struct aer_rpc *rpc = get_service_data(dev);
> > > > > > > +	struct pci_dev *pdev = rpc->rpd;
> > > > > > > +
> > > > > > > +	aer_enable_irq(pdev);
> > > > > > > +
> > > > > > > +	return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> > > > > > >   * @dev: pointer to Root Port, RCEC, or RCiEP
> > > > > > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
> > > > > > >  	.service = PCIE_PORT_SERVICE_AER,
> > > > > > >
> > > > > > >  	.probe = aer_probe,
> > > > > > > +	.suspend = aer_suspend,
> > > > > > > +	.resume = aer_resume,
> > > > > > >  	.remove = aer_remove,
> > > > > > >  };
> > > > > > >
> > > > > > > --
> > > > > > > 2.34.1
> > > > > > >
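
For reference, a register-level illustration of "disabling those
interrupts at the source" as discussed above. This is only a userspace
sketch, not the kernel's aer_disable_irq()/aer_enable_irq(): struct
fake_root_port and the fake_aer_*() helpers are hypothetical names made
up for this example, and they model the Root Error Command value in
plain memory instead of going through config space accessors. The three
bit values are the PCI_ERR_ROOT_CMD_*_EN definitions from
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Root Error Command enable bits, as in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_CMD_COR_EN      0x00000001u
#define PCI_ERR_ROOT_CMD_NONFATAL_EN 0x00000002u
#define PCI_ERR_ROOT_CMD_FATAL_EN    0x00000004u

/* All three "interrupt on error message" enables together. */
#define FAKE_ROOT_INTR_MASK (PCI_ERR_ROOT_CMD_COR_EN | \
			     PCI_ERR_ROOT_CMD_NONFATAL_EN | \
			     PCI_ERR_ROOT_CMD_FATAL_EN)

/* Hypothetical stand-in for a Root Port: only the Root Error Command value. */
struct fake_root_port {
	uint32_t root_err_cmd;
};

/* Suspend side: clear the three enables, leave every other bit untouched. */
void fake_aer_disable_irq(struct fake_root_port *rp)
{
	assert(rp != NULL);
	rp->root_err_cmd &= ~FAKE_ROOT_INTR_MASK;
}

/* Resume side: set the three enables again, leave every other bit untouched. */
void fake_aer_enable_irq(struct fake_root_port *rp)
{
	assert(rp != NULL);
	rp->root_err_cmd |= FAKE_ROOT_INTR_MASK;
}
```

Calling fake_aer_disable_irq() from a suspend hook and
fake_aer_enable_irq() from the matching resume hook mirrors the shape of
aer_suspend()/aer_resume() in the patch. Whether the real hooks should
run only when the downstream device actually goes to D3cold is the open
question in this thread, and the sketch does not address it.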