Hi Mika, On Fri, Jan 21, 2022 at 6:55 PM Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> wrote: > > Hi Kai-Heng, > > On Fri, Jan 07, 2022 at 12:09:57PM +0800, Kai-Heng Feng wrote: > > Only from root ports of thunderbolt devices. > > > > The error occurs as soon as the root port is runtime suspended to D3cold. > > > > Runtime suspend the AER service can resolve the issue. I wonder if > > it's the right thing to do here? > > I think you are right here. It seems that AER "service driver" is > completely missing PM hooks. Probably because it is more used in server > type of systems where power management is not priority. Here is my previous attempt to suspend AER: https://lore.kernel.org/linux-pci/20210127173101.446940-1-kai.heng.feng@xxxxxxxxxxxxx/ > > > D3cold should also mean the PCI link is gone, disabling AER seems to > > be a reasonable approach. > > Indeed - I think AER might trigger here because the link does "down" / > low power state if left enabled while the root port enters D3. Something > like below hack should disable it over low power transitions: Ubuntu kernel has been carrying the patch for quite some time: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/unstable/commit/?id=e82f15f1a26273b004054a81ef45937fb1b632e5 > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 9fa1f97e5b27..64138cf82db8 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1432,6 +1432,22 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) > return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + > + aer_disable_rootport(rpc); > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + > + aer_enable_rootport(rpc); > + return 0; > +} > + > static struct pcie_port_service_driver aerdriver = { > .name = "aer", > .port_type = PCIE_ANY_PORT, > @@ -1439,6 +1455,10 @@ static struct pcie_port_service_driver aerdriver = { > > .probe = aer_probe, > .remove = aer_remove, > + .suspend = aer_suspend, > + .resume = aer_resume, > + .runtime_suspend = aer_suspend, > + .runtime_resume = aer_resume, > }; This patch is exactly what I tested. Maybe only suspend/runtime_suspend AER when the target PM state is D3cold? PCIe spec doesn't say how to handle AER in Link L2/L3Ready/L3, but I think it's reasonable to suspend AER when power is loss. Let me come up with a patch with that idea. Kai-Heng > > /**