RE: Question about deadlock between AER and pciehp interrupts during resume from S3 with unplugged device

[AMD Official Use Only]

> So with the patches applied, the link doesn't come up after resume, but if you then reset via sysfs, it does come up, is that what you're saying?

Yes, correct. If we reset via sysfs, we do not see this issue. I have attached lspci and dmesg logs, taken with all three patches applied, to the Bugzilla entry.

We confirmed that the PCI_BRIDGE_CTL_BUS_RESET bit is set after resume, and once PCI_BRIDGE_CTL_BUS_RESET is cleared to 0, we are able to access the link.

It looks like the reset does not complete properly due to a timing issue in pci_reset_secondary_bus(). I will come back after analyzing this further.
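To make the observation above concrete, here is a minimal userspace sketch of a secondary bus reset against a mock Bridge Control register: assert PCI_BRIDGE_CTL_BUS_RESET, hold it, then deassert it while preserving the other bits, plus a check for the bit being left asserted (the state we see after resume). The register offset and bit value match include/uapi/linux/pci_regs.h. struct mock_bridge, cfg_read_word(), cfg_write_word(), bus_reset_asserted() and secondary_bus_reset() are hypothetical helpers, not kernel API, and the hold time is only a parameter here. As far as I understand, pci_reset_secondary_bus() follows the same assert/wait/deassert pattern on real config space, so a bit that stays at 1 after resume suggests the deassert step did not take effect.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_BRIDGE_CONTROL       0x3e /* 16-bit Bridge Control register */
#define PCI_BRIDGE_CTL_BUS_RESET 0x40 /* Secondary Bus Reset bit */

#define CFG_SIZE 256 /* conventional PCI config space size */

/* Hypothetical mock of a bridge's config space (little-endian, like PCI). */
struct mock_bridge {
	uint8_t cfg[CFG_SIZE];
};

static uint16_t cfg_read_word(const struct mock_bridge *b, size_t off)
{
	assert(off + 1 < CFG_SIZE);
	return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

static void cfg_write_word(struct mock_bridge *b, size_t off, uint16_t val)
{
	assert(off + 1 < CFG_SIZE);
	b->cfg[off] = (uint8_t)(val & 0xff);
	b->cfg[off + 1] = (uint8_t)(val >> 8);
}

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* True if Secondary Bus Reset is still asserted in the mock register. */
static bool bus_reset_asserted(const struct mock_bridge *b)
{
	return cfg_read_word(b, PCI_BRIDGE_CONTROL) & PCI_BRIDGE_CTL_BUS_RESET;
}

/* Assert Secondary Bus Reset, hold it for hold_ms, then deassert it,
 * leaving every other Bridge Control bit as it was. */
static void secondary_bus_reset(struct mock_bridge *b, long hold_ms)
{
	uint16_t ctrl = cfg_read_word(b, PCI_BRIDGE_CONTROL);

	cfg_write_word(b, PCI_BRIDGE_CONTROL,
		       (uint16_t)(ctrl | PCI_BRIDGE_CTL_BUS_RESET));
	sleep_ms(hold_ms);
	cfg_write_word(b, PCI_BRIDGE_CONTROL,
		       (uint16_t)(ctrl & ~PCI_BRIDGE_CTL_BUS_RESET));
}
```

For example, writing 0x0043 to the mock register and calling secondary_bus_reset() leaves 0x0003, i.e. only the reset bit is cleared.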

Best Regards,
Rahul


-----Original Message----- 
From: Lukas Wunner <lukas@xxxxxxxxx> 
Sent: Tuesday, February 15, 2022 12:32 PM
To: Kumar1, Rahul <Rahul.Kumar1@xxxxxxx>
Cc: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; helgaas@xxxxxxxxxx; 
Antonovitch, Anatoli <Anatoli.Antonovitch@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: Question about deadlock between AER and pciehp interrupts during resume from S3 with unplugged device

On Fri, Feb 11, 2022 at 02:42:21PM +0000, Kumar1, Rahul wrote:
> We can see some changes in lspci between the working and non-working
> cases. Below are the relevant fields (working -> non-working):
> Link Speed: 8GT/s -> 2.5GT/s
> DLActive+ -> DLActive-
> BWMgmt+ -> BWMgmt+
> PresDet+ -> PresDet+
> EqualizationComplete+ -> EqualizationComplete+
> 
> Also when we do reset via sysfs, we don't see this issue.
> 
> I have created a bug here:
> https://bugzilla.kernel.org/show_bug.cgi?id=215590

So with the patches applied, the link doesn't come up after resume, but if you then reset via sysfs, it does come up, is that what you're saying?

The dmesg excerpt Andrey posted shows an AER splat after resume (even with the patches applied):

[   69.684921] pcieport 0000:00:01.1: AER: Root Port link has been reset
[   69.691438] pcieport 0000:00:01.1: AER: Device recovery failed
[   69.697327] pcieport 0000:00:01.1: AER: Multiple Uncorrected (Fatal) error received: 0000:00:01.0
[   69.706231] pcieport 0000:00:01.1: AER: can't find device of ID0008

I suspect the Root Port refuses to train the link due to that fatal error.  Perhaps Kai-Heng Feng's patch is incomplete and it needs to clear stale AER errors?  Or maybe it re-enables AER too early?
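On the stale-error idea: the AER Uncorrectable Error Status register is write-1-to-clear, so clearing latched errors means writing back the bits that were read. The sketch below is a deliberately simplified userspace model (assumption: enabling reporting while a status bit is still latched makes that stale error visible to the handler) that contrasts enabling AER before and after clearing. The two bit values match include/uapi/linux/pci_regs.h; struct mock_aer, write_uncor_status(), enable_reporting() and reenable_after_resume() are hypothetical and are not what Kai-Heng Feng's patch or the AER driver actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNC_DLP    0x00000010 /* Data Link Protocol error */
#define PCI_ERR_UNC_SURPDN 0x00000020 /* Surprise Down */

/* Hypothetical mock of a Root Port's AER Uncorrectable Error Status
 * register (RW1C) plus a reporting-enable flag. */
struct mock_aer {
	uint32_t uncor_status;
	bool reporting_enabled;
	unsigned int reported; /* errors that reached the mock handler */
};

/* RW1C write: each 1 bit clears that status bit; 0 bits leave it as is. */
static void write_uncor_status(struct mock_aer *a, uint32_t val)
{
	a->uncor_status &= ~val;
}

/* Simplified model (assumption): enabling reporting while any status
 * bit is still latched delivers that stale error to the handler. */
static void enable_reporting(struct mock_aer *a)
{
	a->reporting_enabled = true;
	if (a->uncor_status)
		a->reported++;
}

/* Clear whatever is latched (write back what was read), then enable. */
static void reenable_after_resume(struct mock_aer *a)
{
	write_uncor_status(a, a->uncor_status);
	enable_reporting(a);
}
```

The only point of the model is ordering: in reenable_after_resume() the latched bits are gone before reporting is turned back on, whereas calling enable_reporting() directly lets the stale Surprise Down through.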

Could you attach lspci -vv output before/after suspend to the bugzilla?
And also attach full dmesg output with the patches applied?

Thanks,

Lukas