On Tue, Oct 31, 2023 at 12:26:31PM +0000, Vidya Sagar wrote: > Hi folks, > > I would like to know your comments on the following scenario where > we are observing the root port logging errors because of the > enumeration flow being followed. > > DUT information: > - Has a root port and an endpoint connected to it > - Uses ECAM mechanism to access the configuration space > - Booted through ACPI flow > - Has a Firmware-First approach for handling the errors > - System is configured to treat Unsupported Requests as > AdvisoryNon-Fatal errors > > As we all know, when a configuration read request comes in for a > device number that is not implemented, a UR would be returned as per > the PCIe spec. > > As part of the enumeration flow on DUT, when the kernel reads offset > 0x0 of B:D:F=0:0:0, the root port responds with its valid Vendor-ID > and Device-ID values. But, when B:D:F=0:1:0 is probed, since there > is no device present there, the root port responds with an > Unsupported Request and simultaneously logs the same in the Device > Status register (i.e. bit-3). Because of it, there is a UR logged > in the Device Status register of the RP by the time enumeration is > complete. > > In the case of AER capability natively owned by the kernel, the AER > driver's init call would clear all such pending bits. > > Since we are going with the Firmware-First approach, and the system > is configured to treat Unsupported Requests as AdvisoryNon-Fatal > errors, only a correctable error interrupt can be raised to the > Firmware which takes care of clearing the corresponding status > registers. The firmware can't know about the UnsupReq bit being set > as the interrupt it received is for a correctable error hence it > clears only bits related to correctable error. > > All these events leave a freshly booted system with the following > bits set. > > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- (MAbort) > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend- (UnsupReq) > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol- (UnsupReq) > > Since the reason for UR is well understood at this point, I would > like to weigh in on the idea of clearing the aforementioned bits in > the root port once the enumeration is done particularly to cater to > the configurations where Firmware-First approach is in place. > Please let me know your comments on this approach. I think Secondary status (PCI_SEC_STATUS) is always owned by the OS and is not affected by _OSC negotiation, right? Linux does basically nothing with that today, but I think it *could* clear the "Received Master Abort" bit. I'm not very familiar with Advisory Non-Fatal errors. I'm curious about the UESta situation: why can't firmware know about UnsupReq being set? I assume PCI_ERR_COR_ADV_NFAT is the Correctable Error Status bit the firmware *does* see and clear. But isn't the whole point of Advisory Non-Fatal errors that an error that is logged as an Uncorrectable Error and that normally would be signaled with ERR_NONFATAL is signaled with ERR_COR instead? So doesn't PCI_ERR_COR_ADV_NFAT being set imply that some PCI_ERR_UNCOR_STATUS must be set as well? If so, I would think firmware *could* figure that out and clear the PCI_ERR_UNCOR_STATUS bit. Bjorn