Hello!

On Tuesday 05 October 2021 10:57:18 Jeremy Linton wrote:
> Hi,
>
> On 10/5/21 10:32 AM, Bjorn Helgaas wrote:
> > On Thu, Aug 26, 2021 at 02:15:55AM -0500, Jeremy Linton wrote:
> > > Additionally, some basic bus/device filtering exist to avoid sending
> > > config transactions to invalid devices on the RP's primary or
> > > secondary bus. A basic link check is also made to assure that
> > > something is operational on the secondary side before probing the
> > > remainder of the config space. If either of these constraints are
> > > violated and a config operation is lost in the ether because an EP
> > > doesn't respond an unrecoverable SERROR is raised.
> >
> > It's not "lost"; I assume the root port raises an error because it
> > can't send a transaction over a link that is down.
>
> The problem is AFAIK because the root port doesn't do that.

Interesting! Does it mean that the PCIe Root Complex / Host Bridge
(which I guess also contains the logic for the Root Port) does not
signal a transaction failure for config requests? Or is it just your
opinion? I'm asking because I'm dealing with similar issues and I'm
trying to find a way to detect whether a PCIe IP signals a transaction
error via an AXI SLVERR response or just does not send any response
back. So if you know some way to check which one it is, I would like
to know it too.

> > Is "SERROR" an ARM64 thing? My guess is the root port would raise an
> > Unsupported Request error or similar, and the root complex turns that
> > into a system-specific SERROR?

Yes, SError is arm64 specific. It is an asynchronous CPU interrupt, and
the syndrome code then describes what happened.

> AFAIK, what is happening here the CPU core has an outstanding R/W request
> for which it never receives a response from the root port. So basically its
> an interconnect protocol violation that the CPU is complaining about rather
> than something PCIe specific.

Could you describe (ideally in the commit message) which SError is
triggered?
Normally, when the kernel receives an SError interrupt, it also prints
the syndrome code in dmesg or in the oops message, and that code
describes what kind of error or event occurred. It could also help
others understand what is happening there.
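
For reference, here is a minimal sketch of how such a syndrome value
could be decoded by hand. It is not kernel code: the function name and
the returned dictionary are hypothetical, and the field layout (EC in
bits [31:26], IDS in bit 24, AET in bits [12:10], EA in bit 9, DFSC in
bits [5:0]) is my reading of the ESR_ELx SError encoding, so please
check it against the Arm ARM. AET is only meaningful on cores with the
RAS extension and only when DFSC is 0x11.

```python
# Hypothetical helper for decoding an arm64 SError syndrome (ESR_ELx) value.
# Field positions are assumptions taken from the architectural ESR_ELx layout.

EC_SERROR = 0x2F      # exception class for an SError interrupt
DFSC_SERROR = 0x11    # DFSC value meaning "asynchronous SError interrupt"

AET_NAMES = {
    0b000: "Uncontainable (UC)",
    0b001: "Unrecoverable state (UEU)",
    0b010: "Restartable state (UEO)",
    0b011: "Recoverable state (UER)",
    0b110: "Corrected (CE)",
}


def decode_serror_esr(esr):
    """Split an SError ESR value into its architected fields."""
    ec = (esr >> 26) & 0x3F
    if ec != EC_SERROR:
        raise ValueError("not an SError syndrome (EC=0x%02x)" % ec)

    ids = (esr >> 24) & 0x1
    result = {"ec": ec, "implementation_defined": bool(ids)}

    if ids:
        # With IDS set, bits [23:0] are implementation defined, so only
        # the core's own documentation can say what they mean.
        result["impdef_iss"] = esr & 0xFFFFFF
        return result

    dfsc = esr & 0x3F
    result["dfsc"] = dfsc
    result["ea"] = (esr >> 9) & 0x1   # external abort type, impdef meaning
    if dfsc == DFSC_SERROR:
        aet = (esr >> 10) & 0x7
        result["aet"] = AET_NAMES.get(aet, "reserved")
    return result


if __name__ == "__main__":
    # Example values made up for illustration, not taken from a real log.
    print(decode_serror_esr(0xBE000011))
    print(decode_serror_esr(0xBF000002))
```

If the IDS bit is set in the value from your platform, the rest of the
syndrome is implementation defined, and that alone would be useful to
mention in the commit message.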