Hi,
On 10/5/21 2:43 PM, Pali Rohár wrote:
Hello!
On Tuesday 05 October 2021 10:57:18 Jeremy Linton wrote:
Hi,
On 10/5/21 10:32 AM, Bjorn Helgaas wrote:
On Thu, Aug 26, 2021 at 02:15:55AM -0500, Jeremy Linton wrote:
Additionally, some basic bus/device filtering exists to avoid sending
config transactions to invalid devices on the RP's primary or
secondary bus. A basic link check is also made to assure that
something is operational on the secondary side before probing the
remainder of the config space. If either of these constraints is
violated and a config operation is lost in the ether because an EP
doesn't respond, an unrecoverable SERROR is raised.
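The filtering and link check described above could be sketched roughly as follows. This is only an illustration, not the code from the patch: struct rp_state, cfg_access_allowed() and the "device 0 only" rule on the primary and secondary bus are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of root-port state; a real driver would read
 * these values from the controller's registers. */
struct rp_state {
	uint8_t primary_bus;   /* bus the root port itself sits on */
	uint8_t secondary_bus; /* bus directly below the root port */
	bool link_up;          /* result of a link status check */
};

/* Return true only when a config access to bus/dev is safe to issue. */
static bool cfg_access_allowed(const struct rp_state *rp,
			       uint8_t bus, uint8_t dev)
{
	/* Assumption: only the root port (device 0) is on the primary bus. */
	if (bus == rp->primary_bus)
		return dev == 0;

	/* Nothing below the root port can answer while the link is down. */
	if (!rp->link_up)
		return false;

	/* Assumption: the link is point to point, so only device 0 is
	 * valid on the secondary bus. */
	if (bus == rp->secondary_bus)
		return dev == 0;

	/* Buses further downstream are left to the normal probe path. */
	return true;
}
```

A caller would check cfg_access_allowed() before touching the config window and, when it returns false, skip the hardware access and report the device as absent instead of issuing a transaction that may never complete.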
It's not "lost"; I assume the root port raises an error because it
can't send a transaction over a link that is down.
AFAIK, the problem is that the root port doesn't do that.
Interesting! Does that mean the PCIe Root Complex / Host Bridge (which I
guess also contains the Root Port logic) does not signal a transaction
failure for config requests? Or is that just your opinion? I'm dealing
with similar issues and am trying to find a way to detect whether a
given PCIe IP signals a transaction error via an AXI SLVERR response or
simply sends no response back at all. If you know of a way to check
which one it is, I would like to know it too.
This is my _opinion_ based on what I've heard of some other IP
integration issues, and what I've seen poking at this one from the
perspective of a SW guy rather than a HW guy. So, basically worthless.
But you should consider that most of these cores/interconnects aren't
aware of PCIe completion semantics, so it's the root port's responsibility
to, say, gracefully translate a non-posted write that doesn't have a
completion for the interconnects it's attached to, rather than tripping
something generic like a SLVERR.
Anyway, for this I would poke around the pile of exception registers,
with your specific processor's manual handy, because a lot of them are
implementation defined.
Is "SERROR" an ARM64 thing? My guess is the root port would raise an
Unsupported Request error or similar, and the root complex turns that
into a system-specific SERROR?
Yes, SError is arm64 specific. It is an asynchronous CPU interrupt, and
the syndrome code then describes what happened.
AFAIK, what is happening here is that the CPU core has an outstanding R/W
request for which it never receives a response from the root port. So
basically it's an interconnect protocol violation that the CPU is
complaining about rather than something PCIe specific.
Could you describe (ideally in the commit message) which SError is
triggered? Normally, when the kernel receives an SError interrupt, it
also prints the syndrome code, which describes what kind of error or
event occurred, to dmesg or the oops message. That could also help
others understand what is happening there.
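For reference, a minimal sketch of how such a syndrome code can be picked apart once it shows up in the log. The field positions follow my reading of the ESR_ELx layout in the Arm ARM (EC in bits [31:26], ISS in bits [24:0], IDS in bit 24 for SError), so please check them against the manual; struct serror_info and decode_esr() are names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT  26
#define ESR_EC_MASK   0x3Fu
#define ESR_EC_SERROR 0x2Fu       /* exception class: SError interrupt */
#define ESR_ISS_MASK  0x01FFFFFFu /* ISS occupies bits [24:0] */
#define ESR_ISS_IDS   (1u << 24)  /* set: ISS is implementation defined */

/* Hypothetical container for the decoded fields. */
struct serror_info {
	bool is_serror;    /* EC says this is an SError */
	bool impl_defined; /* ISS format is implementation defined */
	uint32_t iss;      /* raw instruction specific syndrome */
};

static struct serror_info decode_esr(uint64_t esr)
{
	struct serror_info info;
	uint32_t ec = (uint32_t)(esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

	info.is_serror = (ec == ESR_EC_SERROR);
	info.iss = (uint32_t)esr & ESR_ISS_MASK;
	/* The IDS bit only has this meaning for the SError class. */
	info.impl_defined = info.is_serror && (info.iss & ESR_ISS_IDS) != 0;
	return info;
}
```

If impl_defined comes back true, the remaining ISS bits can only be interpreted with the processor's own manual, which matches the point above about implementation defined registers.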