On 1/7/22 21:34, Bjorn Helgaas wrote:
On Fri, Jan 07, 2022 at 11:04:58AM +0100, Pali Rohár wrote:
Hello! You asked me in another email for comments to this email, so I'm
replying directly to this email...
On Tuesday 04 January 2022 10:02:18 Stefan Roese wrote:
Hi,
I'm trying to get the kernel PCIe AER infrastructure to work on my
ZynqMP-based system, i.e. to handle the error events (correctable,
uncorrectable, etc.). In my current tests, no AER interrupt is generated,
though. I'm
currently using the "surprise down error status" in the uncorrectable
error status register of the connected PCIe switch (PLX / Broadcom
PEX8718). The bit is correctly logged in the PEX switch's Uncorrectable
Error Status register, but no interrupt is delivered to the Root Port /
system, and hence no AER messages are reported.
I think the error should also be logged in the Root Port AER
Capability. And of course the interrupt enable bits in the Root Error
Command register would have to be set.
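For reference, the enable bits in question can be decoded from a raw Root Error Command value. The following is a minimal sketch, assuming the standard AER layout (Root Error Command at offset 0x2C of the AER extended capability, with bits 0-2 enabling interrupt generation for correctable, non-fatal, and fatal errors, as in linux/pci_regs.h). The helper name and the sample value are made up for illustration.

```python
# Bits of the Root Error Command register (AER capability + 0x2C).
# The bit values follow linux/pci_regs.h; the helper is hypothetical.
ROOT_CMD_BITS = {
    0x1: "Correctable Error Reporting Enable",
    0x2: "Non-Fatal Error Reporting Enable",
    0x4: "Fatal Error Reporting Enable",
}


def decode_root_error_command(value):
    """Return the names of the interrupt-enable bits set in `value`."""
    return [name for bit, name in ROOT_CMD_BITS.items() if value & bit]


if __name__ == "__main__":
    # Illustrative value only: all three enables set.
    for name in decode_root_error_command(0x7):
        print(name)
```

An empty result for the Root Port would mean that no error interrupt is enabled at all, regardless of how the interrupt is wired.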
I'm seeing no change at all in the Root Port PCIe device after the
surprise down on one of the PCIe switch downstream ports via
"lspci -vvv".
Does anyone have an idea what might be missing? Why are these events
not reported to the PCIe Root Port driver via an IRQ? Might this be
caused by the missing MSI-X support of the ZynqMP? The AER interrupt
is connected as a legacy IRQ:
cat /proc/interrupts | grep -i aer
58: 0 0 0 0 nwl_pcie:legacy 0 Level PCIe PME, aerdrv
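To watch whether this IRQ ever fires during a test, the per-CPU counters can be summed from /proc/interrupts. A minimal sketch, assuming the line format shown above (IRQ number, one counter per CPU, then the chip name and handlers); the function name and the embedded sample are illustrative only.

```python
SAMPLE = "58: 0 0 0 0 nwl_pcie:legacy 0 Level PCIe PME, aerdrv\n"


def aer_interrupt_counts(text):
    """Map IRQ number -> summed per-CPU count for lines mentioning aerdrv."""
    counts = {}
    for line in text.splitlines():
        if "aerdrv" not in line.lower():
            continue
        irq, _, rest = line.partition(":")
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # per-CPU counters end at the first non-numeric field
            total += int(field)
        counts[irq.strip()] = total
    return counts


if __name__ == "__main__":
    # On a live system, pass the contents of /proc/interrupts instead.
    print(aer_interrupt_counts(SAMPLE))
```

Since the line is shared with PCIe PME, a non-zero count would only show that the INTx line fired, not that AER was the source.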
I guess this means whatever INTx the Root Port is using is connected
to IRQ 58? Can you tell whether that INTx works if a device below the
Root Port uses it? Or whether it is asserted for PMEs?
INTx works just fine for "normal" legacy interrupts, e.g. a PCIe
driver requesting a non-MSI interrupt.
Error events (correctable, non-fatal and fatal) are reported by PCIe
devices to the Root Complex via PCIe error messages (the Message Code of
the TLP is set to Error Message), not via interrupts. The Root Port is
then responsible for "converting" these PCIe error messages into an
MSI(-X) interrupt and reporting it to the system. According to the PCIe
spec, AER is supported only via MSI(-X) interrupts, not legacy INTx.
Where does it say that? PCIe r5.0, sec 6.2.4.1.2 and 6.2.6, both
mention INTx, and the diagram in 6.2.6 even shows possible
platform-specific System Error signaling.
But I doubt Linux is smart enough to configure this correctly for
INTx. You could experiment by setting the AER control bits with
setpci.
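One way such an experiment might look is sketched below. It only builds the setpci command line and does not run it. Assumptions: the Root Port address 00:00.0 is a placeholder, the Root Error Command register sits at offset 0x2c of the AER capability, and setpci accepts the ECAP_AER capability name with its value:mask write syntax (check `setpci --dumpregs` on the target before relying on this).

```python
ROOT_ERR_CMD_OFFSET = 0x2C  # Root Error Command, relative to the AER capability
ALL_ENABLES = 0x7           # correctable | non-fatal | fatal reporting enables


def setpci_root_cmd(bdf, value=ALL_ENABLES, mask=ALL_ENABLES):
    """Build (not run) a setpci command writing `value` under `mask`."""
    # setpci takes offsets and values as bare hexadecimal numbers.
    return "setpci -s {} ECAP_AER+{:x}.l={:x}:{:x}".format(
        bdf, ROOT_ERR_CMD_OFFSET, value, mask
    )


if __name__ == "__main__":
    # "00:00.0" is a hypothetical Root Port address.
    print(setpci_root_cmd("00:00.0"))
```

Using the value:mask form leaves the other bits of the register untouched.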
There was some previous discussion, and it even mentions ZynqMP as a
device that has a dedicated non-MSI mechanism for AER signaling:
https://lore.kernel.org/linux-pci/1533141889-19962-1-git-send-email-bharat.kumar.gogada@xxxxxxxxxx/
https://lore.kernel.org/all/1464242406-20203-1-git-send-email-po.liu@xxxxxxx/T/#u
But I don't think it went anywhere.
It seems like maybe this *could* be made to work.
Thanks, Bjorn, for the reference. As already mentioned to Pali in the
other mail, Bharat from Xilinx has meanwhile sent me a link to a newer,
updated patch series that uses these "misc" interrupts for AER:
https://lore.kernel.org/lkml/1542206878-24587-1-git-send-email-bharat.kumar.gogada@xxxxxxxxxx/
AFAICT, this patch series was not really reviewed. At least I can't find
any comments / replies.
I have now applied this series (after some merge issues) to v5.16 and
re-tested with these new MISC interrupts for AER. Still no cigar: no
interrupt / AER event is received upon surprise down on the PEX switch.
I might have missed something in the setup / configuration, though. Is
my understanding correct that I don't need to "manually" tune SERR in
the Command register? And is my understanding correct that '0' / '-'
in the AER mask register enables this AER event?
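On the mask question: in the AER register layout as I understand it, a set bit in the Uncorrectable Error Mask register suppresses reporting of that error, so a cleared bit leaves it enabled. A minimal sketch of that check, assuming bit 5 is Surprise Down Error (PCI_ERR_UNC_SURPDN in linux/pci_regs.h); the helper name and the sample values are hypothetical.

```python
SURPRISE_DOWN = 1 << 5  # Surprise Down Error bit in the uncorrectable registers


def is_reported(status, mask, bit=SURPRISE_DOWN):
    """True if `bit` is logged in `status` and not suppressed by `mask`."""
    return bool(status & bit) and not (mask & bit)


if __name__ == "__main__":
    # Illustrative register values, not read from real hardware.
    print(is_reported(status=0x20, mask=0x00))  # logged and unmasked
    print(is_reported(status=0x20, mask=0x20))  # logged but masked
```

This models only the AER mask. Whether an error message is actually sent upstream also depends on the reporting enables outside the AER capability, which the sketch does not cover.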
Thanks,
Stefan