pcie-xilinx-nwl: Uncorrectable errors upon PCIe surprise removal

Hi,

On our ZynqMP platform we are seeing uncorrectable errors when we try
to access the BAR of a PCIe device (an NVMe drive) after it has been
removed without warning (surprise removal):

[  255.743801] nwl-pcie fd0e0000.pcie: Slave error
[  255.745210] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
[  255.750714] nwl-pcie fd0e0000.pcie: Non-Fatal Error Detected
[  255.752523] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
[  255.753840] nwl-pcie fd0e0000.pcie: Non-Fatal Error Detected
[  255.755174] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
[  255.756706] nwl-pcie fd0e0000.pcie: Non-Fatal Error Detected
[  255.758168] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
...

Sometimes this is even accompanied (or started) by a kernel crash:

Internal error: synchronous external abort: 96000210 [#1] SMP

It seems that the "Slave error" (bit 4) can be cleared in
nwl_pcie_misc_handler(), but the two "Non-Fatal" errors cannot.
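
To illustrate what "clearing" a status bit in the handler means, here is a
minimal user-space sketch. It assumes the status register is
write-1-to-clear, and it simulates the register in memory. The names
MISC_SR_SLAVE_ERR, fake_w1c_reg and ack_pending() are made up for this
example and are not taken from the driver:

```c
#include <stdint.h>

/* Bit 4 is the "Slave error" bit mentioned above; the macro name is hypothetical. */
#define MISC_SR_SLAVE_ERR (1u << 4)

/* In-memory stand-in for a status register (assumed write-1-to-clear). */
struct fake_w1c_reg {
	uint32_t val;
};

static uint32_t fake_read(const struct fake_w1c_reg *r)
{
	return r->val;
}

/* Write-1-to-clear semantics: every bit set in v is cleared in the register. */
static void fake_write(struct fake_w1c_reg *r, uint32_t v)
{
	r->val &= ~v;
}

/* Acknowledge the pending bits selected by mask; returns the bits that were pending. */
static uint32_t ack_pending(struct fake_w1c_reg *r, uint32_t mask)
{
	uint32_t pending = fake_read(r) & mask;

	if (pending)
		fake_write(r, pending);
	return pending;
}
```

If the hardware keeps re-asserting a bit (as the two Non-Fatal errors seem
to do here), an acknowledge like this would only clear it until the next
event sets it again.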

I'm wondering whether this situation can be resolved somehow, so that
the system "survives" such surprise removals without a crash. What we
would really like to see is that reads from the unavailable PCI space
(BAR area) return 0xffffffff, as is common for PCI.
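
For reference, the behaviour we are hoping for would let a driver detect
the removal with a simple all-ones check. This is only a user-space sketch
of that idea: fake_bar_read() is a hypothetical stand-in for an MMIO read,
and read_looks_like_removed_device() is a made-up helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a 32-bit read of all ones is what a read from a
 * missing PCI device conventionally returns. A present device may also
 * legitimately return this value, so it is only a hint.
 */
static bool read_looks_like_removed_device(uint32_t val)
{
	return val == UINT32_MAX;
}

/*
 * Hypothetical stand-in for an MMIO read of a BAR register. A present
 * device returns some register content (arbitrary value here); a removed
 * device is modelled as returning all ones.
 */
static uint32_t fake_bar_read(bool device_present)
{
	return device_present ? 0x00010203u : 0xffffffffu;
}

/* Returns true if the (simulated) device still appears to be there. */
static bool device_seems_present(bool device_present)
{
	uint32_t val = fake_bar_read(device_present);

	return !read_looks_like_removed_device(val);
}
```

With this behaviour a driver could stop touching the device and clean up,
instead of the access ending in an external abort.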

So is this a known issue that accesses to BAR ranges of removed PCIe
devices result in such errors? If yes, why is this the case? Is there
perhaps a way to fully clear the error condition?

Thanks,
Stefan


