On Fri, Dec 22, 2017 at 01:02:28PM +0000, Bharat Kumar Gogada wrote:
> Bjorn wrote:
>> In the PCI config access path, the *_pcie_valid_device() functions in
>> the dwc, altera, rockchip, and xilinx drivers all check whether the
>> link is up.
>>
>> I think this is racy because the link may go down after we check but
>> before we perform the config access.
>>
>> What would blow up if we removed the *_pcie_link_up() checks?
>>
>> I'd like to either remove the checks or add comments about why the
>> race is acceptable. If we've covered this before, I apologize.
>> Adding a comment will keep me from pestering you about this again in
>> the future.

> In both Xilinx driver cases when link is down, hardware responds by
> AXI DECERR/SLVERR status which causes an exception, synchronous
> external abort to CPU. This causes system to hang, so we need this
> check for both of our drivers. We will add comments.

This is a problem, and checking whether the link is up is a workaround
but not a real solution. That means your system may hang if the link
happens to go down at the wrong time.

A real solution would be to handle the synchronous external abort so
it doesn't cause a system hang.

Yes, I agree that this is a workaround.

For pcie-xilinx.c on arm32, we can add fault handling similar to
"imx6q_pcie_abort_handler" in drivers/pci/dwc/pci-imx6.c. Since the
same driver is also used on the Microblaze architecture, that
architecture requires separate handling.

For pcie-xilinx-nwl.c on ARM64, as per link [1], the Linux kernel will
hang on the above AXI responses. As of now, arm64 RAS support is still
a work in progress [2].

[1] https://www.spinics.net/lists/arm-kernel/msg624203.html
[2] https://patchwork.kernel.org/patch/9973967/

The check can be removed once the above issues are addressed.

Regards,
Bharat
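
To make the race discussed above concrete, here is a minimal user-space
sketch in C. It is not driver code: struct fake_port, fake_link_up(),
fake_hw_read(), fake_config_read() and the drop_link_after_check flag
are all hypothetical names invented for illustration, and the "abort"
is only modelled as a return value rather than a real synchronous
external abort. It shows only that a link-up check in front of a config
access cannot cover a link drop that happens between the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a host bridge; not a real kernel structure. */
struct fake_port {
	bool link_up;               /* what a *_pcie_link_up() helper would report */
	uint32_t cfg_value;         /* value returned by a successful config read */
	bool drop_link_after_check; /* simulate the link dropping in the race window */
};

enum cfg_result {
	CFG_OK,        /* access completed normally */
	CFG_NO_DEVICE, /* link-up check failed, access skipped */
	CFG_ABORT      /* models DECERR/SLVERR leading to an external abort */
};

static bool fake_link_up(const struct fake_port *p)
{
	return p->link_up;
}

/* Models the hardware access: with the link down, the read faults. */
static enum cfg_result fake_hw_read(const struct fake_port *p, uint32_t *val)
{
	if (!p->link_up)
		return CFG_ABORT;
	*val = p->cfg_value;
	return CFG_OK;
}

/* Check-then-access pattern, as in the *_pcie_valid_device() helpers. */
static enum cfg_result fake_config_read(struct fake_port *p, uint32_t *val)
{
	assert(p != NULL && val != NULL);

	*val = 0xffffffff; /* all-ones is what a failed config read returns */

	if (!fake_link_up(p))
		return CFG_NO_DEVICE;

	/* Race window: the link may go down after the check above. */
	if (p->drop_link_after_check)
		p->link_up = false;

	return fake_hw_read(p, val);
}
```

With link_up set and drop_link_after_check clear, fake_config_read()
returns CFG_OK; with link_up clear it returns CFG_NO_DEVICE without
touching the "hardware". Setting drop_link_after_check makes the check
pass and the access still fault, which is the case the check cannot
prevent and an abort handler would have to cover.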