On Fri, Dec 22, 2017 at 11:28:15AM -0600, Bjorn Helgaas wrote: > On Fri, Dec 22, 2017 at 01:02:28PM +0000, Bharat Kumar Gogada wrote: > > Bjorn wrote: > >> In the PCI config access path, the *_pcie_valid_device() functions > >> in the dwc, altera, rockchip, and xilinx drivers all check whether > >> the link is up. > >> > >> I think this is racy because the link may go down after we check but > >> before we perform the config access. > >> > >> What would blow up if we removed the *_pcie_link_up() checks? > >> > >> I'd like to either remove the checks or add comments about why the > >> race is acceptable. If we've covered this before, I apologize. > >> Adding a comment will keep me from pestering you about this again in > >> the future. > > > In both Xilinx driver cases when link is down, hardware responds by > > AXI DECERR/SLVERR status which causes an exception, synchronous > > external abort to CPU. This causes system to hang, so we need this > > check for both of our drivers. We will add comments. > > This is a problem, and checking whether the link is up is a workaround > but not a real solution. That means your system may hang if the link > happens to go down at the wrong time. > > A real solution would be to handle the synchronous external abort > so it doesn't cause a system hang. I still have no idea why checking (in a broken way) that the link is up for _every_ given config access is a solution. If, say, the link is down in xilinx_pcie_init_port(), what would bring it up or, worded differently, why checking that the link is up for every config access instead of host bridge probe time make this code any safer ? It is not safe anyway, it would be good to understand though why the code was written this way so that we can change it appropriately. Thanks, Lorenzo