[+cc Jingoo, Gustavo (dwc maintainers), Ley (altera), Michal (xilinx)] On Fri, Apr 12, 2019 at 12:30:22PM +0530, Manikanta Maddireddy wrote: > On 12-Apr-19 1:45 AM, Bjorn Helgaas wrote: > > On Thu, Apr 11, 2019 at 10:33:47PM +0530, Manikanta Maddireddy wrote: > >> Add PCIe link up check in config read and write callback functions > >> before accessing endpoint config registers. > >> static int tegra_pcie_config_read(struct pci_bus *bus, unsigned int devfn, > >> int where, int size, u32 *value) > >> { > >> + struct tegra_pcie *pcie = bus->sysdata; > >> + struct pci_dev *bridge; > >> + struct tegra_pcie_port *port; > >> + > >> if (bus->number == 0) > >> return pci_generic_config_read32(bus, devfn, where, size, > >> value); > >> > >> + bridge = pcie_find_root_port(bus->self); > >> + > >> + list_for_each_entry(port, &pcie->ports, list) > >> + if (port->index + 1 == PCI_SLOT(bridge->devfn)) > >> + break; > >> + > >> + /* If there is no link, then there is no device */ > >> + if (!tegra_pcie_link_status(port)) { > > > > This is racy and you should avoid it if possible. The link could go down > > between calling tegra_pcie_link_status() and issuing the config read/write. > > > > If your driver is to be reliable, it must be able to handle any bad > > consequence of issuing that config read/write anyway, so I think it's > > better if it doesn't even bother checking whether the link is up. > > This change is made based on similar check present in dwc driver > dw_pcie_valid_device(), reasons for making this change in Tegra might > differ dwc. Yes, you won't be surprised to learn that I don't like the similar checks in dwc, altera, xilinx, and xilinx-nwl either :) I raise this issue every time I see it, but I can't remember if I've mentioned dwc specifically. We need to either eradicate this pattern of checking for link up, or include a comment about why it is absolutely necessary. > Intention here is to reduce the number of AER errors when device is > falling off the bus or going through hot reset. So racy condition here is > OK I'm not convinced about this. The issues you mention need to be solved in a generic way, not a tegra-specific way. We don't want to end up with code that silently avoids the config access 99.99% of the time, but once in a blue moon, we lose the race (the device stops responding after we've determined the link is up) and the access causes a mysterious AER error that we have no way to debug. > >> + *value = 0xffffffff; > >> + return PCIBIOS_DEVICE_NOT_FOUND; > >> + } > >> + > >> return pci_generic_config_read(bus, devfn, where, size, value); > >> }