On Mon, Apr 15, 2019 at 05:06:10PM +0530, Manikanta Maddireddy wrote: > On 12-Apr-19 8:20 PM, Bjorn Helgaas wrote: > > On Fri, Apr 12, 2019 at 12:30:22PM +0530, Manikanta Maddireddy wrote: > >> On 12-Apr-19 1:45 AM, Bjorn Helgaas wrote: > >>> On Thu, Apr 11, 2019 at 10:33:47PM +0530, Manikanta Maddireddy wrote: > >>>> Add PCIe link up check in config read and write callback functions > >>>> before accessing endpoint config registers. > >>>> static int tegra_pcie_config_read(struct pci_bus *bus, unsigned int devfn, > >>>> int where, int size, u32 *value) > >>>> { > >>>> + struct tegra_pcie *pcie = bus->sysdata; > >>>> + struct pci_dev *bridge; > >>>> + struct tegra_pcie_port *port; > >>>> + > >>>> if (bus->number == 0) > >>>> return pci_generic_config_read32(bus, devfn, where, size, > >>>> value); > >>>> > >>>> + bridge = pcie_find_root_port(bus->self); > >>>> + > >>>> + list_for_each_entry(port, &pcie->ports, list) > >>>> + if (port->index + 1 == PCI_SLOT(bridge->devfn)) > >>>> + break; > >>>> + > >>>> + /* If there is no link, then there is no device */ > >>>> + if (!tegra_pcie_link_status(port)) { > >>> This is racy and you should avoid it if possible. The link could go down > >>> between calling tegra_pcie_link_status() and issuing the config read/write. > >>> > >>> If your driver is to be reliable, it must be able to handle any bad > >>> consequence of issuing that config read/write anyway, so I think it's > >>> better if it doesn't even bother checking whether the link is up. > >> This change is made based on similar check present in dwc driver > >> dw_pcie_valid_device(), reasons for making this change in Tegra might > >> differ dwc. > > Yes, you won't be surprised to learn that I don't like the similar > > checks in dwc, altera, xilinx, and xilinx-nwl either :) I raise this > > issue every time I see it, but I can't remember if I've mentioned dwc > > specifically. > > > > We need to either eradicate this pattern of checking for link up, or > > include a comment about why it is absolutely necessary. > > This patch is created to address below scenario in our downstream kernel, > 1) Our platform has WiFi on one slot and GPU in another. > 2) During WiFi OFF, link is put in L2 and it goes through hot reset > when turning ON WiFi (since Tegra doesn't support hot-plug). > 3) Whenever x11 server is started it scans the PCIe bus for video devices. > Here PCIe configuration registers of all devices are read to find out > all available video devices. > 4) If "x11 server" started with WiFi OFF, then we are seeing "response > decoding error"(Tegra AFI module specific error). Probably happens with lspci too. I guess this is when you try to read the WiFi device config space? > Best solution we came up with is to have link up check in config access > callback functions. Can you check for "response decoding error" in the config accessor and return 0xffffffff if you see it? > >> Intention here is to reduce the number of AER errors when device is > >> falling off the bus or going through hot reset. So racy condition here is > >> OK > > > > I'm not convinced about this. The issues you mention need to be > > solved in a generic way, not a tegra-specific way. > > > > We don't want to end up with code that silently avoids the config > > access 99.99% of the time, but once in a blue moon, we lose the race > > (the device stops responding after we've determined the link is up) > > and the access causes a mysterious AER error that we have no way to > > debug. > > > >>>> + *value = 0xffffffff; > >>>> + return PCIBIOS_DEVICE_NOT_FOUND; > >>>> + } > >>>> + > >>>> return pci_generic_config_read(bus, devfn, where, size, value); > >>>> }