Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 21, 2018 at 04:25:08PM +0530, George Cherian wrote:
> On 02/21/2018 03:24 PM, Lukas Wunner wrote:
> > On Wed, Feb 21, 2018 at 02:58:13PM +0530, George Cherian wrote:
> > > I will explain the setup used
> > > To the Cavium ThunderX RC the following PLX device is connected.
> > > PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s)
> > > Switch
> > > There is no device connected downstream to the PLX switch.
> > > 
> > > AFAIU the pcie_port driver probes PLX and enters autosuspend after 100ms
> > > since pci_bridge_d3_possible() returns true.
> > > 
> > > And later pci_sysfs_init() ends up doing a config access of PLX which fails
> > > with a "synchronous external abort"

Thanks for the details!

This one *should* be fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=bf6c089ee2ac67eb22c0ff0ac9cc7f9ccd619d90

Any chance you could try that out?

> > Then you're missing a pci_config_pm_runtime_get() in pci_sysfs_init() or
> > further down in the call stack, rather than a quirk which just papers
> > over the issue.
> 
> I have found another configuration where this fails.
> Following is the configuration
> 1) Connected a PCIe Intel i40 card under the root port.
> 2) unbind the i40 driver and bind with vfio-pci driver.
> 3) Run lspci in a loop. "lspci -s xx:xx.xx -vvv"
> 
> I get the same synchronous external abort.
> In this case the vfio-pci driver probe it moves the device (i40) to
> D3hot provided disable_idle_d3 is not set. lspci tries to do
> the config_access which fails with synchronous external abort when
> the root port transitions to D3hot.

This one sounds like we're missing something in this path:

  pci_read_config
    pci_config_pm_runtime_get
      if (parent)
        pm_runtime_get_sync
          __pm_runtime_resume(dev, RPM_GET_PUT)
            rpm_resume

It *looks* like rpm_resume() should resume parent devices, i.e., the
root port, but I don't know that code at all.  Maybe Rafael or Lukas
could confirm that?

pci_config_pm_runtime_get() knows that config space is always
accessible unless the device is in D3cold, so if the target device is
in D3hot, it will leave it there.  I assume that if/when rpm_resume()
resumes the parent bridges, it will resume them all the way to D0.

I'm *really* glad you're finding these issues, because on most
platforms we would just silently read invalid data (all ones) and the
caller would have no idea what's going wrong.

Bjorn



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux