Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

George Cherian <gcherian@xxxxxxxxxxxxxxxxxx> · Wed, 21 Feb 2018 14:58:13 +0530

Hi Bjorn,

On 02/21/2018 12:30 AM, Bjorn Helgaas wrote:
[+cc Huang]

On Tue, Feb 20, 2018 at 02:54:33AM +0100, Lukas Wunner wrote:
On Mon, Feb 19, 2018 at 12:21:56PM +0100, Rafael J. Wysocki wrote:
On Friday, February 16, 2018 9:34:34 PM CET Bjorn Helgaas wrote:
On Fri, Feb 16, 2018 at 01:40:37PM +0100, Rafael J. Wysocki wrote:
On Friday, February 16, 2018 12:39:00 AM CET Bjorn Helgaas wrote:
On Thu, Feb 15, 2018 at 10:57:25PM +0100, Rafael J. Wysocki wrote:
On Wednesday, February 14, 2018 9:16:53 PM CET Bjorn Helgaas wrote:
I don't know how this runtime PM works, but maybe Rafael can help
us out.

This has nothing to do with runtime PM AFAICS.

The device seems to be in D3hot on boot, is that correct?

I actually don't know for sure what device caused the issue or what
state it was in.  George, if you could clarify that, it would be
helpful.

I will explain the setup used
To the Cavium ThunderX RC the following PLX device is connected.
PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 
GT/s) Switch
There is no device connected downstream to the PLX switch.

AFAIU the pcie_port driver probes PLX and enters autosuspend after 100ms 
since pci_bridge_d3_possible() returns true.

And later pci_sysfs_init() ends up doing a config access of PLX which 
fails with a "synchronous external abort"

My assumption (possibly incorrect) is that the issue happens when we
read config space of an endpoint behind a PCIe bridge that is in D3.
The PCIe portdrv binds to the bridge before the pci_sysfs_init()
late_initcall happens, and it enables runtime PM on the bridge, so
it's possible the bridge is in D3 before pci_sysfs_init() happens.

The PCI core assumes that unbound devices remain in D0
(see comments in pci_pm_runtime_resume() / pci_pm_runtime_suspend()).

That comment was added by 967577b06241 ("PCI/PM: Keep runtime PM
enabled for unbound PCI devices"), apparently because "some devices do
not work again after being put into D3".  That commit mentions
https://bugzilla.kernel.org/show_bug.cgi?id=48201 but I don't think
it's completely convincing about the requirement for keeping things in
D0.

Even if it's true that some devices don't work after being in D3, that
doesn't seem like sufficient reason to keep *all* devices on all
systems in D0.  We would normally use quirks to address device issues
like this.

I'm not sure what the question is to be honest.

My questions are basically "What does the PCI core need to do to make
sure a device is in D0 before it operates on it?  And where do we need
to do that?"

When scanning the bus and discovering the device is not in D0,
call pci_power_up().  This could probably go into pci_init_pm().
Once a driver binds to it, it may choose to runtime suspend it to
D3hot again.

s/pci_init_pm/pci_pm_init/

That's an interesting idea.  I hadn't thought of the case where a
device is not in D0 when we enumerate it.  We *do* make sure bridges
are in D0 before scanning behind them (pci_scan_bridge_extend() calls
pm_runtime_get_sync()).

Since enumeration should only perform config accesses, and devices
must respond to config accesses even when in D3hot, I guess I'm not
convinced that we need to power up everything (other than bridges)
during enumeration.

Bjorn