Re: [PATCH] PCI: Add CRS timeout for pci_device_is_present()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/6/2019 10:11 PM, Bjorn Helgaas wrote:
On Tue, Nov 05, 2019 at 11:55:45AM +0100, Rafael J. Wysocki wrote:
On Monday, November 4, 2019 6:39:04 PM CET Bjorn Helgaas wrote:
[+cc Andrew, Lukas]

On Tue, Oct 15, 2019 at 05:44:47PM +0530, Vidya Sagar wrote:
On 10/15/2019 4:40 PM, Sinan Kaya wrote:
...
I think the PCI core should be putting the device back D0 state as one
of the first actions before enumerating. Wake up could be a combination
of ACPI and/or PCI wake up depending on where your device sits in the
topology.

Yup. It is indeed doing it as part of pci_power_up() in pci.c file.
But, what is confusing to me is the order of the calls.
pci_power_up() has following calls in the same order.
	pci_raw_set_power_state(dev, PCI_D0);
	pci_update_current_state(dev, PCI_D0);
But, pci_raw_set_power_state() is accessing config space without calling
pci_device_is_present() whereas pci_update_current_state() which is called
later in the flow is calling pci_device_is_present()...!

A device should always respond to config reads unless it is in D3cold
or it is initializing after a reset.  IIUC you're doing a resume, not
a reset, so I think your device must be coming out of D3cold.  That's
typically done via ACPI, and I think we are missing some kind of delay
before our first config access:

   pci_power_up
     platform_pci_set_power_state(PCI_D0)    # eg, ACPI
     pci_raw_set_power_state
       pci_read_config_word(PCI_PM_CTRL)     # <-- first config access
       pci_write_config_word(PCI_PM_CTRL)
       pci_read_config_word(PCI_PM_CTRL)
     pci_update_current_state
       if (... || !pci_device_is_present())

Mika is working on some delays for the transition out of D3cold [1].
He's more concerned with a secondary bus behind a bridge, so I don't
think his patch addresses this case, but he's certainly familiar with
this area.

Huh, I'm really confused about this, too.  I don't
understand how resume ever works without any delay between
platform_pci_power_manageable() and the config reads in
pci_raw_set_power_state().  I must be missing something.

There is a delay in the runtime_d3cold case, see
__pci_start_power_transition().

I see the delay in __pci_start_power_transition(), but I don't see how
it's relevant.  It's only called by pci_set_power_state(), and while
many drivers call pci_set_power_state() from legacy .resume() methods,
the pci_pm_resume_noirq() path where Vidya is seeing problems doesn't
use it.

But overall platform_pci_power_manageable() only checks whether or
not the platform firmware can change the power state of the device.
If it can, it is expected to take care of any necessary delays while
doing that (because there may be delays required by this particular
instance of the platform firmware, beyond what is mandated by the
PCI spec, or there may not be any need to wait at all). ...

That sounds like a reasonable argument for why firmware should be
responsible for this delay, but I don't think that's very clear in the
ACPI spec, so I wouldn't be surprised if it got missed.

Based on Vidya's backtrace, I think the resume path with problems is
this:

   pci_pm_resume_noirq
     pci_pm_default_resume_early
       pci_power_up
         if (platform_pci_power_manageable(dev))
           platform_pci_set_power_state(dev, PCI_D0)  # <-- FW delay here?
         pci_raw_set_power_state
         pci_update_current_state
           pci_device_is_present        # <-- config read returns CRS

So I think your suggestion is that Vidya's firmware should be doing
the delay inside platform_pci_set_power_state()?

Vidya, you typically work on Tegra, so I assume this is on an arm64
system?  Does it have ACPI?  Do you have access to the firmware
developers to ask about who they expect to do the delays?
Yes. This is on arm64 (Tegra) and we don't have any ACPI or any other firmware
for that matter. PCIe is brought up directly in the kernel.


In any case, I'm not sure how useful it is to add delays for
everyone in the cases in which a specific system needs a delay
because of its own PM implementation limitations.  It may be better
to quirk such systems explicitly as long as there are not too many
quirks in there, or we'll end up adding more and more *implicit*
quirks in the form of general delays.

I agree, a general delay doesn't sound good.  Are you thinking
something like this?

   void pci_power_up(struct pci_dev *dev)
   {
     if (platform_pci_power_manageable(dev)) {
       platform_pci_set_power_state(dev, PCI_D0);
       if (dev->XXX)
         msleep(dev->XXX);
     }
     ...

We already have dev->d3_delay and d3cold_delay, so it's getting a bit
messy to keep them all straight.

Bjorn





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux