Re: [PATCH] PCI: Add CRS timeout for pci_device_is_present()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 05, 2019 at 11:55:45AM +0100, Rafael J. Wysocki wrote:
> On Monday, November 4, 2019 6:39:04 PM CET Bjorn Helgaas wrote:
> > [+cc Andrew, Lukas]
> > 
> > On Tue, Oct 15, 2019 at 05:44:47PM +0530, Vidya Sagar wrote:
> > > On 10/15/2019 4:40 PM, Sinan Kaya wrote:
> > > > ...
> > > > I think the PCI core should be putting the device back D0 state as one
> > > > of the first actions before enumerating. Wake up could be a combination
> > > > of ACPI and/or PCI wake up depending on where your device sits in the
> > > > topology.
> > >
> > > Yup. It is indeed doing it as part of pci_power_up() in pci.c file.
> > > But, what is confusing to me is the order of the calls.
> > > pci_power_up() has following calls in the same order.
> > > 	pci_raw_set_power_state(dev, PCI_D0);
> > > 	pci_update_current_state(dev, PCI_D0);
> > > But, pci_raw_set_power_state() is accessing config space without calling
> > > pci_device_is_present() whereas pci_update_current_state() which is called
> > > later in the flow is calling pci_device_is_present()...!
> > 
> > A device should always respond to config reads unless it is in D3cold
> > or it is initializing after a reset.  IIUC you're doing a resume, not
> > a reset, so I think your device must be coming out of D3cold.  That's
> > typically done via ACPI, and I think we are missing some kind of delay
> > before our first config access:
> > 
> >   pci_power_up
> >     platform_pci_set_power_state(PCI_D0)    # eg, ACPI
> >     pci_raw_set_power_state
> >       pci_read_config_word(PCI_PM_CTRL)     # <-- first config access
> >       pci_write_config_word(PCI_PM_CTRL)
> >       pci_read_config_word(PCI_PM_CTRL)
> >     pci_update_current_state
> >       if (... || !pci_device_is_present())
> > 
> > Mika is working on some delays for the transition out of D3cold [1].
> > He's more concerned with a secondary bus behind a bridge, so I don't
> > think his patch addresses this case, but he's certainly familiar with
> > this area.
> > 
> > Huh, I'm really confused about this, too.  I don't
> > understand how resume ever works without any delay between
> > platform_pci_power_manageable() and the config reads in
> > pci_raw_set_power_state().  I must be missing something.
> 
> There is a delay in the runtime_d3cold case, see
> __pci_start_power_transition().

I see the delay in __pci_start_power_transition(), but I don't see how
it's relevant.  It's only called by pci_set_power_state(), and while
many drivers call pci_set_power_state() from legacy .resume() methods,
the pci_pm_resume_noirq() path where Vidya is seeing problems doesn't
use it.

> But overall platform_pci_power_manageable() only checks whether or
> not the platform firmware can change the power state of the device.
> If it can, it is expected to take care of any necessary delays while
> doing that (because there may be delays required by this particular
> instance of the platform firmware, beyond what is mandated by the
> PCI spec, or there may not be any need to wait at all). ...

That sounds like a reasonable argument for why firmware should be
responsible for this delay, but I don't think that's very clear in the
ACPI spec, so I wouldn't be surprised if it got missed.

Based on Vidya's backtrace, I think the resume path with problems is
this:

  pci_pm_resume_noirq
    pci_pm_default_resume_early
      pci_power_up
        if (platform_pci_power_manageable(dev))
          platform_pci_set_power_state(dev, PCI_D0)  # <-- FW delay here?
        pci_raw_set_power_state
        pci_update_current_state
          pci_device_is_present        # <-- config read returns CRS

So I think your suggestion is that Vidya's firmware should be doing
the delay inside platform_pci_set_power_state()?

Vidya, you typically work on Tegra, so I assume this is on an arm64
system?  Does it have ACPI?  Do you have access to the firmware
developers to ask about who they expect to do the delays?

> In any case, I'm not sure how useful it is to add delays for
> everyone in the cases in which a specific system needs a delay
> because of its own PM implementation limitations.  It may be better
> to quirk such systems explicitly as long as there are not too many
> quirks in there, or we'll end up adding more and more *implicit*
> quirks in the form of general delays.

I agree, a general delay doesn't sound good.  Are you thinking
something like this?

  void pci_power_up(struct pci_dev *dev)
  {
    if (platform_pci_power_manageable(dev)) {
      platform_pci_set_power_state(dev, PCI_D0);
      if (dev->XXX)
        msleep(dev->XXX);
    }
    ...

We already have dev->d3_delay and d3cold_delay, so it's getting a bit
messy to keep them all straight.

Bjorn



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux