Re: [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 12, 2023 at 09:06:15PM -0600, Bjorn Helgaas wrote:
> On Thu, Jan 12, 2023 at 8:39 PM Yang Su <yang.su@xxxxxxxxxxxxxxxxx> wrote:
> > I also test pci_bridge_wait_for_secondary_bus in NVIDIA GPU T4
> > which bind vfio passthrough in VMM, I found the
> > pci_bridge_wait_for_secondary_bus can not wait the enough time
> > as pci spec requires, the reasons are described as below.
[...]
> I'll wait for you and Lukas to continue this conversation on the
> mailing list.

Yang Su reports that pci_bridge_secondary_bus_reset() is called for
an Nvidia T4 GPU.  That's an endpoint device but the function should
only ever be called for bridges.  It's unclear to me how that can happen.

The call stack looks like this:

vfio_pci_open()
  pci_set_power_state()
  pci_clear_master()
  pci_enable_device()
  pci_try_reset_function()
    pci_dev_trylock()
    pci_dev_save_and_disable()
    pci_dev_specific_reset()
    pcie_has_flr()
    pcie_af_flr()
    pci_read_config_word()
    pci_dev_reset_slot_function()
    pci_parent_bus_reset()
      pci_bridge_secondary_bus_reset()
        pcibios_reset_secondary_bus()
	  pci_reset_secondary_bus()
	    pci_read_config_word()
	    pci_write_config_word()
	    pci_write_config_word()
	    pcie_wait_for_link_delay()
	pci_dev_wait()
    pci_dev_restore()
    pci_cfg_access_unlock()
  pci_save_state()
  pci_store_saved_state()
  vfio_config_init()

Note that vfio_pci_open() was renamed to vfio_pci_open_device() with
commit 2cd8b14aaa66, which went into v5.15.  So apparently this call
stack is from an earlier kernel.

The GPU is located below a PEX9797 switch, which is connected to a
SkyLake-E Root Port.  So pci_bridge_secondary_bus_reset() should be
called for the Switch Downstream Port but Yang Su says it's called
for the GPU endpoint device.

pci_parent_bus_reset() finds the parent by following dev->bus->self.
I've suggested to Yang Su to replace that with pci_upstream_bridge(dev)
and see if it fixes the issue.  It does make a difference in SR-IOV
scenarios (see the comment in include/linux/pci.h above pci_is_root_bus())
though Yang Su reports that SR-IOV is not used on this machine,
only vfio passthrough.

I believe that endpoint devices don't have a Bridge Control Register
(Type 0 Config Space has Max_Lat / Min_Gnt instead), so setting the
Secondary Bus Reset bit should have no effect.  But apparently it
does have an effect because Yang Su is witnessing issues with the delay
after reset.

Specifically, Yang Su says that pci_bridge_wait_for_secondary_bus()
bails out because the GPU endpoint device fails the !pci_is_bridge()
check at the top of the function.  Also, the calculation of
pci_bus_max_d3cold_delay() fails because the GPU lacks a subordinate bus.
Apparently the unconditional 1 second delay in pci_reset_secondary_bus()
papers over the issue because it waits long enough for the GPU endpoint
device to come out of reset.

Maybe the information that pci_bridge_secondary_bus_reset() is executed
for an endpoint device is not correct and it's in fact executed for
the Downstream Port of the PEX9797 switch, but perhaps config space
of the port is broken and contains an incorrect Header Type field,
which would cause pci_is_bridge() to return false for it.  Though we
wouldn't scan the secondary bus below the switch in that case,
so the GPU wouldn't be enumerated.  I've requested "lspci -vvv" output
for the GPU and the Downstream Port off-list but haven't received it yet.

Thanks,

Lukas



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux