Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC

On Mon, Jul 19, 2021 at 02:00:51PM -0500, stuart hayes wrote:
> On 7/19/2021 10:10 AM, Lukas Wunner wrote:
> > Could you test if the below patch fixes the issue?
> 
> That does appear to fix the issue, thanks!  Without your patch, the PCIe
> devices under 64:02.0 disappear (the triggered bit is still set in the DPC
> capability).  With your patch, recovery is successful and all of the PCIe
> devices are still there.

Thanks for testing.

The test patch clears DLLSC because the Hot Reset that is propagated
down the hierarchy causes the link to flap.  I'm wondering, though,
whether that's sufficient or whether PDC needs to be cleared as well.
According to PCIe Base Spec sec. 4.2.6, the LTSSM transitions from the
"Hot Reset" state to "Detect", then "Polling".  If I understand the
table "Link Status Mapped to the LTSSM" in the spec correctly, in-band
presence is 0b in the Detect state, hence I'd expect PDC to flap as
well when a Hot Reset is propagated down the hierarchy.

Does the hotplug port at 0000:68:00.0 support In-Band Presence Disable?
That would explain why clearing only DLLSC is sufficient.

The problem is, if PDC is cleared as well, we lose the ability to
detect that a device was hot-removed while the reset was ongoing,
which is unfortunate.

If an error is handled by aer_root_reset() (instead of dpc_reset_link())
and the reset is performed at a hotplug port, then pciehp_reset_slot()
is invoked:

aer_root_reset()
  pci_bus_error_reset()
    pci_slot_reset()
      pci_reset_hotplug_slot()
        pciehp_reset_slot()

pciehp_reset_slot() temporarily masks both DLLSC *and* PDC events,
then performs a Secondary Bus Reset at the hotplug port.

If there are further hotplug ports below the hotplug port where the
SBR is performed, I'd expect the Hot Reset to be propagated down the
hierarchy likewise (just as with DPC), so those cascaded hotplug ports
should also see their link go down.

In other words, the issue you're seeing isn't really DPC-specific.
However, the test patch should fix the issue for AER-handled errors
as well.  Do you agree with this analysis or did I miss anything?

Thanks,

Lukas


