This is a note to let you know that I've just added the patch titled PCI/DPC: Await readiness of secondary bus after reset to the 6.2-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: pci-dpc-await-readiness-of-secondary-bus-after-reset.patch and it can be found in the queue-6.2 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 53b54ad074de1896f8b021615f65b27f557ce874 Mon Sep 17 00:00:00 2001 From: Lukas Wunner <lukas@xxxxxxxxx> Date: Sun, 15 Jan 2023 09:20:33 +0100 Subject: PCI/DPC: Await readiness of secondary bus after reset From: Lukas Wunner <lukas@xxxxxxxxx> commit 53b54ad074de1896f8b021615f65b27f557ce874 upstream. pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus Reset, but not after a DPC-induced Hot Reset. As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not observed and devices on the secondary bus may be accessed before they're ready. One affected device is Intel's Ponte Vecchio HPC GPU. It comprises a PCIe switch whose upstream port is not immediately ready after reset. Because its config space is restored too early, it remains in D0uninitialized, its subordinate devices remain inaccessible and DPC recovery fails with messages such as: i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible) intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible) pcieport 0000:89:02.0: AER: device recovery failed Fix it. Link: https://lore.kernel.org/r/9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@xxxxxxxxx Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@xxxxxxxxx> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> Cc: stable@xxxxxxxxxxxxxxx Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/pci/pci.c | 3 --- drivers/pci/pci.h | 6 ++++++ drivers/pci/pcie/dpc.c | 4 ++-- 3 files changed, 8 insertions(+), 5 deletions(-) --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(cha } __setup("pcie_port_pm=", pcie_port_pm_setup); -/* Time to wait after a reset for device to become responsive */ -#define PCIE_RESET_READY_POLL_MS 60000 - /** * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children * @bus: pointer to PCI bus structure to search --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_sav * Reset (PCIe r6.0 sec 5.8). */ #define PCI_RESET_WAIT 1000 /* msec */ +/* + * Devices may extend the 1 sec period through Request Retry Status completions + * (PCIe r6.0 sec 2.3.1). The spec does not provide an upper limit, but 60 sec + * ought to be enough for any device to become responsive. + */ +#define PCIE_RESET_READY_POLL_MS 60000 /* msec */ void pci_update_current_state(struct pci_dev *dev, pci_power_t state); void pci_refresh_power_state(struct pci_dev *dev); --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct p pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); - if (!pcie_wait_for_link(pdev, true)) { - pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n"); + if (pci_bridge_wait_for_secondary_bus(pdev, "DPC", + PCIE_RESET_READY_POLL_MS)) { clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); ret = PCI_ERS_RESULT_DISCONNECT; } else { Patches currently in stable-queue which might be from lukas@xxxxxxxxx are queue-6.2/pci-hotplug-allow-marking-devices-as-disconnected-during-bind-unbind.patch queue-6.2/pci-dpc-await-readiness-of-secondary-bus-after-reset.patch queue-6.2/pci-pm-observe-reset-delay-irrespective-of-bridge_d3.patch queue-6.2/pci-unify-delay-handling-for-reset-and-resume.patch