On 6/26/2021 1:50 AM, Lukas Wunner wrote:
> On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote:
>> I have a system that is failing to recover after an EDR event with (or
>> without...) this patch. It looks like the problem is similar to what this
>> patch is trying to fix, except that on my system, the hotplug port is
>> downstream of the root port that has DPC, so the "link down" event on it is
>> not being ignored. So the hotplug code disables the slot (which contains an
>> NVMe device on this system) while the nvme driver is trying to use it, which
>> results in a failed recovery and another EDR event, and the kernel ends up
>> with the DPC trigger status bit set in the root port, so everything
>> downstream is gone.
>>
>> I added the hack below so the hotplug code will ignore the "link down"
>> events on the ports downstream of the root port during DPC recovery, and it
>> recovers no problem. (I'm not proposing this as a correct fix.)
> Please help me understand what's causing the Link Down event in the
> first place:
>
> With DPC, the hardware (only) disables the link on the port containing the
> error. Since that's the Root Port above the hotplug port in your case,
> the link between the hotplug port and the NVMe drive should remain up.
>
> Since your patch sets the PCI_DPC_RECOVERING flag during invocation
> of the dev->driver->err_handler->slot_reset() hook, I assume that's
> what's causing the Link Down. However pcie_portdrv_slot_reset()
> only restores and saves PCI config space, I don't think that's
> causing a Link Down?
>
> Is maybe nvme_slot_reset() causing the Link Down on the parent hotplug port?
>
> Thanks,
> Lukas
Sorry for the delayed response--I was out of town.
I believe the Link Down is happening because a hot reset is propagated
down when the link is lost under the root port 64:02.0. From the PCIe
Base Spec 5.0, section 6.6.1 "conventional reset":
• For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports:
...
◦ The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the Upstream
Port must direct the LTSSM of each Downstream Port to the Hot Reset
state, but not hold the LTSSMs in that state. This permits each
downstream Port to begin Link training immediately after its hot reset
completes. This behavior is recommended for all Switches.
◦ Receiving a hot reset on the Upstream Port
(end of paste from the PCIe spec)
For reference, here's the "lspci -t" output covering the root port
64:02.0 that is getting the DPC... there are NVMe drives at 69:00.0,
6a:00.0, 6c:00.0, and 6e:00.0, and a SAS controller at 79:00.0.
+-[0000:64]-+-00.0
|           +-00.1
|           +-00.2
|           +-00.4
|           \-02.0-[65-79]----00.0-[66-79]--+-00.0-[67-70]----00.0-[68-70]--+-00.0-[69]----00.0
|                                           |                               +-04.0-[6a]----00.0
|                                           |                               +-08.0-[6b]--
|                                           |                               +-0c.0-[6c]----00.0
|                                           |                               +-10.0-[6d]--
|                                           |                               +-14.0-[6e]----00.0
|                                           |                               +-18.0-[6f]--
|                                           |                               \-1c.0-[70]--
|                                           +-04.0-[71-76]----00.0-[72-76]--+-10.0-[73]--
|                                           |                               +-14.0-[74]--
|                                           |                               +-18.0-[75]--
|                                           |                               \-1c.0-[76]--
|                                           +-08.0-[77-78]----00.0-[78]--
|                                           \-1c.0-[79]----00.0
I put in some debug code to printk the config registers before the
config space is restored. Before I trigger the DPC, the slot status
register at 68:00.0 reads 0x40 (presence detected), and after the DPC
(but before restoring PCI config space for 68:00.0), it reads 0x140 (DLL
state changed + presence detected).
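
(For reference, those Slot Status bits, as defined in include/uapi/linux/pci_regs.h:

#define  PCI_EXP_SLTSTA_PDS     0x0040  /* Presence Detect State */
#define  PCI_EXP_SLTSTA_DLLSC   0x0100  /* Data Link Layer State Changed */

so the 0x140 I see after DPC is DLLSC | PDS.)
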
Before config space is restored to 68:00.0, the command register is 0.
After config space is restored, I see "pcieport 0000:68:00.0: pciehp:
pending interrupts 0x0010 from Slot Status" followed by "...pciehp:
Slot(211): Link Down". So I assume as soon as it is able to (when its
config space is restored), 68:00.0 sends the hotplug interrupt, which
takes down 69:00.0.
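
For reference, the slot_reset hook that restores the port's config space is
pcie_portdrv_slot_reset(); paraphrasing from drivers/pci/pcie/portdrv_pci.c
(from memory, so details may vary by kernel version), it is just:

static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
        pci_restore_state(dev);
        pci_save_state(dev);
        return PCI_ERS_RESULT_RECOVERED;
}

So the restore that lets 68:00.0 deliver the latched DLLSC interrupt happens
in the middle of the slot_reset pass over the subtree.

The hack referred to above, for reference:
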
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b576aa890c76..dfd983c3c5bf 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
             !dev->driver->err_handler->slot_reset)
                 goto out;
 
+        set_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
         err_handler = dev->driver->err_handler;
         vote = err_handler->slot_reset(dev);
+        clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
         *result = merge_result(*result, vote);
 out:
         device_unlock(&dev->dev);
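
(For context, my understanding of how the patch under discussion consumes that
flag on the pciehp side is roughly the following; this is paraphrased rather
than quoted, so the exact code may differ:

        /* in pciehp_ist(), paraphrased: drop a Link Down/Up event if it
         * was caused by DPC and recovery succeeded */
        if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev))
                events &= ~PCI_EXP_SLTSTA_DLLSC;

with pci_dpc_recovered() waiting for DPC recovery to finish before answering.
The hack above is just trying to make that synchronization cover the ports
below the root port as well.)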