[PATCH v2 1/2] PCI: pciehp: Add support for async hotplug with native AER and DPC/EDR

Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> · Tue, 18 Apr 2023 21:05:25 +0000

According to Section 6.7.6 of PCIe Base Specification [1], async removal
with DPC and EDR may be unexpected and may result in a Surprise Down error.
This error is just a side effect of hot remove. Most of the time, these
errors are hidden from the OS, because current systems rely on the
Firmware-First model for AER and DPC, in which the error handling (side
effects of async remove) and other necessary HW sequencing actions are
taken care of by the FW. However, the FW-First model poses issues when
rolling out updates or fixing bugs, as the servers need to be brought down
for firmware updates.

Add support for async hot-plug with native AER and DPC/EDR. Here, the OS
is responsible for handling async add and remove, along with the AER and
DPC events that are generated as a side effect of async remove.

The implementation is as follows: On an async remove, a DPC is triggered
along with a Presence Detect State change. Determine that it is an async
remove by checking that DPC Trigger Status in the DPC Status Register and
Surprise Down Error Status in the AER Uncorrectable Error Status register
are non-zero. If so, treat the DPC event as a side effect of async remove,
clear the error status registers and continue with the hot-plug tear-down
routines. If not, follow the existing routine to handle AER and DPC errors.
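
For illustration only, the check described above can be sketched in plain
user-space C. This is not the code in this patch: sketch_is_async_remove()
is a hypothetical helper that takes register values the caller has already
read, and it tests both bits named in the description, whereas
dpc_is_surprise_removal() in the diff tests only the Surprise Down bit from
within the DPC handler. The two bit values are the ones defined in
include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DPC_STATUS_TRIGGER	0x0001		/* DPC Trigger Status */
#define PCI_ERR_UNC_SURPDN		0x00000020	/* Surprise Down */

/*
 * Hypothetical helper, not part of this patch: classify a DPC event from
 * register values the caller has already read from config space.
 */
static bool sketch_is_async_remove(uint16_t dpc_status,
				   uint32_t aer_uncor_status)
{
	/* Without DPC Trigger Status there is no containment event. */
	if (!(dpc_status & PCI_EXP_DPC_STATUS_TRIGGER))
		return false;

	/* Surprise Down set: treat the event as a side effect of removal. */
	return (aer_uncor_status & PCI_ERR_UNC_SURPDN) != 0;
}
```

With the values from the "Dmesg before" log below (DPC status 0x1f01, AER
error status 0x00000020), the helper returns true, so such an event would
be handled as an async remove rather than as a fatal error.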

Dmesg before:

  pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
  pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
  pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
  pcieport 0000:00:01.4:   device [1022:14ab] error status/mask=00000020/04004000
  pcieport 0000:00:01.4:    [ 5] SDES (First)
  nvme nvme2: frozen state error detected, reset controller
  pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
  pcieport 0000:00:01.4: AER: subordinate device reset failed
  pcieport 0000:00:01.4: AER: device recovery failed
  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
  nvme2n1: detected capacity change from 1953525168 to 0
  pci 0000:04:00.0: Removing from iommu group 49

Dmesg after:

  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
  nvme1n1: detected capacity change from 1953525168 to 0
  pci 0000:04:00.0: Removing from iommu group 37

[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
    https://members.pcisig.com/wg/PCI-SIG/document/16609

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
---
v2:
	Fixed indentation. (Bjorn)
	Removed irrelevant dmesg logs. (Bjorn)
	Rephrased commit message to be clear on native vs FW-First
	handling. (Bjorn and Sathyanarayanan)
	Changed prefix from pciehp_ to dpc_. (Lukas)
	Clearing of ARI and AtomicOp Requester is performed as a part of
	(de-)enumeration in pciehp_unconfigure_device(). (Lukas)
	Changed to clearing all optional capabilities in DEVCTL2.
	OS-First -> native. (Sathyanarayanan)

Please note that I have explained why I'm not setting the Surprise Down
bit in the Uncorrectable Error Mask register in AER:
https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@xxxxxxx/

Also, while testing I noticed that PCI_STATUS and PCI_EXP_DEVSTA are set
on an async remove and are not cleared while the device is brought down.
I have included clearing them here to avoid any appearance that there was
an error, and to duplicate our BIOS functionality. I can remove this if
it's not necessary.
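
As a rough illustration of why reading a status register and writing the
same value back clears it: the error bits in these registers are
write-1-to-clear (RW1C). The model below is a user-space sketch under that
assumption; struct sketch_status_reg and the sketch_* functions are
hypothetical names and do not exist in the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical model of a status register, for illustration only. Bits in
 * rw1c_mask are cleared by writing 1 to them; all other bits are treated
 * as read-only here.
 */
struct sketch_status_reg {
	uint16_t value;
	uint16_t rw1c_mask;
};

static uint16_t sketch_read(const struct sketch_status_reg *reg)
{
	return reg->value;
}

static void sketch_write(struct sketch_status_reg *reg, uint16_t val)
{
	/* Writing 1 clears an RW1C bit; writing 0 leaves it unchanged. */
	reg->value &= (uint16_t)~(val & reg->rw1c_mask);
}

/* Read the register and write the same value back, as the patch does. */
static void sketch_clear_status(struct sketch_status_reg *reg)
{
	sketch_write(reg, sketch_read(reg));
}
```

Only the bits that were set are written as 1, so only those are cleared,
and read-only bits pass through untouched. For example, a modelled register
holding 0x0014 with an RW1C mask of 0x000f holds 0x0010 after
sketch_clear_status().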

On AMD systems we observe a Presence Detect State change along with the
DPC event on an async remove. Hence, the errors observed are benign on AMD
systems and the device will be brought down normally with PDSC. But the
errors logged might confuse users.
---
 drivers/pci/pcie/dpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a5d7c69b764e..78559188b9ac 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -293,10 +293,60 @@ void dpc_process_error(struct pci_dev *pdev)
 	}
 }
 
+static void pci_clear_surpdn_errors(struct pci_dev *pdev)
+{
+	u16 reg16;
+	u32 reg32;
+
+	pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, &reg32);
+	pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32);
+
+	pci_read_config_word(pdev, PCI_STATUS, &reg16);
+	pci_write_config_word(pdev, PCI_STATUS, reg16);
+
+	pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &reg16);
+	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+}
+
+static void dpc_handle_surprise_removal(struct pci_dev *pdev)
+{
+	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
+		return;
+
+	/*
+	 * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
+	 * removal might be unexpected, errors might be reported as a side
+	 * effect of the event and software should handle them as an expected
+	 * part of this event.
+	 */
+	pci_aer_raw_clear_status(pdev);
+	pci_clear_surpdn_errors(pdev);
+
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
+			      PCI_EXP_DPC_STATUS_TRIGGER);
+}
+
+static bool dpc_is_surprise_removal(struct pci_dev *pdev)
+{
+	u32 status;
+
+	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
+
+	if (!(status & PCI_ERR_UNC_SURPDN))
+		return false;
+
+	dpc_handle_surprise_removal(pdev);
+
+	return true;
+}
+
 static irqreturn_t dpc_handler(int irq, void *context)
 {
 	struct pci_dev *pdev = context;
 
+	if (dpc_is_surprise_removal(pdev))
+		return IRQ_HANDLED;
+
 	dpc_process_error(pdev);
 
 	/* We configure DPC so it only triggers on ERR_FATAL */
-- 
2.17.1