Re: [PATCH v2 1/2] PCI/DPC: Run recovery on device that detected the error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2025/1/23 12:53, Sathyanarayanan Kuppuswamy 写道:

On 11/12/24 5:54 AM, Shuai Xue wrote:
The current implementation of pcie_do_recovery() assumes that the
recovery process is executed on the device that detected the error.
However, the DPC driver currently passes the error port that experienced
the DPC event to pcie_do_recovery().

Use the SOURCE ID register to correctly identify the device that detected the
error. By passing this error device to pcie_do_recovery(), subsequent
patches will be able to accurately access AER status of the error device.

When passing the error device, I assume pcie_do_recovery() will find the
upstream bride and run the recovery logic .


Yes, the pcie_do_recovery() will find the upstream bridge and walk bridges
potentially AER affected.


Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
---

IMO, moving the "err_port" rename to a separate patch will make this change
more clear.  But it is up to you.

I see, I will add a separate patch.


  drivers/pci/pci.h      |  2 +-
  drivers/pci/pcie/dpc.c | 30 ++++++++++++++++++++++++------
  drivers/pci/pcie/edr.c | 35 ++++++++++++++++++-----------------
  3 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 14d00ce45bfa..0866f79aec54 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -521,7 +521,7 @@ struct rcec_ea {
  void pci_save_dpc_state(struct pci_dev *dev);
  void pci_restore_dpc_state(struct pci_dev *dev);
  void pci_dpc_init(struct pci_dev *pdev);
-void dpc_process_error(struct pci_dev *pdev);
+struct pci_dev *dpc_process_error(struct pci_dev *pdev);
  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
  bool pci_dpc_recovered(struct pci_dev *pdev);
  #else
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 2b6ef7efa3c1..62a68cde4364 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -257,10 +257,17 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
      return 1;
  }
-void dpc_process_error(struct pci_dev *pdev)
+/**
+ * dpc_process_error - handle the DPC error status

Handling the DPC error status has nothing to do with finding
the error source. Why not add a new helper function?

As PCIe Spec,

    DPC Error Source ID - When the DPC Trigger Reason field indicates that DPC
    was triggered due to the reception of an ERR_NONFATAL or ERR_FATAL, this
    register contains the Requester ID of the received Message. Otherwise, the
    value of this register is undefined.

To find the error source, we need to

  - check the error reason from PCI_EXP_DPC_STATUS,
  - Identify the error device by PCI_EXP_DPC_SOURCE_ID for ERR_NONFATAL and
    ERR_FATAL reason.

The code will duplicate with dpc_process_error. Therefore, I directly reused
dpc_process_error.


+ * @pdev: the port that experienced the containment event
+ *
+ * Return the device that experienced the error.
detected the error?

Will change it.

+ */
+struct pci_dev *dpc_process_error(struct pci_dev *pdev)
  {
      u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
      struct aer_err_info info;
+    struct pci_dev *err_dev = NULL;

I don't think you need NULL initialization here.

Will remove it.


      pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
      pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
@@ -283,6 +290,13 @@ void dpc_process_error(struct pci_dev *pdev)
           "software trigger" :
           "reserved error");
+    if (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE ||
+        reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE)
+        err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
+                        PCI_BUS_NUM(source), source & 0xff);
+    else
+        err_dev = pci_dev_get(pdev);
+
      /* show RP PIO error detail information */
      if (pdev->dpc_rp_extensions &&
          reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT &&
@@ -295,6 +309,8 @@ void dpc_process_error(struct pci_dev *pdev)
          pci_aer_clear_nonfatal_status(pdev);
          pci_aer_clear_fatal_status(pdev);
      }
+
+    return err_dev;
  }
  static void pci_clear_surpdn_errors(struct pci_dev *pdev)
@@ -350,21 +366,23 @@ static bool dpc_is_surprise_removal(struct pci_dev *pdev)
  static irqreturn_t dpc_handler(int irq, void *context)
  {
-    struct pci_dev *pdev = context;
+    struct pci_dev *err_port = context, *err_dev = NULL;

NULL initialization is not needed.

Will remove it.

Thanks for valuable comments.

Best Regards,
Shuai




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux