在 2025/1/23 12:53, Sathyanarayanan Kuppuswamy 写道:
On 11/12/24 5:54 AM, Shuai Xue wrote:
The current implementation of pcie_do_recovery() assumes that the
recovery process is executed on the device that detected the error.
However, the DPC driver currently passes the error port that experienced
the DPC event to pcie_do_recovery().
Use the SOURCE ID register to correctly identify the device that detected the
error. By passing this error device to pcie_do_recovery(), subsequent
patches will be able to accurately access AER status of the error device.
When passing the error device, I assume pcie_do_recovery() will find the
upstream bride and run the recovery logic .
Yes, the pcie_do_recovery() will find the upstream bridge and walk bridges
potentially AER affected.
Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
---
IMO, moving the "err_port" rename to a separate patch will make this change
more clear. But it is up to you.
I see, I will add a separate patch.
drivers/pci/pci.h | 2 +-
drivers/pci/pcie/dpc.c | 30 ++++++++++++++++++++++++------
drivers/pci/pcie/edr.c | 35 ++++++++++++++++++-----------------
3 files changed, 43 insertions(+), 24 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 14d00ce45bfa..0866f79aec54 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -521,7 +521,7 @@ struct rcec_ea {
void pci_save_dpc_state(struct pci_dev *dev);
void pci_restore_dpc_state(struct pci_dev *dev);
void pci_dpc_init(struct pci_dev *pdev);
-void dpc_process_error(struct pci_dev *pdev);
+struct pci_dev *dpc_process_error(struct pci_dev *pdev);
pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
bool pci_dpc_recovered(struct pci_dev *pdev);
#else
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 2b6ef7efa3c1..62a68cde4364 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -257,10 +257,17 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
return 1;
}
-void dpc_process_error(struct pci_dev *pdev)
+/**
+ * dpc_process_error - handle the DPC error status
Handling the DPC error status has nothing to do with finding
the error source. Why not add a new helper function?
As PCIe Spec,
DPC Error Source ID - When the DPC Trigger Reason field indicates that DPC
was triggered due to the reception of an ERR_NONFATAL or ERR_FATAL, this
register contains the Requester ID of the received Message. Otherwise, the
value of this register is undefined.
To find the error source, we need to
- check the error reason from PCI_EXP_DPC_STATUS,
- Identify the error device by PCI_EXP_DPC_SOURCE_ID for ERR_NONFATAL and
ERR_FATAL reason.
The code will duplicate with dpc_process_error. Therefore, I directly reused
dpc_process_error.
+ * @pdev: the port that experienced the containment event
+ *
+ * Return the device that experienced the error.
detected the error?
Will change it.
+ */
+struct pci_dev *dpc_process_error(struct pci_dev *pdev)
{
u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
struct aer_err_info info;
+ struct pci_dev *err_dev = NULL;
I don't think you need NULL initialization here.
Will remove it.
pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
@@ -283,6 +290,13 @@ void dpc_process_error(struct pci_dev *pdev)
"software trigger" :
"reserved error");
+ if (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE ||
+ reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE)
+ err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
+ PCI_BUS_NUM(source), source & 0xff);
+ else
+ err_dev = pci_dev_get(pdev);
+
/* show RP PIO error detail information */
if (pdev->dpc_rp_extensions &&
reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT &&
@@ -295,6 +309,8 @@ void dpc_process_error(struct pci_dev *pdev)
pci_aer_clear_nonfatal_status(pdev);
pci_aer_clear_fatal_status(pdev);
}
+
+ return err_dev;
}
static void pci_clear_surpdn_errors(struct pci_dev *pdev)
@@ -350,21 +366,23 @@ static bool dpc_is_surprise_removal(struct pci_dev *pdev)
static irqreturn_t dpc_handler(int irq, void *context)
{
- struct pci_dev *pdev = context;
+ struct pci_dev *err_port = context, *err_dev = NULL;
NULL initialization is not needed.
Will remove it.
Thanks for valuable comments.
Best Regards,
Shuai