Re: [RFC PATCH v1 2/2] PCI/AER: report fatal errors of RCiEP and EP if link recovered

On 2024/11/7 00:39, Keith Busch wrote:
On Wed, Nov 06, 2024 at 05:03:39PM +0800, Shuai Xue wrote:
+int aer_get_device_fatal_error_info(struct pci_dev *dev, struct aer_err_info *info)
+{
+	int type = pci_pcie_type(dev);
+	int aer = dev->aer_cap;
+	u32 aercc;
+
+	pci_info(dev, "type :%d\n", type);
+
+	/* Must reset in this function */
+	info->status = 0;
+	info->tlp_header_valid = 0;
+	info->severity = AER_FATAL;
+
+	/* The device might not support AER */
+	if (!aer)
+		return 0;
+
+
+	if (type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_RC_END) {
+		/* Link is healthy for IO reads now */
+		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
+			&info->status);
+		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK,
+			&info->mask);
+		if (!(info->status & ~info->mask))
+			return 0;
+
+		/* Get First Error Pointer */
+		pci_read_config_dword(dev, aer + PCI_ERR_CAP, &aercc);
+		info->first_error = PCI_ERR_CAP_FEP(aercc);
+
+		if (info->status & AER_LOG_TLP_MASKS) {
+			info->tlp_header_valid = 1;
+			pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG, &info->tlp);
+		}

This matches the uncorrectable handling in aer_get_device_error_info, so
perhaps a helper to reduce duplication.

Yes, will do.


+	}
+
+	return 1;
+}

Returning '1' even if type is root or downstream port?

  static inline void aer_process_err_devices(struct aer_err_info *e_info)
  {
  	int i;
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 31090770fffc..a74ae6a55064 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -196,6 +196,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
  	struct pci_dev *bridge;
  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
  	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
+	struct aer_err_info info;
  	/*
  	 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -223,6 +224,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
  			pci_warn(bridge, "subordinate device reset failed\n");
  			goto failed;
  		}
+
+		/* Link recovered, report fatal errors on RCiEP or EP */
+		if (aer_get_device_fatal_error_info(dev, &info))
+			aer_print_error(dev, &info);

This will always print the error info even for root and downstream
ports, but you initialize "info" status and mask only if it's an EP or
RCiEP.

Got it. Will fix it.
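
For illustration only, the return-value problem raised above can be modelled in user space. This is a minimal sketch, not the follow-up patch: struct mock_dev, struct mock_err_info and mock_get_fatal_error_info() are hypothetical stand-ins for struct pci_dev, struct aer_err_info and the proposed aer_get_device_fatal_error_info(), and the config-space reads are replaced by plain struct fields. Only the PCI_EXP_TYPE_* values are taken from pci_regs.h. The idea is to return nonzero only when an EP or RCiEP actually has an unmasked uncorrectable error logged, so the caller never prints an "info" that was left uninitialized.

```c
#include <stdint.h>
#include <string.h>

/* PCIe device/port types, values as defined in pci_regs.h */
#define PCI_EXP_TYPE_ENDPOINT   0x0
#define PCI_EXP_TYPE_ROOT_PORT  0x4
#define PCI_EXP_TYPE_DOWNSTREAM 0x6
#define PCI_EXP_TYPE_RC_END     0x9

/* Hypothetical stand-in for struct pci_dev; registers become fields. */
struct mock_dev {
	int type;              /* what pci_pcie_type() would return */
	int aer_cap;           /* 0 means the device has no AER capability */
	uint32_t uncor_status; /* models the PCI_ERR_UNCOR_STATUS register */
	uint32_t uncor_mask;   /* models the PCI_ERR_UNCOR_MASK register */
};

/* Hypothetical stand-in for the fields of struct aer_err_info used here. */
struct mock_err_info {
	uint32_t status;
	uint32_t mask;
};

/*
 * Return 1 only if the device is an EP or RCiEP with at least one
 * unmasked uncorrectable error bit set; return 0 in every other case,
 * including Root Ports and Downstream Ports.
 */
static int mock_get_fatal_error_info(const struct mock_dev *dev,
				     struct mock_err_info *info)
{
	/* Always start from a fully initialized info structure. */
	memset(info, 0, sizeof(*info));

	/* The device might not support AER. */
	if (!dev->aer_cap)
		return 0;

	/* Ports are not handled here, so there is nothing to report. */
	if (dev->type != PCI_EXP_TYPE_ENDPOINT &&
	    dev->type != PCI_EXP_TYPE_RC_END)
		return 0;

	info->status = dev->uncor_status;
	info->mask = dev->uncor_mask;

	/* Report only errors that are not masked. */
	return (info->status & ~info->mask) ? 1 : 0;
}
```

With this shape, the call site in pcie_do_recovery() could keep its "if (...) aer_print_error(dev, &info);" form unchanged, since a Root Port or Downstream Port would simply yield 0.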

Thank you for the valuable comments.

Best Regards,
Shuai




