On 1/2/2024 12:27 PM, Ira Weiny wrote:
Smita Koralahalli wrote:
When PCIe AER is in FW-First mode, the OS should process CXL Protocol
errors from CPER records. These CPER records, obtained from the GHES
module, rely on a registered callback to notify the CXL subsystem so
that they can be processed.
Call the existing cxl_cper_callback to notify the CXL subsystem on a
Protocol error.
The defined trace events cxl_aer_uncorrectable_error and
cxl_aer_correctable_error currently trace native CXL AER errors. Reuse
them to trace FW-First Protocol Errors.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
[snip]
int cxl_cper_register_callback(cxl_cper_callback callback)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 37e1652afbc7..da516982a625 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -6,6 +6,7 @@
#include <linux/pci.h>
#include <linux/pci-doe.h>
#include <linux/aer.h>
+#include <linux/cper.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
@@ -836,6 +837,51 @@ void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport)
}
EXPORT_SYMBOL_NS_GPL(cxl_setup_parent_dport, CXL);
+#define CXL_AER_UNCORRECTABLE 0
+#define CXL_AER_CORRECTABLE 1
Better defined as an enum?
Will change.
+
+int cper_severity_cxl_aer(int cper_severity)
My gut says that it would be better to hide this conversion in the
GHES/CPER code and send a more generic defined CXL_AER_* severity through.
Ok will change.
+{
+ switch (cper_severity) {
+ case CPER_SEV_RECOVERABLE:
+ case CPER_SEV_FATAL:
+ return CXL_AER_UNCORRECTABLE;
+ default:
+ return CXL_AER_CORRECTABLE;
+ }
+}
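
A rough sketch of how the two suggestions above might look together: the
#defines become an enum, and the CPER-to-AER conversion becomes a small
helper that could live on the GHES/CPER side so that only a CXL_AER_*
value is passed through. The names cxl_aer_err_type and
cper_severity_to_cxl_aer are hypothetical. The CPER_SEV_* values are
redefined locally (mirroring include/linux/cper.h as far as I can tell)
only so the snippet builds outside the kernel.

```c
/*
 * Local stand-ins for the CPER severities. In the kernel these come from
 * <linux/cper.h>; they are repeated here as an assumption so the sketch
 * compiles in userspace.
 */
enum {
	CPER_SEV_RECOVERABLE,
	CPER_SEV_FATAL,
	CPER_SEV_CORRECTED,
	CPER_SEV_INFORMATIONAL,
};

/* Hypothetical enum replacing the two CXL_AER_* #defines. */
enum cxl_aer_err_type {
	CXL_AER_UNCORRECTABLE,
	CXL_AER_CORRECTABLE,
};

/*
 * Hypothetical helper: same mapping as cper_severity_cxl_aer() in the
 * patch, but returning the enum type. Recoverable and fatal map to
 * uncorrectable; everything else is treated as correctable.
 */
static enum cxl_aer_err_type cper_severity_to_cxl_aer(int cper_severity)
{
	switch (cper_severity) {
	case CPER_SEV_RECOVERABLE:
	case CPER_SEV_FATAL:
		return CXL_AER_UNCORRECTABLE;
	default:
		return CXL_AER_CORRECTABLE;
	}
}
```

With this shape the CXL side would only ever see an enum cxl_aer_err_type
and would not need to include the CPER header for the severity values.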
+
+void cxl_prot_err_trace_record(struct cxl_dev_state *cxlds,
+ struct cxl_cper_rec_data *data)
+{
+ struct cper_cxl_event_sn *dev_serial_num = &data->rec.hdr.dev_serial_num;
+ u32 status, fe;
+ int severity;
+
+ severity = cper_severity_cxl_aer(data->severity);
+
+ cxlds->serial = (((u64)dev_serial_num->upper_dw << 32) |
+ dev_serial_num->lower_dw);
This permanently overwrites the serial number read from PCI...
If the serial number does not match up or was not valid (per the check in
the previous patch) let's add a warning.
Sure will add.
Thanks,
Smita
AFAICT they should match.
Ira
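
For illustration, a minimal userspace sketch of the "warn instead of
overwrite" idea discussed above. The struct and function names
(cper_sn_sketch, cper_serial_to_u64, cper_serial_matches) are
hypothetical stand-ins, not the kernel definitions, and fprintf() stands
in for whatever warning helper (dev_warn() or similar) the real patch
would use. The serial read from PCI is only compared, never modified.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the serial number fields in the CPER record. */
struct cper_sn_sketch {
	uint32_t lower_dw;
	uint32_t upper_dw;
};

/* Combine the two dwords the same way the patch does. */
static uint64_t cper_serial_to_u64(const struct cper_sn_sketch *sn)
{
	return ((uint64_t)sn->upper_dw << 32) | sn->lower_dw;
}

/*
 * Compare the serial reported in the CPER record against the one already
 * read from PCI. Warn on a mismatch and leave the PCI value untouched.
 */
static bool cper_serial_matches(uint64_t pci_serial,
				const struct cper_sn_sketch *sn)
{
	uint64_t cper_serial = cper_serial_to_u64(sn);

	if (cper_serial != pci_serial) {
		fprintf(stderr,
			"warning: CPER serial %#" PRIx64
			" does not match PCI serial %#" PRIx64 "\n",
			cper_serial, pci_serial);
		return false;
	}
	return true;
}
```

The caller could then decide whether to still emit the trace event on a
mismatch; that policy is left open here.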
[snip]