Hi Dan,
On 6/13/2024 10:47 AM, Smita Koralahalli wrote:
Hi Dan,
On 6/11/2024 5:07 PM, Dan Williams wrote:
Smita Koralahalli wrote:
When PCIe AER is in FW-First mode, the OS should process CXL Protocol errors
from CPER records.
Reuse the existing work queue cxl_cper_work registered with GHES to notify
the CXL subsystem of a Protocol error.
The defined trace events cxl_aer_uncorrectable_error and
cxl_aer_correctable_error currently trace native CXL AER errors. Reuse
them to trace FW-First Protocol errors.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
---
drivers/acpi/apei/ghes.c | 14 ++++++++++++++
drivers/cxl/core/pci.c | 24 ++++++++++++++++++++++++
drivers/cxl/cxlpci.h | 3 +++
drivers/cxl/pci.c | 34 ++++++++++++++++++++++++++++++++--
include/linux/cxl-event.h | 1 +
5 files changed, 74 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1a58032770ee..a31bd91e9475 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -723,6 +723,20 @@ static void cxl_cper_handle_prot_err(struct acpi_hest_generic_data *gdata)
 	if (cxl_cper_handle_prot_err_info(gdata, &wd.p_err))
 		return;
+
+	guard(spinlock_irqsave)(&cxl_cper_work_lock);
+
+	if (!cxl_cper_work)
+		return;
+
+	wd.event_type = CXL_CPER_EVENT_PROT_ERR;
+
+	if (!kfifo_put(&cxl_cper_fifo, wd)) {
+		pr_err_ratelimited("CXL CPER kfifo overflow\n");
+		return;
+	}
+
+	schedule_work(cxl_cper_work);
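For illustration only, here is a minimal userspace model of the handoff the hunk above sets up. GHES queues a record only if a consumer has registered cxl_cper_work, reports an overflow when the fifo is full, and otherwise schedules the work that later drains it. Every name here (fifo_put(), handle_prot_err(), work_registered and so on) is a hypothetical stand-in for the kernel objects. The ring buffer only mimics the kfifo_put()/kfifo_get() return conventions, and the locking done with cxl_cper_work_lock is omitted because the model is single-threaded.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel objects used in the hunk. */
enum event_type { EVENT_GEN_MEDIA, EVENT_PROT_ERR };

struct work_data {
	enum event_type event_type;
	int severity; /* placeholder for the decoded protocol error info */
};

#define FIFO_SIZE 4 /* power of two, as a kernel kfifo requires */

struct fifo {
	struct work_data buf[FIFO_SIZE];
	unsigned int in, out; /* free-running indices, masked on access */
};

/* Returns 1 on success, 0 if full: same convention as kfifo_put(). */
static int fifo_put(struct fifo *f, struct work_data wd)
{
	if (f->in - f->out == FIFO_SIZE)
		return 0;
	f->buf[f->in & (FIFO_SIZE - 1)] = wd;
	f->in++;
	return 1;
}

/* Returns 1 if an element was copied out, 0 if empty: like kfifo_get(). */
static int fifo_get(struct fifo *f, struct work_data *wd)
{
	if (f->in == f->out)
		return 0;
	*wd = f->buf[f->out & (FIFO_SIZE - 1)];
	f->out++;
	return 1;
}

static struct fifo cper_fifo;
static bool work_registered; /* models "cxl_cper_work != NULL" */
static int work_scheduled;   /* counts modelled schedule_work() calls */

/* Producer: same order of checks as the quoted hunk (lock omitted). */
static int handle_prot_err(struct work_data wd)
{
	if (!work_registered)
		return -1; /* no consumer registered: record is dropped */
	wd.event_type = EVENT_PROT_ERR;
	if (!fifo_put(&cper_fifo, wd)) {
		fprintf(stderr, "CXL CPER kfifo overflow\n");
		return -2;
	}
	work_scheduled++;
	return 0;
}

/* Consumer: what a registered work function would do when it runs. */
static int drain(void)
{
	struct work_data wd;
	int n = 0;

	while (fifo_get(&cper_fifo, &wd))
		if (wd.event_type == EVENT_PROT_ERR)
			n++; /* a real driver would trace/handle the error here */
	return n;
}
```

With a four-entry fifo, a record arriving before anything is registered is dropped. Once the consumer is registered, the fifth queued record takes the overflow path, and draining returns the four records that fit.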
This seems wrong to unconditionally schedule the cxl_pci driver to look
at potentially "non-device" errors. With Terry's upcoming CXL switch
port error handling there will be a native path for those errors, but
until that arrives, I see no point in this code trying to convey
root/switch port errors to the endpoint driver.
I see, okay. What would you recommend here? Should we confine it to
CXL RCD, CXL SLD and CXL LD for now, and extend it to ports once Terry
sends his patches?
I'm also not sure about FMLD. Should we just drop it for now?
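One way to express the narrower policy proposed above is a per-agent-type check made before the record is queued. This is a sketch, not code from the series. The enum names are hypothetical, their order is assumed to follow the agent-type table of the CXL Protocol Error CPER section in the UEFI specification, and forward_fmld is a hypothetical knob standing in for the still-open FMLD decision.

```c
#include <stdbool.h>

/* Hypothetical names; order assumed to match the UEFI agent-type table. */
enum cxl_agent_type {
	AGENT_RCD,    /* Restricted CXL Device */
	AGENT_RCH_DP, /* Restricted CXL Host Downstream Port */
	AGENT_DEVICE, /* CXL device (SLD) */
	AGENT_LD,     /* Logical Device */
	AGENT_FMLD,   /* Fabric Manager managed Logical Device */
	AGENT_RP,     /* Root Port */
	AGENT_DSP,    /* Downstream Switch Port */
	AGENT_USP,    /* Upstream Switch Port */
};

/* Hypothetical policy knob: FMLD handling is still undecided. */
static bool forward_fmld = false;

/*
 * Returns true if the record should be queued for the endpoint (cxl_pci)
 * driver. Port agents are dropped until a native port error path exists.
 */
static bool cper_prot_err_for_endpoint(enum cxl_agent_type type)
{
	switch (type) {
	case AGENT_RCD:
	case AGENT_DEVICE:
	case AGENT_LD:
		return true;
	case AGENT_FMLD:
		return forward_fmld;
	case AGENT_RCH_DP:
	case AGENT_RP:
	case AGENT_DSP:
	case AGENT_USP:
	default:
		return false;
	}
}
```

In the hunk above, such a check would go after cxl_cper_handle_prot_err_info() succeeds and before cxl_cper_work_lock is taken, so that root port and switch port records never reach cxl_pci.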
Since Terry has now sent his port error handling patches, shall I keep the
above check as is? That is, schedule the cxl_pci driver on all device and
port errors, with a note that the series is to be rebased on Terry's.
I'm still slightly doubtful about FMLD, though.
Thanks,
Smita
[snip]