Re: [PATCH 3/4] acpi/ghes, cxl/pci: Trace FW-First CXL Protocol Errors

Hi Dan,

On 6/13/2024 10:47 AM, Smita Koralahalli wrote:
Hi Dan,

On 6/11/2024 5:07 PM, Dan Williams wrote:
Smita Koralahalli wrote:
When PCIe AER is in FW-First, OS should process CXL Protocol errors from
CPER records.

Reuse the existing work queue cxl_cper_work registered with GHES to notify
the CXL subsystem on a Protocol error.

The defined trace events cxl_aer_uncorrectable_error and
cxl_aer_correctable_error currently trace native CXL AER errors. Reuse
them to trace FW-First Protocol Errors.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
---
  drivers/acpi/apei/ghes.c  | 14 ++++++++++++++
  drivers/cxl/core/pci.c    | 24 ++++++++++++++++++++++++
  drivers/cxl/cxlpci.h      |  3 +++
  drivers/cxl/pci.c         | 34 ++++++++++++++++++++++++++++++++--
  include/linux/cxl-event.h |  1 +
  5 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1a58032770ee..a31bd91e9475 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -723,6 +723,20 @@ static void cxl_cper_handle_prot_err(struct acpi_hest_generic_data *gdata)
      if (cxl_cper_handle_prot_err_info(gdata, &wd.p_err))
          return;
+
+    guard(spinlock_irqsave)(&cxl_cper_work_lock);
+
+    if (!cxl_cper_work)
+        return;
+
+    wd.event_type = CXL_CPER_EVENT_PROT_ERR;
+
+    if (!kfifo_put(&cxl_cper_fifo, wd)) {
+        pr_err_ratelimited("CXL CPER kfifo overflow\n");
+        return;
+    }
+
+    schedule_work(cxl_cper_work);

It seems wrong to unconditionally schedule the cxl_pci driver to look
at potentially "non-device" errors. With Terry's upcoming CXL switch
port error handling there will be a native path for those errors, but
until that arrives, I see no point in this code trying to convey
root/switch port errors to the endpoint driver.

I see, okay. What do you recommend here? Should I confine it to CXL RCD, CXL SLD and CXL LD for now, and then extend it to ports once Terry sends his patches?

Also, I'm not sure about FMLD. Should we just drop it for now?


Since Terry has sent his port error handling patches, shall I keep the above check as is? That is, schedule the cxl_pci driver on all device and port errors, with a note that this is to be rebased on Terry's patches.

I'm slightly doubtful about FMLD, though.
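
For reference, the alternative of confining the notification to device agents could be sketched roughly as below. This is only an illustration, not part of the posted patch: the enum and helper names are hypothetical, the numeric values assume the agent type encoding of the CXL Protocol Error Section in the UEFI CPER specification, and treating "Device" as the SLD case is also an assumption. FMLD is left out here only because its handling is still undecided.

```c
#include <stdbool.h>

/*
 * Hypothetical agent types. The values assume the encoding used by the
 * CXL Protocol Error Section of the UEFI CPER specification; the names
 * are illustrative and do not come from the patch.
 */
enum cxl_agent_type_sketch {
	AGENT_RCD = 0,	/* Restricted CXL Device */
	AGENT_RCH_DP,	/* Restricted CXL Host downstream port */
	AGENT_DEVICE,	/* CXL device (assumed to cover SLD) */
	AGENT_LD,	/* Logical Device */
	AGENT_FMLD,	/* Fabric Manager owned Logical Device */
	AGENT_RP,	/* Root port */
	AGENT_DSP,	/* Downstream switch port */
	AGENT_USP,	/* Upstream switch port */
};

/*
 * Return true only for the agents the endpoint driver would handle
 * (RCD, device, LD). Ports and FMLD are rejected, as are unknown values.
 */
static bool cxl_agent_is_endpoint(unsigned int agent_type)
{
	switch (agent_type) {
	case AGENT_RCD:
	case AGENT_DEVICE:
	case AGENT_LD:
		return true;
	default:
		return false;
	}
}
```

A check like this would sit in cxl_cper_handle_prot_err() ahead of kfifo_put(), returning early when it fails, so that port errors never reach cxl_cper_work.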

Thanks,
Smita

[snip]