On Tue, 17 Dec 2024 02:20:32 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> wrote:

> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
> CPER records. Introduce support for handling and logging CXL Protocol
> errors.
>
> The defined trace events cxl_aer_uncorrectable_error and
> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
> cxl_cper_trace_corr_prot_err and cxl_cper_trace_uncorr_prot_err
> trace native CXL AER port errors. Reuse both sets to trace FW-First
> protocol errors.
>
> Since the CXL code is required to be called from process context and
> GHES is in interrupt context, use workqueues for processing.
>
> Similar to CXL CPER event handling, use kfifo to handle errors as it
> simplifies queue processing by providing lock free fifo operations.
>
> Add the ability for the CXL sub-system to register a workqueue to
> process CXL CPER protocol errors.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>

Hi Smita,

A few really minor things inline.
The main one is that this needs a rebase, as the EXPORT_SYMBOL_NS_GPL()
macros changed just after rc1 to require quoted strings.

>  #define CXL_CPER_FIFO_DEPTH 32
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 740ac5d8809f..5bad24965e24 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -650,6 +650,68 @@ void read_cdat_data(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_uncorr_prot_err, CXL);

Needs a rebase on rc2 or later. "CXL" - quotes now needed.

> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 188412d45e0d..f6d467cd9232 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1067,6 +1067,51 @@ static void cxl_cper_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
>
> +static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
> +{
> +	unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
> +				       data->prot_err.agent_addr.function);
> +	struct pci_dev *pdev __free(pci_dev_put) =
> +		pci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment,
> +					    data->prot_err.agent_addr.bus,
> +					    devfn);
> +	int port_type;
> +
> +	if (!pdev)
> +		return;
> +
> +	guard(device)(&pdev->dev);
> +	if (pdev->driver != &cxl_pci_driver)
> +		return;
> +
> +	port_type = pci_pcie_type(pdev);
> +	if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
> +	    port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    port_type == PCI_EXP_TYPE_UPSTREAM) {
> +		if (data->severity == AER_CORRECTABLE)
> +			cxl_cper_trace_corr_port_prot_err(pdev, data->ras_cap);
> +		else
> +			cxl_cper_trace_uncorr_port_prot_err(pdev, data->ras_cap);
> +
> +		return;
> +	}
> +
> +	if (data->severity == AER_CORRECTABLE)
> +		cxl_cper_trace_corr_prot_err(pdev, data->ras_cap);
> +	else
> +		cxl_cper_trace_uncorr_prot_err(pdev, data->ras_cap);
> +

No need for this blank line.

> +}
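
For anyone following the thread, the hand-off the commit message
describes (a producer in interrupt context queues a record, a worker in
process context drains it) can be pictured with a userspace analogue.
This is only a sketch under assumptions, not the kernel kfifo API or the
code in this series: struct prot_err_rec, struct spsc_fifo, fifo_put()
and fifo_get() are hypothetical names, C11 atomics stand in for the
kernel's memory barriers, and the lock-free property holds only with
exactly one producer and one consumer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Power-of-two depth, mirroring CXL_CPER_FIFO_DEPTH from the patch. */
#define FIFO_DEPTH 32u

/* Hypothetical stand-in for the queued protocol error work data. */
struct prot_err_rec {
	unsigned int severity;
	unsigned int devfn;
};

/* Single-producer/single-consumer ring (illustrative, not kfifo). */
struct spsc_fifo {
	struct prot_err_rec slots[FIFO_DEPTH];
	atomic_uint in;		/* advanced only by the producer */
	atomic_uint out;	/* advanced only by the consumer */
};

/* Producer side: returns false instead of blocking when the ring is full. */
bool fifo_put(struct spsc_fifo *f, const struct prot_err_rec *rec)
{
	unsigned int in = atomic_load_explicit(&f->in, memory_order_relaxed);
	unsigned int out = atomic_load_explicit(&f->out, memory_order_acquire);

	if (in - out == FIFO_DEPTH)
		return false;	/* full: caller drops or logs the record */

	f->slots[in & (FIFO_DEPTH - 1)] = *rec;
	/* Publish the slot contents before the new index becomes visible. */
	atomic_store_explicit(&f->in, in + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when there is nothing left to drain. */
bool fifo_get(struct spsc_fifo *f, struct prot_err_rec *rec)
{
	unsigned int out = atomic_load_explicit(&f->out, memory_order_relaxed);
	unsigned int in = atomic_load_explicit(&f->in, memory_order_acquire);

	if (in == out)
		return false;	/* empty */

	*rec = f->slots[out & (FIFO_DEPTH - 1)];
	/* Release the slot only after the copy out of it has completed. */
	atomic_store_explicit(&f->out, out + 1, memory_order_release);
	return true;
}
```

In this picture the producer would call fifo_put() and then kick the
worker, and the worker would loop on fifo_get() until it returns false,
handling one record per iteration. Because neither side ever waits on
the other, the producer never sleeps, which is the property that matters
when the producer runs in interrupt context.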