Smita Koralahalli wrote: > On 11/26/2024 8:05 AM, Jonathan Cameron wrote: > > On Tue, 19 Nov 2024 00:39:15 +0000 > > Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> wrote: > > > >> When PCIe AER is in FW-First, OS should process CXL Protocol errors from > >> CPER records. Introduce support for handling and logging CXL Protocol > >> errors. > >> > >> The defined trace events cxl_aer_uncorrectable_error and > >> cxl_aer_correctable_error trace native CXL AER endpoint errors, while > >> cxl_cper_trace_corr_prot_err and cxl_cper_trace_uncorr_prot_err > >> trace native CXL AER port errors. Reuse both sets to trace FW-First > >> protocol errors. > >> > >> Since the CXL code is required to be called from process context and > >> GHES is in interrupt context, use workqueues for processing. > >> > >> Similar to CXL CPER event handling, use kfifo to handle errors as it > >> simplifies queue processing by providing lock free fifo operations. > >> > >> Add the ability for the CXL sub-system to register a workqueue to > >> process CXL CPER protocol errors. > >> > >> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> > > > > A few minor comments inline. > > > > Thanks > > > > Jonathan > > > >> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > >> index 4ede038a7148..c992b34c290b 100644 > >> --- a/drivers/cxl/core/pci.c > >> +++ b/drivers/cxl/core/pci.c > >> @@ -650,6 +650,56 @@ void read_cdat_data(struct cxl_port *port) > >> } > >> EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); > >> > >> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag, > >> + struct cxl_ras_capability_regs ras_cap) > >> +{ > >> + struct cxl_dev_state *cxlds; > >> + u32 status; > >> + > >> + status = ras_cap.cor_status & ~ras_cap.cor_mask; > >> + > >> + if (!flag) { > > > > As below. Name of flag is not very helpful when reading the code. > > Perhaps we can rename? > > Okay. May be flag -> is_device_error ? I had the same question about 'flag'. > > > >> + trace_cxl_port_aer_correctable_error(&pdev->dev, status); > >> + return; > >> + } > >> + > >> + cxlds = pci_get_drvdata(pdev); > >> + if (!cxlds) > >> + return; > >> + > >> + trace_cxl_aer_correctable_error(cxlds->cxlmd, status); > >> +} > >> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, CXL); > >> + > >> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag, > >> + struct cxl_ras_capability_regs ras_cap) > >> +{ > >> + struct cxl_dev_state *cxlds; > >> + u32 status, fe; > >> + > >> + status = ras_cap.uncor_status & ~ras_cap.uncor_mask; > >> + > >> + if (hweight32(status) > 1) > >> + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK, > >> + ras_cap.cap_control)); > >> + else > >> + fe = status; > >> + > >> + if (!flag) { > > > > Why does a bool named flag indicate it's a port error? > > I will rename it. > > Or may be use an enum to explicitly define the error type > (CXL_ERROR_TYPE_DEVICE and CXL_ERROR_TYPE_PORT). > > Or may be split the function into two distinct ones, one for port errors > and one for device errors. I would vote for 2 functions. Ira