Dan Williams wrote: > Smita Koralahalli wrote: > > On 12/15/2023 3:26 PM, Ira Weiny wrote: > > > If the firmware has configured CXL event support to be firmware first > > > the OS can process those events through CPER records. The CXL layer has > > > unique DPA to HPA knowledge and standard event trace parsing in place. > > > > > > CPER records contain Bus, Device, Function information which can be used > > > to identify the PCI device which is sending the event. > > > > > > Change pci driver registration to include registration for a CXL CPER > > > notifier to process the events through the trace subsystem. > > > > > > Define and use scoped based management to simplify the handling of the > > > pci device object. > > > > > > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx> > > > > > > --- > > > > [snip] > > > > > > > + switch (event_type) { > > > + case CXL_CPER_EVENT_GEN_MEDIA: > > > + trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid, > > > + &event->gen_media); > > > + break; > > > + case CXL_CPER_EVENT_DRAM: > > > + trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram); > > > + break; > > > + case CXL_CPER_EVENT_MEM_MODULE: > > > + trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid, > > > + &event->mem_module); > > > + break; > > > + } > > > +} > > > > Is default case needed here? > > Yeah, it looks like an uninitialized @type value can be passed through > the stack here. That was not my intention but yea. Added a generic trace with a null UUID. [snip] > > > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0) > > > +static void cxl_cper_event_call(enum cxl_event_type ev_type, > > > + struct cxl_cper_event_rec *rec) > > > +{ > > > + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id; > > > + struct pci_dev *pdev __free(pci_dev_put) = NULL; > > > + struct cxl_dev_state *cxlds = NULL; > > > + enum cxl_event_log_type log_type; > > > + unsigned int devfn; > > > + u32 hdr_flags; > > > + > > > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num); > > > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num, > > > + device_id->bus_num, devfn); > > > + if (!pdev) > > > + return; > > > + > > > + guard(device)(&pdev->dev); > > > + if (pdev->driver == &cxl_pci_driver) > > > + cxlds = pci_get_drvdata(pdev); > > > + if (!cxlds) > > > + return; > > > + > > > + /* Fabricate a log type */ > > > + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags); > > > + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags); > > > + > > > + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type, &rec->event); > > > > Currently, when I run this, I see two trace events printed. One from > > here, and another as a non_standard_event from ghes. I think both should > > be unified? By the way, Smita, Thanks for testing! I really do appreciate it! > > > > I remember Dan pointing out to me this when I sent decoding for protocol > > errors and its still pending on me for protocol errors. > > Good point, so I think the responsibility to trace CXL events should > belong to ghes_do_proc() and ghes_print_estatus() can just ignore CXL > events. > > Notice how ghes_proc() sometimes skips ghes_print_estatus(), but > uncoditionally emits a trace event in ghes_do_proc()? To me that means > that the cper_estatus_print() inside ghes_print_estatus() can just defer > to the ghes code to do the hookup to the trace code. > > For example, ras_userspace_consumers() was introduced to skip emitting > events to the kernel log when the trace event might be handled. My > assumption is that was for historical reasons, but since CXL events are > new, just never emit them to the kernel log and always require the trace > path. > > I am open to other thoughts here, but it seems like ghes_do_proc() is > where the callback needs to be triggered. I see. Ok. I'll create a pre-patch which moves the protocol error first then I'll put the events in the ghes_do_proc() well. [snip] > > > + rc = cxl_cper_register_notifier(cxl_cper_event_call); > > Quick aside as I am reading through this, the "notifier" name is > misleading since this callback has nothing to do with the > include/linux/notifier.h API. Fair point. I debated 'callback' vs 'notifier'. I'll change it to callback as I think that is equally correct and as you say does clarify this is not a 'notifier'. Ira