Smita Koralahalli wrote: > On 10/2/2024 4:47 PM, Dan Williams wrote: > > Smita Koralahalli wrote: > >> UEFI v2.10 section N.2.13 defines a CPER record for CXL Protocol errors. > >> > >> Add GHES support to detect CXL CPER Protocol Error Record and Cache Error > >> Severity, Device ID, Device Serial number and CXL RAS capability struct in > >> struct cxl_cper_prot_err. Include this struct as a member of struct > >> cxl_cper_work_data. > >> > >> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> > >> --- > >> v2: > >> Defined array of structures for Device ID and Serial number > >> comparison. > >> p_err -> rec/p_rec. > >> --- > >> drivers/acpi/apei/ghes.c | 10 +++ > >> drivers/firmware/efi/cper_cxl.c | 115 ++++++++++++++++++++++++++++++++ > >> include/cxl/event.h | 26 ++++++++ > >> 3 files changed, 151 insertions(+) > >> [..] > >> +static enum cxl_aer_err_type cper_severity_cxl_aer(int cper_severity) > >> +{ > >> + switch (cper_severity) { > >> + case CPER_SEV_RECOVERABLE: > >> + case CPER_SEV_FATAL: > >> + return CXL_AER_UNCORRECTABLE; > >> + default: > >> + return CXL_AER_CORRECTABLE; > >> + } > > > > Why does the CPER severity need to be converted to a new CXL specific > > enum value? > > I was just following up the same convention here as done for AER.. > cper_severity_to_aer(). I can change if there is no value in doing it. I think because PCIe and CXL are both using AER as the transport they can both use cper_severity_to_aer(), or at a minimum do not introduce 'enum cxl_aer_err_type' and instead use the existing AER_ values. [..] > > All CPER records without a device-id have already been dropped above, so > > why reject agent-types that do not require a device-id here? > > > > I think this agent_info[] scheme makes the code more difficult to read > > especially since agent_info() is only consulted a couple times. Just put > > a "switch (prot_err->agent_type)" in the code directly and skip the > > indirection. > > Hmm, I initially thought I would do switch case and then changed it to > if-else thinking that would look cleaner. > > What would you suggest? Just incorporate switch case similar to > cper_print_prot_err() here as well or clean up switch case in > cper_print_prot_err() and reuse the agent_info[] there? Even though it is more lines, I find cper_print_prot_err() easier to read because I do not need to switch back and forth to the agent_info definition. > This agent_info[] would include 3 fields then, two as above and another > for valid_cap. Maybe unify sbdf with valid_cap.. agent_info[] ends up being code-logic masquerading as a data-structure. Keep the logic all in one coherent readable block. [..] > > Hmm, 'struct cxl_cper_event_rec' follows the raw format of the record > > from the specification, and 'struct cxl_cper_sec_prot_err' (formerly > > cper_sec_prot_err) already exists, so why is this new intermediate data > > structure needed? > > Yeah, the intention was to extract only necessary info to be consumed by > cxl_pci driver and to be passed to cxl_cper_fifo. > > Going forward, while handling protocol errors separately, I can send the > entire cxl_cper_sec_prot_err. I think it defensible to send all of it. Recall that the real consumer is not cxl_pci it is rasdaemon in userspace. The kernel is not in a good position to filter what rasdaemon expects to find in the record based on the EFI specification.