On 2/14/24 11:46 AM, Jonathan Cameron wrote: > On Wed, 14 Feb 2024 10:41:00 -0600 > Ben Cheatham <benjamin.cheatham@xxxxxxx> wrote: > >> On 2/14/24 9:27 AM, Jonathan Cameron wrote: >>> On Thu, 8 Feb 2024 14:00:41 -0600 >>> Ben Cheatham <Benjamin.Cheatham@xxxxxxx> wrote: >>> >>>> Implement CXL helper functions in the EINJ module for getting/injecting >>>> available CXL protocol error types and export them to sysfs under >>>> kernel/debug/cxl. >>>> >>>> The kernel/debug/cxl/einj_types file will print the available CXL >>>> protocol errors in the same format as the available_error_types >>>> file provided by the EINJ module. The >>>> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the >>>> error_type and error_inject files provided by the EINJ module, i.e.: >>>> writing an error type into $dport_dev/einj_inject will inject said error >>>> type into the CXL dport represented by $dport_dev. >>>> >>>> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> >>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@xxxxxxx> >>> Hi Ben, >>> >>> Sorry I've not looked at this sooner. >>> >>> Anyhow, some comments inline. Mostly looks good to me. >>> >>> Jonathan >>> >>>> --- >>>> Documentation/ABI/testing/debugfs-cxl | 22 ++++ >>>> MAINTAINERS | 1 + >>>> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++-- >>>> drivers/cxl/core/port.c | 39 +++++++ >>>> include/linux/einj-cxl.h | 45 ++++++++ >>>> 5 files changed, 255 insertions(+), 10 deletions(-) >>>> create mode 100644 include/linux/einj-cxl.h >>>> >>>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl >>>> index fe61d372e3fa..bcd985cca66a 100644 >>>> --- a/Documentation/ABI/testing/debugfs-cxl >>>> +++ b/Documentation/ABI/testing/debugfs-cxl >>>> @@ -33,3 +33,25 @@ Description: >>>> device cannot clear poison from the address, -ENXIO is returned. >>>> The clear_poison attribute is only visible for devices >>>> supporting the capability. >>>> + >>>> +What: /sys/kernel/debug/cxl/einj_types >>>> +Date: January, 2024 >>>> +KernelVersion: v6.9 >>>> +Contact: linux-cxl@xxxxxxxxxxxxxxx >>>> +Description: >>>> + (RO) Prints the CXL protocol error types made available by >>>> + the platform in the format "0x<error number> <error type>". >>>> + The <error number> can be written to einj_inject to inject >>>> + <error type> into a chosen dport. >>> >>> I think it's a limited set, so docs could include what the error_type values can >>> be? From this description it's not obvious they are human readable strings. >>> >> >> It is a limited set, but that set has 6 variants. It may make the description >> a bit long to include all of them, but I could include an example string instead? >> If length isn't an issue then I can add them all in. > > Example works. > I tried adding them all in and it didn't make the description too long, so I went ahead and did that instead of an example. >> >>>> + >>>> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject >>>> +Date: January, 2024 >>>> +KernelVersion: v6.9 >>>> +Contact: linux-cxl@xxxxxxxxxxxxxxx >>>> +Description: >>>> + (WO) Writing an integer to this file injects the corresponding >>>> + CXL protocol error into $dport_dev ($dport_dev will be a device >>>> + name from /sys/bus/pci/devices). The integer to type mapping for >>>> + injection can be found by reading from einj_types. If the dport >>>> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise >>>> + a CXL 2.0 error is injected. >>>> diff --git a/MAINTAINERS b/MAINTAINERS >>>> index 9104430e148e..02d7feb2ed1f 100644 >>>> --- a/MAINTAINERS >>>> +++ b/MAINTAINERS >>>> @@ -5246,6 +5246,7 @@ L: linux-cxl@xxxxxxxxxxxxxxx >>>> S: Maintained >>>> F: drivers/cxl/ >>>> F: include/uapi/linux/cxl_mem.h >>>> +F: include/linux/einj-cxl.h >>>> F: tools/testing/cxl/ >>>> >>>> COMPUTE EXPRESS LINK PMU (CPMU) >>>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c >>>> index 73dde21d3e89..9137cc01f791 100644 >>>> --- a/drivers/acpi/apei/einj.c >>>> +++ b/drivers/acpi/apei/einj.c >>>> @@ -21,6 +21,7 @@ >>>> #include <linux/nmi.h> >>>> #include <linux/delay.h> >>>> #include <linux/mm.h> >>>> +#include <linux/einj-cxl.h> >>>> #include <linux/platform_device.h> >>>> #include <asm/unaligned.h> >>>> >>>> @@ -37,6 +38,20 @@ >>>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \ >>>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \ >>>> ACPI_EINJ_MEMORY_FATAL) >>>> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE >>>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12) >>>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13) >>>> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14) >>>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15) >>>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16) >>>> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17) >>>> +#endif >>>> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \ >>>> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \ >>>> + ACPI_EINJ_CXL_CACHE_FATAL | \ >>>> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \ >>>> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \ >>>> + ACPI_EINJ_CXL_MEM_FATAL) >>>> >>>> /* >>>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action. >>>> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, >>>> if (type & ACPI5_VENDOR_BIT) { >>>> if (vendor_flags != SETWA_FLAGS_MEM) >>>> goto inject; >>>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) >>>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) { >>>> goto inject; >>>> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) { >>>> + goto inject; >>>> + } >>>> >>>> /* >>>> * Disallow crazy address masks that give BIOS leeway to pick >>>> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = { >>>> "0x00000200\tPlatform Correctable\n", >>>> "0x00000400\tPlatform Uncorrectable non-fatal\n", >>>> "0x00000800\tPlatform Uncorrectable fatal\n", >>>> +}; >>>> + >>>> +static const char * const einj_cxl_error_type_string[] = { >>>> "0x00001000\tCXL.cache Protocol Correctable\n", >>>> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n", >>>> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n", >>>> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v) >>>> >>>> DEFINE_SHOW_ATTRIBUTE(available_error_type); >>>> >>>> -static int error_type_get(void *data, u64 *val) >>>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v) >>>> { >>>> - *val = error_type; >>>> + int cxl_err, rc; >>>> + u32 available_error_type = 0; >>>> + >>>> + if (!einj_initialized) >>>> + return -ENXIO; >>>> + >>>> + rc = einj_get_available_error_type(&available_error_type); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) { >>> >>> Trivial so feel free to ignore but, I'd stick to local styles and have pos >>> declared in more traditional c style. >>> >> >> Will do. >> >>>> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos; >>> >>> Maybe clearer as >>> cxl_err = FIELD_PREP(CXL_ERROR_MASK, BIT(pos)); >>> >> >> I'll think about it. I think I agree with you, but I've seen a good amount of >> people who aren't familiar with the FIELD_* macros in which case it isn't much clearer. > > Lets teach them ;) > Sounds good! Thanks, Ben