On Wed, 14 Feb 2024 10:41:00 -0600 Ben Cheatham <benjamin.cheatham@xxxxxxx> wrote: > On 2/14/24 9:27 AM, Jonathan Cameron wrote: > > On Thu, 8 Feb 2024 14:00:41 -0600 > > Ben Cheatham <Benjamin.Cheatham@xxxxxxx> wrote: > > > >> Implement CXL helper functions in the EINJ module for getting/injecting > >> available CXL protocol error types and export them to sysfs under > >> kernel/debug/cxl. > >> > >> The kernel/debug/cxl/einj_types file will print the available CXL > >> protocol errors in the same format as the available_error_types > >> file provided by the EINJ module. The > >> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the > >> error_type and error_inject files provided by the EINJ module, i.e.: > >> writing an error type into $dport_dev/einj_inject will inject said error > >> type into the CXL dport represented by $dport_dev. > >> > >> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> > >> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@xxxxxxx> > > Hi Ben, > > > > Sorry I've not looked at this sooner. > > > > Anyhow, some comments inline. Mostly looks good to me. > > > > Jonathan > > > >> --- > >> Documentation/ABI/testing/debugfs-cxl | 22 ++++ > >> MAINTAINERS | 1 + > >> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++-- > >> drivers/cxl/core/port.c | 39 +++++++ > >> include/linux/einj-cxl.h | 45 ++++++++ > >> 5 files changed, 255 insertions(+), 10 deletions(-) > >> create mode 100644 include/linux/einj-cxl.h > >> > >> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl > >> index fe61d372e3fa..bcd985cca66a 100644 > >> --- a/Documentation/ABI/testing/debugfs-cxl > >> +++ b/Documentation/ABI/testing/debugfs-cxl > >> @@ -33,3 +33,25 @@ Description: > >> device cannot clear poison from the address, -ENXIO is returned. > >> The clear_poison attribute is only visible for devices > >> supporting the capability. > >> + > >> +What: /sys/kernel/debug/cxl/einj_types > >> +Date: January, 2024 > >> +KernelVersion: v6.9 > >> +Contact: linux-cxl@xxxxxxxxxxxxxxx > >> +Description: > >> + (RO) Prints the CXL protocol error types made available by > >> + the platform in the format "0x<error number> <error type>". > >> + The <error number> can be written to einj_inject to inject > >> + <error type> into a chosen dport. > > > > I think it's a limited set, so docs could include what the error_type values can > > be? From this description it's not obvious they are human readable strings. > > > > It is a limited set, but that set has 6 variants. It may make the description > a bit long to include all of them, but I could include an example string instead? > If length isn't an issue then I can add them all in. Example works. > > >> + > >> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject > >> +Date: January, 2024 > >> +KernelVersion: v6.9 > >> +Contact: linux-cxl@xxxxxxxxxxxxxxx > >> +Description: > >> + (WO) Writing an integer to this file injects the corresponding > >> + CXL protocol error into $dport_dev ($dport_dev will be a device > >> + name from /sys/bus/pci/devices). The integer to type mapping for > >> + injection can be found by reading from einj_types. If the dport > >> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise > >> + a CXL 2.0 error is injected. > >> diff --git a/MAINTAINERS b/MAINTAINERS > >> index 9104430e148e..02d7feb2ed1f 100644 > >> --- a/MAINTAINERS > >> +++ b/MAINTAINERS > >> @@ -5246,6 +5246,7 @@ L: linux-cxl@xxxxxxxxxxxxxxx > >> S: Maintained > >> F: drivers/cxl/ > >> F: include/uapi/linux/cxl_mem.h > >> +F: include/linux/einj-cxl.h > >> F: tools/testing/cxl/ > >> > >> COMPUTE EXPRESS LINK PMU (CPMU) > >> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c > >> index 73dde21d3e89..9137cc01f791 100644 > >> --- a/drivers/acpi/apei/einj.c > >> +++ b/drivers/acpi/apei/einj.c > >> @@ -21,6 +21,7 @@ > >> #include <linux/nmi.h> > >> #include <linux/delay.h> > >> #include <linux/mm.h> > >> +#include <linux/einj-cxl.h> > >> #include <linux/platform_device.h> > >> #include <asm/unaligned.h> > >> > >> @@ -37,6 +38,20 @@ > >> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \ > >> ACPI_EINJ_MEMORY_UNCORRECTABLE | \ > >> ACPI_EINJ_MEMORY_FATAL) > >> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE > >> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12) > >> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13) > >> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14) > >> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15) > >> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16) > >> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17) > >> +#endif > >> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \ > >> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \ > >> + ACPI_EINJ_CXL_CACHE_FATAL | \ > >> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \ > >> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \ > >> + ACPI_EINJ_CXL_MEM_FATAL) > >> > >> /* > >> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action. > >> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, > >> if (type & ACPI5_VENDOR_BIT) { > >> if (vendor_flags != SETWA_FLAGS_MEM) > >> goto inject; > >> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) > >> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) { > >> goto inject; > >> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) { > >> + goto inject; > >> + } > >> > >> /* > >> * Disallow crazy address masks that give BIOS leeway to pick > >> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = { > >> "0x00000200\tPlatform Correctable\n", > >> "0x00000400\tPlatform Uncorrectable non-fatal\n", > >> "0x00000800\tPlatform Uncorrectable fatal\n", > >> +}; > >> + > >> +static const char * const einj_cxl_error_type_string[] = { > >> "0x00001000\tCXL.cache Protocol Correctable\n", > >> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n", > >> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n", > >> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v) > >> > >> DEFINE_SHOW_ATTRIBUTE(available_error_type); > >> > >> -static int error_type_get(void *data, u64 *val) > >> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v) > >> { > >> - *val = error_type; > >> + int cxl_err, rc; > >> + u32 available_error_type = 0; > >> + > >> + if (!einj_initialized) > >> + return -ENXIO; > >> + > >> + rc = einj_get_available_error_type(&available_error_type); > >> + if (rc) > >> + return rc; > >> + > >> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) { > > > > Trivial so feel free to ignore but, I'd stick to local styles and have pos > > declared in more traditional c style. > > > > Will do. > > >> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos; > > > > Maybe clearer as > > cxl_err = FIELD_PREP(CXL_ERROR_MASK, BIT(pos)); > > > > I'll think about it. I think I agree with you, but I've seen a good amount of > people who aren't familiar with the FIELD_* macros in which case it isn't much clearer. Lets teach them ;)