On 2/10/24 12:31 AM, Dan Williams wrote: > Dan Williams wrote: >> Ben Cheatham wrote: >>> >>> >>> On 2/8/24 2:00 PM, Ben Cheatham wrote: >>>> Implement CXL helper functions in the EINJ module for getting/injecting >>>> available CXL protocol error types and export them to sysfs under >>>> kernel/debug/cxl. >>>> >>>> The kernel/debug/cxl/einj_types file will print the available CXL >>>> protocol errors in the same format as the available_error_types >>>> file provided by the EINJ module. The >>>> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the >>>> error_type and error_inject files provided by the EINJ module, i.e.: >>>> writing an error type into $dport_dev/einj_inject will inject said error >>>> type into the CXL dport represented by $dport_dev. >>>> >>>> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> >>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@xxxxxxx> >>>> --- >>>> Documentation/ABI/testing/debugfs-cxl | 22 ++++ >>>> MAINTAINERS | 1 + >>>> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++-- >>>> drivers/cxl/core/port.c | 39 +++++++ >>>> include/linux/einj-cxl.h | 45 ++++++++ >>>> 5 files changed, 255 insertions(+), 10 deletions(-) >>>> create mode 100644 include/linux/einj-cxl.h >>>> >>>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl >>>> index fe61d372e3fa..bcd985cca66a 100644 >>>> --- a/Documentation/ABI/testing/debugfs-cxl >>>> +++ b/Documentation/ABI/testing/debugfs-cxl >>>> @@ -33,3 +33,25 @@ Description: >>>> device cannot clear poison from the address, -ENXIO is returned. >>>> The clear_poison attribute is only visible for devices >>>> supporting the capability. >>>> + >>>> +What: /sys/kernel/debug/cxl/einj_types >>>> +Date: January, 2024 >>>> +KernelVersion: v6.9 >>>> +Contact: linux-cxl@xxxxxxxxxxxxxxx >>>> +Description: >>>> + (RO) Prints the CXL protocol error types made available by >>>> + the platform in the format "0x<error number> <error type>". >>>> + The <error number> can be written to einj_inject to inject >>>> + <error type> into a chosen dport. >>>> + >>>> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject >>>> +Date: January, 2024 >>>> +KernelVersion: v6.9 >>>> +Contact: linux-cxl@xxxxxxxxxxxxxxx >>>> +Description: >>>> + (WO) Writing an integer to this file injects the corresponding >>>> + CXL protocol error into $dport_dev ($dport_dev will be a device >>>> + name from /sys/bus/pci/devices). The integer to type mapping for >>>> + injection can be found by reading from einj_types. If the dport >>>> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise >>>> + a CXL 2.0 error is injected. >>>> diff --git a/MAINTAINERS b/MAINTAINERS >>>> index 9104430e148e..02d7feb2ed1f 100644 >>>> --- a/MAINTAINERS >>>> +++ b/MAINTAINERS >>>> @@ -5246,6 +5246,7 @@ L: linux-cxl@xxxxxxxxxxxxxxx >>>> S: Maintained >>>> F: drivers/cxl/ >>>> F: include/uapi/linux/cxl_mem.h >>>> +F: include/linux/einj-cxl.h >>>> F: tools/testing/cxl/ >>>> >>>> COMPUTE EXPRESS LINK PMU (CPMU) >>>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c >>>> index 73dde21d3e89..9137cc01f791 100644 >>>> --- a/drivers/acpi/apei/einj.c >>>> +++ b/drivers/acpi/apei/einj.c >>>> @@ -21,6 +21,7 @@ >>>> #include <linux/nmi.h> >>>> #include <linux/delay.h> >>>> #include <linux/mm.h> >>>> +#include <linux/einj-cxl.h> >>>> #include <linux/platform_device.h> >>>> #include <asm/unaligned.h> >>>> >>>> @@ -37,6 +38,20 @@ >>>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \ >>>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \ >>>> ACPI_EINJ_MEMORY_FATAL) >>>> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE >>>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12) >>>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13) >>>> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14) >>>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15) >>>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16) >>>> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17) >>>> +#endif >>>> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \ >>>> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \ >>>> + ACPI_EINJ_CXL_CACHE_FATAL | \ >>>> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \ >>>> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \ >>>> + ACPI_EINJ_CXL_MEM_FATAL) >>>> >>>> /* >>>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action. >>>> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, >>>> if (type & ACPI5_VENDOR_BIT) { >>>> if (vendor_flags != SETWA_FLAGS_MEM) >>>> goto inject; >>>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) >>>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) { >>>> goto inject; >>>> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) { >>>> + goto inject; >>>> + } >>>> >>>> /* >>>> * Disallow crazy address masks that give BIOS leeway to pick >>>> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = { >>>> "0x00000200\tPlatform Correctable\n", >>>> "0x00000400\tPlatform Uncorrectable non-fatal\n", >>>> "0x00000800\tPlatform Uncorrectable fatal\n", >>>> +}; >>>> + >>>> +static const char * const einj_cxl_error_type_string[] = { >>>> "0x00001000\tCXL.cache Protocol Correctable\n", >>>> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n", >>>> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n", >>>> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v) >>>> >>>> DEFINE_SHOW_ATTRIBUTE(available_error_type); >>>> >>>> -static int error_type_get(void *data, u64 *val) >>>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v) >>>> { >>>> - *val = error_type; >>>> + int cxl_err, rc; >>>> + u32 available_error_type = 0; >>>> + >>>> + if (!einj_initialized) >>>> + return -ENXIO; >>>> + >>>> + rc = einj_get_available_error_type(&available_error_type); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) { >>>> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos; >>>> + >>>> + if (available_error_type & cxl_err) >>>> + seq_puts(m, einj_cxl_error_type_string[pos]); >>>> + } >>>> >>>> return 0; >>>> } >>>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL); >>>> >>>> -static int error_type_set(void *data, u64 val) >>>> +static int validate_error_type(u64 type) >>>> { >>>> + u32 tval, vendor, available_error_type = 0; >>>> int rc; >>>> - u32 available_error_type = 0; >>>> - u32 tval, vendor; >>>> >>>> /* Only low 32 bits for error type are valid */ >>>> - if (val & GENMASK_ULL(63, 32)) >>>> + if (type & GENMASK_ULL(63, 32)) >>>> return -EINVAL; >>>> >>>> /* >>>> * Vendor defined types have 0x80000000 bit set, and >>>> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE >>>> */ >>>> - vendor = val & ACPI5_VENDOR_BIT; >>>> - tval = val & 0x7fffffff; >>>> + vendor = type & ACPI5_VENDOR_BIT; >>>> + tval = type & 0x7fffffff; >>>> >>>> /* Only one error type can be specified */ >>>> if (tval & (tval - 1)) >>>> @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val) >>>> rc = einj_get_available_error_type(&available_error_type); >>>> if (rc) >>>> return rc; >>>> - if (!(val & available_error_type)) >>>> + if (!(type & available_error_type)) >>>> return -EINVAL; >>>> } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf) >>>> +{ >>>> + struct pci_bus *pbus; >>>> + struct pci_host_bridge *bridge; >>>> + u64 seg = 0, bus; >>>> + >>>> + pbus = dport_dev->bus; >>>> + bridge = pci_find_host_bridge(pbus); >>>> + >>>> + if (!bridge) >>>> + return -ENODEV; >>>> + >>>> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET) >>>> + seg = bridge->domain_nr << 24; >>>> + >>>> + bus = pbus->number << 16; >>>> + *sbdf = seg | bus | dport_dev->devfn; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type) >>>> +{ >>>> + u64 param1 = 0, param2 = 0, param4 = 0; >>>> + u32 flags; >>>> + int rc; >>>> + >>>> + if (!einj_initialized) >>>> + return -ENXIO; >>>> + >>>> + /* Only CXL error types can be specified */ >>>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT)) >>>> + return -EINVAL; >>>> + >>>> + rc = validate_error_type(type); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + param1 = (u64) rcrb; >>>> + param2 = 0xfffffffffffff000; >>>> + flags = 0x2; >>>> + >>>> + return einj_error_inject(type, flags, param1, param2, 0, param4); >>>> +} >>>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL); >>>> + >>>> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type) >>>> +{ >>>> + u64 param1 = 0, param2 = 0, param4 = 0; >>>> + u32 flags; >>>> + int rc; >>>> + >>>> + if (!einj_initialized) >>>> + return -ENXIO; >>>> + >>>> + /* Only CXL error types can be specified */ >>>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT)) >>>> + return -EINVAL; >>>> + >>>> + rc = validate_error_type(type); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + rc = cxl_dport_get_sbdf(dport, ¶m4); >>>> + if (rc) >>>> + return rc; >>>> + >>>> + flags = 0x4; >>>> + >>>> + return einj_error_inject(type, flags, param1, param2, 0, param4); >>>> +} >>>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL); >>>> + >>>> +static int error_type_get(void *data, u64 *val) >>>> +{ >>>> + *val = error_type; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int error_type_set(void *data, u64 val) >>>> +{ >>>> + int rc; >>>> + >>>> + /* CXL error types have to be injected from cxl debugfs */ >>>> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT)) >>>> + return -EINVAL; >>>> + >>>> + rc = validate_error_type(val); >>>> + if (rc) >>>> + return rc; >>>> + >>>> error_type = val; >>>> >>>> return 0; >>>> @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab) >>>> return 0; >>>> } >>>> >>>> +bool einj_is_initialized(void) >>>> +{ >>>> + return einj_initialized; >>>> +} >>>> +EXPORT_SYMBOL_GPL(einj_is_initialized); >>>> + >>>> static int __init einj_probe(struct platform_device *pdev) >>>> { >>>> int rc; >>>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >>>> index 8c00fd6be730..c52c92222be5 100644 >>>> --- a/drivers/cxl/core/port.c >>>> +++ b/drivers/cxl/core/port.c >>>> @@ -3,6 +3,7 @@ >>>> #include <linux/platform_device.h> >>>> #include <linux/memregion.h> >>>> #include <linux/workqueue.h> >>>> +#include <linux/einj-cxl.h> >>>> #include <linux/debugfs.h> >>>> #include <linux/device.h> >>>> #include <linux/module.h> >>>> @@ -797,6 +798,37 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport, >>>> return rc; >>>> } >>>> >>>> +DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type); >>>> + >>>> +static int cxl_einj_inject(void *data, u64 type) >>>> +{ >>>> + struct cxl_dport *dport = data; >>>> + >>>> + if (dport->rch) >>>> + return einj_cxl_inject_rch_error(dport->rcrb.base, type); >>>> + >>>> + return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type); >>>> +} >>>> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n"); >>>> + >>>> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport) >>>> +{ >>>> + struct dentry *dir; >>>> + >>>> + /* >>>> + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because >>>> + * EINJ expects a dport SBDF to be specified for 2.0 error injection. >>>> + */ >>>> + if (!einj_is_initialized() || >>>> + (!dport->rch && !dev_is_pci(dport->dport_dev))) >>>> + return; >>>> + >>>> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev)); >>>> + >>>> + debugfs_create_file("einj_inject", 0200, dir, dport, >>>> + &cxl_einj_inject_fops); >>>> +} >>>> + >>>> static struct cxl_port *__devm_cxl_add_port(struct device *host, >>>> struct device *uport_dev, >>>> resource_size_t component_reg_phys, >>>> @@ -1144,6 +1176,8 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, >>>> if (dev_is_pci(dport_dev)) >>>> dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev)); >>>> >>>> + cxl_debugfs_create_dport_dir(dport); >>>> + >>>> return dport; >>>> } >>>> >>>> @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void) >>>> >>>> cxl_debugfs = debugfs_create_dir("cxl", NULL); >>>> >>>> + if (einj_is_initialized()) { >>>> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL, >>>> + &einj_cxl_available_error_type_fops); >>>> + } >>>> + >>>> cxl_mbox_init(); >>>> >>>> rc = cxl_memdev_init(); >>>> diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h >>>> new file mode 100644 >>>> index 000000000000..af57cc8580a6 >>>> --- /dev/null >>>> +++ b/include/linux/einj-cxl.h >>>> @@ -0,0 +1,45 @@ >>>> +/* SPDX-License-Identifier: GPL-2.0-or-later */ >>>> +/* >>>> + * CXL protocol Error INJection support. >>>> + * >>>> + * Copyright (c) 2023 Advanced Micro Devices, Inc. >>>> + * All Rights Reserved. >>>> + * >>>> + * Author: Ben Cheatham <benjamin.cheatham@xxxxxxx> >>>> + */ >>>> +#ifndef CXL_EINJ_H >>>> +#define CXL_EINJ_H >>>> + >>>> +#include <linux/pci.h> >>>> + >>>> +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ) >>> >>> I was testing some other work using this kernel and was getting a linker error >>> when I had CXL_BUS set to 'y' and ACPI_APEI_EINJ set to 'm'. I went ahead and >>> solved it by changing the line above to use IS_REACHABLE() instead. The change >>> shouldn't actually affect the functionality of the patch since this is in a build >>> configuration where the CXL driver isn't using the EINJ module since CONFIG_CXL_EINJ >>> is set to n due to Kconfig restraints. >>> >>> I can submit another version of this series with this fix, but I don't think it >>> makes much sense for a 1 line change, so I've but a diff below with the change >>> for whoever ends up putting this in their tree: >> >> I don't think this is the right fix. CONFIG_CXL_EINJ should be disabled when >> CONFIG_CXL_BUS is built-in. Looks like the polarity the dependency is >> backwards. The problem with IS_REACHABLE() is it removes the simple >> debug option of "send me your .config" to see why some functionality is >> not turned on. >> >> Could you test out flipping the dependency from: >> >> depends on ACPI_APEI_EINJ >= CXL_BUS >> >> ...to: >> >> depends on ACPI_APEI_EINJ <= CXL_BUS > > Wait, no y == 2 and m == 1, so: > > depends on ACPI_APEI_EINJ >= CXL_BUS > > ...is correct, but you need IS_ENABLED(CONFIG_CXL_EINJ), not > IS_REACHABLE to guard those exports. Ok, I'll test it out today and let you know if it works. Thanks, Ben