> > Couple of oddities:
> >
> > 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
> > But this was autoloaded and EINJ initialized during boot:
> >
> > [ 33.909111] EINJ: Error INJection is initialized.
>
> In the current code it should only load if cxl_core.ko is also loaded.
>
> Can you share the output of lsmod to maybe see which module loaded that
> dependency?
>
> > I'm wondering if that might be a problem for anyone that likes to
> > leave the einj module not loaded until they have some need to
> > inject errors.
>
> That is a behavior change of this approach. Is it a problem?
>
> If it is I would say that we need to break out a new cxl_einj.ko module
> that when it loads walks the CXL topology and creates the debugfs files.
> Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
> loads einj.ko unconditionally.
>
> I would save that work for a clear description of why einj.ko should not
> be resident.

Personally, it would save me having to type "modprobe einj" to run tests
(and answer e-mails from validation folks telling me they missed this step).

But others might feel this is unwanted. It looks like some distros build
kernels with CONFIG_ACPI_APEI_EINJ=m.

On the other hand, EINJ should be under control of a BIOS option that
defaults to "off". So production systems won't enable it. But perhaps
there will be a pr_warn() or pr_err() during boot. One of these will
likely trip:

    pr_warn("EINJ table not found.\n");
    pr_err("Failed to get EINJ table: %s\n", acpi_format_exception(status));
    pr_warn(FW_BUG "Invalid EINJ table.\n");
    pr_err("Error collecting EINJ resources.\n");

> > 2) Even though my system doesn't have any CXL support, I found this:
> >
> > # cat /sys/kernel/debug/cxl/einj_types
> > 0x00001000 CXL.cache Protocol Correctable
> >
> > What does this mean?
>
> Strange, does:
>
> /sys/kernel/debug/einj/available_error_type
>
> ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
> using the 0x1000 encoding when it should not?

I added a printk() to show the raw value returned by my BIOS:

    0x80001038

So your guess is correct. My BIOS is setting the 0x1000 bit when it shouldn't.

> > Using ras-tools I injected some DDR memory errors. So legacy
> > functionality still works OK.

-Tony
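
P.S. For anyone who wants to decode a raw value like 0x80001038 by hand, here is a rough userspace sketch. It is not the kernel's code: the helper names (einj_print_types, einj_has_cxl_types) are made up for illustration, and the bit-to-name table is my reading of the ACPI EINJ error type definitions. Only the 0x1000 "CXL.cache Protocol Correctable" entry is confirmed by the output quoted above, so check the rest against the spec before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical table: one entry per EINJ error-type bit (assumed layout). */
struct einj_type_name {
	uint32_t mask;
	const char *name;
};

static const struct einj_type_name einj_type_names[] = {
	{ 0x00000001u, "Processor Correctable" },
	{ 0x00000002u, "Processor Uncorrectable non-fatal" },
	{ 0x00000004u, "Processor Uncorrectable fatal" },
	{ 0x00000008u, "Memory Correctable" },
	{ 0x00000010u, "Memory Uncorrectable non-fatal" },
	{ 0x00000020u, "Memory Uncorrectable fatal" },
	{ 0x00000040u, "PCI Express Correctable" },
	{ 0x00000080u, "PCI Express Uncorrectable non-fatal" },
	{ 0x00000100u, "PCI Express Uncorrectable fatal" },
	{ 0x00000200u, "Platform Correctable" },
	{ 0x00000400u, "Platform Uncorrectable non-fatal" },
	{ 0x00000800u, "Platform Uncorrectable fatal" },
	{ 0x00001000u, "CXL.cache Protocol Correctable" },
	{ 0x00002000u, "CXL.cache Protocol Uncorrectable non-fatal" },
	{ 0x00004000u, "CXL.cache Protocol Uncorrectable fatal" },
	{ 0x00008000u, "CXL.mem Protocol Correctable" },
	{ 0x00010000u, "CXL.mem Protocol Uncorrectable non-fatal" },
	{ 0x00020000u, "CXL.mem Protocol Uncorrectable fatal" },
	{ 0x80000000u, "Vendor Defined Error Types" },
};

/* Bits 12-17: the CXL.cache and CXL.mem protocol error types (assumed). */
#define EINJ_CXL_TYPES_MASK 0x0003f000u

/* Print one line per known bit set in 'available'; return how many matched. */
static int einj_print_types(FILE *out, uint32_t available)
{
	size_t n = sizeof(einj_type_names) / sizeof(einj_type_names[0]);
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (available & einj_type_names[i].mask) {
			fprintf(out, "0x%08x\t%s\n",
				(unsigned int)einj_type_names[i].mask,
				einj_type_names[i].name);
			count++;
		}
	}
	return count;
}

/* Non-zero if the firmware advertises any CXL protocol error type. */
static int einj_has_cxl_types(uint32_t available)
{
	return (available & EINJ_CXL_TYPES_MASK) != 0;
}
```

Calling einj_print_types(stdout, 0x80001038) lists the three memory error types, the stray CXL.cache correctable bit and the vendor-defined bit. einj_has_cxl_types() is the sort of check that makes einj_types non-empty on a box with no CXL hardware.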