Tony Luck wrote: > On Wed, Feb 14, 2024 at 02:07:06PM -0600, Ben Cheatham wrote: > > v12 Changes: > > - Rebase onto v6.8-rc4 > > - Squash Kconfig patch into patch 2/3 (Jonathan) > > - Change CONFIG_CXL_EINJ from "depends on ACPI_APEI_EINJ >= CXL_BUS" > > to "depends on ACPI_APEI_EINJ = CXL_BUS" > > - Drop "ACPI, APEI" part of commit message title and use just EINJ > > instead (Dan) > > - Add protocol error types to "einj_types" documentation (Jonathan) > > - Change 0xffff... constants to use GENMASK() > > - Drop param* variables and use constants instead in cxl error > > inject functions (Jonathan) > > - Add is_cxl_error_type() helper function in einj.c (Jonathan) > > - Remove a stray function declaration in einj-cxl.h (Jonathan) > > - Comment #else/#endifs with corresponding #if/#ifdef in > > einj-cxl.h (Jonathan) > > > > v11 Changes: > > - Drop patch 2/6 (Add CXL protocol error defines) and put the > > defines in patch 4/6 instead (Dan) > > - Add Dan's reviewed-by > > > > The new CXL error types will use the Memory Address field in the > > SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1 > > compliant memory-mapped downstream port. The value of the memory address > > will be in the port's MMIO range, and it will not represent physical > > (normal or persistent) memory. > > > > Add the functionality for injecting CXL 1.1 errors to the EINJ module, > > but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj. > > Instead, make the error types available under /sys/kernel/debug/cxl. > > This allows for validating the MMIO address for a CXL 1.1 error type > > while also not making the user responsible for finding it. > > I tried this series on an Intel Icelake (which as far as I know > doesn't support CXL). Thanks Tony! > Couple of oddities: > > 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do. > But this was autoloaded and EINJ initialized during boot: > > [ 33.909111] EINJ: Error INJection is initialized. In the current code it should only load if cxl_core.ko is also loaded. Can you share the output of lsmod to maybe see which module loaded that dependency? > I'm wondering if that might be a problem for anyone that likes to > leave the einj module not loaded until they have some need to > inject errors. That is a behavior change of this approach. Is it a problem? If it is I would say that we need to break out a new cxl_einj.ko module that when it loads walks the CXL topology and creates the debugfs files. Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko loads einj.ko unconditionally. I would save that work for a clear description of why einj.ko should not be resident. > 2) Even though my system doesn't have any CXL support, I found this: > > # cat /sys/kernel/debug/cxl/einj_types > 0x00001000 CXL.cache Protocol Correctable > > What does this mean? Strange, does: /sys/kernel/debug/einj/available_error_type ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was using the 0x1000 encoding when it should not? > Using ras-tools I injected some DDR memory errors. So legacy > functionality still works OK. > > -Tony