Re: [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tony Luck wrote:
> On Wed, Feb 14, 2024 at 02:07:06PM -0600, Ben Cheatham wrote:
> > v12 Changes:
> > 	- Rebase onto v6.8-rc4
> > 	- Squash Kconfig patch into patch 2/3 (Jonathan)
> > 	- Change CONFIG_CXL_EINJ from "depends on ACPI_APEI_EINJ >= CXL_BUS"
> > 	  to "depends on ACPI_APEI_EINJ = CXL_BUS"
> > 	- Drop "ACPI, APEI" part of commit message title and use just EINJ
> > 	instead (Dan)
> > 	- Add protocol error types to "einj_types" documentation (Jonathan)
> > 	- Change 0xffff... constants to use GENMASK()
> > 	- Drop param* variables and use constants instead in cxl error
> > 	  inject functions (Jonathan)
> > 	- Add is_cxl_error_type() helper function in einj.c (Jonathan)
> > 	- Remove a stray function declaration in einj-cxl.h (Jonathan)
> > 	- Comment #else/#endifs with corresponding #if/#ifdef in
> > 	einj-cxl.h (Jonathan)
> > 
> > v11 Changes:
> > 	- Drop patch 2/6 (Add CXL protocol error defines) and put the
> > 	  defines in patch 4/6 instead (Dan)
> > 	- Add Dan's reviewed-by
> > 
> > The new CXL error types will use the Memory Address field in the
> > SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
> > compliant memory-mapped downstream port. The value of the memory address
> > will be in the port's MMIO range, and it will not represent physical
> > (normal or persistent) memory.
> > 
> > Add the functionality for injecting CXL 1.1 errors to the EINJ module,
> > but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
> > Instead, make the error types available under /sys/kernel/debug/cxl.
> > This allows for validating the MMIO address for a CXL 1.1 error type
> > while also not making the user responsible for finding it.
> 
> I tried this series on an Intel Icelake (which as far as I know
> doesn't support CXL).

Thanks Tony!

> Couple of oddities:
> 
> 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
>    But this was autoloaded and EINJ initialized during boot:
> 
> [   33.909111] EINJ: Error INJection is initialized.

In the current code it should only load if cxl_core.ko is also loaded.

Can you share the output of lsmod to maybe see which module loaded that
dependency?

> I'm wondering if that might be a problem for anyone that likes to
> leave the einj module not loaded until they have some need to
> inject errors.

That is a behavior change of this approach. Is it a problem?

If it is I would say that we need to break out a new cxl_einj.ko module
that when it loads walks the CXL topology and creates the debugfs files.
Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
loads einj.ko unconditionally.

I would save that work for a clear description of why einj.ko should not
be resident.

> 2) Even though my system doesn't have any CXL support, I found this:
> 
> # cat /sys/kernel/debug/cxl/einj_types
> 0x00001000      CXL.cache Protocol Correctable
> 
> What does this mean?

Strange, does:

/sys/kernel/debug/einj/available_error_type

...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
using the 0x1000 encoding when it should not?

> Using ras-tools I injected some DDR memory errors. So legacy
> functionality still works OK.
> 
> -Tony







[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux