RE: [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types

> > Couple of oddities:
> >
> > 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
> >    But this was autoloaded and EINJ initialized during boot:
> >
> > [   33.909111] EINJ: Error INJection is initialized.
>
> In the current code it should only load if cxl_core.ko is also loaded.
>
> Can you share the output of lsmod to maybe see which module loaded that
> dependency?
>
> > I'm wondering if that might be a problem for anyone that likes to
> > leave the einj module not loaded until they have some need to
> > inject errors.
>
> That is a behavior change of this approach. Is it a problem?
>
> If it is I would say that we need to break out a new cxl_einj.ko module
> that when it loads walks the CXL topology and creates the debugfs files.
> Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
> loads einj.ko unconditionally.
>
> I would save that work for a clear description of why einj.ko should not
> be resident.

Personally, it would save me having to type "modprobe einj" to run tests (and
answer e-mails from validation folks telling me they missed this step).

But others might feel this is unwanted. It looks like some distros build kernels
with CONFIG_ACPI_APEI_EINJ=m.

On the other hand, EINJ should be under control of a BIOS option that
defaults to "off". So production systems won't enable it.

But with einj.ko loaded at boot, such systems may now log a pr_warn() or
pr_err(). One of these will likely trip:

	pr_warn("EINJ table not found.\n");
	pr_err("Failed to get EINJ table: %s\n", acpi_format_exception(status));
	pr_warn(FW_BUG "Invalid EINJ table.\n");
	pr_err("Error collecting EINJ resources.\n");

>
> > 2) Even though my system doesn't have any CXL support, I found this:
> >
> > # cat /sys/kernel/debug/cxl/einj_types
> > 0x00001000      CXL.cache Protocol Correctable
> >
> > What does this mean?
>
> Strange, does:
>
> /sys/kernel/debug/einj/available_error_type
>
> ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
> using the 0x1000 encoding when it should not?

I added a printk() to show the raw value returned by my BIOS: 0x80001038

So your guess is correct. My BIOS is setting the 0x1000 bit when it shouldn't.
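
For anyone who wants to check their own raw value, here is a rough userspace
sketch of the decode. The bit assignments are my reading of the EINJ error type
definitions in the ACPI spec, with names in the style of the einj debugfs
output; the helper names (einj_decode, einj_cxl_bits) and the CXL mask constant
are made up for this example and are not from the patch set.

```c
#include <stdint.h>
#include <stdio.h>

/* One EINJ error type bit and a human-readable name for it. */
struct einj_type {
	uint32_t mask;
	const char *name;
};

/* Assumed bit layout: bits 0-11 legacy, bits 12-17 CXL, bit 31 vendor. */
static const struct einj_type einj_types[] = {
	{ 0x00000001, "Processor Correctable" },
	{ 0x00000002, "Processor Uncorrectable non-fatal" },
	{ 0x00000004, "Processor Uncorrectable fatal" },
	{ 0x00000008, "Memory Correctable" },
	{ 0x00000010, "Memory Uncorrectable non-fatal" },
	{ 0x00000020, "Memory Uncorrectable fatal" },
	{ 0x00000040, "PCI Express Correctable" },
	{ 0x00000080, "PCI Express Uncorrectable non-fatal" },
	{ 0x00000100, "PCI Express Uncorrectable fatal" },
	{ 0x00000200, "Platform Correctable" },
	{ 0x00000400, "Platform Uncorrectable non-fatal" },
	{ 0x00000800, "Platform Uncorrectable fatal" },
	{ 0x00001000, "CXL.cache Protocol Correctable" },
	{ 0x00002000, "CXL.cache Protocol Uncorrectable non-fatal" },
	{ 0x00004000, "CXL.cache Protocol Uncorrectable fatal" },
	{ 0x00008000, "CXL.mem Protocol Correctable" },
	{ 0x00010000, "CXL.mem Protocol Uncorrectable non-fatal" },
	{ 0x00020000, "CXL.mem Protocol Uncorrectable fatal" },
	{ 0x80000000, "Vendor Defined Error Types" },
};

/* Hypothetical mask covering the six CXL bits (12 through 17). */
#define EINJ_CXL_MASK 0x0003f000u

/* Return only the CXL error type bits of a raw capability value. */
static uint32_t einj_cxl_bits(uint32_t raw)
{
	return raw & EINJ_CXL_MASK;
}

/* Print one line per known bit set in raw; return how many were printed. */
static int einj_decode(uint32_t raw, FILE *out)
{
	int count = 0;
	size_t i;

	for (i = 0; i < sizeof(einj_types) / sizeof(einj_types[0]); i++) {
		if (raw & einj_types[i].mask) {
			fprintf(out, "0x%08x\t%s\n",
				(unsigned int)einj_types[i].mask,
				einj_types[i].name);
			count++;
		}
	}
	return count;
}
```

Calling einj_decode(0x80001038, stdout) with the value from my BIOS should list
the three memory types, the CXL.cache correctable type and the vendor-defined
bit, and einj_cxl_bits() on the same value leaves just 0x1000.
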
>
> > Using ras-tools I injected some DDR memory errors. So legacy
> > functionality still works OK.

-Tony




