Re: [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/14/24 8:53 PM, Dan Williams wrote:
> Tony Luck wrote:
>> On Wed, Feb 14, 2024 at 02:07:06PM -0600, Ben Cheatham wrote:
>>> v12 Changes:
>>> 	- Rebase onto v6.8-rc4
>>> 	- Squash Kconfig patch into patch 2/3 (Jonathan)
>>> 	- Change CONFIG_CXL_EINJ from "depends on ACPI_APEI_EINJ >= CXL_BUS"
>>> 	  to "depends on ACPI_APEI_EINJ = CXL_BUS"
>>> 	- Drop "ACPI, APEI" part of commit message title and use just EINJ
>>> 	instead (Dan)
>>> 	- Add protocol error types to "einj_types" documentation (Jonathan)
>>> 	- Change 0xffff... constants to use GENMASK()
>>> 	- Drop param* variables and use constants instead in cxl error
>>> 	  inject functions (Jonathan)
>>> 	- Add is_cxl_error_type() helper function in einj.c (Jonathan)
>>> 	- Remove a stray function declaration in einj-cxl.h (Jonathan)
>>> 	- Comment #else/#endifs with corresponding #if/#ifdef in
>>> 	einj-cxl.h (Jonathan)
>>>
>>> v11 Changes:
>>> 	- Drop patch 2/6 (Add CXL protocol error defines) and put the
>>> 	  defines in patch 4/6 instead (Dan)
>>> 	- Add Dan's reviewed-by
>>>
>>> The new CXL error types will use the Memory Address field in the
>>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>>> compliant memory-mapped downstream port. The value of the memory address
>>> will be in the port's MMIO range, and it will not represent physical
>>> (normal or persistent) memory.
>>>
>>> Add the functionality for injecting CXL 1.1 errors to the EINJ module,
>>> but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
>>> Instead, make the error types available under /sys/kernel/debug/cxl.
>>> This allows for validating the MMIO address for a CXL 1.1 error type
>>> while also not making the user responsible for finding it.
>>
>> I tried this series on an Intel Icelake (which as far as I know
>> doesn't support CXL).
> 
> Thanks Tony!
> 
>> Couple of oddities:
>>
>> 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
>>    But this was autoloaded and EINJ initialized during boot:
>>
>> [   33.909111] EINJ: Error INJection is initialized.
> 
> In the current code it should only load if cxl_core.ko is also loaded.
> 
> Can you share the output of lsmod to maybe see which module loaded that
> dependency?
> 
>> I'm wondering if that might be a problem for anyone that likes to
>> leave the einj module not loaded until they have some need to
>> inject errors.
> 
> That is a behavior change of this approach. Is it a problem?
> 
> If it is I would say that we need to break out a new cxl_einj.ko module
> that when it loads walks the CXL topology and creates the debugfs files.
> Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
> loads einj.ko unconditionally.
> 
> I would save that work for a clear description of why einj.ko should not
> be resident.
> 
>> 2) Even though my system doesn't have any CXL support, I found this:
>>
>> # cat /sys/kernel/debug/cxl/einj_types
>> 0x00001000      CXL.cache Protocol Correctable
>>
>> What does this mean?
> 
> Strange, does:
> 
> /sys/kernel/debug/einj/available_error_type
> 
> ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
> using the 0x1000 encoding when it should not?
> 

Dan's already alluded to this, but to elaborate: This series doesn't do anything
different than before when getting available error types, it just puts the CXL types
into the "einj_types" file instead. So it's possible that your platform doesn't
have CXL support, but it is reporting a CXL error type for EINJ. This could be
a BIOS error, or it could be that the BIOS is pre ACPIv6.5 and was using 0x1000
for a different error type and the kernel is interpreting it as a CXL error type.
If you think something else is happening, I'd love to hear about it!

Thanks,
Ben

>> Using ras-tools I injected some DDR memory errors. So legacy
>> functionality still works OK.
>>
>> -Tony
> 
> 
> 




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux