Re: [PATCH v5 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support

On 2/26/25 9:21 AM, Dave Jiang wrote:
> v5:
> - Update a couple of dev_dbg() emits. (Alison)
> - Add hpa_alias emits for poison events. (Alison)
> - Drop cxlr_hpa_cache_alias() and opencode the one invocation. (Alison)
> - See individual patches for detailed changes.

Applied to cxl/next

> 
> v4:
> - Add alias adjustment for cxl_dpa_to_hpa() (Alison)
> - Add check of adjusted region start against CFMWS (Alison)
> - Use ULLONG_MAX consistently. (Alison)
> - Use hpa_alias0 consistently. (Alison)
> - Move devm_add_action_or_reset() to devm_cxl_add_mce_notifier(). (Ming)
> - See individual patches for detailed changes.
> 
> v3:
> - Drop region to nid function, deadcode.
> - Set hpa_alias default to ~0ULL to indicate no alias. (Jonathan)
> - Add endpoint check for mce handler. (Ming)
> - Add mce notifier unregister. (Ming)
> 
> v2:
> - Fix 0-day issues
> - Fix checking of cache flag. (Ming)
> - Add comment about cache range vs CFMWS. (Ming)
> - Update EXPORT_SYMBOL_(). (Jonathan)
> - Fix various code comments. (Jonathan)
> - Emit hpa_alias0 instead of hpa_alias. (Jonathan)
> - Introduce CONFIG_CXL_MCE to address kernel build dep issues.
> 
> v1:
> - Drop RFC prefix
> - Drop MMIO hole discovery. Will implement if there's real world implementation.
> - Drop MCE_PRI_CXL. Use MCE_PRI_UC. (Boris)
> - Minor refactors and grammar fixes. (Jonathan)
> - Rename 'mode' to 'address_mode'. (Jonathan)
> 
> RFCv2:
> - Dropped 1/6 (ACPICA definition merged)
> - Change UNKNOWN to RESERVED for cache definition. (Jonathan)
> - Fix spelling errors (Jonathan)
> - Rename region_res_match_range() to region_res_match_cxl_range(). (Jonathan)
> - Add warning when cache is not 1:1 with backing region. (Jonathan)
> - Code and comments cleanup. (Jonathan)
> - Make MCE code access in CXL arch independent. (Jonathan)
> - Fixup 0-day reports.
> 
> Certain systems provide an exclusive caching memory configuration in which
> DRAM and far memory (FM), such as CXL memory, are laid out 1:1. In this
> configuration, the combined capacity is presented to the OS as a single
> memory region. For example:
> 
>              128GB DRAM                         128GB CXL memory
> |------------------------------------|------------------------------------|
> 
> The kernel sees the region as a 256G system memory region. Data can reside
> in either DRAM or FM with no replication. Hot data is swapped into DRAM by
> the hardware behind the scenes.
> 
> This series adds kernel code to enumerate the side cache when the system is
> configured in an exclusive-cache configuration. It also adds RAS support
> to deal with the aliased memory addresses.
> 
> An ECN [1] to the ACPI HMAT table has been approved that describes
> "extended-linear" addressing for direct-mapped memory-side caches. A
> reserved field in the Memory Side Cache Information Structure of HMAT is
> redefined as "Address Mode" where a value of 1 is defined as Extended-linear
> mode. This value is valid if the cache is direct mapped. "It indicates that
> the associated address range (SRAT.MemoryAffinityStructure.Length) is
> comprised of the backing store capacity extended by the cache capacity." By
> augmenting the HMAT and SRAT parsing code, this new information can be stored
> by the HMAT handling code.
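
A minimal sketch of the capacity arithmetic implied by the quoted ECN text,
assuming the cache size is already known from HMAT. The struct, macro, and
function names below are hypothetical and are not taken from the patches; only
the "Address Mode" value of 1 comes from the description above.

```c
#include <stdint.h>

/* Hypothetical name; the text above only states that value 1 means extended-linear. */
#define ADDR_MODE_EXTENDED_LINEAR 1

/* Hypothetical container for what the HMAT parser would remember. */
struct side_cache_info {
	uint8_t address_mode;	/* "Address Mode" field of the cache structure */
	uint64_t cache_size;	/* capacity of the memory-side cache, in bytes */
};

/*
 * Given the length of the SRAT memory affinity range, return the capacity
 * of the backing store. In extended-linear mode the range is the backing
 * store extended by the cache, so the cache size is subtracted. In any
 * other mode the cache is assumed not to add address space.
 * Returns 0 if the cache size is inconsistent with the range length.
 */
uint64_t backing_store_size(const struct side_cache_info *c, uint64_t range_len)
{
	if (c->address_mode != ADDR_MODE_EXTENDED_LINEAR)
		return range_len;
	if (c->cache_size > range_len)
		return 0;
	return range_len - c->cache_size;
}
```

With the 128GB DRAM + 128GB CXL example above, a 256GB SRAT range and a 128GB
cache give a 128GB backing store.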
> 
> The current CXL region enumeration code is not aware of the side cache
> configuration and therefore presents only the size of the CXL region as
> the region size. Add support to allow the CXL region enumeration code to
> query the HMAT handling code, retrieve the information regarding the side
> cache, and adjust the region size accordingly. This should allow the CXL
> CLI to display the full region size rather than just the CXL-only region
> size.
> 
> There are 3 sources from which the kernel may be notified that a memory
> error has been detected:
> 1. CXL DRAM event. This is a CXL event that is generated when an error is
>    detected by the CXL device patrol or demand scrubber. The trace_event is
>    augmented to display the aliased System Physical Address (SPA) in addition
>    to the alerted address. However, reporting of memory failure is TBD until
>    the discussion [2] of failure reporting is settled upstream.
> 2. UCNA event from the DRAM patrol or demand scrubber. This should
>    eventually go through the MCE callback chain.
> 3. MCE from the kernel consuming poison.
> 
> It is possible for all 3 sources to report the same error at the same
> time.
> 
> For 2 and 3, an MCE notifier callback is registered by the CXL driver on a
> per-device basis. The callback will determine if the reported address is in
> one of the special regions and offline the aliased address if that is the
> case.
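
As a rough illustration of the aliasing, the sketch below computes the alias
of a host physical address in a 1:1 extended-linear region. It assumes the
region covers cache + CXL with the cache half first, so each address aliases
with the one exactly cache_size away in the other half. The function name and
parameters are hypothetical; the only detail taken from the changelog is that
an all-ones value means "no alias".

```c
#include <stdint.h>

/* The changelog notes that ~0ULL / ULLONG_MAX indicates no alias. */
#define NO_ALIAS UINT64_MAX

/*
 * Return the aliased address for 'hpa', or NO_ALIAS when the region has
 * no side cache or 'hpa' lies outside the region. Assumes a 1:1 layout,
 * i.e. region_len == 2 * cache_size.
 */
uint64_t sketch_hpa_alias(uint64_t hpa, uint64_t region_start,
			  uint64_t region_len, uint64_t cache_size)
{
	uint64_t offset;

	if (cache_size == 0)
		return NO_ALIAS;
	if (hpa < region_start)
		return NO_ALIAS;

	offset = hpa - region_start;
	if (offset >= region_len)
		return NO_ALIAS;

	/* First half aliases upward, second half aliases downward. */
	if (offset < cache_size)
		return hpa + cache_size;
	return hpa - cache_size;
}
```

A notifier along the lines described above would compute this alias for the
reported address and then act on both addresses rather than just one.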
> 
> [1]: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> [2]: https://lore.kernel.org/linux-cxl/20240808151328.707869-2-ruansy.fnst@xxxxxxxxxxx/
> 
> ---
> 
> Dave Jiang (4):
>       acpi: numa: Add support to enumerate and store extended linear address mode
>       acpi/hmat / cxl: Add extended linear cache support for CXL
>       cxl: Add extended linear cache address alias emission for cxl events
>       cxl: Add mce notifier to emit aliased address for extended linear cache
> 
>  Documentation/ABI/stable/sysfs-devices-node |   6 +++
>  arch/x86/mm/pat/set_memory.c                |   1 +
>  drivers/acpi/numa/hmat.c                    |  44 +++++++++++++++++++
>  drivers/base/node.c                         |   2 +
>  drivers/cxl/Kconfig                         |   4 ++
>  drivers/cxl/core/Makefile                   |   2 +
>  drivers/cxl/core/acpi.c                     |  11 +++++
>  drivers/cxl/core/core.h                     |   3 ++
>  drivers/cxl/core/mbox.c                     |  20 +++++++--
>  drivers/cxl/core/mce.c                      |  65 +++++++++++++++++++++++++++
>  drivers/cxl/core/mce.h                      |  20 +++++++++
>  drivers/cxl/core/region.c                   | 114 +++++++++++++++++++++++++++++++++++++++++++++---
>  drivers/cxl/core/trace.h                    |  31 ++++++++-----
>  drivers/cxl/cxl.h                           |   8 ++++
>  drivers/cxl/cxlmem.h                        |   2 +
>  include/linux/acpi.h                        |  11 +++++
>  include/linux/node.h                        |   7 +++
>  tools/testing/cxl/Kbuild                    |   2 +
>  18 files changed, 332 insertions(+), 21 deletions(-)
> 
>  base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
> 




