[PATCH 00/46] CXL PMEM Region Provisioning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



tl;dr: 46 patches is way too many patches to review in one sitting. Jump
to the PATCH SUMMARY below to find a subset of interest to jump into.

The series is also posted on the 'preview' branch [1]. Note that branch
rebases, the tip of that branch at time of posting is:

7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview

---

Until the CXL 2.0 definition arrived there was little reason for OS
drivers to care about CXL memory expanders. Similar to DDR they just
implemented a physical address range that was described to the OS by
platform firmware (EFI Memory Map + ACPI SRAT/SLIT/HMAT etc). The CXL
2.0 definition adds support for PMEM, hotplug, switch topologies, and
device-interleaving which exceeds the limits of what can be reasonably
abstracted by EFI + ACPI mechanisms. As a result, Linux needs a native
capability to provision new CXL regions.

The term "region" is the same term that originated in the LIBNVDIMM
implementation to describe a host physical / system physical address
range. For PMEM a region is a persistent memory range that can be
further sub-divided into namespaces. For CXL there are three
classifications of regions:
- PMEM: set up by CXL native tooling and persisted in CXL region labels

- RAM: set up dynamically by CXL native tooling after hotplug events, or
  leftover capacity not mapped by platform firmware. Any persistent
  configuration would come from set up scripts / configuration files in
  usersapce.

- System RAM: set up by platform firmware and described by EFI + ACPI
  metadata, these regions are static.

For now, these patches implement just PMEM regions without region label
support. Note though that the infrastructure routines like
cxl_region_attach() and cxl_region_setup_targets() are building blocks
for region-label support, provisioning RAM regions, and enumerating
System RAM regions.

The general flow for provisioning a CXL region is to:
- Find a device or set of devices with available device-physical-address
  (DPA) capacity

- Find a platform CXL window that has free capacity to map a new region
  and that is able to target the devices in the previous step.

- Allocate DPA according to the CXL specification rules of sequential
  enabling of decoders by id and when a device hosts multiple decoders
  make sure that lower-id decoders map lower HPA and higher-id decoders
  map higher HPA.

- Assign endpoint decoders to a region and validate that the switching
  topology supports the requested configuration. Recall that
  interleaving is governed by modulo or xormap math that constrains which
  device can support which positions in a given region interleave.

- Program all the decoders an all endpoints and participating switches
  to bring the new address range online.

Once the range is online then existing drivers like LIBNVDIMM or
device-dax can manage the memory range as if the ACPI BIOS had conveyed
its parameters at boot.

This patch kit is the result of significant amounts of path finding work
[2] and long discussions with Ben. Thank you Ben for all that work!
Where the patches in this kit go in a different design direction than
the RFC, the authorship is changed and a Co-developed-by is added mainly
so I get blamed for the bad decisions and not Ben. The major updates
from that last posting are:

- all CXL resources are reflected in full in iomem_resource

- host-physical-address (HPA) range allocation moves to a
  devm_request_free_mem_region() derivative

- locking moves to two global rwsems, one for DPA / endpoint decoders
  and one for HPA / regions.

- the existing port scanning path is augmented to cache more topology
  information rather than recreate it at region creation time

[2]: https://lore.kernel.org/r/20220413183720.2444089-1-ben.widawsky@xxxxxxxxx

PATCH SUMMARY

If you want to jump straight to the meat of the new infrastructure start
reading at patch 34.

- Patch 34 through 42 is the bulk of the new infrastructure that is
  needed to stand up a new region regardless of whether it is PMEM, or
  RAM.

- Patch 33 is a new core facility for allocating physical address space.
  It is a straightforward extension of devm_request_free_mem_region().

- Patch 9 uses insert_resource_expand_to_fit() to inform the new
  allocator mentioned above about which address ranges are busy / free.

- Patch 46 is the support that takes a CXL PMEM region and turns it into
  a LIBNVDIMM region. Patches 43-45 are just prep work for patch 46.

- Patch 16 - 20 is the infrastructure to mangage DPA capacity, including
  enumerating the DPA that platform firmware may have already allocated
  to a System RAM region. They also enable DPA allocations to be manipulated
  separate from the case when the decoder is assigned to a given region.
  This separation of allocation and region assignment is necessary for
  enumerating regions from region labels where labels within and across
  devices may disagree. Userspace in that situation may need to jump in
  and sort out the allocation conflicts.

- Patches 21 - 24 are updates to cxl_test to put this new implementation
  through its paces with a x8 device region creation test. Recall that
  cxl_test is a way to ship canned CXL configurations in the kernel
  alongside new CXL subsystem code to supplement testing that can be done
  with real devices or QEMU emulation. Note cxl_test just implements
  device topology and ABI, it does not test the PCI-related aspects of
  the implementation.

- Patches 25 - 29 are enhancements to the port enumeration code to cache
  and improve the lookup of topology metadata that is relevant for
  region provisioning.

- Patches 30 - 32 are some straightforward pre-work for exporting
  decoder settings via sysfs.

- Patch 1 - 8, 10 - 15 are some miscellaneous fixes and refactorings
  that should be straightforward to review.

[PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
[PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
[PATCH 03/46] cxl/hdm: Use local hdm variable
[PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
[PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
[PATCH 06/46] cxl/core: Drop is_cxl_decoder()
[PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
[PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
[PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
[PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
[PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
[PATCH 12/46] cxl/mem: Convert partition-info to resources
[PATCH 13/46] cxl/hdm: Require all decoders to be enumerated
[PATCH 14/46] cxl/hdm: Enumerate allocated DPA
[PATCH 15/46] cxl/Documentation: List attribute permissions
[PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
[PATCH 17/46] cxl/hdm: Track next decoder to allocate
[PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
[PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init()
[PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
[PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
[PATCH 22/46] tools/testing/cxl: Expand CFMWS windows
[PATCH 23/46] tools/testing/cxl: Add partition support
[PATCH 24/46] tools/testing/cxl: Fix decoder default state
[PATCH 25/46] cxl/port: Record dport in endpoint references
[PATCH 26/46] cxl/port: Record parent dport when adding ports
[PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port
[PATCH 28/46] cxl/port: Move dport tracking to an xarray
[PATCH 29/46] cxl/port: Cache CXL host bridge data
[PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
[PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices
[PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
[PATCH 33/46] resource: Introduce alloc_free_mem_region()
[PATCH 34/46] cxl/region: Add region creation support
[PATCH 35/46] cxl/region: Add a 'uuid' attribute
[PATCH 36/46] cxl/region: Add interleave ways attribute
[PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
[PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
[PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism
[PATCH 40/46] cxl/region: Attach endpoint decoders
[PATCH 41/46] cxl/region: Program target lists
[PATCH 42/46] cxl/hdm: Commit decoder state to hardware
[PATCH 43/46] cxl/region: Add region driver boiler plate
[PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute
[PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
[PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects

---

 Documentation/ABI/testing/sysfs-bus-cxl         |  271 +++
 Documentation/driver-api/cxl/memory-devices.rst |   11 
 drivers/cxl/Kconfig                             |    8 
 drivers/cxl/acpi.c                              |  198 ++-
 drivers/cxl/core/Makefile                       |    1 
 drivers/cxl/core/core.h                         |   52 +
 drivers/cxl/core/hdm.c                          |  663 ++++++++
 drivers/cxl/core/mbox.c                         |   95 +
 drivers/cxl/core/memdev.c                       |    4 
 drivers/cxl/core/pci.c                          |    8 
 drivers/cxl/core/pmem.c                         |    4 
 drivers/cxl/core/port.c                         |  678 ++++++---
 drivers/cxl/core/region.c                       | 1797 +++++++++++++++++++++++
 drivers/cxl/cxl.h                               |  294 +++-
 drivers/cxl/cxlmem.h                            |   39 
 drivers/cxl/mem.c                               |   49 -
 drivers/cxl/pci.c                               |    2 
 drivers/cxl/pmem.c                              |  256 +++
 drivers/nvdimm/region_devs.c                    |   28 
 include/linux/ioport.h                          |    2 
 include/linux/libnvdimm.h                       |    5 
 kernel/resource.c                               |  181 ++
 mm/Kconfig                                      |    5 
 tools/testing/cxl/Kbuild                        |    1 
 tools/testing/cxl/test/cxl.c                    |  123 +-
 tools/testing/cxl/test/mem.c                    |   53 -
 tools/testing/cxl/test/mock.c                   |    8 
 27 files changed, 4300 insertions(+), 536 deletions(-)
 create mode 100644 drivers/cxl/core/region.c

base-commit: f50974eee5c4a5de1e4f1a3d873099f170df25f8



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux