tl;dr: 46 patches is way too many patches to review in one sitting. Jump
to the PATCH SUMMARY below to find a subset of interest to jump into.

The series is also posted on the 'preview' branch [1]. Note that the
branch rebases; the tip of that branch at time of posting is:

7e5ad5cb1580 cxl/region: Introduce cxl_pmem_region objects

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=preview

---

Until the CXL 2.0 definition arrived there was little reason for OS
drivers to care about CXL memory expanders. Similar to DDR they just
implemented a physical address range that was described to the OS by
platform firmware (EFI Memory Map + ACPI SRAT/SLIT/HMAT etc). The CXL 2.0
definition adds support for PMEM, hotplug, switch topologies, and
device-interleaving, which exceed the limits of what can be reasonably
abstracted by EFI + ACPI mechanisms. As a result, Linux needs a native
capability to provision new CXL regions.

The term "region" is the same term that originated in the LIBNVDIMM
implementation to describe a host physical / system physical address
range. For PMEM a region is a persistent memory range that can be
further sub-divided into namespaces. For CXL there are three
classifications of regions:

- PMEM: set up by CXL native tooling and persisted in CXL region labels

- RAM: set up dynamically by CXL native tooling after hotplug events, or
  leftover capacity not mapped by platform firmware. Any persistent
  configuration would come from setup scripts / configuration files in
  userspace.

- System RAM: set up by platform firmware and described by EFI + ACPI
  metadata, these regions are static.

For now, these patches implement just PMEM regions without region label
support. Note though that the infrastructure routines like
cxl_region_attach() and cxl_region_setup_targets() are building blocks
for region-label support, provisioning RAM regions, and enumerating
System RAM regions.

The general flow for provisioning a CXL region is to:

- Find a device or set of devices with available device-physical-address
  (DPA) capacity

- Find a platform CXL window that has free capacity to map a new region
  and that is able to target the devices in the previous step.

- Allocate DPA according to the CXL specification rules of sequential
  enabling of decoders by id and, when a device hosts multiple decoders,
  make sure that lower-id decoders map lower HPA and higher-id decoders
  map higher HPA.

- Assign endpoint decoders to a region and validate that the switching
  topology supports the requested configuration. Recall that
  interleaving is governed by modulo or xormap math that constrains which
  device can support which positions in a given region interleave.

- Program all the decoders on all endpoints and participating switches
  to bring the new address range online.

Once the range is online then existing drivers like LIBNVDIMM or
device-dax can manage the memory range as if the ACPI BIOS had conveyed
its parameters at boot.

This patch kit is the result of significant amounts of path finding work
[2] and long discussions with Ben. Thank you Ben for all that work!
Where the patches in this kit go in a different design direction than
the RFC, the authorship is changed and a Co-developed-by is added mainly
so I get blamed for the bad decisions and not Ben. The major updates from
that last posting are:

- all CXL resources are reflected in full in iomem_resource

- host-physical-address (HPA) range allocation moves to a
  devm_request_free_mem_region() derivative

- locking moves to two global rwsems, one for DPA / endpoint decoders
  and one for HPA / regions.

- the existing port scanning path is augmented to cache more topology
  information rather than recreate it at region creation time

[2]: https://lore.kernel.org/r/20220413183720.2444089-1-ben.widawsky@xxxxxxxxx

PATCH SUMMARY

If you want to jump straight to the meat of the new infrastructure start
reading at patch 34.

- Patches 34 through 42 are the bulk of the new infrastructure that is
  needed to stand up a new region regardless of whether it is PMEM or
  RAM.

- Patch 33 is a new core facility for allocating physical address space.
  It is a straightforward extension of devm_request_free_mem_region().

- Patch 9 uses insert_resource_expand_to_fit() to inform the new
  allocator mentioned above about which address ranges are busy / free.

- Patch 46 is the support that takes a CXL PMEM region and turns it into
  a LIBNVDIMM region. Patches 43-45 are just prep work for patch 46.

- Patches 16 - 20 are the infrastructure to manage DPA capacity,
  including enumerating the DPA that platform firmware may have already
  allocated to a System RAM region. They also enable DPA allocations to
  be manipulated separately from the case when the decoder is assigned to
  a given region. This separation of allocation and region assignment is
  necessary for enumerating regions from region labels where labels
  within and across devices may disagree. Userspace in that situation
  may need to jump in and sort out the allocation conflicts.

- Patches 21 - 24 are updates to cxl_test to put this new implementation
  through its paces with a x8 device region creation test. Recall that
  cxl_test is a way to ship canned CXL configurations in the kernel
  alongside new CXL subsystem code to supplement testing that can be
  done with real devices or QEMU emulation. Note cxl_test just implements
  device topology and ABI, it does not test the PCI-related aspects of
  the implementation.

- Patches 25 - 29 are enhancements to the port enumeration code to cache
  and improve the lookup of topology metadata that is relevant for
  region provisioning.

- Patches 30 - 32 are some straightforward pre-work for exporting
  decoder settings via sysfs.

- Patches 1 - 8 and 10 - 15 are some miscellaneous fixes and refactorings
  that should be straightforward to review.
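
For reviewers less familiar with the modulo interleave math referenced
in the provisioning flow earlier in this letter, here is a minimal
userspace sketch of how an offset into a region maps to an interleave
position and a device-relative offset. It is an illustration only and
is not code from this series: the struct and function names are
hypothetical, it assumes plain modulo interleave, and it ignores xormap
and any platform-specific address translation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration; not a structure from the patches. */
struct interleave_target {
	unsigned int position;	/* index into the region's target order */
	uint64_t dpa_offset;	/* offset into that target's DPA allocation */
};

/*
 * Plain modulo interleave: the region is carved into granularity-sized
 * chunks that are dealt out round-robin across 'ways' targets.
 */
static inline struct interleave_target
decode_modulo(uint64_t hpa_offset, unsigned int ways, uint64_t granularity)
{
	struct interleave_target t;
	uint64_t chunk;

	assert(ways != 0 && granularity != 0);

	/* Which granularity-sized chunk of the region is being addressed */
	chunk = hpa_offset / granularity;

	/* Chunks rotate across the targets in position order */
	t.position = (unsigned int)(chunk % ways);

	/* Each full rotation consumes one chunk of DPA on every target */
	t.dpa_offset = (chunk / ways) * granularity + hpa_offset % granularity;

	return t;
}
```

For example, with 4 ways and a 256 byte granularity, region offset 0x500
falls in chunk 5, which lands on position 1 at device offset 0x100. This
is why a given endpoint decoder can only occupy certain positions in a
region: the position is fixed by the address bits, not chosen freely.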

[PATCH 01/46] tools/testing/cxl: Fix cxl_hdm_decode_init() calling convention
[PATCH 02/46] cxl/port: Keep port->uport valid for the entire life of a port
[PATCH 03/46] cxl/hdm: Use local hdm variable
[PATCH 04/46] cxl/core: Rename ->decoder_range ->hpa_range
[PATCH 05/46] cxl/core: Drop ->platform_res attribute for root decoders
[PATCH 06/46] cxl/core: Drop is_cxl_decoder()
[PATCH 07/46] cxl: Introduce cxl_to_{ways,granularity}
[PATCH 08/46] cxl/core: Define a 'struct cxl_switch_decoder'
[PATCH 09/46] cxl/acpi: Track CXL resources in iomem_resource
[PATCH 10/46] cxl/core: Define a 'struct cxl_root_decoder' for tracking CXL window resources
[PATCH 11/46] cxl/core: Define a 'struct cxl_endpoint_decoder' for tracking DPA resources
[PATCH 12/46] cxl/mem: Convert partition-info to resources
[PATCH 13/46] cxl/hdm: Require all decoders to be enumerated
[PATCH 14/46] cxl/hdm: Enumerate allocated DPA
[PATCH 15/46] cxl/Documentation: List attribute permissions
[PATCH 16/46] cxl/hdm: Add 'mode' attribute to decoder objects
[PATCH 17/46] cxl/hdm: Track next decoder to allocate
[PATCH 18/46] cxl/hdm: Add support for allocating DPA to an endpoint decoder
[PATCH 19/46] cxl/debug: Move debugfs init to cxl_core_init()
[PATCH 20/46] cxl/mem: Add a debugfs version of 'iomem' for DPA, 'dpamem'
[PATCH 21/46] tools/testing/cxl: Move cxl_test resources to the top of memory
[PATCH 22/46] tools/testing/cxl: Expand CFMWS windows
[PATCH 23/46] tools/testing/cxl: Add partition support
[PATCH 24/46] tools/testing/cxl: Fix decoder default state
[PATCH 25/46] cxl/port: Record dport in endpoint references
[PATCH 26/46] cxl/port: Record parent dport when adding ports
[PATCH 27/46] cxl/port: Move 'cxl_ep' references to an xarray per port
[PATCH 28/46] cxl/port: Move dport tracking to an xarray
[PATCH 29/46] cxl/port: Cache CXL host bridge data
[PATCH 30/46] cxl/hdm: Add sysfs attributes for interleave ways + granularity
[PATCH 31/46] cxl/hdm: Initialize decoder type for memory expander devices
[PATCH 32/46] cxl/mem: Enumerate port targets before adding endpoints
[PATCH 33/46] resource: Introduce alloc_free_mem_region()
[PATCH 34/46] cxl/region: Add region creation support
[PATCH 35/46] cxl/region: Add a 'uuid' attribute
[PATCH 36/46] cxl/region: Add interleave ways attribute
[PATCH 37/46] cxl/region: Allocate host physical address (HPA) capacity to new regions
[PATCH 38/46] cxl/region: Enable the assignment of endpoint decoders to regions
[PATCH 39/46] cxl/acpi: Add a host-bridge index lookup mechanism
[PATCH 40/46] cxl/region: Attach endpoint decoders
[PATCH 41/46] cxl/region: Program target lists
[PATCH 42/46] cxl/hdm: Commit decoder state to hardware
[PATCH 43/46] cxl/region: Add region driver boiler plate
[PATCH 44/46] cxl/pmem: Delete unused nvdimm attribute
[PATCH 45/46] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
[PATCH 46/46] cxl/region: Introduce cxl_pmem_region objects

---

 Documentation/ABI/testing/sysfs-bus-cxl         |  271 +++
 Documentation/driver-api/cxl/memory-devices.rst |   11
 drivers/cxl/Kconfig                             |    8
 drivers/cxl/acpi.c                              |  198 ++-
 drivers/cxl/core/Makefile                       |    1
 drivers/cxl/core/core.h                         |   52 +
 drivers/cxl/core/hdm.c                          |  663 ++++++++
 drivers/cxl/core/mbox.c                         |   95 +
 drivers/cxl/core/memdev.c                       |    4
 drivers/cxl/core/pci.c                          |    8
 drivers/cxl/core/pmem.c                         |    4
 drivers/cxl/core/port.c                         |  678 ++++++---
 drivers/cxl/core/region.c                       | 1797 +++++++++++++++++++++++
 drivers/cxl/cxl.h                               |  294 +++-
 drivers/cxl/cxlmem.h                            |   39
 drivers/cxl/mem.c                               |   49 -
 drivers/cxl/pci.c                               |    2
 drivers/cxl/pmem.c                              |  256 +++
 drivers/nvdimm/region_devs.c                    |   28
 include/linux/ioport.h                          |    2
 include/linux/libnvdimm.h                       |    5
 kernel/resource.c                               |  181 ++
 mm/Kconfig                                      |    5
 tools/testing/cxl/Kbuild                        |    1
 tools/testing/cxl/test/cxl.c                    |  123 +-
 tools/testing/cxl/test/mem.c                    |   53 -
 tools/testing/cxl/test/mock.c                   |    8
 27 files changed, 4300 insertions(+), 536 deletions(-)
 create mode 100644 drivers/cxl/core/region.c

base-commit: f50974eee5c4a5de1e4f1a3d873099f170df25f8