Re: [PATCH v4 2/4] acpi/hmat / cxl: Add extended linear cache support for CXL

Alison Schofield <alison.schofield@xxxxxxxxx> · Tue, 25 Feb 2025 12:00:41 -0800

On Mon, Feb 24, 2025 at 11:21:00AM -0700, Dave Jiang wrote:
> The current cxl region size only indicates the size of the CXL memory
> region without accounting for the extended linear cache size. Retrieve the
> cache size from HMAT and append that to the cxl region size for the cxl
> region range that matches the SRAT range that has extended linear cache
> enabled.
> 
> The SRAT defines the whole memory range that includes the extended linear
> cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
> Cache Information Structure defines the size of the extended linear cache
> size and matches to the SRAT Memory Affinity Structure by the memory
> proxmity domain. Add a helper to match the cxl range to the SRAT memory
> range in order to retrieve the cache size.
> 
> There are several places that checks the cxl region range against the
> decoder range. Use new helper to check between the two ranges and address
> the new cache size.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> ---
> v4:
> - Add adjustment for cxl_dpa_to_hpa() (Alison)
> - Add check of adjusted region start against CFMWS. (Alison)
> - Update dev_warn() to be more precise. (Alison)
> ---
>  drivers/acpi/numa/hmat.c  | 39 +++++++++++++++++++
>  drivers/cxl/core/Makefile |  1 +
>  drivers/cxl/core/acpi.c   | 11 ++++++
>  drivers/cxl/core/core.h   |  3 ++
>  drivers/cxl/core/region.c | 80 ++++++++++++++++++++++++++++++++++++---
>  drivers/cxl/cxl.h         |  2 +
>  include/linux/acpi.h      | 11 ++++++
>  tools/testing/cxl/Kbuild  |  1 +
>  8 files changed, 143 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/cxl/core/acpi.c
> 
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 2630511937f5..9d9052258e92 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -108,6 +108,45 @@ static struct memory_target *find_mem_target(unsigned int mem_pxm)
>  	return NULL;
>  }
>  
> +/**
> + * hmat_get_extended_linear_cache_size - Retrieve the extended linear cache size
> + * @backing_res: resource from the backing media
> + * @nid: node id for the memory region
> + * @cache_size: (Output) size of extended linear cache.
> + *
> + * Return: 0 on success. Errno on failure.
> + *
> + */
> +int hmat_get_extended_linear_cache_size(struct resource *backing_res, int nid,
> +					resource_size_t *cache_size)
> +{
> +	unsigned int pxm = node_to_pxm(nid);
> +	struct memory_target *target;
> +	struct target_cache *tcache;
> +	struct resource *res;
> +
> +	target = find_mem_target(pxm);
> +	if (!target)
> +		return -ENOENT;

Remember this -ENOENT.

> +
> +	list_for_each_entry(tcache, &target->caches, node) {
> +		if (tcache->cache_attrs.address_mode !=
> +				NODE_CACHE_ADDR_MODE_EXTENDED_LINEAR)
> +			continue;
> +
> +		res = &target->memregions;
> +		if (!resource_contains(res, backing_res))
> +			continue;
> +
> +		*cache_size = tcache->cache_attrs.size;
> +		return 0;
> +	}
> +
> +	*cache_size = 0;
> +	return 0;
> +}

snip

> +static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
> +					    struct resource *res)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> +	struct cxl_region_params *p = &cxlr->params;
> +	int nid = phys_to_target_node(res->start);
> +	resource_size_t size, cache_size, start;
> +	int rc;
> +
> +	size = resource_size(res);
> +	if (!size)
> +		return -EINVAL;
> +
> +	rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size);
> +	if (rc)
> +		return rc;

Remember this too: the -ENOENT is passed through here.

> +
> +	if (!cache_size)
> +		return 0;
> +
> +	if (size != cache_size) {
> +		dev_warn(&cxlr->dev, "Extended Linear Cache is not 1:1, unsupported!");

Maybe emit the numbers (region size and cache size), so that in the next
step we know what the mismatch was and how much cache goes unused.
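
To illustrate, here is a rough userspace sketch of the message text I have
in mind. The helper name format_cache_mismatch() is hypothetical and only
models the string; in the driver this would stay a dev_warn() call, most
likely printing the resource_size_t values with %pa.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Userspace model of the suggested warning text. The function name is
 * hypothetical; the kernel code would call dev_warn() directly.
 * Returns the number of characters snprintf() would have written.
 */
static int format_cache_mismatch(char *buf, size_t len,
				 uint64_t cxl_size, uint64_t cache_size)
{
	return snprintf(buf, len,
			"Extended Linear Cache is not 1:1 (cxl size %#" PRIx64
			", cache size %#" PRIx64 "), unsupported!\n",
			cxl_size, cache_size);
}
```

With both sizes in the log, a mismatch report can be read without
re-deriving the values from the HMAT and the region resource.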

> +		return -EOPNOTSUPP;

-EOPNOTSUPP is another possible rc here, alongside the -EINVAL above and
the -ENXIO below.

> +	}
> +
> +	/*
> +	 * Move the start of the range to where the cache range starts. The
> +	 * implementation assumes that the cache range is in front of the
> +	 * CXL range. This is not dictated by the HMAT spec but is how the
> +	 * current known implementation is configured.
> +	 *
> +	 * The cache range is expected to be within the CFMWS. The adjusted
> +	 * res->start should not be less than cxlrd->res->start.
> +	 */
> +	start = res->start - cache_size;
> +	if (start < cxlrd->res->start)
> +		return -ENXIO;
> +
> +	res->start = start;
> +	p->cache_size = cache_size;
> +
> +	return 0;
> +}
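
For reference, the start adjustment described in the comment above can be
modelled in userspace as below. The function and parameter names are
hypothetical. The extra cache_size > res_start check is my own addition to
guard against unsigned wraparound in the subtraction; it is not part of the
patch and may well be unnecessary given the CFMWS constraint.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Userspace model of the start adjustment. Names are hypothetical.
 * res_start:   start of the CXL region resource
 * cache_size:  extended linear cache size reported by HMAT
 * cfmws_start: start of the root decoder (CFMWS) window
 * On success, *new_start is the region start including the cache.
 */
static int elc_adjust_start(uint64_t res_start, uint64_t cache_size,
			    uint64_t cfmws_start, uint64_t *new_start)
{
	/* Assumption: reject a subtraction that would wrap around. */
	if (cache_size > res_start)
		return -ENXIO;

	/* The cache range is expected to sit inside the CFMWS window. */
	if (res_start - cache_size < cfmws_start)
		return -ENXIO;

	*new_start = res_start - cache_size;
	return 0;
}
```
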
> +
>  /* Establish an empty region covering the given HPA range */
>  static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>  					   struct cxl_endpoint_decoder *cxled)
> @@ -3270,6 +3328,18 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>  
>  	*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
>  				    dev_name(&cxlr->dev));
> +
> +	rc = cxl_extended_linear_cache_resize(cxlr, res);
> +	if (rc) {
> +		/*
> +		 * Failing to support extended linear cache region resize does not
> +		 * prevent the region from functioning. Only causes cxl list showing
> +		 * incorrect region size.
> +		 */
> +		dev_warn(cxlmd->dev.parent,
> +			 "Extended linear cache calculation failed.\n");
> +	}
> +

Do you want to handle -ENOENT the same as -EOPNOTSUPP?
Please include rc in the error message.
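
As a rough sketch of what I mean, modelled in userspace: the helper
elc_classify() and the enum are hypothetical, and treating -ENOENT as
"nothing to report" is only one possible policy, not what the patch does
today.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical policy: which return codes deserve a warning at all. */
enum elc_report { ELC_SILENT, ELC_WARN };

/*
 * Model of the caller's handling of the resize return code.
 * -ENOENT means no HMAT memory target was found for the node, which
 * may simply mean there is no extended linear cache to describe.
 * Keeping that case silent is an assumption made for illustration.
 */
static enum elc_report elc_classify(int rc, char *msg, size_t len)
{
	if (rc == 0 || rc == -ENOENT) {
		if (len)
			msg[0] = '\0';
		return ELC_SILENT;
	}

	/* Every other failure is reported together with its rc. */
	snprintf(msg, len,
		 "Extended linear cache calculation failed rc:%d\n", rc);
	return ELC_WARN;
}
```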

I'm still confused about this message and the state behind it.

Is this saying the region functions, but only at its smaller CXL resource
size, or is it saying the region functions at its extended size?

Is there any 'thanks for the cache, but we cannot use it' type of
response to ACPI? I'm wondering whether the cache can be reclaimed.

snip