Re: [PATCH] cxl: Update Soft Reserved resources upon region creation

Nathan Fontenot wrote:
> Update handling of SOFT RESERVE iomem resources that intersect with
> CXL region resources to remove the intersections from the SOFT RESERVE
> resources. The current approach of leaving the SOFT RESERVE
> resource as is can cause failures during hotplug replace of CXL
> devices because the resource is not available for reuse after
> teardown of the CXL device.
> 
> The approach is to trim out any pieces of SOFT RESERVE resources
> that intersect CXL regions. To do this, first set aside any SOFT RESERVE
> resources that intersect with a CFMWS into a separate resource tree
> during e820__reserve_resources_late() that would have been otherwise
> added to the iomem resource tree.
> 
> As CXL regions are created the cxl resource created for the new
> region is used to trim intersections from the SOFT RESERVE
> resources that were previously set aside.
> 
> Once CXL device probe has completed any remaining SOFT RESERVE resources
> are added to the iomem resource tree. As each resource
> is added to the iomem resource tree a new notifier chain is invoked
> to notify the dax driver of newly added SOFT RESERVE resources so that
> the dax driver can consume them.

Hi Nathan, this patch hit on all the mechanisms I would expect, but upon
reading it there is an opportunity to zoom out and do something blunter
than the surgical precision of this current proposal.

In other words, I appreciate the consideration of potential corner
cases, but for overall maintainability this should aim to be an all or
nothing approach.

Specifically, at the first sign of trouble (any CXL sub-driver probe
failure or region enumeration timeout), the entire CXL topology should
be torn down (trigger the equivalent of ->remove() on the ACPI0017
device), and the deferred Soft Reserved ranges registered as if cxl_acpi
was not present (implement a fallback equivalent to
hmem_register_devices()).

No need to trim resources as regions arrive, just tear down everything
set up in the cxl_acpi_probe() path with devres_release_all().

So, I am thinking export a flag from the CXL core that indicates whether
any conflict with platform-firmware established CXL regions has
occurred.

Read that flag from a cxl_acpi-driver-launched deferred workqueue that
is awaiting initial device probing to quiesce. If that flag indicates a
CXL enumeration failure then trigger devres_release_all() on the
ACPI0017 platform device and follow that up by walking the deferred Soft
Reserved resources to register raw (unparented by CXL regions) dax
devices.
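
To make that flow concrete, here is a minimal userspace C model of it,
not kernel code. Everything named here is hypothetical and invented for
illustration: the flag cxl_enum_failed, struct sr_range, and the helpers
cxl_note_enum_failure(), cxl_teardown_topology() and cxl_settle(). In
the kernel the teardown step would be devres_release_all() on the
ACPI0017 device and the loop would register real dax devices.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one deferred Soft Reserved range. */
struct sr_range {
	unsigned long long start;
	unsigned long long end;
	bool registered_as_dax;
};

/* Hypothetical stand-in for the flag exported by the CXL core. */
static bool cxl_enum_failed;
/* Models whether the cxl_acpi-established topology is still alive. */
static bool cxl_topology_present = true;

/* Any sub-driver probe failure or region conflict would call this. */
static void cxl_note_enum_failure(void)
{
	cxl_enum_failed = true;
}

/* Stand-in for devres_release_all() on the ACPI0017 device. */
static void cxl_teardown_topology(void)
{
	cxl_topology_present = false;
}

/*
 * Models the deferred work that runs once initial probing has quiesced.
 * On failure: tear down everything, then hand every deferred range to
 * raw dax. Returns the number of ranges registered as raw dax devices.
 */
static size_t cxl_settle(struct sr_range *ranges, size_t nr)
{
	size_t i;

	if (!cxl_enum_failed)
		return 0;

	cxl_teardown_topology();
	for (i = 0; i < nr; i++)
		ranges[i].registered_as_dax = true;
	return nr;
}
```

Note there is no per-region trimming anywhere in the model: either the
CXL topology survives intact, or all of it goes away and every deferred
range falls back to raw dax.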

Some more comments below:

> Signed-off-by: Nathan Fontenot <nathan.fontenot@xxxxxxx>
> ---
>  arch/x86/kernel/e820.c    |  17 ++++-
>  drivers/cxl/core/region.c |   8 +-
>  drivers/cxl/port.c        |  15 ++++
>  drivers/dax/hmem/device.c |  13 ++--
>  drivers/dax/hmem/hmem.c   |  15 ++++
>  drivers/dax/hmem/hmem.h   |  11 +++
>  include/linux/dax.h       |   4 -
>  include/linux/ioport.h    |   6 ++
>  kernel/resource.c         | 155 +++++++++++++++++++++++++++++++++++++-
>  9 files changed, 229 insertions(+), 15 deletions(-)
>  create mode 100644 drivers/dax/hmem/hmem.h
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 4893d30ce438..cab82e9324a5 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1210,14 +1210,23 @@ static unsigned long __init ram_alignment(resource_size_t pos)
>  
>  void __init e820__reserve_resources_late(void)
>  {
> -	int i;
>  	struct resource *res;
> +	int i;
>  
> +	/*
> +	 * Prior to inserting SOFT_RESERVED resources we want to check for an
> +	 * intersection with potential CXL resources. Any SOFT_RESERVED resources
> +	 * that do intersect a potential CXL resource are set aside so they
> +	 * can be trimmed to accommodate CXL resource intersections and added to
> +	 * the iomem resource tree after the CXL drivers have completed their
> +	 * device probe.

Perhaps shorten to "see hmem_register_devices() and cxl_acpi_probe() for
deferred initialization of Soft Reserved ranges"

> +	 */
>  	res = e820_res;
> -	for (i = 0; i < e820_table->nr_entries; i++) {
> -		if (!res->parent && res->end)
> +	for (i = 0; i < e820_table->nr_entries; i++, res++) {
> +		if (res->desc == IORES_DESC_SOFT_RESERVED)
> +			insert_soft_reserve_resource(res);

I would only expect this deferral to happen when CONFIG_DEV_DAX_HMEM
and/or CONFIG_CXL_REGION is enabled. It also needs to catch Soft
Reserved deferral on other, non-e820 based, archs. So, maybe this hackery
should be done internal to insert_resource_*(). Something like: every
insert_resource() of an IORES_DESC_SOFT_RESERVED resource is deferred
until a flag is flipped, allowing future insertion attempts to succeed
in adding them to the iomem_resource tree.

Not that I expect this problem will ever affect more than just CXL, but
it is already the case that Soft Reserved is set for more than just CXL
ranges, and who knows what other backend Soft Reserved consumer drivers
might arrive later.

When CXL or HMEM parses the deferred entries they can take
responsibility for injecting the Soft Reserved entries. That achieves
continuity of the /proc/iomem contents across kernel versions while
giving those endpoint drivers the ability to unregister those resources.
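
As a rough illustration of the "defer inside insert_resource_*()" idea,
here is a small userspace C model, not kernel code. The types and names
(struct res, enum res_desc, model_insert_resource(),
model_release_soft_reserved(), the fixed-size arrays) are hypothetical
stand-ins; the real resource tree is hierarchical and locked, which this
sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IORES_DESC_* values. */
enum res_desc { RES_DESC_NONE, RES_DESC_SOFT_RESERVED };

struct res {
	unsigned long long start;
	unsigned long long end;
	enum res_desc desc;
};

#define MAX_RES 16

/* Flat arrays model the iomem tree and the set-aside list. */
static struct res *iomem_tree[MAX_RES];
static size_t iomem_count;
static struct res *deferred[MAX_RES];
static size_t deferred_count;

/* The flag that gates Soft Reserved insertion. */
static bool soft_reserve_allowed;

/* Soft Reserved entries are parked until the flag is flipped. */
static int model_insert_resource(struct res *r)
{
	if (r->desc == RES_DESC_SOFT_RESERVED && !soft_reserve_allowed) {
		if (deferred_count == MAX_RES)
			return -1;
		deferred[deferred_count++] = r;
		return 0;
	}
	if (iomem_count == MAX_RES)
		return -1;
	iomem_tree[iomem_count++] = r;
	return 0;
}

/*
 * Called by the consumer (CXL or HMEM in the text above) once it has
 * decided what to do: flip the flag and inject the parked entries.
 * Returns how many entries were moved into the tree.
 */
static size_t model_release_soft_reserved(void)
{
	size_t i, moved = 0;

	soft_reserve_allowed = true;
	for (i = 0; i < deferred_count; i++)
		if (model_insert_resource(deferred[i]) == 0)
			moved++;
	deferred_count = 0;
	return moved;
}
```

The point of gating at the insertion helper is that the e820 caller and
any non-e820 arch go through the same path, so neither needs to know the
deferral exists.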

> +		else if (!res->parent && res->end)
>  			insert_resource_expand_to_fit(&iomem_resource, res);
> -		res++;
>  	}
>  
>  	/*
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 21ad5f242875..c458a6313b31 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3226,6 +3226,12 @@ static int match_region_by_range(struct device *dev, void *data)
>  	return rc;
>  }
>  
> +static int insert_region_resource(struct resource *parent, struct resource *res)
> +{
> +	trim_soft_reserve_resources(res);
> +	return insert_resource(parent, res);
> +}

Per above, let's not do dynamic trimming. It is all-or-nothing CXL
memory enumeration if the driver is trying and failing to parse any part
of the BIOS-established CXL configuration.

Yes, this could result in regressions in the other direction, but my
expectation is that the vast majority of CXL memory present at boot is
meant to be indistinguishable from DDR. In other words the current
default of "lose access to memory upon CXL enumeration failure that is
otherwise fully described by the EFI Memory Map" is the wrong default
policy.

> +
>  /* Establish an empty region covering the given HPA range */
>  static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>  					   struct cxl_endpoint_decoder *cxled)
> @@ -3272,7 +3278,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>  
>  	*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
>  				    dev_name(&cxlr->dev));
> -	rc = insert_resource(cxlrd->res, res);
> +	rc = insert_region_resource(cxlrd->res, res);
>  	if (rc) {
>  		/*
>  		 * Platform-firmware may not have split resources like "System
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index d7d5d982ce69..4461f2a80d72 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -89,6 +89,20 @@ static int cxl_switch_port_probe(struct cxl_port *port)
>  	return -ENXIO;
>  }
>  
> +static void cxl_sr_update(struct work_struct *w)
> +{
> +	merge_soft_reserve_resources();
> +}
> +
> +DECLARE_DELAYED_WORK(cxl_sr_work, cxl_sr_update);
> +
> +static void schedule_soft_reserve_update(void)
> +{
> +	int timeout = 5 * HZ;
> +
> +	mod_delayed_work(system_wq, &cxl_sr_work, timeout);
> +}

For cases where there is Soft Reserved CXL backed memory it should be
sufficient to just wait for initial device probing to complete. So I
would just have cxl_acpi_probe() call wait_for_device_probe() in a
workqueue, rather than try to guess at a timeout. If anything, waiting
for driver core deferred probing timeout seems a good time to ask "are
we missing any CXL memory ranges?".
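
A toy single-threaded C model of "run the work when probing quiesces"
versus "guess a timeout" might look like the following. It is only a
sketch: the counter and the model_*() helpers are hypothetical names,
and in the kernel the blocking wait would be wait_for_device_probe()
called from a work item rather than this event-driven check.

```c
#include <stdbool.h>

/* Hypothetical model: number of device probes still running. */
static int probes_in_flight;
/* The settle work has been queued but has not run yet. */
static bool settle_work_queued;
/* The settle work has run. */
static bool settle_work_done;

/* Run the queued work only once no probe is in flight. */
static void run_settle_work_if_idle(void)
{
	if (settle_work_queued && probes_in_flight == 0) {
		settle_work_queued = false;
		settle_work_done = true;
	}
}

static void model_probe_start(void)
{
	probes_in_flight++;
}

static void model_probe_finish(void)
{
	if (probes_in_flight > 0)
		probes_in_flight--;
	run_settle_work_if_idle();
}

/* What cxl_acpi_probe() would do in place of arming a 5 second timer. */
static void model_queue_settle_work(void)
{
	settle_work_queued = true;
	run_settle_work_if_idle();
}
```

The work is keyed to probe completion, so a slow endpoint delays it and
a fast system does not sit idle waiting for a timer to expire.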

> +
>  static int cxl_endpoint_port_probe(struct cxl_port *port)
>  {
>  	struct cxl_endpoint_dvsec_info info = { .port = port };
> @@ -140,6 +154,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
>  	 */
>  	device_for_each_child(&port->dev, root, discover_region);
>  
> +	schedule_soft_reserve_update();
>  	return 0;
>  }
>  
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index f9e1a76a04a9..c45791ad4858 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -4,6 +4,7 @@
>  #include <linux/module.h>
>  #include <linux/dax.h>
>  #include <linux/mm.h>
> +#include "hmem.h"
>  
>  static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
> @@ -17,6 +18,9 @@ static struct resource hmem_active = {
>  	.flags = IORESOURCE_MEM,
>  };
>  
> +struct platform_device *hmem_pdev;
> +EXPORT_SYMBOL_GPL(hmem_pdev);
> +
>  int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>  {
>  	struct resource *res;
> @@ -35,7 +39,6 @@ EXPORT_SYMBOL_GPL(walk_hmem_resources);
>  
>  static void __hmem_register_resource(int target_nid, struct resource *res)
>  {
> -	struct platform_device *pdev;
>  	struct resource *new;
>  	int rc;
>  
> @@ -51,15 +54,15 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>  	if (platform_initialized)
>  		return;
>  
> -	pdev = platform_device_alloc("hmem_platform", 0);
> -	if (!pdev) {
> +	hmem_pdev = platform_device_alloc("hmem_platform", 0);
> +	if (!hmem_pdev) {
>  		pr_err_once("failed to register device-dax hmem_platform device\n");
>  		return;
>  	}
>  
> -	rc = platform_device_add(pdev);
> +	rc = platform_device_add(hmem_pdev);
>  	if (rc)
> -		platform_device_put(pdev);
> +		platform_device_put(hmem_pdev);
>  	else
>  		platform_initialized = true;

So, I don't think anyone actually cares which device parents a dax
device. It would be cleaner if cxl_acpi registered the Soft Reserved dax
devices that the hmem driver was told to skip.

That change eliminates the need for a notifier to trigger the hmem
driver to add devices after a CXL enumeration failure.

[ .. trim all the fine grained resource handling and notifier code .. ]

The end result of this effort is that the Linux CXL subsystem will
aggressively complain and refuse to run with platforms and devices that
deviate from common expectations. That gives space for Soft Reserved
generic support to fill some gaps while quirks, hacks, and workarounds
are developed to compensate for these deviations. Otherwise it has been
a constant drip of "what in the world is that platform doing?", and the
current policy of "try to depend on standard CXL enumeration" is too
fragile.



