Re: [PATCH] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 8, 2021 at 6:45 PM <alison.schofield@xxxxxxxxx> wrote:
>
> From: Alison Schofield <alison.schofield@xxxxxxxxx>
>
> During NUMA init, CXL memory defined in the SRAT Memory Affinity
> subtable may be assigned to a NUMA node. Since there is no
> requirement that the SRAT be comprehensive for CXL memory another
> mechanism is needed to assign NUMA nodes to CXL memory not identified
> in the SRAT.
>
> Use the CXL Fixed Memory Window Structure's (CFMWS) of the ACPI CXL
> Early Discovery Table (CEDT) to find all CXL memory ranges. Create a
> NUMA node for each range that is not already assigned to a NUMA node.
> Add a memblk attaching its host physical address range to the node.
>
> Note that these ranges may not actually map any memory at boot time.
> They may describe persistent capacity or may be present to enable
> hot-plug.
>
> Consumers can use phys_to_target_node() to discover the NUMA node.
>
> Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> ---
>  drivers/acpi/numa/srat.c | 58 ++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/acpi.c       |  8 +++---
>  2 files changed, 63 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index b8795fc49097..568e033e6c3f 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -300,6 +300,61 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  }
>  #endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
>
> +/* Add a NUMA node and memblk for each node-less CFMWS */
> +static int __init acpi_parse_cfmws(struct acpi_table_header *acpi_cedt)
> +{
> +       struct acpi_cedt_cfmws *cfmws;
> +       acpi_size len, cur = 0;
> +       void *cedt_subtable;
> +       int i, pxm, node;
> +       u64 start, end;
> +
> +       /* Use fake PXM values starting after the max PXM found in the SRAT */
> +       for (i = 0; i < MAX_PXM_DOMAINS - 1; i++)
> +               if (node_to_pxm_map[i] > pxm)
> +                       pxm = node_to_pxm_map[i];
> +       pxm++;
> +
> +       len = acpi_cedt->length - sizeof(*acpi_cedt);
> +       cedt_subtable = acpi_cedt + 1;
> +
> +       while (cur < len) {
> +               struct acpi_cedt_header *c = cedt_subtable + cur;
> +
> +               if (c->type != ACPI_CEDT_TYPE_CFMWS)
> +                       goto next;
> +
> +               cfmws = cedt_subtable + cur;
> +               if (cfmws->header.length < sizeof(*cfmws)) {
> +                       pr_warn_once("CFMWS entry skipped:invalid length:%u\n",
> +                                    cfmws->header.length);
> +                       goto next;
> +               }
> +
> +               start = cfmws->base_hpa;
> +               end = cfmws->base_hpa + cfmws->window_size;
> +
> +               /* Skip if the HPA is already assigned to a NUMA node */

Per below note about SRAT parsing dependency, perhaps change this to:

/* Skip if the SRAT already described the NUMA details for this HPA */


> +               node = phys_to_target_node(start);
> +               if (node != NUMA_NO_NODE)
> +                       goto next;
> +
> +               node = acpi_map_pxm_to_node(pxm);
> +               if (node == NUMA_NO_NODE) {
> +                       pr_err("ACPI NUMA: Too many proximity domains.\n");

I think this might be too harsh especially for nodes that are only
there to support potential future CXL hotplug.

Let's just report and continue for now because the CXL driver itself
can decide on a fallback node to hot-add CXL memory that the BIOS did
not setup.

> +                       return -EINVAL;
> +               }
> +               if (numa_add_memblk(node, start, end) < 0) {
> +                       pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> +                               node, start, end);

I think this error message is most likely to trigger in the case that
MAX_NUMNODES is exhausted, so the address range information should be
reported in the acpi_map_pxm_to_node() failure case.

> +               }
> +               pxm++;
> +next:
> +               cur += c->length;
> +       }
> +       return 0;
> +}
> +
>  static int __init acpi_parse_slit(struct acpi_table_header *table)
>  {
>         struct acpi_table_slit *slit = (struct acpi_table_slit *)table;
> @@ -478,6 +533,9 @@ int __init acpi_numa_init(void)
>         /* SLIT: System Locality Information Table */
>         acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>
> +       /* CEDT: CXL Early Discovery Table */

It might be worth a comment here that this *must* come after SRAT
parsing so that the CXL ranges that the BIOS already incorporated into
the system memory map can be skipped.

> +       acpi_table_parse(ACPI_SIG_CEDT, acpi_parse_cfmws);
> +
>         if (cnt < 0)
>                 return cnt;
>         else if (!parsed_numa_memblks)
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index dadc7f64b9ff..3798841c3418 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -144,9 +144,11 @@ static void cxl_add_cfmws_decoders(struct device *dev,
>                                 cfmws->window_size - 1);
>                         goto next;
>                 }
> -               dev_dbg(dev, "add: %s range %#llx-%#llx\n",
> -                       dev_name(&cxld->dev), cfmws->base_hpa,
> -                       cfmws->base_hpa + cfmws->window_size - 1);
> +               dev_dbg(dev, "add:%s node:%d range:%#llx-%#llx restrict:%#x\n",
> +                       dev_name(&cxld->dev),
> +                       phys_to_target_node(cxld->range.start),
> +                       cxld->range.start, cxld->range.end,
> +                       cfmws->restrictions);
>  next:
>                 cur += c->length;
>         }
> --
> 2.31.1
>



[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux