Re: [PATCH] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 11, 2021 at 2:52 PM Alison Schofield
<alison.schofield@xxxxxxxxx> wrote:
>
> Thanks for the review Ira!
>
> On Mon, Oct 11, 2021 at 10:13:16AM -0700, Ira Weiny wrote:
> > On Fri, Oct 08, 2021 at 06:53:39PM -0700, Schofield, Alison wrote:
> > > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> > >
> > > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > > subtable may be assigned to a NUMA node. Since there is no
> > > requirement that the SRAT be comprehensive for CXL memory another
> > > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > > in the SRAT.
> > >
> > > Use the CXL Fixed Memory Window Structure's (CFMWS) of the ACPI CXL
> > > Early Discovery Table (CEDT) to find all CXL memory ranges. Create a
> > > NUMA node for each range that is not already assigned to a NUMA node.
> > > Add a memblk attaching its host physical address range to the node.
> > >
> > > Note that these ranges may not actually map any memory at boot time.
> > > They may describe persistent capacity or may be present to enable
> > > hot-plug.
> > >
> > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> > > ---
> > >  drivers/acpi/numa/srat.c | 58 ++++++++++++++++++++++++++++++++++++++++
> > >  drivers/cxl/acpi.c       |  8 +++---
> > >  2 files changed, 63 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> > > index b8795fc49097..568e033e6c3f 100644
> > > --- a/drivers/acpi/numa/srat.c
> > > +++ b/drivers/acpi/numa/srat.c
> > > @@ -300,6 +300,61 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> > >  }
> > >  #endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
> > >
> > > +/* Add a NUMA node and memblk for each node-less CFMWS */
> >
> > IMO this comment does not make sense for a function called 'parse cfmws'.  I
> > would just drop it.
> >
>
> Agree. Dropping comment.
>
> You made me stare at this func name more, and it can be better.
> I'll try this: acpi_cxl_cfmws_init()
> to be more like the others in this file, doing similar work.
>
> > > +static int __init acpi_parse_cfmws(struct acpi_table_header *acpi_cedt)
> > > +{
>
> snip
>
>
> > > +           if (numa_add_memblk(node, start, end) < 0) {
> > > +                   pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> > > +                           node, start, end);
> >
> > Is there anything which could go wrong if the block is not added to the numa
> > node?
> >
> > Ira
> >
>
> I intended, but failed, to mimic the adding of nodes & memblocks based
> on the SRAT Memory Affinity table. There, a failure from numa_add_memblk()
> causes the entire SRAT parsing to fail, and IIUC entire acpi_numa init fails.
>
> FYI: numa_add_memblk() only completely fails (-EINVAL) if we've used up all
> the NR_NODE_MEMBLKS (which is set to MAX_NUMNODE*2)

If you're bumping up against that limit then this would have already
failed above at the acpi_map_pxm_to_node() call, right?

> My first guess would be to follow the pattern and fail the entire
> acpi_numa init. ie make acpi_numa = -1; I'll pursue that.

I think CFMWS ranges for future hotplug are in a different class than
memory currently present and marked online by the SRAT. So as long as
this implementation can determine that the range is for future
hotplug, which I believe the phys_to_target_node() pre-check handles,
then I think it is safe to note the failure and continue. With the
expectation that the CXL driver needs to handle the NUMA_NO_NODE case
at memory hot-add time.

> Now, back to your specific question: "Is there anything which could
> go wrong if the block is not added to the numa node?"
>
> numa_add_memblk(_to) can actually return success without adding a memblock.
> Like this:
> /* whine about and ignore invalid blks */
> if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
>         pr_warn("Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>                 nid, start, end - 1);
>         return 0;
> }
>
> So, I'm going to make an assumption that it can be handled, and see if
> someone else chimes in here with more knowledge of why we can quietly
> ignore.

I wouldn't say quietly ignore as the failure here means that the
system cannot accurately place CXL memory for best performance.
There's already so many other ways that CXL memory dynamic setup can
go sideways that the lack of a dedicated NUMA node is not necessarily
fatal... in my opinion. It is a judgment call, but I think Robustness
Principle nudges it towards: "continue and try to make the best of
it".



[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux