[ add Mike, see "[Mike]" note below... ]

Alison Schofield wrote:
> On Sat, Jun 03, 2023 at 04:53:13PM -0700, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> > >
> > > numa_fill_memblks() fills in the gaps in numa_meminfo memblks
> > > over an HPA address range.
> > >
> > > The initial use case is the ACPI driver that needs to extend
> > > SRAT defined proximity domains to an entire CXL CFMWS Window[1].
> >
> > I feel like this demands more explanation because the "need" is not
> > apparent. In fact it's a Linux policy choice, not a requirement. The next
> > patch has some of this, but this story is needed earlier for someone
> > that reads this patch first. Something like:
> >
>
> Hi Dan,
>
> Thanks for the review :)
>
> Sure, I can add the story below to make the 'need' for this function
> more apparent, as well as s/needs/want so as not to conflate need with
> requirement.
>
> > ---
> >
> > The CFMWS is an ACPI data structure that indicates *potential* locations
> > where CXL memory can be placed. It is the playground where the CXL
> > driver has free rein to establish regions. That space can be populated
> > by BIOS created regions, or driver created regions, after hotplug or
> > other reconfiguration.
> >
> > When the BIOS creates a region in a CXL Window it additionally describes
> > that subset of the Window range in the other typical ACPI tables SRAT,
> > SLIT, and HMAT. The rationale for the BIOS not pre-describing the entire
> > CXL Window in SRAT, SLIT, and HMAT is that it cannot predict the
> > future. I.e. there is nothing stopping higher or lower performance
> > devices being placed in the same Window. Compare that to ACPI memory
> > hotplug that just onlines additional capacity in the proximity domain
> > with little freedom for dynamic performance differentiation.
> >
> > That leaves the OS with a choice: should unpopulated window capacity
> > match the proximity domain of an existing region, or should it allocate
> > a new one? This patch takes the simple position of minimizing proximity
> > domain proliferation and reusing any proximity domain intersection for
> > the entire Window. If the Window has no intersections then allocate a new
> > proximity domain. Note that SRAT, SLIT and HMAT information can be
> > enumerated dynamically in a standard way from device provided data.
> > Think of CXL as the end of ACPI needing to describe memory attributes,
> > CXL offers a standard discovery model for performance attributes, but
> > Linux still needs to interoperate with the old regime.
> >
> > ---
> >
> > >
> > > The APCI driver expects to use numa_fill_memblks() while parsing
> >
> > s/APCI/ACPI/
> >
> > Again, the ACPI code does not have any expectation, this is a pure OS
> > policy decision about how to handle undescribed memory.
> >
>
> The intent was to show the pending use case, perhaps 'wants to' use
> this function to enact a purely OS policy decision!

Sounds good, yeah I tend to read "need" as a requirement and assume that
Linux is out of spec or something breaks if it does not do the needed
thing.

> >
> > > the CFMWS. Extending the memblks created during SRAT parsing, to
> > > cover the entire CFMWS Window, is desirable because everything in
> > > a CFMWS Window is expected to be of a similar performance class.
> > >
> > > Requires CONFIG_NUMA_KEEP_MEMINFO.
> >
> > Not sure this adds anything to the description.
> >
> > >
> > > [1] A CXL CFMWS Window represents a contiguous CXL memory resource,
> > > aka an HPA range. The CFMWS (CXL Fixed Memory Window Structure) is
> > > part of the ACPI CEDT (CXL Early Discovery Table).
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> > > ---
> > >  arch/x86/include/asm/sparsemem.h |  2 +
> > >  arch/x86/mm/numa.c               | 82 ++++++++++++++++++++++++++++++++
> > >  include/linux/numa.h             |  7 +++
> > >  3 files changed, 91 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
> > > index 64df897c0ee3..1be13b2dfe8b 100644
> > > --- a/arch/x86/include/asm/sparsemem.h
> > > +++ b/arch/x86/include/asm/sparsemem.h
> > > @@ -37,6 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
> > >  #define phys_to_target_node phys_to_target_node
> > >  extern int memory_add_physaddr_to_nid(u64 start);
> > >  #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
> > > +extern int numa_fill_memblks(u64 start, u64 end);
> > > +#define numa_fill_memblks numa_fill_memblks
> >
> > What is this for? The other defines are due to being an arch-specific
> > API and the #define is how the arch declares that it has a local version
> > to replace the generic one.
>
> That define, along with the numa.h change below, is there to support builds
> of CONFIG_ARM64 and CONFIG_LOONGARCH, both of which include
> acpi_parse_cfmws(), the caller of numa_fill_memblks().

[Mike]

Hmm, ok, but this is piling onto the maintenance burden of x86 not
getting on board with memblock for numa info yet. At a minimum that
avoidance of touching the ARM64 and LOONGARCH cases needs to be called
out, but it would be useful to have a discussion about the options here
with questions like:

- What's blocking x86 from switching to memblock?

- Or, does the memblock API support what numa_fill_memblks() wants to
  do? I.e. add a real numa_fill_memblks() implementation to
  drivers/base/arch_numa.c rather than skip SRAT based fixups for the
  generic case.

Last I remember it was the conceptual disconnect of x86 not marking
Reserved ranges as memory like other architectures:

https://lore.kernel.org/all/20200708091520.GE128651@xxxxxxxxxx/

...but it's been a while since this last came up and I have not been
following memblock developments. Maybe the answer is the same in the
end, add an x86-specific numa_fill_memblks(), but this is as good a time
as any to revisit carrying that burden.

[..]

snipped the rest, looks like we are aligned there.
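
For anyone reading this patch first, the gap-filling behavior the changelog describes ("fills in the gaps in numa_meminfo memblks over an HPA address range") can be modeled in userspace roughly as below. This is an illustrative sketch, not the code from the patch: struct blk, MAX_BLKS, cmp_blk_ptr() and fill_blks() are hypothetical names, ranges are assumed to be half-open [start, end), and the -1 return is an assumed way of signalling "no intersection", which is the case where the policy discussed above would allocate a new proximity domain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a numa_meminfo memblk: [start, end) on node nid. */
struct blk {
	uint64_t start;
	uint64_t end;	/* exclusive (an assumption of this sketch) */
	int nid;
};

#define MAX_BLKS 16	/* arbitrary limit for the sketch */

/* Order block pointers by ascending start address. */
static int cmp_blk_ptr(const void *a, const void *b)
{
	const struct blk *x = *(const struct blk *const *)a;
	const struct blk *y = *(const struct blk *const *)b;

	return (x->start > y->start) - (x->start < y->start);
}

/*
 * Extend the blocks that intersect [start, end) so that together they
 * cover the whole range. Blocks outside the range are left untouched.
 * Returns 0 on success, -1 if no block intersects the range.
 */
static int fill_blks(struct blk *blks, int nr, uint64_t start, uint64_t end)
{
	struct blk *hit[MAX_BLKS];
	uint64_t prev_end;
	int count = 0;

	assert(start < end);

	/* Collect pointers to intersecting blocks (at most MAX_BLKS). */
	for (int i = 0; i < nr && count < MAX_BLKS; i++)
		if (blks[i].start < end && start < blks[i].end)
			hit[count++] = &blks[i];
	if (!count)
		return -1;

	qsort(hit, count, sizeof(hit[0]), cmp_blk_ptr);

	/* Make the first and last blocks reach the edges of the range. */
	if (hit[0]->start > start)
		hit[0]->start = start;
	if (hit[count - 1]->end < end)
		hit[count - 1]->end = end;

	/* Close interior gaps by pulling each start back to the covered end. */
	prev_end = hit[0]->end;
	for (int i = 1; i < count; i++) {
		if (hit[i]->start > prev_end)
			hit[i]->start = prev_end;
		if (hit[i]->end > prev_end)
			prev_end = hit[i]->end;
	}
	return 0;
}
```

With blocks [0x10000, 0x20000) and [0x30000, 0x40000) and a window of [0x8000, 0x50000), this sketch grows the first block down to 0x8000, pulls the second block's start back to 0x20000 and extends its end to 0x50000, so each existing node id is reused and no new one is created.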