alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@xxxxxxxxx>
>
> numa_fill_memblks() fills in the gaps in numa_meminfo memblks
> over an HPA address range.
>
> The ACPI driver will use numa_fill_memblks() to implement a new Linux
> policy that prescribes extending proximity domains in a portion of a
> CFMWS window to the entire window.
>
> Dan Williams offered this explanation of the policy:
> A CFWMS is an ACPI data structure that indicates *potential* locations
> where CXL memory can be placed. It is the playground where the CXL
> driver has free reign to establish regions. That space can be populated
> by BIOS created regions, or driver created regions, after hotplug or
> other reconfiguration.
>
> When BIOS creates a region in a CXL Window it additionally describes
> that subset of the Window range in the other typical ACPI tables SRAT,
> SLIT, and HMAT. The rationale for BIOS not pre-describing the entire
> CXL Window in SRAT, SLIT, and HMAT is that it can not predict the
> future. I.e. there is nothing stopping higher or lower performance
> devices being placed in the same Window. Compare that to ACPI memory
> hotplug that just onlines additional capacity in the proximity domain
> with little freedom for dynamic performance differentiation.
>
> That leaves the OS with a choice, should unpopulated window capacity
> match the proximity domain of an existing region, or should it allocate
> a new one? This patch takes the simple position of minimizing proximity
> domain proliferation by reusing any proximity domain intersection for
> the entire Window. If the Window has no intersections then allocate a
> new proximity domain. Note that SRAT, SLIT and HMAT information can be
> enumerated dynamically in a standard way from device provided data.
> Think of CXL as the end of ACPI needing to describe memory attributes,
> CXL offers a standard discovery model for performance attributes, but
> Linux still needs to interoperate with the old regime.
>
> Reported-by: Derick Marks <derick.w.marks@xxxxxxxxx>
> Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> Tested-by: Derick Marks <derick.w.marks@xxxxxxxxx>
[..]
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 2aadb2019b4f..fa82141d1a04 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
[..]
> @@ -961,4 +962,90 @@ int memory_add_physaddr_to_nid(u64 start)
>          return nid;
>  }
>  EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> +
> +static int __init cmp_memblk(const void *a, const void *b)
> +{
> +        const struct numa_memblk *ma = *(const struct numa_memblk **)a;
> +        const struct numa_memblk *mb = *(const struct numa_memblk **)b;
> +
> +        if (ma->start != mb->start)
> +                return (ma->start < mb->start) ? -1 : 1;
> +
> +        /* Caller handles duplicate start addresses */
> +        return 0;

This can be simplified to:

static int __init cmp_memblk(const void *a, const void *b)
{
        const struct numa_memblk *ma = *(const struct numa_memblk **)a;
        const struct numa_memblk *mb = *(const struct numa_memblk **)b;

        return ma->start - mb->start;
}

> +}
> +
> +static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
> +
> +/**
> + * numa_fill_memblks - Fill gaps in numa_meminfo memblks
> + * @start: address to begin fill
> + * @end: address to end fill
> + *
> + * Find and extend numa_meminfo memblks to cover the @start-@end
> + * HPA address range, such that the first memblk includes @start,
> + * the last memblk includes @end, and any gaps in between are
> + * filled.
> + *
> + * RETURNS:
> + * 0 : Success
> + * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
> + */
> +
> +int __init numa_fill_memblks(u64 start, u64 end)
> +{
> +        struct numa_memblk **blk = &numa_memblk_list[0];
> +        struct numa_meminfo *mi = &numa_meminfo;
> +        int count = 0;
> +        u64 prev_end;
> +
> +        /*
> +         * Create a list of pointers to numa_meminfo memblks that
> +         * overlap start, end. Exclude (start == bi->end) since
> +         * end addresses in both a CFMWS range and a memblk range
> +         * are exclusive.
> +         *
> +         * This list of pointers is used to make in-place changes
> +         * that fill out the numa_meminfo memblks.
> +         */

Thanks for this comment, looks good.

> +        for (int i = 0; i < mi->nr_blks; i++) {
> +                struct numa_memblk *bi = &mi->blk[i];
> +
> +                if (start < bi->end && end >= bi->start) {
> +                        blk[count] = &mi->blk[i];
> +                        count++;
> +                }
> +        }
> +        if (!count)
> +                return NUMA_NO_MEMBLK;
> +
> +        /* Sort the list of pointers in memblk->start order */
> +        sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);
> +
> +        /* Make sure the first/last memblks include start/end */
> +        blk[0]->start = min(blk[0]->start, start);
> +        blk[count - 1]->end = max(blk[count - 1]->end, end);
> +
> +        /*
> +         * Fill any gaps by tracking the previous memblks end address,
> +         * prev_end, and backfilling to it if needed. Avoid filling
> +         * overlapping memblks by making prev_end monotonically non-
> +         * decreasing.

I am not immediately understanding the use of the term monotonically
non-decreasing here. I think the first sentence of this comment is
enough, or am I missing a nuance?
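Trying to answer my own question with a quick standalone sketch
(userspace, simplified struct and made-up ranges, so not the patch code
verbatim), where one memblk sits entirely inside the previous one:

#include <stdio.h>

struct blk { unsigned long long start, end; };

int main(void)
{
        /* already sorted by start; b[1] is fully contained in b[0] */
        struct blk b[] = { { 0, 100 }, { 10, 20 }, { 150, 200 } };
        unsigned long long prev_end = b[0].end;

        for (int i = 1; i < 3; i++) {
                if (prev_end >= b[i].start) {
                        if (prev_end < b[i].end)
                                prev_end = b[i].end;
                } else {
                        b[i].start = prev_end;
                        prev_end = b[i].end;
                }
        }
        /* prints 100: the gap back to b[0]'s end is filled */
        printf("b[2].start = %llu\n", b[2].start);
        return 0;
}

If prev_end were simply assigned curr->end on every iteration, it would
drop from 100 to 20 at b[1], and b[2].start would then be pulled back to
20, overlapping b[0]. If that is the nuance, maybe a word about
fully-contained memblks would read more easily than "monotonically
non-decreasing".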
> +         */
> +        prev_end = blk[0]->end;
> +        for (int i = 1; i < count; i++) {
> +                struct numa_memblk *curr = blk[i];
> +
> +                if (prev_end >= curr->start) {
> +                        if (prev_end < curr->end)
> +                                prev_end = curr->end;
> +                } else {
> +                        curr->start = prev_end;
> +                        prev_end = curr->end;
> +                }
> +        }
> +        return 0;
> +}
> +EXPORT_SYMBOL_GPL(numa_fill_memblks);

This export is not needed. The only caller of this is
drivers/acpi/numa/srat.c which is only ever built-in, not a module.
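I.e. the implied change is just to drop that last line, so the hunk ends
with (sketch):

        return 0;
}

leaving numa_fill_memblks() as a plain built-in symbol.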