On Mon, Oct 28, 2024 at 07:24:54PM +0200, Mike Rapoport wrote:
> On Tue, Oct 22, 2024 at 05:34:50PM -0400, Gregory Price wrote:
> > Capacity is stranded when CFMWS regions are not aligned to block size.
> > On x86, block size increases with capacity (2G blocks @ 64G capacity).
> >
> > Use CFMWS base/size to report memory block size alignment advice.
> >
> > After the alignment, the acpi code begins populating numa nodes with
> > memblocks, so probe the value just prior to lock it in. All future
> > callers should be providing advice prior to this point.
> >
> > Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Signed-off-by: Gregory Price <gourry@xxxxxxxxxx>
> > ---
> >  drivers/acpi/numa/srat.c | 33 +++++++++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >
> > ... snip ...
> > +	/* Align memblock size to CFMW regions if possible */
> > +	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_align_cfmws, NULL);
> > +
> > +	/*
> > +	 * Nodes start populating with blocks after this, so probe the max
> > +	 * block size to prevent it from changing in the future.
> > +	 */
> > +	memory_block_probe_max_size();
> > +
>
> It won't change, but how drivers/base/memory.c will know about the probed
> size if architecture does not override memory_block_size_bytes()?
>

Non-arch code should be calling memory_block_size_bytes() to discover the
actual size of blocks, and for archs that care about this value, that is
when it should be probed.

It's up to the arch whether/how to use this information. Many archs
ignore it entirely and use MIN_BLOCK_SIZE.

Basically, non-arch code shouldn't care what this value is, and even most
arch code shouldn't care.

I added this call to probe to lock in the size, since I saw that nodes
start populating blocks immediately after this.

Possibly the APIs should be marked __init so that the whole interface
disappears after init, to avoid misuse post-init.
Possibly probe() should return -EBUSY if called more than once to enforce
a particular probe pattern on the architectures? Open to thoughts here.

> >  	/* fake_pxm is the next unused PXM value after SRAT parsing */
> >  	for (i = 0, fake_pxm = -1; i < MAX_NUMNODES; i++) {
> >  		if (node_to_pxm_map[i] > fake_pxm)
> > --
> > 2.43.0
> >
>
> --
> Sincerely yours,
> Mike.
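
To make the "advise, then probe once" idea above concrete, here is a rough
user-space sketch. It is not the patch and not the kernel interface: every
name in it (region_alignment, advise_max_block_size, probe_max_block_size,
ARCH_BLOCK_SIZE) is hypothetical, and the 2 GiB value is only a stand-in for
whatever the arch would pick on its own. It assumes advice can only shrink
the block size, and that probe() returns -EBUSY on a second call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the size the arch would choose by itself. */
#define ARCH_BLOCK_SIZE (2ULL << 30)	/* 2 GiB, illustrative only */

static uint64_t advised_size;	/* 0 means no advice recorded yet */
static uint64_t probed_size;	/* 0 means not probed yet */

/* Largest power of two dividing both base and size (0 if both are 0). */
uint64_t region_alignment(uint64_t base, uint64_t size)
{
	uint64_t bits = base | size;

	return bits & (~bits + 1);	/* isolate the lowest set bit */
}

/* Record advice; the smallest advised size wins. Refused after probe. */
int advise_max_block_size(uint64_t size)
{
	if (probed_size)
		return -EBUSY;
	if (!size || (size & (size - 1)))
		return -EINVAL;		/* must be a non-zero power of two */
	if (!advised_size || size < advised_size)
		advised_size = size;
	return 0;
}

/* Lock in the block size. Any later call returns -EBUSY. */
int probe_max_block_size(uint64_t *out)
{
	assert(out != NULL);
	if (probed_size)
		return -EBUSY;
	probed_size = ARCH_BLOCK_SIZE;
	if (advised_size && advised_size < probed_size)
		probed_size = advised_size;
	*out = probed_size;
	return 0;
}
```

In this model a window at base 4 GiB + 256 MiB with size 4 GiB yields 256 MiB
of advice, the first probe locks that in, and any later advise or probe call
gets -EBUSY. Whether the real interface should behave this way is exactly the
open question above.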