On 1/14/21 9:54 AM, Jarkko Sakkinen wrote:
> On Tue, Jan 12, 2021 at 04:24:01PM -0800, Dave Hansen wrote:
>> We need a bit more information here as well.  What's the relationship
>> between NUMA nodes and sections?  How does the BIOS tell us which NUMA
>> nodes a section is in?  Is it the same or different from normal RAM and
>> PMEM?
>
> How does it go with pmem?

I just wanted to point out PMEM as being referred to by the SRAT, but as
something which is *not* "System RAM".  There might be some overlap in
NUMA for PMEM and NUMA for SGX memory since neither is enumerated as
"System RAM".

...
>> I'm not positive this works.  I *thought* these ->node_start_pfn and
>> ->node_spanned_pages are really only guaranteed to cover memory which is
>> managed by the kernel and has 'struct page' for it.
>>
>> EPC doesn't have a 'struct page', so won't necessarily be covered by the
>> pgdat-> and zone-> ranges.  I *think* you may have to go all the way
>> back to the ACPI SRAT for this.
>>
>> It would also be *possible* to have an SRAT constructed like this:
>>
>> 0->1GB System RAM - Node 0
>> 1->2GB Reserved   - Node 1
>> 2->3GB System RAM - Node 0
>>
>> Where the 1->2GB is EPC.  The Node 0 pg_data_t would be:
>>
>> pgdat->node_start_pfn = 0
>> pgdat->node_spanned_pages = 3GB
>
> If I've understood the current Linux memory architecture correctly.
>
> - Memory is made available through mm/memory_hotplug.c, which is populated
>   by drivers/acpi/acpi_memhotplug.c.
> - drivers/acpi/numa/srat.c provides the conversion API from proximity node to
>   logical node but I'm not *yet* sure how the interaction goes with memory
>   hot plugging
>
> I'm not sure if I'm following the idea of alternative SRAT construction.
> So are you saying that srat.c would somehow group pxm's with EPC to
> specific node numbers?

Basically, go look at the "SRAT:" messages in boot.  Are there SRAT
entries that cover all the EPC?
For instance, take this SRAT:

[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x100000000-0x82fffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 1 [mem 0x830000000-0xe2fffffff]

If EPC were at 0x100000000, we would be in good shape.  It is covered by
an SRAT entry that Linux parses as RAM.  But, if it were at 0xd0000000,
it would be in an SRAT "hole", uncovered by an SRAT entry.  In this
case, since "Node 1" spans that hole, the "Node 1" pgdat would span this
hole.  But, if some memory were removed from the system, "Node 1" might
no longer span that hole and EPC in this hole would not be assignable to
Node 1.

Please just make sure that there *ARE* SRAT entries that cover EPC
memory ranges.
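
To make that check concrete, here is a rough userspace sketch (not
kernel code) that parses the "SRAT:" boot messages and reports whether a
given physical range is fully covered by a single SRAT entry.  The
helper names (parse_srat, covering_node, node_span) and the 64MB EPC
size are assumptions made up for illustration; only the three log lines
come from the example above.  As a simplification, a range that
straddles two adjacent entries of the same node is reported as
uncovered.

```python
import re

# Matches boot messages such as:
#   ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
SRAT_RE = re.compile(
    r"SRAT: Node (\d+) PXM (\d+) \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]"
)

# The three entries from the example SRAT in this mail.
SAMPLE_DMESG = """\
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x100000000-0x82fffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 1 [mem 0x830000000-0xe2fffffff]
"""

# Hypothetical EPC section size (64MB), chosen only for the demo.
EPC_SIZE = 64 << 20


def parse_srat(dmesg_text):
    """Return a list of (node, start, end) tuples; 'end' is inclusive."""
    entries = []
    for m in SRAT_RE.finditer(dmesg_text):
        node = int(m.group(1))
        start = int(m.group(3), 16)
        end = int(m.group(4), 16)
        entries.append((node, start, end))
    return entries


def covering_node(entries, start, size):
    """Return the node of the one SRAT entry that fully contains
    [start, start + size), or None if the range is in an SRAT hole."""
    end = start + size - 1
    for node, e_start, e_end in entries:
        if e_start <= start and end <= e_end:
            return node
    return None


def node_span(entries, node):
    """Return (lowest start, highest end) over a node's entries.
    Holes between the entries are included, which mirrors how a span
    can cover addresses that no SRAT entry describes."""
    ranges = [(s, e) for n, s, e in entries if n == node]
    if not ranges:
        return None
    return min(s for s, _ in ranges), max(e for _, e in ranges)


if __name__ == "__main__":
    entries = parse_srat(SAMPLE_DMESG)
    for epc_start in (0x100000000, 0xd0000000):
        node = covering_node(entries, epc_start, EPC_SIZE)
        status = f"covered by node {node}" if node is not None else "in an SRAT hole"
        print(f"EPC at {epc_start:#x}: {status}")
    lo, hi = node_span(entries, 1)
    print(f"Node 1 span: {lo:#x}-{hi:#x}")
```

With the sample entries, the range at 0x100000000 resolves to Node 1,
while the range at 0xd0000000 falls in a hole even though the Node 1
span (0x0-0x82fffffff) still contains it.  That second case is the one
that depends on the span rather than on an SRAT entry.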