On Thu, Jan 14, 2021 at 10:35:03AM -0800, Dave Hansen wrote:
> On 1/14/21 9:54 AM, Jarkko Sakkinen wrote:
> > On Tue, Jan 12, 2021 at 04:24:01PM -0800, Dave Hansen wrote:
> >> We need a bit more information here as well.  What's the relationship
> >> between NUMA nodes and sections?  How does the BIOS tell us which NUMA
> >> nodes a section is in?  Is it the same or different from normal RAM and
> >> PMEM?
> >
> > How does it go with pmem?
>
> I just wanted to point out PMEM as being referred to by the SRAT, but as
> something which is *not* "System RAM".  There might be some overlap in
> NUMA for PMEM and NUMA for SGX memory since neither is enumerated as
> "System RAM".

Right.

> ...
> >> I'm not positive this works.  I *thought* these ->node_start_pfn and
> >> ->node_spanned_pages are really only guaranteed to cover memory which is
> >> managed by the kernel and has 'struct page' for it.
> >>
> >> EPC doesn't have a 'struct page', so won't necessarily be covered by the
> >> pgdat-> and zone-> ranges.  I *think* you may have to go all the way
> >> back to the ACPI SRAT for this.
> >>
> >> It would also be *possible* to have an SRAT constructed like this:
> >>
> >> 0->1GB System RAM - Node 0
> >> 1->2GB Reserved   - Node 1
> >> 2->3GB System RAM - Node 0
> >>
> >> Where the 1->2GB is EPC.  The Node 0 pg_data_t would be:
> >>
> >> pgdat->node_start_pfn = 0
> >> pgdat->node_spanned_pages = 3GB
> >
> > If I've understood the current Linux memory architecture correctly:
> >
> > - Memory is made available through mm/memory_hotplug.c, which is populated
> >   by drivers/acpi/acpi_memhotplug.c.
> > - drivers/acpi/numa/srat.c provides the conversion API from proximity node to
> >   logical node but I'm not *yet* sure how the interaction goes with memory
> >   hot plugging.
> >
> > I'm not sure if I'm following the idea of alternative SRAT construction.
> > So are you saying that srat.c would somehow group pxm's with EPC to
> > specific node numbers?
>
> Basically, go look at the "SRAT:" messages in boot.  Are there SRAT
> entries that cover all the EPC?  For instance, take this SRAT:
>
> [    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
> [    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x100000000-0x82fffffff]
> [    0.000000] ACPI: SRAT: Node 0 PXM 1 [mem 0x830000000-0xe2fffffff]

Right!

> If EPC were at 0x100000000, we would be in good shape.  It is covered by
> an SRAT entry that Linux parses as RAM.  But, if it were at 0xd0000000,
> it would be in an SRAT "hole", uncovered by an SRAT entry.  In this
> case, since "Node 1" spans that hole, the "Node 1" pgdat would span this
> hole.  But, if some memory was removed from the system, "Node 1" might
> no longer span that hole and EPC in this hole would not be assignable to
> Node 1.
>
> Please just make sure that there *ARE* SRAT entries that cover EPC
> memory ranges.

OK, I'm on the same page now, thanks.

/Jarkko
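
For reference, the coverage check described above can be sketched in a few lines of Python. This is only an illustration and not kernel code: the helper names (parse_srat, covering_node) and the 128 MiB EPC size are assumptions made up for the example, and the log text is the sample SRAT quoted in this thread. It reports which node, if any, has a single SRAT entry that fully contains a given physical range.

```python
import re

# Matches boot-log lines such as:
#   ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
SRAT_RE = re.compile(
    r"ACPI: SRAT: Node (\d+) PXM (\d+) \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]"
)


def parse_srat(log_text):
    """Return (node, start, end) tuples (end inclusive) from SRAT boot messages."""
    entries = []
    for m in SRAT_RE.finditer(log_text):
        entries.append((int(m.group(1)), int(m.group(3), 16), int(m.group(4), 16)))
    return entries


def covering_node(entries, start, size):
    """Return the node of the SRAT entry that fully contains the range, else None."""
    end = start + size - 1
    for node, lo, hi in entries:
        if lo <= start and end <= hi:
            return node
    return None


# Sample SRAT messages taken from the discussion above.
BOOT_LOG = """\
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x100000000-0x82fffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 1 [mem 0x830000000-0xe2fffffff]
"""

EPC_SIZE = 0x8000000  # hypothetical 128 MiB EPC section, chosen for illustration

entries = parse_srat(BOOT_LOG)
print(covering_node(entries, 0x100000000, EPC_SIZE))  # range inside an SRAT entry
print(covering_node(entries, 0xd0000000, EPC_SIZE))   # range inside the SRAT hole
```

With the sample log, the range at 0x100000000 resolves to Node 1, while the range at 0xd0000000 falls in the hole and has no covering entry, which is the case that needs an explicit SRAT entry.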