On 12/16/20 5:50 AM, Jarkko Sakkinen wrote: > Create a pointer array for each NUMA node with the references to the > contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the > current NUMA node before the others. It makes it harder to comment when I'm not on cc. Hint, hint... ;) We need a bit more information here as well. What's the relationship between NUMA nodes and sections? How does the BIOS tell us which NUMA nodes a section is in? Is it the same or different from normal RAM and PMEM? > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index c519fc5f6948..0da510763c47 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -13,6 +13,13 @@ > #include "encl.h" > #include "encls.h" > > +struct sgx_numa_node { > + struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS]; > + int nr_sections; > +}; So, we have a 'NUMA node' structure already: pg_data_t. Why don't we just hang the epc sections off there? > +static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES]; Hmm... Time to see if I can still do math. #define SGX_MAX_EPC_SECTIONS 8 (sizeof(struct sgx_epc_section *) + sizeof(int)) * 8 * MAX_NUMNODES CONFIG_NODES_SHIFT=10 (on Fedora) #define MAX_NUMNODES (1 << NODES_SHIFT) 12*8*1024 = ~100k. Yikes. For *EVERY* system that enables SGX, regardless if they are NUMA or not.' Trivial is great, this may be too trivial. Adding a list_head to pg_data_t and sgx_epc_section would add SGX_MAX_EPC_SECTIONS*sizeof(list_head)=192 bytes plus 16 bytes per *present* NUMA node. > /** > * __sgx_alloc_epc_page() - Allocate an EPC page > * > @@ -485,14 +511,19 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec > */ > struct sgx_epc_page *__sgx_alloc_epc_page(void) > { > - struct sgx_epc_section *section; > struct sgx_epc_page *page; > + int nid = numa_node_id(); > int i; > > - for (i = 0; i < sgx_nr_epc_sections; i++) { > - section = &sgx_epc_sections[i]; > + page = __sgx_alloc_epc_page_from_node(nid); > + if (page) > + return page; > > - page = __sgx_alloc_epc_page_from_section(section); > + for (i = 0; i < sgx_nr_numa_nodes; i++) { > + if (i == nid) > + continue; Yikes. That's a horribly inefficient loop. Consider if nodes 0 and 1023 were the only ones with EPC. What would this loop do? I think it's a much better idea to keep a nodemask_t of nodes that have EPC. Then, just do bitmap searches. > + page = __sgx_alloc_epc_page_from_node(i); > if (page) > return page; > } > @@ -661,11 +692,28 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high) > ((high & GENMASK_ULL(19, 0)) << 32); > } > > +static int __init sgx_pfn_to_nid(unsigned long pfn) > +{ > + pg_data_t *pgdat; > + int nid; > + > + for (nid = 0; nid < nr_node_ids; nid++) { > + pgdat = NODE_DATA(nid); > + > + if (pfn >= pgdat->node_start_pfn && > + pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages)) > + return nid; > + } > + > + return 0; > +} I'm not positive this works. I *thought* these ->node_start_pfn and ->node_spanned_pages are really only guaranteed to cover memory which is managed by the kernel and has 'struct page' for it. EPC doesn't have a 'struct page', so won't necessarily be covered by the pgdat-> and zone-> ranges. I *think* you may have to go all the way back to the ACPI SRAT for this. It would also be *possible* to have an SRAT constructed like this: 0->1GB System RAM - Node 0 1->2GB Reserved - Node 1 2->3GB System RAM - Node 0 Where the 1->2GB is EPC. The Node 0 pg_data_t would be: pgdat->node_start_pfn = 0 pgdat->node_spanned_pages = 3GB