Weird. I did check my kernel.org mail last time on Thursday night but did
not get this. I was actually wondering about the lack of feedback. Then I
suddenly had a huge pile of email waiting for me on Monday, with a bunch
of emails from around the time you sent this one.

On Wed, Mar 03, 2021 at 04:20:03PM -0800, Dave Hansen wrote:
> What changed from the last patch?
>
> On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > Background
> > ==========
> >
> > EPC section is covered by one or more SRAT entries that are associated with
> > one and only one PXM (NUMA node). The motivation behind this patch is to
> > provide basic elements of building allocation scheme based on this premise.
>
> Just like normal RAM, enclave memory (EPC) should be covered by entries
> in the ACPI SRAT table.  These entries allow each EPC section to be
> associated with a NUMA node.
>
> Use this information to implement a simple NUMA-aware allocator for
> enclave memory.
>
> > Use phys_to_target_node() to associate each NUMA node with the EPC
> > sections contained within its range. In sgx_alloc_epc_page(), first try
> > to allocate from the NUMA node, where the CPU is executing. If that
> > fails, fallback to the legacy allocation.
>
> By "legacy", you mean the one from the last patch? :)
>
> > Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > Signed-off-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > ---
> >  arch/x86/Kconfig               |  1 +
> >  arch/x86/kernel/cpu/sgx/main.c | 84 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/sgx.h  |  9 ++++
> >  3 files changed, 94 insertions(+)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index a5f6a3013138..7eb1e96cfe8a 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1940,6 +1940,7 @@ config X86_SGX
> >  	depends on CRYPTO_SHA256=y
> >  	select SRCU
> >  	select MMU_NOTIFIER
> > +	select NUMA_KEEP_MEMINFO if NUMA
>
> This dependency is worth mentioning somewhere.
> Why do we suddenly need NUMA_KEEP_MEMINFO?
>
> > +/* Nodes with one or more EPC sections. */
> > +static nodemask_t sgx_numa_mask;
> > +
> > +/*
> > + * Array with one list_head for each possible NUMA node. Each
> > + * list contains all the sgx_epc_section's which are on that
>
> 	^ no "'", please
>
> > + * node.
> > + */
> > +static struct sgx_numa_node *sgx_numa_nodes;
> > +
> > +/*
> > + * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
> > + * to put the page in.
> > + */
> > +static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];
>
> If this is per-section, why not put it in struct sgx_epc_section?

Because struct sgx_epc_page does not contain a pointer to struct
sgx_epc_section.

> >  /*
> > @@ -434,6 +451,36 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
> >  	return true;
> >  }
> >
> > +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> > +{
> > +	struct sgx_epc_page *page = NULL;
> > +	struct sgx_numa_node *sgx_node;
> > +
> > +	if (WARN_ON_ONCE(nid < 0 || nid >= num_possible_nodes()))
> > +		return NULL;
>
> This has exactly one call-site which plumbs numa_node_id() in here
> pretty directly.  Is this check worthwhile?

Probably not.

> > +	if (!node_isset(nid, sgx_numa_mask))
> > +		return NULL;
> > +
> > +	sgx_node = &sgx_numa_nodes[nid];
> > +
> > +	spin_lock(&sgx_free_page_list_lock);
>
> The global lock protecting a per-node structure is a bit unsightly.

The patch set could introduce an additional patch for changing the
locking scheme. It's logically a separate change.
> > +	if (list_empty(&sgx_node->free_page_list)) {
> > +		spin_unlock(&sgx_free_page_list_lock);
> > +		return NULL;
> > +	}
> > +
> > +	page = list_first_entry(&sgx_node->free_page_list, struct sgx_epc_page, numa_list);
> > +	list_del_init(&page->numa_list);
> > +	list_del_init(&page->list);
> > +	sgx_nr_free_pages--;
> > +
> > +	spin_unlock(&sgx_free_page_list_lock);
> > +
> > +	return page;
> > +}
> > +
> >  /**
> >   * __sgx_alloc_epc_page() - Allocate an EPC page
> >   *
> > @@ -446,8 +493,14 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
> >   */
> >  struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  {
> > +	int current_nid = numa_node_id();
> >  	struct sgx_epc_page *page;
> >
> > +	/* Try to allocate EPC from the current node, first: */
> > +	page = __sgx_alloc_epc_page_from_node(current_nid);
> > +	if (page)
> > +		return page;
> > +
> >  	spin_lock(&sgx_free_page_list_lock);
> >
> >  	if (list_empty(&sgx_free_page_list)) {
> > @@ -456,6 +509,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  	}
> >
> >  	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
> > +	list_del_init(&page->numa_list);
> >  	list_del_init(&page->list);
> >  	sgx_nr_free_pages--;
>
> I would much rather prefer that this does what the real page allocator
> does: keep the page on a single list.  That list is maintained
> per-NUMA-node.  Allocations try local NUMA node structures, then fall
> back to other structures (hopefully in a locality-aware fashion).
>
> I wrote you the loop that I want to see this implement in an earlier
> review.  This, basically:
>
> 	page = NULL;
> 	nid = numa_node_id();
> 	while (true) {
> 		page = __sgx_alloc_epc_page_from_node(nid);
> 		if (page)
> 			break;
>
> 		nid = // ... some search here, next_node_in()...
> 		// check if we wrapped around:
> 		if (nid == numa_node_id())
> 			break;
> 	}
>
> There's no global list.  You just walk around nodes trying to find one
> with space.  If you wrap around, you stop.
>
> Please implement this.
> If you think it's a bad idea, or can't, let's
> talk about it in advance.  Right now, it appears that my review comments
> aren't being incorporated into newer versions.

How I interpreted your earlier comments is that the fallback was unfair,
and this version of the patch set does fix that. I can buy the above
allocation scheme, but I don't think this version is a step backwards.
The changes made to struct sgx_epc_section are exactly what should be
done to it.

Implementation-wise, you are asking me to squash 4/5 and 5/5 into a
single patch and to remove the global list. That is a tiny iteration
from this patch version, and I can do it.

> >  void sgx_free_epc_page(struct sgx_epc_page *page)
> >  {
> > +	int nid = sgx_section_to_numa_node_id[page->section];
> > +	struct sgx_numa_node *sgx_node = &sgx_numa_nodes[nid];
> >  	int ret;
> >
> >  	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > @@ -575,7 +631,15 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
> >  		return;
> >
> >  	spin_lock(&sgx_free_page_list_lock);
> > +
> > +	/* Enable NUMA local allocation in sgx_alloc_epc_page(). */
> > +	if (!node_isset(nid, sgx_numa_mask)) {
> > +		INIT_LIST_HEAD(&sgx_node->free_page_list);
> > +		node_set(nid, sgx_numa_mask);
> > +	}
> > +
> >  	list_add_tail(&page->list, &sgx_free_page_list);
> > +	list_add_tail(&page->numa_list, &sgx_node->free_page_list);
> >  	sgx_nr_free_pages++;
> >  	spin_unlock(&sgx_free_page_list_lock);
> >  }
> > @@ -626,8 +690,28 @@ static bool __init sgx_page_cache_init(struct list_head *laundry)
> >  {
> >  	u32 eax, ebx, ecx, edx, type;
> >  	u64 pa, size;
> > +	int nid;
> >  	int i;
> >
> > +	nodes_clear(sgx_numa_mask);
>
> Is this really required for a variable allocated in .bss?

Probably not, I'll check what nodes_clear() does.

> > +	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
>
> This is what I was looking for here, thanks!
>
> > +	/*
> > +	 * Create NUMA node lookup table for sgx_free_epc_page() as the very
> > +	 * first step, as it is used to populate the free list's during the
> > +	 * initialization.
> > +	 */
> > +	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> > +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> > +		if (nid == NUMA_NO_NODE) {
> > +			/* The physical address is already printed above. */
> > +			pr_warn(FW_BUG "Unable to map EPC section to online node. Fallback to the NUMA node 0.\n");
> > +			nid = 0;
> > +		}
> > +
> > +		sgx_section_to_numa_node_id[i] = nid;
> > +	}
> > +
> >  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> >  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 41ca045a574a..3a3c07fc0c8e 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -27,6 +27,7 @@ struct sgx_epc_page {
> >  	unsigned int flags;
> >  	struct sgx_encl_page *owner;
> >  	struct list_head list;
> > +	struct list_head numa_list;
> >  };
>
> I'll say it again, explicitly: Each sgx_epc_page should be on one and
> only one free list: a per-NUMA-node list.
>
> >  /*
> > @@ -43,6 +44,14 @@ struct sgx_epc_section {
> >
> >  extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
> >
> > +/*
> > + * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
> > + * the free page list local to the node is stored here.
> > + */
> > +struct sgx_numa_node {
> > +	struct list_head free_page_list;
> > +};
>
> I think it's unconscionable to leave this protected by a global lock.
> Please at least give us a per-node spinlock protecting this list.

I can do it, but I'll add a separate commit for it. It's better to make
locking scheme changes that way (IMHO). Helps with bisection later on...

/Jarkko