Re: [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()

Dave Hansen <dave.hansen@xxxxxxxxx> · Wed, 10 Mar 2021 07:44:39 -0800

>>> + * node.
>>> + */
>>> +static struct sgx_numa_node *sgx_numa_nodes;
>>> +
>>> +/*
>>> + * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
>>> + * to put the page in.
>>> + */
>>> +static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];
>>
>> If this is per-section, why not put it in struct sgx_epc_section?
> 
> Because struct sgx_epc_page does not contain a pointer to
> struct sgx_epc_section.

Currently, we have epc_page->section.  That's not changing.  But, with
the free list moving from sgx_epc_section to sgx_numa_node, we need a
way to get from page->node, not just page->section.

We can either add that to:

	struct sgx_epc_section {
		...
+		struct sgx_numa_node *node;
	}

so we can do epc_page->section->node to find the epc_page's free list,
or we could just do:

 struct sgx_epc_page {
- 	unsigned int section;
+ 	unsigned int node;
 	unsigned int flags;
 	struct sgx_encl_page *owner;
 	struct list_head list;
	struct list_head numa_list;
 };

and go from page->node directly.

>>>  	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
>>> +	list_del_init(&page->numa_list);
>>>  	list_del_init(&page->list);
>>>  	sgx_nr_free_pages--;
>>
>> I would much rather prefer that this does what the real page allocator
>> does: kep the page on a single list.  That list is maintained
>> per-NUMA-node.  Allocations try local NUMA node structures, then fall
>> back to other structures (hopefully in a locality-aware fashion).
>>
>> I wrote you the loop that I want to see this implement in an earlier
>> review.  This, basically:
>>
>> 	page = NULL;
>> 	nid = numa_node_id();
>> 	while (true) {
>> 		page = __sgx_alloc_epc_page_from_node(nid);	
>> 		if (page)
>> 			break;
>>
>> 		nid = // ... some search here, next_node_in()...
>> 		// check if we wrapped around:
>> 		if (nid == numa_node_id())
>> 			break;
>> 	}
>>
>> There's no global list.  You just walk around nodes trying to find one
>> with space.  If you wrap around, you stop.
>>
>> Please implement this.  If you think it's a bad idea, or can't, let's
>> talk about it in advance.  Right now, it appears that my review comments
>> aren't being incorporated into newer versions.
> 
> How I interpreted your earlier comments is that the fallback is unfair and
> this patch set version does fix that. 
> 
> I can buy the above allocation scheme, but I don't think this patch set
> version is a step backwards. The things done to struct sgx_epc_section
> are exactly what should be done to it.

To me, it's a step backwards.  It regresses in that it falls back to an
entirely non-NUMA aware allocation mechanism.  The global list is
actually likely to be even worse than the per-section searches because
it has a global lock as opposed to the at per-section locks.  It also
has the overhead of managing two lists instead of one.

So, yes, it is *fair* in terms of NUMA node pressure.  But being fair in
a NUMA-aware allocator by simply not doing NUMA at all is a regression.

> Implementation-wise you are asking me to squash 4/5 and 5/5 into a single
> patch, and remove global list. It's a tiny iteration from this patch
> version and I can do it.

Sounds good.