Re: [PATCH RFC] x86/sgx: Add trivial NUMA allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/16/20 5:50 AM, Jarkko Sakkinen wrote:
> Create a pointer array for each NUMA node with the references to the
> contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the
> current NUMA node before the others.

It makes it harder to comment when I'm not on cc.

Hint, hint... ;)

We need a bit more information here as well.  What's the relationship
between NUMA nodes and sections?  How does the BIOS tell us which NUMA
nodes a section is in?  Is it the same or different from normal RAM and
PMEM?

> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index c519fc5f6948..0da510763c47 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -13,6 +13,13 @@
>  #include "encl.h"
>  #include "encls.h"
>  
> +struct sgx_numa_node {
> +	struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS];
> +	int nr_sections;
> +};

So, we have a 'NUMA node' structure already: pg_data_t.  Why don't we
just hang the epc sections off there?

> +static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES];

Hmm...  Time to see if I can still do math.

#define SGX_MAX_EPC_SECTIONS            8

(sizeof(struct sgx_epc_section *) + sizeof(int)) * 8 * MAX_NUMNODES

CONFIG_NODES_SHIFT=10 (on Fedora)
#define MAX_NUMNODES (1 << NODES_SHIFT)

12*8*1024 = ~100k.  Yikes.  For *EVERY* system that enables SGX,
regardless if they are NUMA or not.'

Trivial is great, this may be too trivial.

Adding a list_head to pg_data_t and sgx_epc_section would add
SGX_MAX_EPC_SECTIONS*sizeof(list_head)=192 bytes plus 16 bytes per
*present* NUMA node.

>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
>   *
> @@ -485,14 +511,19 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
>   */
>  struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  {
> -	struct sgx_epc_section *section;
>  	struct sgx_epc_page *page;
> +	int nid = numa_node_id();
>  	int i;
>  
> -	for (i = 0; i < sgx_nr_epc_sections; i++) {
> -		section = &sgx_epc_sections[i];
> +	page = __sgx_alloc_epc_page_from_node(nid);
> +	if (page)
> +		return page;
>  
> -		page = __sgx_alloc_epc_page_from_section(section);
> +	for (i = 0; i < sgx_nr_numa_nodes; i++) {
> +		if (i == nid)
> +			continue;

Yikes.  That's a horribly inefficient loop.  Consider if nodes 0 and
1023 were the only ones with EPC.  What would this loop do?  I think
it's a much better idea to keep a nodemask_t of nodes that have EPC.
Then, just do bitmap searches.

> +		page = __sgx_alloc_epc_page_from_node(i);
>  		if (page)
>  			return page;
>  	}
> @@ -661,11 +692,28 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
>  	       ((high & GENMASK_ULL(19, 0)) << 32);
>  }
>  
> +static int __init sgx_pfn_to_nid(unsigned long pfn)
> +{
> +	pg_data_t *pgdat;
> +	int nid;
> +
> +	for (nid = 0; nid < nr_node_ids; nid++) {
> +		pgdat = NODE_DATA(nid);
> +
> +		if (pfn >= pgdat->node_start_pfn &&
> +		    pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))
> +			return nid;
> +	}
> +
> +	return 0;
> +}

I'm not positive this works.  I *thought* these ->node_start_pfn and
->node_spanned_pages are really only guaranteed to cover memory which is
managed by the kernel and has 'struct page' for it.

EPC doesn't have a 'struct page', so won't necessarily be covered by the
pgdat-> and zone-> ranges.  I *think* you may have to go all the way
back to the ACPI SRAT for this.

It would also be *possible* to have an SRAT constructed like this:

0->1GB System RAM - Node 0
1->2GB Reserved   - Node 1
2->3GB System RAM - Node 0

Where the 1->2GB is EPC.  The Node 0 pg_data_t would be:

	pgdat->node_start_pfn = 0
	pgdat->node_spanned_pages = 3GB






[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux