On Tue, 2022-01-18 at 09:57 -0800, Kristen Carlson Accardi wrote:
> When the system runs out of enclave memory, SGX can reclaim EPC
> pages by swapping to normal RAM. This normal RAM is allocated via a
> per-enclave shared memory area. The shared memory area is not mapped
> into the enclave or the task mapping it, which makes its memory use
> opaque (including to the OOM killer). Having lots of hard to find
> memory around is problematic, especially when there is no limit.
>
> Introduce a global counter that can be used to limit the number of
> pages that enclaves are able to consume for backing storage. This
> parameter is a percentage value that is used in conjunction with the
> number of EPC pages in the system to set a cap on the amount of
> backing RAM that can be consumed.
>
> The default for this value is 150, which limits the total number of
> shared memory pages that may be consumed by all enclaves as backing
> pages to 1.5x the number of EPC pages on the system. For example, on
> an SGX system that has 128MB of EPC, this default would cap the
> amount of normal RAM that SGX consumes for its shared memory areas
> at 192MB. The value of 1.5x the number of EPC pages was chosen
> because it should handle the most common case of a few enclaves that
> don't need much overcommit without any impact to user space. In the
> less common case where there are many enclaves, or a few large
> enclaves which need a lot of overcommit due to large EPC memory
> requirements, the reclaimer may fail to allocate a backing page for
> swapping if the limit has been reached. In this case, the system
> will not be able to allocate any new EPC pages. Any ioctl or call to
> add new EPC pages will get -ENOMEM, so, for example, new enclaves
> will fail to load and new EPC pages will not be able to be added.
>
> The SGX overcommit_percent works differently than the core VM
> overcommit limit. Enclaves request backing pages one page at a time,
> and the number of in-use backing pages that are allowed is a global
> resource that is limited for all enclaves.
>
> Introduce a pair of functions which can be used by callers when
> requesting backing RAM pages. These functions are responsible for
> accounting the page charges. A request may return an error if it
> would cause the counter to exceed the backing page cap.
>
> Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> Tested-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 45 ++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h  |  2 ++
>  2 files changed, 47 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 2857a49f2335..261e3702aef9 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -43,6 +43,45 @@ static struct sgx_numa_node *sgx_numa_nodes;
>  
>  static LIST_HEAD(sgx_dirty_page_list);
>  
> +/*
> + * Limits the amount of normal RAM that SGX can consume for EPC
> + * overcommit to the total EPC pages * sgx_overcommit_percent / 100
> + */
> +static int sgx_overcommit_percent = 150;
> +
> +/* The number of pages that can be allocated globally for backing storage. */
> +static atomic_long_t sgx_nr_available_backing_pages;
> +
> +/**
> + * sgx_charge_mem() - charge for a page used for backing storage
> + *
> + * Backing storage usage is capped by the sgx_nr_available_backing_pages.
> + * If the backing storage usage is over the overcommit limit,
> + * return an error.
> + *
> + * Return:
> + * 0: The page requested does not exceed the limit
> + * -ENOMEM: The page requested exceeds the overcommit limit
> + */
> +int sgx_charge_mem(void)
> +{
> +	if (!atomic_long_add_unless(&sgx_nr_available_backing_pages, -1, 0))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +/**
> + * sgx_uncharge_mem() - uncharge a page previously used for backing storage
> + *
> + * When backing storage is no longer in use, increment the
> + * sgx_nr_available_backing_pages counter.
> + */
> +void sgx_uncharge_mem(void)
> +{
> +	atomic_long_inc(&sgx_nr_available_backing_pages);
> +}
> +
>  /*
>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
>   * from the input list, and made available for the page allocator. SECS pages
> @@ -783,6 +822,8 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
>  static bool __init sgx_page_cache_init(void)
>  {
>  	u32 eax, ebx, ecx, edx, type;
> +	u64 available_backing_bytes;
> +	u64 total_epc_bytes = 0;
>  	u64 pa, size;
>  	int nid;
>  	int i;
> @@ -830,6 +871,7 @@ static bool __init sgx_page_cache_init(void)
>  
>  		sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
>  		sgx_numa_nodes[nid].size += size;
> +		total_epc_bytes += size;
>  
>  		sgx_nr_epc_sections++;
>  	}
> @@ -839,6 +881,9 @@ static bool __init sgx_page_cache_init(void)
>  		return false;
>  	}
>  
> +	available_backing_bytes = total_epc_bytes * sgx_overcommit_percent / 100;
> +	atomic_long_set(&sgx_nr_available_backing_pages, available_backing_bytes >> PAGE_SHIFT);
> +
>  	return true;
>  }
>  
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 0f17def9fe6f..3507a9983fc1 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -89,6 +89,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
>  void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
>  int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
> +int sgx_charge_mem(void);
> +void sgx_uncharge_mem(void);
>  
>  #ifdef CONFIG_X86_SGX_KVM
>  int __init sgx_vepc_init(void);

This looks good to me. I also found out where the "charge" naming
comes from while looking at the shmem code (shmem_charge() and
shmem_uncharge()) in order to do the patches that add the checks Dave
suggested.

Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>

BR, Jarkko
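For reference, a minimal sketch of how a caller might pair these helpers
around a shmem backing-page allocation. This is illustrative only, not
part of the patch: sgx_alloc_backing_page() and sgx_free_backing_page()
are hypothetical stand-ins for the eventual call sites in encl.c, and the
sketch assumes <linux/shmem_fs.h>, <linux/pagemap.h>, <linux/err.h> and
the driver's struct sgx_encl with its shmem backing file.

/*
 * Illustrative sketch only -- not part of the patch above. A hypothetical
 * caller charges one page of the global overcommit budget before pulling
 * a backing page from the enclave's shmem file, and returns the charge on
 * failure or release.
 */
static struct page *sgx_alloc_backing_page(struct sgx_encl *encl, pgoff_t index)
{
	struct address_space *mapping = encl->backing->f_mapping;
	struct page *page;

	/* Fail the request if the global backing-page budget is exhausted. */
	if (sgx_charge_mem())
		return ERR_PTR(-ENOMEM);

	page = shmem_read_mapping_page_gfp(mapping, index,
					   mapping_gfp_mask(mapping));
	if (IS_ERR(page))
		sgx_uncharge_mem();	/* no page was consumed; undo the charge */

	return page;
}

static void sgx_free_backing_page(struct page *page)
{
	put_page(page);		/* drop our reference to the shmem page */
	sgx_uncharge_mem();	/* give the charge back to the global budget */
}

Charging before the shmem allocation means the global counter never
understates how much backing RAM SGX may be holding, at the cost of a
transient over-count on the error path.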