On Wed, 2022-05-18 at 04:24 +0300, Jarkko Sakkinen wrote: > On Tue, 2022-05-17 at 09:47 -0700, Kristen Carlson Accardi wrote: > > When the system runs out of enclave memory, SGX can reclaim EPC > > pages > > by swapping to normal RAM. These backing pages are allocated via a > > per-enclave shared memory area. Since SGX allows unlimited over > > commit on EPC memory, the reclaimer thread can allocate a large > > number of backing RAM pages in response to EPC memory pressure. > > > > When the shared memory backing RAM allocation occurs during > > the reclaimer thread context, the shared memory is charged to > > the root memory control group, and the shmem usage of the enclave > > is not properly accounted for, making cgroups ineffective at > > limiting the amount of RAM an enclave can consume. > > > > For example, when using a cgroup to launch a set of test > > enclaves, the kernel does not properly account for 50% - 75% of > > shmem page allocations on average. In the worst case, when > > nearly all allocations occur during the reclaimer thread, the > > kernel accounts less than a percent of the amount of shmem used > > by the enclave's cgroup to the correct cgroup. > > > > SGX currently stores a list of mm_structs that are associated with > > an enclave. In order to allow the enclave's cgroup to more > > accurately > > reflect the shmem usage, the memory control group (struct > > mem_cgroup) > > of one of these mm_structs can be set as the active memory cgroup > > prior to allocating any EPC backing pages. This will make any shmem > > allocations be charged to a memory control group associated with > > the > > enclave's cgroup. This will allow memory cgroup limits to restrict > > RAM usage more effectively. > > > > This patch will create a new function - sgx_encl_alloc_backing(). > > This function will be used whenever a new backing storage page > > needs to be allocated. Previously the same function was used for > > page allocation as well as retrieving a previously allocated page. > > Prior to backing page allocation, if there is a mm_struct > > associated > > with the enclave that is requesting the allocation, it will be set > > as the active memory control group. > > > > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx> > > --- > > arch/x86/kernel/cpu/sgx/encl.c | 111 > > ++++++++++++++++++++++++++++++++- > > arch/x86/kernel/cpu/sgx/encl.h | 6 +- > > arch/x86/kernel/cpu/sgx/main.c | 4 +- > > 3 files changed, 115 insertions(+), 6 deletions(-) > > > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c > > b/arch/x86/kernel/cpu/sgx/encl.c > > index 001808e3901c..c3a5e57040bc 100644 > > --- a/arch/x86/kernel/cpu/sgx/encl.c > > +++ b/arch/x86/kernel/cpu/sgx/encl.c > > @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page > > *encl_page, > > else > > page_index = PFN_DOWN(encl->size); > > > > - ret = sgx_encl_get_backing(encl, page_index, &b); > > + ret = sgx_encl_lookup_backing(encl, page_index, &b); > > if (ret) > > return ret; > > > > @@ -574,7 +574,7 @@ static struct page > > *sgx_encl_get_backing_page(struct sgx_encl *encl, > > * 0 on success, > > * -errno otherwise. > > */ > > -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long > > page_index, > > +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned > > long page_index, > > struct sgx_backing *backing) > > { > > pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index > > >> 5); > > @@ -601,6 +601,113 @@ int sgx_encl_get_backing(struct sgx_encl > > *encl, unsigned long page_index, > > return 0; > > } > > > > +static struct mem_cgroup * sgx_encl_set_active_memcg(struct > > sgx_encl *encl) > > +{ > > + struct mm_struct *mm = current->mm; > > + struct sgx_encl_mm *encl_mm; > > + struct mem_cgroup *memcg; > > + int idx; > > + > > + /* > > + * If current->mm is NULL, get_mem_cgroup_from_mm() will > > return > > + * the currently active mem_cgroup. This may be the root > > mem_cgroup > > + * if there is no active mem_cgroup set. > > + */ > > + memcg = get_mem_cgroup_from_mm(mm); > > + > > + /* > > + * If we already have an mm, we are not in thread context > > and the > > + * mem_cgroup for the enclave will be charged for any > > allocations. > > + */ > > AFAIK CPU is always executing in some thread. > > This would a lot clearer: > > /* > * If the CPU is not inside a kthread, return the active memcg. > */ > > Please remark of not using "we" in my version. It's better to be > explicit when you can. Thank you for your feedback. I've incorporated it into my next version. Kristen