On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Introduce the OOM path for killing an enclave with a reclaimer that is no > longer able to reclaim enough EPC pages. Find a victim enclave, which > will be an enclave with only "unreclaimable" EPC pages left in the > cgroup LRU lists. Once a victim is identified, mark the enclave as OOM > and zap the enclave's entire page range, and drain all mm references in > encl->mm_list. Block allocating any EPC pages in #PF handler, or > reloading any pages in all paths, or creating any new mappings. > > The OOM killing path may race with the reclaimers: in some cases, the > victim enclave is in the process of reclaiming the last EPC pages when > OOM happens, that is, all pages other than SECS and VA pages are in > RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to > the enclave backing, VA pages as well as SECS. So the OOM killer does > not directly release those enclave resources, instead, it lets all > reclaiming in progress to finish, and relies (as currently done) on > kref_put on encl->refcount to trigger sgx_encl_release() to do the > final cleanup. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > Co-developed-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx> > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx> > Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > Cc: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > V5: > - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY > > V4: > - Updates for patch reordering and typo fixes. > > V3: > - Rebased to use the new VMA_ITERATOR to zap VMAs. > - Fixed the racing cases by blocking new page allocation/mapping and > reloading when enclave is marked for OOM. And do not release any enclave > resources other than draining mm_list entries, and let pages in > RECLAIMING_IN_PROGRESS to be reaped by reclaimers. > - Due to above changes, also removed the no-longer needed encl->lock in > the OOM path which was causing deadlocks reported by the lock prover. > [...] > + > +/** > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU > + * @lru: LRU that is low > + * > + * Return: %true if a victim was found and kicked. > + */ > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru) > +{ > + struct sgx_epc_page *victim; > + > + spin_lock(&lru->lock); > + victim = sgx_oom_get_victim(lru); > + spin_unlock(&lru->lock); > + > + if (!victim) > + return false; > + > + if (victim->flags & SGX_EPC_OWNER_PAGE) > + return sgx_oom_encl_page(victim->encl_page); > + > + if (victim->flags & SGX_EPC_OWNER_ENCL) > + return sgx_oom_encl(victim->encl); I hate to bring this up, at least at this stage, but I am wondering why we need to put VA and SECS pages to the unreclaimable list, but cannot keep an "enclave_list" instead? So by looking the patch (" x86/sgx: Limit process EPC usage with misc cgroup controller"), if I am not missing anything, the whole "unreclaimable" list is just used to find the victim enclave when OOM needs to be done. Thus, I don't see why "enclave_list" cannot be used to achieve this. The reason that I am asking is because it seems using "enclave_list" we can simplify the code. At least the patches related to track VA/SECS pages, and the SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated completely. Using "enclave_list", I guess you just need to put the enclave to the current EPC cgroup when SECS page is allocated. In fact, putting "unreclaimable" list to LRU itself is a little bit confusing because: 1) you cannot really reclaim anything from the list; 2) VA/SECS pages don't have the concept of "young" at all, thus makes no sense to annotate they as LRU. Thus putting VA/SECS to "unreclaimable" list, instead of keeping an "enclave_list" seems won't have any benefit but will only make the code more complicated. Or am I missing anything?