On Mon, 2023-10-09 at 20:04 -0500, Haitao Huang wrote: > On Mon, 09 Oct 2023 18:45:06 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote: > > > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > > > > > Introduce the OOM path for killing an enclave with a reclaimer that is > > > no > > > longer able to reclaim enough EPC pages. Find a victim enclave, which > > > will be an enclave with only "unreclaimable" EPC pages left in the > > > cgroup LRU lists. Once a victim is identified, mark the enclave as OOM > > > and zap the enclave's entire page range, and drain all mm references in > > > encl->mm_list. Block allocating any EPC pages in #PF handler, or > > > reloading any pages in all paths, or creating any new mappings. > > > > > > The OOM killing path may race with the reclaimers: in some cases, the > > > victim enclave is in the process of reclaiming the last EPC pages when > > > OOM happens, that is, all pages other than SECS and VA pages are in > > > RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to > > > the enclave backing, VA pages as well as SECS. So the OOM killer does > > > not directly release those enclave resources, instead, it lets all > > > reclaiming in progress to finish, and relies (as currently done) on > > > kref_put on encl->refcount to trigger sgx_encl_release() to do the > > > final cleanup. > > > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > > Co-developed-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx> > > > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx> > > > Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > > > Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > > > Cc: Sean Christopherson <seanjc@xxxxxxxxxx> > > > --- > > > V5: > > > - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY > > > > > > V4: > > > - Updates for patch reordering and typo fixes. > > > > > > V3: > > > - Rebased to use the new VMA_ITERATOR to zap VMAs. > > > - Fixed the racing cases by blocking new page allocation/mapping and > > > reloading when enclave is marked for OOM. And do not release any enclave > > > resources other than draining mm_list entries, and let pages in > > > RECLAIMING_IN_PROGRESS to be reaped by reclaimers. > > > - Due to above changes, also removed the no-longer needed encl->lock in > > > the OOM path which was causing deadlocks reported by the lock prover. > > > > > > > [...] > > > > > + > > > +/** > > > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU > > > + * @lru: LRU that is low > > > + * > > > + * Return: %true if a victim was found and kicked. > > > + */ > > > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru) > > > +{ > > > + struct sgx_epc_page *victim; > > > + > > > + spin_lock(&lru->lock); > > > + victim = sgx_oom_get_victim(lru); > > > + spin_unlock(&lru->lock); > > > + > > > + if (!victim) > > > + return false; > > > + > > > + if (victim->flags & SGX_EPC_OWNER_PAGE) > > > + return sgx_oom_encl_page(victim->encl_page); > > > + > > > + if (victim->flags & SGX_EPC_OWNER_ENCL) > > > + return sgx_oom_encl(victim->encl); > > > > I hate to bring this up, at least at this stage, but I am wondering why > > we need > > to put VA and SECS pages to the unreclaimable list, but cannot keep an > > "enclave_list" instead? > > > > So by looking the patch (" x86/sgx: Limit process EPC usage with misc > > cgroup > > controller"), if I am not missing anything, the whole "unreclaimable" > > list is > > just used to find the victim enclave when OOM needs to be done. Thus, I > > don't > > see why "enclave_list" cannot be used to achieve this. > > > > The reason that I am asking is because it seems using "enclave_list" we > > can > > simplify the code. At least the patches related to track VA/SECS pages, > > and the > > SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated > > completely. > > Using "enclave_list", I guess you just need to put the enclave to the > > current > > EPC cgroup when SECS page is allocated. > > > Later the hosting process could migrated/reassigned to another cgroup? > What to do when the new cgroup is OOM? > You addressed in the documentation, no? +Migration +--------- + +Once an EPC page is charged to a cgroup (during allocation), it +remains charged to the original cgroup until the page is released +or reclaimed. Migrating a process to a different cgroup doesn't +move the EPC charges that it incurred while in the previous cgroup +to its new cgroup.