On Thu, 2022-12-08 at 15:21 +0000, Jarkko Sakkinen wrote:
> On Fri, Dec 02, 2022 at 10:36:50AM -0800, Kristen Carlson Accardi wrote:
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >
> > Introduce the OOM path for killing an enclave with the reclaimer
> > is no longer able to reclaim enough EPC pages. Find a victim
> > enclave, which will be an enclave with EPC pages remaining that are
> > not accessible to the reclaimer ("unreclaimable"). Once a victim is
> > identified, mark the enclave as OOM and zap the enclaves entire
> > page range. Release all the enclaves resources except for the
> > struct sgx_encl memory itself.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Why this patch is dependent of all 13 patches before it?
>
> Looks like something that is orthogonal to cgroups and could be
> live by its own. At least it probably does not require all of
> those patches, or does it?
>
> Even without cgroups it would make sense to killing enclaves if
> reclaimer gets stuck.
>
> BR, Jarkko

It depends, first of all, on having the LRU struct with the
unreclaimable/reclaimable lists, which in turn means storing the
enclave pointer in the page as well. It also depends on knowing how
many pages are available, being able to ignore the age of a page, etc.

Right now, without cgroups, SGX will be unable to allocate memory when
an enclave is created if it cannot reclaim enough memory from the
existing in-use enclaves.

Aside from that though, I don't think that killing enclaves makes sense
outside the context of cgroup limits. Without cgroup limits, you have a
max number of EPC pages that you can have active at any one time.
If an enclave attempts to allocate a new page and the reclaimer can't
free up any, how would you decide whether it's OK to kill an entire
enclave in order to give this other enclave priority for getting a
page? With a cgroup limit, the system owner can explicitly decide what
the limits on usage will be, but without that, I would think you'd have
a situation where one new enclave could kill others. Better to just
keep it the way it is: new page allocations fail if there are no free
pages, but you don't kill enclaves that already exist.
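
To make the distinction concrete, here is a rough user-space sketch of
the policy I'm describing. It is not code from the patch set: the
struct, the enum, and the function name epc_alloc_policy() are all made
up for illustration, and the real accounting is more involved than
three counters and a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the policy, NOT kernel code. */
enum epc_alloc_result {
	EPC_ALLOC_OK,		/* a page is free or can be reclaimed */
	EPC_ALLOC_FAIL,		/* no page; existing enclaves are left alone */
	EPC_ALLOC_OOM_KILL,	/* owner-set limit hit; pick a victim enclave */
};

/* Hypothetical snapshot of EPC accounting at allocation time. */
struct epc_state {
	size_t free_pages;		/* pages free right now */
	size_t reclaimable_pages;	/* pages the reclaimer can still free */
	size_t unreclaimable_pages;	/* pages only an enclave kill releases */
	bool cgroup_limit_set;		/* system owner configured a limit */
};

static enum epc_alloc_result epc_alloc_policy(const struct epc_state *s)
{
	/* Normal case: take a free page or let the reclaimer make one. */
	if (s->free_pages > 0 || s->reclaimable_pages > 0)
		return EPC_ALLOC_OK;

	/*
	 * Reclaimer is stuck. Only kill an enclave when the system owner
	 * has explicitly set a limit, and only if some enclave holds
	 * unreclaimable pages that a kill would actually release.
	 */
	if (s->cgroup_limit_set && s->unreclaimable_pages > 0)
		return EPC_ALLOC_OOM_KILL;

	/* No limit configured: fail the new allocation, kill nothing. */
	return EPC_ALLOC_FAIL;
}
```

With no limit set and nothing left to reclaim, the sketch returns
EPC_ALLOC_FAIL, which is today's behavior. The kill path is reachable
only when cgroup_limit_set is true.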