Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

"Haitao Huang" <haitao.huang@xxxxxxxxxxxxxxx> · Tue, 17 Oct 2023 23:37:23 -0500

Hi Michal,

On Tue, 17 Oct 2023 13:54:46 -0500, Michal Koutný <mkoutny@xxxxxxxx> wrote:

Hello Haitao.

On Tue, Oct 17, 2023 at 07:58:02AM -0500, Haitao Huang  
<haitao.huang@xxxxxxxxxxxxxxx> wrote:
AFAIK, before we introduced the max_write() callback in this series, no
misc controller would enforce the limit when misc.max is reduced. E.g. I
don't think CVMs would be killed when the ASID limit is reduced and the
cgroup was full before the limit was reduced.

Yes, the misc controller was meant to be simple; current >= max serves to
prevent new allocations.

Thanks for confirming. Maybe another alternative is to just keep writes to
misc.max non-preemptive, so there is no need to add the max_write()
callback.
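
To illustrate what "non-preemptive" would mean here, a rough sketch. The
struct and function names are made up for illustration and are not taken
from this series or from kernel/cgroup/misc.c, and the error code is only
an assumption:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for one misc cgroup resource; not the kernel struct. */
struct res_counter {
	uint64_t usage;	/* units currently charged */
	uint64_t max;	/* limit as last written to misc.max */
};

/* Writing a lower limit only records it; nothing already charged is touched. */
static void res_set_max(struct res_counter *rc, uint64_t new_max)
{
	rc->max = new_max;
}

/* A new charge is refused if it would push usage above max. */
static int res_try_charge(struct res_counter *rc, uint64_t amount)
{
	if (amount > rc->max || rc->usage > rc->max - amount)
		return -EBUSY;	/* error code chosen for illustration */
	rc->usage += amount;
	return 0;
}
```

With usage at 8 and max lowered from 8 to 4, usage stays at 8 and only
further charges fail until enough has been uncharged.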

The EPC controller would then only trigger reclaiming on new allocations,
or return ENOMEM if there is nothing more to reclaim. Reclaiming here
includes normal EPC page reclaiming and killing enclaves in out-of-EPC
cases. vEPCs assigned to guests are basically carved out and never
reclaimable by the host.
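
Roughly, the allocation path I have in mind looks like the sketch below.
This is a simplified model and not code from the series: all names are
hypothetical, normal reclaim and enclave killing are lumped into a single
"reclaimable" count, and it assumes reclaimable never exceeds usage:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one EPC cgroup; field names are illustrative only. */
struct epc_cg {
	uint64_t usage;		/* pages charged to this cgroup */
	uint64_t max;		/* limit in pages */
	uint64_t reclaimable;	/* charged pages the host can still take back */
};

/* Reclaim up to 'want' pages; returns how many were actually freed. */
static uint64_t epc_cg_reclaim(struct epc_cg *cg, uint64_t want)
{
	uint64_t freed = want < cg->reclaimable ? want : cg->reclaimable;

	cg->reclaimable -= freed;
	cg->usage -= freed;
	return freed;
}

/* Charge one page; reclaim happens only here, on a new allocation. */
static int epc_cg_charge_page(struct epc_cg *cg, bool vepc)
{
	while (cg->usage >= cg->max) {
		if (epc_cg_reclaim(cg, cg->usage - cg->max + 1) == 0)
			return -ENOMEM;	/* nothing left to reclaim */
	}
	cg->usage++;
	if (!vepc)
		cg->reclaimable++;	/* vEPC pages are never reclaimable */
	return 0;
}
```

So a cgroup filled entirely with vEPC pages fails new allocations with
-ENOMEM instead of taking pages back from the guest.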

As we would no longer enforce the limit when a lower value is written to
misc.max, users should not expect the cgroup to forcibly reclaim pages from
enclaves or kill VMs/enclaves as a result of reducing limits 'in-place'.
Users should always create cgroups, set limits, and then launch
enclaves/VMs into the groups created.

FTR, at some point in time memory.max was considered for reclaim control
of regular pages but it turned out to be too coarse (and OOM killing
processes if amount was not sensed correctly) and this eventually
evolved into specific mechanism of memory.reclaim.
So I'm mentioning this should that be an interface with better semantics
for your use case (and misc.max writes can remain non-preemptive).

Yes, we can introduce misc.reclaim to give users a knob to forcefully
reduce usage if that is really needed in practice. The semantics would
make force-killing VMs explicit to the user.

One more note -- I was quite confused when I read in the rest of the
series about OOM and _kill_ing but then I found no such measure in the
code implementation. So I would suggest two terminological changes:

- the basic premise of the series (00/18) is that EPC pages are a
  different resource than memory, hence choose a better suiting name
  than OOM (out of memory) condition,

I couldn't come up with a good name. Out of EPC (OOEPC) maybe? I feel
OOEPC would be hard to read in code though. OOM was relatable as it is
similar to normal OOM but for a special kind of memory :-) I'm open to any
better suggestions.

- killing -- (unless you have an intention to implement process
  termination later) My current interpretation is that it is rather some
  aggressive unmapping within address space, so a less confusing name for
  that would be "reclaim".

Yes. Killing here refers to killing an enclave, analogous to killing a
process, not just 'reclaim' though. I can change to always use 'killing
enclave' explicitly.

I think EPC pages given to VMs could have the same behavior: once they are
given to a guest, they are never taken back by the host. For enclaves on
the host side, pages are reclaimable, which allows us to enforce limits in
a similar way to memcg.

Is this distinction between preemptability of EPC pages mandated by the
HW implementation? (host/"process" enclaves vs VM enclaves) Or do users
have an option to lock certain pages in memory that yields this
difference?

The difference is really a result of the current vEPC implementation.
Because enclave pages, once in use, contain confidential content, they need
a special process to reclaim. So it's complex to implement host reclaiming
of guest EPC pages gracefully.

Thanks
Haitao