Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

"Haitao Huang" <haitao.huang@xxxxxxxxxxxxxxx> · Wed, 11 Oct 2023 11:04:53 -0500

On Tue, 10 Oct 2023 19:31:19 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

On Tue, 2023-10-10 at 12:05 -0500, Haitao Huang wrote:
On Mon, 09 Oct 2023 21:12:27 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

>
> > > > >
> > > > Later the hosting process could be migrated/reassigned to another
> > > > cgroup?
> > > > What to do when the new cgroup is OOM?
> > > >
> > >
> > > You addressed in the documentation, no?
> > >
> > > +Migration
> > > +---------
> > > +
> > > +Once an EPC page is charged to a cgroup (during allocation), it
> > > +remains charged to the original cgroup until the page is released
> > > +or reclaimed.  Migrating a process to a different cgroup doesn't
> > > > +move the EPC charges that it incurred while in the previous cgroup
> > > > +to its new cgroup.
> >
> > Should we kill the enclave though because some VA pages may be in the
> > new group?
> >
>
> I guess acceptable?
>
> And any difference if you keep VA/SECS on the unreclaimable list?

Tracking VA/SECS pages allows every cgroup in which an enclave has
allocations to identify the enclave by following the back pointer and
kill it as needed.

> If you migrate one enclave to another cgroup, the old EPC pages stay in
> the old cgroup while the new one is charged to the new group IIUC.
>
> I am not a cgroup expert, but from searching some old threads it appears
> this isn't a supported model:
>
> https://lore.kernel.org/lkml/YEyR9181Qgzt+Ps9@xxxxxxxxxxxxxxx/
>

IIUC it's a different problem here. If we don't track the allocated VAs in
the new group, then the enclave that spans the two groups can't be killed
by the new group. If so, some enclave could just hide in some small group
and never get killed but keep allocating in a different group?
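
To make the scenario concrete, here is a minimal user-space sketch. It is
not the code in this series: every name in it (struct epc_cgroup, charge(),
oom_kill(), the track_va_secs flag) is hypothetical, and it only assumes
that each tracked page carries a back pointer to its owning enclave.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names come from the kernel. */
struct enclave {
	bool killed;
};

enum page_kind { PAGE_REG, PAGE_VA, PAGE_SECS };

struct epc_page {
	enum page_kind kind;
	struct enclave *owner;	/* back pointer used to find the enclave */
};

#define MAX_PAGES 8

struct epc_cgroup {
	struct epc_page *tracked[MAX_PAGES];	/* pages the OOM path can walk */
	size_t nr_tracked;
	size_t nr_charged;			/* pages counted against the limit */
};

/*
 * Charge a page to a cgroup. Regular pages are always tracked; VA/SECS
 * pages are tracked only when track_va_secs is set (assumed policy knob).
 */
static bool charge(struct epc_cgroup *cg, struct epc_page *pg,
		   bool track_va_secs)
{
	if (cg->nr_charged == MAX_PAGES)
		return false;
	cg->nr_charged++;
	if (pg->kind == PAGE_REG || track_va_secs)
		cg->tracked[cg->nr_tracked++] = pg;
	return true;
}

/*
 * OOM path: kill every enclave reachable from a tracked page.
 * Returns the number of enclaves newly killed.
 */
static size_t oom_kill(struct epc_cgroup *cg)
{
	size_t killed = 0;

	for (size_t i = 0; i < cg->nr_tracked; i++) {
		struct enclave *e = cg->tracked[i]->owner;

		if (!e->killed) {
			e->killed = true;
			killed++;
		}
	}
	return killed;
}
```

In this model, if an enclave's only allocation in the new group is a VA
page and that page is charged with track_va_secs == false, oom_kill() on
the new group has nothing to walk and cannot reach the enclave. With
track_va_secs == true the same page leads back to the enclave.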

I mean from the link above IIUC migrating enclave among different cgroups
simply isn't a supported model, thus any bad behaviour isn't a big concern
in terms of decision making.

If we leave some pages in a cgroup unkillable, we are in the same
situation of not being able to enforce a cgroup limit as we would be in if
we didn't kill VMs for lower limits.

I think not supporting migration of pages between cgroups should not leave
a gap for enforcement, just like we don't want to have an enforcement gap
if we let a VM hold pages once it is launched.

Haitao