Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

"Haitao Huang" <haitao.huang@xxxxxxxxxxxxxxx> · Mon, 26 Feb 2024 15:18:02 -0600

On Mon, 26 Feb 2024 05:36:02 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

On Sun, 2024-02-25 at 22:03 -0600, Haitao Huang wrote:
On Sun, 25 Feb 2024 19:38:26 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

>
>
> On 24/02/2024 6:00 am, Haitao Huang wrote:
> > On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai <kai.huang@xxxxxxxxx>
> > wrote:
> >
> > > > >
> > > > Right. When the code reaches here, we have already passed
> > > > per-cgroup reclaim.
> > >
> > > Yes, if try_charge() failed we must do per-cgroup reclaim.
> > >
> > > > The cgroup may not be at or have reached its limit, but the
> > > > system has run out of physical EPC.
> > > >
> > >
> > > But after try_charge() we can still choose to reclaim from the
> > > current group; it does not necessarily have to be global, right?
> > > I am not sure whether I am missing something, but could you
> > > elaborate on why we should choose to reclaim globally?
> > >
> > Once try_charge() is done and returns zero, that means the cgroup
> > usage is charged and is not over the usage limit. So you really
> > can't reclaim from that cgroup if allocation fails. The only thing
> > you can do is to reclaim globally.
>
> Sorry I still cannot establish the logic here.
>
> Let's say the sum of all cgroups' limits is greater than the physical
> EPC, and enclave(s) in each cgroup could potentially fault w/o
> reaching the cgroup's limit.
>
> In this case, when enclave(s) in one cgroup fault, why can't we
> reclaim from the current cgroup, instead of having to reclaim
> globally?
>
> Is there any real downside to the former, or do you just want to
> follow the reclaim logic used w/o cgroups at all?
>
> IIUC, there's at least one advantage to reclaiming from the current
> group: faults of enclave(s) in one group won't impact enclaves in
> other cgroups.  E.g., in this way enclaves in other groups may
> never need to trigger faults.
>
> Or perhaps I am missing something?
>
The use case here is that the user knows it's OK for group A to borrow
some pages from group B for some time without much performance impact,
and vice versa. That's why the user is overcommitting, so the system can
run more enclaves/groups. Otherwise, if she is concerned about the impact
of A on B, she could lower the limit for A so it never interferes, or
interferes less, with B (assuming the lower limit is still high enough to
run all enclaves in A), and sacrifice some of A's performance. Or if she
does not want any interference between groups, she can simply not
overcommit. So we don't really lose anything here.

But if we reclaim from the same group, it seems we could enable a use
case that allows the admin to ensure a certain group won't be impacted at
all, while allowing other groups to over-commit?

E.g., let's say we have 100M of physical EPC.  And let's say the admin
wants to run some performance-critical enclave(s) which cost 50M of EPC
w/o being impacted. The admin also wants to run other enclaves which
could cost 100M of EPC in total, but EPC swapping among them is
acceptable.

If we choose to reclaim from the current EPC cgroup, then it seems that
the admin can achieve the above by setting up 2 groups, with group1
having a 50M limit and group2 having a 100M limit, and then running the
performance-critical enclave(s) in group1 and the others in group2?  Or
am I missing anything?

The more important groups should have limits higher than or equal to
their peak usage to ensure no impact.
The less important groups should have limits lower than their peak usage
to avoid impacting higher-priority groups.
The limit is the maximum usage allowed.

By setting the group2 limit to 100M, you are allowing it to use 100M. So
as soon as it comes up and consumes 100M, group1 cannot even load any
enclave if we only reclaim per-cgroup and do not do global reclaim.
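
To make that concrete, here is a rough toy model of the scenario. It is
only a sketch and not kernel code: the Group class, alloc(), reclaim()
and the MB bookkeeping are all made up for illustration, and the victim
selection (largest group first) is an assumption, not what the patch
does.

```python
# Toy model only: every name here is hypothetical, not a kernel API.
PHYSICAL_EPC_MB = 100  # physical EPC from the example in this thread


class Group:
    def __init__(self, name, limit_mb):
        self.name = name
        self.limit_mb = limit_mb
        self.resident_mb = 0  # EPC currently held by this group


def reclaim(victim, want_mb):
    """Swap out up to want_mb from victim; return how much was freed."""
    took = min(victim.resident_mb, want_mb)
    victim.resident_mb -= took
    return took


def alloc(group, size_mb, groups, global_reclaim):
    """Return True if group obtains size_mb of EPC."""
    # Step 1: charge against the group's own limit, swapping out the
    # group's own pages if the charge would exceed it.
    over = group.resident_mb + size_mb - group.limit_mb
    if over > 0:
        if over > group.resident_mb:
            return False  # the request cannot fit under the limit
        reclaim(group, over)

    # Step 2: the charge succeeded; now find physical pages.
    free = PHYSICAL_EPC_MB - sum(g.resident_mb for g in groups)
    shortfall = size_mb - free
    if shortfall > 0:
        # Physical EPC is exhausted even though the group is under limit.
        candidates = groups if global_reclaim else [group]
        for victim in sorted(candidates, key=lambda g: g.resident_mb,
                             reverse=True):
            shortfall -= reclaim(victim, shortfall)
            if shortfall == 0:
                break
        if shortfall > 0:
            return False
    group.resident_mb += size_mb
    return True


def scenario(global_reclaim):
    g1 = Group("group1", 50)
    g2 = Group("group2", 100)
    groups = [g1, g2]
    assert alloc(g2, 100, groups, global_reclaim)  # group2 fills all EPC
    ok = alloc(g1, 1, groups, global_reclaim)      # group1 wants 1M
    return ok, g1.resident_mb, g2.resident_mb


print(scenario(global_reclaim=False))  # (False, 0, 100)
print(scenario(global_reclaim=True))   # (True, 1, 99)
```

With per-cgroup reclaim only, group1 is under its limit but has nothing
of its own to swap out, so the allocation fails. With global reclaim it
gets its page from group2.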

If we choose to do global reclaim, then we cannot achieve that.

You can achieve this by setting the group 2 limit to 50M. No need to
overcommit the system.
Group 2 will swap as soon as it hits 50M, which is the maximum it can
consume, so there is no impact on group 1.
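
The arithmetic behind this, as a small sketch (overcommit_mb() is a
made-up helper for illustration, and the numbers are the ones from the
example in this thread):

```python
PHYSICAL_EPC_MB = 100  # physical EPC from the example in this thread


def overcommit_mb(limits_mb, physical_mb=PHYSICAL_EPC_MB):
    """EPC by which the summed group limits exceed physical EPC."""
    return max(0, sum(limits_mb.values()) - physical_mb)


proposed = {"group1": 50, "group2": 100}   # limits from the 2-group setup
suggested = {"group1": 50, "group2": 50}   # group 2 capped at 50M

print(overcommit_mb(proposed))   # 50
print(overcommit_mb(suggested))  # 0
```

With zero overcommit, each group can reach its own limit without taking
pages from the other.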

In case of overcommitting, even if we always reclaim from the same cgroup
for each fault, one group may still interfere with the other: e.g.,
consider an extreme case in which group A has used up almost all EPC at
the time group B has a fault; B has to fail allocation and kill enclaves.

If the admin allows group A to use almost all EPC, to me it's fair to say
he/she doesn't want to run anything inside B at all, and it is acceptable
for enclaves in B to be killed.

I don't think so. The user just knows that the combined peak usage of
groups A and B is higher than system capacity, and she is OK with them
sharing some of the pages dynamically. So the kernel should allow one to
borrow from the other at a particular instant when one group has higher
demand, and later do the opposite. IOW, the favor goes both ways.

Haitao