Re: [PATCH v6 00/12] Add Cgroup support for SGX EPC memory

Dave Hansen <dave.hansen@xxxxxxxxx> · Fri, 5 Jan 2024 10:29:05 -0800

There's very little about how the LRU design came to be in this cover
letter.  Let's add some details.

How's this?

Writing this up, I'm a lot more convinced that this series is, in
general, taking the right approach.  I honestly don't see any other
alternatives.  As much as I'd love to do something stupidly simple like
just killing enclaves at the moment they hit the limit, that would be a
horrid experience for users _and_ a departure from what the existing
reclaim support does.

That said, there's still a lot of work do to do refactor this series.
It's in need of some love to make it more clear what is going on and to
making the eventual switch over to per-cgroup LRUs more gradual.  Each
patch in the series is still doing way too much, _especially_ in patch 10.

==

The existing EPC memory management aims to be a miniature version of the
core VM where EPC memory can be overcommitted and reclaimed.  EPC
allocations can wait for reclaim.  The alternative to waiting would have
been to send a signal and let the enclave die.

This series attempts to implement that same logic for cgroups, for the
same reasons: it's preferable to wait for memory to become available and
let reclaim happen than to do things that are fatal to enclaves.

There is currently a global reclaimable page SGX LRU list.  That list
(and the existing scanning algorithm) is essentially useless for doing
reclaim when a cgroup hits its limit because the cgroup's pages are
scattered around that LRU.  It is unspeakably inefficient to scan a
linked list with millions of entries for what could be dozens of pages
from a cgroup that needs reclaim.

Even if unspeakably slow reclaim was accepted, the existing scanning
algorithm only picks a few pages off the head of the global LRU.  It
would either need to hold the list locks for unreasonable amounts of
time, or be taught to scan the list in pieces, which has its own challenges.

tl;dr: An cgroup hitting its limit should be as similar as possible to
the system running out of EPC memory.  The only two choices to implement
that are nasty changes the existing LRU scanning algorithm, or to add
new LRUs.  The result: Add a new LRU for each cgroup and scans those
instead.  Replace the existing global cgroup with the root cgroup's LRU
(only when this new support is compiled in, obviously).