Re: [PATCH v15 12/14] x86/sgx: Turn on per-cgroup EPC reclamation

"Huang, Kai" <kai.huang@xxxxxxxxx> · Thu, 20 Jun 2024 23:53:13 +0000

On Thu, 2024-06-20 at 10:06 -0500, Haitao Huang wrote:
> Hi Kai
> 
> On Thu, 20 Jun 2024 05:30:16 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
> 
> > 
> > On 18/06/2024 12:53 am, Huang, Haitao wrote:
> > > From: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > > 
> > > Previous patches have implemented all infrastructure needed for
> > > per-cgroup EPC page tracking and reclaiming. But all reclaimable EPC
> > > pages are still tracked in the global LRU as sgx_epc_page_lru() always
> > > returns reference to the global LRU.
> > > 
> > > Change sgx_epc_page_lru() to return the LRU of the cgroup in which the
> > > given EPC page is allocated.
> > > 
> > > This makes all EPC pages tracked in per-cgroup LRUs and the global
> > > reclaimer (ksgxd) will not be able to reclaim any pages from the global
> > > LRU. However, in cases of over-committing, i.e., the sum of cgroup
> > > limits greater than the total capacity, cgroups may never reclaim but
> > > the total usage can still be near the capacity. Therefore a global
> > > reclamation is still needed in those cases and it should be performed
> > > from the root cgroup.
> > > 
> > > Modify sgx_reclaim_pages_global(), to reclaim from the root EPC cgroup
> > > when cgroup is enabled. Similar to sgx_cgroup_reclaim_pages(), return
> > > the next cgroup so callers can use it as the new starting node for next
> > > round of reclamation if needed.
> > > 
> > > Also update sgx_can_reclaim_global(), to check emptiness of LRUs of all
> > > cgroups when EPC cgroup is enabled, otherwise only check the global LRU.
> > > 
> > > Finally, change sgx_reclaim_direct(), to check and ensure there are free
> > > pages at cgroup level so forward progress can be made by the caller.
> > 
> > Reading above, it's not clear how the _new_ global reclaim works with
> > multiple LRUs.
> > 
> > E.g., the current global reclaim essentially treats all EPC pages equally
> > when scanning those pages.  From the above, I don't see how this is
> > achieved in the new global reclaim.
> > 
> > The changelog should:
> > 
> > 1) describe how the existing global reclaim works, and then describe
> > how to achieve the same behaviour in the new global reclaim which works
> > with multiple LRUs;
> > 
> > 2) If there's any behaviour difference between the "existing" vs the
> > "new" global reclaim, the changelog should point out the difference,
> > and explain why such difference is OK.
> 
> Sure, I can explain. Here is what I plan to add for v16:
> 
> Note the original global reclaimer has only one LRU and always scans and
> reclaims from the head of this global LRU. The new global reclaimer
> always starts scanning from the root node, and only moves down to its
> descendants if more reclamation is needed or the root node does not have
> SGX_NR_TO_SCAN (16) pages in its LRU. This makes the enclave pages in the
> root node more likely to be reclaimed if they are not frequently used
> (not 'young'). Unless we track pages in one LRU again, we cannot exactly
> match the behavior of the original global reclaimer, and a one-LRU
> approach would make per-cgroup scanning and reclaiming too complex. On
> the other hand, this design is acceptable for the following reasons:
> 
> 1) For all purposes of using cgroups, enclaves will need to live in
>       non-root (leaf for cgroup v2) nodes where limits can be enforced
>       per-cgroup.

I don't see how it matters.  If ROOT is empty, then it moves to the first
descendant.

> 2) Global reclamation now only happens in the situation mentioned above,
>       when a lower-level cgroup not at its limit can't allocate due to
>       over-commit at the global level.

Really?  In sgx_reclaim_direct() the code says:

/*
 * Make sure there are some free pages at both cgroup and global levels.
 * In both cases, only make one attempt of reclamation to avoid lengthy
 * block on the caller.
 */

Yeah, only one attempt will be made at the global level, but it is still
global reclaim.

> 3) The pages in root being slightly penalized are not busily used
>       anyway.

Point 1) says that in practice the root will have no enclaves, thus no EPC
pages at all.

In other words, in practice the global reclaim will always skip the root
and move to the first descendant.

> 4) In cases where multiple rounds of reclamation are needed, the caller of
>       sgx_reclaim_pages_global() can still call it again to reclaim from
>       the 'next' descendant in a round-robin way; each round scans
>       SGX_NR_TO_SCAN pages from the head of the 'next' cgroup's LRU.

"multiple rounds of reclamation" isn't clear enough.  Does it mean
multiple calls of sgx_cgroup_reclaim_pages(), or does it mean each trigger
of global reclaim?

The current patch seems to be the former.  Note that 'next_cg' is reset to
NULL on each iteration of the main loop in ksgxd().

This essentially means each trigger of global reclaim will start from the
ROOT.  In practice (based on 1) and 3) above) the root is empty, so the
first descendant will always be the victim of each global reclaim.

I agree it's hard to _EXACTLY_ match the existing global reclaim, but IMHO
we should at least treat all cgroups equally.
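
To illustrate the concern, here is a minimal userspace sketch.  It is not
the patch's code: 'struct toy_cgroup', toy_reclaim() and toy_run() are
hypothetical names, the cgroup tree is flattened into an array in walk
order (index 0 is the root), and NR_TO_SCAN only mirrors the
SGX_NR_TO_SCAN (16) value mentioned above.  It compares resetting the
starting cgroup on every trigger against keeping the returned 'next'
cursor across triggers.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TO_SCAN 16	/* assumption: mirrors SGX_NR_TO_SCAN from the thread */

/* Hypothetical stand-in for a cgroup with its own reclaimable LRU. */
struct toy_cgroup {
	int lru_pages;	/* reclaimable pages currently on this LRU */
	int reclaimed;	/* pages reclaimed from this cgroup so far */
};

/*
 * Reclaim up to NR_TO_SCAN pages, visiting cgroups in walk order beginning
 * at index 'start' and wrapping around at most once.  Returns the index of
 * the cgroup following the last one visited, i.e. the 'next' cursor.
 */
size_t toy_reclaim(struct toy_cgroup *cg, size_t n, size_t start)
{
	int want = NR_TO_SCAN;
	size_t i = start, visited = 0;

	assert(n > 0 && start < n);

	while (want > 0 && visited < n) {
		int take = cg[i].lru_pages < want ? cg[i].lru_pages : want;

		cg[i].lru_pages -= take;
		cg[i].reclaimed += take;
		want -= take;
		i = (i + 1) % n;
		visited++;
	}
	return i;
}

/*
 * Run 'triggers' global reclaims.  With keep_cursor == 0 every trigger
 * restarts from the root (index 0); otherwise it resumes from the cursor
 * returned by the previous trigger.
 */
void toy_run(struct toy_cgroup *cg, size_t n, int triggers, int keep_cursor)
{
	size_t next = 0;

	for (int t = 0; t < triggers; t++) {
		if (!keep_cursor)
			next = 0;
		next = toy_reclaim(cg, n, next);
	}
}
```

With an empty root and three descendants holding 64 pages each, three
triggers with the cursor reset take all 48 pages from the first
descendant, while three triggers with the cursor kept take 16 pages from
each descendant.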