Hi Kai
On Mon, 27 Nov 2023 03:57:03 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
On Mon, 2023-11-27 at 00:27 +0800, Haitao Huang wrote:
On Mon, 20 Nov 2023 11:45:46 +0800, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
> On Mon, 2023-10-30 at 11:20 -0700, Haitao Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >
> > To prepare for per-cgroup reclamation, separate the top-level reclaim
> > function, sgx_reclaim_epc_pages(), into two separate functions:
> >
> > - sgx_isolate_epc_pages() scans and isolates reclaimable pages from a
> >   given LRU list.
> > - sgx_do_epc_reclamation() performs the real reclamation for the
> >   already isolated pages.
> >
> > Create a new function, sgx_reclaim_epc_pages_global(), calling those two
> > in succession, to replace the original sgx_reclaim_epc_pages(). The
> > above two functions will serve as building blocks for the reclamation
> > flows in later EPC cgroup implementation.
> >
> > sgx_do_epc_reclamation() returns the number of reclaimed pages. The EPC
> > cgroup will use the result to track reclaiming progress.
> >
> > sgx_isolate_epc_pages() returns the additional number of pages to scan
> > for current epoch of reclamation. The EPC cgroup will use the result to
> > determine if more scanning to be done in LRUs in its children groups.
>
> This changelog says nothing about "why", but only mentions the
> "implementation".
>
> For instance, assuming we need to reclaim @npages_to_reclaim from the
> @epc_cgrp_to_reclaim and its descendants, why cannot we do:
>
> 	for_each_cgroup_and_descendants(&epc_cgrp_to_reclaim, &epc_cgrp) {
> 		if (npages_to_reclaim <= 0)
> 			return;
>
> 		npages_to_reclaim -= sgx_reclaim_pages_lru(&epc_cgrp->lru,
> 							   npages_to_reclaim);
> 	}
>
> Is there any difference to have "isolate" + "reclaim"?
>
This is to optimize "reclaim". See how ETRACK was done in sgx_encl_ewb().
Here we just follow the same design as ksgxd for each reclamation cycle.
I don't see how you "followed" ksgxd. If I am guessing correctly, you are
afraid there might be fewer than 16 pages in a given EPC cgroup, thus w/o
splitting into "isolate" + "reclaim" you might feed the "reclaim" fewer
than 16 pages, which might cause some performance degradation?

But is this a common case? Should we even worry about this?

I suppose for such a new feature we should bring functionality first and
then optimization if you have real performance data to show.
The concern is not about "reclaim less than 16".

I mean this is just refactoring that preserves exactly the same design as
ksgxd: we first isolate as many candidate EPC pages as possible (up to 16;
ignore the unneeded SGX_NR_TO_SCAN_MAX for now), then do the EWBs in one
shot without anything else done in between. As described in the original
comments for sgx_reclaim_pages() and sgx_encl_ewb(), this is to finish all
EWBs quickly while minimizing the impact of IPIs.

The way you proposed will work, but it alters the current design and
behavior when cgroups are enabled and the EPC pages of an enclave are
tracked across multiple LRUs within the descendant cgroups: you would have
the isolation loop, backing-store allocation loop, and EBLOCK loop
interleaved with the EWB loop.
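
To make this concrete, below is a rough sketch of how the two building
blocks are meant to compose. This is illustration only: the exact
signatures, the LRU/cgroup structures and the descendant iterator are
assumptions here, not the patch code. Per the changelog, the sketch assumes
sgx_isolate_epc_pages() returns the remaining number of pages to scan.

/*
 * Rough sketch, illustration only: signatures, the LRU/cgroup structures
 * and for_each_epc_cgroup_and_descendant() are assumptions, not the code
 * in this patch.
 */
static void sgx_reclaim_epc_pages_global(void)
{
	LIST_HEAD(iso);

	/* Phase 1: isolate up to SGX_NR_TO_SCAN candidates from the global LRU. */
	sgx_isolate_epc_pages(&sgx_global_lru, SGX_NR_TO_SCAN, &iso);

	/* Phase 2: one batched EBLOCK/ETRACK/EWB pass over the whole list. */
	sgx_do_epc_reclamation(&iso);
}

static void sgx_epc_cgroup_reclaim(struct sgx_epc_cgroup *root)
{
	unsigned int nr_to_scan = SGX_NR_TO_SCAN;
	struct sgx_epc_cgroup *cg;
	LIST_HEAD(iso);

	/* Isolate across the descendants' LRUs first ... */
	for_each_epc_cgroup_and_descendant(cg, root) {	/* assumed iterator */
		if (!nr_to_scan)
			break;
		/* assumed to return the remaining number of pages to scan */
		nr_to_scan = sgx_isolate_epc_pages(&cg->lru, nr_to_scan, &iso);
	}

	/* ... and still do all EWBs for the whole batch in one shot. */
	sgx_do_epc_reclamation(&iso);
}

Either way the batched EBLOCK/ETRACK/EWB behavior stays the same, whether
the candidates came from the single global LRU or from several descendant
LRUs.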
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Co-developed-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> >
>
> [...]
>
> > +/**
> > + * sgx_do_epc_reclamation() - Perform reclamation for isolated EPC pages.
> > + * @iso:	List of isolated pages for reclamation
> > + *
> > + * Take a list of EPC pages and reclaim them to the enclave's private shmem files. Do not
> > + * reclaim the pages that have been accessed since the last scan, and move each of those pages
> > + * to the tail of its tracking LRU list.
> > + *
> > + * Limit the number of pages to be processed up to SGX_NR_TO_SCAN_MAX per call in order to
> > + * degrade amount of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit
> > + * among the HW threads with three stage EWB pipeline (EWB, ETRACK + EWB and IPI + EWB) but not
> > + * sufficiently. Reclaiming one page at a time would also be problematic as it would increase
> > + * the lock contention too much, which would halt forward progress.
>
> This is kinda optimization, correct? Is there any real performance data
> to justify this?
The above sentences were there originally. This optimization was justified.
I am talking about 16 -> 32.
> If this optimization is useful, shouldn't we bring this optimization to
> the current sgx_reclaim_pages() instead, e.g., just increase
> SGX_NR_TO_SCAN (16) to SGX_NR_TO_SCAN_MAX (32)?
>
SGX_NR_TO_SCAN_MAX might have been designed earlier for other reasons I
don't know. Currently it is really the buffer size allocated in
sgx_reclaim_pages(). Both the cgroup and ksgxd scan 16 pages at a time.
Maybe we should just use SGX_NR_TO_SCAN; no _MAX needed. The point was to
batch reclamation to a certain number to minimize the impact on the EWB
pipeline. 16 was the original design.
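
For illustration, a rough sketch of what that would look like, assuming
the per-batch buffers keep the on-stack pattern of the existing
sgx_reclaim_pages(). The signature and field names here are made up for
the sketch, not the patch code:

/* Rough sketch only: every per-batch buffer is bounded by the single
 * SGX_NR_TO_SCAN (16) constant, so SGX_NR_TO_SCAN_MAX can be dropped.
 */
#define SGX_NR_TO_SCAN	16

static size_t sgx_do_epc_reclamation(struct list_head *iso)
{
	struct sgx_backing backing[SGX_NR_TO_SCAN];	/* was sized by _MAX */
	struct sgx_epc_page *epc_page, *tmp;
	size_t cnt = 0;

	list_for_each_entry_safe(epc_page, tmp, iso, list) {
		if (WARN_ON_ONCE(cnt >= ARRAY_SIZE(backing)))
			break;	/* isolation already caps the batch at 16 */
		/* ... set up backing[cnt] and EBLOCK the page ... */
		cnt++;
	}

	/* ... then the batched ETRACK + EWB over all cnt pages ... */
	return cnt;
}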
Please don't leave why you are trying to do this to the reviewers. If you
don't know, then just drop this.
Fair enough. This was my oversight when doing all the changes and
rebasing. Will drop it.
Thanks
Haitao