On Mon, 20 Nov 2023 11:16:42 +0800, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
> >
>
> That's true. I was thinking no need to have them done in separate
calls.
> The caller has to check the return value for epc_cg instance first,
then
> check result of try_charge. But there is really only one caller,
> sgx_alloc_epc_page() below, so I don't have strong opinions now.
>
> With them separate, the checks will look like this:
> if (epc_cg = sgx_get_current_epc_cg()) // NULL means cgroup disabled,
> should continue for allocation
> {
> if (ret = sgx_epc_cgroup_try_charge())
> return ret
> }
> // continue...
>
> I can go either way.
Let's keep this aligned with other _try_charge() variants: return 'int'
to
indicate whether the charge is done or not.
Fine with me if no objections from maintainers.
>
> >
> > > > > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool
reclaim)
> > > > > {
> > > > > struct sgx_epc_page *page;
> > > > > + struct sgx_epc_cgroup *epc_cg;
> > > > > +
> > > > > + epc_cg = sgx_epc_cgroup_try_charge();
> > > > > + if (IS_ERR(epc_cg))
> > > > > + return ERR_CAST(epc_cg);
> > > > >
> > > > > for ( ; ; ) {
> > > > > page = __sgx_alloc_epc_page();
> > > > > @@ -580,10 +587,21 @@ struct sgx_epc_page
*sgx_alloc_epc_page(void
> > > > > *owner, bool reclaim)
> > > > > break;
> > > > > }
> > > > >
> > > > > + /*
> > > > > + * Need to do a global reclamation if cgroup was not full
but
> > > free
> > > > > + * physical pages run out, causing __sgx_alloc_epc_page()
to
> > > fail.
> > > > > + */
> > > > > sgx_reclaim_pages();
> > > >
> > > > What's the final behaviour? IIUC it should be reclaiming from
the
> > > > *current* EPC
> > > > cgroup? If so shouldn't we just pass the @epc_cg to it here?
> > > >
> > > > I think we can make this patch as "structure" patch w/o actually
> > > having
> > > > EPC
> > > > cgroup enabled, i.e., sgx_get_current_epc_cg() always return
NULL.
> > > >
> > > > And we can have one patch to change sgx_reclaim_pages() to take
the
> > > > 'struct
> > > > sgx_epc_lru_list *' as argument:
> > > >
> > > > void sgx_reclaim_pages_lru(struct sgx_epc_lru_list * lru)
> > > > {
> > > > ...
> > > > }
> > > >
> > > > Then here we can have something like:
> > > >
> > > > void sgx_reclaim_pages(struct sgx_epc_cg *epc_cg)
> > > > {
> > > > struct sgx_epc_lru_list *lru = epc_cg ? &epc_cg->lru :
> > > > &sgx_global_lru;
> > > >
> > > > sgx_reclaim_pages_lru(lru);
> > > > }
> > > >
> > > > Makes sense?
> > > >
The reason we 'isolate' first then do real 'reclaim' is that the actual
reclaim is expensive and especially for eblock, etrack, etc.
> > >
> > > This is purely global reclamation. No cgroup involved.
> >
> > Again why? Here you are allocating one EPC page for enclave in a
> > particular EPC
> > cgroup. When that fails, shouldn't you try only to reclaim from the
> > *current*
> > EPC cgroup? Or at least you should try to reclaim from the
*current*
> > EPC cgroup
> > first?
> >
>
> Later sgx_epc_cg_try_charge will take a 'reclaim' flag, if true,
cgroup
> reclaims synchronously, otherwise in background and returns -EBUSY in
> that case. This function also returns if no valid epc_cg pointer
> returned.
>
> All reclamation for *current* cgroup is done in
sgx_epc_cg_try_charge().
This is fine, but I believe my question above is about where to reclaim
when
"allocation" fails, but not "try charge" fails.
I mean "will be done" :-) Currently no reclaim in try_charge.
And for "reclaim for current cgroup when charge fails", I don't think
its even
necessary in this initial implementation of EPC cgroup. You can just
fail the
allocation when charge fails (reaching the limit). Trying to reclaim
when limit
is hit can be done later.
Yes. It is done later.
Please see Dave and Michal's replies here:
https://lore.kernel.org/lkml/7a1a5125-9da2-47b6-ba0f-cf24d84df16b@xxxxxxxxx/#t
https://lore.kernel.org/lkml/yz44wukoic3syy6s4fcrngagurkjhe2hzka6kvxbajdtro3fwu@zd2ilht7wcw3/
>
> So, by reaching to this point, a valid epc_cg pointer was returned,
> that means allocation is allowed for the cgroup (it has reclaimed if
> necessary, and its usage is not above limit after charging).
I found memory cgroup uses different logic -- allocation first and then
charge:
For instance:
static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
......
folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
if (!folio)
goto oom;
if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
goto oom_free_page;
...... }
Why EPC needs to "charge first" and "then allocate"?
EPC allocation can involve reclaiming which is more expensive than regular
RAM reclamation. Also misc only has max hard limit.
Thanks
Haitao
>
> But the system level free count may be low (e.g., limits of all
cgroups
> may add up to be more than capacity). so we need to do a global
> reclamation here, which may involve reclaiming a few pages (from
current
> or other groups) so the system can be at a performant state with
minimal
> free count. (current behavior of ksgxd).
>
I should have sticked to the orignial comment added in code. Actually
__sgx_alloc_epc_page() can fail if system runs out of EPC. That's the
really reason for global reclaim.
I don't see how this can work. With EPC cgroup likely all EPC pages
will go to
the individual LRU of each cgroup, and the sgx_global_lru will basically
empty.
How can you reclaim from the sgx_global_lru?
Currently, nothing in cgroup LRU, all EPCs pages are in global list.