On Wed, 21 Feb 2024 05:00:27 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:
[...]
>
> Here the @nr_to_scan is reduced by the number of pages that are
> isolated, but
> not actually reclaimed (which is reflected by @cnt).
>
> IIUC, looks you want to make this function do "each cycle" as what you
> mentioned
> in the v8 [1]:
>
> I tested with that approach and found we can only target number of
> pages
> attempted to reclaim not pages actually reclaimed due to the
> uncertainty
> of how long it takes to reclaim pages. Besides targeting number of
> scanned pages for each cycle is also what the ksgxd does.
>
> If we target actual number of pages, sometimes it just takes too
long.
> I
> saw more timeouts with the default time limit when running parallel
> selftests.
>
> I am not sure what does "sometimes it just takes too long" mean, but
> what I am
> thinking is you are trying to do some perfect but yet complicated code
> here.
I think what I observed was that the try_charge() would block too long
before getting chance of schedule() to yield, causing more timeouts than
necessary.
I'll do some re-test to be sure.
Looks this is a valid information that can be used to justify whatever
you are
implementing in the EPC cgroup reclaiming function(s).
I'll add some comments. Was assuming this is just following the old design
as ksgxd.
There were some comments at the beginning of
sgx_epc_cgrooup_reclaim_page().
/*
* Attempting to reclaim only a few pages will often fail and is
* inefficient, while reclaiming a huge number of pages can result
in
* soft lockups due to holding various locks for an extended
duration.
*/
unsigned int nr_to_scan = SGX_NR_TO_SCAN;
I think it can be improved to emphasize we only "attempt" to finish
scanning fixed number of pages for reclamation, not enforce number of
pages successfully reclaimed.
>
> For instance, I don't think selftest reflect the real workload, and I
> believe
> adjusting the limit of a given EPC cgroup shouldn't be a frequent
> operation,
> thus it is acceptable to use some easy-maintain code but less perfect
> code.
>
> Here I still think having @nr_to_scan as a pointer is
over-complicated.
> For
> example, we can still let sgx_reclaim_pages() to always scan
> SGX_NR_TO_SCAN
> pages, but give up when there's enough pages reclaimed or when the EPC
> cgroup
> and its descendants have been looped:
>
> unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
> {
> unsigned int cnt = 0;
> ...
>
> css_for_each_descendant_pre(pos, css_root) {
> ...
> epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> cnt += sgx_reclaim_pages(&epc_cg->lru);
>
> if (cnt >= SGX_NR_TO_SCAN)
> break;
> }
>
> ...
> return cnt;
> }
>
> Yeah it may reclaim more than SGX_NR_TO_SCAN when the loop actually
> reaches any
> descendants, but that should be rare and we don't care that much, do
we?
>
I assume you meant @cnt here to be number of pages actually reclaimed.
Yes.
This could cause sgx_epc_cgroup_reclaim_pages() block too long as @cnt
may always be zero (all pages are too young) and you have to loop all
descendants.
I am not sure whether this is a valid point.
For example, your change in patch 10 "x86/sgx: Add EPC reclamation in
cgroup
try_charge()" already loops all descendants in below code:
+ if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+ return -ENOMEM;
I meant looping all descendants for reclamation which is expensive and we
want to avoid. Not just checking emptiness of LRUs.
Anyway, I can see all these can be justification to your
design/implementation.
My point is please put these justification in changelog/comments so that
we can
actually understand.
Yes, will add clarifying comments.
Thanks