On 23/02/2024 6:20 am, Haitao Huang wrote:
On Wed, 21 Feb 2024 05:00:27 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:
[...]
>
> Here the @nr_to_scan is reduced by the number of pages that are
> isolated, but
> not actually reclaimed (which is reflected by @cnt).
>
> IIUC, looks you want to make this function do "each cycle" as what you
> mentioned
> in the v8 [1]:
>
> I tested with that approach and found we can only target number of
> pages
> attempted to reclaim not pages actually reclaimed due to the
> uncertainty
> of how long it takes to reclaim pages. Besides targeting number of
> scanned pages for each cycle is also what the ksgxd does.
>
> If we target actual number of pages, sometimes it just takes
too long.
> I
> saw more timeouts with the default time limit when running
parallel
> selftests.
>
> I am not sure what does "sometimes it just takes too long" mean, but
> what I am
> thinking is you are trying to do some perfect but yet complicated code
> here.
I think what I observed was that the try_charge() would block too long
before getting chance of schedule() to yield, causing more timeouts than
necessary.
I'll do some re-test to be sure.
Looks this is a valid information that can be used to justify whatever
you are
implementing in the EPC cgroup reclaiming function(s).
I'll add some comments. Was assuming this is just following the old
design as ksgxd.
There were some comments at the beginning of
sgx_epc_cgrooup_reclaim_page().
/*
* Attempting to reclaim only a few pages will often fail and is
* inefficient, while reclaiming a huge number of pages can
result in
* soft lockups due to holding various locks for an extended
duration.
*/
unsigned int nr_to_scan = SGX_NR_TO_SCAN;
I think it can be improved to emphasize we only "attempt" to finish
scanning fixed number of pages for reclamation, not enforce number of
pages successfully reclaimed.
Not sure need to be this comment, but at somewhere just state you are
trying to follow the ksgxd() (the current sgx_reclaim_pages()), but
trying to do it "_across_ given cgroup and all the descendants".
That's the reason you made @nr_to_scan as a pointer.
And also some text to explain why to follow ksgxd() -- not wanting to
block longer due to loop over descendants etc -- so we can focus on
discussing whether such justification is reasonable.